First impressions and learnings on BuildKit's new supply chain security features
If you have recently come across the terms "SBOM" and "SLSA Provenance attestation" and don't know what they are or what purpose they serve, welcome: you're not alone!
The recent release of BuildKit v0.11.0 introduced new features that help you secure the supply chain of container images. In this post, I want to help you understand the very basics of these concepts and get started with some practical examples. I also want to share my journey and the lessons I learned along the way.
Create a BuildKit builder
First of all, you need a BuildKit builder instance running version v0.11.0 or later to be able to generate SBOM and SLSA Provenance attestations. Older versions of BuildKit don't support generating these artifacts.
You can create one as follows:
docker buildx create --driver=docker-container --bootstrap --use
The --driver=docker-container flag creates a managed and customizable BuildKit environment that runs as a Docker container. The --bootstrap flag boots that container right after it is created, and --use makes the new builder the current one for subsequent docker buildx commands. This driver also supports cache persistence, as it stores all the BuildKit state and related cache in a dedicated Docker volume.
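Because attestations need BuildKit v0.11.0 or newer, it can be worth checking the builder's version before building. The following is a minimal Python sketch, assuming you pass it the BuildKit version string reported for the builder's node by docker buildx inspect (for example "v0.11.2"); the helper names parse_version and supports_attestations are hypothetical.

```python
from __future__ import annotations

import re

# Oldest BuildKit release that can generate SBOM and SLSA provenance attestations.
MIN_BUILDKIT = (0, 11, 0)


def parse_version(version: str) -> tuple[int, int, int]:
    """Turn a string like "v0.11.2" or "v0.12.0-rc1" into (major, minor, patch).

    Any pre-release suffix is ignored, which is a simplification.
    """
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized BuildKit version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def supports_attestations(version: str) -> bool:
    """Return True if this BuildKit version is v0.11.0 or later."""
    return parse_version(version) >= MIN_BUILDKIT
```

For example, supports_attestations("v0.10.6") returns False, so a builder on that version would need to be recreated with a newer BuildKit.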
Software Bill of Materials (SBOMs)
An SBOM is a comprehensive list of all the components, libraries, and dependencies that make up a software product. It includes information such as version numbers and licenses of each component.
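To make this concrete, here's a minimal Python sketch that lists the name, version, and license of each component in an SPDX 2.x JSON document. It relies on the standard SPDX JSON fields packages, name, versionInfo, and licenseConcluded; the function name list_components and the sample document are made up for illustration.

```python
from __future__ import annotations


def list_components(spdx_doc: dict) -> list[tuple[str, str, str]]:
    """Return (name, version, license) for every package in an SPDX 2.x JSON doc.

    Missing fields are reported as "NOASSERTION", the value SPDX itself uses
    when a piece of information is not provided.
    """
    components = []
    for pkg in spdx_doc.get("packages", []):
        components.append(
            (
                pkg.get("name", "NOASSERTION"),
                pkg.get("versionInfo", "NOASSERTION"),
                pkg.get("licenseConcluded", "NOASSERTION"),
            )
        )
    return components


# A tiny, made-up SPDX document used only for illustration.
sample = {
    "spdxVersion": "SPDX-2.3",
    "SPDXID": "SPDXRef-DOCUMENT",
    "packages": [
        {
            "name": "busybox",
            "SPDXID": "SPDXRef-Package-busybox",
            "versionInfo": "1.36.0",
            "licenseConcluded": "GPL-2.0-only",
        }
    ],
}

for name, version, license_id in list_components(sample):
    print(f"{name} {version} {license_id}")
```

With a real SBOM, you would load the SPDX document with json.load and pass the resulting dict instead of the sample.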
Generating an SBOM for a Docker image
Generating an SBOM as part of building your Docker image is pretty straightforward: just pass the --sbom=true flag.
docker buildx build --sbom=true -t felipecruz/buildkit-ssc-features:sbom .
...
=> [linux/amd64] generating sbom using docker.io/docker/buildkit-syft-scanner:stable-1 0.7s
...
Visualizing the SBOM
Now that the SBOM has been generated, the next step is to look at its content. Initially, I thought I could find the generated SBOM by inspecting the local image.
However, if you run docker image inspect felipecruz/buildkit-ssc-features:sbom, you won't find the attestation anywhere in the image configuration.
👨🏫 The first lesson learned
Attestations are stored as manifest objects in the image index, similar in style to OCI artifacts. This means you need to push the image to a registry, or export it to a local directory as we will see later, to be able to see them.
Note that the in-toto attestation has a predicate type of "https://spdx.dev/Document", which signals that it defines an SBOM for the image.
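To see what that layout looks like in practice, here's a minimal Python sketch that walks a raw image index, such as the one printed by docker buildx imagetools inspect --raw, and pairs each attestation manifest with the image manifest it describes. It assumes BuildKit's attestation storage layout, where attestation manifests carry a "vnd.docker.reference.type": "attestation-manifest" annotation and a "vnd.docker.reference.digest" annotation pointing at the platform's image manifest. The function name is hypothetical, the digests in the sample are shortened and made up, and platform variants are ignored for simplicity.

```python
from __future__ import annotations

ATTESTATION_TYPE = "attestation-manifest"


def attestations_by_platform(index: dict) -> dict[str, str]:
    """Map "os/arch" of each image manifest to the digest of its attestation manifest."""
    manifests = index.get("manifests", [])

    # First pass: remember which platform each image manifest digest belongs to.
    platforms = {}
    for desc in manifests:
        annotations = desc.get("annotations", {})
        if annotations.get("vnd.docker.reference.type") != ATTESTATION_TYPE:
            plat = desc.get("platform", {})
            platforms[desc["digest"]] = f'{plat.get("os")}/{plat.get("architecture")}'

    # Second pass: link every attestation manifest to the image manifest it references.
    result = {}
    for desc in manifests:
        annotations = desc.get("annotations", {})
        if annotations.get("vnd.docker.reference.type") == ATTESTATION_TYPE:
            subject = annotations.get("vnd.docker.reference.digest")
            if subject in platforms:
                result[platforms[subject]] = desc["digest"]
    return result


def _attestation(digest: str, subject: str) -> dict:
    # Attestation manifests use the "unknown/unknown" platform in the index.
    return {
        "digest": digest,
        "platform": {"os": "unknown", "architecture": "unknown"},
        "annotations": {
            "vnd.docker.reference.type": ATTESTATION_TYPE,
            "vnd.docker.reference.digest": subject,
        },
    }


# Made-up index for a two-platform image with one attestation manifest per platform.
sample_index = {
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {"digest": "sha256:1111", "platform": {"os": "linux", "architecture": "amd64"}},
        {"digest": "sha256:2222", "platform": {"os": "linux", "architecture": "arm64"}},
        _attestation("sha256:3333", "sha256:1111"),
        _attestation("sha256:4444", "sha256:2222"),
    ],
}

print(attestations_by_platform(sample_index))
```

Against a real image you would save the output of docker buildx imagetools inspect --raw felipecruz/buildkit-ssc-features:sbom to a file and load it with json.load.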
We can use docker buildx imagetools inspect to see the manifest structure, or https://explore.ggcr.dev/?image=felipecruz/buildkit-ssc-features:sbom to explore the contents interactively.
Because felipecruz/buildkit-ssc-features:sbom is a multi-platform image that targets linux/amd64 and linux/arm64, the output of inspecting the image in the registry will contain one SBOM per platform:
docker buildx imagetools inspect felipecruz/buildkit-ssc-features:sbom --format '{{ json .SBOM }}'
{
"linux/amd64": {
"SPDX": {
"SPDXID": "SPDXRef-DOCUMENT",
...
},
"linux/arm64": {
"SPDX": {
"SPDXID": "SPDXRef-DOCUMENT",
...
}
}
Therefore, to check the SBOM of a particular platform like linux/amd64, you can use the following --format expression:
docker buildx imagetools inspect felipecruz/buildkit-ssc-features:sbom --format '{{ json (index .SBOM "linux/amd64") }}'
{
"SPDX": {
"SPDXID": "SPDXRef-DOCUMENT",
"creationInfo": {
...
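If you'd rather post-process this output in a script, the same per-platform lookup that index .SBOM "linux/amd64" performs can be done in Python. This is a minimal sketch, assuming you saved the output of the '{{ json .SBOM }}' command above to a file and loaded it with json.load; the helper name sbom_for_platform is hypothetical, and the sample mirrors the structure of that output with the document bodies trimmed.

```python
def sbom_for_platform(sbom_by_platform: dict, platform: str) -> dict:
    """Return the SPDX document for one platform from the '{{ json .SBOM }}' output.

    That output is a JSON object keyed by platform, where each value holds
    the SPDX document under the "SPDX" key.
    """
    if platform not in sbom_by_platform:
        available = ", ".join(sorted(sbom_by_platform)) or "none"
        raise KeyError(f"no SBOM for {platform!r}; available platforms: {available}")
    return sbom_by_platform[platform]["SPDX"]


# Trimmed sample with the same shape as the multi-platform output shown earlier.
sample = {
    "linux/amd64": {"SPDX": {"SPDXID": "SPDXRef-DOCUMENT"}},
    "linux/arm64": {"SPDX": {"SPDXID": "SPDXRef-DOCUMENT"}},
}

print(sbom_for_platform(sample, "linux/amd64"))
```

Raising an error that lists the available platforms makes a typo in the platform name easier to spot than a bare KeyError.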
Alternatively, if you want to see the SBOM attestation without having to push the image to a registry, you can export the contents of the image to a local directory with the -o flag:
docker buildx build --sbom=true -o ./image .
cat ./image/sbom.spdx.json
{
"_type": "https://in-toto.io/Statement/v0.1",
"predicateType": "https://spdx.dev/Document",
"subject": [
{
"name": "bin/busybox",
"digest": {
"sha256": "36d96947f81bee3a5e1d436a333a52209f051bb3556028352d4273a748e2d136"
}
},
...
}
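Since this file is an in-toto statement rather than a bare SPDX document, it helps to look at its envelope fields first. The sketch below assumes the in-toto Statement v0.1 layout shown above (_type, predicateType, subject) and lists each subject's name and SHA-256 digest; the function name list_subjects is hypothetical, and the sample reuses the busybox subject from the output above with the predicate left empty.

```python
from __future__ import annotations


def list_subjects(statement: dict) -> list[tuple[str, str]]:
    """Return (name, sha256) pairs for every subject of an in-toto statement."""
    if not statement.get("_type", "").startswith("https://in-toto.io/Statement/"):
        raise ValueError("not an in-toto statement")
    return [
        (subject["name"], subject.get("digest", {}).get("sha256", ""))
        for subject in statement.get("subject", [])
    ]


sample_statement = {
    "_type": "https://in-toto.io/Statement/v0.1",
    "predicateType": "https://spdx.dev/Document",
    "subject": [
        {
            "name": "bin/busybox",
            "digest": {
                "sha256": "36d96947f81bee3a5e1d436a333a52209f051bb3556028352d4273a748e2d136"
            },
        }
    ],
    "predicate": {},  # the SPDX document would normally live here
}

for name, digest in list_subjects(sample_statement):
    print(name, digest)
```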
Find vulnerabilities in the SBOM
Finding vulnerabilities is outside the scope of BuildKit, so you need an external tool such as grype. This open-source tool can find vulnerabilities in Docker images, SBOMs, and more.
At first, I was confused about why grype failed to detect vulnerabilities in the SBOM generated by BuildKit.
cat ./image/sbom.spdx.json | grype -vv
...
[0000] DEBUG format syft-6-json returned err: could not extract syft schema form-lib=syft
[0000] DEBUG format cyclonedx-1-xml returned err: EOF form-lib=syft
[0000] DEBUG format cyclonedx-1-json returned err: not a valid CycloneDX document form-lib=syft
[0000] DEBUG format spdx-2-tag-value returned err: unable to decode spdx-tag-value: no colon found in '{' form-lib=syft
...
No vulnerabilities found
The errors above show that grype cannot parse the SBOM generated by BuildKit. Digging into the source code of the SBOM generator that BuildKit uses, I found that the JSON-encoded SPDX document (the actual SBOM) is stored in the predicate field.
👨🏫 The second lesson learned
I had assumed that sbom.spdx.json was an actual SBOM in JSON-encoded SPDX format.
However, the sbom.spdx.json generated by BuildKit is in fact an in-toto attestation. The predicate property of the attestation contains a JSON-encoded SPDX document (the SBOM), whereas the subject contains the software artifacts associated with this SPDX document.
Finally, passing just the predicate part of the attestation to grype works as expected.
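One way to do that extraction without extra tooling is a small Python script that reads the in-toto statement, checks that its predicateType is the SPDX document type, and prints only the predicate as JSON on standard output. This is a minimal sketch; the script name extract_sbom.py and its helper names are hypothetical.

```python
import json
import sys

SPDX_PREDICATE = "https://spdx.dev/Document"


def extract_spdx(statement: dict) -> dict:
    """Return the SPDX document (the predicate) wrapped in an in-toto statement."""
    predicate_type = statement.get("predicateType")
    if predicate_type != SPDX_PREDICATE:
        raise ValueError(f"unexpected predicateType: {predicate_type!r}")
    return statement["predicate"]


def main(path: str) -> None:
    with open(path, encoding="utf-8") as f:
        statement = json.load(f)
    # Write only the SBOM, so that tools like grype can parse it.
    json.dump(extract_spdx(statement), sys.stdout)


# Only run as a CLI when exactly one file path is given.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Its output can then be piped into grype in the same way as the earlier cat command, for example python3 extract_sbom.py ./image/sbom.spdx.json | grype.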
Multi-stage images
When using BuildKit to generate SBOMs, I expected the SBOM to account for the dependencies of all the intermediate stages that the final stage depends on.
👨🏫 The third lesson learned
Apparently, that's not the default behavior, because it would be computationally more expensive and take longer to produce the SBOM. To enable it, you need to set the BUILDKIT_SBOM_SCAN_STAGE=true build argument.
Scanning every stage isn't always desirable either. For instance, you may have a build stage that uses curl to download a binary and a final stage that only copies that binary. It wouldn't be accurate to track curl as a dependency of the final stage when it isn't used there.
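If you drive your builds from a script, BUILDKIT_SBOM_SCAN_STAGE is passed like any other build argument. The following minimal Python sketch only assembles the docker buildx build command line; the function name sbom_build_command is hypothetical, and nothing is executed until you hand the result to subprocess.run yourself.

```python
from __future__ import annotations

import shlex


def sbom_build_command(tag: str, context: str = ".", scan_stages: bool = False) -> list[str]:
    """Build the argv for a docker buildx build that generates an SBOM.

    When scan_stages is True, BUILDKIT_SBOM_SCAN_STAGE=true is passed as a
    build argument so that intermediate stages are scanned as well.
    """
    cmd = ["docker", "buildx", "build", "--sbom=true", "-t", tag]
    if scan_stages:
        cmd += ["--build-arg", "BUILDKIT_SBOM_SCAN_STAGE=true"]
    cmd.append(context)
    return cmd


print(shlex.join(sbom_build_command("felipecruz/buildkit-ssc-features:sbom", scan_stages=True)))
```

Returning an argument list rather than a single string avoids shell quoting issues when the command is later passed to subprocess.run(cmd, check=True).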
SLSA Provenance Attestation
As we saw at the beginning, an SBOM is like a list of all the "ingredients" used in a recipe, along with details like the quantities, versions, and sources of each ingredient.
On the other hand, an SLSA provenance attestation is like a certificate of authenticity for the dish. Whoever prepared it records how, when, and from which ingredients the recipe was made, so you can confirm that the ingredients are legitimate and unmodified.
Generating an SLSA provenance attestation
The provenance attestation created by BuildKit describes how the image was built. As with the SBOM, the SLSA provenance is attached to the image index, wrapped inside an in-toto attestation whose predicate contains the actual SLSA provenance.
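Since both attestations share the same in-toto envelope, the predicateType field is what tells them apart. This is a minimal sketch, assuming the predicate type URIs used by BuildKit v0.11: https://spdx.dev/Document for SBOMs and https://slsa.dev/provenance/v0.2 for SLSA provenance (newer releases may emit other versions). The function name attestation_kind is hypothetical.

```python
# predicateType URIs mapped to a human-readable description.
PREDICATE_KINDS = {
    "https://spdx.dev/Document": "SBOM (SPDX)",
    "https://slsa.dev/provenance/v0.2": "SLSA provenance v0.2",
}


def attestation_kind(statement: dict) -> str:
    """Describe what an in-toto statement attests, based on its predicateType."""
    return PREDICATE_KINDS.get(statement.get("predicateType", ""), "unknown")


print(attestation_kind({"predicateType": "https://slsa.dev/provenance/v0.2"}))
```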
To generate an SLSA provenance attestation, pass the --provenance=true flag when building your image:
docker buildx build --sbom=true --provenance=true -t felipecruz/buildkit-ssc-features .
Visualizing the SLSA provenance
Since the image is multi-platform, you need to target the provenance of a specific platform to visualize its content. You can use docker buildx imagetools inspect, or explore it interactively at explore.ggcr.dev:
docker buildx imagetools inspect felipecruz/buildkit-ssc-features --format '{{ json (index .Provenance "linux/amd64") }}'
{
"SLSA": {
"buildType": "https://mobyproject.org/buildkit@v1",
"builder": {
"id": ""
},
"invocation": {
"configSource": {
"entryPoint": "Dockerfile"
},
...
}
By inspecting the content we can see some interesting information that has been generated automatically, such as:
- Build timestamps: when the build started and finished.
- Invocation info: how the build was invoked, which in my case was the dockerfile.v0 frontend, with linux/amd64 as the environment platform and Dockerfile as the entry point.
- Build materials: such as the Docker images used as part of the build and the Git URLs of the repositories containing the image's source code, among others.
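To pull those fields out programmatically, here's a minimal Python sketch that summarizes the SLSA v0.2 predicate printed by the '{{ json (index .Provenance "linux/amd64") }}' command above, that is, the object under its "SLSA" key. Field names follow the SLSA v0.2 provenance schema (invocation.configSource.entryPoint, metadata.buildStartedOn, materials[].uri). That BuildKit records the frontend under invocation.parameters.frontend and the platform under invocation.environment.platform is an assumption, and apart from the values visible in the output above, the sample data is made up.

```python
def summarize_provenance(slsa: dict) -> dict:
    """Collect the most interesting fields of an SLSA v0.2 provenance predicate."""
    invocation = slsa.get("invocation", {})
    metadata = slsa.get("metadata", {})
    return {
        "buildType": slsa.get("buildType"),
        "entryPoint": invocation.get("configSource", {}).get("entryPoint"),
        # Assumed locations for the frontend and platform in BuildKit's output.
        "frontend": invocation.get("parameters", {}).get("frontend"),
        "platform": invocation.get("environment", {}).get("platform"),
        "started": metadata.get("buildStartedOn"),
        "finished": metadata.get("buildFinishedOn"),
        "materials": [m.get("uri") for m in slsa.get("materials", [])],
    }


# Illustrative predicate; timestamps and the material are made up.
sample_slsa = {
    "buildType": "https://mobyproject.org/buildkit@v1",
    "builder": {"id": ""},
    "invocation": {
        "configSource": {"entryPoint": "Dockerfile"},
        "parameters": {"frontend": "dockerfile.v0"},
        "environment": {"platform": "linux/amd64"},
    },
    "materials": [{"uri": "pkg:docker/alpine@3.17?platform=linux%2Famd64"}],
    "metadata": {
        "buildStartedOn": "2023-01-01T10:00:00Z",
        "buildFinishedOn": "2023-01-01T10:01:00Z",
    },
}

for key, value in summarize_provenance(sample_slsa).items():
    print(f"{key}: {value}")
```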
That information is just the minimum you get by default. Using --provenance=mode=max generates significantly more information on top of that, such as descriptions of all build steps along with their source and layer mappings.
Reproducibility
BuildKit now supports reproducible builds through the SOURCE_DATE_EPOCH build argument or the source-date-epoch exporter attribute. This deterministic date is used in the image metadata (i.e. in the image config and layers) instead of the current time.
SOURCE_DATE_EPOCH=0 docker buildx build ...
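Rather than hard-coding 0, a common convention from the reproducible-builds.org specification is to derive SOURCE_DATE_EPOCH from the timestamp of the last Git commit. This is a minimal Python sketch, assuming the build runs inside a Git checkout: it reads the commit time with git log -1 --format=%ct, renders it as a UTC date, and returns an environment you could pass to the build. The helper names are hypothetical.

```python
from __future__ import annotations

import os
import subprocess
from datetime import datetime, timezone


def last_commit_epoch(repo: str = ".") -> int:
    """Unix timestamp of the latest commit in `repo` (requires git)."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "-1", "--format=%ct"],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(out.stdout.strip())


def epoch_to_utc(epoch: int) -> str:
    """Human-readable UTC form of the epoch, i.e. the date the build should record."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_env(epoch: int) -> dict[str, str]:
    """Copy of the current environment with SOURCE_DATE_EPOCH set."""
    env = dict(os.environ)
    env["SOURCE_DATE_EPOCH"] = str(epoch)
    return env
```

Mirroring the command above, you could then run subprocess.run(["docker", "buildx", "build", "."], env=build_env(last_commit_epoch()), check=True).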
👨🏫 The fourth lesson learned
I would have thought that, by running the same build twice without modifying any source files, I would get the same provenance attestation output and the reproducible field would be set to true automatically.
However, the reproducible field in the attestation file was always set to false. Unfortunately, even with SOURCE_DATE_EPOCH set, BuildKit can't automatically determine whether a build is reproducible, because many things can break reproducibility: race conditions in the build, use of timing information, randomness from /dev/random, and so on.
After reading this, it seems that reproducible is an input value I can provide at build time, depending on whether I consider the build to be reproducible: --provenance=reproducible=true.
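To explore this yourself, you can compare the provenance of two builds while ignoring fields that are expected to change on every run. This is a minimal Python sketch, assuming SLSA v0.2 predicates; treating metadata.buildInvocationID, metadata.buildStartedOn and metadata.buildFinishedOn as the per-run fields is an assumption, and the function names are hypothetical. Identical output still doesn't prove a build is reproducible, which is exactly why the reproducible flag is left to you.

```python
import copy

# Metadata fields assumed to differ between otherwise identical builds.
VOLATILE_METADATA = ("buildInvocationID", "buildStartedOn", "buildFinishedOn")


def normalized(slsa: dict) -> dict:
    """Deep copy of an SLSA v0.2 predicate without per-run metadata fields."""
    clean = copy.deepcopy(slsa)
    metadata = clean.get("metadata", {})
    for key in VOLATILE_METADATA:
        metadata.pop(key, None)
    return clean


def same_provenance(a: dict, b: dict) -> bool:
    """True if two predicates match once per-run metadata is ignored."""
    return normalized(a) == normalized(b)


def declared_reproducible(slsa: dict) -> bool:
    """Value of metadata.reproducible, which defaults to False."""
    return bool(slsa.get("metadata", {}).get("reproducible", False))
```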
Conclusion
Having SBOM and SLSA provenance attestations generated as part of your Docker build process is now very easy and convenient. I'm impressed by how straightforward it is to generate them just by passing flags to the docker buildx build command.
The most important discovery for me was that BuildKit doesn't output bare SBOMs and SLSA provenance documents as some other tools do. Instead, they are wrapped inside in-toto attestations and attached as manifests to the image's root index.
I'm excited to see how BuildKit continues to evolve in securing the software supply chain and supporting us developers in our efforts to provide more security-related information when distributing container images.