We built a cool thing a while ago: a pipeline that builds base Docker images for our software projects. It runs every night, checks whether the software in the image has changed, and if it has, increments the version number and pushes the new image back to the registry. Not rocket science, but still really neat. Whenever I look at the registry, I know that I’m looking at the latest and greatest. Since we have to use those images all the time, that’s a nice recurring uncertainty to remove.
However, when we do produce a new base image, developers sometimes ask me: “So yeah, what exactly changed between versions 1.5.7 and 1.5.8?” Honestly, I have no idea. Maybe it was something in the underlying Ubuntu image. Or some of the FIPS packages that we put on top of it. I can check the pipeline logs, but that probably won’t tell the full story.
So, indeed, what exactly changed?
Software Bill of Materials
It took me about 4 hours of developing a diff tool before I came up with the brilliant idea of checking with Google. Well, what do you know. There are already tools for that! Some are more useful than others, but unlike beer in the Soviet Union, there’s something to choose from.
1. container-diff
container-diff seems to be the oldest tool in the collection. Google abandoned it a while ago, but volunteers seem to keep it alive and kicking. It’s cross-platform, supports both analyzing a single image and comparing two, and can extract data about apt, rpm, node or pip packages. Neat.
For instance, if I run container-diff analyze --type apt ubuntu:bionic on my machine, it’ll spit out the list of Ubuntu package names baked into the latest ubuntu:bionic image:

    container-diff analyze --type apt ubuntu:bionic
    #
    # -----Apt-----
    #
    # Packages found in ubuntu:bionic:
    # NAME            VERSION                  SIZE
    # -adduser        3.116ubuntu1             624K
    # -apt            1.6.14                   3.8M
    # -base-files     10.1ubuntu2.11           387K
    # -base-passwd    3.5.44                   228K
    # -bash           4.4.18-2ubuntu1.3        1.6M
    # -bsdutils       1:2.31.1-0.4ubuntu3.7    264K
    # -bzip2          1.0.6-8.1ubuntu0.2       177K
Sweet.
Unlike analyze, the diff subcommand requires two images. It accepts the same arguments as analyze does (e.g. one or more --type), but this time produces a delta of installed packages.
For example, if I wanted to see how ubuntu:bionic differs from ubuntu:focal (I know it: one is LTS, and the other is not! (No, you naive bionic beaver, they both are LTS!)), I’d see this beauty:

    container-diff diff --type apt ubuntu:bionic ubuntu:focal
    #
    # -----Apt-----
    #
    # Packages found only in ubuntu:bionic:
    # NAME              VERSION                   SIZE
    # -gcc-8-base       8.4.0-1ubuntu1~18.04      117K
    # -libapt-pkg5.0    1.6.14                    3.1M
    # -libffi6          3.2.1-8                   52K
    # ...
    #
    # Packages found only in ubuntu:focal:
    # NAME              VERSION                   SIZE
    # -gcc-10-base      10.3.0-1ubuntu1~20.04     265K
    # -libapt-pkg6.0    2.0.9                     3.2M
    # -libcrypt1        1:4.4.10-10ubuntu4        226K
    # -libffi7          3.3-4                     65K
    # ...
    #
    # Version differences:
    # PACKAGE         IMAGE1 (ubuntu:bionic)     IMAGE2 (ubuntu:focal)
    # -adduser        3.116ubuntu1, 624K         3.118ubuntu2, 624K
    # -apt            1.6.14, 3.8M               2.0.9, 4.1M
    # -base-files     10.1ubuntu2.11, 387K       11ubuntu5.6, 392K
    # -base-passwd    3.5.44, 228K               3.5.47, 233K
    # -bash           4.4.18-2ubuntu1.3, 1.6M    5.0-6ubuntu1.2, 1.6M
    # ...
Result! That would definitely satisfy the changelog requirement. I’d run the tool from the pipeline before publishing the new image to the registry, and then I could send a message to Slack with a link to the newly created image and the changelog.
But I wonder if I can make it better using other tools.
2. dive
Apparently, not with this one. dive is a beautiful interactive CLI tool for exploring the layers of a Docker image. It even has a CI mode that switches the tool into a non-interactive report.

    dive ubuntu:bionic
But that’s way too low-level and way too Docker-centric for my task. I could probably use it now and then to navigate through the image file system, but not for generating a changelog. Hard pass.
3. syft
syft is very interesting. It’s yet another CLI tool for generating a software bill of materials, for both container images and regular file systems. It understands both OS and software packages, and for the latter it supports way more types of software projects than container-diff.

    syft packages ubuntu:bionic
    # ✔ Loaded image
    # ✔ Parsed image
    # ✔ Cataloged packages [89 packages]
    #
    # NAME           VERSION                  TYPE
    # adduser        3.116ubuntu1             deb
    # apt            1.6.14                   deb
    # base-files     10.1ubuntu2.11           deb
    # base-passwd    3.5.44                   deb
    # bash           4.4.18-2ubuntu1.3        deb
    # bsdutils       1:2.31.1-0.4ubuntu3.7    deb
    # bzip2          1.0.6-8.1ubuntu0.2       deb
    # coreutils      8.28-1ubuntu1            deb
    # ...
Out of curiosity, I compared the outputs of container-diff and syft, and the results are pretty much the same. syft sometimes includes a bit more version info, but the package list is identical.
Unlike container-diff, syft doesn’t know how to compare two images, but to be honest, I never planned to use that feature of container-diff to begin with. You see, as a programmer, I’m used to looking at unified diff files, like the ones created by git show. container-diff diff, on the other hand, uses a custom comparison table, which I’m not really interested in looking at. Just give me the feeling (in colours) of the magnitude of change, and I’m happy. I can drill down later.
So instead of relying on a tool making a comparison for me, I’ll do the comparison myself:

    syft ubuntu:bionic-20220829 > 20220829.txt
    syft ubuntu:bionic-20220902 > 20220902.txt

    diff -u 20220829.txt 20220902.txt
And since it’s the pipeline that’s going to run these commands, I can include both the ‘before’ and ‘after’ bill of materials files in the pipeline artifacts, and throw in a diff on top. The interested party would be just one click away from seeing the changelog through very familiar tools and formats.
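One thing to watch for in a pipeline: diff exits with status 1 whenever the files differ, which is exactly the case we care about, and most CI runners treat any non-zero status as a failure. Here is a minimal sketch of how such a step could look. The function name, its arguments and the output file names are all made up for illustration; only the syft and diff invocations come from the commands above.

```shell
# Hypothetical pipeline step: snapshot two image tags with syft and diff them.
# sbom_changelog, its arguments and the file names are illustrative only.
sbom_changelog() {
  old_image=$1
  new_image=$2
  out_dir=$3

  mkdir -p "$out_dir"
  syft "$old_image" > "$out_dir/before.txt" || return 1
  syft "$new_image" > "$out_dir/after.txt" || return 1

  # diff exits with 0 when the files match, 1 when they differ and a value
  # greater than 1 on trouble. Only the last case should fail the pipeline,
  # so the status is captured by hand instead of being propagated.
  rc=0
  diff -u "$out_dir/before.txt" "$out_dir/after.txt" > "$out_dir/changelog.diff" || rc=$?
  [ "$rc" -le 1 ]
}

# Possible usage:
#   sbom_changelog ubuntu:bionic-20220829 ubuntu:bionic-20220902 artifacts
```

The three files in the output directory are what would then be published as pipeline artifacts.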
Or one could vimdiff the hell out of it, and the results look even prettier:

    vimdiff 20220829.txt 20220902.txt
4. Bonus round: grype
As a cherry on top of the cake (and mainly out of boredom), we can use syft’s evil twin brother, grype, which pulls known CVEs for a given image. Granted, scan results are very specific to the time when the scan was run, but if we include ‘before’ and ‘after’ scans, along with the diff file, that might give a good impression of how the security posture changed when we progressed from image version A to image version B:

    grype ubuntu:bionic-20220829 > 20220829.txt
    grype ubuntu:bionic-20220902 > 20220902.txt

    diff -u 20220829.txt 20220902.txt
And here we immediately see that diff sucks when comparing text files with uneven indentation. There’s a workaround, though. We can store the output in CSV, at least for the sake of diffing. Just store a grype-csv.tmpl template file nearby and put the following content in it:

    "Package","Version Installed","Vulnerability ID","Severity"
    {{- range .Matches}}
    "{{.Artifact.Name}}","{{.Artifact.Version}}","{{.Vulnerability.ID}}","{{.Vulnerability.Severity}}"
    {{- end}}
Now rerun the grype commands, this time using the template:

    grype ubuntu:bionic-20220829 -o template -t grype-csv.tmpl > 20220829.csv
    grype ubuntu:bionic-20220902 -o template -t grype-csv.tmpl > 20220902.csv

    diff -u 20220829.csv 20220902.csv | vim -
That’s much better. Green font shows where to look, lines begin with the library name and end with “Medium”, and I don’t need to know anything else.
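Since every CSV row ends with the severity, the same files can also give a rough before/after headcount. Below is a small sketch, not something the pipeline actually does: severity_counts is a made-up helper name, and it assumes the exact four-column template above and that no field contains an embedded quote.

```shell
# Hypothetical helper: count vulnerabilities per severity in a CSV produced
# with the grype-csv.tmpl template above (four quoted columns, header first).
severity_counts() {
  awk -F'","' '
    NR > 1 && NF > 1 {          # skip the header and any blank lines
      sev = $NF                 # last column is the severity
      gsub(/"/, "", sev)        # drop the trailing quote
      count[sev]++
    }
    END { for (sev in count) print sev, count[sev] }
  ' "$1" | sort
}

# Possible usage:
#   severity_counts 20220829.csv
#   severity_counts 20220902.csv
```

Running it on both files puts the two tallies side by side, which is a cheaper first look than reading the full diff.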
Fun fact: if you look at those changelogs closely, you’ll notice that the older 20220829 version uses newer packages and has fewer vulnerabilities than the newer 20220902. The September Ubuntu release was actually a downgrade. The explanation is quite simple. If you were around Microsoft Azure services in early September (probably near other providers too), you must remember that mass DNS outage: half of our teams were restarting Ubuntu virtual machines and Kubernetes clusters, trying to fix a corrupted DNS cache. Apparently, that was caused by a newly released systemd, which, after a few days of intense troubleshooting, Canonical decided to roll back. You can clearly see that from the changelog.
Conclusion
That’s about it. I ended up using the pair of syft and grype in a way identical to what I described in points 3 and 4. Eventually we’ll have to build a Software Bill of Materials for other projects, not just images, so sticking to these tools seems logical. Whenever the pipeline produces a new image, it’ll accompany it with before/after/diff files for packages and vulnerabilities. I could probably make it better by posting a summary of these changes to the consumers’ Slack channel, but that’s a Day 2 project.
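For that Day 2 summary, the starting point could be as small as counting added and removed lines in the diff file. A rough sketch follows, with diff_summary being a made-up name; it assumes a single-file unified diff as produced by diff -u, where the first two lines are the file headers.

```shell
# Hypothetical helper: summarise a unified diff as "N added, M removed".
# Assumes the output of a single `diff -u old new` run.
diff_summary() {
  awk '
    NR <= 2 { next }    # skip the "--- old" and "+++ new" header lines
    /^\+/   { added++ }
    /^-/    { removed++ }
    END     { printf "%d added, %d removed\n", added, removed }
  ' "$1"
}

# Possible usage:
#   diff -u 20220829.txt 20220902.txt > changelog.diff
#   diff_summary changelog.diff
```

A changed package version shows up as one removed and one added line, so the numbers describe lines in the report rather than packages.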
For now, I feel like I’ve accomplished something and earned the right to go back to hibernation for the rest of the year.
So, until the next time.