Creating a changelog for a Docker image

We made a cool thing a while ago – a pipeline that builds base Docker images for our software projects. It runs every night, checks whether the software in the image has changed, and if it has, increments the version number and pushes the new image back to the registry. Not rocket science, but still really neat. Whenever I look at the registry, I know that I’m looking at the latest and greatest. Since we have to use those images all the time, that’s a nice recurring uncertainty to remove.

However, when we do produce a new base image, developers sometimes ask me: “So yeah, what exactly changed between versions 1.5.7 and 1.5.8?” Honestly, I have no idea. Maybe it was something in the underlying Ubuntu image. Or some FIPS packages that we are putting on top of it. I can check the pipeline logs, but that probably won’t tell the full story.

So, indeed, what exactly changed?

Software Bill of Materials

It took me about 4 hours of developing a diff tool before I came up with the brilliant idea of checking with Google. Well, what do you know. There are already tools for that! Some are more useful than others, but unlike beer in the Soviet Union, there’s something to choose from.

1. container-diff

container-diff seems to be the oldest tool in the collection. Google abandoned it a while ago, but volunteers seem to keep it alive and kicking. It’s cross-platform, supports both analyzing a single image and comparing two images, and can extract data about apt, rpm, node or pip packages. Neat.

For instance, if I run container-diff analyze --type apt ubuntu:bionic on my machine, it’ll spit out the list of Ubuntu packages baked into the latest ubuntu:bionic image:

Sweet.

Unlike analyze, the diff subcommand requires two images. It accepts the same arguments as analyze does (e.g. one or more --type), but this time produces a delta of installed packages.

For example, if I wanted to see how ubuntu:bionic differs from ubuntu:focal (I know it – one is LTS, and the other is not! (No, you naive bionic beaver, they both are LTS!)), I’d see this beauty:
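The command behind it is roughly the following. This is a minimal sketch that assumes container-diff is on the PATH; the function name image_apt_diff is made up for illustration.

```shell
# Compare the apt packages installed in two images.
# Arguments: old image, new image.
image_apt_diff() {
  old_image="$1"
  new_image="$2"
  # Same --type flag as "analyze", but with two images instead of one.
  container-diff diff --type apt "$old_image" "$new_image"
}
```

Calling image_apt_diff ubuntu:bionic ubuntu:focal prints the package delta to the terminal; redirecting it to a file would keep a copy for later.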

Result! That would definitely satisfy the changelog requirement. I’d run the tool from the pipeline before publishing the new image to the repo, and then I could send a message to Slack with a link to the newly created image and the changelog.

But I wonder if I can make it better using other tools.

2. dive

Apparently, not with this one. dive is a beautiful interactive CLI tool for exploring the layers of a Docker image. It even has a CI mode that switches the tool into a non-interactive report.

“dive” into Bionic internals

But that’s way too low-level and way too Docker-centric for my task. I could probably use it sometimes to navigate through the image file system, but not for generating a changelog. Hard pass.

3. syft

syft is very interesting. It’s yet another CLI tool for generating a software bill of materials, for both container images and regular file systems. It understands both OS and software packages, and speaking of the latter, it supports way more types of software projects than container-diff.

Out of curiosity, I compared the outputs of container-diff and syft, and the results are pretty much the same.

container-diff vs. syft

syft sometimes includes a bit more version info, but the package list is identical.

Unlike container-diff, syft doesn’t know how to compare two images, but to be honest, I never planned to use that feature of container-diff to begin with. You see, as a programmer, I’m used to looking at unified diff files, like the ones created by git show. container-diff diff, on the other hand, uses a custom comparison table, which I’m not really interested in looking at. Just give me a feeling (in colours) for the magnitude of change, and I’m happy. I can drill down later.

So instead of relying on a tool making a comparison for me, I’ll do the comparison myself:
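In shell terms that comes down to something like the sketch below. It assumes syft and diff are on the PATH; the function names write_bom and compare_images and the file names before.txt and after.txt are made up for illustration.

```shell
# Dump the bill of materials for one image into a text file.
# "-o table" asks syft for its plain table output.
write_bom() {
  syft "$1" -o table > "$2"
}

# Print a unified diff between the bills of materials of two images.
# Arguments: old image, new image.
compare_images() {
  write_bom "$1" before.txt
  write_bom "$2" after.txt
  # diff exits with status 1 when the files differ, which is the
  # expected outcome here, so it is not treated as a failure.
  diff -u before.txt after.txt || true
}
```

Calling compare_images ubuntu:bionic-20220829 ubuntu:bionic-20220902 would compare the two snapshots discussed further down (the bionic- tag prefix is an assumption). For coloured output in a terminal, git diff --no-index before.txt after.txt is an alternative to plain diff -u.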

changelog for Docker images via unified diff

And since it’s the pipeline that’s going to run these commands, I can include both the ‘before’ and ‘after’ bill of materials files in the pipeline artifacts, and throw in a diff on top. The interested party would be just one click away from seeing the changelog through very familiar tools and formats.
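One detail matters once this runs unattended: diff exits with 0 when the files match, 1 when they differ and 2 on a real error, so a pipeline that fails on any non-zero status would break on every actual change. Below is a minimal sketch of the artifact step under that assumption. The function name make_changelog, the artifacts directory and the two tiny package lists are invented for the demo and are not real syft output.

```shell
# Write a unified diff of two bill-of-materials files into an artifact file.
# Only a diff status of 2 or higher (a real error) is reported as a failure.
make_changelog() {
  before="$1"
  after="$2"
  out="$3"
  status=0
  diff -u "$before" "$after" > "$out" || status=$?
  if [ "$status" -ge 2 ]; then
    echo "diff failed with status $status" >&2
    return "$status"
  fi
  return 0
}

# Demo with made-up package lists standing in for the syft output.
mkdir -p artifacts
printf 'bash 1.0.0\nlibfoo 1.0.0\n' > artifacts/before.txt
printf 'bash 1.0.0\nlibfoo 1.0.1\n' > artifacts/after.txt
make_changelog artifacts/before.txt artifacts/after.txt artifacts/packages.diff
cat artifacts/packages.diff
```

All three files in the artifacts directory can then be published with the build, so the diff and both of its inputs stay together.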

Or one could vimdiff the hell out of it – the results look even prettier:

changelog via vimdiff

4. Bonus round: grype

As a cherry on top of the cake (and mainly out of boredom), we can use syft’s evil twin brother – grype, which pulls known CVEs for a given image. Granted, scan results are very specific to the time when the scan was run, but if we include ‘before’ and ‘after’ scans, along with the diff file, that might give a good impression of how the security posture changed when we progressed from image version A to image version B:
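A minimal sketch of those two scans plus the diff, assuming grype is on the PATH. The function name compare_vulnerabilities and the file names are made up for illustration.

```shell
# Scan two images with grype and diff the resulting reports.
# Arguments: old image, new image.
compare_vulnerabilities() {
  grype "$1" > before-vulns.txt
  grype "$2" > after-vulns.txt
  # diff exits with status 1 when the reports differ; that is expected.
  diff -u before-vulns.txt after-vulns.txt || true
}
```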

Unified diff is not really effective on complex formatting

And here we immediately see that diff sucks at comparing text files with uneven indentation. There’s a workaround, though. We can store the output in CSV, at least for the sake of diffing. Just store a grype-csv.tmpl template file nearby and put the following content in it:

Now rerun the grype commands, this time using the template:
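A template along these lines would do. It is a sketch modeled on the CSV example in grype’s README, and the choice of columns is an assumption; it is wrapped in a heredoc so that the file can be created straight from the shell.

```shell
# Write a Go template that makes grype emit one CSV row per match.
# Column choice is an assumption, modeled on the CSV example in grype's README.
cat > grype-csv.tmpl <<'EOF'
"Package","Version Installed","Vulnerability ID","Severity"
{{- range .Matches}}
"{{.Artifact.Name}}","{{.Artifact.Version}}","{{.Vulnerability.ID}}","{{.Vulnerability.Severity}}"
{{- end}}
EOF
```

Each row starts with the package name and ends with the severity, which keeps changed lines easy to spot in a diff.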
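A minimal sketch of the rerun, assuming grype is on the PATH and grype-csv.tmpl sits in the current directory. The function name compare_vulnerabilities_csv and the output file names are made up for illustration.

```shell
# Scan two images, rendering the results through the CSV template,
# then diff the two CSV files.
# Arguments: old image, new image.
compare_vulnerabilities_csv() {
  grype "$1" -o template -t grype-csv.tmpl > before-vulns.csv
  grype "$2" -o template -t grype-csv.tmpl > after-vulns.csv
  # diff exits with status 1 when the files differ; that is expected.
  diff -u before-vulns.csv after-vulns.csv || true
}
```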

Diffing CSV files is more readable

That’s much better. The green font shows where to look, lines begin with the library name and end with “Medium”, and I don’t need to know anything else.

Fun fact – if you look at those changelogs closely, you’ll notice that the older 20220829 version uses newer packages and has fewer vulnerabilities than the newer 20220902. The September Ubuntu release was actually a downgrade. The explanation is quite simple. If you were around Microsoft Azure services in early September (probably around other providers too), you must remember that mass DNS outage – half of our teams were restarting Ubuntu virtual machines and Kubernetes clusters, trying to fix a corrupted DNS cache. Apparently, that was caused by a newly released systemd, which, after a few days of intense troubleshooting, Canonical decided to roll back. You can clearly see that from the changelog.

Conclusion

That’s about it. I ended up using the pair of syft and grype in a way identical to what I described in points 3 and 4. Eventually we’ll have to build a Software Bill of Materials for other projects, not just images, so sticking to these tools seems logical. Whenever the pipeline produces a new image, it’ll accompany it with before/after/diff files for packages and vulnerabilities. I could probably make it better by posting a summary of these changes to the consumers’ Slack channel, but that’s a Day 2 project.

For now, I feel like I’ve accomplished something and earned the right to go back to hibernation for the rest of the year.

So, until the next time.
