The mystery of package downgrade issue

In last six or so weeks Microsoft managed to release whole bunch of .NET Core 2.1 SDKs (Preview 2, Release Candidate 1, Early Access, RTM) and we tried all of them. By the end of these weeks my cluster of CI servers looked like a zoo. As everything was done in a hurry, there were servers with RC1 pretending to be Early Access ones. EA servers pretended to be RTM compatible, and the only RTM host we had was pretending to support everything. Don’t look at me funny. It happens.

The problem happened when I tried to cleanup the mess: removed P2, RC1 and EA SDK tags from release branches, deleted prerelease servers, forced remaining servers to tell exactly who they are and finally rolled out new VMs with latest and greatest .NET Core SDK 2.1 installed. Naturally, very first build failed.

The issue

Compilation error said that Detected package downgrade: Microsoft.NETCore.App from 2.1.1 to 2.1.0. In fact, it wasn’t even a compilation – the build failed during package restore phase.

Detected package downgrade

There was also a chance that it’s one off issue caused by some mysterious race condition, so I could retry the build job and in case of success pretend the error never happened (can’t deal with it now). But nope, I retried it twice and the build failed twice as well. It looks like I have to use that brainy thing again.

Troubleshooting

Poking around

It’s quite interesting that project builds locally just fine. It’s the same Ubuntu 16.04, the same code, the same SDK.. or is it? Quick dotnet --version on both hosts shows that local SDK’s version is 2.1.300, whereas one on the build server is 2.1.301. So Microsoft released a patch few days ago? Interesting. After newer version finds its way to my workstation, project no longer builds on it as well. Well, that’s a good sign.

I checked project files, but they looked pretty much as usual and nothing would suggest the cause of the conflict. What’s interesting, it took me some time to notice that dotnet build command actually works. It’s only dotnet publish -r ubuntu-x64 that doesn’t. What’s even more interesting, if I skipped the runtime parameter (-r), even publish worked. Not sure how it helps me now, but who knows.

Getting the logs

Having zero ideas about where to look for more hints, I had no other option but enable diagnostics output in build/publish commands and try to find out at what point they start to behave differently.

If you never used -v diag parameters in MSBuild or dotnet build  commands you probably should know that it produces a lot of output. No, like this – A LOT. For our ~90 projects solution it emits tens and tens of megabytes of unstructured text output. But if there’s something to find, it should be there.

So here it is: dotnet build -c Debug -v diag > works.txt for working build, dotnet publish -c Debug -v diag -r ubuntu-x64 > fails.txt for broken one, and vim + :diffthis to bring them side by side.

Build output diff

That’s… a lot of colours. And lines. Because publish build failed right during NuGet packages restore, it’s about 20 times smaller. That’s also a good thing – I can remove everything after Done executing task 'RestoreTask', which separates restore phase from the rest of the build, and significantly reduce the amount of text to deal with.

Chasing the differences

The error message was saying something about package downgrade and version 2.1.1 in it. Let’s look for it then.

vimdiff-2.1.1

About a dozen of matches later I do find a place where RuntimeFrameworkVersion property becomes different: 2.1.1 in faulty build vs 2.1.0 in successful one. As a side note, 2.1.0 is the version of runtime shipped by default with .NET Core SDK 2.1.300. The latest SDK at the moment – 2.1.301 comes with patched runtime – 2.1.1. It’s very easy to check:

The error message stated that Microsoft.NETCore.App 2.1.1  – the runtime – was conflicting with its 2.1.0 counterpart, so it really looks like something in our solution caused one part of it to target the latest runtime, and the other one – stick with base. OK, but where does RuntimeFrameworkVersion gets its value? Nowhere. In these particular log files it looks like its value comes from outer space and never gets explicitly assigned.

OK, another try. All these build properties are coming from .props and .targets files which are the part of .NET SDK. What if I search for the property assignment among them?

That’s interesting. If TargetLatestRuntimePatch property is set to true, then RuntimeFrameworkVersion will use LatestNetCorePatchVersion, which I believe is happening in our case. Here, it’s even in build logs:

TargetLatestRuntimePatch value

LatestNetCorePatchVersion value

OK, I think I see the picture here. The last question is when TargetLatestRuntimePatch becomes true?

Again, there’s nothing in logs, but in SDK itself I was able to find this:

Thinking part

It makes total sense now. When we compile using dotnet buildSelfContained becomes false and so does TargetLatestRuntimePatch, leaving RuntimeFrameworkVersion with its default version of 2.1.0. However, it all changes for dotnet publish. SelfContained is true, TargetLatestRuntimePatch is also true and therefore for .NET Core SDK 2.1.301 RuntimeFrameworkVersion becomes 2.1.1. For some reason at least one of our test projects still requires 2.1.0, thus causing the conflict. We didn’t have the issue with SDK 2.1.300, as that was the first one to come out, so Base and Latest runtime versions were the same.

So what’s next? How do I fix that? Well, there’re actually three choices. The true one and two  temporary remedies.

  1. Find the package causing runtime version downgrade and fix it.
  2. When SelfContained, explicitly set TargetLatestRuntimePatch in problematic projects to true, thus eliminating the conflict.
  3. Explicitly set TargetLatestRuntimePatch to false for ‘main’ project, so we always use the base version.

Eventually we decided to come up with forth solution: ignore SelfContained flag at all and always require latest runtime patch. After all, why would we want to stick with an old one?

Conclusion

Even though I’m not a fan of digging through dotnet and MSBuild internals, there’s some guilty pleasure in exercises like this. Long time ago I was dealing a lot with XSLT, which being an XML also was perfectly valid functional language with functions, recursion, patterns matching, etc. And it’s really something to see a program written in functional XML. Bizarre, but something. CSPROJ files along with .props and .targets files from MSBuild are also XML based and also carry a logic with them – assignments, conditionals, code imports and some form of functions with parameters. While a little bit archaic in nowadays, it’s still kind of cute. Ah, good old medieval days…

Leave a Reply

Your email address will not be published. Required fields are marked *