I’m still busy learning how to troubleshoot .NET Core apps on Linux. By now I more or less understand where to start if I need to analyze .NET Core memory usage, but that’s not the most frequent problem we have. Most of the time the main problem is high CPU usage, which obviously leads to the question “What is this thing doing now??”.
On Windows I’d usually start with analyzing application logs, trace files or performance reports. If that wasn’t enough, I’d download the PerfView tool and profile the running executable itself. How many threads are there? Which functions do they execute the most? What are their call stacks? And so forth. The only downside of that tool is that I never read its documentation, so the whole troubleshooting experience sometimes resembled a meditation on the monitor, worshiping the holy Google and wild guessing.
While logs and traces are obviously there on Linux as well, I was wondering: can I profile the app in a similar way to what I’d do on Windows? The answer is “Oh, yes”. Monitor, worshiping, guessing – it’s all there.
There’re multiple tools to choose from out there, but the basic toolkit for profiling a .NET Core app on Linux seems to be the perf utility along with lttng and perfcollect. Let’s have a look at all of them.
The playground (profiling starts later)
We’ll need a .NET Core demo app, a Linux host and some dev tools installed on it. With the help of Vagrant and VirtualBox, I can have all of that.
Firstly, the demo project, which basically does some pointless math in a loop until we stop it:
using System;
using System.Linq;

namespace calc
{
    class Program
    {
        static int UpperLimit =>
            (Int32)(DateTime.UtcNow.Subtract(new DateTime(1970, 1, 1))).TotalMinutes;

        static int Sum() =>
            Enumerable.Range(0, UpperLimit)
                .Aggregate((result, current) => unchecked(result + current));

        static int Mul() =>
            Enumerable.Range(0, UpperLimit)
                .Aggregate((result, current) => unchecked(result * current));

        static void DoSomeMath()
        {
            Sum();
            Mul();
        }

        static void Main(string[] args)
        {
            while (true)
            {
                DoSomeMath();
            }
        }
    }
}
Secondly, the Vagrantfile that will create an Ubuntu VM and install .NET Core, perf, lttng and perfcollect on it:
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/xenial64"
  config.vm.network "forwarded_port", guest: 80, host: 8080

  config.vm.provider "virtualbox" do |vb|
    vb.memory = "4096"
  end

  config.vm.provision "shell", inline: <<-SHELL
    # python 2.7
    apt-get update
    apt-get install -y python2.7 vim

    # dotnet
    curl https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor > microsoft.gpg
    mv microsoft.gpg /etc/apt/trusted.gpg.d/microsoft.gpg
    sh -c 'echo "deb [arch=amd64] https://packages.microsoft.com/repos/microsoft-ubuntu-xenial-prod xenial main" > /etc/apt/sources.list.d/dotnetdev.list'
    apt-get update
    apt-get install -y dotnet-sdk-2.1.3

    # perf
    apt-get install -y linux-tools-`uname -r`

    # lttng
    apt-add-repository -y ppa:lttng/stable-2.10
    apt-get update
    apt-get install -y lttng-tools lttng-modules-dkms liblttng-ust-dev

    # perfcollect
    # curl -OL http://aka.ms/perfcollect
  SHELL
end
Now, vagrant up will bring this beauty into existence and we can begin.
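In case it’s not obvious from the Vagrant docs, the whole environment comes up with just two commands (assuming the Vagrantfile above sits in the current directory):

vagrant up    # create and provision the VM
vagrant ssh   # open a shell inside it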
Sampling function calls with perf
It’s surprising how easy it is to get started with perf. Don’t get me wrong, you can spend an evening reading the manuals, getting into the underlying concepts and even trying to understand how to interpret the results. But at the same time there’s no complex setup and we’re just a few commands away from getting something meaningful in response.
The Vagrantfile already installed perf, so here’s what we can do. Let’s start our dotnet app in the background and perf it.
dotnet build
#Microsoft (R) Build Engine version 15.5.179.9764 for .NET Core
#...
#Build succeeded.
#    0 Warning(s)
#    0 Error(s)

dotnet bin/Debug/netcoreapp2.0/calc.dll &
#[1] 13689
Starting the app in the background returned the process ID – 13689, which is handy, because this is exactly what we need to pass to perf. Along with that we’ll kindly ask it to collect call stacks (-g) and then use its record subcommand to begin sampling.
Now start perf, keep it running for a dozen seconds or so, and then terminate it with Ctrl+C:
sudo perf record -p 13689 -g
#^C[ perf record: Woken up 4 times to write data ]
#failed to mmap file
#[ perf record: Captured and wrote 0.899 MB perf.data ]
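As a side note, perf can also sample at a fixed frequency and for a fixed amount of time, which saves the Ctrl+C dance. This is just a sketch with the same PID as above and an arbitrary 99 Hz / 10 seconds choice:

# Sample the process ~99 times per second for 10 seconds, then stop on its own
sudo perf record -p 13689 -g -F 99 -- sleep 10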
And now, a drumroll – viewing the results:
sudo perf report
Well… it’s colorful, but not quite useful. The table basically shows which functions were executing while we were taking our samples, and in what percentage of samples they were found. The “Children” column tells in how many samples a function or its callees were running, and “Self” tells, well, how long it was running its own code. Duh.
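To illustrate the difference (the rows below are made up for illustration, they are not from the actual run): a function that merely calls others has a high “Children” value but a tiny “Self” one, while a function that burns CPU on its own has both values high:

# Children   Self    Symbol                    <- hypothetical numbers, for illustration only
#   95.00%   0.50%   [.] DoSomeMath            # mostly just waits for its callees
#   47.00%  46.00%   [.] lambda_for_addition   # does the actual work itself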
If I were looking for bottlenecks, I’d probably pay close attention to entries whose “Self” value is high and close to their “Children” value, and I can clearly see four of that kind. What I don’t see is their names!
Fortunately, we can tell the CLR to emit additional debugging info, which perf will then use to convert function addresses to their names. The only thing we need to do for that to happen is to set the COMPlus_PerfMapEnabled environment variable to 1, either for the dotnet executable specifically or for the whole shell session (export COMPlus..):
COMPlus_PerfMapEnabled=1 dotnet bin/Debug/netcoreapp2.0/calc.dll &
#[1] 13723

sudo perf record -p 13723 -g
#^C[ perf record: Woken up 14 times to write data ]
#failed to mmap file
#[ perf record: Captured and wrote 3.378 MB perf.data ]
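By the way, as far as I can tell the debugging info that COMPlus_PerfMapEnabled produces is just a plain-text map of addresses to method names that the runtime writes into /tmp, named after the process ID (13723 in the run above), so it’s easy to check that it’s actually being generated:

ls /tmp/perf-13723.map
# /tmp/perf-13723.map
head -n 2 /tmp/perf-13723.map
# (an address, a code size and a method name per line – the actual values will differ)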
There’s one thing, though. We used sudo to start perf, so most likely its perf.data result file will be owned by root. However, the CLR debugging data was generated by dotnet, which runs as a regular user. Normally that would make perf report a bit paranoid, but we can force it to ignore such nuances with the -f flag:
sudo perf report -f
As you can see, some CoreCLR symbols are still missing, but all of the program’s methods and function names are there! By drilling down into the parent-child tree I can see that the hottest functions are Aggregate and the two lambdas we passed to it: one for multiplication and one for addition.
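One more thing that sometimes comes in handy: perf report can also dump the same data as plain text instead of the interactive view, which is easier to save or grep. A sketch:

# Non-interactive text output, redirected to a file for later
sudo perf report -f --stdio > report.txt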
Viewing results with FlameGraph
perf’s tabular result is useful, but we can make it even better by converting the profiling data into a flame graph. That’s fairly easy to do: we’ll need to download the FlameGraph project and use two Perl scripts from it: stackcollapse-perf and flamegraph:
# Get FlameGraph
git clone --depth=1 https://github.com/BrendanGregg/FlameGraph

# Convert perf data to flamegraph.svg
sudo perf script | FlameGraph/stackcollapse-perf.pl | FlameGraph/flamegraph.pl > flamegraph.svg
As I’m running a headless Ubuntu machine, it would be tricky to view the SVG file on it. However, the VM has python2.7 installed and its port 80 is forwarded to port 8080 of the host machine (see the Vagrantfile), so I can serve the SVG file with a Python web server and view the picture through a browser on the host machine:
sudo python -m SimpleHTTPServer 80
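If the VM only had Python 3 on it, the equivalent one-liner would use the http.server module instead:

sudo python3 -m http.server 80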
Cool! That’s way better. The graph is fully interactive, so we can see additional details when moving the mouse over the functions, or zoom into particular call stacks.
Collecting trace events with lttng
There’s a certain type of events that would be tricky to collect with perf. For instance, garbage collections, assembly loads, thrown or caught exceptions – we probably could see some traces of them in a perf report, but most likely they would be buried under other data, or would happen between samples and therefore be missed completely.
On the other hand, the dotnet runtime can publish different kinds of tracing events, which we can consume with a tool called lttng. The only things we need for that are lttng itself and the COMPlus_EnableEventLog environment variable set to 1.
Let’s change our demo code a little bit so it throws and immediately silences an exception. The challenge then would be to find out whether or not we can track those exceptions down. We actually had the same problem in production once, when one of our counters told us that the number of thrown exceptions had increased almost ten times between releases, but because they were caught and ignored we didn’t know where they actually came from.
So, the changes in the code:
//...
DoSomeMath();

try
{
    throw new Exception("Completely useless exception");
}
catch
{
    // Good luck finding it
}
//...
Build/restart:
dotnet build
# ...
COMPlus_EnableEventLog=1 dotnet bin/Debug/netcoreapp2.0/calc.dll &
# [1] 13824
To start collecting events we need to create a new lttng session, enable the events we’re going to listen to and actually start listening. Then, when we’ve presumably collected enough data, we stop the collector and destroy the lttng session.
It sounds pretty straightforward, but where do we find out which events are available in the dotnet runtime? The good news is that the perfcollect script, which can also be used for tracing and profiling, lists all of them. This is just a small extract of what it actually has:
DotNETRuntime:ExceptionThrown_V1
DotNETRuntime:ExceptionCatchStart
DotNETRuntime:ExceptionCatchStop
DotNETRuntime:ExceptionFinallyStart
DotNETRuntime:ExceptionFinallyStop
DotNETRuntime:ExceptionFilterStart
DotNETRuntime:ExceptionFilterStop
DotNETRuntime:ExceptionThrownStop
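Alternatively, once an app is already running with COMPlus_EnableEventLog=1, lttng itself can list the userspace tracepoints that are currently registered, and I’d expect the DotNETRuntime ones to show up there too (the output below is abridged and illustrative):

lttng list --userspace
# UST events:
# -------------
# ... the DotNETRuntime:* tracepoints of the running dotnet process ...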
As we’re interested in exception events only, we can use a wildcard to subscribe to all of them:
lttng create mysession
#Session mysession created.
#Traces will be written in /home/ubuntu/lttng-traces/mysession-20171226-060602

lttng enable-event --userspace --tracepoint DotNETRuntime:Exception*
# UST event DotNETRuntime:Exception* created in channel channel0

lttng start
# Tracing started for session mysession

# Few seconds later
lttng stop
# Waiting for data availability
# Tracing stopped for session mysession

lttng destroy
# Session mysession destroyed
It would be tricky to analyze lttng’s binary reports if we didn’t have the babeltrace utility. I wasn’t the one who installed it, so it probably came along with lttng itself. lttng create returned the path to the trace files (line 3), so without further ado:
babeltrace ~/lttng-traces/mysession-20171226-060602/
# ...
# [06:06:36.109917648] (+0.503586579) ubuntu-xenial DotNETRuntime:ExceptionThrown_V1: { cpu_id = 0 }, { ExceptionType = "System.Exception", ExceptionMessage = "Completely useless exception", ExceptionEIP = 139875671834775, ExceptionHRESULT = 2148734208, ExceptionFlags = 16, ClrInstanceID = 0 }
# [06:06:36.109950251] (+0.000032603) ubuntu-xenial DotNETRuntime:ExceptionCatchStart: { cpu_id = 0 }, { EntryEIP = 139873639728404, MethodID = 139873626968120, MethodName = "void [calc] calc.Program::Main(string[])", ClrInstanceID = 0 }
# ...
It doesn’t tell where exactly the exception fired, and it doesn’t have a stack trace either, but it does tell the exception type, its message and the name of the method it was caught in! More than enough to track this thing down.
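And since babeltrace produces plain text, good old grep is enough to, say, count how many exceptions were thrown during the session (same trace path as above):

babeltrace ~/lttng-traces/mysession-20171226-060602/ | grep -c ExceptionThrown_V1
# <the number of ExceptionThrown_V1 events in the trace>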
Sampling and tracing with perfcollect
In dotnet’s repository there’s a tool called perfcollect, which gathers almost the same data as perf and lttng do. In fact, perfcollect is nothing more than a bash script which, you guessed it, calls perf and lttng. However, there might be a few reasons to use it:
- Its install command brings in almost all of its dependencies by itself, including perf and lttng, and it works on all major flavors of Linux.
- Like PerfView on Windows, it puts everything it managed to collect into a single zip archive, so it’s easier to move it between production and development machines.
- It can use .NET Core’s crossgen utility, which might extract a little more debugging symbols from native DLLs than you’d get by simply using perf and lttng.
- perfcollect has a view subcommand, which under the hood uses perf or lttng to display the collected data (it’s simpler to know just one command).
- perfcollect’s results file can be brought to a Windows machine and analyzed with PerfView.
It’s also quite easy to use. As this is just a bash script, simply download it, make it executable and call install to bring in all the dependencies.
curl -OL http://aka.ms/perfcollect
chmod +x perfcollect
sudo ./perfcollect install
#...
Because under the hood it’s still going to use perf and lttng, both of our environment variables have to be set:
COMPlus_PerfMapEnabled=1 COMPlus_EnableEventLog=1 dotnet bin/Debug/netcoreapp2.0/calc.dll &
# [1] 13622
And the profiling itself:
sudo ./perfcollect collect mysession -pid 13622
#Collection started. Press CTRL+C to stop.
#^C
#...STOPPED.
#
#Starting post-processing. This may take some time.
#
#Generating native image symbol files
#...SKIPPED
#Crossgen not found. Framework symbols will be unavailable.
#See https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/linux-performance-tracing.md#resolving-framework-symbols for details.
#Saving native symbols
#...FINISHED
#Exporting perf.data file
#...FINISHED
#Compressing trace files
#...FINISHED
#Cleaning up artifacts
#...FINISHED
#
#Trace saved to mysession.trace.zip
The collected data went into the mysession.trace.zip file, and we can view it with perfcollect itself:
./perfcollect view mysession.trace.zip
Or, to see the lttng events instead:

./perfcollect view mysession.trace.zip -viewer lttng
# ...
# [03:54:15.882946185] (+?.?????????) ubuntu-xenial DotNETRuntime:ExceptionThrown_V1: { cpu_id = 1 }, { vpid = 13622, vtid = 13622, procname = "dotnet" }, { ExceptionType = "System.Exception", ExceptionMessage = "Completely useless exception", ExceptionEIP = 140623183495319, ExceptionHRESULT = 2148734208, ExceptionFlags = 16, ClrInstanceID = 0 }
# [03:54:15.883036311] (+0.000090126) ubuntu-xenial DotNETRuntime:ExceptionCatchStart: { cpu_id = 1 }, { vpid = 13622, vtid = 13622, procname = "dotnet" }, { EntryEIP = 140621151601940, MethodID = 140621138841656, MethodName = "void [calc] calc.Program::Main(string[])", ClrInstanceID = 0 }
# ...
As you can see, perfcollect used the perf and babeltrace viewers we’re already familiar with.
Getting extra debugging symbols with crossgen
If perfcollect could find the crossgen utility, the data it collected would be a little more readable – we’d have debugging symbols from native libraries. However, finding the correct crossgen and putting it into the right place might be tricky.
Firstly, the simplest way to get crossgen is to build our app in self-contained mode. In other words – publish it. When we do so, NuGet will automatically download crossgen and put it somewhere under the ~/.nuget directory:
dotnet publish -r ubuntu-x64
find ~/.nuget -name crossgen
#/home/ubuntu/.nuget/packages/runtime.linux-x64.microsoft.netcore.app/2.0.0/tools/crossgen
Secondly, we need to find the right place. perfcollect will look for crossgen in the same folder as the libcoreclr.so file used by our dotnet executable. The location of that file depends on how the app is deployed. If it’s running as a self-contained application, the right libcoreclr.so will be in the application’s publish folder.
If it’s not self-contained, like in our case, we’ll use the libcoreclr.so from dotnet’s shared runtime directory. I’m using .NET Core SDK 2.1.3, which comes with runtime 2.0.4, so I’d expect the correct path to have 2.0.4 in it.
And it seems that I’m right:
sudo find / -name libcoreclr.so
# ...
# /usr/share/dotnet/shared/Microsoft.NETCore.App/2.0.4/libcoreclr.so
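Another way to double-check which libcoreclr.so a particular process actually loaded is to peek at its memory maps. A sketch, using the demo app’s PID from the perfcollect run above (13622):

grep libcoreclr.so /proc/13622/maps
# ... /usr/share/dotnet/shared/Microsoft.NETCore.App/2.0.4/libcoreclr.so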
I probably wouldn’t do that in production, but inside a disposable VM I don’t see any problem with copying crossgen into the runtime folder itself:
sudo cp \
    /home/ubuntu/.nuget/packages/runtime.linux-x64.microsoft.netcore.app/2.0.0/tools/crossgen \
    /usr/share/dotnet/shared/Microsoft.NETCore.App/2.0.4/
If we start perfcollect again, it won’t complain about missing crossgen anymore. However, I didn’t notice any visible difference in the output with or without it. Maybe it would become more obvious if I copied the trace result to a Windows machine and analyzed it with PerfView, or profiled a more complex example. But even like this the output looks good enough.
Conclusion
The biggest challenge for me when I thought about how to profile .NET Core on Linux was: what question should I even ask Google? The gap between what I usually do and the Linux profiling/tracing ecosystem was just too big. More than that, I can’t say there are lots of articles about this topic out there. Maybe people just stopped making bugs altogether.
However, I was lucky to find a series of articles (like this one) by Sasha Goldshtein, which provided a very good place to start. In fact, he has already covered these tools, which makes me wonder: is it plagiarism to write about tools somebody else has already written about, or not yet? But anyway, if you’re wondering what else is out there, be sure to check his articles out.
Comments
Thank you for creating this article.
I’m trying to use perf to profile a .NET Core 2.1.6 application. However, perf seems unable to find the framework DLLs and throws a lot of errors like:
Failed to open /usr/share/dotnet/shared/Microsoft.AspNetCore.App/2.1.6/Microsoft.AspNetCore.Server.Kestrel.Core.dll, continuing without symbols
Failed to open /usr/share/dotnet/shared/Microsoft.AspNetCore.App/2.1.6/System.IO.Pipelines.dll, continuing without symbols
...
Those files exist at those exact paths. What am I misunderstanding?
Nothing painfully obvious comes to mind. I’d assume that perf either doesn’t have access to those files, or it expects them to be in some other format. In the first case I’d check what permissions are required to access them. In the second one I’d google a lot 🙂 Probably some dev tools have to be installed, or similar.
Worst-case scenario, I’d try the following. The example that I described – the Vagrantfile and C# code – did work at the time of writing. You could recreate that example with the exact tooling versions as described (dotnet SDK 2.1.3, etc.), make sure it still works, and then update the versions one by one and see when (or if) it stops working. E.g. update dotnet, then perf, then whatever else was there.
An approach from the opposite end would be to bring up that example environment and put your ASP.NET Core app in there. Assuming and hoping it doesn’t depend on 2.1.6 and still works on 2.1.3, you could check whether perf works and then, again, gradually upgrade the tooling versions to yours and try to understand what makes it break.
Thanks for the response. So I found this resource:
https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/linux-performance-tracing.md#resolving-framework-symbols
and it appears that the framework needs the crossgen utility to generate symbols. We used the alternative approach suggested in that section of setting an environment variable to force the framework to be just-in-time compiled. This causes the framework symbols to be included in the perf maps generated in /tmp.
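For what it’s worth, a minimal sketch of that alternative approach – assuming the variable meant here is COMPlus_ZapDisable, which I believe is the one that document describes (the app path is the demo one from this article):

# Skip precompiled native images so framework code gets JIT-compiled
# and its symbols land in the /tmp/perf-<pid>.map file
export COMPlus_ZapDisable=1
export COMPlus_PerfMapEnabled=1
dotnet bin/Debug/netcoreapp2.0/calc.dll &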
Thanks for the article! We have been working on profiling .NET Core running in Kubernetes and this was extremely helpful. More details here:
https://github.com/number101010/netcore-kubernetes-profiling
Hi,
I am using PerfView in a Kubernetes container. It required some extra params in the deployment yaml to give it elevated permissions.
However, both on a Linux VM and in a Kubernetes .NET Core pod, when passing the PID it fails to generate the CPU-specific traces (it fails to create perf.data.txt), and thus when the results are copied into PerfView the CPU stacks are empty.
So, can you help me understand more about the bubble graph (when using perf report), to dig into memory, CPU, garbage collection, etc.?