It’s been more than a year since I connected a small piece of JavaScript to a collectd plugin and started gathering our CI’s monitoring data and storing it in Graphite. Surprisingly, the whole thing worked like a charm. Even the JavaScript component.
However, the time has come when I need to collect even more data, this time from inside long-running apps, so the JavaScript + collectd pair is no longer an option. What might work is having those apps send their metrics directly to the Graphite server. After all, it can accept data in plain string format via TCP, so that shouldn’t be hard.
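For reference, that plain string format is one line per data point: the metric path, the value, and a Unix timestamp in seconds, separated by spaces and terminated by a newline. Graphite's Carbon listener accepts it on TCP port 2003 by default. Below is a minimal sketch of the direct approach. `GraphitePlaintext` and its method names are made up for illustration, and the host and port are assumptions about the local setup.

```csharp
using System;
using System.Globalization;
using System.Net.Sockets;
using System.Text;

// Hypothetical helper for illustration; not part of any library.
public static class GraphitePlaintext
{
    // Builds one plaintext-protocol line: "<path> <value> <unix seconds>\n".
    public static string FormatLine(string path, double value, DateTimeOffset timestamp)
    {
        return string.Format(
            CultureInfo.InvariantCulture,
            "{0} {1} {2}\n",
            path, value, timestamp.ToUnixTimeSeconds());
    }

    // Opens a TCP connection, writes a single line, and closes the connection.
    // Host and port are assumptions; 2003 is Carbon's default plaintext port.
    public static void Send(string path, double value, string host = "127.0.0.1", int port = 2003)
    {
        var bytes = Encoding.ASCII.GetBytes(FormatLine(path, value, DateTimeOffset.UtcNow));
        using (var client = new TcpClient(host, port))
        using (var stream = client.GetStream())
        {
            stream.Write(bytes, 0, bytes.Length);
        }
    }
}
```

Calling `GraphitePlaintext.FormatLine("myApp.totalMemory", 42.5, DateTimeOffset.FromUnixTimeSeconds(1500000000))` yields the line `myApp.totalMemory 42.5 1500000000` followed by a newline. Even in a sketch this small there are connection lifetime and error handling questions that the code simply ignores.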
And here comes StatsD
But there’s an even simpler way. If you check out the latest Graphite Docker image, you’ll notice that it comes with a preinstalled tool called StatsD. It’s a little buffer between the app and Graphite, which accepts metrics via UDP, aggregates them, and then flushes them to Graphite at regular intervals. Because it’s UDP, applications can fire-and-forget their metrics extremely fast. Because StatsD does buffering and aggregation, we lessen the burden on Graphite itself and can therefore deal with more data sources.
Finally, StatsD is dead simple and can work with different backends, so applications can use it as a unified metrics receiver and stop caring about where the data is going to be saved.
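To make "dead simple" concrete: a StatsD metric is a short text line of the form `<name>:<value>|<type>`, sent as a UDP datagram (port 8125 by default). The sketch below builds and sends such a line by hand. `RawStatsd` and its methods are hypothetical names for illustration, and the host and port are assumptions about the local setup. A client library does essentially this, plus batching and other conveniences.

```csharp
using System.Globalization;
using System.Net.Sockets;
using System.Text;

// Hypothetical helper that speaks the StatsD text format by hand.
public static class RawStatsd
{
    // StatsD metric lines look like "<name>:<value>|<type>", where type is
    // "c" (counter), "g" (gauge), "ms" (timer) or "s" (set).
    public static string FormatMetric(string name, long value, string type)
    {
        return name + ":" + value.ToString(CultureInfo.InvariantCulture) + "|" + type;
    }

    // Fire-and-forget: one UDP datagram, no connection, no reply expected.
    // Host and port are assumptions; 8125 is StatsD's default port.
    public static void Send(string metric, string host = "127.0.0.1", int port = 8125)
    {
        var bytes = Encoding.ASCII.GetBytes(metric);
        using (var udp = new UdpClient())
        {
            udp.Send(bytes, bytes.Length, host, port);
        }
    }
}
```

For example, `RawStatsd.FormatMetric("myApp.totalMemory", 137643528, "g")` produces `myApp.totalMemory:137643528|g`, which `RawStatsd.Send` would then push to the daemon without waiting for any acknowledgement.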
StatsD and .NET Core
There’s nothing special about the relationship between StatsD and .NET Core. But because I have to deal with .NET Core a lot now, I’m particularly interested in how those two fit together.
The list of StatsD packages on NuGet is a little bit messy. Some of them work with .NET Framework only, others haven’t been maintained for quite a while. But I found one which is relatively alive and does the job: StatsdClient.
The best way to see how something works is to try to make it work by yourself, so let’s build something.
The grand example
Sample program
I have a simple .NET Core program that runs two threads: one does a random amount of meaningless work (generating strings) in a loop, and the other monitors the number of garbage collections, the rate at which they happen, and how much memory we use in total:
    static void Main(string[] args)
    {
        Parallel.Invoke(
            () => { // Worker thread
                while (true)
                {
                    var sw = new Stopwatch();
                    sw.Start();
                    DoPointlessJob();
                    sw.Stop();
                    Console.WriteLine($"Done in {sw.ElapsedMilliseconds}ms");
                    Thread.Sleep(random.Next(500));
                }
            },
            () => { // Monitoring thread
                int lastGcCount = 0;
                while (true)
                {
                    Console.WriteLine($"TotalMemory: {GC.GetTotalMemory(false)}");
                    var gcCount = GC.CollectionCount(0);
                    Console.WriteLine($"GC Count (gen0): {gcCount} (+{gcCount - lastGcCount})");
                    lastGcCount = gcCount;
                    Thread.Sleep(100);
                }
            }
        );
    }
It would be better to monitor GC and memory from outside (e.g. via EventTrace), but that would also make the example more complex, so maybe next time.
OK, let’s start the program and see something like this:
    #
    # TotalMemory: 137643528
    # GC Count (gen0): 641 (+17)
    # TotalMemory: 141173416
    # GC Count (gen0): 658 (+17)
    # Done in 1686ms
    # TotalMemory: 142680136
    # GC Count (gen0): 663 (+5)
    # TotalMemory: 145614400
    # GC Count (gen0): 674 (+11)
    # TotalMemory: 151649280
    #
Nice numbers, and they probably mean something. But they would mean much more if we could graph them. Graphite can do that, so let’s install it.
Installing Graphite
Since the last time I installed Graphite, it has gained an official Docker image, so instead of several download-and-configure steps, this one-liner will do everything:
    docker run -d \
      --name graphite \
      --restart=always \
      -p 80:80 \
      -p 2003-2004:2003-2004 \
      -p 2023-2024:2023-2024 \
      -p 8125:8125/udp \
      -p 8126:8126 \
      graphiteapp/graphite-statsd
For this demo, however, I don’t need the container name, the restart policy, or any ports other than 80 (Graphite UI) and 8125/udp (StatsD), so we can simplify the command to something like this:
    docker run -d \
      -p 80:80 \
      -p 8125:8125/udp \
      graphiteapp/graphite-statsd
Now we can open the browser and voilà, here’s Graphite and its glorious emptiness:
Time to feed some data to it.
Feeding the data via StatsdClient
Installing the StatsdClient package introduces a new Metrics class, which I simply used to replace the Console.WriteLine calls (and the Stopwatch). The resulting code became even simpler than it was before:
    static void Main(string[] args)
    {
        Metrics.Configure(new MetricsConfig
        {
            StatsdServerName = "127.0.0.1",
            Prefix = "myApp"
        });

        Parallel.Invoke(
            () => { // Thread 1
                while (true)
                {
                    Metrics.Time(() => DoPointlessJob(), "pointlessJob");
                    Thread.Sleep(random.Next(500));
                }
            },
            () => { // Thread 2
                int lastGcCount = 0;
                while (true)
                {
                    Metrics.GaugeAbsoluteValue("totalMemory", GC.GetTotalMemory(false));
                    var gcCount = GC.CollectionCount(0);
                    Metrics.Counter("gcCount.gen0", gcCount - lastGcCount);
                    lastGcCount = gcCount;
                    Thread.Sleep(100);
                }
            }
        );
    }
The changes are pretty straightforward. Firstly, we need to tell StatsdClient where the StatsD server is (here it runs in the same container as Graphite). Secondly, Graphite metric names usually look somewhat like this: myserver.myapp.mysensor.mySensorComponent, which then resolves into a nice folder-like structure in the Graphite UI. The Prefix property of MetricsConfig simply sets the first part of the name, and the rest will come when we actually start feeding the data.
Finally, Time, GaugeAbsoluteValue and Counter are the methods for sending the data to the StatsD daemon, and we’ll look at these three a little bit closer.
GaugeAbsoluteValue
This method accepts gauge-like readings, which are basically raw sensor values: temperature, free disk space, total amount of memory, etc. The very same value that you send to the storage is what you’ll see on the graph later. The name of our gauge is totalMemory (the GaugeAbsoluteValue call in the second thread), and assuming the newer version of the demo program has been running for a while, we’ll see a nice memory graph:
Counter
Counters are a little bit trickier, because they display not the value itself, but rather the rate at which it changes, e.g. X units per second/minute/etc. That’s much more informative for metrics like the number of incoming requests or triggered garbage collections.
In the Counter call we record how many GCs happened since the last check, and we do the check every 100 ms (the Thread.Sleep(100) call). Recorded values varied between 5 and 17, so I think the rate will be around 100-120 GCs per second. Checking the graph... and it seems like I’m almost right.
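The estimate is plain arithmetic: all increments that arrive within one flush interval are summed, and the sum is divided by the interval length in seconds. The sketch below only illustrates that calculation and is not StatsD's actual code; `CounterMath` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration of the arithmetic; this is not StatsD's actual code.
public static class CounterMath
{
    // Sums all increments received in one flush interval and
    // divides by the interval length to get a per-second rate.
    public static double RatePerSecond(IEnumerable<int> increments, double flushIntervalSeconds)
    {
        return increments.Sum() / flushIntervalSeconds;
    }
}
```

With a check roughly every 100 ms there are about 100 increments per 10-second interval. If each of them were 11, `CounterMath.RatePerSecond(Enumerable.Repeat(11, 100), 10.0)` would return 110. The observed extremes of 5 and 17 per check bound the rate between 50 and 170 per second, so a guess of 100-120 corresponds to an average of 10 to 12 collections per check.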
Time
Using “Time” as the metric name is a bit misleading, because it doesn’t actually have to be about time. It would also work for request bytes, or shopping cart dollars. StatsD aggregates “Time” readings within its buffering interval (10s by default) and then sends some calculated statistics to Graphite, like:
- total count of samples collected during 10s interval,
- min sample value,
- max sample value,
- median,
- etc.
In our case we measured job duration, so on the graph we can see the longest, shortest, “usual” and several other durations:
Neat.
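To make the aggregation less abstract, here is a rough sketch of how the count, min, max, and median from the list above could be computed from the samples collected in one interval. It is an illustration only: `TimerSummary` is a hypothetical name, and StatsD itself reports more statistics than these four.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration; StatsD computes more statistics than shown here.
public static class TimerSummary
{
    public static (int Count, double Min, double Max, double Median) Summarize(IEnumerable<double> samples)
    {
        var sorted = samples.OrderBy(x => x).ToArray();
        if (sorted.Length == 0)
            throw new ArgumentException("At least one sample is required.", nameof(samples));

        int mid = sorted.Length / 2;
        // Odd number of samples: take the middle one.
        // Even number of samples: average the two middle values.
        double median = sorted.Length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;

        return (sorted.Length, sorted[0], sorted[sorted.Length - 1], median);
    }
}
```

For three made-up job durations of 1686, 200, and 950 ms, `TimerSummary.Summarize` returns a count of 3, a min of 200, a max of 1686, and a median of 950.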
One more data type
There’s also the Set data type, which we haven’t used here. It’s actually very simple: if we record the IDs of users visiting our web site, Set would calculate the number of unique visitors during the aggregation interval (again, 10s by default). Nothing more, nothing less.
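In other words, a set boils down to counting distinct values per interval. A tiny sketch of that idea, with `SetMath` as a hypothetical name and made-up user IDs in the usage note:

```csharp
using System.Collections.Generic;

// Hypothetical illustration of what a set metric reports for one interval.
public static class SetMath
{
    // Counts the distinct values seen during one aggregation interval.
    public static int UniqueCount(IEnumerable<string> values)
    {
        return new HashSet<string>(values).Count;
    }
}
```

If the user IDs "42", "7", "42", "13", and "7" arrive within the same interval, `SetMath.UniqueCount` returns 3.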
Conclusion
There’s no doubt I could send my metrics to Graphite directly. But as StatsD already comes preinstalled with Graphite’s Docker image and it’s so easy to use, it really looks like the default choice. Fun fact: when I was experimenting with it in .NET Core, there wasn’t much of an experiment at all. I just added the NuGet package to the project, replaced function calls and it just worked. And I know for sure I’d make a mistake or two if I started with TcpClient and NetworkStream for talking directly to Graphite.