I mentioned in a previous post that collectd saves its data with rrdtool by default. The result is an .rrd file for each metric, which can later be rendered using the very same rrdtool. RRD files are not something most people are familiar with, and the tool itself isn’t particularly easy to use, so why would such an easy-to-use tool as collectd choose it?
For a number of reasons.
What is rrdtool
RRDtool is a command line program that is excellent at three things:
- creating Round-Robin Databases (RRDs),
- adding data to them,
- and creating graphs based on data in those databases.
What is Round-Robin Database
An RRD is a special kind of database that writes new data over the old once it hits a certain limit. For instance, if we designed a database to keep 7 days’ worth of data, the eighth day would be written over the first one, the ninth would replace the second, et cetera.
It is perfect for storing time series data. If you think about it, in most cases we don’t need precise CPU readings for longer than a few days. We might also want hourly averages for a few weeks, and maybe daily averages for the last year. You know, for historians. But anything older or more precise – personally, I don’t need it.
And here is where rrdtool shines: it can create a database for which we specify what kind of data source it will use (e.g. a series of numbers between 0 and 100 representing CPU percentage, with a new value coming once per 10 seconds) and what kind of archives to keep (e.g. 2 hours of per-minute averages, 24 hours of per-15-minute averages, 365 days of per-day averages). The size of the database is known from the very beginning, so we won’t eventually run out of disk space. That’s brilliant.
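The overwrite scheme can be sketched in a few lines of shell (a toy illustration of the wrap-around idea, not how rrdtool actually stores data on disk): a 7-slot “archive” where the 8th write lands on slot 1, the 9th on slot 2, and so on:

```shell
#!/usr/bin/env bash
# Toy round-robin "archive": 7 slots, writes wrap around via modulo.
slots=7
archive=()
day=0
for reading in 10 20 30 40 50 60 70 80 90; do   # nine days of data
  archive[$(( day % slots ))]=$reading
  day=$(( day + 1 ))
done
echo "${archive[@]}"   # days 8 and 9 overwrote days 1 and 2: 80 90 30 40 50 60 70
```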
Creating Round-Robin Database (RRD)
Creating an RRD with rrdtool involves some dark magic, but once you get the spell right, it all starts to make sense. Let’s look at the command and then try to understand why it works:
rrdtool create cpu.rrd \
--step 10 \
DS:cpu:GAUGE:20:0:100 \
RRA:AVERAGE:0.5:6:120
The first line is pretty obvious – create something named cpu.rrd. In our case it’s a round-robin database intended for storing CPU readings. The next parameter – --step 10 – specifies how often new data will come. In our case every ten seconds something will feed us a new CPU reading, or else rrdtool will do something about it. If data comes a little earlier or later, rrdtool will interpolate it to the 10-second boundary.
The remaining two lines describe the data source and define how we’ll archive its data.
Data source (DS)
The data source describes the data feed. That doesn’t mean all of its data will be stored, though – that’s what archives are for. There can be multiple data sources in a database, but for the sake of simplicity we’re using only one. The parameter we used before – DS:cpu:GAUGE:20:0:100 – literally says the following:
- define a data source
- called cpu,
- with type GAUGE.
- If no data comes during the heartbeat period of 20 seconds, store UNDEFINED.
- Data will vary between 0
- and 100.
The GAUGE type tells rrdtool to take incoming values as they are. The other three types – COUNTER, DERIVE and ABSOLUTE – would calculate the rate at which the value changes, taking the previous value and the step duration into account. With the --step and heartbeat parameters and built-in interpolation logic, the RRD will get a new data point every 10 seconds, whether you like it or not. Such points are called Primary Data Points (PDPs).
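The difference is easy to see with a bit of arithmetic (a sketch of the idea, not rrdtool’s actual internals, and integer-only for simplicity): a GAUGE stores the reading itself, while a COUNTER stores the per-second rate derived from consecutive readings:

```shell
#!/usr/bin/env bash
# GAUGE vs COUNTER, sketched. step matches the --step 10 from above.
step=10
prev=1000; curr=1300          # e.g. a byte counter sampled 10 s apart
gauge=$curr                   # GAUGE: keep the value as is
counter_rate=$(( (curr - prev) / step ))   # COUNTER: rate of change per second
echo "$gauge $counter_rate"   # 1300 30
```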
Round-robin archive (RRA)
An RRA is an aggregated window of data that came from the data source. Though there can be many archives for a single data source, in our example we created only one – RRA:AVERAGE:0.5:6:120 – which decodes to:
- Create 120 items long
- round-robin archive (RRA)
- where every item is an AVERAGE
- of 6 data points from the data source.
- If more than half (the 0.5) of those six are UNDEFINED, the aggregated value is UNDEFINED as well.
This basically means that we’ll keep a 2-hour-long (10s * 6 * 120) archive of 1-minute-long (10s * 6) CPU averages. Every data point of an archive is called a consolidated data point (CDP), and AVERAGE is the consolidation function (CF). The relationship between primary data points (PDPs), CFs and CDPs looks like this:
- CDP1 = CF(PDP1, PDP2, …, PDPn)
- CDP2 = CF(PDPn+1, …, PDP2n)
Apart from AVERAGE, there are other consolidation functions, like MIN, MAX, or LAST.
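For our archive, that consolidation looks like this (an illustrative integer-only sketch; rrdtool itself works in floating point):

```shell
#!/usr/bin/env bash
# One consolidated data point (CDP) from six primary data points (PDPs),
# with CF = AVERAGE, as in RRA:AVERAGE:0.5:6:120.
pdps=(12 18 15 21 9 15)       # six 10-second CPU readings
sum=0
for p in "${pdps[@]}"; do sum=$(( sum + p )); done
cdp=$(( sum / ${#pdps[@]} ))  # the 1-minute average stored in the archive
echo "$cdp"                   # 15
```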
Adding data to RRD
Adding new data also happens through rrdtool. Because time is so important for data source, it should be provided along with the value itself. It can be “now”:
rrdtool update cpu.rrd N:51 # Now:51%
or the number of seconds since the ‘epoch’ (1970-01-01):
rrdtool update cpu.rrd 1482814719:52 # Tue, 27 Dec 2016 04:58:39 GMT:52%
or a number of seconds ago:
rrdtool update cpu.rrd -- -15:3 # 15 seconds ago:3%
It’s also possible to use ‘U’ as the update value, which stands for UNDEFINED.
Even though you can add data manually from the command line, doing it from another program makes more sense, e.g. using a cron job to launch a shell script that reads CPU info and feeds the result to rrdtool update .
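As a sketch of that idea, here is a hypothetical feeder script. It assumes Linux’s /proc/stat (field layout per proc(5)) and the cpu.rrd file from the earlier create command; the update command is echoed rather than executed, so the sketch runs even without an RRD file:

```shell
#!/usr/bin/env bash
# Hypothetical cron-driven feeder for cpu.rrd.
# First line of /proc/stat looks like: "cpu  user nice system idle iowait ..."
read -r _ user nice system idle _ < /proc/stat
busy=$(( user + nice + system ))         # jiffies spent doing work
cmd="rrdtool update cpu.rrd N:${busy}"   # N: means "timestamp = now"
echo "$cmd"                              # a real script would execute this
```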
Creating graphs from RRD
Not only can rrdtool store the data, it can also create graphs from it. The command for doing that has a bazillion little arguments controlling every aspect of the graph, but we’ll take only a few of them – just enough to get the idea.
I have a collectd daemon running in one of my local VMs that has already produced a few RRD files: cpu-system.rrd, cpu-user.rrd and some more. If we wanted to graph the first two, we could do it like this:
rrdtool graph cpu.png \ # Create graph cpu.png
-s 'end-30m' \ # for data range starting 30 mins ago
-e 'now' \ # and ending now
-w '700' -h '350' \ # width/height: 700/350
-u 40 \ # y-axis upper bound: 40
-t 'cpu-0' \ # title: 'cpu-0'
-v 'Jiffies' \ # vertical label: 'Jiffies'
'DEF:user_avg=cpu-user.rrd:value:AVERAGE' \ # import from cpu-user.rrd
'CDEF:user_clean=user_avg,UN,0,user_avg,IF' \ # replace UNDEFINED with 0
'DEF:system_avg=cpu-system.rrd:value:AVERAGE' \ # import from cpu-system.rrd
'CDEF:system_clean=system_avg,UN,0,system_avg,IF' \ # replace UNDEFINED with 0
'CDEF:user_stack=system_clean,user_clean,+' \ # new 'user' series stacked on top of 'system'
'AREA:user_stack#FFF000:user' \ # Draw yellow area for 'user_stack' definition with 'user' legend
'AREA:system_clean#FF0000:system' \ # Draw red area for 'system_clean' definition with 'system' legend
'LINE1:user_stack#FF0000' # Draw thin red line on top of the stacked area
This command produces a reasonably attractive graph:
The first half of the arguments is more or less clear: set the title, choose the image dimensions, etc. But the second half, the one that actually renders the data, is a bit trickier.
DEF stands for DEFinition, and 'DEF:user_avg=cpu-user.rrd:value:AVERAGE' is equivalent to ‘from now on, user_avg means one of the AVERAGE round-robin archives for the data source named value in the cpu-user.rrd file’. Because cpu-user.rrd can contain more than one RRA, rrdtool will pick the one that best suits the graph’s time bounds.
CDEF, on the other hand, is a Calculated DEFinition. It’s a mathematical or logical expression in reverse Polish notation that we can apply to the data series. In the example above we have three of those: two for replacing UNDEFINED values with zero ( user_clean=user_avg,UN,0,user_avg,IF ), and one for introducing the user_stack definition ( user_stack=system_clean,user_clean,+ ) – a sum of the system and user CPU readings, so we can draw the total CPU value.
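To make the RPN readable, here is the first of those expressions, user_avg,UN,0,user_avg,IF, unrolled as a shell function (rpn_clean is a made-up name for illustration, with the string 'U' standing in for an UNDEFINED value):

```shell
#!/usr/bin/env bash
# user_avg,UN,0,user_avg,IF read as a stack machine:
#   push user_avg; UN replaces it with 1 if it was UNDEFINED, else 0;
#   push 0; push user_avg; IF keeps 0 when the flag was 1, user_avg otherwise.
rpn_clean() {
  if [ "$1" = "U" ]; then echo 0; else echo "$1"; fi
}
rpn_clean 42   # a defined reading passes through: 42
rpn_clean U    # an UNDEFINED reading becomes: 0
```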
Finally, AREA and LINE1 do the actual drawing for the definitions we declared. They take a definition name, a color and, optionally, a legend label. There are also LINE2 and LINE3, which differ in line thickness.
rrdtool is a nice little tool that solves the problem of storing time series data effectively and provides a powerful way to display it. Its command-line nature makes it easy to embed into other programs, from shell scripts to monitoring daemons. Taking all that into account, as well as the fixed size of RRD files, I can see why collectd uses it as the default output for its data.