Last time we talked about Elasticsearch – a hybrid of a NoSQL database and a search engine. Today we’ll continue with Elastic’s ELK stack and take a look at a tool called Logstash.
So far we’ve been dealing with the name-value kind of monitoring data. However, what works well for numeric readings isn’t necessarily useful for textual data. In fact, Grafana, Graphite and Prometheus are useless for the other kind of monitoring records – logs and traces.
There’re many, many tools for dealing with those, but I decided to take a look at Elastic’s ELK stack: Elasticsearch, Logstash and Kibana – storage, data processor and visualization tool. And today we’ll naturally start with the first letter of the stack: “E”.
Elasticsearch is a fast, horizontally scalable, open source search engine. It provides an HTTP API for storing and indexing JSON documents, and with the default configuration it behaves a little bit like a searchable NoSQL database.
I don’t know if that’s a coincidence or not, but drastic changes in application metrics usually happen soon after a product upgrade. In fact, whenever I have to deal with a new issue on a production server, the first thing I do is check whether it was recently updated. No wonder it makes sense to record such events along with other monitoring data.
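To make the “HTTP API for JSON documents” part more concrete, here is a minimal sketch of how storing one document could look. It only builds the request and does not send it. The node address (localhost:9200), the index name “logs” and the document fields are assumptions made up for illustration, and the `/<index>/_doc/<id>` path follows the convention of recent Elasticsearch versions; older releases used a custom type name in its place.

```python
import json
import urllib.request

# Assumption: a single Elasticsearch node listening on the default local port.
ES_URL = "http://localhost:9200"


def build_index_request(index, doc_id, document):
    """Build (but do not send) a PUT request that stores one JSON document."""
    url = f"{ES_URL}/{index}/_doc/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Hypothetical index name and document, used only for illustration.
    request = build_index_request(
        "logs", "1", {"level": "info", "message": "service started"}
    )
    # Sending it needs a running node, so the call is left commented out:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
    print(request.get_method(), request.full_url)
```

Once a document is stored this way, it becomes searchable through the same HTTP interface, which is what makes Elasticsearch feel like a searchable NoSQL database.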
But assuming our monitoring data is in Graphite, how would we do that?
There are two conceptually different approaches to collecting application metrics. There’s the PUSH approach, where the metrics storage sits somewhere and waits until a metrics source pushes some data into it. For instance, Graphite doesn’t do any collection on its own; it waits until somebody like collectd does the delivery.
The second approach is PULL. In this approach, metrics sources don’t try to be smart and just provide their readings on demand. Whoever needs those metrics can make a call, e.g. an HTTP request, to get some.
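As a rough illustration of the PUSH side, the sketch below formats a reading in Graphite’s plaintext protocol (“path value timestamp”, one per line) and sends it over TCP. The metric path is hypothetical, and the host and port are assumptions: 2003 is the usual default for Carbon’s plaintext listener, but a given installation may differ.

```python
import socket
import time


def format_graphite_line(path, value, timestamp=None):
    """Format one reading as a Graphite plaintext line: 'path value timestamp'."""
    if timestamp is None:
        timestamp = time.time()
    return f"{path} {value} {int(timestamp)}\n"


def push_metric(path, value, host="localhost", port=2003):
    """Push a single reading to a Carbon plaintext listener (assumed address)."""
    line = format_graphite_line(path, value)
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(line.encode("ascii"))


if __name__ == "__main__":
    # Only print the line; push_metric() needs a reachable Graphite instance.
    print(format_graphite_line("app.requests.count", 42), end="")
```

Note that all the initiative is on the source’s side: the storage never asks for anything, it just accepts whatever arrives.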
Prometheus collects metrics using the second approach.
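For the PULL side, a metrics source only has to answer HTTP requests with its current readings. The sketch below serves a couple of made-up values in the simple “name value” text form that Prometheus can scrape. The metric names, the numbers and the port are all assumptions for illustration, and a real application would more likely use an official client library than hand-rolled formatting.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical readings; a real app would update these as it runs.
READINGS = {"app_requests_total": 42, "app_queue_length": 3}


def render_metrics(readings):
    """Render readings as plain text, one 'name value' pair per line."""
    return "".join(f"{name} {value}\n" for name, value in readings.items())


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # /metrics is the path Prometheus scrapes by default.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(READINGS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve readings on demand until interrupted (port is an assumption)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() and pointing a scraper at http://localhost:8000/metrics would complete the picture: the source stays passive, and whoever needs the numbers comes and gets them.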
Even though Graphite does a very decent job of displaying individual metric graphs, its dashboard support is quite limited. Of course, we could take its powerful Render URL API and build anything we like in good old HTML, but on the other hand, there’s Grafana.
Among the variety of collectd plugins, there’s one ‘to rule them all’. If by some turn of events all collectd plugins except Exec were taken from you, you’d still be able to restore all of its functionality with Exec.
As the name suggests, Exec starts an external program or script and interprets its output as a source of data. To be specific, it looks for lines that follow this scheme:
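To give an idea of what the Render URL API looks like, here is a small sketch that assembles such a URL from a target, a time range and an output format. The base address and the metric name are assumptions for illustration; `target`, `from` and `format` are regular parameters of Graphite’s render endpoint.

```python
from urllib.parse import urlencode


def build_render_url(base_url, target, time_from="-1h", output_format="json"):
    """Assemble a Graphite render URL for one target and a relative time range."""
    query = urlencode(
        [("target", target), ("from", time_from), ("format", output_format)]
    )
    return f"{base_url}/render?{query}"


if __name__ == "__main__":
    # Hypothetical Graphite address and metric path.
    print(build_render_url("http://localhost", "app.requests.count"))
```

Switching the format to png returns a rendered image instead of raw data points, which is what makes hand-built HTML dashboards possible in the first place.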
PUTVAL hostname/source-instance/datatype-instance [Interval=seconds] timestamp:value[:value..]
To be even more specific, these lines would work:
PUTVAL myhost/cpu-0/cpu-system interval=10 N:51
PUTVAL hostname/vm_count/gauge 1484012951:U
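A script that produces such lines can be very small. The sketch below is one possible shape for it, not the only one: it builds a PUTVAL line from its parts and prints a single reading. The vm_count plugin name and the constant value are placeholders, and the COLLECTD_HOSTNAME and COLLECTD_INTERVAL environment variables are the ones the Exec plugin is documented to set for the programs it starts, with fallbacks assumed here for running the script by hand.

```python
import os
import sys


def format_putval(host, plugin, type_, value, interval=None, timestamp="N"):
    """Build one PUTVAL line; 'N' as the timestamp means 'now'."""
    identifier = f"{host}/{plugin}/{type_}"
    options = f" interval={interval}" if interval is not None else ""
    return f"PUTVAL {identifier}{options} {timestamp}:{value}"


def read_vm_count():
    """Placeholder reading; a real script would query something here."""
    return 3


if __name__ == "__main__":
    # Fallback values are assumptions for running outside of collectd.
    host = os.environ.get("COLLECTD_HOSTNAME", "localhost")
    interval = os.environ.get("COLLECTD_INTERVAL", "10")
    print(format_putval(host, "vm_count", "gauge", read_vm_count(), interval))
    # Flush explicitly: output piped to collectd is buffered otherwise.
    sys.stdout.flush()
```

A long-running script would wrap the print in a loop that sleeps for the interval between readings, and the flush after every line matters, because otherwise the values may sit in a buffer instead of reaching collectd.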
I mentioned in the previous post that collectd uses rrdtool for saving its data by default. This results in an .rrd file for each metric, which can later be rendered using the very same rrdtool. RRD files are not something most people are familiar with, and the tool itself isn’t particularly easy to use, so why would a tool as easy to use as collectd choose it?
For a number of reasons.
Distributed apps introduce a challenge that we could usually avoid in monolithic ones: how do we tell that the app is performing well? I’m not talking about it being user-friendly or providing business value. How do you tell that the components of your distributed app are actually running? Which services are overutilized? Underutilized? Running out of disk space?
There are tools to get those answers, and collectd is one of them.