Somehow I missed the news that starting from version 1.12 Docker containers support health checks. Such checks don’t just test whether the container itself is running, but rather whether it’s doing its job right. For instance, a check can ping a containerized web server to see if it responds to incoming requests, or measure memory consumption and see if it’s reasonable. As a Docker health check is a shell command, it can test virtually anything.
When the test fails a few times in a row, the problematic container gets into the “unhealthy” state, which makes no difference in standalone mode (except for a triggered health_status event), but causes the container to restart in Swarm mode. Continue reading “Docker health checks”
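Since the check is just a command whose exit code Docker inspects (0 for healthy, 1 for unhealthy), a probe can be as small as the sketch below. It is only an illustration: the `probe` helper, the URL and port 8080 are assumptions, not something taken from the post.

```python
import sys
import urllib.error
import urllib.request


def probe(url, timeout=3.0, opener=urllib.request.urlopen):
    """Return 0 (healthy) if the URL answers with a 2xx status, otherwise 1."""
    try:
        with opener(url, timeout=timeout) as response:
            return 0 if 200 <= response.status < 300 else 1
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP 4xx/5xx, malformed URL: all unhealthy.
        return 1


if __name__ == "__main__":
    # Hypothetical address of the web server running inside the container.
    sys.exit(probe("http://localhost:8080/"))
```

A script like this would be wired in as the command of a HEALTHCHECK instruction; the `opener` parameter exists only so the function can be exercised without a real server.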
In the previous post we created a small Consul cluster which kept track of 4 services in it: two web services and two db services. However, we didn’t tell Consul agents how to monitor those services, so they completely missed the fact that none of the services actually exists. So today we’re going to take a close look at Consul’s health checks and see what effect they have on service discoverability. Continue reading “Checking service health status with Consul”
Last time we talked about Elasticsearch – a hybrid of a NoSQL database and a search engine. Today we’ll continue with Elastic’s ELK stack and take a look at the tool called Logstash.
Continue reading “Processing logs with Logstash”
So far we’ve been dealing with the name-value kind of monitoring data. However, what works well for numeric readings isn’t necessarily useful for textual data. In fact, Grafana, Graphite and Prometheus are useless for the other kind of monitoring records – logs and traces.
There’re many, many tools for dealing with those, but I decided to take a look at Elastic’s ELK stack: Elasticsearch, Logstash and Kibana – storage, data processor and visualization tool. And today we’ll naturally start with the first letter of the stack: “E”.
Elasticsearch is a fast, horizontally scalable open source search engine. It provides an HTTP API for storing and indexing JSON documents, and with the default configuration it behaves a little bit like a searchable NoSQL database.
Continue reading “Quick intro to Elasticsearch”
I don’t know if that’s a coincidence or not, but drastic changes in application metrics usually happen soon after a product upgrade. In fact, whenever I have to deal with a new issue on a production server, the first thing I do is check whether it was recently updated. No wonder it makes sense to record such events along with other monitoring data.
But assuming our monitoring data is in Graphite, how would we do that?
Continue reading “Tracking application events in Graphite”
There’re two conceptually different approaches to collecting application metrics. There’s the PUSH approach, when the metrics storage sits somewhere and waits until a metrics source pushes some data into it. For instance, Graphite doesn’t do any collection on its own; it waits until somebody like collectd does the delivery.
There’s a second approach – PULL. In this approach metrics sources don’t try to be smart and just provide their readings on demand. Whoever needs those metrics can make a call, e.g. an HTTP request, in order to get some.
Prometheus collects metrics using the second approach. Continue reading “Scraping application metrics with Prometheus”
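The two approaches can be sketched side by side. In the rough illustration below the push half formats a reading for Graphite's plaintext protocol (`name value timestamp` sent to port 2003), and the pull half serves `name value` lines over HTTP for a scraper to fetch. The metric names, the `METRICS` dictionary, the helper names and port 8000 are all made up for the example.

```python
import socket
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process readings; a real app would update these as it runs.
METRICS = {"app_requests_total": 42, "app_queue_size": 3}


def graphite_line(name, value, timestamp=None):
    """PUSH: format one reading in Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{name} {value} {timestamp}\n"


def push_to_graphite(name, value, host="localhost", port=2003):
    """PUSH: the source opens the connection and delivers the reading itself."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(graphite_line(name, value).encode("ascii"))


def render_metrics(metrics):
    """PULL: render readings as plain-text 'name value' lines."""
    return "".join(f"{name} {value}\n" for name, value in sorted(metrics.items()))


class MetricsHandler(BaseHTTPRequestHandler):
    """PULL: the source only answers when somebody asks for /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Serve the readings and let the collector decide when to fetch them.
    HTTPServer(("0.0.0.0", 8000), MetricsHandler).serve_forever()
```

The difference is who initiates the transfer: `push_to_graphite` has to know where the storage lives, while `MetricsHandler` knows nothing about its consumers.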
Even though Graphite does a very decent job of displaying individual metrics graphs, its dashboard support is quite limited. Of course, we could take its powerful Render URL API and build anything we like in good old HTML, but on the other hand, there’s Grafana.
Continue reading “Building dashboards with Grafana”
In the variety of collectd plugins there’s one ‘to rule them all’. If due to some course of events all collectd plugins except for Exec were taken from you, you’d still be able to restore all their functionality with Exec.
As the name suggests, Exec starts an external program or script and interprets its output as a source of data. To be specific, it looks for lines that follow this scheme:
PUTVAL hostname/source-instance/datatype-instance [Interval=seconds] timestamp:value[:value..]
To be even more specific, these lines would work:
PUTVAL myhost/cpu-0/cpu-system interval=10 N:51
PUTVAL hostname/vm_count/gauge 1484012951:U
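A script for Exec only has to print such lines to standard output. The sketch below is one possible shape for it: `read_vm_count` is a hypothetical stand-in for a real measurement, and the fallback host name and interval are assumptions. The `COLLECTD_HOSTNAME` and `COLLECTD_INTERVAL` environment variables are the ones collectd sets for Exec scripts.

```python
import os
import time


def putval_line(host, plugin, data_type, value, interval=None, timestamp="N"):
    """Build one PUTVAL line; a value of None is reported as U (unknown)."""
    identifier = f"{host}/{plugin}/{data_type}"
    options = f" interval={interval:g}" if interval is not None else ""
    reading = "U" if value is None else value
    return f"PUTVAL {identifier}{options} {timestamp}:{reading}"


def read_vm_count():
    """Hypothetical data source; replace with a real measurement."""
    return 3


if __name__ == "__main__":
    # Fall back to made-up defaults when run outside of collectd.
    host = os.environ.get("COLLECTD_HOSTNAME", "localhost")
    interval = float(os.environ.get("COLLECTD_INTERVAL", "10"))
    while True:
        # flush=True matters: collectd reads the pipe line by line.
        print(putval_line(host, "vm_count", "gauge", read_vm_count(), interval),
              flush=True)
        time.sleep(interval)
```

Passing `N` as the timestamp asks collectd to use the current time, which is usually what a long-running script wants.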
What is Graphite
Graphite is an app that does three things:
- It receives monitoring data from other agents,
- saves it efficiently into the database, and
- displays data as graphs and dashboards in web UI
Continue reading “Quick intro to Graphite”
I mentioned in the previous post that collectd uses rrdtool for saving its data by default. It results in an .rrd file for each metric, which can later be rendered using the very same rrdtool. RRD files are not something most people are familiar with, and the tool itself isn’t particularly easy to use, so why would such an easy to use tool as collectd choose it?
For a number of reasons. Continue reading “Quick intro to rrdtool”