So last time I mentioned, that another Kubernetes compatible service mesh – Conduit – has chosen another approach to solve the problem. Instead of enabling the mesh at machine level via e.g. http_proxy env variable, it connects k8s pods or deployments to it one by one. I really like such kinds of ideas that make 180° turn on solving the problem, so naturally I wanted to see how exactly they did that. Continue reading “Service mesh implemented via iptables”
Somehow I missed the news that starting from version 1.12 Docker containers support health checks. Such checks don’t just test if container itself is running, but rather is it doing the job right. For instance, it can ping containerized web server to see if it responds to incoming requests, or measure memory consumption and see if it’s reasonable. As Docker health check is a shell command, it can test virtually anything.
When the test fails few times in a row, problematic container will get into “unhealthy” state, which makes no difference in standalone mode (except for triggered health_status event), but causes container to restart in Swarm mode. Continue reading “Docker health checks”
In previous post we created a small Consul cluster which kept track of 4 services in it: two web services and two db‘s. However, we didn’t tell Consul agents how to monitor those services, so they completely missed the fact that none of the services actually exists. So today we’re going to take a close look at Consul’s health checks and see what effect they have on service discoverability. Continue reading “Checking service health status with Consul”
So far we’ve been dealing with name-value kind of monitoring data. However, what works well for numeric readings isn’t necessarily useful for textual data. In fact, Grafana, Graphite and Prometheus are useless for other kind of monitoring records – logs and traces.
There’re many, many tools for dealing with those, but I decided to take a look at Elastic’s ELK stack: Elasticsearch, Logstash and Kibana – storage, data processor and visualization tool. And today we’ll naturally start with the first letter of the stack: “E”.
Elasticsearch is fast, horizontally scalable open source search engine. It provides HTTP API for storing and indexing JSON documents and with default configuration it behaves a little bit like searchable NoSQL database.
I don’t know if that’s a coincidence or not, but drastic changes in application metrics usually happen soon after a product upgrade was made. In fact, whenever I have to deal with new issue on production server, the first thing I do is checking if it was recently updated. No wonder it makes sense to record such events along with other monitoring data.
But assuming our monitoring data is in Graphite, how would we do that?
There’re two conceptually different approaches in collecting application metrics. There’s PUSH approach, when metrics storage sits somewhere and waits until metrics source pushes some data into it. For instance, Graphite doesn’t do any collection on its own, it waits until somebody like collectd does the delivery.
There’s second approach – PULL. In this approach metrics sources don’t try to be smart and just provide their readings on demand. Whoever needs those metrics can make a call, e.g. HTTP request, in order to get some.
Even though Graphite does very decent job in displaying individual metrics graphs, its dashboards support is quite limited. Of cause, we could take its powerful Render URL API and build anything we like in good old HTML, but on the other hand, there’s Grafana.
In the variety of collectd plugins there’s one ‘to rule them all’. If due to some course of events all collectd plugins except for Exec would be taken from you, you’d still be able to restore all its functionality with Exec.
As the name suggests, Exec starts external program or script and interprets its output as source of data. To be specific, it looks for lines that follow this scheme: