Docker health checks

Somehow I missed the news that, starting from version 1.12, Docker containers support health checks. Such checks don’t just test whether the container itself is running, but rather whether it’s doing its job right. For instance, a check can ping a containerized web server to see if it responds to incoming requests, or measure memory consumption and decide whether it’s reasonable. As a Docker health check is just a shell command, it can test virtually anything.

When the check fails a few times in a row, the problematic container gets into an “unhealthy” state, which makes no difference in standalone mode (apart from a triggered health_status event), but causes the container to be restarted in Swarm mode.

How to enable Docker health check

There are at least four places where a health check can be enabled:

  • in Dockerfile,
  • in docker run command,
  • in docker-compose or docker stack YAML file
  • and in docker service create command.

As a bare minimum, we should provide a shell command to execute as the health check, which should exit with code 0 for a healthy state and 1 for unhealthy. Additionally, we can specify how often the check should run (--interval), how long it’s allowed to take (--timeout), and how many unhealthy results in a row it takes (--retries) before the container is considered unhealthy. All three settings are optional.

Health check instruction in Dockerfile

Imagine we want to check whether a web server inside a container still responds to incoming requests. The Dockerfile’s HEALTHCHECK instruction has the following format: HEALTHCHECK [OPTIONS] CMD command. Let’s assume our check should run every 5 seconds, take no longer than 10 seconds, and fail at least three times in a row for the container to become unhealthy.
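Assuming curl is available inside the image, the check could look something like this:

    # Mark the container unhealthy once the server stops answering
    HEALTHCHECK --interval=5s --timeout=10s --retries=3 \
        CMD curl -sf http://127.0.0.1:8080/ || exit 1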

Because the health check command is going to run from inside the container, using the 127.0.0.1 address for pinging the server is totally fine.

Health check in docker-compose YAML

It actually looks pretty much the same.
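Here’s a sketch, assuming a service called server that runs the image we’ll build below:

    version: "3.4"
    services:
      server:
        image: server
        healthcheck:
          test: ["CMD-SHELL", "curl -sf http://127.0.0.1:8080/ || exit 1"]
          interval: 5s
          timeout: 10s
          retries: 3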

Health check in docker run and service create

Both the docker run and docker service create commands share the same arguments for health checks, and they are still very similar to the ones you’d put into a Dockerfile.
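For instance, with docker run (docker service create accepts the same --health-* flags; the image name server is the one we’ll build in a moment):

    docker run -d --name server \
        --health-cmd "curl -sf http://127.0.0.1:8080/ || exit 1" \
        --health-interval 5s \
        --health-timeout 10s \
        --health-retries 3 \
        server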

What’s more, both of them (as well as docker-compose YAML) can override or even disable a check previously declared in the Dockerfile with --no-healthcheck=true.

Docker health check example

The victim

I created a small node.js web server which simply responds to any request with ‘OK’. However, the server also has a switch that toggles its ON/OFF state without actually shutting the server’s process down.
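Here’s a sketch of such a server.js (the exact code is an approximation of the idea):

    // server.js: an 'OK' server on port 8080 with an ON/OFF switch on port 8081
    const http = require('http');

    // The main server simply answers 'OK' to anything coming to port 8080
    const server = http.createServer((req, res) => res.end('OK'));
    let isOn = false;

    const toggle = () => {
      if (isOn) {
        server.close();
      } else {
        server.listen(8080);
      }
      isOn = !isOn;
    };
    toggle(); // start in the ON state

    // The switch: any request to port 8081 flips the main server ON or OFF
    http.createServer((req, res) => {
      toggle();
      res.end(isOn ? 'ON' : 'OFF');
    }).listen(8081);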

So when the server is ON, it listens on port 8080 and returns OK to any request coming to that port. Making a call to port 8081 shuts the server down, and another call brings it back up again.
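For example (assuming the server runs locally):

    $ curl 127.0.0.1:8080
    OK
    $ curl 127.0.0.1:8081        # switch the server OFF
    $ curl 127.0.0.1:8080
    curl: (7) Failed to connect to 127.0.0.1 port 8080: Connection refused
    $ curl 127.0.0.1:8081        # switch it back ON
    $ curl 127.0.0.1:8080
    OK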

Now let’s put that server.js into a Dockerfile with a health check, build an image, and start it as a container.
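A sketch of the Dockerfile (the node:alpine base image and the curl install are assumptions):

    FROM node:alpine
    # curl is needed by the health check below
    RUN apk add --no-cache curl
    COPY server.js /server.js
    HEALTHCHECK --interval=5s --timeout=10s --retries=3 \
        CMD curl -sf http://127.0.0.1:8080/ || exit 1
    CMD ["node", "/server.js"]

Then build and run it:

    $ docker build -t server .
    $ docker run -d --name server -p 8080:8080 -p 8081:8081 server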

The created container’s ID starts with ec3, which should be enough to identify it later, so now we can move on to the health checks.

Monitoring container health status

Docker’s main command for checking a container’s health is docker inspect. It produces a huge JSON in response, but the only part we’re interested in is its State.Health property.
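For instance (ec3 being enough of the container ID; the output below is trimmed and pretty-printed for readability):

    $ docker inspect --format '{{json .State.Health}}' ec3
    {
      "Status": "healthy",
      "FailingStreak": 0,
      "Log": [ ... ]
    }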

Not surprisingly, the current status is ‘healthy’, and we can even see the health check logs in the Log collection. However, after making a call to port 8081 and waiting for 3*5 seconds (to allow three checks to fail), the picture changes.
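Something like this (output again trimmed):

    $ curl 127.0.0.1:8081        # switch the server OFF, then wait a while
    $ docker inspect --format '{{json .State.Health}}' ec3
    {
      "Status": "unhealthy",
      "FailingStreak": 4,
      "Log": [ ... ]
    }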

I waited a little longer than 15 seconds, so the health check managed to fail 4 times in a row (FailingStreak). And as expected, the container’s status changed to ‘unhealthy’.

But as soon as at least one health check succeeds, Docker puts the container back into the ‘healthy’ state.
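For instance, after switching the server back ON:

    $ curl 127.0.0.1:8081
    $ docker inspect --format '{{.State.Health.Status}}' ec3
    healthy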

Checking health status with Docker events

Along with inspecting the container state directly, we could also listen to docker events.
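For example, narrowing the stream down to health_status events (this particular filter is one possible way to cut the noise):

    $ docker events --filter event=health_status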

Docker events can be a little bit chatty, which is why I had to use the --filter option. The command itself won’t exit right away; it stays running, printing out events as they come.

Health status and Swarm services

In order to see how health checks affect Swarm services, I temporarily switched the local Docker instance to Swarm mode with docker swarm init, and now I can do the following.
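Something along these lines (the published ports are an assumption):

    $ docker service create --name server -p 8080:8080 -p 8081:8081 server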

This puts a new service into the Swarm using the locally built server image. Docker wasn’t really happy with the fact that the image is local and returned a bunch of errors, but eventually it did return the ID of the newly created service.

curl 127.0.0.1:8080 works again, and sending a request to 8081 will, as usual, shut the server down. However, this time, after a short while, port 8080 starts working again without us explicitly enabling the server. The thing is, as soon as the Swarm manager noticed that the container became unhealthy, and therefore the whole service was no longer meeting the desired state (‘running’), it shut the container down completely and started a new one. We can actually see the traces of that by examining the tasks collection for our server service.
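For example:

    $ docker service ps server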

As a little backstory, every single Swarm container has a task assigned to it. When a container dies, the corresponding task gets shut down as well, and Swarm creates a new pair of a task and a container. docker service ps displays the whole chain of task deaths and resurrections for a given service and its containers. In our particular case, server‘s initial task with id pj77brhfhsjm is marked as failed, and docker inspect even says why.
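Inspecting the failed task directly:

    $ docker inspect pj77brhfhsjm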

“Unhealthy container”, that’s why. But the bottom line is that the service as a whole automatically recovered from the unhealthy state with barely noticeable downtime.

Summary

The Docker health check is a cute little feature that allows attaching a shell command to a container and using it to check whether the container’s content is alive enough. For containers in Docker engine’s standalone mode it merely adds a healthy/unhealthy attribute (plus a health_status docker event when the value changes), but in Swarm mode Docker will actually shut down a faulty container and create a new one with very little downtime.

5 thoughts on “Docker health checks”

  1. Thanks for the healthcheck story 🙂
    I think you need to replace ‘|| echo 1’ by ‘|| exit 1’ after the curl command in some places.

  2. thanks a ton !! I am creating a company Docker concept and missing healthcheck was total showstopper. Funny how all docs tell that docker is SPOF free, which it is, but running services on swarm is far from SPOF free without health check.
