Imagine your distributed app has two kinds of services: `web` and `db`. Both of them are replicated for higher availability, live on different hosts, and go online and offline whenever they like. So, here's a question: how do `web`s find `db`s?
An obvious solution would be to come up with some sort of reliable key-value storage: whenever a service comes online, it registers itself and its address in the store. But what happens when a service goes offline? It could probably notify the store just before that, but c'mon, it's the internet: things go offline without any warning. OK, then we could implement some sort of service health checks to make sure registered services are still available… By the way, did you notice how quickly the simple idea of using an external store for service discovery started to turn into a reasonably large infrastructure project?
Service discovery is very hard to do well. But we don't have to do it ourselves – there are tools for that, and Consul is one of them.
Service discovery with Consul
Last week we took a look at Consul's key-value store as a source of distributed app configuration. But Consul is much more than that. It understands the concept of services and can collect the list of them and distribute it across a cluster of Consul agents.
Here's how it works. We put a small Consul agent on every server that is going to host at least one service. The agent is a lightweight, long-living daemon responsible for knowing what services its host provides and whether or not they are healthy. It shares this information with the Consul servers, so eventually every Consul server in the cluster has the whole list of known and healthy services and can expose it via DNS, HTTP API calls or a plain web UI.
The plan
Today we're going to build a small cluster of such Consul agents. Every one of them will know about a few services nearby, and then we'll look that information up via HTTP and DNS queries.
I think three VMs will do: one for the Consul server and two more for the actual services and the Consul agents that monitor them. As I have VirtualBox and docker-machine installed locally, I'll use those for host management, but any other hypervisor and VM provisioner would do.
We'll also need some services for Consul to discover. Obviously, we could've installed nginx and mysql on those hosts and called it a day, but Consul agents don't care whether those services are real, and since we'll implement service health checks in the next post, fake services will do just as well.
And now, without further ado, let the fun begin.
Building up a cluster
Installing Consul server
First things first: we need a VM. The following command will create a new Linux host called `consul-server` using the VirtualBox provider:
```
docker-machine create -d virtualbox consul-server
# Running pre-create checks...
# Creating machine...
```
It'll take a minute or two to finish, but in the end we'll get a ready-to-use host with SSH access and proper network configuration. Now, let's SSH into it and download and 'install' Consul:
```
docker@consul-server:~$ wget https://releases.hashicorp.com/consul/0.7.5/consul_0.7.5_linux_amd64.zip
# Connecting to releases.hashicorp.com (151.101.21.183:443)
# ..
docker@consul-server:~$ unzip consul_0.7.5_linux_amd64.zip
# Archive:  consul_0.7.5_linux_amd64.zip
# ...
docker@consul-server:~$ ./consul -v
# Consul v0.7.5
```
Piece of cake.
Configuring the Consul server
And now comes the first tricky part. We need to configure a Consul agent that can do three things:
- behave like a server,
- have web UI for us,
- accept incoming connections from other agents in order to form a cluster.
In fact, each of these steps is pretty straightforward; there are just quite a few of them.
Firstly, the `-server` command line switch will put the agent into server mode. I've seen the docs, trust me on that.
Secondly, the `-ui` switch will enable the web UI. Unfortunately, by default it listens on the `127.0.0.1` address, and as it's all happening inside a VM, I'd rather have it listening on the VM's public IP instead. We can get that IP either from inside the VM by checking the `eth1` network interface (e.g. `ifconfig eth1`) or by asking docker-machine from the outside (e.g. `docker-machine ip consul-server`).
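If you go the docker-machine route, it can be handy to stash that IP in a shell variable on the workstation before copying it into Consul's flags (the variable name below is just for illustration):

```
# run on the workstation, not inside the VM
CONSUL_IP=$(docker-machine ip consul-server)
echo "$CONSUL_IP"   # e.g. 192.168.99.104
```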
Once we've found the IP, we can tell Consul to bind its client interfaces – the HTTP API, DNS and the web UI – to that specific address via the `-client %ip%` parameter.
Then, the server should be reachable by other cluster members. We can tell it which address to advertise to the rest of the cluster – the IP we just found – via the `-advertise %ip%` switch.
Finally, Consul will refuse to start without a directory to store its data in. Just give it something via `-data-dir` and it'll stop complaining.
So, this is how the server configuration command looks in the end:
```
./consul agent -server \
    -ui -client 192.168.99.104 \
    -data-dir /tmp/consul \
    -advertise 192.168.99.104
```
Launch it, give the server a few moments to start and elect itself as the cluster leader, and we can see its web UI on port 8500 at whatever IP address we provided:
"Nodes" entries represent individual hosts with Consul agents on them, and so far we have only one. Not for long, though.
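If you prefer the command line to screenshots, a quick sanity check is the status endpoint – a small sketch against the same address we bound the HTTP API to, which should return the address of the elected leader:

```
curl http://192.168.99.104:8500/v1/status/leader
# something like "192.168.99.104:8300"
```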
Installing Consul agents
Time to give our server some company. Let's create two more hosts – `host-1` and `host-2` – and prepare Consul agents on them. The process is identical to installing the server right up to the command that starts the agent, so I'll skip it and go straight to agent configuration.
There are several differences in how regular Consul agents are started. Firstly, they aren't servers, so we can skip the `-server` part. Then, we don't actually need a UI for them, so that part is gone too. They still need a data directory and a public IP address to be reachable at – that part stays.
Finally, agents need the address of an existing cluster member in order to join it. My consul-server VM lives at `192.168.99.104`, and passing that address to the `-retry-join` parameter should do the trick.
This is how the agents' startup commands look:
```
# host-1
docker@host-1:~$ ./consul agent -retry-join 192.168.99.104 -advertise 192.168.99.105 -data-dir /tmp/consul
# ...

# host-2
docker@host-2:~$ ./consul agent -retry-join 192.168.99.104 -advertise 192.168.99.106 -data-dir /tmp/consul
# ...
```
After both agents are running, this is how the Consul server's web UI should look:
Behold! We’ve built ourselves a cluster. Let’s add some life to it.
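The same membership information is available without the UI. Here's a hedged sketch using the catalog HTTP API on the server (same IP as before); it should list all three nodes with their addresses:

```
curl http://192.168.99.104:8500/v1/catalog/nodes?pretty
# [
#   { "Node": "consul-server", "Address": "192.168.99.104", ... },
#   { "Node": "host-1", "Address": "192.168.99.105", ... },
#   { "Node": "host-2", "Address": "192.168.99.106", ... }
# ]
```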
Setting up service definitions
Somebody needs to tell the Consul agents about the services running on their hosts, so that the agents can gossip (that's the actual protocol name) about them with the server. There are a few ways to do that, but we'll simply put some service definitions into the agents' config files.
A service definition file is a regular JSON file. For each service we can provide its name, IP address, port number and an array of health check procedures, which the Consul agent will run to make sure the service is still alive:
```
{
  "service": {
    "name": "redis",
    "tags": ["primary"],
    "address": "",
    "port": 8000,
    "enableTagOverride": false,
    "checks": [
      {
        "script": "/usr/local/bin/check_redis.py",
        "interval": "10s"
      }
    ]
  }
}
```
However, all fields except the name are optional, so we can create a `services.json` file as simple as this one and feed it to the agents:
```
{
  "services": [{
    "name": "web"
  }, {
    "name": "db"
  }]
}
```
```
docker@host-2:~$ ./consul agent \
    -retry-join 192.168.99.104 \
    -advertise 192.168.99.106 \
    -data-dir /tmp/consul \
    -config-file services.json
```
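Config files aren't the only option here, by the way. An agent can also register services at runtime through its HTTP API. A hedged sketch, run from another shell on host-1 (its API listens on 127.0.0.1:8500 by default since we didn't pass -client there, and the "cache" service name is purely illustrative):

```
docker@host-1:~$ curl -X PUT \
    --data '{"Name": "cache"}' \
    http://127.0.0.1:8500/v1/agent/service/register
```

For this walkthrough, though, we'll stick with the config files.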
After you do that on both agents, this is how the "services" page on the consul-server UI will look:
The Consul server now knows about five services: two `web`s, two `db`s and Consul itself. All of them are reported healthy (why wouldn't they be – we skipped the health checks), so services themselves or third parties can now run a query or two in order to find each other.
Running discovery queries
We already covered the first type of discovery query – opening the web UI and looking at the entries on the "services" page. However, that doesn't scale well. Alternatively, we can use HTTP API calls or DNS queries to get the same data in a more machine-friendly fashion.
Service discovery through HTTP API
We can get the names of all registered services by making a simple HTTP call to our Consul server:
```
curl http://192.168.99.104:8500/v1/catalog/services?pretty
{
    "consul": [],
    "db": [],
    "web": []
}
```
It doesn't tell us where those services are, though – there's another query for that.
Now, equipped with a service name (e.g. `db`), let's make another HTTP call to find out who currently hosts that service:
```
curl http://192.168.99.104:8500/v1/catalog/service/db?pretty
#[
#  {
#    "Node": "host-1",
#    "Address": "192.168.99.105",
#    ...
#  },
#  {
#    "Node": "host-2",
#    "Address": "192.168.99.106",
#...
```
Isn't that nice? It returned all available instances of the `db` service along with their host names and IP addresses. If any of the instances goes offline, the output will reflect that.
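Since the response is plain JSON, it's also easy to post-process. For example, assuming `jq` is installed on whatever machine runs the query, pulling out just the addresses could look like this:

```
curl -s http://192.168.99.104:8500/v1/catalog/service/db | jq -r '.[].Address'
# 192.168.99.105
# 192.168.99.106
```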
Service discovery through DNS request
The Consul server also acts as a small DNS server. It listens on port 8600, and we can send it an SRV record request via the `dig` utility. Assuming we're still interested in the whereabouts of the `db` service, here's how that request looks:
```
dig @192.168.99.104 -p 8600 db.service.consul SRV
#...
#;; ANSWER SECTION:
#db.service.consul.      0  IN  SRV  1 1 0 host-1.node.dc1.consul.
#db.service.consul.      0  IN  SRV  1 1 0 host-2.node.dc1.consul.
#
#;; ADDITIONAL SECTION:
#host-1.node.dc1.consul. 0  IN  A    192.168.99.105
#host-2.node.dc1.consul. 0  IN  A    192.168.99.106
#...
```
We skipped the port number for the `db` service in our service definition file, so it comes back as zero in the response's SRV records. We also didn't provide service IP addresses, but Consul was smart enough to put the host IPs into the response's A records.
Interestingly, if you run this query again, the order of the returned services will change, which acts as a simple load-balancing mechanism.
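And if a script only needs the addresses, a plain A-record lookup combined with dig's +short flag keeps the output minimal (same Consul DNS endpoint as above; the ordering rotates between runs here as well):

```
dig @192.168.99.104 -p 8600 +short db.service.consul
# 192.168.99.105
# 192.168.99.106
```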
Conclusion
Today we've covered a lot: creating virtual machines, building up a Consul cluster, registering a bunch of services and running a few queries against them. However, all the steps were pretty straightforward: start the executable, find out the external IP address, provide a JSON config with service definitions – not exactly rocket science. But as a result we got a scalable system that can keep track of application services scattered across the network. Of course, `"name": "web"` doesn't actually make something a web service, and we completely skipped its health check, but installing an actual nginx on the host wouldn't change the configuration that much. As for health checks, we'll take a close look at them next time.