Seeing how easy it was to provision one VM with Ansible, I can’t stop wondering: would it be just as easy to deal with a whole cluster? After all, the original example I was trying to move to Ansible had three VMs: one Consul server and two worker machines. The server is ready, so adding two more machines sounds like an interesting exercise. So… let’s begin?
What we already have
So far we’ve built:
- Vagrantfile to create a VM,
- Ansible inventory file to know where to find it,
- Ansible playbook to provision it,
- systemd service definition file for Consul and
- `init.json.j2` configuration template for it as well.
Just so we don’t have to jump back and forth between browser tabs, I’ll put those files right here (that’s a lot of text).
```ruby
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/xenial64"

  config.vm.define "consul-server" do |machine|
    machine.vm.network "private_network", ip: "192.168.99.100"
    machine.vm.provision "shell", inline: "apt-get update && apt-get install -y python-minimal"
    machine.vm.provision "ansible", playbook: "consul.yml"
  end
end
```
```ini
consul-server ansible_host=192.168.99.100 ansible_user=ubuntu ansible_ssh_pass=ubuntu
```
```yaml
- hosts: consul-server
  vars:
    consul_version: 0.9.2
    consul_server_ip: 192.168.99.100
    consul_config_dir: /etc/systemd/system/consul.d

  tasks:
    - name: Install unzip
      apt: name=unzip state=present
      become: true

    - name: Install Consul
      become: true
      unarchive:
        src: https://releases.hashicorp.com/consul/{{ consul_version }}/consul_{{ consul_version }}_linux_amd64.zip
        remote_src: yes
        dest: /usr/local/bin
        creates: /usr/local/bin/consul
        mode: 0555

    - name: Make Consul a service
      become: true
      copy:
        src: consul.service
        dest: /etc/systemd/system/consul.service

    - name: Ensure config directory exists
      become: true
      file:
        path: "{{ consul_config_dir }}"
        state: directory

    - name: Deploy consul config
      become: true
      template:
        src: init.json.j2
        dest: "{{consul_config_dir}}/init.json"

    - name: Ensure consul's running
      become: true
      service: name=consul state=started
```
```ini
[Unit]
Description=consul agent
Requires=network-online.target
After=network-online.target

[Service]
EnvironmentFile=-/etc/sysconfig/consul
Restart=on-failure
ExecStart=/usr/local/bin/consul agent $CONSUL_FLAGS -config-dir=/etc/systemd/system/consul.d
ExecReload=/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target
```
```json
{
  "server": true,
  "ui": true,
  "advertise_addr": "{{ consul_server_ip }}",
  "client_addr": "{{ consul_server_ip }}",
  "data_dir": "/tmp/consul",
  "bootstrap_expect": 1
}
```
Hopefully, you just scrolled past all of that. As usual, `vagrant up` would create and provision a new VM, which in our case is the Consul server. However, don’t do that just yet – we have other VMs to create.
Step 0. Add more VMs
Oh, Vagrant. Without you I’d have to create those with a mouse, clicks, and a general lack of understanding of what I’m doing. Yet here you are.
The code that creates the `consul-server` VM already looked like a function, so turning it into a real one will let me reuse it for the other cluster members. I also think it’s worth removing the Ansible provisioner from the Vagrantfile for now and applying the playbook manually with `ansible-playbook`. As a downside, we also need to add the `ubuntu` user configuration back, otherwise the playbook won’t be able to connect to the VMs.

This is what I eventually came up with:
```ruby
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/xenial64"

  def create_consul_host(config, hostname, ip)
    config.vm.define hostname do |host|
      host.vm.hostname = hostname
      host.vm.network "private_network", ip: ip
      host.vm.provision "shell", inline: "echo ubuntu:ubuntu | chpasswd"
      host.vm.provision "shell", inline: "apt-get update && apt-get install -y python-minimal"
    end
  end

  create_consul_host(config, "consul-server", "192.168.99.100")
  create_consul_host(config, "consul-host-1", "192.168.99.101")
  create_consul_host(config, "consul-host-2", "192.168.99.102")
end
```
`create_consul_host` is a function that creates a VM ready to be ansibilized (is that a word?), and I just call it three times to create three identical VMs: `consul-server`, `consul-host-1` and `consul-host-2`. `vagrant up` will bring them to life, and I don’t even need to check that they’re OK. Of course they are.
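If you do want to check anyway, a couple of stock Vagrant commands are enough; nothing here is specific to our setup:

```sh
# Show the machines Vagrant manages and whether they're running
vagrant status

# Or hop onto one of them and look around
vagrant ssh consul-host-1
```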
Step 1. Teach Ansible to trust
If you try to send any Ansible ad-hoc command to the newly created hosts (e.g. `ansible all -i hosts -m ping`), it will cowardly refuse to execute it, as it has never seen those hosts before. I used to confirm manually that it’s OK to talk to a new host, but as their number grows we need something more effective. For instance, a configuration file with the appropriate option in it.

Apparently, putting `ansible.cfg` into the current directory solves all trust issues, especially if it has the following lines:
```ini
[defaults]
host_key_checking = False
```
In my case I also had to delete a few entries from `~/.ssh/known_hosts`, as I had already used some of these IP addresses before, and seeing them again with different host keys would make Ansible paranoid.
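Rather than editing that file by hand, `ssh-keygen -R` can remove the stale entries. The addresses below are simply the ones from our Vagrantfile, so adjust them to whatever is actually lingering in your `known_hosts`:

```sh
# Forget the old host keys for the IPs we're about to reuse
ssh-keygen -R 192.168.99.100
ssh-keygen -R 192.168.99.101
ssh-keygen -R 192.168.99.102
```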
With the config file in place and three hosts running, we can finally execute something like `ping` and watch all of them proudly respond with `pong`:
```sh
ansible all -i hosts -m ping
#consul-server | SUCCESS => {
#    "changed": false,
#    "ping": "pong"
#}
```
And they don’t. At least, the two worker hosts ignored the command, which is explainable: I didn’t update the inventory file, so Ansible has no idea they exist. Oh well.
Step 2. Add new hosts to inventory file
The initial `hosts` file was quite trivial, and copy-pasting its single line two more times (changing the IP addresses, of course) would definitely do the trick.
```ini
consul-server ansible_host=192.168.99.100 ansible_user=ubuntu ansible_ssh_pass=ubuntu
```
However, the hosts will have different roles, so it makes sense to somehow reflect that in the file. Moreover, copy-pasting the same login and password three times is just silly.
Let’s organize those hosts into groups. For instance, `consul-server` can be the sole member of the `servers` group, `consul-host-1` and `-2` will be the `nodes`, and both of these groups will be members of a `cluster`. On top of that, we can move the SSH login and password into a group variables section, so we don’t have to copy-paste them.
```ini
consul-server ansible_host=192.168.99.100
consul-host-1 ansible_host=192.168.99.101
consul-host-2 ansible_host=192.168.99.102

[servers]
consul-server

[nodes]
consul-host-[1:2]

[cluster:children]
servers
nodes

[cluster:vars]
ansible_user=ubuntu
ansible_ssh_pass=ubuntu
```
Looks serious. I especially like the range pattern in the middle – `[1:2]` – which saves me a whole line of text.
This time pinging `all` hosts works without a hitch:
```sh
ansible all -i hosts -m ping
#consul-host-1 | SUCCESS => {
#    "changed": false,
#    "ping": "pong"
#}
#consul-server | SUCCESS => {
#    "changed": false,
#    "ping": "pong"
#}
#consul-host-2 | SUCCESS => {
#    "changed": false,
#    "ping": "pong"
#}
```
Instead of `all` I could use group names, so that only a subset of the hosts would receive the command.
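For example, this would poke only the worker machines from the `nodes` group we just defined:

```sh
# Only consul-host-1 and consul-host-2 will answer this one
ansible nodes -i hosts -m ping
```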
With all the configuration in place, we can finally get to the playbook file.
Step 3. Adapt playbook for multiple roles
We had six tasks for provisioning `consul-server`:
1. Install unzip
2. Install Consul
3. Make Consul a service
4. Ensure config directory exists
5. Deploy consul config
6. Ensure consul’s running
The fifth task is going to be different for the server and the nodes, as it deploys a role-specific configuration file, but the rest will be the same for all Consul roles. As we’re allowed to put multiple plays in a playbook, we can organize cluster provisioning into four parts:
- Install Consul services on all VMs (tasks 1-4)
- Deploy Consul server configuration (task 5)
- Deploy Consul nodes configuration (task 5)
- Start all Consul agents (task 6)
Step 3.1 Install Consul services
For this step we’ll need to do some copy-pasting. In fact, lots of it. The first play is basically the whole `consul.yml` we had before, minus a few things:
- The “Deploy consul config” and “Ensure consul’s running” tasks are gone.
- Instead of the specific `consul-server` VM, the `hosts` section now targets the group called `cluster` (the one we defined in the inventory file, remember?).
- `consul_server_ip` is also gone, as we don’t need it at the moment.
- Consul itself got an update a week ago, so I bumped `consul_version` to `0.9.3`.
This leaves us with something like this:
```yaml
- hosts: cluster
  vars:
    consul_version: 0.9.3
    consul_config_dir: /etc/systemd/system/consul.d

  tasks:
    - name: Install unzip
      apt: name=unzip state=present
      become: true

    # ...

    - name: Ensure config directory exists
      become: true
      file:
        path: "{{ consul_config_dir }}"
        state: directory
```
Assuming that `consul-server`, `consul-host-1` and `-2` are still running, we can install Consul on all three of them with a single command:
```sh
ansible-playbook -i hosts consul.yml
#
#PLAY [cluster] ***********************************************************
#
#TASK [Gathering Facts] ***************************************************
#ok: [consul-server]
#ok: [consul-host-1]
#ok: [consul-host-2]
# ...
#PLAY RECAP ***************************************************************
#consul-host-1  : ok=5    changed=4    unreachable=0    failed=0
#consul-host-2  : ok=5    changed=4    unreachable=0    failed=0
#consul-server  : ok=5    changed=4    unreachable=0    failed=0
```
You might be surprised how fast this works. The secret is that Ansible provisions the hosts in parallel.
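How parallel exactly is governed by the forks setting, which defaults to five simultaneous hosts. Our three VMs fit comfortably within that, but it can be raised per run if the cluster ever grows:

```sh
# Provision up to ten hosts at a time instead of the default five
ansible-playbook -i hosts consul.yml --forks 10
```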
Step 3.2 Configuring consul-server
It could’ve been another pure copy-paste exercise, but I think we have some improvements to make along the way.
Firstly, let’s have a look at the only provisioning task that our second play will have:
```yaml
- name: Deploy consul config
  become: true
  template:
    src: init.json.j2
    dest: "{{consul_config_dir}}/init.json"
```
The `init.json.j2` file name, which made perfect sense for single-host provisioning, becomes unclear in a multi-host setup. Is it a server configuration or a client’s? `server.init.json.j2` sounds like a better choice.

Then, the “Deploy consul config” task uses the `consul_config_dir` variable, which was declared in the first play and therefore has limited scope. Should I also copy it into the second one? Nah, I don’t think so. Instead, we can make it global by moving it into the inventory file.
```ini
;...
[cluster:vars]
ansible_user=ubuntu
ansible_ssh_pass=ubuntu
consul_config_dir=/etc/systemd/system/consul.d
```
Another thing is that the template file itself relied on the `consul_server_ip` variable. I never liked that one, as it basically redeclared something already stored in the inventory file. Seeing how we just moved the `consul_config_dir` variable into the inventory, can we do the opposite and use something that’s already there, like `ansible_host`? Apparently we can, and putting `ansible_host` into `server.init.json.j2` instead of `consul_server_ip` is a perfect replacement for the hardcoded IP address.
```json
{
  "server": true,
  "ui": true,
  "advertise_addr": "{{ ansible_host }}",
  "client_addr": "{{ ansible_host }}",
  "data_dir": "/tmp/consul",
  "bootstrap_expect": 1
}
```
So this is how the second play is going to look in `consul.yml`:
```yaml
- hosts: cluster
  #....

- hosts: servers
  tasks:
    - name: Deploy consul server config
      become: true
      template:
        src: server.init.json.j2
        dest: "{{consul_config_dir}}/init.json"
```
In case you’ve forgotten, `servers` is also one of the groups we declared in the inventory file.

`ansible-playbook -i hosts consul.yml` won’t do anything unusual this time except copy the Consul server configuration.
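As a side note, if we ever want to re-run the playbook against the server alone, `ansible-playbook`’s `--limit` flag narrows the run down to any inventory group or host:

```sh
# Run the playbook, but only against hosts from the "servers" group
ansible-playbook -i hosts consul.yml --limit servers
```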
Step 3.3 Configuring Consul agents
This is going to be interesting. The configuration for Consul agents is simple: just put one more JSON template into, let’s say, `client.init.json.j2`, and we’re probably done.
```json
{
  "advertise_addr": "{{ ansible_host }}",
  "retry_join": ["{{ consul_server_ip }}"],
  "data_dir": "/tmp/consul"
}
```
We already know how `ansible_host` works, so that takes care of `advertise_addr`, but we also need to find the `consul_server_ip` we’ve just gotten rid of. So what should we do? Redeclare it again?
In fact, we don’t have to. Ever wondered what “TASK [Gathering Facts]” means in Ansible’s output? It’s an implicit task that collects tons of useful information about the hosts we’re about to provision: environment variables, OS details, network interfaces, etc. What’s more, the inventory groups we declared are available alongside that data, so as long as the regular `nodes` machines know about the existence of the `servers` group, we can simply look up the server’s IP in that collection.

The variable holding the automatically collected data is called `hostvars`, and this is how we can use it:
```yaml
- hosts: cluster
  #...

- hosts: servers
  #...

- hosts: nodes
  tasks:
    - set_fact: consul_server={{ hostvars[inventory_hostname]['groups']['servers'][0] }}
    - set_fact: consul_server_ip={{ hostvars[consul_server]['ansible_all_ipv4_addresses'][0] }}

    - name: Deploy consul client config
      become: true
      template:
        src: client.init.json.j2
        dest: "{{consul_config_dir}}/init.json"
```
All the magic happens in the two `set_fact` tasks. There we declare two variables (facts): `consul_server`, which stores the name of the first host in the `servers` group, and `consul_server_ip`, which stores the first IPv4 address of that host. It looks a little complicated, but if you dump the contents of `hostvars` via e.g. a `- debug: var=hostvars` task, it all starts to make perfect sense.
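Alternatively, the same facts can be inspected without touching the playbook at all: the `setup` module (the one behind “Gathering Facts”) can be run ad-hoc, and `ansible_all_ipv4_addresses` will be somewhere in its rather lengthy output:

```sh
# Print every fact Ansible gathers for the server host
ansible servers -i hosts -m setup
```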
Step 3.4 Starting Consul services on all VMs
This one is absolutely trivial:
```yaml
- hosts: cluster
  # ...

- hosts: servers
  # ...

- hosts: nodes
  # ...

- hosts: cluster
  tasks:
    - name: Ensure consul's running
      become: true
      service: name=consul state=started
```
Running the playbook one more time will light up the whole cluster, and just like last time, a few moments later we can see the Consul server UI at 192.168.99.100:8500. This time with two more nodes.
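If a browser isn’t handy, the same check can be done from the terminal against Consul’s HTTP API; the catalog endpoint lists every node that has joined the cluster:

```sh
# All three machines should show up in the response
curl http://192.168.99.100:8500/v1/catalog/nodes
```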
Step 3.5 Connecting the playbook to Vagrantfile
This is going to be a little bit tricky. As we saw in the single-host provisioning scenario, Vagrant creates its own inventory file by default. That would’ve been convenient if we didn’t keep useful bits like groups and variables in our own inventory. Luckily, that behavior is configurable, and by using the provisioner’s `inventory_path` we can stick to the existing inventory file.

Another issue also lies in the default settings. Unlike `ansible-playbook`, which provisions hosts in parallel, Vagrant’s Ansible provisioner runs the playbook against one machine at a time. Not only is that slower than it could be, our sniffing for `consul_server_ip` actually depends on all hosts being provisioned together.

Again, luckily for us, we can make a single provisioning run target every host by setting the provisioner’s `limit` option to `"all"`, and we’ll also need to kick that run off only when all hosts are ready. This is how I made it work:
```ruby
# ...
  def create_consul_host(config, hostname, ip)
    config.vm.define hostname do |host|
      #...
      host.vm.provision "shell", inline: "apt-get update && apt-get install -y python-minimal"
      yield host if block_given?
    end
  end

  create_consul_host(config, "consul-server", "192.168.99.100")
  create_consul_host(config, "consul-host-1", "192.168.99.101")
  create_consul_host(config, "consul-host-2", "192.168.99.102") do |host|
    host.vm.provision "ansible" do |ansible|
      ansible.limit = "all"
      ansible.inventory_path = "./hosts"
      ansible.playbook = "consul.yml"
    end
  end
# ...
```
After this change, a single `vagrant up` on a clean machine will bring up a fully functional Consul cluster without any need to run `ansible-playbook` by hand.
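The easiest way to convince yourself of that is to go through the whole cycle from a clean slate:

```sh
# Tear everything down and recreate the entire cluster in one go
vagrant destroy -f
vagrant up
```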
Conclusion
Provisioning more than one VM with Ansible is not much harder than provisioning a single one. In fact, it feels exactly the same. Yes, there’s more text in the inventory file, and the playbook got a little bit bigger, but essentially nothing changed. I’m especially happy about finding out how to use the `hostvars` variable. The hardcoded IP address had bothered me since last time, and I’m glad I found a way to avoid it. Of course, it would be even better if the IPs disappeared from the inventory file as well and Vagrant itself took care of them, but let’s take one step at a time.
The source code for this post can be found on GitHub.