Provisioning cluster of VMs with Ansible - Dots and Brackets: Code Blog

Seeing how easy it was to provision one VM with Ansible, I can’t stop thinking: would it be as easy to deal with the whole cluster? After all, the original example I was trying to move to Ansible had three VMs: one Consul server and two worker machines. The server is ready, so adding two more machines sounds like an interesting exercise to do. So… let’s begin?

What we already have

So far we’ve built:

Vagrantfile to create a VM,
Ansible inventory file to know where to find it,
Ansible playbook to provision it,
systemd service definition file for Consul and
init.json.j2 configuration template for it as well.

Just so we don’t have to jump back and forth between browser tabs I’ll put those files right here (that’s a lot of text).

Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/xenial64"

  config.vm.define "consul-server" do |machine|
    machine.vm.network "private_network", ip: "192.168.99.100"
    machine.vm.provision "shell", inline: "apt-get update && apt-get install -y python-minimal"
    machine.vm.provision "ansible", playbook: "consul.yml"
  end
end

consul-server ansible_host=192.168.99.100 ansible_user=ubuntu ansible_ssh_pass=ubuntu

- hosts: consul-server

  vars:
    consul_version: 0.9.2
    consul_server_ip: 192.168.99.100
    consul_config_dir: /etc/systemd/system/consul.d

  tasks:

   - name: Install unzip
     apt: name=unzip state=present
     become: true

   - name: Install Consul
     become: true
     unarchive:
       src: https://releases.hashicorp.com/consul/{{ consul_version }}/consul_{{ consul_version }}_linux_amd64.zip
       remote_src: yes
       dest: /usr/local/bin
       creates: /usr/local/bin/consul
       mode: 0555

   - name: Make Consul a service
     become: true
     copy: 
       src: consul.service
       dest: /etc/systemd/system/consul.service 

   - name: Ensure config directory exists
     become: true
     file: 
       path: "{{ consul_config_dir }}"
       state: directory

   - name: Deploy consul config
     become: true
     template: 
       src: init.json.j2
       dest: "{{consul_config_dir}}/init.json"

   - name: Ensure consul's running
     become: true
     service: name=consul state=started

[Unit]
Description=consul agent
Requires=network-online.target
After=network-online.target

[Service]
EnvironmentFile=-/etc/sysconfig/consul
Restart=on-failure
ExecStart=/usr/local/bin/consul agent $CONSUL_FLAGS -config-dir=/etc/systemd/system/consul.d
ExecReload=/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target

{
	"server": true,
	"ui": true,
	"advertise_addr": "{{ consul_server_ip }}",
	"client_addr": "{{ consul_server_ip }}",
	"data_dir": "/tmp/consul",
	"bootstrap_expect": 1
}

Hopefully, you just scrolled past all that. As usual, vagrant up will create and provision a new VM, which in our case is the Consul server. However, don’t do that now: we have other VMs to make.

Step 0. Add more VMs

Oh, Vagrant. Without you I’d have to create those by mouse, clicks and lack of understanding about what I’m doing. Yet here you are.

The code to create the consul-server VM already looked like a function, so making it a true function will allow me to reuse it for other cluster members. I also think that it’s worth removing the Ansible provisioner from the Vagrantfile just for now and applying the playbook manually with ansible-playbook. As a downside, we also need to add the ubuntu user configuration back, otherwise the playbook won’t be able to connect to the VMs.

This is what I eventually came up with:

Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/xenial64"

  def create_consul_host(config, hostname, ip)
    config.vm.define hostname do |host|
      host.vm.hostname = hostname
      host.vm.network "private_network", ip: ip
      host.vm.provision "shell", inline: "echo ubuntu:ubuntu | chpasswd"
      host.vm.provision "shell", inline: "apt-get update && apt-get install -y python-minimal"
    end
  end

  create_consul_host(config, "consul-server", "192.168.99.100")
  create_consul_host(config, "consul-host-1", "192.168.99.101")
  create_consul_host(config, "consul-host-2", "192.168.99.102")
end

create_consul_host is a function to create a VM ready to be ansibilized (is that a word?), and I just call it three times to create three identical VMs: consul-server, consul-host-1 and consul-host-2. vagrant up will bring them to life and I don’t even need to check if they are OK. Of course they are.

Step 1. Teach Ansible to trust

If you try to send any Ansible ad-hoc commands to newly created hosts (e.g. ansible all -i hosts -m ping) it will cowardly refuse to execute them, as it has never seen those hosts before. I used to manually confirm that it’s OK to talk to other hosts, but as their number grows, we need something more effective. For instance, a configuration file with the appropriate option in it.

Apparently, putting ansible.cfg into the current directory might solve all trust issues, especially if it has the following lines:

[defaults]
host_key_checking = False

In my case I also had to delete a few entries from ~/.ssh/known_hosts, as I had already used some of the IP addresses and seeing them again would have made Ansible paranoid.

With the config file and three hosts running, we finally can execute something like ping and see how all of them proudly respond with pong:

ansible all -i hosts -m ping
#consul-server | SUCCESS => {
#    "changed": false, 
#    "ping": "pong"
#}

And they don’t. At least, the two worker hosts ignored the command, which is explainable, as I didn’t update the inventory file and therefore Ansible has no idea about them. Oh well.

Step 2. Add new hosts to inventory file

The initial hosts file was quite trivial, and copy-pasting that line two times (and changing the IP addresses, of course) would definitely have done the trick.

consul-server ansible_host=192.168.99.100 ansible_user=ubuntu ansible_ssh_pass=ubuntu

However, the hosts will have different roles, so it makes sense to somehow reflect that in the file. Moreover, copy-pasting the same login and password three times is just silly.

Let’s organize those hosts into groups. For instance, consul-server can be the sole member of servers group, consul-host-1 and -2 will be the nodes, and both of these groups will be members of a cluster. In addition to that, we can put ssh login and pass variables to group variables section, so we don’t have to copy-paste them.

consul-server ansible_host=192.168.99.100
consul-host-1 ansible_host=192.168.99.101
consul-host-2 ansible_host=192.168.99.102

[servers]
consul-server

[nodes]
consul-host-[1:2]

[cluster:children]
servers
nodes

[cluster:vars]
ansible_user=ubuntu
ansible_ssh_pass=ubuntu

Looks serious. I especially like the range pattern in the middle, [1:2], which saves me one line of text.
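
To see what that [1:2] pattern stands for, here’s a small Python sketch that expands it the way I understand the inventory syntax to work. The expand_hosts function is my own illustration, not Ansible’s actual parser, and it only handles plain numeric ranges:

```python
import re

# Matches a numeric range such as [1:2] or [01:10] inside a host pattern.
RANGE = re.compile(r"\[(\d+):(\d+)\]")


def expand_hosts(pattern):
    """Expand numeric ranges in an inventory host pattern (illustration only)."""
    match = RANGE.search(pattern)
    if match is None:
        return [pattern]
    start, end = match.group(1), match.group(2)
    # Keep leading zeros, so that [01:03] gives 01, 02, 03.
    width = len(start) if start.startswith("0") else 0
    hosts = []
    for number in range(int(start), int(end) + 1):  # both ends are included
        replaced = (
            pattern[:match.start()] + str(number).zfill(width) + pattern[match.end():]
        )
        hosts.extend(expand_hosts(replaced))  # expand any further ranges
    return hosts


print(expand_hosts("consul-host-[1:2]"))
```

With only two workers the pattern saves a single line, but the same line would cover [1:20] just as well.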

This time pinging all hosts works without a glitch:

ansible all -i hosts -m ping
#consul-host-1 | SUCCESS => {
#    "changed": false, 
#    "ping": "pong"
#}
#consul-server | SUCCESS => {
#    "changed": false, 
#    "ping": "pong"
#}
#consul-host-2 | SUCCESS => {
#    "changed": false, 
#    "ping": "pong"
#}

Instead of all I could use a group name, and only that subset of the hosts would receive the command.

With all the configuration in place, we can finally get to the playbook file.
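
If it helps to picture what happens with those group names, here’s a rough Python model of the inventory above. The two dictionaries and the hosts_in function are made up for illustration; the real inventory code is surely more involved:

```python
# Direct members of each group, mirroring the inventory file.
GROUP_HOSTS = {
    "servers": ["consul-server"],
    "nodes": ["consul-host-1", "consul-host-2"],
    "cluster": [],
}
# The [cluster:children] section.
GROUP_CHILDREN = {"cluster": ["servers", "nodes"]}


def hosts_in(group):
    """Return the hosts of a group, including the hosts of its child groups."""
    hosts = list(GROUP_HOSTS.get(group, []))
    for child in GROUP_CHILDREN.get(group, []):
        for host in hosts_in(child):
            if host not in hosts:
                hosts.append(host)
    return hosts


print(hosts_in("nodes"))
print(hosts_in("cluster"))
```

So something like ansible nodes -i hosts -m ping should reach only the two workers, while cluster covers all three machines.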

Step 3. Adapt playbook for multiple roles

We had six tasks for provisioning consul-server.

Install unzip
Install Consul
Make Consul a service
Ensure config directory exists
Deploy consul config
Ensure consul’s running

The fifth one is going to be different for the server and its nodes, as it deploys role specific configuration file, but the rest will be the same for all Consul roles. As we’re allowed to put multiple plays in a playbook, we can organize cluster provisioning into four parts:

Install Consul services on all VMs (tasks 1-4)
Deploy Consul server configuration (task 5)
Deploy Consul nodes configuration (task 5)
Start all Consul agents (task 6)
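
The same plan as a quick Python sketch, just to check that every host still ends up with all six tasks. PLAYS and tasks_for are hypothetical names of mine, and the model ignores everything a real play does except matching hosts to groups:

```python
GROUPS = {
    "servers": ["consul-server"],
    "nodes": ["consul-host-1", "consul-host-2"],
}
GROUPS["cluster"] = GROUPS["servers"] + GROUPS["nodes"]

# Each play is a (target group, task names) pair, in playbook order.
PLAYS = [
    ("cluster", ["Install unzip", "Install Consul", "Make Consul a service",
                 "Ensure config directory exists"]),
    ("servers", ["Deploy consul server config"]),
    ("nodes", ["Deploy consul client config"]),
    ("cluster", ["Ensure consul's running"]),
]


def tasks_for(host):
    """List the tasks a host receives when the plays run top to bottom."""
    tasks = []
    for group, play_tasks in PLAYS:
        if host in GROUPS[group]:
            tasks.extend(play_tasks)
    return tasks


for name in GROUPS["cluster"]:
    print(name, tasks_for(name))
```

The only difference between the server and the workers is the fifth task, which is exactly what we wanted.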

Step 3.1 Install Consul services

For this step we’ll need to do some copy-pasting. In fact, lots of it. The first play is basically the whole consul.yml we had before minus a few things:

“Deploy consul config” and “Ensure consul’s running” steps are gone.
Instead of the specific consul-server VM, the hosts section (line 1) now targets the group called cluster (the one that we defined in the inventory file, remember?).
consul_server_ip is also gone, as we don’t need it at the moment.
Consul itself got an update a week ago, so I changed consul_version (line 4) to 0.9.3.

This leaves us with something like this:

- hosts: cluster

  vars:
    consul_version: 0.9.3
    consul_config_dir: /etc/systemd/system/consul.d

  tasks:

   - name: Install unzip
     apt: name=unzip state=present
     become: true

  # ...

   - name: Ensure config directory exists
     become: true
     file: 
       path: "{{ consul_config_dir }}"
       state: directory

Assuming that consul-server, consul-host-1 and -2 are still running, we can install Consul on all three of them with the single command:

ansible-playbook -i hosts consul.yml
#
#PLAY [cluster] ***********************************************************
#
#TASK [Gathering Facts] ***************************************************
#ok: [consul-server]
#ok: [consul-host-1]
#ok: [consul-host-2]
# ...
#PLAY RECAP ***************************************************************
#consul-host-1              : ok=5    changed=4    unreachable=0    failed=0   
#consul-host-2              : ok=5    changed=4    unreachable=0    failed=0   
#consul-server              : ok=5    changed=4    unreachable=0    failed=0

You might be surprised how fast this works. The secret is that Ansible provisions the hosts in parallel: each task runs on several hosts at once (up to five by default, controlled by the forks setting).
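
Here’s a toy Python illustration of why that matters. It is not how Ansible is implemented (run_task just sleeps instead of talking to a VM, and I use threads for brevity); it only shows that three slow jobs done side by side take about as long as one:

```python
import time
from concurrent.futures import ThreadPoolExecutor

HOSTS = ["consul-server", "consul-host-1", "consul-host-2"]


def run_task(host):
    """Stand-in for running one task on one host over SSH."""
    time.sleep(0.2)  # pretend the remote work takes a while
    return host, "ok"


def run_on_all(hosts, forks=5):
    """Run the same task on every host, at most `forks` hosts at a time."""
    with ThreadPoolExecutor(max_workers=forks) as pool:
        return dict(pool.map(run_task, hosts))


started = time.monotonic()
results = run_on_all(HOSTS)
elapsed = time.monotonic() - started
print(results, f"{elapsed:.2f}s")
```

Calling run_on_all(HOSTS, forks=1) instead makes the hosts wait for each other, which is roughly what one-at-a-time provisioning feels like.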

Step 3.2 Configuring consul-server

It could’ve been another pure copy-paste exercise, but I think we have some improvements to make along the way.

Firstly, let’s have a look at the only provisioning task that our second play will have:

   - name: Deploy consul config
     become: true
     template: 
       src: init.json.j2
       dest: "{{consul_config_dir}}/init.json"

The init.json.j2 file name, which made perfect sense for single host provisioning, is getting unclear in a multi-host configuration. Is it a server configuration or a client one? server.init.json.j2 sounds like a better choice.

Then, the “Deploy consul config” task uses the consul_config_dir variable, which was declared in the first play and therefore is visible only there. Should I also copy it into the second one? Nah, I don’t think so. Instead, we can make it available to every play by moving it to the inventory file.

;...
[cluster:vars]
ansible_user=ubuntu
ansible_ssh_pass=ubuntu
consul_config_dir=/etc/systemd/system/consul.d
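
A simplified Python picture of what moving the variable buys us: every host in cluster, whichever play it takes part in, ends up with the same consul_config_dir. The dictionaries and vars_for are hypothetical and skip most of Ansible’s real precedence rules; I only assume that group variables are applied first and host variables on top:

```python
# Variables declared in the inventory, trimmed to what this post uses.
GROUP_VARS = {
    "cluster": {
        "ansible_user": "ubuntu",
        "ansible_ssh_pass": "ubuntu",
        "consul_config_dir": "/etc/systemd/system/consul.d",
    },
}
HOST_VARS = {
    "consul-server": {"ansible_host": "192.168.99.100"},
    "consul-host-1": {"ansible_host": "192.168.99.101"},
    "consul-host-2": {"ansible_host": "192.168.99.102"},
}
# Groups each host belongs to, parent group first.
HOST_GROUPS = {
    "consul-server": ["cluster", "servers"],
    "consul-host-1": ["cluster", "nodes"],
    "consul-host-2": ["cluster", "nodes"],
}


def vars_for(host):
    """Merge group variables (parents first), then host variables on top."""
    merged = {}
    for group in HOST_GROUPS[host]:
        merged.update(GROUP_VARS.get(group, {}))
    merged.update(HOST_VARS[host])
    return merged


print(vars_for("consul-host-1"))
```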

Another thing is that the template file itself relied on the consul_server_ip variable. I never liked that one, as it basically redeclared something already stored in the inventory file. Seeing how we put the consul_config_dir variable into the inventory file, can we do the opposite and use something that’s already there, like ansible_host? Apparently we can, and putting ansible_host into server.init.json.j2 instead of consul_server_ip is a perfect replacement for the hardcoded IP address.

{
	"server": true,
	"ui": true,
	"advertise_addr": "{{ ansible_host }}",
	"client_addr": "{{ ansible_host }}",
	"data_dir": "/tmp/consul",
	"bootstrap_expect": 1
}
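
To convince myself the template still produces valid JSON, here’s a tiny Python check. Ansible renders templates with Jinja2; the render function below is only a regex stand-in for plain {{ name }} placeholders, written so the sketch has no dependencies:

```python
import json
import re

TEMPLATE = """{
    "server": true,
    "ui": true,
    "advertise_addr": "{{ ansible_host }}",
    "client_addr": "{{ ansible_host }}",
    "data_dir": "/tmp/consul",
    "bootstrap_expect": 1
}"""


def render(template, variables):
    """Replace {{ name }} placeholders; a minimal stand-in for Jinja2."""
    def substitute(match):
        return str(variables[match.group(1)])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# ansible_host comes from the inventory line of consul-server.
config = json.loads(render(TEMPLATE, {"ansible_host": "192.168.99.100"}))
print(config["advertise_addr"], config["client_addr"])
```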

So this is how the second play is going to look in consul.yml:

- hosts: cluster
#....

- hosts: servers

  tasks:

   - name: Deploy consul server config
     become: true
     template: 
       src: server.init.json.j2
       dest: "{{consul_config_dir}}/init.json"

In case you’ve forgotten, servers is also one of the groups we declared in inventory file.

ansible-playbook -i hosts consul.yml won’t do anything unusual except for copying configuration of consul server.

Step 3.3 Configuring Consul agents

This is going to be interesting. Configuration for consul agents was simple: just copy one more JSON into, let’s say, client.init.json.j2, and we’re probably done.

{
	"advertise_addr": "{{ ansible_host }}",
	"retry_join": ["{{ consul_server_ip }}"],
	"data_dir": "/tmp/consul"
}

We already know how ansible_host works, so this takes care of advertise_addr, but we also need to find consul_server_ip which we’ve just got rid of. So what should we do? Redeclare it again?

In fact, we don’t have to. Ever wondered what “TASK [Gathering Facts]” means in Ansible output? Apparently, it’s an implicit task that collects tons of useful information about the hosts we’re going to provision: environment variables, OS details, network interfaces, etc. What’s more, that data is stored per host, and the groups we declared in the inventory file are available right next to it, so a regular node can find the first member of the servers group and simply look up its IP in that collection.

The variable with automatically collected data is called hostvars and this is how we can use it:

- hosts: cluster
#...
- hosts: servers
#...
- hosts: nodes

  tasks:

   - set_fact: consul_server={{ hostvars[inventory_hostname]['groups']['servers'][0] }}
   - set_fact: consul_server_ip={{ hostvars[consul_server]['ansible_all_ipv4_addresses'][0] }}

   - name: Deploy consul client config
     become: true
     template: 
       src: client.init.json.j2
       dest: "{{consul_config_dir}}/init.json"

All the magic is happening in lines 9 and 10. What we do there is declare two variables (facts): consul_server for storing the name of the first host in the servers group, and consul_server_ip, which will store the first IPv4 address reported for that host. It looks a little bit complicated, but if you dump the contents of hostvars via e.g. a - debug: var=hostvars task, it will all start to make perfect sense.
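
If you’d rather not dump the real thing, here’s the same two-step lookup in Python against a heavily trimmed, hypothetical snapshot of hostvars. The values are made up to match this post; a real VM usually reports more than one address, so which one comes first is something to verify on your own setup:

```python
# The groups from the inventory file.
groups = {
    "servers": ["consul-server"],
    "nodes": ["consul-host-1", "consul-host-2"],
}

# Hypothetical snapshot: per-host data, with the groups reachable from each host.
hostvars = {
    "consul-server": {
        "groups": groups,
        "ansible_all_ipv4_addresses": ["192.168.99.100"],
    },
    "consul-host-1": {
        "groups": groups,
        "ansible_all_ipv4_addresses": ["192.168.99.101"],
    },
    "consul-host-2": {
        "groups": groups,
        "ansible_all_ipv4_addresses": ["192.168.99.102"],
    },
}


def server_ip(inventory_hostname):
    """Mirror the two set_fact lines: find the server's name, then its address."""
    consul_server = hostvars[inventory_hostname]["groups"]["servers"][0]
    return hostvars[consul_server]["ansible_all_ipv4_addresses"][0]


print(server_ip("consul-host-1"))
```

Note that the address of consul-server is only there because facts were gathered for it earlier in the same run.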

Step 3.4 Starting all consul services in all VMs

This one is absolutely trivial:

- hosts: cluster
# ...
- hosts: servers
# ...
- hosts: nodes
# ...
- hosts: cluster

  tasks: 

   - name: Ensure consul's running
     become: true
     service: name=consul state=started

Running the playbook one more time will light up the whole cluster and, like last time, a few moments later we can see the Consul server UI at 192.168.99.100:8500, this time with two more nodes.

Step 3.5 Connecting the playbook to Vagrantfile

This is going to be a little bit tricky. As we saw in the single host provisioning scenario, Vagrant will create its own inventory file by default. That would’ve been convenient if we didn’t have bits of useful information like groups and variables in our own inventory. Luckily, that behavior is configurable, and by using the provisioner’s inventory_path we can still stick to the existing inventory file.

Another issue also lies in default settings. Unlike ansible-playbook, which provisions hosts in parallel, Vagrant’s ansible provisioner will do that in series, one machine at a time. Not only is it slower than it could be, but our sniffing for consul_server_ip actually depends on all hosts being provisioned together.

Again, luckily for us, we can make the provisioner target every host in the inventory, not just the current machine, by setting its limit setting to "all". We’ll also need to start the provisioning only when all hosts are ready, which is why the provisioner is attached to the last VM. This is how I made it work:

# ...  
  def create_consul_host(config, hostname, ip)
    config.vm.define hostname do |host|
      #...
      host.vm.provision "shell", inline: "apt-get update && apt-get install -y python-minimal"
      yield host if block_given?
    end
  end

  create_consul_host(config, "consul-server", "192.168.99.100")
  create_consul_host(config, "consul-host-1", "192.168.99.101")
  create_consul_host(config, "consul-host-2", "192.168.99.102") do |host|
    host.vm.provision "ansible" do |ansible|
      ansible.limit = "all"
      ansible.inventory_path = "./hosts"
      ansible.playbook = "consul.yml"
    end
  end
# ...

After this change a single vagrant up on a clean machine will bring up a fully functional Consul cluster without the need to provision it with ansible-playbook.
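
Why the limit matters so much is easier to see in a small Python model. It is a simplification of mine: targeted_hosts only understands "all" or a single host name, while the real limit option accepts richer patterns:

```python
GROUPS = {
    "servers": ["consul-server"],
    "nodes": ["consul-host-1", "consul-host-2"],
}
GROUPS["cluster"] = GROUPS["servers"] + GROUPS["nodes"]


def targeted_hosts(play_group, limit):
    """Hosts a play actually runs on once the limit is applied."""
    if limit == "all":
        return list(GROUPS[play_group])
    return [host for host in GROUPS[play_group] if host == limit]


# Limited to the machine being provisioned, the servers play matches nobody.
print(targeted_hosts("servers", "consul-host-2"))
# With the limit set to "all", every play sees its whole group.
print(targeted_hosts("servers", "all"))
print(targeted_hosts("nodes", "all"))
```

With the run limited to one worker, consul-server never takes part in it, so there are no facts to read its IP from. Running everything once, from the last machine, avoids that.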

Conclusion

Provisioning more than one VM with Ansible is not much harder than provisioning a single one. In fact, it feels exactly the same. Yes, there’s more text in the inventory file, and the playbook’s got a little bit bigger, but essentially nothing’s changed. I’m especially happy with finding out how to use hostvars variables. Hardcoding the IP address had bothered me since the last time, and I’m glad I found a way to avoid it. Of course, it would be better if IPs went away from the inventory file as well and Vagrant itself took care of them, but let’s take one step at a time.

The source code for this post can be found on GitHub.
