Autoscaling build servers with GitLab CI

autoscaling builds

I’ve been using GitLab CI for a while now, and up to a certain point it worked really well. We had three build servers (GitLab runners) in the beginning, and when the number of teammates and build steps – and therefore commits and build jobs – increased, I’d just add one more server to handle the extra load and feel that the problem was solved.

Not for long. When the number of servers climbed past ten, it became obvious that simply adding servers one by one doesn’t work anymore. It was expensive to keep all of them running all the time, and it still wasn’t enough to handle occasional spikes of commits. Not to mention that during nights and weekends those servers were doing absolutely nothing.

The whole thing needs to be dynamic, and fortunately GitLab CI supports autoscaling out of the box. The documentation is a little confusing, but in reality it’s very easy to get started. So here’s the plan: let’s try it!

A word of warning though: I’ll skip the introduction of what GitLab, GitLab runners and even Docker are – they’ve been discussed in previous posts.

How GitLab CI’s autoscaling works

The idea is very simple. We already used GitLab runners that compiled a TypeScript project directly on the host they were installed on, using the shell executor. However, we also could’ve used the docker executor, which would put the code into a Docker container and compile it there. Once we can use Docker for builds, it’s just a tiny step to the docker-machine utility, which can spin up a new VM with Docker installed on it and perform the build remotely. When the build is done, we can safely use docker-machine again to remove that temporary host and wait for the next build to come. GitLab knows how to do all of that automatically and has the docker+machine executor for exactly this.

Tooling

GitLab, as well as Docker and docker-machine, can be installed virtually anywhere, but today I’ll use a good old Mac with Vagrant, docker-machine and VirtualBox on it. We’re also going to need some demo project to push through the build pipeline, and I think .NET Core’s default “Hello World” console app is perfect for that.

Setting up GitLab and dev environment

I’m going to rush through this part, because GitLab installation was already covered in my previous post. That time we hosted the GitLab server in a Docker container, but today we’re going to promote it to its own virtual machine.

Configure Virtual Machine

This relatively simple Vagrantfile with a slightly less simple provision.sh should (will) create a new VM with GitLab, the .NET Core 2.0 SDK and Docker on it:
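The original Vagrantfile didn’t survive, so here’s a minimal sketch of what it would look like. The box name and memory size are my assumptions; provision.sh would contain the apt-based installation of GitLab, the .NET Core SDK and Docker:

```ruby
# Vagrantfile – a sketch; box name and memory size are assumptions
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/xenial64"
  # the IP the article uses to reach GitLab later
  config.vm.network "private_network", ip: "192.168.33.10"

  config.vm.provider "virtualbox" do |vb|
    vb.memory = "4096"   # GitLab is memory hungry
  end

  # installs GitLab, .NET Core 2.0 SDK and Docker
  config.vm.provision "shell", path: "provision.sh"
end
```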

vagrant up will take more time than usual, but a few minutes later we’ll be able to use the “192.168.33.10” IP to navigate to the GitLab server:

GitLab - initial page

I’ll skip the part where we enter the initial root password, log in and create a project (I called mine “console-app”) – the previous article already covered that. So assuming you’ve done that, let’s create a .NET Core console app for our newly created GitLab project. After all, the autoscaled build servers will need something to build.

Creating .NET Core console app

That’s going to be simple. As we installed the .NET Core SDK inside of our VM, we can get in there and use that SDK to create a ready-to-use “Hello World” app.
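Something along these lines; the project folder name is my assumption:

```shell
vagrant ssh                 # get inside the VM

mkdir console-app && cd console-app
dotnet new console          # generates Program.cs and console-app.csproj
dotnet run                  # prints "Hello World!"
```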

After that we’ll simply git init ., commit it, add our newly created GitLab server as origin and happily push it in there:
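The steps above would look roughly like this (the repository path is an assumption based on the project name):

```shell
git init .
git add .
git commit -m "Initial commit"

# root/console-app is a guess – use whatever you named your project
git remote add origin http://192.168.33.10/root/console-app.git
git push -u origin master
```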

Configure build steps

For build steps we’ll have something simple, like compiling the project in Debug and Release configurations. As usual, we’ll put the build step definitions into a .gitlab-ci.yml file.
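A minimal .gitlab-ci.yml matching that description could look like this; the stage and job names are my own:

```yaml
# a sketch of .gitlab-ci.yml – two jobs, one per configuration
stages:
  - build

build-debug:
  stage: build
  script:
    - dotnet build -c Debug

build-release:
  stage: build
  script:
    - dotnet build -c Release
```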

If we did everything right and pushed .gitlab-ci.yml to origin, the project’s “Pipelines” page will show a pending build, which will remain pending until we add some runners to build it.

pending build

Configuring Docker runner

The simplest way to get an autoscaling “docker+machine” runner is to start with a “docker” runner instead. If we find the right Docker image to build our project and confirm that it works locally, there’s no reason why it won’t work remotely on dynamic VMs.

This image from Microsoft should be capable of building our console app – microsoft/dotnet:2.0-sdk. I copy-pasted the script that installed the “shell” runner before, changed a few lines and voilà, this is the thing that will build the project in Docker containers:
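The registration script probably looked something like this; the URL, description and token placeholder are assumptions (the token comes from Settings -> CI/CD -> Runner settings):

```shell
# register a runner with the "docker" executor
sudo gitlab-runner register \
  --non-interactive \
  --url "http://192.168.33.10/" \
  --registration-token "REGISTRATION_TOKEN" \
  --executor "docker" \
  --docker-image "microsoft/dotnet:2.0-sdk" \
  --description "docker-runner"
```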

And as soon as the script finishes, we can go back to the page with the pending build, click on it and see this beauty in action:

running build

I even launched the watch -n 1 sudo docker ps command to see if a new container was really created, and yes, that’s for real.

Container with the build

It takes some time for the build to finish (after all, the microsoft/dotnet:2.0-sdk image is 1.6 GB in size), but subsequent builds are much faster.

Successful build

So we confirmed that the “docker” executor works. Let’s disable that thing for now (Settings -> CI/CD -> Runner settings) and create a truly scalable runner.

Configuring docker-machine runner

The previous “docker” runner could’ve been installed in the same VM as GitLab, or even on the host machine – GitLab’s IP is public anyway. However, the “docker-machine” runner will create new VMs, which, when happening inside of an existing VM, might lead to The Matrix. For the sake of simplicity and saving humanity, let’s create this new runner on the host machine, which in my case is a Mac.

Installing gitlab-runner on a Mac is tricky. Maybe I didn’t do it right, but commands that worked very similarly on Linux and even on Windows don’t perform that well here. For example, it never put the runner’s configuration file into the correct directory, so I had to copy it over, and it never worked for me as a service, so I run it in user mode instead (gitlab-runner run).

As I promised, the “docker-machine” runner configuration is almost identical to the simple “docker” one:
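In config.toml terms, the difference would boil down to the executor name and a [runners.machine] section; the names, token placeholder and paths below are my assumptions:

```toml
# the relevant fragment of ~/.gitlab-runner/config.toml (a sketch)
[[runners]]
  name = "docker-machine-runner"
  url = "http://192.168.33.10/"
  token = "RUNNER_TOKEN"
  executor = "docker+machine"     # was "docker" before
  [runners.docker]
    image = "microsoft/dotnet:2.0-sdk"
  [runners.machine]
    MachineDriver = "virtualbox"  # any docker-machine driver works here
    MachineName = "runner-%s"     # %s is replaced with a unique ID
```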

Isn’t that cool? It’s just two more settings, and none of them says “recompile everything” or “create your own cloud”.

And now, the moment of truth. Hit the “Retry” button next to one of the already finished builds in GitLab and see this magic happening in the VirtualBox Manager window:

Runner VM

It created a new virtual machine specifically for this build! As soon as the build finishes, that machine will be gone as well. What’s interesting is that the build output looks exactly as if it was produced by a regular “docker” runner.

build docker-machine

For this runner we used the VirtualBox provider, but docker-machine supports many others: AWS, Google Compute Engine, Azure – you name it. And it doesn’t have to be just one machine at a time. We can create hundreds of them in parallel, keep some VMs ready in advance, reuse already created VMs – it’s insanely flexible.
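Those behaviors map to a handful of settings in the same config.toml; the numbers here are purely illustrative:

```toml
# scaling knobs for the docker+machine runner (illustrative values)
[[runners]]
  limit = 10              # never run more than 10 VMs for this runner

  [runners.machine]
    IdleCount = 2         # keep 2 warm VMs waiting for builds
    IdleTime = 1800       # remove an idle VM after 30 minutes
    MachineDriver = "virtualbox"
    MachineName = "runner-%s"
```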

Conclusion

I’ve been looking at this autoscaling feature since the first day I started to use GitLab CI, but somehow its documentation made it look extremely complicated, so I never tried it. Maybe they rewrote the docs since then, but recently it all started to make sense, and the feature itself is not hard to enable after all. As you saw, it was just a few more parameters to the register command.

To be honest, I think it’s going to be a little bit harder in production. For example, it’s not uncommon for my CI to run 80 or so concurrent builds. When all of them get their own VMs, computing power will stop being a bottleneck for sure, but the network and GitLab itself might become one. Pulling Docker images can be mitigated by a local Docker registry, build caches can go to S3 or Google Cloud Storage, but the repository itself and build artifacts will still have to travel between GitLab and the VMs. And I can tell you, it’s uncomfortable to imagine 80 build VMs pulling a 5 GB repository from a tiny GitLab server simultaneously. Especially when it’s sitting in another network.
