Playing with a service mesh

I was looking for something new to play with the other day and somehow ended up with a thing called a service mesh. Pretty interesting concept, I can tell you. Not a game-changing or world-peace-bringing one, but still a nice intellectual concept with several scenarios where it can make life much simpler. Let’s have a look.

So, what is a service mesh?

An average microservice application will use some sort of service discovery to find its services and a network to communicate with them. While the service discovery part is more or less actively maintained (e.g. via Consul), the network just magically works. The same application in Docker or Kubernetes becomes even more magical, as even service discovery and load balancing get handled for us. In essence, the crucial components of many a self-respecting distributed application aren’t actively controlled.

Imaginary distributed app where components talk directly to each other

But what if I want to reroute the traffic from service A to service B, which is like A, but newer? How do I monitor network latency? View request success rates? Find services abusing the network? What if I need to retry a failed request? How many times should I do that? All of these are real-life problems, which either end up hardcoded into every service or are simply ignored.

In contrast, a service mesh extracts service-to-service communication into a separate component (or layer), so the previously implicit infrastructure becomes a manageable and measurable entity. Because of that, a service mesh can decide which particular service the traffic should go to, record success/error rates, perform circuit breaking, load balancing and request retries, and evict failing or slow services from the load-balancing pool – and all of that is configured from one place and even has its own UI.

What’s behind the magic

At first I thought the authors of the idea had somehow reinvented the network, but apparently service meshes like Linkerd or Conduit work very much like regular proxies. Instead of making a direct call, a service will make a request to a proxy, which will decide where that call should go (by examining the HTTP “Host” header, for instance). And in order to start using a service mesh I don’t even need to recompile the app and replace its hardcoded URLs. The proxy URL can go into the http_proxy environment variable, which works almost everywhere, so the application can remain as is. It might still sound a little abstract, so let’s take a closer look at a few real service meshes – Linkerd and Conduit.

Running a service mesh on a bare OS

Let’s start with Linkerd, a service mesh that can run in Docker, k8s, or on a bare OS. Looking into the bare-OS example will actually shed some light on why the whole idea works. As of today, Linkerd 1.4.1 needs Java 8, which I don’t want to install, so the easiest way for me to get started is to launch an openjdk Docker image and download Linkerd right there.
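Something along these lines did the trick; the release URL, archive layout and launcher name are from memory, so double-check them against the Linkerd 1.x docs:

```bash
# Start a throwaway container with Java 8 and a shell,
# forwarding Linkerd's default proxy (4140) and admin (9990) ports
docker run -it --rm -p 4140:4140 -p 9990:9990 openjdk:8 bash

# Inside the container: download and unpack the Linkerd release
curl -sLO https://github.com/linkerd/linkerd/releases/download/1.4.1/linkerd-1.4.1.tgz
tar -xzf linkerd-1.4.1.tgz
cd linkerd-1.4.1

# Start Linkerd with the bundled example config
./linkerd-1.4.1-exec config/linkerd.yaml
```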

OK, so I won’t go into the details of Linkerd configuration, mainly because I don’t know them. However, I do know that Linkerd uses file-based service discovery, so if I wanted to register a service, e.g. search, I would simply create a search file in the disco folder with the service IP address and port number:
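A single line with an address and a port is all it takes (the IP below is just a placeholder for one of Google’s addresses):

```bash
# Register a "search" service by dropping a file into the disco folder;
# each line is "<ip> <port>"
echo "172.217.21.238 80" > disco/search
```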

172.217... is Google, btw. Now, in order to send a request to the search service, I’ll simply send the request directly to Linkerd, which listens on port 4140 by default, and indicate in the HTTP Host header what service I’m looking for:
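Something like this, assuming Linkerd is running locally:

```bash
# Talk to Linkerd, not Google, and let the Host header pick the target service
curl -H "Host: search" http://localhost:4140/
```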

And voilà! Google responded with a 404, most likely because it doesn’t know what the search service is. But Linkerd did successfully proxy the request to another service, while the caller thought it was talking to the real thing. Moreover, because Linkerd understands HTTP(S) and its status codes, it most likely logged that request as unsuccessful because of the 404.

Imaginary distributed app where components communicate via proxy

Running Linkerd in Kubernetes

Checking out examples in Kubernetes usually looks more impressive, as, due to the nature of containers, we can bring in huge blocks of ready-to-use functionality with one command. I’ll use minikube to run a local k8s cluster. There was an older post describing how to get it up and running, so let’s assume that’s done and move straight to the interesting part.

Linkerd on k8s works as a DaemonSet, meaning it runs as exactly one pod per host. There’s a ready-to-use configuration file, which we can feed directly to kubectl to get it all installed:
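If memory serves, the file lives in the linkerd-examples repo, so installation is a one-liner (verify the path before relying on it):

```bash
# Install Linkerd as a DaemonSet straight from the examples repo
kubectl apply -f https://raw.githubusercontent.com/linkerd/linkerd-examples/master/k8s-daemonset/k8s/linkerd.yml
```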

Imaginary distributed app with a service mesh node per host

Viewing admin pages and statistics

It might take a little while for Linkerd to become ready, but once it’s done, we can go straight to its admin page:
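In my case the quickest way in was a port-forward to one of the Linkerd pods; the app=l5d label and the 9990 admin port come from the example config, so adjust if yours differ:

```bash
# Forward the admin port from one of the Linkerd pods to localhost
kubectl port-forward $(kubectl get pod -l app=l5d -o jsonpath='{.items[0].metadata.name}') 9990:9990 &

# Then open http://localhost:9990 in a browser
```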

[Screenshot: Linkerd admin page]

We can also get a Grafana-based visualizer and enjoy its glorious emptiness, as there are no client services talking to each other and therefore no statistics to observe.
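The visualizer also comes as a single manifest, this time from the linkerd-viz repo; the exact path may have moved since, so check the repo first:

```bash
# Install the linkerd-viz dashboard (Grafana + Prometheus) into the cluster
kubectl apply -f https://raw.githubusercontent.com/linkerd/linkerd-viz/master/k8s/linkerd-viz.yml
```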

And after a while:

[Screenshot: linkerd-viz Grafana dashboard]

Adding some demo services

However, with some client services these pages will make more sense.

Distributed apps have their own equivalent of “hello world”, and one is even adapted for k8s. It’s a two-pod application – a hello pod and a world pod. Whenever a user sends a request to the hello service, that service makes a sub-call to world, and together they respond with “hello world”.

In order to make them talk through Linkerd we need to do one simple thing: set an http_proxy environment variable pointing to the proxy service, and the job is done. hello-world.yml, hosted among the other Linkerd examples, does exactly that: in addition to the two application pods, it sets the http_proxy env variable, which simply points to the current host (via host name), where Linkerd listens for incoming connections:
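The relevant piece of hello-world.yml looks roughly like this (reproduced from memory, so treat it as a sketch):

```yaml
env:
# Downward API: expose the name of the node the pod landed on...
- name: NODE_NAME
  valueFrom:
    fieldRef:
      fieldPath: spec.nodeName
# ...and use it as the proxy address, since Linkerd runs on every host
- name: http_proxy
  value: $(NODE_NAME):4140
```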

But here’s the problem: for Kubernetes running on minikube, the host name will resolve to minikube, which makes zero sense from inside the cluster itself. It would make more sense to use the node IP address instead, and that’s actually fairly easy to do. We just need to replace hello-world.yml‘s spec.nodeName with status.hostIP, and the problem is solved. Those values come from the k8s Downward API, which is beyond the scope of this post.
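In other words, the fieldRef from the snippet above turns into something like this:

```yaml
# Take the node's IP address instead of its name from the Downward API
valueFrom:
  fieldRef:
    fieldPath: status.hostIP
```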

Now, let’s try to ‘run’ hello-world:
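With the patch applied on the fly, something like this gets it running and sends a test request through the mesh (paths are from memory; if port 4140 isn’t reachable on the node directly, grab the NodePort of the l5d service instead):

```bash
# Apply the example, swapping the node name for the node IP as described above
curl -s https://raw.githubusercontent.com/linkerd/linkerd-examples/master/k8s-daemonset/k8s/hello-world.yml \
  | sed 's/spec.nodeName/status.hostIP/' \
  | kubectl apply -f -

# Send a request to the "hello" service through Linkerd
http_proxy=$(minikube ip):4140 curl -s http://hello
```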

And see how that messes with the statistics:

[Screenshot: Linkerd admin page showing hello-world traffic]

Conduit’s approach to service meshes

There’s another service mesh that works with and is designed specifically for Kubernetes – Conduit. It’s still in its early versions, so I’d probably avoid using it in production, but the interesting part is that it takes a slightly different approach to deploying a service mesh. Instead of adding a service proxy to every host, Conduit adds a sidecar container to every pod. In a way, it’s not injecting a service mesh into a host, but rather attaching individual pods to a service mesh.
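For the curious, the Conduit CLI handled that injection roughly like this at the time (versions change quickly, so the exact commands may differ):

```bash
# Install the Conduit control plane into its own namespace
conduit install | kubectl apply -f -

# Add the sidecar proxy to an existing deployment manifest and apply it
conduit inject hello-world.yml | kubectl apply -f -

# Open the web dashboard
conduit dashboard
```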

Imaginary distributed app with services plugged into the service mesh

Other than that, Conduit also has a web UI dashboard and even a CLI for checking current stats and viewing HTTP logs in real time. As this post has already turned out to be quite long, I won’t go into the details right now, but I probably will in two weeks – in the next post. After all, it’s interesting to see exactly how their service mesh works.

Conclusion

So that’s service meshes. My world didn’t shatter after I got to know them, as most of the mesh functionality was already available in one form or another: nginx as a reverse proxy (plus the Lua extension for dynamic routing), Consul for service discovery and health checks, etc. But I do like the idea that something as invisible and implicit as the network can and should be made more manageable, turned into a component. What’s more, I can actually see a real-life task at my job where a service mesh could help. We need a dynamic proxy, and nginx+Lua, HAProxy, or something built in-house were the choices so far. A service mesh brings another one.
