I was looking for something new to play with the other day and somehow ended up with a thing called a service mesh. It’s a pretty interesting concept, I can tell you. Not game-changing or world-peace-bringing, but still a nice intellectual concept with several scenarios where it can make life much simpler. Let’s have a look.
So, what is a service mesh?
An average microservice application will use some sort of service discovery to find its services and a network to communicate with them. While the service discovery part is more or less actively maintained (e.g. via Consul), the network just magically works. The same application in Docker or Kubernetes becomes even more magical, as even service discovery and load balancing get handled for us. In essence, the crucial components of many a self-respecting distributed application aren’t actively controlled.
But what if I want to reroute the traffic from service A to service B, which is like A, but newer? How do I monitor network latency? View request success rates? Find services abusing the network? What if I need to retry a failed request? How many times should I retry? All of these are real-life problems that either end up hardcoded into every service or are simply ignored.
In contrast, a service mesh extracts service-to-service communication into a separate component (or layer), so the previously implicit infrastructure becomes a manageable and measurable entity. Because of that, a service mesh can decide which particular service the traffic should go to, record success/error rates, perform circuit breaking, load balancing, request retries, and eviction of failing or slow services from the load-balancing pool, and all of that is configured from one place and even has its own UI.
What’s behind the magic
At first I thought the authors of the idea had somehow reinvented the network, but apparently service meshes like Linkerd or Conduit work very similarly to regular proxies. Instead of making a direct call, a service makes a request to a proxy, which decides where that call should go (by examining the HTTP “Host” header, for instance). And in order to start using a service mesh I don’t even need to recompile the app and replace its hardcoded URLs. The proxy URL can go into the http_proxy environment variable, which works almost everywhere, so the application can remain as is.
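For instance, assuming the proxy listens on 127.0.0.1:4140 and the app normally calls a service named billing (both made up for illustration), the only thing that changes is how the app is launched:

# Sketch: the app itself stays unchanged, only its environment does
# ('./my-app' and the 'billing' service are made-up examples)
http_proxy=http://127.0.0.1:4140 ./my-app
# inside my-app, a plain call to http://billing/ now goes through the proxy,
# which picks the real destination based on the Host header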
It might still sound a little abstract, so let’s have a closer look at a few real service meshes – Linkerd and Conduit.
Running a service mesh on a bare OS
Let’s start with Linkerd, which is a service mesh that can run in Docker, k8s, and on a bare OS. Looking into the bare OS example will actually shed some light on why the whole idea works. As of today, Linkerd 1.4.1 needs Java 8, which I don’t want to install, so the easiest way for me to get started is to launch an openjdk Docker image and download Linkerd right there.
docker run -ti --rm openjdk bash
cd ~
# Download and extract Linkerd
wget https://github.com/linkerd/linkerd/releases/download/1.4.1/linkerd-1.4.1.tgz && \
  tar -xzf linkerd-1.4.1.tgz && \
  rm linkerd-1.4.1.tgz && \
  cd linkerd-1.4.1
OK, so I won’t go into the details of Linkerd configuration, mainly because I don’t know them. However, I do know that Linkerd uses file-based service discovery, so if I wanted to register a service, e.g. search, I would simply create a search file in the disco folder with the service’s IP address and port number:
echo "172.217.1.14 80" > disco/search |
172.217... is Google, btw. Now, in order to send a request to the search service, I’ll simply send the request directly to Linkerd, which is now listening on port 4140 (by default), and indicate in the HTTP Host header which service I’m looking for:
# Start Linkerd
./linkerd-1.4.1-exec config/linkerd.yaml
# ....

# Send HTTP request to 'search' service
curl -H "Host: search" 127.0.0.1:4140
# ...
# <title>Error 404 (Not Found)!!1</title>
# ...
# <a href=//www.google.com/><span id=logo aria-label=Google></span></a>
# ...
And voila! Google responded with a 404, most likely because it doesn’t know who the search service is. But Linkerd did successfully proxy the request to another service, while the caller thought it was talking to the real thing. Moreover, because Linkerd understands HTTP(S) and its status codes, it most likely logged that request as unsuccessful because of the 404.
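If I wanted to double-check that, Linkerd’s admin endpoint should have the counters. Assuming the default config from the tarball, where the admin interface listens on port 9990 and exposes metrics as JSON, a quick sketch would be:

# Sketch: peek at Linkerd's metrics through its admin port (9990 by default)
# and look for anything mentioning the 'search' service
curl -s 127.0.0.1:9990/admin/metrics.json | tr ',' '\n' | grep -i search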
Running Linkerd in Kubernetes
Checking out examples in Kubernetes usually looks more impressive, as due to the nature of containers we can bring in huge blocks of ready-to-use functionality with one command. I’ll use minikube for running a local k8s cluster. There was an older post describing how to get it up and running, so let’s assume that’s done and move straight to the interesting part.
minikube start
# Starting local Kubernetes v1.10.0 cluster...
Linkerd on k8s works as a DaemonSet, meaning it runs as exactly one pod per host. There’s a ready-to-use configuration file, which we can feed directly to kubectl and get it all installed:
kubectl apply -f https://raw.githubusercontent.com/linkerd/linkerd-examples/master/k8s-daemonset/k8s/linkerd.yml
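Just to confirm the DaemonSet actually came up (I’m assuming its pods carry the app: l5d label, matching the l5d service used below):

# Sketch: verify the Linkerd DaemonSet and its pod are running
kubectl get daemonset l5d
kubectl get pods -l app=l5d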
Viewing admin pages and statistics
It might take a little while for Linkerd to become ready, but once it is, we can go straight to its admin page:
echo "Admin page address: http://$(minikube ip):$(kubectl get svc l5d -o 'jsonpath={.spec.ports[2].nodePort}')" # Admin page address: http://192.168.99.100:32681 |
We can also get a Grafana-based visualizer and enjoy its glorious emptiness, as there are no client services talking to each other yet and therefore no statistics to observe.
# Install visualization app
kubectl apply -f https://raw.githubusercontent.com/linkerd/linkerd-viz/master/k8s/linkerd-viz.yml
# Get its address
echo "Grafana page address: http://$(minikube ip):$(kubectl get svc linkerd-viz -o 'jsonpath={.spec.ports[0].nodePort}')"
# Grafana page address: http://192.168.99.100:30063
And after a while the dashboard comes up.
Adding some demo services
However, with some client services these pages will make more sense.
Distributed apps have their own equivalent of “hello world”, and one is even adapted for k8s. It’s a two-pod application – pod hello and pod world. Whenever a user sends a request to the hello service, that service makes a sub-call to world, and together they respond with “hello world”.
In order to make them talk through Linkerd we need to do one simple thing: set an http_proxy environment variable pointing to the proxy service, and the job is done. hello-world.yml, hosted among the other Linkerd examples, does exactly that: in addition to the two application pods it sets the http_proxy env variable, which simply points to the current host (via host name), where Linkerd listens for incoming connections:
curl -s https://raw.githubusercontent.com/linkerd/linkerd-examples/master/k8s-daemonset/k8s/hello-world.yml | vim -
# env:
#   ...
#   - name: http_proxy
#     value: $(NODE_NAME):4140
#   ...
But here’s the problem. For Kubernetes running on minikube, the host name will resolve to minikube, which makes zero sense from inside the cluster itself. It would make more sense to use the node IP address instead, and it’s actually fairly easy to do. We just need to replace hello-world.yml’s spec.nodeName with status.hostIP, and the problem is solved. Those values come from the k8s Downward API, which is beyond the scope of this post.
curl -s https://raw.githubusercontent.com/linkerd/linkerd-examples/master/k8s-daemonset/k8s/hello-world.yml | sed s/spec.nodeName/status.hostIP/ | kubectl apply -f -
# replicationcontroller "hello" created
# service "hello" unchanged
# replicationcontroller "world-v1" created
# service "world-v1" created
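As a sanity check (a sketch – the exact pod name will differ on every run), we can confirm that http_proxy inside one of the hello pods now points to the node IP:

# Sketch: look up one of the 'hello' pods and print its proxy variable
kubectl exec $(kubectl get pods -o name | grep hello | head -1 | cut -d/ -f2) -- env | grep http_proxy
# e.g. http_proxy=192.168.99.100:4140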
Now, let’s try to ‘run’ hello-world:
http_proxy=$(minikube ip):4140 curl -s http://hello
# Hello (172.17.0.6) world (172.17.0.10)!!
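A single request barely registers on the dashboards, so it helps to generate a bit of load first, e.g. with a crude loop like this:

# Generate some traffic so the Grafana graphs have something to show
for i in $(seq 1 100); do http_proxy=$(minikube ip):4140 curl -s http://hello > /dev/null; done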
And see how it messes with the statistics.
Conduit’s approach to service meshes
There’s another service mesh that works with and is designed specifically for Kubernetes – Conduit. It’s in its early versions, so I’d probably avoid using it in production, but the interesting part is that it takes a slightly different approach to deploying a service mesh. Instead of adding a service proxy to every host, Conduit adds a sidecar container to every pod. In a way it’s not injecting a service mesh into a host, but rather attaching individual pods to a service mesh.
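From what I’ve seen in Conduit’s getting-started guide, the workflow looks roughly like this (a sketch – I haven’t run it yet):

# Install the Conduit control plane into the cluster
conduit install | kubectl apply -f -
# Inject the sidecar proxy into an existing manifest and deploy it
conduit inject hello-world.yml | kubectl apply -f -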
Other than deployment, Conduit also has a web UI dashboard and a CLI for checking current stats and viewing HTTP logs in real time. As this post is already turning out to be quite long, I won’t go into the details right now, but I probably will in two weeks – in the next post. After all, it’s interesting to see how exactly their service mesh works.
Conclusion
So that’s service meshes. My world didn’t shatter after I got to know them, as most of the mesh functionality was already available in one form or another: nginx as a reverse proxy (plus its Lua extension for dynamic routing), Consul for service discovery and health checks, etc. But I do like the idea that something as invisible and implicit as the network can and should be made more manageable and turned into a component. What’s more, I can actually see a real-life task at my job where a service mesh could help. We need a dynamic proxy, and nginx+Lua, HAProxy, or something built in-house were the choices so far. A service mesh brings another one.