Microservice challenges

Everybody talks about how good microservices are, but the challenges they bring are a much less popular subject. Like any other tool, the pattern is good at solving one kind of problem, bad at others, and comes at a price.

In case you’ve forgotten what microservices are: it’s an application-building pattern that treats an application as a set of small independent services, which communicate with each other via some lightweight protocol.


What you need to remember before joining the microservice camp

It’s difficult to choose service boundaries.

Choosing where the responsibility of one module ends and another begins has never been easy. But in the case of independent services, the cost of a mistake is much, much higher. Suppose you realize that three individual services should really be merged into two. Now you need to change the services, their public APIs, and every service that used to communicate with those three. What’s more, all of them must be updated simultaneously.

Microservice architecture requires more resources.

Moving to a distributed app means that modules previously communicating within the same process now have to talk over the network. This is slow. Putting services into containers, while not significantly affecting performance, still adds overhead. Since each service should have its own data storage, data duplication becomes an issue. For instance, it’s not practical to let the reporting service request its data from, let’s say, the payments service. Ideally it should have a local copy of the payments data. This is how you get both storage overhead and the headache of keeping data replicas consistent.
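A toy sketch of how such a local copy might be kept via events, and how it silently drifts out of sync when an event is lost (the event shape and all names are invented for illustration):

```python
# The payments service owns its data and publishes events;
# the reporting service maintains its own local replica.
payments_db = {}        # owned by the payments service
reporting_replica = {}  # local copy owned by the reporting service

def record_payment(payment_id, amount, publish):
    """Store a payment, then announce it to subscribers."""
    payments_db[payment_id] = amount
    publish({"type": "payment.recorded", "id": payment_id, "amount": amount})

def on_event(event):
    """The reporting service updates its replica from events."""
    if event["type"] == "payment.recorded":
        reporting_replica[event["id"]] = event["amount"]

record_payment("p-1", 30, publish=on_event)
record_payment("p-2", 50, publish=on_event)
# If an event is lost in transit, the replica silently drifts:
record_payment("p-3", 10, publish=lambda event: None)

in_sync = payments_db == reporting_replica  # False: p-3 never arrived
```

Keeping the replica consistent then requires extra machinery (acknowledgements, retries, periodic reconciliation), which is exactly the overhead the paragraph above describes.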

A distributed app is harder to launch.

It’s not a single EXE anymore. Starting a distributed app means launching many services, each with its own settings, requirements, and inter-dependencies. Most likely, a separate tool will be built and/or involved just to make starting the application possible.
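At its core, such a launcher tool has to start services in dependency order. A minimal sketch using Python’s standard `graphlib`, with a hypothetical dependency map (the service names are illustrative):

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists the services
# that must already be up before it can start.
DEPENDENCIES = {
    "accounts": set(),
    "warehouse": set(),
    "payments": {"accounts"},
    "orders": {"payments", "warehouse"},
}

def startup_order(deps):
    """Return a launch order that respects inter-service dependencies."""
    return list(TopologicalSorter(deps).static_order())

order = startup_order(DEPENDENCIES)
# "accounts" always comes before "payments", which comes before "orders".
```

In practice this is what tools like docker-compose or an orchestrator do for you, plus health checks, restarts, and configuration, but the core problem is exactly this ordering.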

It’s harder to tell whether a distributed app is running.

It’s OK for some services of an app to be temporarily down; the app should still work. But how many of them can be down before we say something’s not right? And how do we detect that a service is down in the first place? If we’re talking about something small, like a service that sends out “your order has been shipped” emails, it might be down for weeks and nobody will notice.

To get these questions answered, services should be equipped with monitoring and logging tools. Something that was merely nice-to-have in a monolithic app suddenly becomes crucial.
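Even a trivial health aggregator makes the “is the app up?” question answerable, assuming each service exposes some health probe. A sketch (the service names and “critical” flags are invented for illustration):

```python
# Which services exist and whether the app is unusable without them.
SERVICES = {
    "payments": {"critical": True},
    "orders": {"critical": True},
    "shipping-emails": {"critical": False},
}

def app_status(probe):
    """probe(name) -> True if the service answered its health check."""
    down = [name for name in SERVICES if not probe(name)]
    critical_down = [name for name in down if SERVICES[name]["critical"]]
    if critical_down:
        return "unhealthy", down
    if down:
        return "degraded", down
    return "healthy", down

# Simulate the email service being down: the app is merely "degraded",
# which is exactly the state nobody notices for weeks.
status, down = app_status(lambda name: name != "shipping-emails")
```

Real monitoring stacks add alerting thresholds and history on top, but the basic judgement ("down, degraded, or healthy?") looks like this.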

Distributed transactions are ‘fun’.

Certain things that used to just work now become a quest. Here’s an example. When a customer clicks “Place an order”, three things happen: the order amount is subtracted from the user’s balance, a new record is added to the “orders” table, and the number of goods available in the “warehouse” table is reduced. In a monolith, this is usually done in a single database transaction.
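A minimal sketch of such a local transaction, using an in-memory SQLite database with a hypothetical schema (table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER);
    CREATE TABLE warehouse (goods_id INTEGER PRIMARY KEY, quantity INTEGER);
    INSERT INTO users VALUES (1, 100);
    INSERT INTO warehouse VALUES (42, 10);
""")

# All three updates commit together, or roll back together if any fails.
with conn:
    conn.execute("UPDATE users SET balance = balance - 30 WHERE id = 1")
    conn.execute("INSERT INTO orders (user_id, amount) VALUES (1, 30)")
    conn.execute("UPDATE warehouse SET quantity = quantity - 1 WHERE goods_id = 42")

balance = conn.execute("SELECT balance FROM users WHERE id = 1").fetchone()[0]
quantity = conn.execute("SELECT quantity FROM warehouse WHERE goods_id = 42").fetchone()[0]
```

The database guarantees atomicity here for free; that guarantee is precisely what disappears once the three tables live in three services.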

When the accounts manager, the orders manager, and the warehouse are three separate services, which is often a logical split, performing a distributed transaction becomes The Task. For instance, we send three HTTP requests to the corresponding services so each does its part. Two return “Success”, but the third one, the warehouse, doesn’t respond at all. We obviously should roll back the changes in accounts and orders via more HTTP calls (hopefully they will succeed this time), but how exactly did the warehouse service fail? Was it before it performed the update, or after, while it was responding with success? Did it even fail at all? Out of nowhere, our data is potentially inconsistent, with no simple way to fix it.
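A common answer is the compensation (saga) approach: every step gets an undo action, and a failure triggers the undo of every completed step. A sketch with the service calls stubbed as plain functions (all names are illustrative); note that it still cannot tell whether a timed-out call actually succeeded on the other side:

```python
def charge_account():  print("account charged")
def refund_account():  print("account refunded")
def create_order():    print("order created")
def cancel_order():    print("order cancelled")
def reserve_goods():   raise TimeoutError("warehouse did not respond")

def run_saga(steps):
    """Run (action, compensate) pairs; undo completed steps on failure."""
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
        return "committed"
    except Exception:
        # Undo in reverse order. If the failed call actually succeeded
        # server-side (e.g. the response was lost), state still diverges:
        # compensation mitigates the problem but does not eliminate it.
        for compensate in reversed(done):
            compensate()
        return "rolled back"

result = run_saga([
    (charge_account, refund_account),
    (create_order, cancel_order),
    (reserve_goods, lambda: None),
])
```

Real implementations add persistence and retries so the compensations themselves survive crashes, which is why the paragraph above calls this “The Task”.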

Epic fail happens.

It’s usually referred to as a cascading failure. Let’s assume some services talk to each other via a message queue, and a malformed message arrives. The first service receives it and dies. Because the service never reported back, the MQ assumes the message wasn’t processed and delivers it to the next guy in line. And the next one. When there are no subscribers left, the MQ dies of message overflow and takes the publishers down with it, as they now have no one to talk to.

That’s a hypothetical scenario and reality will differ, but here’s the interesting part: how would you track down the first event that caused the chain of devastation? Without logging and monitoring tools, that’s hardly doable.
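Real message brokers mitigate this class of failure with delivery limits and dead-letter queues. A toy in-process sketch of the idea (the queue and handler are stand-ins, not a real MQ client):

```python
MAX_RETRIES = 3

def consume(queue, handler):
    """Deliver each message at most MAX_RETRIES times, then dead-letter it."""
    dead_letters = []
    attempts = {}
    while queue:
        msg = queue.pop(0)
        try:
            handler(msg)
        except Exception:
            attempts[msg] = attempts.get(msg, 0) + 1
            if attempts[msg] < MAX_RETRIES:
                queue.append(msg)         # redeliver later
            else:
                dead_letters.append(msg)  # park it; consumers stay alive
    return dead_letters

def handler(msg):
    # A stand-in consumer that chokes on one poison message.
    if msg == "malformed":
        raise ValueError("cannot parse message")

dead = consume(["ok-1", "malformed", "ok-2"], handler)
```

The poison message ends up parked for a human to inspect instead of circulating until it kills every subscriber, and the dead-letter queue itself becomes a useful monitoring signal.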

Microservices require learning new tools.

Containers, virtualization, monitoring, logging, service orchestration, network and firewall management, distributed application patterns: none of it will learn itself.

New security concerns.

Monolithic application security, while not trivial, still has a limited number of attack vectors and more or less established ways of fighting them. What about dozens of small services with their own public APIs? Here’s the simplest security concern: who’s allowed to call the methods of, say, the payments processor service?

We can choose to keep its methods unprotected and put the service into a network where only trusted services reside. But is that safe? An attacker might get through the firewall, or gain access to one of the neighboring services, and get a free pass to the payments processor.

Alternatively, if we choose to pass some sort of token to the payments processor methods as a means of authentication, how do we validate the token and check which actions it actually authorizes? The payments processor is a microservice, so it doesn’t know much about users and roles. We can bring along a copy of the users table with their permissions, but that means overhead and possible inconsistency. Relying on another service to validate permissions introduces tighter coupling and a new point of failure. It’s not easy to make the right choice, even when there is one.
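One middle ground is a signed token that the payments processor can validate locally using a key shared with the auth service, so no extra call and no local users table is needed. A minimal HMAC sketch (the key, claim layout, and permission names are invented for illustration; real systems typically use a standard format such as JWT):

```python
import base64
import hashlib
import hmac
import json

# Shared between the auth service (signer) and the payments
# processor (verifier). Illustrative only; never hardcode real keys.
SHARED_KEY = b"demo-key-not-for-production"

def sign(claims: dict) -> str:
    """Auth service: serialize claims and attach an HMAC signature."""
    body = base64.urlsafe_b64encode(json.dumps(claims).encode())
    sig = hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig

def authorizes(token: str, action: str) -> bool:
    """Payments processor: verify the signature, then check permissions."""
    body, _, sig = token.partition(".")
    expected = hmac.new(SHARED_KEY, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False  # tampered or foreign token
    claims = json.loads(base64.urlsafe_b64decode(body))
    return action in claims.get("permissions", [])

token = sign({"sub": "user-1", "permissions": ["payments:charge"]})
ok = authorizes(token, "payments:charge")
bad = authorizes(token, "payments:refund")
```

The trade-off remains: tokens must expire and the key must be rotated, so some coupling to the auth service never fully disappears; it’s just moved out of the request path.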


Doing microservices is hard, but that doesn’t mean they should be avoided. If you are Netflix, or your application must be highly resilient and scalable, you might have no choice but to adopt this pattern. On the other hand, if you’re building yet another home page for a cat video fan club, you’ll be fine with the traditional monolithic approach. It’s a tool for certain jobs, and it doesn’t come for free.
