“Microservices” and “Microservice Architecture” are hot buzz words in the development community right now, but concrete examples of microservices in production are still scarce. I thought it might help to give a brief overview of how we’ve utilized microservices for our backend API at Karma over the past couple of years. It’s not exactly a “how-to,” more like a “why-to,” or a “wherefore-to,” but hopefully it will give you some insight as to whether microservices are appropriate for your application, and how you can go about using them.

Why we chose microservices

When we started building Karma, we decided to split the project into two main parts: the backend API, and the frontend application. The backend is responsible for handling orders from the store, usage accounting, user management, device management and so forth, while the frontend offers a dashboard for users which accesses this API. Along the way we learned that a monolithic backend API doesn't work well for us, because everything ends up entangled with everything else.

For example, we have users, devices, and a store. As you can imagine, a user buys a device from the store. It sounds simple enough, but when it's all one application it's easy for user-related code to end up in the store and device APIs, and pretty soon the store API is going behind the back of the device API and changing stuff (like allocating devices to users, something we at Karma are fond of doing). It becomes hard to track what does what, what touches what, and what changes what.

We could have separated the monolith into libraries, and then combined them as one API, but we saw three main problems with that approach:

  1. Scaling. When you want to scale, you have to scale the entire API at once. In the case of Karma, for instance, we need the device and user APIs to scale much faster than the store API.
  2. Versioning. With the library approach, a single dependency can hold the entire application back; the upgrade from Rails 3 to Rails 4, for instance, is a difficult one. With separate services, our code is spread across multiple projects, so we don't have to update everything at once. We can leave older APIs running, and upgrade them when we have meaningful changes to make.
  3. Multiple languages and frameworks. Right now we're mostly a Ruby shop, but we want to be able to experiment with new technologies and languages when they come along. We're currently playing around with Go and Clojure, and because all our services expose REST APIs, communication isn’t a problem — it’s all just HTTP in the end.
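To give a feel for how simple that communication is, here's a minimal sketch of one service asking another for data over plain HTTP and JSON. The service name, URL, and fields are made up for illustration; they aren't our actual endpoints.

```ruby
require "net/http"
require "json"
require "uri"

# Hypothetical example: the store service asking the device service whether
# a device exists and what state it's in, over plain HTTP and JSON.
uri = URI("http://devices.internal.example/devices/42")
response = Net::HTTP.get_response(uri)

if response.is_a?(Net::HTTPSuccess)
  device = JSON.parse(response.body)
  puts "Device status: #{device["status"]}"
else
  puts "Device service returned #{response.code}"
end
```

Because the interface is just HTTP and JSON, it doesn't matter whether the service on the other end is written in Ruby, Go, or Clojure.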

The biggest boost from microservices is programmer productivity: we don't have to keep the whole thing in our heads! It’s all about getting rid of distractions and focusing on what is happening in front of us now instead of worrying about breaking stuff somewhere else.

How we got started

We backed our way into microservices. We started out with one big app in the backend, and we split off pieces when it made sense. This has worked out great for us, because we've been able to learn along the way.

By just going ahead and building the app, we became familiar with the problem we were trying to solve, and the more familiar we were with the problem, the more obvious it was where we needed boundaries between aspects of the app. Every time we encountered something that clearly looked like it should be a separate piece, we turned it into a service.

At first, these pieces were relatively large, but as with other stories of microservice adoption, we've discovered those pieces can get smaller and smaller.

For instance, we started out with a "store" in the larger app, which did everything related to the store. Then we split off handling and shipping. Then we discovered shipping could be separate. Then we found that tracking a shipment is a different role than shipping it out in the first place. The store is now composed of three APIs: the first API processes orders, the second sends orders to the fulfillment center, the third tracks packages that are sent out by FedEx. Our next step might be to split up the order processing a lot more. We're always learning the best way to compose these things, and microservices gives us that flexibility.

Ultimately, a microservice works best when it has one, and only one, responsibility; we even wrap most of our third party dependencies to make sure we don't have to think about them in other parts of the app. It takes at most a week or two to build or rebuild a microservice, and a change to one part of the system doesn't necessitate a rewrite of the others. We have one service we wrote two years ago called "Collector" which we haven't touched since, other than occasional dependency updates.
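To illustrate what wrapping a third party dependency looks like, here's a rough sketch; the carrier SDK, class, and method names are hypothetical, but the idea is that the rest of the app only ever sees our own small interface.

```ruby
# Hypothetical sketch: instead of calling a third party shipping SDK all over
# the codebase, the rest of the app only ever talks to this small wrapper.
# If the carrier or its SDK changes, only this file has to change.
class ShipmentTracker
  def initialize(client: SomeCarrierSdk::Client.new)
    @client = client
  end

  # Returns a plain hash the rest of our code understands, instead of
  # leaking the SDK's own response objects into other services.
  def status(tracking_number)
    response = @client.track(tracking_number)
    {
      tracking_number: tracking_number,
      status: response.status,
      estimated_delivery: response.estimated_delivery
    }
  end
end
```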

What our architecture looks like now

There are two ways our microservices communicate with each other: HTTP requests, and a message queue.

We started out just using HTTP and Sinatra in the backend. The services passed messages to one another through URL requests. This works great for things that need to happen right now, but it gets more and more complicated as the number of services talking to each other grows, because every service has to know about every other service it depends on.
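An early version of that might have looked something like the endpoint below: one Sinatra service receiving an order directly over HTTP from another. The route and payload here are illustrative, not our real code.

```ruby
require "sinatra"
require "json"

# Illustrative Sinatra endpoint: the shipping service receiving an order
# directly over HTTP from the store service. Route and fields are made up.
post "/shipments" do
  order = JSON.parse(request.body.read)

  # ... create a shipment for order["id"] here ...

  status 201
  { shipment_for: order["id"] }.to_json
end
```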

For instance, an order comes in and then it needs to be shipped. That's straightforward enough, but what if we want to do more after the order is received? The store might need to talk to the invoice or metrics or mailer API. That packs a lot of knowledge about the entire ecosystem into the store application, and becomes difficult to work with.

So we started splitting off parts of the task into an event-based system. We use Amazon SNS (Simple Notification Service) for publishing events, and Amazon SQS (Simple Queue Service) to store the events. SNS takes a message passed to it by a service and publishes it to the appropriate queues via SQS. Microservices then can take jobs off a queue, process them, and delete them if successful. If a process fails, the message goes back to the queue so another instance of the process can take a crack at it.
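Here's a rough sketch of that flow using the AWS SDK for Ruby: one service publishes an event to an SNS topic, and a worker in another service polls its SQS queue, does the work, and deletes the message only if it succeeded. The ARN, queue URL, and payload are placeholders, and the wiring between the topic and the queue is left out.

```ruby
require "aws-sdk-sns"
require "aws-sdk-sqs"
require "json"

# Publishing side: the store announces that something happened.
# The topic ARN is a placeholder, not one of our real resources.
sns = Aws::SNS::Client.new(region: "us-east-1")
sns.publish(
  topic_arn: "arn:aws:sns:us-east-1:123456789012:order-placed",
  message: { order_id: 42, quantity: 2 }.to_json
)

# Consuming side: a worker polls its queue, does the work, and deletes the
# message only on success. If processing raises, the message isn't deleted
# and becomes visible again so another worker can take a crack at it.
sqs = Aws::SQS::Client.new(region: "us-east-1")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/shipping-order-placed"

resp = sqs.receive_message(queue_url: queue_url, max_number_of_messages: 1, wait_time_seconds: 20)
resp.messages.each do |msg|
  event = JSON.parse(msg.body) # with raw message delivery; otherwise the body is an SNS envelope
  # ... create the shipment for event["order_id"] ...
  sqs.delete_message(queue_url: queue_url, receipt_handle: msg.receipt_handle)
end
```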

When a new microservice is deployed, it includes a configuration file which describes what types of messages it wants to listen to, and what types of messages it wants to publish. We have an in-house tool called Fare which reads this configuration and sets up the appropriate SNS topics and SQS queues.
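We won't reproduce Fare's actual format here, but conceptually the configuration declares something like the following (purely illustrative, not Fare's real syntax):

```yaml
# Purely illustrative, not Fare's actual syntax.
service: shipping
publishes:
  - shipment_created
  - shipment_shipped
subscribes:
  - order_placed
  - order_cancelled
```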

A simplified view of our platform architecture

Now, when an order comes in, an event is published saying, "An order has been placed, here are the details." The shipment app listens to the messaging system, sees an order take place, looks at the details, and says, "Okay, I need to send two boxes to this person." Any other services interested in an order happening can do whatever they need to with the event in their own queues, and the store API doesn't need to worry about it.
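Inside the shipment app, the handler for that event is conceptually something like the sketch below; the event fields and the Shipment model are hypothetical stand-ins, since the real payload is defined by the store API.

```ruby
# Hypothetical handler inside the shipment app. The event fields and the
# Shipment model are stand-ins; the real payload is defined by the store API.
class OrderPlacedHandler
  def call(event)
    order_id = event.fetch("order_id")
    boxes    = event.fetch("quantity", 1)

    # "Okay, I need to send two boxes to this person."
    boxes.times do
      Shipment.create!(order_id: order_id, address: event.fetch("shipping_address"))
    end
  end
end
```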

We still use HTTP requests when we need an immediate response, like for logins, or our coverage map. It comes down to whether a service is asking or telling. If it's asking, it probably needs an immediate response; if it's telling, it probably doesn't need a response at all: just fire and forget.

Challenges we've faced

The biggest challenge with microservices is testing. With a regular web application, an end-to-end test is easy: just click somewhere on the website and see what changes in the database. But in our case, actions and their eventual results are so far apart that it's difficult to see exact cause and effect. A problem might bubble up from a chain of services, but where in the chain did it go wrong? It's something we still haven't solved.

Instead, we focus on making each component as good as possible, and see what happens when we put them together. We try to make each microservice fulfill a contract. "When I do this, I get this back." We take those contracts, and manually make sure they're fulfilled. The contract is implicit, however, not explicit, so we haven't figured out an automated way to test it.
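To make the idea of a contract concrete, this is roughly the kind of check we run by hand today, written out as code. It's illustrative only; the endpoint and fields are invented, and it isn't part of an automated suite.

```ruby
require "net/http"
require "json"
require "uri"

# Illustrative only: the kind of "when I do this, I get this back" check we
# currently verify by hand. The endpoint and fields are invented.
uri = URI("http://store.internal.example/orders")
response = Net::HTTP.post(uri, { user_id: 7, sku: "karma-go" }.to_json,
                          "Content-Type" => "application/json")

order = JSON.parse(response.body)
raise "contract broken" unless response.code == "201" && order.key?("order_id")
```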

The upshot of this is that we've had to build the entire app with the assumption that everything will fail at some point. The structure means problems are localized, and can't spread. One part might go down, and that influences other parts directly depending on it, but it doesn't block anything else. And thanks to queues, a broken service can pick up where it left off once it's back online.

On to the next thing

So that’s what our architecture looks like… for now. We’re always looking for ways to improve, and as you can see our path to microservices has been an evolutionary one. Not only do we continue to add functionality, but we keep revisiting different parts of the system we think we can make better. We also have a few tools we’ve built, like Fare, which we’d like to open source as soon as they’re fit for public consumption. Let us know if you’re interested!

Microservices aren’t a silver bullet, and they don’t solve everything, but they’re working great for Karma. Maybe they’re the right fit for your next project?
