Charity Majors, co-author of Database Reliability Engineering, shared a tweet that captures how to approach the transition from a single application to a complex, distributed system of microservices:
“Embrace the fact that everything is failing all the time - and it’s okay! We build for resiliency, not uptime.” Charity Majors (@mipsytipsy), Twitter, https://twitter.com/mipsytipsy/status/1134499865335963648
You can invest a lot in making any single system component reliable, performant, and scalable. The investment is often worthwhile and necessary, but what matters to users is the system’s uptime as a whole. Reliable components aren’t enough to make the entire system reliable if it’s unable to tolerate and recover from failures. Here are some thoughts about building your system for resiliency and two popular microservices patterns to cope with failure.
Building Microservices for Resiliency
Let’s say you have ten microservices, each with high uptime and low response times. One of those services handles authentication, another handles authorization, and all of the other services depend on these two. Because every request must pass through both, a request’s response time is the sum of the individual response times, and its effective uptime is the product of the individual uptimes. When your microservices call each other synchronously, every link in the chain adds latency, lowers effective uptime, and multiplies the risk of failure.
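To make the arithmetic concrete, here is a small Python calculation. The service names, uptimes, and response times are hypothetical, and multiplying uptimes assumes the services fail independently of one another. Three services at 99.9% uptime each yield a chain that is up only about 99.7% of the time, which is roughly three times the expected downtime of any single service.

```python
from math import prod

# Hypothetical figures for a request that must pass through authentication,
# then authorization, then the service that actually does the work.
chain = [
    # (name, uptime as a fraction, response time in ms)
    ("authentication", 0.999, 20),
    ("authorization", 0.999, 30),
    ("orders-service", 0.999, 50),
]

def chain_latency_ms(services):
    """Synchronous calls made one after another: response times add up."""
    return sum(latency for _, _, latency in services)

def chain_availability(services):
    """Every call must succeed, so uptimes multiply (assumes independent failures)."""
    return prod(uptime for _, uptime, _ in services)

print(chain_latency_ms(chain))              # 100
print(round(chain_availability(chain), 6))  # 0.997003
```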
In a case like this, some calls can be made asynchronously to parallelize work and cut the response time. However, authentication and authorization are security checks, and letting other work proceed before they have passed could open you up to risks you may not want to take.
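A minimal asyncio sketch of that trade-off, assuming hypothetical services whose latency is simulated with sleeps: the authorization check is awaited on its own before anything else runs, and only the two independent reads after it run concurrently with asyncio.gather. Parallelism shortens the response time (the concurrent step costs roughly the slower of the two calls rather than their sum), but it does not improve availability, because every call still has to succeed.

```python
import asyncio

# Hypothetical stand-ins for remote calls; the sleeps simulate network latency.
async def check_authorization(token: str) -> bool:
    await asyncio.sleep(0.03)
    return token == "valid-token"

async def fetch_profile(user_id: str) -> dict:
    await asyncio.sleep(0.05)
    return {"user_id": user_id, "name": "Ada"}

async def fetch_orders(user_id: str) -> list:
    await asyncio.sleep(0.05)
    return [{"order_id": 1}, {"order_id": 2}]

async def handle_request(token: str, user_id: str) -> dict:
    # The security check stays sequential: nothing else starts until it passes.
    if not await check_authorization(token):
        raise PermissionError("not authorized")
    # The two independent reads run concurrently, so this step takes about
    # max(0.05, 0.05) seconds instead of 0.05 + 0.05.
    profile, orders = await asyncio.gather(fetch_profile(user_id), fetch_orders(user_id))
    return {"profile": profile, "orders": orders}

if __name__ == "__main__":
    print(asyncio.run(handle_request("valid-token", "u-42")))
```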
Here are two popular approaches to building a system that allows a user to get the information they need in as few hops as possible and with minimal risk.
The Projection Pattern
Projections let you build the data for a particular page ahead of time, in the background, from as many source systems as you need. Because the page data is already assembled, retrieving it becomes a simple and fast lookup.
Assembling data asynchronously in the background, rather than at the time of a user query, changes the failure mode. If a user makes a query while one or more of the source systems are down, they still get results as of the last time each of those systems was up. What might have been a complete failure in another architecture merely results in stale data. The system can also be surprisingly efficient: data is assembled once per update rather than once per read, and updates are often less frequent than reads.
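The sketch below illustrates a projection in Python under simplifying assumptions: an in-memory dictionary stands in for a real read store, each source system's update events are passed to apply() directly instead of arriving over a message bus, and all class, field, and service names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional

@dataclass
class ProductPage:
    """Hypothetical read model: everything one product page needs, precomputed."""
    product_id: str
    name: Optional[str] = None      # owned by a catalog service
    price: Optional[float] = None   # owned by a pricing service
    stock: Optional[int] = None     # owned by an inventory service
    last_updated: Optional[datetime] = None  # time of the most recent update from any source

class ProductPageProjection:
    UPDATABLE_FIELDS = {"name", "price", "stock"}

    def __init__(self) -> None:
        self._pages: Dict[str, ProductPage] = {}

    def apply(self, product_id: str, **changes) -> None:
        """Background path: merge an update event from any source system."""
        unknown = set(changes) - self.UPDATABLE_FIELDS
        if unknown:
            raise ValueError(f"unknown fields: {sorted(unknown)}")
        page = self._pages.setdefault(product_id, ProductPage(product_id))
        for field_name, value in changes.items():
            setattr(page, field_name, value)
        page.last_updated = datetime.now(timezone.utc)

    def get(self, product_id: str) -> Optional[ProductPage]:
        """Read path: one lookup, with no calls to any source system."""
        return self._pages.get(product_id)

projection = ProductPageProjection()
projection.apply("p-1", name="Widget")  # event from the catalog service
projection.apply("p-1", price=9.99)     # event from the pricing service
projection.apply("p-1", stock=3)        # event from the inventory service

# If the inventory service now goes down, no new stock events arrive,
# but reads still succeed and return the last known (possibly stale) stock.
page = projection.get("p-1")
print(page.name, page.price, page.stock)  # Widget 9.99 3
```

Recording last_updated is one possible way to let a page indicate how fresh its data is; whether to surface that to users is a product decision.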
The UI Composition Pattern
UI composition is another option for building a distributed system of microservices, and perhaps a more familiar one. Implemented carefully, it also helps the system tolerate failures.
Let’s say you have two entities that reference each other. Where the coupling is loose, the services responsible for those two entities can remain largely unaware of each other, each knowing the other only by a simple identifier. The projection pattern shifts the work of composing multiple data sources earlier, so that failures cannot disrupt the response to a user action. UI composition does the opposite: it intentionally shifts that work later in the process. Data from the other service is treated as a progressive enhancement rather than a necessary element.
If a service fails, the customer might not get the ideal experience, but they will still get a large portion of the data they requested. For instance, on a video streaming service, one type of recommendation might occasionally be missing or load later, while the other types remain available.
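A minimal sketch of UI composition for the video example, written in Python with asyncio; in a real front end this composition would often happen in the browser or a page-rendering layer instead. The service functions, section names, and 0.5-second timeout are hypothetical, and the trending service is hard-coded to fail in order to simulate an outage. The video itself is required, while each recommendation row is fetched concurrently, bounded by a timeout, and simply left out if its service fails.

```python
import asyncio
from typing import Optional

# Hypothetical services for a video page. The video details are required;
# each recommendation row is an optional enhancement keyed by the video's identifier.
async def fetch_video(video_id: str) -> dict:
    await asyncio.sleep(0.01)
    return {"id": video_id, "title": "Example Video"}

async def fetch_because_you_watched(video_id: str) -> list:
    await asyncio.sleep(0.01)
    return ["v-2", "v-3"]

async def fetch_trending(video_id: str) -> list:
    raise ConnectionError("trending service unavailable")  # simulated outage

async def optional_section(coro, timeout: float) -> Optional[list]:
    """Return the section's data, or None if its service fails or is too slow."""
    try:
        return await asyncio.wait_for(coro, timeout)
    except Exception:  # includes asyncio.TimeoutError and ConnectionError
        return None

async def compose_video_page(video_id: str) -> dict:
    # The core entity must load; if this call fails, the page fails.
    video = await fetch_video(video_id)
    # Enhancements load concurrently and can fail independently of each other.
    sections = {
        "because_you_watched": fetch_because_you_watched(video_id),
        "trending": fetch_trending(video_id),
    }
    results = await asyncio.gather(
        *(optional_section(coro, timeout=0.5) for coro in sections.values())
    )
    page = {"video": video}
    for name, data in zip(sections, results):
        if data is not None:  # omit any section whose service failed
            page[name] = data
    return page

if __name__ == "__main__":
    print(asyncio.run(compose_video_page("v-1")))
```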
A system that can tolerate and recover from failures is vital to building a successful distributed system of microservices. DragonSpears has architects, engineers and developers with experience building robust distributed systems.