Amazon Prime Video has made a major change, moving one of its services from distributed microservices to a monolith application and cutting infrastructure costs by over 90%.
Distributed microservices are a popular option these days, but they’re not the ideal solution for every situation, as Prime Video’s Marcin Kolny points out:
We designed our initial solution as a distributed system using serverless components (for example, AWS Step Functions or AWS Lambda), which was a good choice for building the service quickly. In theory, this would allow us to scale each service component independently. However, the way we used some components caused us to hit a hard scaling limit at around 5% of the expected load. Also, the overall cost of all the building blocks was too high to accept the solution at a large scale.
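The Prime Video post doesn’t include code, but a minimal, hypothetical sketch of that initial design might look something like the following: each stage runs as its own Lambda function, Step Functions drives the workflow, and an S3 bucket holds the intermediate frames. The bucket layout, function names, and the `decode_frames`/`looks_corrupted` helpers are stand-ins for illustration, not the team’s actual implementation.

```python
# Hypothetical sketch of the original distributed design: each stage runs as its
# own AWS Lambda function, AWS Step Functions drives the workflow, and an S3
# bucket holds intermediate video frames. All names and logic are illustrative.
import boto3

s3 = boto3.client("s3")


def decode_frames(segment: bytes):
    """Hypothetical stand-in for a real video decoder: yields fixed-size chunks."""
    frame_size = 1024
    for i in range(0, len(segment), frame_size):
        yield segment[i:i + frame_size]


def looks_corrupted(frame: bytes) -> bool:
    """Hypothetical stand-in for a real audio/video defect detector."""
    return len(frame) == 0


def split_frames_handler(event, context):
    """First Lambda: read a stream segment from S3 and write each frame back to S3."""
    bucket, segment_key = event["bucket"], event["segment_key"]
    segment = s3.get_object(Bucket=bucket, Key=segment_key)["Body"].read()
    frame_keys = []
    for i, frame in enumerate(decode_frames(segment)):
        key = f"frames/{segment_key}/{i}.bin"
        s3.put_object(Bucket=bucket, Key=key, Body=frame)
        frame_keys.append(key)
    # Step Functions passes this output as the input of the next state.
    return {"bucket": bucket, "frame_keys": frame_keys}


def detect_defects_handler(event, context):
    """Second Lambda: fetch every frame back from S3 and run the detector on it."""
    bucket = event["bucket"]
    results = []
    for key in event["frame_keys"]:
        frame = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        results.append({"frame": key, "defect": looks_corrupted(frame)})
    return {"results": results}
```

In a design like this, every frame crossing the pipeline means S3 round trips and Step Functions state transitions, which illustrates how per-frame costs and orchestration limits can add up at scale.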
Ultimately, after looking at different options, the team decided to rearchitect its infrastructure:
We realized that distributed approach wasn’t bringing a lot of benefits in our specific use case, so we packed all of the components into a single process. This eliminated the need for the S3 bucket as the intermediate storage for video frames because our data transfer now happened in the memory. We also implemented orchestration that controls components within a single instance.
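Again purely as a hypothetical sketch (reusing the same stand-in helpers, not the team’s actual code), the monolithic version of the same pipeline could collapse to something like this, with frames handed from decoder to detector in memory and a small in-process orchestrator taking the place of Step Functions:

```python
# Hypothetical sketch of the monolithic approach: the same stages now run inside
# a single process, and frames are passed in memory instead of through S3.
from typing import Dict, Iterable, List


def decode_frames(segment: bytes) -> Iterable[bytes]:
    """Hypothetical stand-in for a real video decoder: yields fixed-size chunks."""
    frame_size = 1024
    for i in range(0, len(segment), frame_size):
        yield segment[i:i + frame_size]


def looks_corrupted(frame: bytes) -> bool:
    """Hypothetical stand-in for a real audio/video defect detector."""
    return len(frame) == 0


def analyze_segment(segment: bytes) -> List[Dict]:
    """In-process orchestrator: runs every stage for a segment without leaving memory."""
    results = []
    for i, frame in enumerate(decode_frames(segment)):
        # The frame goes straight from the decoder to the detector; no S3 bucket,
        # no Step Functions state transition, no extra Lambda invocation.
        results.append({"frame": i, "defect": looks_corrupted(frame)})
    return results


if __name__ == "__main__":
    # Many such workers can run side by side on one EC2/ECS instance, and the
    # fleet scales horizontally by adding instances.
    print(analyze_segment(bytes(4096)))
```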
As a result of the change, the team saw significant benefits:
Moving our service to a monolith reduced our infrastructure cost by over 90%. It also increased our scaling capabilities. Today, we’re able to handle thousands of streams and we still have capacity to scale the service even further. Moving the solution to Amazon EC2 and Amazon ECS also allowed us to use the Amazon EC2 compute saving plans that will help drive costs down even further.
It’s interesting to see an Amazon division speak so openly about microservices not delivering the needed performance, especially given how heavily AWS promotes serverless solutions.
As Adrian Cockcroft points out, however, Amazon Prime Video’s example is not a general indictment of serverless architecture, but rather proof that different needs call for different solutions:
I don’t advocate “Serverless Only”, and I recommended that if you need sustained high traffic, low latency and higher efficiency, then you should re-implement your rapid prototype as a continuously running autoscaled container, as part of a larger serverless event driven architecture, which is what they did.
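To make that concrete, here is a minimal, hypothetical sketch of a continuously running worker, packaged as a container (for example on ECS with autoscaling) and fed by events from an SQS queue; the queue URL and processing logic are placeholders, not anything Cockcroft or Prime Video has published.

```python
# Hypothetical sketch of a continuously running, autoscaled container worker
# that still participates in a serverless, event-driven architecture by
# consuming work items from an SQS queue. Queue URL and logic are placeholders.
import boto3

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/stream-monitoring-jobs"


def process(message_body: str) -> None:
    """Hypothetical stand-in for the sustained, latency-sensitive work."""
    print(f"processing: {message_body}")


def main() -> None:
    sqs = boto3.client("sqs")
    while True:  # a long-running loop instead of per-invocation Lambda functions
        response = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=20,  # long polling keeps the idle container cheap
        )
        for message in response.get("Messages", []):
            process(message["Body"])
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])


if __name__ == "__main__":
    main()
```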