Top 2 Mistakes Going Serverless

Number 1: Serverless Is Easy, Stateless Is Not

The Gist

Neither serverless nor serverful services should manage state internally; that responsibility belongs to an external service with its own independent logic.

The Solution

Your stateful services need to live outside of your core application. Things like cron jobs should be triggered by something stable with retry and backoff capabilities, such as AWS CloudWatch Events (now Amazon EventBridge), which then pushes the work to other compute, like an AWS Lambda function or an AWS Fargate task, depending on your needs.
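
As a concrete illustration, here is a minimal AWS CDK sketch of that setup. The stack name, the `jobs/` asset path, and the five-minute schedule are placeholders for illustration, not a prescription:

```ts
import { Duration, Stack, StackProps } from 'aws-cdk-lib';
import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import { Construct } from 'constructs';

export class CronStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // The job handler is plain, stateless compute.
    const jobHandler = new lambda.Function(this, 'JobHandler', {
      runtime: lambda.Runtime.NODEJS_18_X,
      handler: 'index.handler',
      code: lambda.Code.fromAsset('jobs'), // placeholder path
      timeout: Duration.minutes(1),
    });

    // EventBridge owns the schedule; retries live outside your code.
    const schedule = new events.Rule(this, 'EveryFiveMinutes', {
      schedule: events.Schedule.rate(Duration.minutes(5)),
    });
    schedule.addTarget(new targets.LambdaFunction(jobHandler, {
      retryAttempts: 2, // EventBridge retries the invocation, not your server
    }));
  }
}
```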

The Meat

I see a lot of developers of all levels struggling with the implications that stateless backends have on the programming paradigm. A stateless backend forces you to confront poorly implemented infrastructure that a long-lived server would have quietly tolerated. Let's look at the 2 most common scenarios that beg for stateful servers: cron jobs and websockets with sticky sessions.

Cron jobs should never be managed by your server; you need a dedicated service

Think about how you build a cron job runner. There’s one critical piece that makes it work: an interval function that checks for jobs and runs them.

On the surface, the implementation looks simple: put a repeating function on an interval in your server. The problem with that approach, in the "serverful" world, is that it scales terribly, and a failure in the runner can crash the server and bring the entire service down with it. That's only the beginning. Once you scale your service horizontally and place it in a cluster, you have multiple workers running in parallel that need to coordinate who is running what, or every job fires once per worker.
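
To make the failure mode concrete, here is a minimal sketch of that naive in-process runner. `fetchDueJobs` is a hypothetical stand-in for a query against your own job store:

```ts
// Naive in-process cron runner: the anti-pattern described above.
interface Job {
  id: string;
  run: () => Promise<void>;
}

// Hypothetical stand-in: in a real app this queries your job store.
async function fetchDueJobs(now: Date): Promise<Job[]> {
  return [];
}

// Every worker in the cluster runs this same loop, so every worker
// picks up every job unless they coordinate through shared state.
setInterval(async () => {
  const due = await fetchDueJobs(new Date());
  for (const job of due) {
    try {
      await job.run();
    } catch (err) {
      // This failure lives inside your API process; a bad enough
      // crash takes the whole server down with it.
      console.error(`job ${job.id} failed`, err);
    }
  }
}, 60_000); // tick once a minute, forever, as long as the process lives
```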

In the serverless world, the parallelism problem is even worse, because each function executes independently and only for a short time. Building the runner into the code will not only spawn multiple independent workers, it will also kill them prematurely whenever the function's timeout is hit.

This is why you have to extract that logic into a separate service. The biggest difference between doing it in a serverless function versus a server is that in serverless the solution won't scale at all, even from the start, because the very thing it depends on, a long-lived interval loop, cannot exist there in the first place.
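
Once the schedule lives in EventBridge, the compute side collapses into a plain, stateless handler. A minimal sketch, reusing the same hypothetical `fetchDueJobs` job store as above:

```ts
import type { ScheduledEvent, Context } from 'aws-lambda';

interface Job {
  id: string;
  run: () => Promise<void>;
}

// Hypothetical stand-in: in a real app this queries your job store.
async function fetchDueJobs(now: Date): Promise<Job[]> {
  return [];
}

// EventBridge invokes this on a schedule. There is no loop to manage,
// no interval to keep alive, and retry/backoff lives in EventBridge.
export const handler = async (event: ScheduledEvent, _ctx: Context) => {
  const due = await fetchDueJobs(new Date(event.time));
  for (const job of due) {
    await job.run(); // a thrown error surfaces to EventBridge for retry
  }
};
```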

Websockets belong in your router, the real one

A lot of frameworks try to take over websocket control, and they can do so successfully when the real router is capable of establishing sticky sessions, ensuring that websocket messages always reach the same member of a cluster. This is impossible when your service is stateless, because you cannot store any ephemeral connection data on the compute that is managing the connection. Although it may look like it works, it forces functions to stay running until they time out, which eventually surfaces as connection failures on the front end.

The real router is usually some external tool that takes care of your load balancing. These routers have different ways of making websockets possible for both serverless and serverful services without impacting the compute dedicated to your application: in AWS terms, an Application Load Balancer can hold sticky websocket sessions for servers, while API Gateway's WebSocket APIs hold the connections themselves and invoke your functions per message.
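
For the serverless side, here is a minimal sketch of that pattern with API Gateway WebSocket APIs: the gateway owns the socket and only hands your function a connection ID, which you persist externally. The `Connections` DynamoDB table is a hypothetical name for illustration:

```ts
import type { APIGatewayProxyWebsocketEventV2 } from 'aws-lambda';
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import {
  DynamoDBDocumentClient,
  PutCommand,
  DeleteCommand,
} from '@aws-sdk/lib-dynamodb';

const db = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const TABLE = 'Connections'; // hypothetical table name

// API Gateway keeps the socket open; this function only sees events.
export const handler = async (event: APIGatewayProxyWebsocketEventV2) => {
  const { connectionId, routeKey } = event.requestContext;

  if (routeKey === '$connect') {
    // The only "state" is a row in an external store, never the compute.
    await db.send(new PutCommand({ TableName: TABLE, Item: { connectionId } }));
  } else if (routeKey === '$disconnect') {
    await db.send(new DeleteCommand({ TableName: TABLE, Key: { connectionId } }));
  }

  return { statusCode: 200, body: 'ok' };
};
```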

Number 2: Databases that require connection pools

The Gist

Since your functions run with massive parallelism, the database's connection limit can easily fill up, leaving it unreachable after only a handful of queries. Contrary to popular belief, the solution is definitely not scaling up your database!

The Solution

Use the right database for the job. While old-school databases like MongoDB and PostgreSQL are great and comfortable, they will never scale in serverless like FaunaDB can. It really is the only database out there where you don't need to deal with managing clusters in any way.

The Meat

This issue stems from the same stateless programming paradigm we just covered. Older databases like MongoDB and PostgreSQL have their own built-in wire protocols to communicate quickly, and those protocols rely on stateful, in-memory connection pools so that your hardware can push a lot of queries in parallel. As we have learned, this is a huge problem in serverless, because a connection pool cannot persist properly across function calls. What ends up happening instead is that each individual function invocation creates its own single-connection pool, so the more parallelism your functions generate, the faster you hit your database's connection limits, to the point where the database stops responding or becomes extremely slow.
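
Here is what that looks like in practice with the MongoDB Node driver, along with the usual partial mitigation of caching the client in module scope so warm invocations reuse one connection. The URI, database, and collection names are placeholders:

```ts
import { MongoClient } from 'mongodb';

const uri = process.env.MONGODB_URI!; // placeholder

// BAD: a new client (and new pool) per invocation. At N concurrent
// invocations you hold N pools, and the database's connection limit
// fills up with single-use connections.
export const naiveHandler = async () => {
  const client = await new MongoClient(uri).connect();
  try {
    return await client.db('app').collection('items').findOne({});
  } finally {
    await client.close();
  }
};

// MITIGATION: cache the client outside the handler. Warm invocations
// of the same container reuse it, but every *concurrent* container
// still holds its own connection, so this only softens the problem.
let cached: Promise<MongoClient> | undefined;
const getClient = () => (cached ??= new MongoClient(uri).connect());

export const warmHandler = async () => {
  const client = await getClient();
  return client.db('app').collection('items').findOne({});
};
```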

MongoDB in particular has very low limits on its entry-level tiers (somewhere around 100 open connections). Having tested it early on, I found that MongoDB was not made for highly distributed, scalable serverless systems at all, despite this ridiculous one-connection-per-invocation way of trying to make it fit. The reality is that you would need your connection pool shared across the contexts of your firing functions, and that is simply not available in common serverless functions.

This doesn't mean there aren't workarounds. MongoDB specifically has created MongoDB Atlas, with its own stateful-ish serverless environment and an HTTP-based data API. Alternatively, you can put a stateful proxy in front of the database to hold your connection pool and receive messages from your serverless application over HTTP, a pattern AWS makes relatively painless for relational databases with Amazon RDS Proxy (a managed pooling proxy) and the Aurora Data API (queries over HTTP).
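
A minimal sketch of the hand-rolled version of that proxy pattern, assuming Express and the node-postgres pool. The `/query` endpoint and its payload shape are made up for illustration; in production you would expose narrow, validated endpoints rather than accept raw SQL:

```ts
import express from 'express';
import { Pool } from 'pg';

// One long-lived process owns the pool; serverless functions never
// open database connections themselves, they just call this over HTTP.
const pool = new Pool({
  connectionString: process.env.DATABASE_URL, // placeholder
  max: 20, // the only pool in the system, sized once, shared by all callers
});

const app = express();
app.use(express.json());

// Hypothetical endpoint: accepts a parameterized query and proxies it.
app.post('/query', async (req, res) => {
  const { text, values } = req.body;
  try {
    const result = await pool.query(text, values);
    res.json(result.rows);
  } catch (err) {
    res.status(500).json({ error: (err as Error).message });
  }
});

app.listen(4000);
```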

Still, I wouldn't go with either of those old databases; it's all too much of a hassle. The number 1 database for serverless, for me, is FaunaDB: it speaks HTTP natively, it is itself highly distributed, it supports attribute-based access controls (so you might not even need a backend), and its pay-per-use model perfectly aligns with what serverless is all about.
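
To show what "native HTTP" buys you, here is a minimal sketch with the FaunaDB JavaScript driver (the v4-era API): every query travels as a standalone HTTPS request, so there is no pool to exhaust. The secret and the `jobs` collection are placeholders:

```ts
import faunadb, { query as q } from 'faunadb';

// The client is just configuration; it holds no socket pool. Each
// query below is an independent HTTPS request, so massive function
// parallelism never runs into a connection limit.
const client = new faunadb.Client({ secret: process.env.FAUNA_SECRET! });

export const handler = async () => {
  // Placeholder query against a hypothetical 'jobs' collection.
  return client.query(
    q.Map(
      q.Paginate(q.Documents(q.Collection('jobs'))),
      q.Lambda('ref', q.Get(q.Var('ref')))
    )
  );
};
```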