Containerizing Stateful Services on AWS

Does containerizing stateful services make sense?

Docker and containerization have no doubt accelerated the construction of modern cloud-based systems and, more importantly, drastically reshaped DevOps models. Containers and the associated microservices architecture have seen a tremendous rise over the past year. Containerization enables an automated and consistent approach to DevOps, with clear benefits in scalability, portability, upgradability and high availability. That is why containerization is so popular.

Docker guidelines suggest that containers should be ephemeral, but most systems need to handle data and store it persistently. A large landscape of stateful services is widely used by cloud systems, e.g. datastore services, messaging/streaming services, caching services, and your own application services that intentionally manipulate persistent state (data). Containerizing stateless services is straightforward and increasingly mature, with many good practices available. Now it is time to move the focus to the trickier part, the stateful services. This makes sense because operating stateful services is naturally more complicated than operating stateless ones, and containerization leads to a better pattern than a monolithic VM-based stack in terms of deployment, scaling and management. Stateful services therefore stand to gain even more from becoming scalable, portable, highly available, upgradable and high-performing. This is exactly what container-based infrastructure intends to offer.

Understanding the issues on the stateful side

We know that containerizing stateful services has more challenges than stateless services. Why is that?

The dynamically scheduled container instances have to be decoupled from persistent storage so that the container orchestration framework can flexibly reschedule containers across the cluster. This implies that data high availability and reliability have to be implemented outside the container, since the container itself has no responsibility for maintaining persistent storage and data.

For AWS users who use or plan to use ECS (Elastic Container Service) as their container framework, EBS (Elastic Block Store) data volumes are the storage choice in most cases, as they provide sufficient guarantees of high availability and reliability along with sound performance. Beyond that, two major challenges still need to be dealt with.

  • Making AWS EBS volumes aware of the container runtime

Each stateful service member node (running as a container instance) needs to work with specific EBS volume(s) at runtime, so the EC2 host running the container instance must attach the corresponding EBS volumes for persistent data before the instance is launched. When the service member node is relocated to a new EC2 host, the EBS volume(s) need to be detached from the original host and re-attached to the new host.

  • Maintaining cluster membership of elastic stateful container instances

Each stateful service node joins the cluster as a member. Every node is assigned a role and is identified across the network by a unique identifier used for cluster management. When a service node fails over or is rescheduled, the container instance is terminated on its original EC2 host and relaunched on a new EC2 host in the same availability zone, because an EBS volume can only be attached to instances in its own availability zone. The new instance should be able to restore its role and membership and rejoin the cluster. Meanwhile, all other cluster member nodes need to update the status and reachability of the rescheduled node.

However, AWS ECS has no native support for addressing the above issues. Solving these problems in a common yet cost-effective way is where Firecamp, an open source software solution for stateful service containerization, comes in.
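
The two challenges can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not Firecamp's or ECS's implementation: the `Host`, `Volume`, `Member` and `Cluster` names, the IDs and the zone names are all hypothetical, and on real AWS the detach and attach steps would be EC2 API calls rather than attribute updates.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Host:
    """Hypothetical stand-in for an EC2 instance."""
    host_id: str
    zone: str


@dataclass
class Volume:
    """Hypothetical stand-in for an EBS volume; it lives in exactly one zone."""
    volume_id: str
    zone: str
    attached_to: Optional[str] = None  # host_id, or None when detached


@dataclass
class Member:
    """A stateful service node with a stable identity and its own volume."""
    member_id: str  # unique identifier that survives rescheduling
    role: str
    volume: Volume
    host: Optional[Host] = None


class Cluster:
    """Tracks which host currently serves each member (illustrative only)."""

    def __init__(self) -> None:
        self.members: Dict[str, Member] = {}

    def launch(self, member: Member, host: Host) -> None:
        # A volume can only be attached to a host in its own zone.
        if member.volume.zone != host.zone:
            raise ValueError("volume and host must be in the same availability zone")
        if member.volume.attached_to is not None:
            raise RuntimeError("volume is still attached to another host")
        member.volume.attached_to = host.host_id  # attach before the container starts
        member.host = host
        self.members[member.member_id] = member

    def reschedule(self, member_id: str, new_host: Host) -> None:
        member = self.members[member_id]
        # Validate first so a bad target leaves the member untouched.
        if member.volume.zone != new_host.zone:
            raise ValueError("new host must be in the volume's availability zone")
        member.volume.attached_to = None  # detach from the original host
        member.host = None
        self.launch(member, new_host)  # re-attach and rejoin under the same id

    def address_book(self) -> Dict[str, str]:
        # What the other members need in order to reach each node.
        return {m.member_id: m.host.host_id for m in self.members.values() if m.host}
```

In this model a member such as `Member("db-0", "primary", Volume("vol-1", "us-east-1a"))` keeps its identifier, role and volume when `reschedule` moves it to another host in the same zone, and `address_book()` reflects the new location for the rest of the cluster.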

How does Firecamp solve the problem on top of AWS?

Firecamp is open source software that aims to provide an extensible platform and tools to help stateful services move into the containerized world.

While there are already multiple options for stateful service containerization, most notably the popular Kubernetes with StatefulSets, Firecamp aims to be more cost effective while offering stability, performance and ease of use. This means Firecamp leverages the optimized technologies that the cloud infrastructure platform offers natively, and adds minimal extra effort so that stateful service clusters can be managed quickly and easily.

At the same time, Firecamp provides abstractions and extension points so that the solution works with all major container orchestration frameworks, including AWS ECS, Kubernetes, Docker Swarm and Mesosphere. This allows a Firecamp-managed cluster to be migrated easily across different cloud infrastructures.

In an AWS cloud environment, Firecamp leverages the following AWS infrastructure services:

  1. The AWS ECS cluster service as the basic orchestration platform. Firecamp uses ECS clustering to manage and scale stateful Docker instances. The Firecamp runtime comprises one manager Docker instance for the cluster and a small-footprint plugin running on each EC2 node to monitor and operate the stateful service instances.
  2. EBS data volume(s) for persistent storage. Firecamp provisions the EBS volumes and detaches and attaches them to the proper EC2 hosts during container instance initialization and rescheduling.
  3. Multiple Availability Zones and Regions for stateful service high availability and failover support. Firecamp deploys stateful service worker nodes across multiple availability zones when setting up the cluster, and maintains the membership of each node across the cluster.
  4. Proper Security Group settings to isolate the stateful service cluster from other network domains, so that only the applications that need the services can access them. A bastion host is set up in the DMZ with SSH-only access and operates as the Firecamp management client node that talks to Firecamp-managed stateful service clusters.
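
The multi-zone deployment in point 3 can be illustrated with a simple round-robin placement. This is a minimal sketch, assuming a hypothetical `spread_across_zones` helper and a hypothetical `service-index` naming scheme for members; Firecamp's actual placement logic is not described here and may differ.

```python
from typing import Dict, List


def spread_across_zones(service: str, replicas: int, zones: List[str]) -> Dict[str, str]:
    """Assign each member of a service to a zone in round-robin order."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if not zones:
        raise ValueError("at least one availability zone is required")
    placement: Dict[str, str] = {}
    for index in range(replicas):
        # Hypothetical stable member name, e.g. "cassandra-0".
        member_id = f"{service}-{index}"
        placement[member_id] = zones[index % len(zones)]
    return placement
```

With three replicas and three zones, each member lands in a different zone, so losing a single zone still leaves two members running.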

The following picture illustrates the Firecamp architecture within an AWS VPC.

Firecamp Architecture on AWS

Stateful service operations are traditionally complex, and Firecamp tries to make them easy. AWS users can deploy a stateful service cluster in about 25-30 minutes, using the one-click Firecamp cluster setup followed by a few CLI commands to bring the actual stateful service up and running.

Here are the high-level steps that show how easily Firecamp can be used to deploy a stateful service cluster on AWS.

  1. A CloudFormation template is provided for deploying the Firecamp management cluster through AWS Quick Start (https://aws.amazon.com/quickstart/architecture/cloudstax-firecamp/). This provides a one-click setup of a Firecamp-managed cluster in either a brand-new VPC or an existing VPC in your account. The deployment ends with ECS auto-scaling clusters ready and the Firecamp management service running, prepared for further stateful service deployment.
  2. The Firecamp management service exposes a CLI, which provides a set of commands for deploying and managing the actual stateful services. Users can log in to the bastion host over SSH and use the CLI to deploy and manage the stateful service.

Besides the Firecamp platform itself, the project also provides best-practice cluster configurations for each service in the Firecamp catalog, to further simplify the use of popular stateful service software. The following stateful services are already supported for deployment with the Firecamp CLI:

Cassandra, Redis, ZooKeeper, MongoDB, PostgreSQL, Kafka, Elasticsearch/Logstash/Kibana, Kafka Connector for Elasticsearch.

These stateful services will also be packaged one by one for direct deployment through AWS Quick Start, to further lower the adoption barrier for AWS users.

Now, set the “fire camp” up!

Containerization is “boiling” the evolution of cloud systems, and the “fire” is reaching stateful services. Firecamp, the open source solution, is an easy-to-use and cost-saving approach that lets AWS users achieve this while keeping full control. Starting to plan your stateful services containerization? Let’s set this fire camp up!