Kubernetes is fast becoming an industry standard, with up to 94% of organizations deploying their services and applications on the container orchestration platform, according to a study. One of the key reasons why companies deploy Kubernetes is the standardization, which allows power users to see a performance increase of up to two times.
Standardizing on Kubernetes gives organizations the ability to deploy any workload, anywhere. But something was missing: the technology assumed that workloads were ephemeral, meaning that only stateless workloads could be safely deployed in Kubernetes. However, the community has since shifted the paradigm and introduced features like StatefulSets and Storage Classes that make it possible to run data workloads in Kubernetes.
While running stateful workloads on Kubernetes is possible, it is still challenging. In this article, I describe ways to make it happen and explain why it’s worth it.
Do it progressively
Kubernetes is about to become as popular as Linux and the de facto way to run any application, anywhere, in a distributed fashion. Using Kubernetes involves learning a lot of technical concepts and vocabulary. For example, newcomers may struggle with Kubernetes’ many logical units such as containers, pods, nodes, and clusters.
If you’re not already using Kubernetes in a production environment, don’t jump right into data workloads. Instead, start by moving stateless apps to avoid losing data when things go awry.
If you can’t find an operator that suits your needs, don’t worry: most operators are open source, so you can adapt an existing one.
Understand the limitations and specifics
After familiarizing yourself with the general concepts of Kubernetes, dive into the specifics of state concepts. For example, because applications may have different storage needs, such as performance or capacity requirements, you must provision the right underlying storage system.
What the industry commonly refers to as storage “profiles” are called storage classes in Kubernetes. They provide a way to describe the different types of storage that a Kubernetes cluster has access to. Storage classes can differ in quality-of-service levels, such as I/O operations per second per GiB, in backup policies, or in arbitrary policies such as binding modes and allowed topologies.
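To make this concrete, here is a minimal sketch of a storage class definition, built as a Python dictionary and printed as JSON. The class name, the provisioner `csi.example.com`, the `type` parameter, and the zone names are all hypothetical placeholders; the real values depend on the storage driver installed in your cluster.

```python
import json

# Hypothetical values throughout: the provisioner, the "type" parameter and
# the zone names are placeholders that depend on your storage driver.
fast_storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "fast-ssd"},
    "provisioner": "csi.example.com",
    # Driver-specific settings (for example, performance tiers) go here.
    "parameters": {"type": "ssd"},
    # Keep the underlying volume when the claim is deleted.
    "reclaimPolicy": "Retain",
    # Allow claims of this class to be expanded later.
    "allowVolumeExpansion": True,
    # Binding mode: wait until a pod is scheduled before provisioning.
    "volumeBindingMode": "WaitForFirstConsumer",
    # Allowed topologies: restrict provisioning to specific zones.
    "allowedTopologies": [
        {
            "matchLabelExpressions": [
                {
                    "key": "topology.kubernetes.io/zone",
                    "values": ["zone-a", "zone-b"],
                }
            ]
        }
    ],
}

if __name__ == "__main__":
    print(json.dumps(fast_storage_class, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed output can be saved to a file and applied with `kubectl apply -f`.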
Another critical component to understand is the StatefulSet, the Kubernetes API object used to manage stateful applications. It offers key features such as:
- Robust, unique network identifiers that allow you to keep track of the volume and detach and reattach it as you wish;
- Stable, persistent storage so your data is safe;
- Ordered, graceful deployment and scaling, which many Day 2 operations require.
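The features listed above can be sketched in a few lines of Python. The first function builds a minimal StatefulSet manifest, and the second derives the pod names, DNS names, and volume claim names that Kubernetes assigns to each replica. The image name, mount path, and helper names are hypothetical, and the DNS suffix assumes the default cluster domain `cluster.local`.

```python
def build_statefulset(name, service_name, replicas, storage_class, size):
    """Return a minimal StatefulSet manifest as a dictionary (illustrative only)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # Headless Service that gives each pod its own DNS entry.
            "serviceName": service_name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            # Hypothetical image and mount path.
                            "image": "registry.example.com/db:1.0",
                            "volumeMounts": [
                                {"name": "data", "mountPath": "/var/lib/data"}
                            ],
                        }
                    ]
                },
            },
            # One PersistentVolumeClaim is created per replica from this template.
            "volumeClaimTemplates": [
                {
                    "metadata": {"name": "data"},
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        "storageClassName": storage_class,
                        "resources": {"requests": {"storage": size}},
                    },
                }
            ],
        },
    }


def stable_identities(statefulset, namespace="default"):
    """List the per-replica pod, DNS and claim names, lowest ordinal first."""
    spec = statefulset["spec"]
    name = statefulset["metadata"]["name"]
    claim = spec["volumeClaimTemplates"][0]["metadata"]["name"]
    return [
        {
            "pod": f"{name}-{i}",
            # Assumes the default cluster domain, cluster.local.
            "dns": f"{name}-{i}.{spec['serviceName']}.{namespace}.svc.cluster.local",
            "pvc": f"{claim}-{name}-{i}",
        }
        for i in range(spec["replicas"])
    ]


if __name__ == "__main__":
    sts = build_statefulset("db", "db-headless", 3, "fast-ssd", "10Gi")
    for identity in stable_identities(sts):
        print(identity)
```

Because a pod such as `db-1` keeps both its name and its claim `data-db-1` across restarts and rescheduling, the same volume is reattached to the same replica each time.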
Although StatefulSet is a successful replacement for the infamous PetSet (now deprecated), it is still imperfect and has limitations. For example, the StatefulSet controller has no built-in support for resizing volumes (PersistentVolumeClaims, or PVCs), which is a major challenge if your application’s dataset is about to grow beyond your currently allocated storage capacity. There are workarounds, but such limitations should be understood well in advance so that the engineering team knows how to deal with them.
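As an illustration of what such a workaround can look like, the sketch below generates the kubectl commands for one commonly described approach: patch each PersistentVolumeClaim to the new size, then delete the StatefulSet object while leaving its pods running, so that it can be recreated with a larger claim template. This is an assumption about one possible procedure, not necessarily the one your team should adopt. It only works if the storage class sets `allowVolumeExpansion: true` and the storage driver supports expansion. The function name and arguments are hypothetical.

```python
import json
import shlex


def resize_commands(statefulset, claim_template, replicas, new_size, namespace="default"):
    """Build (but do not run) kubectl commands for a manual volume resize."""
    # Patch body that raises the requested storage on an existing claim.
    patch = json.dumps({"spec": {"resources": {"requests": {"storage": new_size}}}})
    commands = []
    for ordinal in range(replicas):
        # Claims are named <template>-<statefulset>-<ordinal>.
        pvc = f"{claim_template}-{statefulset}-{ordinal}"
        commands.append(
            f"kubectl -n {namespace} patch pvc {pvc} -p {shlex.quote(patch)}"
        )
    # Remove only the StatefulSet object; --cascade=orphan keeps its pods running.
    commands.append(
        f"kubectl -n {namespace} delete statefulset {statefulset} --cascade=orphan"
    )
    return commands


if __name__ == "__main__":
    for command in resize_commands("db", "data", replicas=3, new_size="20Gi"):
        print(command)
```

After these commands, the StatefulSet manifest would be reapplied with the new size in its volume claim template so that future replicas are created with the larger capacity. Review the generated commands and test them on a non-production cluster first.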