Containerization has opened a new chapter in the IT industry, revolutionizing virtualization, especially with Kubernetes. The paradigm is constantly shifting from managing traditional data-centric architectures to a modern microservices-based stack that fuels business growth and IT transformation. Though various options emerged over the past decade to move from host-level virtualization to OS-level virtualization, none brought containers into mainstream production-grade environments where customers could run their mission-critical applications with peace of mind. Kubernetes, however, changed the IT industry's perception of containerization, and numerous organizations now run their mission-critical applications on enterprise-grade container orchestration systems.
Kubernetes has brought many benefits, but most deployments have been limited to stateless applications because persistent storage was not standardized in the early days. Early implementations of persistent storage were very difficult for everyone involved in planning, designing, and managing the container landscape. Initially, adding support for a storage system required changes to Kubernetes code itself (read: "in-tree"), which was complex and could affect the reliability and security of the entire system.
This challenge was addressed with the introduction of the Container Storage Interface (CSI). The step proved effective: third-party container storage providers can now write their own out-of-tree plugins. This opened boundless opportunities for storage providers to deliver rich data services as part of their CSI driver development. But CSI drivers were only a step toward solving the real-world challenges:
- Slow, traditional storage provisioning, lacking automated provisioning and failover, negates the deployment speed of containers.
- Key levers driving interest in stateful applications include operational consistency, containerization of long-running and data-rich applications, data-sharing requirements, and delivery of database services.
- Multiple instances of stateful applications in orchestrated containers sharing underlying container storage require careful management and robust processes.
- Evaluating approaches with different deployment and operational characteristics, both on-premises and in the cloud, requires careful consideration.
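To make the out-of-tree CSI model concrete, here is a minimal sketch of dynamic provisioning: a StorageClass names a CSI driver, and a PersistentVolumeClaim against that class asks the driver to create a volume on demand. The driver name `csi.example.com` and its parameters are hypothetical placeholders for a real vendor driver.

```yaml
# A StorageClass binds volume requests to a CSI driver.
# "csi.example.com" and "replication" are placeholders for a vendor driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-replicated
provisioner: csi.example.com        # the vendor's CSI driver
parameters:
  replication: "3"                  # vendor-specific, hypothetical
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
# A PVC referencing the class; the driver provisions the volume on demand.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-replicated
  resources:
    requests:
      storage: 20Gi
```

With `WaitForFirstConsumer`, the volume is not created until a pod using the claim is scheduled, which lets topology-aware drivers place the volume near the workload.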
Based on the above, customer challenges can be broadly divided into:
Rapid application changes - Application architecture is evolving from monolithic, scale-up apps in the 80s and 90s, to virtualized applications in the 2000s, and now to scale-out microservices-based applications, which are typically built on containers.
Container demand scaling across arrays - Containerized environments are highly fluid and rapidly scale to thousands of containers that can push the boundaries of any single storage solution or system.
Lack of simplified automated provisioning - Making the provisioning decision for each storage request in real time requires assessing multiple factors, such as performance load, the capacity and health of arrays, and policy tags.
Transparent recovery - Ensuring the robustness of services, data-access integrity, and enterprise-grade resiliency.
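The real-time provisioning decision described above can be sketched as a simple placement function. This is an illustrative toy, not a real scheduler: the `Array` model and `place` helper are hypothetical, but they show how capacity, health, load, and policy tags could feed a per-request placement choice.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set

@dataclass
class Array:
    """A storage array as a provisioner might see it (illustrative fields)."""
    name: str
    free_gib: int          # remaining capacity in GiB
    load_pct: float        # current performance load, 0-100
    healthy: bool
    tags: FrozenSet[str]   # policy tags, e.g. {"gold", "encrypted"}

def place(request_gib: int, required_tags: Set[str],
          arrays: List[Array]) -> Optional[str]:
    """Pick the least-loaded healthy array that satisfies capacity and policy."""
    candidates = [
        a for a in arrays
        if a.healthy and a.free_gib >= request_gib and required_tags <= a.tags
    ]
    if not candidates:
        return None  # no array can honor this request right now
    return min(candidates, key=lambda a: a.load_pct).name
```

A production provisioner would also weigh failure domains, QoS classes, and pending reservations, but the shape of the decision is the same: filter by hard constraints, then rank the survivors.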
Currently available storage solutions can be broadly classified into five categories:
- Local host storage solution – Storage hosted on the server where the container is running
- Storage appliance – SAN, NAS, or Hyperconverged storage system with container storage driver
- Distributed file system – A shared file system derived from a single storage pool/namespace across the container cluster
- Container-native storage – Software-defined storage providing data management and protection features exclusively for containerized applications
- Cloud block/file storage – Block or file storage consumed as a service from an infrastructure-as-a-service (IaaS) cloud platform
Which of the above container storage options is appropriate depends on the stateful application's design and deployment requirements: whether existing apps are rehosted or new cloud-native apps are developed, how apps are clustered, the use cases to be served, the data management and protection needed, and whether the deployment is on-premises or in the cloud.
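As a toy illustration of that decision, the mapping from deployment traits to the five categories can be written down as a rule chain. The rules and their priority below are simplified assumptions for illustration, not vendor guidance; a real evaluation would weigh many more factors.

```python
def suggest_storage(in_cloud: bool, rehost_existing: bool,
                    shared_data: bool, needs_native_data_services: bool) -> str:
    """Map coarse deployment traits to one of the five storage categories.

    The rule order encodes a simplified, assumed priority.
    """
    if in_cloud:
        return "Cloud block/file storage"
    if needs_native_data_services:
        return "Container-native storage"
    if shared_data:
        return "Distributed file system"
    if rehost_existing:
        return "Storage appliance"
    return "Local host storage"
```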
Furthermore, SQL/NoSQL databases are no longer the only use case developers have for persistent storage. Today, IT expects analytics, high-performance computing, content management, continuous integration/continuous delivery (CI/CD), data processing, and machine learning to be proven solutions ready to be deployed on persistent storage.
If storage providers truly want these challenges to be addressed, they need to:
- Consider a sophisticated approach, an aggressive strategy, best practices, and a clear roadmap while developing their respective CSI drivers
- Demonstrate the right combination of performance and usability and the right characteristics that allow the smooth adoption of Kubernetes
- Fulfil read/write access requirements across multiple pods, along with features such as snapshots, cloning, raw block volumes, topology awareness, volume expansion, and more
- Develop extensive API support for private, hybrid, or all public cloud deployments of Kubernetes.
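Of the features listed above, snapshots and cloning are exposed through the standard Kubernetes snapshot API. As a hedged sketch (the class, claim, and resource names below are hypothetical placeholders), a CSI driver that supports snapshots lets you capture a volume and restore it into a new claim:

```yaml
# Take a point-in-time snapshot of an existing PVC.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: app-data-snap
spec:
  volumeSnapshotClassName: example-snapclass   # hypothetical snapshot class
  source:
    persistentVolumeClaimName: app-data        # hypothetical existing PVC
---
# Restore the snapshot into a new PVC via the dataSource field.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data-restored
spec:
  storageClassName: example-sc                 # hypothetical StorageClass
  dataSource:
    name: app-data-snap
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 20Gi
```

How efficient the restore is (copy versus copy-on-write clone) depends entirely on the vendor's CSI driver, which is exactly why the quality of these drivers matters.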
From my point of view, choosing the right storage solution while architecting containerized applications, especially stateful ones, contributes significantly to a successful design. For example, in some use cases the attach and detach times of an EBS volume would break the system. Likewise, an NFS server can be made more resilient by putting it behind a DNS name, but it remains a single point of failure. It is of paramount importance that storage keeps pace with container requirements, responding quickly and flexibly to containers' requests, to deliver a truly flexible, portable, and scalable solution.
With container workloads becoming mainstream and organizations choosing ever more scalable containerized apps, I think storage's journey has just started, and many innovations are yet to come.