Kubernetes has become one of the most preferred container orchestration tools. Central Platform Teams across organizations should start providing Kubernetes as a Service. A plain K8s cluster might fulfil the basic needs of internal teams, but how can a platform team make the offering enterprise grade?
Kubernetes enables organizations to be cost effective while offering a versatile container orchestration solution.
This blog discusses a few primary aspects and supporting open-source tools that a platform team can consider, while keeping costs in check.
Single vs Multiple Tenancy Cluster:
Do we have a best practice on how to structure the infra? Well, we do not have a definite solution, as it depends on multiple factors such as the flexibility of teams, scale of adoption, central infra vs. distributed infra teams etc.
Teams can use the following strategies:
- Single cluster, divide it using namespaces.
- Separate clusters that are independent with their own resources.
- Hybrid approach.
As a platform team, you could start with a single cluster with a bare minimum of three nodes and allocate namespaces (logical partitions within a cluster) to each team. Each team can have three different namespaces for dev, staging and pre-prod. For production, however, it is always recommended to have separate clusters.
But with multiple teams using the same cluster (in different namespaces), how are resources such as CPU and memory distributed fairly and used judiciously? K8s offers two settings:
- Requests – The amount of CPU and memory a container is guaranteed to get, based on a defined value.
- Limits – These make sure a container does not go above a defined value of CPU/memory usage.
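To make the two settings concrete, here is a minimal Python sketch that builds the container entry of a Pod spec with both requests and limits set. The container name, image path and values are hypothetical and chosen only for illustration; the `resources.requests` and `resources.limits` fields are the standard Kubernetes ones.

```python
import json


def container_spec(name, image, cpu_request, mem_request, cpu_limit, mem_limit):
    """Build one container entry for a Pod spec with requests and limits."""
    return {
        "name": name,
        "image": image,
        "resources": {
            # Guaranteed share the scheduler reserves for the container.
            "requests": {"cpu": cpu_request, "memory": mem_request},
            # Upper bound the container is not allowed to exceed.
            "limits": {"cpu": cpu_limit, "memory": mem_limit},
        },
    }


# Hypothetical container: name, image and values are assumptions.
spec = container_spec(
    "web", "registry.example.com/team-a/web:1.0",
    cpu_request="250m", mem_request="256Mi",
    cpu_limit="500m", mem_limit="512Mi",
)
print(json.dumps(spec, indent=2))
```

Here "250m" means a quarter of a CPU core and "256Mi" means 256 mebibytes.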
Based on the infrastructure configuration, the platform team should create maximum permissible quotas for each team intended to use the cluster.
For simplicity, let us say you have a cluster with one node of the following configuration:
- 4 CPU cores
- 16 GB memory
To begin with, you could divide the resources equally between 4 teams, i.e., 1 CPU core and 4 GB memory for each team. As you progress, you could add more nodes as and when you onboard more teams.
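The arithmetic behind this split can be sketched in a few lines of Python. The function name `equal_share` is hypothetical, and the sketch assumes the whole node capacity is available to tenants. In practice a node's allocatable resources are somewhat lower than its raw capacity, because the operating system and Kubernetes components reserve a share.

```python
def equal_share(total_cpu_millicores, total_memory_mib, teams):
    """Split node capacity evenly between teams (integer division)."""
    if teams <= 0:
        raise ValueError("teams must be a positive integer")
    return total_cpu_millicores // teams, total_memory_mib // teams


# One node with 4 CPU cores (4000 millicores) and 16 GB (16384 MiB) of memory.
cpu_per_team, mem_per_team = equal_share(4000, 16 * 1024, teams=4)
print(f"{cpu_per_team}m CPU and {mem_per_team}Mi memory per team")
```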
For this purpose, K8s offers the ResourceQuota object. Using a ResourceQuota helps you lock down a namespace within specified requests and limits.
Let us look at a sample quota:
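The following Python sketch generates a ResourceQuota manifest for one team's namespace, matching the 1 CPU core and 4 GB per team split described earlier. The namespace name `team-a-dev` and the helper `resource_quota` are assumptions made for illustration. The output is JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import json


def resource_quota(namespace, cpu, memory):
    """Build a ResourceQuota manifest capping total requests and limits."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{namespace}-quota", "namespace": namespace},
        "spec": {
            "hard": {
                # Sum of requests across all pods in the namespace.
                "requests.cpu": cpu,
                "requests.memory": memory,
                # Sum of limits across all pods in the namespace.
                "limits.cpu": cpu,
                "limits.memory": memory,
            }
        },
    }


# Hypothetical namespace for one of the four teams.
quota = resource_quota("team-a-dev", cpu="1", memory="4Gi")
print(json.dumps(quota, indent=2))
```

Saving the printed output to a file and applying it creates the quota in that namespace. Setting requests and limits to the same totals is only one possible choice; a platform team may prefer higher limit totals to allow bursting.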
Authentication and Authorization
To access a cluster, all one needs is a valid kubeconfig file. That is the only way for your developers to interact with the cluster, but if this file is compromised, your whole cluster is exposed. Enabling Role Based Access Control (RBAC) on a cluster is a must.
Authentication can be managed by integrating clusters with an organization’s IAM system, such as LDAP/AD. This centralizes access management and also helps manage the user life cycle.
Cluster roles, along with their bindings, enable authorization. A cluster role and its binding give a user permissions across all namespaces or a few namespaces in the cluster, as well as access to manage cluster-scoped resources.
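As an illustration, the Python sketch below builds a read-only ClusterRole and a RoleBinding that grants it to one user within a single namespace. The role name `pod-reader`, the user `jane@example.com` and the namespace `team-a-dev` are hypothetical. Binding the same ClusterRole with a ClusterRoleBinding instead would grant the permissions across all namespaces.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"


def cluster_role(name, resources, verbs):
    """Build a ClusterRole allowing the given verbs on core-API resources."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": [{"apiGroups": [""], "resources": resources, "verbs": verbs}],
    }


def role_binding(name, namespace, role_name, user):
    """Bind a ClusterRole to one user, scoped to a single namespace."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": name, "namespace": namespace},
        "subjects": [{"kind": "User", "name": user, "apiGroup": RBAC_GROUP}],
        "roleRef": {"kind": "ClusterRole", "name": role_name, "apiGroup": RBAC_GROUP},
    }


# Hypothetical names, chosen for illustration only.
role = cluster_role("pod-reader", ["pods"], ["get", "list", "watch"])
binding = role_binding("read-pods", "team-a-dev", "pod-reader", "jane@example.com")
print(json.dumps([role, binding], indent=2))
```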
To adjust to changing application demands, clusters often need a way to automatically scale up or down.
There are two types of scaling:
- Cluster auto scaling – Based on the overall resource usage, the number of nodes can be automatically scaled up or down. To enable this feature, the underlying infrastructure should support infra autoscaling.
- Horizontal pod auto scaling – This component uses the cluster’s metrics server to monitor the resource demand of pods. If an application hits peak traffic, the number of pods is automatically scaled to meet the demand.
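The scaling decision made by horizontal pod auto scaling can be sketched in Python. The core rule documented for the Horizontal Pod Autoscaler is desired replicas = ceil(current replicas × current metric / target metric). The sketch below is a simplification: it adds clamping to a minimum and maximum replica count, and it leaves out details of the real controller such as the tolerance band and stabilization windows. The function and parameter names are hypothetical.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas):
    """Simplified HPA rule: scale in proportion to metric / target, then clamp."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# 4 pods averaging 90% CPU against a 60% target: scale up.
print(desired_replicas(4, 90, 60, min_replicas=2, max_replicas=10))
# 4 pods averaging 30% CPU against a 60% target: scale down.
print(desired_replicas(4, 30, 60, min_replicas=2, max_replicas=10))
```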
Container Image Registry and Security
Container registries and K8s go hand-in-hand. Every organization should have its own private registry as an alternative to Docker Hub and provide a mechanism for its tenants to scan Docker images before pushing them to the registry.
Container security is tricky. The ability to audit images and track Common Vulnerabilities and Exposures (CVEs) against the benchmarks and databases maintained by CIS, the National Vulnerability Database, and other bodies is a must.
We recommend Clair for container security. Clair performs static security analysis of container images against a CVE database. Clair has the following features:
- It is open source and built by CoreOS. CoreOS has created various reliable K8s operators.
- Quay.io, a public container registry alternative to Docker Hub, uses Clair.
- It’s API-driven.
So as a platform team, if you host Clair on any of your servers, all that your tenants need is a REST API against which they can scan their pre-built containers.
[Figure: snapshot and sample output of a scan using Clair]
Distributed Log and Error Management
We need a mechanism to aggregate the logs produced by the containers and a way to present them to your tenants, so that they can perform log analytics and search the logs for debugging purposes.
Our suggestion is to use Open Distro, built on Elasticsearch. Licensed under Apache 2.0, Open Distro brings in an array of features similar to those of a licensed Elastic Stack. The tool provides features like enterprise-grade security, alerting, data querying with SQL, k-NN, Performance Analyzer and Index Management.
In Kubernetes, there are multiple release and deployment strategies that a team can choose to deploy their applications and corresponding services. The release strategies can depend on multiple factors. Some of the common deployment strategies are as follows:
- Rolling updates: This is the default strategy supported by Kubernetes, where the pods of an existing service are upgraded one after another.
- Recreate: As part of this deployment strategy, all the pods associated with the existing release are terminated, and the latest version is then deployed in their place.
- Blue-Green deployment: In this deployment strategy, the set of services being released is tagged as blue or green. For example, the set of services that will retire is tagged blue, and the new services that need to be deployed are rolled out alongside and tagged green. Traffic is switched from blue to green when traffic is low and the green segment is completely healthy.
- Canary deployments: As part of this strategy, the new services are rolled out to only a small subset of users. That subset tests the services, and the new services are rolled out completely only after the users are satisfied with the functionality.
- A/B testing: It is like canary deployment, but has more focus on user engagement and satisfaction to decide on which version to release by using predefined metrics and stats.
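The first two strategies in the list above map directly onto the `strategy` field of a Kubernetes Deployment. The Python sketch below builds that field for either case. The helper name and the chosen values (`maxSurge` of 1, `maxUnavailable` of 0) are assumptions for illustration, not recommendations. Blue-green, canary and A/B releases are not built-in Deployment strategies and typically need extra tooling or traffic routing.

```python
import json


def deployment_strategy(kind, max_surge=1, max_unavailable=0):
    """Build the `strategy` field of a Deployment spec."""
    if kind == "Recreate":
        # All old pods are terminated before new ones are created.
        return {"type": "Recreate"}
    if kind == "RollingUpdate":
        if max_surge == 0 and max_unavailable == 0:
            # Kubernetes rejects this combination: no pod could ever be replaced.
            raise ValueError("maxSurge and maxUnavailable cannot both be 0")
        return {
            "type": "RollingUpdate",
            "rollingUpdate": {
                "maxSurge": max_surge,              # extra pods allowed during rollout
                "maxUnavailable": max_unavailable,  # pods allowed to be down
            },
        }
    raise ValueError(f"unsupported strategy: {kind}")


rolling = deployment_strategy("RollingUpdate", max_surge=1, max_unavailable=0)
recreate = deployment_strategy("Recreate")
print(json.dumps({"rolling": rolling, "recreate": recreate}, indent=2))
```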
Some of the open source tools that are widely used to manage deployments in Kubernetes are Spinnaker and CaaS platforms such as Rancher. Optionally, custom solutions that integrate with the service mesh can be used to enable a smoother release.
Monitoring and Alerting
Monitoring the health of your Kubernetes cluster and the applications deployed on it is essential. The Kubernetes architecture is complex, as it has many components. Prompt indication of anomalies in any of these components is always helpful.
Some of the preferred tools used for monitoring a Kubernetes cluster are:
- Prometheus + Grafana
- InfluxDB + Grafana
These tools consist of a time series database and a dashboard that provides interactive visualizations and alerting, thereby promoting observability.
The idea is to keep your platform costs in check while using this container orchestration solution; hence, we specifically suggested open source tools that we have seen and tried. There are many CaaS/PaaS solutions, like OpenShift and VMware Tanzu, which provide many of these aspects out of the box, but they come with a cost and vendor lock-in.