With the emergence of new technologies and frameworks, there has been a major technology shift compared to the traditional IT environment, leading to significant change in ‘how we manage’, ‘what we manage’, and ‘where we manage’ applications. Nowadays, most organizations are adopting a ‘DevOps’ framework to manage microservices and API-based applications deployed in containerized or hybrid-cloud environments.
Organizations need a solution that ensures applications run consistently when moved from one environment to another: for example, from test to production, from physical servers to virtual machines, or to a private or public cloud. Problems usually arise when the auxiliary software requirements of the application differ between environments.
Another driver in the IT industry is the optimization of resources and cost. In a virtualized environment, applications are installed on top of a virtual machine and its operating system, for which CPU and RAM are allocated, and organizations must size the CPU and RAM for the peak load the application is expected to face. However, the application may utilize the allocated CPU and RAM only for a small span of time, leaving them unutilized for the majority of the time.
Therefore, to remedy the first problem and to optimize the usage of compute resources, organizations are moving to containerize their applications so that the entire runtime environment of an application, along with its dependencies, can be packaged into a container, which abstracts away differences in operating system distribution and the underlying infrastructure.
In a containerized environment, a single server can host more containers than virtual machines, and multiple containers can run multiple applications on the same server or virtual machine. Another advantage is start-up time: virtual machines and physical servers take several minutes to boot their operating system, and even longer to start the application, whereas containers start when they are needed and are removed when they are no longer needed. Moreover, a containerized environment also optimizes the compute resources required for an application to run.
With the advent of containers in the IT environment, it has become essential to monitor them to maintain the performance of the applications they host. However, monitoring only the performance of containerized applications may not be sufficient for efficiently managing a business service. Most IT environments are a mix of traditional components (physical servers, monolithic applications, and the like), newer technologies (private virtualization, N-tier applications, and others), and emergent technologies (containers, hybrid cloud, microservices, and APIs).
From a monitoring perspective, the traditional and current environments, comprising physical servers and virtual servers, are quite different from the new, emergent technologies. The reasons are:
- The number of containers is much higher than the number of physical or virtual servers.
- A container’s life span is shorter than that of a physical or virtual server; containers come and go at a much faster rate.
- There is an overload of metrics to monitor: the number of metrics in a container environment is usually much higher than for physical or virtual servers.
- Different containerized environments use different application technologies; for example, the Red Hat OpenShift architecture differs from that of Docker Enterprise.
- Identifying container utilization and statistics for multitenant applications is more complex than in a physical or virtual server environment.
- Clustered applications might require monitoring of all instances to identify the faulty node.
Nowadays, a customer environment typically comprises all three major types of application deployments. For example:
- Traditional Environment: The customer cannot migrate its legacy applications from physical servers to a private cloud or containerized environment, for example, homegrown and data-intensive applications.
- Current Environment: The customer cannot migrate its application to a containerized environment and instead runs it in its private virtualized cloud environment.
- Emerging Environment: The customer has built a new homegrown application and runs it in a containerized environment.
Therefore, while considering a monitoring solution for its entire IT landscape, a customer should look for a solution that can cater to all types of IT environments running inside the organization, whether traditional, current, or emerging. This saves cost on several fronts, including the trained resources needed to handle the monitoring environment, and avoids deploying multiple monitoring instances to cover the entire IT landscape.
To ensure optimal performance of business services in a full-stack environment, which includes physical servers, virtual machines, containers, and applications hosted in both containerized and traditional environments, we must keep a vigilant eye on the following factors:
- Determine the infrastructure topology for the full-stack environment, which is highly dynamic: automated, real-time discovery of the container environment, along with dependency mapping, must be performed to derive a topology for the microservices in real time. The discovery should cover the container orchestration layer, container hosts, container images, container networks, the container platform, and container links, and map their relationships with each other as well as with the virtualized environments, physical servers, and traditionally hosted applications.
- Identification of elements that impact application performance in both containerized and traditional environments: for example, individual containers, microservices, APIs, and user transactions should be identified across containerized and non-containerized environments so that their performance and health can be maintained.
- Identification of culprit containers, such as a container running for a short life span while over-utilizing compute resources, or memory leaks on hosts or in applications: Monitor the performance and system health of each component, and correlate events generated from different components in the application infrastructure to identify issues quickly.
- Container host performance contributing to the degradation of the business services of containers running on top of it: Monitor the container host infrastructure and the performance of applications running on containers as well as on non-container hosts.
- Overall application performance with drill-down to the root cause: Identify problems related to containers, hosts, network, and storage, and automatically locate all the dependencies throughout the application. The solution should provide full-stack monitoring of web applications, down to the code level and database statements, for the entire platform, and correlate them to find the root cause.
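To make the idea of dependency mapping and root-cause drill-down concrete, here is a minimal sketch. The topology snapshot is hand-written and every element name is hypothetical; in a real deployment the graph would be populated by automated discovery against orchestrator and infrastructure APIs.

```python
from collections import deque

# Hypothetical snapshot of a discovered topology: each element maps
# to the lower-level elements it depends on. In practice this graph
# would be built by automated, real-time discovery, not hard-coded.
TOPOLOGY = {
    "checkout-service": ["checkout-container-1", "checkout-container-2"],
    "checkout-container-1": ["node-a"],
    "checkout-container-2": ["node-b"],
    "node-a": ["storage-1"],
    "node-b": ["storage-1"],
}

def impacted_dependencies(element, topology):
    """Walk the dependency graph breadth-first and return every
    lower-level element that could be contributing to a problem
    observed at `element`."""
    seen, queue = set(), deque([element])
    while queue:
        current = queue.popleft()
        for dep in topology.get(current, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen
```

A degradation alert on `checkout-service` can then be narrowed to its two containers, their hosts, and the shared storage element, which is exactly the drill-down described above.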
Full-stack Container Monitoring
While deciding on a vendor-specific monitoring solution, we must take the following into consideration:
- Each containerized framework is different because of its own abstractions, such as namespaces, pods, and services. Therefore, the solution must be capable of working in different environments.
- Since containers can move from one server to another, it becomes a challenge to correlate application degradation issues and identify what is contributing to the degradation: the server, storage, network, or application code-level exceptions. The solution should be intelligent enough to identify such issues automatically and immediately.
- Containers are difficult to track because their life span is short compared to virtual machines or physical machines. Life spans shrink due to application upgrades, where old containers are replaced by new ones, and due to auto-scaling, where containers are created and deleted according to service load. This makes it hard to trace a container back to a root cause, and the solution should be efficient enough to do so.
- Multiple containers running the same copy of the code may together operate a single service. This makes it a challenge to identify the cause behind a microservice failure; the solution should be able to pinpoint the failing instance.
- There will be a mix of traditional application environments, newer application environments with current technologies, and emerging application environments. Therefore, the monitoring solution should be capable of monitoring the entire mix and correlating events across the infrastructure.
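The tracking problem for short-lived containers in the list above can be sketched as a registry that retains lifecycle records after a container is removed, so an alert referencing a now-deleted container can still be traced to its image and host. All class, field, and container names here are illustrative assumptions, not any vendor's API.

```python
import time

class ContainerRegistry:
    """Minimal sketch: keep a history of container lifecycle events
    so that a container can still be looked up for root-cause
    analysis after auto-scaling or an upgrade has removed it."""

    def __init__(self):
        self.history = {}

    def on_start(self, container_id, image, host):
        # Record the container the moment it appears.
        self.history[container_id] = {
            "image": image,
            "host": host,
            "started": time.time(),
            "stopped": None,
        }

    def on_stop(self, container_id):
        # Mark the stop time but keep the record for later tracing.
        if container_id in self.history:
            self.history[container_id]["stopped"] = time.time()

    def lookup(self, container_id):
        # Works even after the container is gone.
        return self.history.get(container_id)
```

A real tool would feed this from orchestrator event streams and expire old records, but the principle is the same: the monitoring data must outlive the container.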
Finally, as everything boils down to business KPIs, focusing on business metrics is very important for any organization. However, the degradation of a business KPI depends on multiple factors, all of which can be covered by an effective solution that monitors the infrastructure, the orchestration layer (in containerized environments), applications hosted on virtual machines, physical servers, or containers, and end-user transactions for the applications. All of these affect the business KPIs of an organization.
Many product vendors, such as Dynatrace, AppDynamics, Sysdig, Datadog, and StackState, offer full-stack monitoring capabilities powered by AIOps, ChatOps, and VoiceOps. The definition of monitoring is also changing from policy-based event monitoring to pattern-based outlier and anomaly detection. Policy-based monitoring fails to send an alert in rare cases for which no policy has been configured. With AIOps and pattern-based monitoring, many newer monitoring tools take care of this problem.
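The difference between the two alerting styles can be shown with a minimal sketch: a fixed-threshold policy check next to a simple z-score outlier check. Real AIOps tools use far more sophisticated models; the thresholds and sample values below are illustrative assumptions.

```python
import statistics

def policy_alert(value, threshold):
    """Classic policy-based check: fires only when a configured
    threshold is crossed; silent for anything no policy covers."""
    return value > threshold

def zscore_alert(history, value, max_z=3.0):
    """Simple pattern-based outlier check: fires when a new sample
    deviates from recent history by more than `max_z` standard
    deviations, with no per-metric policy needed."""
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return False
    return abs(value - mean) / stdev > max_z
```

With a recent history hovering around 50, a sudden sample of 95 is caught by the z-score check even if no one ever configured a threshold for that metric, whereas `policy_alert(95, 100)` stays silent because the policy was set too high.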
Apart from the above-mentioned commercially licensed products, there are also open-source products that can monitor containerized platforms, namely Prometheus, Grafana, and Kibana. Prometheus has exporters, which are often run as sidecar containers alongside the main container. The Prometheus server handles ingestion, processing, alerting rules, and storage. Grafana is the visualization engine for the data stored in Prometheus. Prometheus decides which alerts should fire; alerts are then sent to an Alertmanager, which deduplicates and groups alerts and sends notifications. Kibana is the visualization front end of a log-management stack in which logs are collected from different elements in the environment; queries can be written to get meaningful insights, and dashboards can be prepared based on different requirements and organizational needs.
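To illustrate what a Prometheus scrape actually sees, here is a minimal sketch that renders samples in the Prometheus text exposition format, the line format a `/metrics` endpoint serves. The metric and label names are illustrative, and a real exporter would also emit `# HELP` and `# TYPE` comment lines, which are omitted here for brevity.

```python
def render_prometheus_metrics(metrics):
    """Render a dict of {metric_name: (labels_dict, value)} samples
    as Prometheus text exposition lines: name{label="v"} value."""
    lines = []
    for name, (labels, value) in metrics.items():
        if labels:
            label_str = ",".join(
                f'{k}="{v}"' for k, v in sorted(labels.items())
            )
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```

For example, a container CPU sample renders as `container_cpu_usage_seconds_total{container="web"} 12.5`, which is the form the Prometheus server ingests on each scrape.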
To summarize, a full-stack monitoring setup for an organization should focus on monitoring the following:
- Business and custom metrics
- Real-user monitoring
- In-container/Application monitoring
- Container resource monitoring
- Non-container monitoring
- Orchestration monitoring
- System/Infra monitoring
The full-stack monitoring solution should provide complete visibility of the entire environment, which helps identify root causes quickly and maintain the performance and health of the environment.