The 3Vs of Big Data (volume, velocity, and variety) are not only revolutionizing our lives and helping organizations transform their businesses, but also driving a paradigm shift toward better decision making and timely, appropriate action through the implementation of Real-Time Data Analytics.
Real-Time Data Analytics also influences organizations that rely on local or remote hardware infrastructure for their daily business needs. It would not be wrong to say that today almost every organization has a dedicated team for managing its network.
Real-time network monitoring technologies are becoming an integral part of network infrastructure management. These technologies not only ensure that all business processes run efficiently, but also help in real-time monitoring of data security threats.
System and application logs play a pivotal role in the work of a system or network administrator. An administrator usually navigates between different log files when backtracking an event after a failure, hence the need for proper log management. However, analyzing these log files manually is time consuming and takes a lot of manpower.
A few tools are available that can provide a real-time, cost-effective solution to this challenge while also acting as a central infrastructure monitoring system and a log repository for historical analysis.
Considering the challenges above, the question arises: what should we expect from a centralized monitoring solution?
Expected features of a centralized monitoring system:
- Combines data from multiple sources and different formats
- Processes gigabytes of logs daily and supports real-time or near-real-time data analysis and dashboards
- Supports correlation, aggregation, and filtering of data on the fly
- Supports reliable, secure data transmission and is fault tolerant
- Generates custom events/notifications
- Provides modern, responsive visualizations that work in all current browsers
Where the ELK stack fits in
ELK is an end-to-end solution stack, designed to deliver actionable insights in real time from almost any type of structured or unstructured data source. It comprises:
- Elasticsearch for deep search and data analytics
  - Search & index
  - Schema-free, REST & JSON based document store
  - Distributed and horizontally scalable
  - Based on Apache Lucene
- Logstash for centralized logging, log enrichment, and parsing
  - Access log files without system access
  - Manage events & logs
  - Collect, parse, enrich, and store/forward data
  - Multiple input & output options, alerts
  - Sensitive data security via SSL
- Kibana for powerful and beautiful data visualizations
  - Visualize data through multiple chart formats
  - See real-time updates to data
  - Custom charts & dashboards
  - Time-based comparisons
  - Powerful search index
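To illustrate Elasticsearch's schema-free, REST & JSON based document model, here is a minimal sketch of indexing a log event and searching it back. The index and type names (`logs`, `event`), the field names, and the values are illustrative assumptions, and the exact URL shape varies across Elasticsearch versions:

```
PUT /logs/event/1
{
  "host": "web-01",
  "level": "ERROR",
  "message": "Connection to database timed out",
  "@timestamp": "2016-01-15T10:15:00Z"
}

GET /logs/_search
{
  "query": { "match": { "message": "timed out" } }
}
```

Because the store is schema-free, the document above can be indexed without first declaring its fields; the search then returns matching documents in near real time.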
How an ELK stack solution should look
An ELK solution can combine multiple components at each level, and this component-level segregation aligns with industry standards. The design is extensible, fault-tolerant, secure, and cloud-enabled. Its key components include:
- Logs Forwarder
- Logs Shipper
- Logs Indexer
Fig. 1: Reference Architecture for ELK Stack Solution
Logs Forwarder: runs on each monitored system and is responsible for collecting, parsing, and forwarding log events to the centralized log management system.
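As a sketch, the forwarder could be a lightweight agent such as Filebeat; the log paths and shipper host below are assumptions, and the exact configuration keys vary by Filebeat version:

```yaml
# Illustrative Filebeat configuration: tail local log files and
# forward events to the logs-shipper tier
filebeat.inputs:
  - type: log
    paths:
      - /var/log/*.log

output.logstash:
  hosts: ["shipper-01:5044"]   # hypothetical shipper host
```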
Logs Shipper: a TCP port listener that receives events from the forwarders. It is responsible for aggregating all the incoming requests and transferring them onward to the next stage.
Queue/Cache: acts as a buffer. It is a fault-tolerance mechanism introduced to handle the enormous volume of requests coming from different systems. Each logs-shipper instance can be mapped to a queuing-system instance.
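One way to realize the shipper and queue stages, as a sketch: a Logstash instance that listens on a TCP port and buffers events into a Redis list. The port, host name, and key below are assumptions:

```conf
# Logs-shipper stage: listen for forwarded events and buffer them in Redis
input {
  tcp {
    port => 5000
  }
}
output {
  redis {
    host      => "queue-01"   # hypothetical Redis (queue/cache) host
    data_type => "list"
    key       => "logstash"
  }
}
```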
Logs Indexer: an agent that draws log events from the queue instance, indexes them, and sends the indexed events to Elasticsearch.
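The indexer stage could then be another Logstash instance that drains the same Redis list, parses each line, and indexes it into Elasticsearch. The host names, the grok pattern, and the index name are assumptions:

```conf
# Logs-indexer stage: drain the queue, parse, and index into Elasticsearch
input {
  redis {
    host      => "queue-01"   # same hypothetical queue host as above
    data_type => "list"
    key       => "logstash"
  }
}
filter {
  grok {
    match => { "message" => "%{SYSLOGLINE}" }
  }
}
output {
  elasticsearch {
    hosts => ["es-01:9200"]
    index => "logs-%{+YYYY.MM.dd}"
  }
}
```

Splitting the shipper and indexer this way lets the Redis buffer absorb bursts, so a slow Elasticsearch cluster does not cause events to be dropped at the edge.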
Elasticsearch: stores the logs and makes them queryable, offering near-real-time answers for both quantitative and qualitative data.
Kibana: for visualization and the creation of diagnostic dashboards.
Sample ELK Stack dashboard
- Assists the server admin in monitoring the server's key health parameters.
- Provides information about computer performance, running processes, and CPU usage. It also tracks memory information, network activity, and statistics.
- Assists the administrator in monitoring the performance of the set of applications that need monitoring, e.g. SQL Server performance on a database server.
- Allows querying, filtering, and time-based filtering.
- Sends a threshold notification when a key parameter goes beyond a certain level, e.g. CPU usage above 90%.
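Such a threshold check can be expressed as an Elasticsearch query. As a sketch, assuming a hypothetical `cpu_percent` field on the performance documents, this finds events from the last five minutes where CPU usage exceeded 90%:

```json
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "cpu_percent": { "gt": 90 } } },
        { "range": { "@timestamp": { "gte": "now-5m" } } }
      ]
    }
  }
}
```

Run periodically (or wired into an alerting tool), a non-empty result set from this query can trigger the notification.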
Fig. 2: Real Time Performance Monitoring Dashboard
Windows Event Log Dashboard:
- The event log collects events from the Application, System, and Security logs; additional event logging can also be configured.
- The dashboard provides a way to view event log items and helps you visualize activity and incidents in log files.
- Displays different charts based on source, security severity level, or audit policy.
- Alerts when a specific event or set of events occurs on the system, e.g. a custom application regularly throwing exceptions.
- Allows querying, filtering, and time-based filtering.
Fig. 3: Real Time Windows Event Monitoring Dashboard
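To feed such a dashboard, the Windows event logs have to reach the stack; one common option is Winlogbeat. This is a sketch, with an assumed shipper host:

```yaml
# Illustrative Winlogbeat configuration: collect Windows event logs
# and forward them into the ELK pipeline
winlogbeat.event_logs:
  - name: Application
  - name: System
  - name: Security

output.logstash:
  hosts: ["shipper-01:5044"]   # hypothetical shipper host
```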