April 21, 2014

Demystifying Software Defined Storage (SDS)

What is Software Defined Storage (SDS)? While Software Defined Networking (SDN) has become a buzzword, SDS has been slightly behind. But it could catch up any time. Like SDN, SDS too is about separating the Data Plane (hardware delivering storage capacity) from the Control Plane (software-logic driving storage services). It is about having the software storage stack providing a full suite of storage services on commodity hardware (most often x86-Hardware with hypervisors). The services enable mobility of data between underlying persistent data placement resources. In other words, it is moving away from expensive proprietary storage hardware that employs custom ASICs and FPGAs for specialized features such as Fault-Tolerance, Dynamic Tiering, Caching, Compression and QoS.

How did this come about? The emergence of SDS can be attributed to 3 factors – general-purpose hardware becoming powerful, automation of storage decisions and the falling price of Flash memory.

The rise of the ubiquitous x86 platform. There was a time when it was faster to accomplish electronic functions through hardware (gates) than through software. Functions that were once accomplished through circuits built from programmable general-purpose ICs (Timer, Peripheral Controller, Interrupt Controller, etc.) first moved to circuits packed inside Application-Specific Integrated Circuits (ASICs). The flexibility that the programmable ICs provided was then recovered through Field-Programmable Gate Arrays (FPGAs). The FPGA was the ultimate, providing the speed of hardware with the flexibility of software. Gone are those days. Since the 8086 was introduced about three and a half decades ago, processors have become many times more powerful.

Today, x86 processors have as many as 10 CPU cores in a socket and have enough power to handle, at lower cost, tasks that once required custom ASICs. Intel's virtualization extensions enable one to schedule virtualized workloads at a fine granularity and assign processing resources to each of them. Software-centric storage controllers (decoupled from storage hardware) are driven by these processors to offer enterprise-grade performance. Moreover, Intel processors have been operating on an 18-month release cycle, compared to 4 or 5 years for custom storage arrays. When a newer generation of Intel processors adds support for 16 PCI Express 3.0 lanes for extended scalability or support for DDR4 RAM for extended performance, SDS benefits immediately. All that may be required is a simple modification, or a direct replacement or upgrade of the processor.

Storage decisions complicated by Virtualization. Specialized storage solutions such as Storage Area Networks and Network File Systems use storage constructs such as RAID groups and LUNs (Logical Unit Numbers) to provide objects that can be administered. Although these proved useful, they also added a lot of complexity. With the advent of virtualization, provisioning and managing storage on a per-Virtual Machine basis meant that administrators had to grapple with these constructs more than ever before.

Also, while individual units of hardware supported individual workloads earlier, now a single hypervisor supports multiple workloads. Traditional storage solutions make it difficult to provide the right services for individual workloads. The whole storage system has to be configured for the data requiring the highest level of availability, which increases cost. To increase performance (I/O operations per second, or IOPS) in these systems, more disk spindles have to be added even when total storage capacity is not a problem. To protect a single virtualized workload, coarse-grained hardware-based solutions have to replicate hundreds of less important workloads too. In other words, legacy storage solutions provide a scale-up architecture where entire new shelves of disks have to be deployed, even when only a small amount of additional storage is needed.
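The point about adding spindles for performance rather than capacity can be made concrete with a little arithmetic. The sketch below is only an illustration: the workload figures and the per-disk IOPS and capacity values are assumed round numbers, not measurements from any product.

```python
import math

def disks_needed(required_iops, required_tb, iops_per_disk, tb_per_disk):
    """Return (disks needed for IOPS, disks needed for capacity, disks to deploy)."""
    for_iops = math.ceil(required_iops / iops_per_disk)
    for_capacity = math.ceil(required_tb / tb_per_disk)
    # The array must satisfy both requirements, so the larger count wins.
    return for_iops, for_capacity, max(for_iops, for_capacity)

# Hypothetical workload: 20,000 IOPS and 10 TB of data,
# on disks assumed to deliver 150 IOPS and hold 2 TB each.
for_iops, for_capacity, total = disks_needed(20_000, 10, 150, 2)
print(f"for IOPS: {for_iops}, for capacity: {for_capacity}, deployed: {total}")
```

With these assumed figures, capacity alone would need only a handful of disks, while the performance target forces the purchase of more than a hundred.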

SDS, on the other hand, makes commodity storage hardware take part in a scale-out architecture where each x86 node has direct-attached hard disks and solid-state storage that can be leveraged by workloads on all nodes. Not just storage capacity, but also storage control logic across the cluster can be leveraged to avoid performance bottlenecks. Rather than using expensive RAID constructs, SDS systems may choose to store multiple copies of data at various locations in the cluster.
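As a rough illustration of keeping multiple copies across a cluster instead of relying on RAID, here is a minimal sketch. It uses a rendezvous-style hash ranking to choose nodes, which is one possible technique picked for this example and is not claimed to be how any particular SDS product places data. The function name, node names and block identifier are all hypothetical.

```python
import hashlib

def place_replicas(block_id, nodes, copies=2):
    """Choose `copies` distinct nodes for a block by ranking nodes on a hash."""
    if copies > len(nodes):
        raise ValueError("not enough nodes for the requested number of copies")

    def score(node):
        # Each (block, node) pair gets a stable pseudo-random score.
        return hashlib.sha256(f"{block_id}:{node}".encode()).hexdigest()

    # The same block always maps to the same nodes; different blocks spread out.
    return sorted(nodes, key=score)[:copies]

cluster = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical x86 nodes
print(place_replicas("vm42-block-0007", cluster, copies=2))
```

Because the placement depends only on the block identifier and the node list, any node can work out where the copies live without consulting a central table.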

Data locality can be a key part of scaling strategy. Data is kept local to the source VM whenever possible. When workloads move to alternate hosts, instead of blasting the network fabric with a full transfer of a VM's storage blocks to another host, the VM's files are retrieved lazily from the remote node. As the VM makes storage calls, the retrieved blocks are moved to the new VM location over time. This migration happens in a natural way without impacting the network.
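The lazy retrieval described above can be sketched as a disk object on the new host that fetches a block from the old host only the first time the VM touches it. This is a simplified model under stated assumptions: plain dictionaries stand in for the local and remote block stores, and the class and attribute names are invented for the illustration.

```python
class LazyMigratingDisk:
    """Toy model of a VM disk whose blocks follow the VM to its new host on demand."""

    def __init__(self, remote_blocks):
        self.remote = dict(remote_blocks)  # blocks still held by the old host
        self.local = {}                    # blocks already moved to the new host
        self.remote_fetches = 0            # how many blocks crossed the network

    def read(self, block_no):
        if block_no not in self.local:
            # First access after the move: fetch once, then keep the block local.
            self.local[block_no] = self.remote.pop(block_no)
            self.remote_fetches += 1
        return self.local[block_no]

    def write(self, block_no, data):
        # New data is written locally; any stale remote copy is no longer needed.
        self.local[block_no] = data
        self.remote.pop(block_no, None)

disk = LazyMigratingDisk({0: b"boot", 1: b"apps", 2: b"logs"})
disk.read(1)
disk.read(1)  # second read is served locally
print(disk.remote_fetches, sorted(disk.remote))
```

Only the blocks the VM actually uses are transferred, which is why the migration spreads out over time instead of arriving as one bulk copy.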

In a Virtual Desktop Infrastructure (VDI) environment, all desktops could boot at once, creating a boot storm. It is a period of extremely high I/O. SDS can detect when a high I/O state is triggered and automatically move the appropriate data to a higher performing storage tier.
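One simple way to picture this kind of automatic tiering is a routine that counts recent accesses per data item and promotes the busiest ones to flash. The sketch below makes several assumptions for illustration: the access log is a plain list of item names, and the threshold and number of flash slots are arbitrary values rather than settings from a real system.

```python
from collections import Counter

def plan_promotions(io_log, flash_slots, hot_threshold):
    """Return the items to move to the flash tier, busiest first."""
    counts = Counter(io_log)
    # Keep only items accessed at least `hot_threshold` times in this window.
    hot = [item for item, hits in counts.most_common() if hits >= hot_threshold]
    # The flash tier is small, so promote no more than it can hold.
    return hot[:flash_slots]

# Hypothetical access log captured while many desktops boot at once.
io_log = ["boot.img"] * 50 + ["user1.dat"] * 3 + ["pagefile"] * 20
print(plan_promotions(io_log, flash_slots=2, hot_threshold=10))
# prints ['boot.img', 'pagefile']
```

A real system would also need to demote data once the storm passes, which is left out here to keep the example short.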

Abstracting the storage constructs and letting software handle workload management automatically helps the administrator focus on more productive activities. The goal is to leverage storage hardware for which nothing is factory-defined. This means nothing is hard-wired and there is flexibility (late binding). In early-binding or static-binding systems, many components, along with their actions and configurations, are fixed in the hardware. Here, the hypervisor and SDS together provide a menu of services (open APIs) to discover the capabilities of the hardware, and apply the right capabilities and properties as needed, on a per-VM basis.

Fall in price of Flash Memory. Storage performance has always lagged behind, and developments over the last several decades have been only evolutionary. Even the fastest Serial Attached SCSI (SAS) drives cannot provide the performance demanded by today's workloads such as virtual desktops and Big Data analysis. Flash storage was a major revolutionary advancement. Until recently it was too expensive, but with its cost plummeting it is now emerging as a popular storage medium. Hybrid storage arrays, all-flash arrays and SDS systems are the emerging classes of storage.

In an SDS system, a fast performance tier made of flash storage can be reserved for workloads that have particularly high I/O demands. SDS systems also use flash to create a caching layer that holds write operations for a period of time and then orders the data from multiple cached writes into sequential I/O to the hard drives, speeding up random I/O operations.
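The write-caching idea can be sketched as a small write-back cache that absorbs writes in arrival order and later flushes them to disk sorted by address. This is a toy model under stated assumptions: dictionaries stand in for the flash cache and the hard drive, addresses are simple integers, and the class name is made up for the example.

```python
class WriteBackCache:
    """Toy flash write cache that turns random writes into one ordered flush."""

    def __init__(self, backing_store):
        self.pending = {}             # writes held in flash, keyed by block address
        self.backing = backing_store  # dict standing in for the hard drive

    def write(self, address, data):
        # A later write to the same address replaces the earlier one in cache.
        self.pending[address] = data

    def read(self, address):
        # Serve the newest data: cache first, then the drive.
        if address in self.pending:
            return self.pending[address]
        return self.backing.get(address)

    def flush(self):
        # Write out in ascending address order so the drive sees sequential I/O.
        order = sorted(self.pending)
        for address in order:
            self.backing[address] = self.pending[address]
        self.pending.clear()
        return order

drive = {}
cache = WriteBackCache(drive)
for address, data in [(90, b"a"), (7, b"b"), (42, b"c"), (7, b"d")]:
    cache.write(address, data)
print(cache.flush())  # prints [7, 42, 90]
```

Four random writes reach the drive as three ordered ones, since the two writes to address 7 are merged while they sit in the cache.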

The foundational tenets of Software Defined Storage are these: virtualization, which is the abstraction of underlying storage constructs, and separation of the control plane (centralized) from the data plane (distributed). Workloads are no longer tied to individual systems. This allows administrators to assign resources to individual applications and workloads, without having to worry about the underlying hardware.

After the initial hype, the dust is settling. Hardware-based storage solutions, on one hand, are losing their sheen. Some of these storage vendors are adapting to the VM-centric nature of today's data center using APIs created by hypervisor vendors. VMware has created APIs like vSphere API for Array Integration (VAAI), vSphere API for Data Protection (VADP) and vSphere API for Storage Awareness (VASA). In other words, the advances in integrating storage with hypervisors are software-based. The SDS vendors, on the other hand, are realizing that SDS is not just software, though it is software-directed. Some of them are making available a complete hardware compatibility list. They don't support the hardware, but simply sell and support the software side of the solution. Others, like Tintri and Nutanix, to name a few, are bundling hardware along and providing complete infrastructure support. SDS is certainly here to stay.
