POSIX has been the standard file system interface for Unix-based systems (which include Linux) since it was first standardized more than 30 years ago. Its ability to process data in the user address space, or memory, has given POSIX-compliant file systems and storage a commanding presence in applications like deep learning that require heavy data processing. The POSIX-compliant Lustre file system, for example, powers most supercomputers, and POSIX’s dominance continues down market too.
POSIX has its limits, though. Features like statefulness, prescriptive metadata, and strong consistency become a performance bottleneck as input/output (I/O) requests multiply and data scales, limiting the scalability of POSIX-compliant systems. That is already an issue in deep learning, AI, and other data-intensive uses, and as data and the need to analyze it grow exponentially, the problem is steadily moving down market.
Enter object storage. Unlike a file system, object storage requires no hierarchical data structure. It is a flat pool of data, with each piece of data described by its metadata. It has no practical scalability limits, making it ideal for high-end storage and applications, but it has one performance limitation that POSIX does not: every data request has to travel through the operating system’s storage stack. POSIX file systems get around that problem with the mmap() function, which maps file data directly into the user address space so applications can work on it as ordinary memory.
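To make that concrete, here is a minimal sketch of the standard POSIX call: it maps a file (the file name is just a placeholder) into the process’s address space so the application can work on the bytes as ordinary memory instead of issuing a read() for every access.

```c
/* Minimal sketch of POSIX mmap(): map a file into the process's
 * address space and touch its bytes as ordinary memory. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("dataset.bin", O_RDONLY);   /* placeholder file name */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* Map the whole file read-only into user space. */
    unsigned char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }

    /* The file contents are now addressable as memory. */
    unsigned long sum = 0;
    for (off_t i = 0; i < st.st_size; i++)
        sum += data[i];
    printf("byte sum: %lu\n", sum);

    munmap(data, st.st_size);   /* release the mapping */
    close(fd);
    return 0;
}
```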
Recently some engineers – including longtime Enterprise Storage Forum contributor Henry Newman – took that advantage away from POSIX by creating mmap_obj(), which gives object storage systems the ability to process data in memory. With object storage’s scalability (and lower cost), the breakthrough could mean the end of POSIX’s dominance in compute-intensive environments.
The mmap_obj() developers note that one piece of work remains: a munmap_obj() function is still needed to release data from the user space, similar to POSIX’s munmap().
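The new call’s signature has not been published, so the following is only a guess at how an object-storage analogue of the POSIX pair might be declared; every name and parameter beyond mmap_obj() and munmap_obj() themselves is an assumption.

```c
/* Hypothetical sketch only: how an object-storage analogue of
 * mmap()/munmap() might look. This is not the published interface. */
#include <stddef.h>

/* Map an object, identified by bucket and key, into the caller's
 * address space; returns a pointer to the mapped bytes or NULL. */
void *mmap_obj(const char *bucket, const char *key,
               size_t length, int prot);

/* Release a mapping created by mmap_obj() -- the counterpart the
 * developers say still needs to be written. */
int munmap_obj(void *addr, size_t length);
```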
POSIX, meet object storage
Though POSIX makes it easy to port applications between different operating systems, it does not have the scalability to meet the most demanding applications, and its performance declines as demands increase.
Accessing data in file system storage then becomes a challenge, especially for organizations with very large amounts of data and high performance needs. Object storage is a more recent form of data storage that holds data of any shape as discrete units called objects. Development began at Seagate in 1999, based in part on earlier work by RAID inventor Garth Gibson and others.
Object storage is the most scalable of the three forms of storage (file and block are the others) because it allows enormous amounts of data in any form to be stored and accessed. Data is stored in a flat pool and managed through the metadata attached to each object. But requesting stored object data takes additional processing time because every request must go through the operating system kernel.
Object storage is ideal for very large data stores and is widely used in cloud computing, but until now POSIX has had the advantage of being able to process data in the user address space. However, POSIX has seen very little performance improvement over the years, and its era of usefulness is coming to an end.
With mmap_obj(), that advantage is no more, and now object storage can also process data in the user space. As new technologies meant to speed data processing come online—chief among them Non-Volatile Memory Express (NVMe)—POSIX’s limits will become acute.
NVMe over Fabrics (NVMeOF) is a protocol that lets hosts and NVMe devices communicate directly over a network fabric, in effect bringing data access close to in-memory speeds. With such high-speed data transfer and processing available, POSIX will be hard-pressed to meet these new performance demands. With flat storage pools that can scale almost without limit and the ability to process data in high-performance memory, object storage now has the potential to make POSIX file systems obsolete.
As storage systems exceed billions of files, POSIX scalability and performance limitations will make object storage the preferred option.
NVMeOF and mmap change the game
Object storage can currently be used with POSIX file systems – Amazon Web Services’ S3 file system interfaces are a notable example – but such deployments are still subject to POSIX’s limitations.
But what if there were a much faster, more scalable way to access data in object storage?
NVMeOF is a relatively new technology; the “fabrics” refers to the network fabric that connects a host to an NVMe device, such as an SSD, so data can be transferred to and from that device over the network. Using the new memory-mapping call mmap_obj() to copy object data onto the NVMe device means the data can be staged and processed on the device without the need for POSIX. Because the device connects directly to the computer system, the CPU has a path to the data on it: while attached, the data is available almost as main memory. The data stays on the SSD during computation, and actively accessing it – particularly its metadata – becomes much faster. Low latency has long been sought in object storage and data processing; with NVMeOF, it becomes readily available.
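As a rough illustration of that flow – map an object, compute on it in place, release it – here is a hedged sketch. The mmap_obj() and munmap_obj() signatures, the flag, and the bucket and object names are assumptions for illustration, not a published API.

```c
/* Sketch of the map -> process -> release flow described above.
 * The declarations, flag, bucket, and key below are assumed for
 * illustration; they are not a published API. */
#include <stddef.h>
#include <stdio.h>

void *mmap_obj(const char *bucket, const char *key, size_t length, int prot);
int   munmap_obj(void *addr, size_t length);
#define OBJ_PROT_READ 0x1   /* assumed read-only flag */

int main(void)
{
    size_t length = 1 << 20;   /* assume a 1 MiB object for the example */

    /* Map the object's bytes into the user address space; behind the
     * scenes the data would live on an NVMe/NVMeOF device rather than
     * in a POSIX file system. */
    const unsigned char *data =
        mmap_obj("sensor-archive", "2024/run-42.bin", length, OBJ_PROT_READ);
    if (data == NULL) {
        fprintf(stderr, "mmap_obj failed\n");
        return 1;
    }

    /* Process in place, as if the object were ordinary memory. */
    unsigned long sum = 0;
    for (size_t i = 0; i < length; i++)
        sum += data[i];
    printf("byte sum: %lu\n", sum);

    /* Release the mapping -- the role the still-missing munmap_obj()
     * is meant to fill. */
    munmap_obj((void *)data, length);
    return 0;
}
```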
NVMe (the original specification) gives a host fast access to a local flash SSD attached directly to it. NVMeOF extends that access to entire networks: instead of only processing data on the device attached to one computer, users can reach NVMe storage across the network. Combining object storage and NVMeOF in data centers and data lakes means more compute power than was previously practical. A data lake is a repository of raw, unstructured data. Using object storage (rather than file storage) for data lakes gives data analysts an easier way to manage and analyze massive amounts of data; using memory mapping and NVMeOF to access that data quickly would provide new levels of high-performance computing.
NVMeOF could also provide higher compute power for data centers. Google, IBM, and Amazon Web Services already offer cloud-based object storage. Today, accessing object-stored data in data centers requires an application programming interface (API) and input/output commands. By using NVMeOF with memory mapping, data centers can bypass a server’s operating system to process data, with no intermediary interface such as a file system. The need for a POSIX interface could be sidestepped altogether by giving applications a REST interface to the object store. Data centers could then bring object-stored data directly into memory for processing, creating unheard-of levels of performance and scalability.
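For context, the REST access pattern mentioned above is how most object stores are addressed today. A minimal sketch with libcurl shows the shape of such a request; the endpoint URL and object key are placeholders, and real object stores also require authentication headers, which are omitted here.

```c
/* Minimal sketch of fetching an object over a REST/HTTP interface
 * with libcurl. Endpoint and key are placeholders; authentication
 * headers required by real object stores are omitted. */
#include <curl/curl.h>
#include <stdio.h>

static size_t count_bytes(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    (void)ptr;
    *(size_t *)userdata += size * nmemb;   /* just count what arrives */
    return size * nmemb;
}

int main(void)
{
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    if (!curl) return 1;

    size_t received = 0;
    /* GET https://<endpoint>/<bucket>/<object-key> */
    curl_easy_setopt(curl, CURLOPT_URL,
                     "https://objects.example.com/sensor-archive/2024/run-42.bin");
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, count_bytes);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &received);

    CURLcode rc = curl_easy_perform(curl);
    if (rc == CURLE_OK)
        printf("fetched %zu bytes over REST\n", received);
    else
        fprintf(stderr, "request failed: %s\n", curl_easy_strerror(rc));

    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}
```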
NVMeOF and memory-mapping technologies, paired with object storage, will change the way data centers, data lakes, and very large applications process data. Network computing power and speed will skyrocket over NVMeOF connections. Though the approach has its limitations – migrating existing file and block data to object storage, for one – it will mean new developments for data-intensive computing.
At a time when data centers and cloud infrastructure must scale rapidly to meet demand for data storage and processing, accessing object storage through memory mapping will be an unparalleled way to accelerate data center performance.