One of the fundamental open problems in computer science is effective data storage. Unfortunately, magnetic and flash storage devices alone have proven too unreliable to guarantee data availability and survivability, due to their frequent and unpredictable failures. Replication has been the prominent technique to prevent data loss. With replication, copies of the data are kept on multiple storage devices so that, in case of a device failure, the data may be retrieved from a healthy replica. Although replication addresses data survivability, it introduces a new and more complex challenge: data consistency. The data on each replica must be identical if any storage failure is to be tolerated. But how can we make sure that changes made on one of the replicas will be propagated to the rest? Traditional solutions involve RAID (redundant array of inexpensive disks) controllers. A number of storage devices are connected to a RAID controller, and any updates to the data are propagated from the controller to all the connected devices. Although RAID handles data consistency, it is usually managed by a centralized controller, resides in a single location, and is connected to the network via a single network interface. These characteristics make RAID systems (and similar technologies) potential single points of failure and performance bottlenecks.
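To make the replication idea concrete, the following is a minimal sketch in Python; the Replica and ReplicatedStore classes are purely illustrative assumptions, not the API of any real RAID system or product. Every update is pushed to all healthy devices, and a read can be served by any surviving one:

```python
class Replica:
    """One storage device holding an in-memory copy of the data (illustrative)."""
    def __init__(self):
        self.data = {}
        self.healthy = True


class ReplicatedStore:
    """Plays the role of a RAID-like controller: it propagates every
    update to all connected devices and reads from any healthy one."""
    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]

    def write(self, key, value):
        # Propagate the update to every device that is still alive.
        # A device that fails and later recovers would hold stale data:
        # this is exactly the consistency problem described above.
        for r in self.replicas:
            if r.healthy:
                r.data[key] = value

    def read(self, key):
        # Tolerate failures: serve the read from the first healthy copy.
        for r in self.replicas:
            if r.healthy:
                return r.data[key]
        raise RuntimeError("all replicas have failed")


store = ReplicatedStore()
store.write("report.txt", "v1")
store.replicas[0].healthy = False        # simulate a device failure
assert store.read("report.txt") == "v1"  # data survives the failure
```

Note that the ReplicatedStore object itself is the centralized coordinator: if it crashes, no replica is reachable, which is precisely the single point of failure discussed above.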
These issues led to the popularity and widespread use of distributed storage systems (DSS). A distributed storage system overcomes single-point-of-failure and bottleneck issues by replicating the data across geographically dispersed nodes, ensuring data availability even in cases of complete site disasters. Although data distribution enhances the robustness of data replication, it creates new challenges for data consistency. Multiple clients may now access different data replicas concurrently by communicating with the remote data hosts. Network asynchrony and potential node failures make it even more difficult to ensure that data updates are propagated to all the replicas. Current commercial solutions (such as Dropbox) fall short of handling concurrency, relying instead on the assumption that individual users rarely access the storage space concurrently. However, future applications involving concurrent accesses to a DSS from multiple computing devices will demand strong guarantees on the contents of the DSS, guarantees that must be indistinguishable from those offered by a centralized storage solution.
Building on two decades of incremental research, IMDEA Networks launches the scientific project ATOMICDFS, which aims to tackle the challenge of “Seeking Efficient Atomic Implementations of Distributed Data Storage.” DFS stands for Distributed File System, a special form of DSS that stores and handles files. ATOMIC refers to the consistency guarantee that our file system provides in case of concurrent accesses. Atomicity is the strongest and most intuitive consistency guarantee, as it creates the illusion of a sequentially accessed storage even when multiple clients access it concurrently. Clients expect to see a single copy of the file system, as if it were accessed on a local machine. Whenever a file is read, we expect to obtain: (a) the changes of the last preceding write/append operation, and (b) a copy of the file that is at least as recent as the one obtained by the last preceding read operation. Although intuitive and easy to understand, atomicity is very hard to provide in an asynchronous, fail-prone, message-passing environment, due to the high unpredictability of such systems.
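The project's own algorithms are the subject of the research itself; as background, the sketch below shows the classic quorum-based technique (in the spirit of the ABD algorithm by Attiya, Bar-Noy and Dolev) that provides atomicity for a single writer. It is a simplified, illustrative Python simulation, with all names assumed rather than taken from ATOMICDFS: the replicas run in-process, whereas a real implementation would exchange messages over an asynchronous, fail-prone network. Because any two majorities of servers intersect, a read always sees the last complete write, and the read's write-back phase ensures that no later read returns an older copy:

```python
class Server:
    """A simulated replica storing a logical timestamp (tag) and a value."""
    def __init__(self):
        self.tag, self.value = 0, None

    def update(self, tag, value):
        # Adopt the incoming pair only if it is fresher than the local one.
        if tag > self.tag:
            self.tag, self.value = tag, value

    def query(self):
        return self.tag, self.value


SERVERS = [Server() for _ in range(5)]
MAJORITY = len(SERVERS) // 2 + 1  # any two majorities share a server


def write(tag, value):
    # The single writer tags each value with an increasing timestamp and
    # propagates it to a majority of servers (here: the first MAJORITY).
    for s in SERVERS[:MAJORITY]:
        s.update(tag, value)


def read():
    # Phase 1: query a majority (here: the last MAJORITY, which overlaps
    # the writer's) and keep the freshest pair -- this yields guarantee (a).
    tag, value = max((s.query() for s in SERVERS[-MAJORITY:]),
                     key=lambda tv: tv[0])
    # Phase 2 (write-back): re-propagate that pair before returning, so no
    # later read can return an older copy -- this yields guarantee (b).
    for s in SERVERS[:MAJORITY]:
        s.update(tag, value)
    return value


write(1, "draft")
write(2, "final")
assert read() == "final"  # (a): reflects the last preceding write
assert read() == "final"  # (b): at least as recent as the preceding read
```

Making such protocols fast, with few communication rounds and low message overhead despite asynchrony and failures, is the kind of efficiency question the project studies.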
ATOMICDFS aims to investigate the existence of a highly efficient DFS able to provide atomic guarantees in such harsh environments. This question can be divided into smaller components. Our goal is to identify the major components and present a theoretical analysis of the hardness of each of them. We will then develop algorithmic solutions to address the different aspects of the problem, targeting performance that meets our theoretical bounds. Ultimately, we plan to implement our algorithms on top of cheap commodity hardware (i.e., regular magnetic media) without any special characteristics, to provide the impression of a single, highly available storage space accessed concurrently by multiple clients. This will eventually lead to a new, cost-effective, widely available, robust, and highly consistent distributed storage solution.
Dr. Nicolas Nicolaou has been awarded an EU Marie Curie Intra-European Fellowship (IEF) for career development to work on this research project together with Scientist in Charge Dr. Antonio Fernández Anta, a Research Professor at IMDEA Networks. ATOMICDFS will run from December 2014 to November 2016.