ACM Transactions on

Storage (TOS)



Special Issue on Computational Storage

Since the first hard disk drive (HDD) was introduced in 1956, storage devices have remained “dumb” for more than 60 years. However, ever growing demand for big data processing and recent advances in storage technology are reshaping the traditional CPU-centric computing paradigm. Many studies show that the energy consumed by data movement is starting to exceed the energy consumed by computation. Especially, the advent of high-performance solid-state drives (SSDs) based on non-volatile memory (e.g., NAND flash memory, 3D XPoint, etc.) opens up new opportunities for storage-centric computing paradigm.

Read more.

Forthcoming Articles

EIC Message: TOS 15:4

Efficient Erasure-Coded Block Storage System Based on Speculative Partial Write

In this paper we present PariX, a speculative partial write scheme which supports fast small writes in erasure-coded block storage systems. We transform the original formula of parity calculation to use the data deltas (between the current/original data values), instead of the parity deltas, to calculate the parities in journal replay. For each partial write, this allows PariX to speculatively log only the new value of the data without reading its original value. For a series of n partial writes to the same data, PariX performs pure write (instead of write-after-read) for the last n ? 1 ones while only introducing a small penalty of an extra network RTT (round-trip time) to the first one. We design and implement a PariX-based Block Storage system (PBS) which provides high-performance virtual disk service for VMs running cloud-oblivious applications. PBS not only supports fast partial writes, but also realizes efficient full writes, background journal replay and fast failure recovery with strong consistency guarantees. Both micro benchmarks and trace-driven evaluation show that PBS provides efficient block storage and outperforms state-of-the-art EC-based systems by orders of magnitude.

Information Leakage in Encrypted Deduplication via Frequency Analysis: Attacks and Defenses

Encrypted deduplication combines encryption and deduplication to simultaneously achieve both data security and storage efficiency. State-of-the-art encrypted deduplication systems mainly build on deterministic encryption to preserve deduplication effectiveness. However, such deterministic encryption reveals the underlying frequency distribution of the original plaintext chunks. This allows an adversary to launch frequency analysis against the ciphertext chunks and infer the content of the original plaintext chunks. In this paper, we study how frequency analysis affects information leakage in encrypted deduplication storage, from both attack and defense perspectives. Specifically, we target backup workloads, and propose a new inference attack that exploits chunk locality to increase the coverage of inferred chunks. We further combine the new inference attack with the knowledge of chunk sizes and show its attack effectiveness against variable-size chunks. We conduct trace-driven evaluation on both real-world and synthetic datasets and show that our proposed attacks infer a significant fraction of plaintext chunks under backup workloads. To defend against frequency analysis, we present two defense approaches, namely MinHash encryption and scrambling. Our trace-driven evaluation shows that our combined MinHash encryption and scrambling scheme effectively mitigates the severity of the inference attacks, while maintaining high storage efficiency and incurring limited metadata access overhead.

GraphOne: A Data Store for Real-time Analytics on Evolving Graphs

There is a growing need to perform real-time analytics on evolving graphs in order to deliver the values of big data to users. The key requirement from such applications is to have a data store to support their diverse data access efficiently, while concurrently ingesting fine-grained updates at a high velocity. Unfortunately, current graph systems, either graph databases or analytics engines, are not designed to achieve high performance for both operations. To address this challenge, we have designed and developed GraphOne, a graph data store that combines two complementary graph storage formats (edge list and adjacency list), and uses dual versioning to decouple graph computations from updates. Importantly, it presents a new data abstraction, GraphView, to enable data access at two different granularities with only a small data duplication. Experimental results show that GraphOne achieves an ingestion rate of two to three orders of magnitude higher than graph databases, while delivering algorithmic performance comparable to a static graph system. GraphOneis able to deliver 5.36x higher update rate and over 3x better analytics performance compared to a state-of-the- art dynamic graph system

Countering Fragmentation in an Enterprise Storage System

As a file system ages, it can experience multiple forms of fragmentation. Fragmentation of the free space in the file system can lower write performance and subsequent read performance. Client operations as well as internal operations, such as deduplication, can fragment the layout of an individual file, which also impacts file read performance. File systems that allow sub-block granular addressing can gather intra-block fragmentation, which leads to wasted free space. Similarly, wasted space can also occur when a file system writes a collection of blocks out to object storage as a single large object, as the constituent blocks can become free at different times. The impact of fragmentation also depends on the underlying storage media. This paper describes how the NetApp WAFL file system leverages a storage virtualization layer for defragmentation techniques that physically relocate blocks efficiently, including those in read-only snapshots. The paper analyzes the effectiveness of these techniques at reducing fragmentation and improving overall performance across various storage media.

INSTalytics: Cluster Filesystem Co-design for Big-data Analytics

We present the design, implementation, and evaluation of INSTalytics a co-designed stack of a cluster file system and the compute layer, for efficient big data analytics in large-scale data centers. INSTalytics amplifies the well-known benefits of data partitioning in analytics systems; instead of traditional partitioning on one dimension, INSTalytics enables data to be simultaneously partitioned on four different dimensions at the same storage cost, enabling a larger fraction of queries to benefit from partition filtering and joins without network shuffle. To achieve this, INSTalytics uses compute-awareness to customize the 3-way replication that the cluster file system employs for availability. A new heterogeneous replication layout enables INSTalytics to preserve the same recovery cost and availability as traditional replication. INSTalytics also uses compute-awareness to expose a new sliced-read API that improves performance of joins by enabling multiple compute nodes to read slices of a data block efficiently via co-ordinated request scheduling and selective caching at the storage nodes. We have built a prototype implementation of INSTalytics in a production analytics stack, and show that recovery performance and availability is similar to physical replication, while providing significant improvements in query performance, suggesting a new approach to designing cloud-scale big-data analytics systems.

Sketching Volume Capacities in Deduplicated Storage

The adoption of deduplication in storage systems has introduced significant new challenges for storage management. Specifically, the physical capacities associated with volumes are no longer readily available. In this work we introduce a new approach to analyzing capacities in deduplicated storage environments. We provide sketch-based estimations of fundamental capacity measures required for managing a storage system: How much physical space would be reclaimed if a volume or group of volumes were to be removed from a system (the {\em reclaimable} capacity) and how much of the physical space should be attributed to each of the volumes in the system (the {\em attributed} capacity). Our methods also support capacity queries for volume groups across multiple storage systems, e.g., how much capacity would a volume group consume after being migrated to another storage system? We provide analytical accuracy guarantees for our estimations as well as empirical evaluations. Our technology is integrated into a prominent all-flash storage array and exhibits high performance even for very large systems. We also demonstrate how this method opens the door for performing placement decisions at the data center level and obtaining insights on deduplication in the field.

The Composite-file File System: Decoupling One-to-one Mapping of Files and Metadata for Better Performance

The design and implementation of traditional file systems typically use the one-to-one mapping of logical files to their physical metadata representations. File system optimizations generally follow this rigid mapping and miss opportunities for an entire class of optimizations. We designed, implemented, and evaluated a composite-file file system, which allows many-to-one mappings of files to metadata. Through exploring different mapping strategies, our empirical evaluation shows up to a 27% performance improvement under webserver and software development workloads, for both disks and SSDs. This result demonstrates that our approach of relaxing file-to-metadata mapping is promising.

Introduction to the Special Issue on USENIX FAST 2019

All ACM Journals | See Full Journal Index

Search TOS
enter search term and/or author name