ACM Transactions on Storage (TOS)



Volume 15, Issue 1, March 2019 is now available.


We would like to thank our colleagues who served as reviewers for ACM TOS from 2016 to 2018. Through this list, we express our deepest gratitude for the time and effort you devoted to providing valuable comments. Thanks!


Special Issue on Computational Storage

Since the first hard disk drive (HDD) was introduced in 1956, storage devices have remained “dumb” for more than 60 years. However, the ever-growing demand for big data processing and recent advances in storage technology are reshaping the traditional CPU-centric computing paradigm. Many studies show that the energy consumed by data movement is starting to exceed the energy consumed by computation. In particular, the advent of high-performance solid-state drives (SSDs) based on non-volatile memory (e.g., NAND flash memory, 3D XPoint) opens up new opportunities for a storage-centric computing paradigm.


Forthcoming Articles
CORES: Towards Scan-Optimized Columnar Storage for Nested Records

Because records are transformed in the storage layer, the unnecessary processing costs incurred by unwanted fields or unsatisfied rows can be heavy in complex schemas, significantly wasting computational resources in large-scale analytical workloads. We present CORES (Column-Oriented Regeneration Embedding Scheme) to push highly selective filters down into column-based storage, where each filter consists of several filtering conditions on a field. By applying highly selective filters to column scans in storage, we demonstrate that both the I/O and the deserialization cost can be significantly reduced by introducing a fine-grained composition based on bitsets. We also generalize this technique with two pairwise operations, rollup and drilldown, so that a series of conjunctive filters can effectively deliver their payloads in a nested schema. The proposed methods are implemented on an open-source platform. For practical purposes, we highlight how to effectively construct a nested column storage and efficiently drive multiple filters with a cost model. We apply this design to the nested relational model, especially when hierarchical entities are frequently required by ad hoc queries. The experiments, covering a real workload and the modified TPC-H benchmark, demonstrate that CORES improves performance by 0.7X to 26.9X over state-of-the-art platforms in scan-intensive workloads.
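The bitset-based composition described above can be illustrated with a minimal sketch: each per-field filter scans one column and records matching row positions in a bitset, the bitsets of conjunctive filters are ANDed together, and only surviving rows are materialized. The column names, helper functions, and data below are illustrative assumptions, not CORES's actual API.

```python
# Hypothetical sketch of bitset-based filter composition over column chunks.

def filter_bitset(column, predicate):
    """Scan one column and record matching row positions as a bitset (int)."""
    bits = 0
    for i, value in enumerate(column):
        if predicate(value):
            bits |= 1 << i
    return bits

def scan(columns, filters, project):
    """Apply conjunctive filters, then materialize only the projected fields
    of surviving rows -- skipping deserialization everywhere else."""
    n = len(next(iter(columns.values())))
    surviving = (1 << n) - 1            # all rows initially survive
    for field, predicate in filters:
        surviving &= filter_bitset(columns[field], predicate)
        if surviving == 0:              # early exit: no rows left
            return []
    return [
        {f: columns[f][i] for f in project}
        for i in range(n) if surviving >> i & 1
    ]

columns = {"price": [5, 42, 17, 99], "region": ["eu", "us", "eu", "us"]}
rows = scan(columns,
            filters=[("price", lambda v: v > 10),
                     ("region", lambda r: r == "us")],
            project=["price"])
# rows == [{"price": 42}, {"price": 99}]
```

Because the bitsets are computed per column, expensive row reconstruction is deferred until all filters have pruned the candidate set.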

An Attention-Augmented Deep Architecture for Hard Drive Status Monitoring in Large-Scale Storage Systems

Although reactive fault-tolerance measures such as RAID have been widely deployed, enhancing the reliability of large-scale storage systems remains a tough problem. Proactive prediction is an effective way to avoid possible hard drive failures in advance. A series of models based on self-monitoring, analysis and reporting technology (SMART) have been proposed to predict impending hard drive failures. Unfortunately, serious challenges remain unsolved, such as the lack of explainability of prediction results. To address these issues, we carefully analyze a dataset collected from a real-world large-scale storage system. Based on the insights gained from this analysis, we design an attention-augmented deep architecture for hard drive health status assessment and failure prediction. The architecture, named AMENDER, can not only monitor the status of hard drives but also assist in failure cause diagnosis. We evaluate AMENDER through extensive experiments on real-world datasets.
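The explainability benefit of attention can be sketched in miniature: a learned score per SMART attribute is softmax-normalized into a weight, the weighted sum forms the model's feature summary, and the weights themselves indicate which attributes the model attended to. The attribute names and scores below are invented for illustration and are not taken from AMENDER.

```python
import math

def softmax(scores):
    """Normalize scores into weights that sum to 1 (max-subtracted for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(features, scores):
    """Return the attention-weighted feature summary and the weights."""
    weights = softmax(scores)
    summary = sum(w * f for w, f in zip(weights, features))
    return summary, weights

# Illustrative SMART-derived features and scores from a hypothetical scoring layer.
smart = {"reallocated_sectors": 0.9, "seek_error_rate": 0.1, "temperature": 0.3}
scores = [2.0, 0.1, 0.5]
summary, weights = attend(list(smart.values()), scores)
# The largest weight points at the attribute the model attended to most,
# which is what makes the prediction interpretable for failure-cause diagnosis.
```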

Enabling Efficient Updates in KV Storage via Hashing: Design and Performance Evaluation

Persistent key-value (KV) stores mostly build on the Log-Structured Merge (LSM) tree for high write performance, yet the LSM-tree suffers from inherently high I/O amplification. KV separation mitigates I/O amplification by storing only keys in the LSM-tree and values in separate storage. However, the current KV separation design remains inefficient under update-intensive workloads due to its high garbage collection (GC) overhead in value storage. We propose HashKV, which aims for high update performance atop KV separation under update-intensive workloads. HashKV uses hash-based data grouping, which deterministically maps values to storage space so as to make both updates and GC efficient. We further relax the restriction of such deterministic mapping via simple but useful design extensions. We extensively evaluate various design aspects of HashKV. We show that HashKV achieves 4.6x the update throughput and 53.4% less write traffic compared to the current KV separation design. In addition, we demonstrate that the design of HashKV can be integrated with state-of-the-art KV stores to improve their respective performance.
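The hash-based grouping idea can be sketched as follows: a key's hash deterministically selects the value-storage segment, so every update of that key lands in the same segment and GC can reclaim a segment locally. The segment count, data structures, and function names here are simplifying assumptions, not HashKV's actual layout.

```python
import hashlib

NUM_SEGMENTS = 4  # illustrative; real systems size segments to the device

def segment_of(key: str) -> int:
    """Deterministically map a key to a value-storage segment, so updates of
    the same key always land in the same segment."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SEGMENTS

segments = [[] for _ in range(NUM_SEGMENTS)]

def put(key, value):
    """Append the value to the segment chosen by the key's hash; older
    versions in the same segment become garbage for segment-local GC."""
    segments[segment_of(key)].append((key, value))

put("user42", "v1")
put("user42", "v2")   # the update lands next to v1, not in an arbitrary log
```

Because the mapping is deterministic, GC of one segment never needs to consult the LSM-tree to find where a key's live value resides.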

ZoneTier: A Zone-based Storage Tiering and Caching Co-Design to Integrate SSDs with SMR Drives

Integrating solid state drives (SSDs) and host-aware shingled magnetic recording (HA-SMR) drives can potentially build a cost-effective, high-performance storage system. However, existing SSD tiering and caching designs in such a hybrid system do not fully match the intrinsic properties of HA-SMR drives, because they do not consider how to handle non-sequential writes (NSWs). We propose ZoneTier, a zone-based storage tiering and caching co-design, to effectively control all NSWs by leveraging the host-aware property of HA-SMR drives. ZoneTier exploits the real-time data layout of SMR zones to optimize zone placement, reshapes NSWs generated by zone demotions into SMR-preferred sequential writes, and transforms the inevitable NSWs into cleaning-friendly write traffic for SMR zones. ZoneTier can be easily extended to host-managed SMR drives using a proactive cleaning policy. We implement a prototype of ZoneTier with user-space data management algorithms and real SSD and HA-SMR drives, manipulated through the functions provided by libzbc and libaio. Our experiments show that ZoneTier reduces zone relocation overhead by 29.41% on average, shortens the performance recovery time of HA-SMR drives after cleaning by up to 33.37%, and improves performance by up to 32.31% over existing hybrid storage designs.
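The core reshaping step can be illustrated with a toy sketch: non-sequential writes are buffered, sorted by logical block address, and then appended at the zone's write pointer as one sequential stream, which is the access pattern SMR zones prefer. The zone representation and function below are illustrative assumptions, not ZoneTier's implementation.

```python
# Sketch of reshaping non-sequential writes (NSWs) into SMR-friendly
# sequential writes: buffer, sort by LBA, append at the write pointer.

def reshape_and_write(zone, buffered_writes):
    """zone: dict with 'write_pointer' and 'blocks' (append-only log).
    buffered_writes: list of (lba, data) pairs arriving in arbitrary order."""
    for lba, data in sorted(buffered_writes):     # order by LBA
        zone["blocks"].append((lba, data))        # strictly sequential append
        zone["write_pointer"] += 1
    return zone

zone = {"write_pointer": 0, "blocks": []}
reshape_and_write(zone, [(30, "c"), (10, "a"), (20, "b")])
# zone["blocks"] == [(10, "a"), (20, "b"), (30, "c")]
```

A real host-aware drive would accept the out-of-order writes too, but at the cost of internal cleaning later; sorting before issuing avoids creating that cleaning debt.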

Level Hashing: A High-performance and Flexible-resizing Persistent Hashing Index Structure

Non-volatile memory (NVM) as persistent memory is expected to substitute for or complement DRAM in the memory hierarchy, due to its non-volatility, high density, and near-zero standby power. However, because of data consistency requirements and the hardware limitations of NVM, traditional indexing techniques originally designed for DRAM become inefficient in persistent memory. To efficiently index data in persistent memory, this paper proposes a write-optimized, high-performance hashing index scheme, called level hashing, with a low-overhead consistency guarantee and cost-efficient resizing. Level hashing provides a sharing-based two-level hash table, which achieves constant-scale worst-case time complexity for search, insertion, deletion, and update, and rarely incurs extra NVM writes. To resize this hash table cost-efficiently, level hashing leverages an in-place resizing scheme that rehashes only 1/3 of the buckets, instead of the entire table, to expand a hash table, and 2/3 of the buckets to shrink one, thus significantly reducing the number of rehashed buckets and improving resizing performance. Experimental results demonstrate that level hashing achieves 1.4x-3.0x speedup for insertions, 1.2x-2.1x speedup for updates, 4.3x speedup for expanding, and 1.4x speedup for shrinking a hash table, while maintaining high search and deletion performance, compared with state-of-the-art hashing schemes.
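The 1/3 figure follows from the two-level layout: with a top level of N buckets and a bottom level of N/2, expansion allocates a new top level of 2N buckets, reuses the old top level unchanged as the new bottom level, and rehashes only the old bottom level's N/2 buckets, i.e., 1/3 of the 1.5N total. The single-slot buckets below are a simplifying assumption (real level hashing uses multi-slot buckets and two hash functions), so this is a sketch of the resizing arithmetic, not the full scheme.

```python
# Minimal sketch of level hashing's in-place expansion.

class LevelTable:
    def __init__(self, n):
        self.top = [None] * n             # top level: N buckets
        self.bottom = [None] * (n // 2)   # bottom level: N/2 buckets

    def expand(self):
        """Double capacity while rehashing only the old bottom level."""
        old_bottom = self.bottom
        self.bottom = self.top            # old top is reused unchanged
        self.top = [None] * (2 * len(self.bottom))
        for item in old_bottom:           # only N/2 of 1.5N buckets rehashed
            if item is not None:
                self.top[hash(item) % len(self.top)] = item

t = LevelTable(8)
total_before = len(t.top) + len(t.bottom)   # 12 buckets
rehashed = len(t.bottom)                    # 4 buckets = 1/3 of the total
t.expand()
# After expansion: 16 + 8 = 24 buckets, double the previous capacity.
```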

SolarDB: Towards a Shared-Everything Database on Distributed Log-Structured Storage

Efficient transaction processing over large databases is a key requirement for many mission-critical applications. Though modern databases have achieved good performance through horizontal partitioning, their performance deteriorates when cross-partition distributed transactions must be executed. This paper presents Solar, a distributed relational database system that has been successfully tested at a large commercial bank. The key features of Solar include: 1) a shared-everything architecture based on a two-layer log-structured merge-tree; 2) a new concurrency control algorithm that works with the log-structured storage, ensuring efficient, non-blocking transaction processing even while the storage layer compacts data among nodes in the background; 3) fine-grained data access that effectively minimizes and balances network communication within the cluster. In our empirical evaluations on TPC-C, SmallBank and a real-world workload, Solar outperforms existing shared-nothing systems by up to 50x when 5% or more of the transactions are distributed.

Characterizing Output Behaviors of a Production Supercomputer: Analysis and Implications

This paper studies the output behavior of the Titan supercomputer and its Lustre file stores. We introduce a statistical benchmarking methodology that collects and combines samples over time and across settings: 1) to measure the performance impact of parameter choices amid the interference of the production setting; and 2) to derive the performance of individual stages and components in the multi-stage write pipelines, and their variation over time. We find that Titan's I/O system is highly variable, with two major implications. First, stragglers lessen the benefit of coupled I/O parallelism. I/O parallelism is most effective when the application distributes the I/O load so that each target stores files for multiple clients and each client writes files on multiple targets, in a balanced way with minimal contention. Second, our results suggest that the potential benefit of dynamic adaptation is limited. In particular, it is not fruitful to attempt to identify "good locations" in the machine or in the file system: component performance is driven by transient load conditions, and past performance is not a useful predictor of future performance. For example, we do not observe predictable diurnal load patterns.
