TOS 15:1 - EiC Message
This paper presents REGISTOR, a platform for regular expression grabbing inside storage. The main idea of Registor is to accelerate regular expression (regex) search inside the storage device where large data sets are stored, eliminating the I/O bottleneck. A special hardware engine for regex search is designed and integrated into the SSD, where it processes data on the fly as it is transferred from NAND flash to the host. To make the speed of regex search match the internal bus speed of a modern SSD, the Registor hardware uses a deep pipeline consisting of a file semantics extractor, a matching candidates finder, regex matching units (REMUs), and a results organizer. Furthermore, each stage of the pipeline exploits as much parallelism as possible. To make Registor readily usable by high-level applications, we have developed a set of APIs and libraries in Linux that allow Registor to process files in the SSD by efficiently recombining separate data blocks into files. A working prototype of Registor has been built in our newly designed NVMe SSD. Extensive experiments and analyses show that Registor achieves high throughput and, for regex search in large data sets, reduces the I/O bandwidth requirement by up to 97% and CPU utilization by as much as 82%.
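The four pipeline stages named above can be illustrated with a small software analogue. This is only a sketch under stated assumptions, not the Registor hardware design: all function names are hypothetical, records are assumed to be newline-delimited, and the candidate finder is assumed to be a simple literal prefilter, which is sound only if the caller supplies a literal that every regex match must contain.

```python
import re
from typing import Iterable, Iterator, List, Tuple


def extract_records(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Stand-in for the file semantics extractor: reassemble
    newline-delimited records from arbitrarily split data chunks."""
    tail = b""
    for chunk in chunks:
        parts = (tail + chunk).split(b"\n")
        tail = parts.pop()  # last piece may be an incomplete record
        yield from parts
    if tail:
        yield tail


def find_candidates(records: Iterable[bytes], literal: bytes) -> Iterator[bytes]:
    """Stand-in for the matching candidates finder: a cheap substring
    test (assumption) that discards records that cannot match."""
    return (rec for rec in records if literal in rec)


def match_regex(candidates: Iterable[bytes],
                pattern: "re.Pattern[bytes]") -> Iterator[Tuple[bytes, Tuple[int, int]]]:
    """Stand-in for the regex matching units: run the full regex
    only on the surviving candidates."""
    for rec in candidates:
        m = pattern.search(rec)
        if m:
            yield rec, m.span()


def organize(matches: Iterable[Tuple[bytes, Tuple[int, int]]]) -> List[Tuple[str, Tuple[int, int]]]:
    """Stand-in for the results organizer: collect compact results
    so that only matches, not raw data, are returned to the caller."""
    return [(rec.decode("utf-8", errors="replace"), span) for rec, span in matches]


def search(chunks: Iterable[bytes], literal: bytes, regex: bytes):
    pattern = re.compile(regex)
    stream = extract_records(chunks)
    stream = find_candidates(stream, literal)
    return organize(match_regex(stream, pattern))


# Records deliberately straddle chunk boundaries, as file data may
# straddle separate blocks.
chunks = [b"error 42 at a\nin", b"fo ok\nerror x\nerr", b"or 7"]
results = search(chunks, literal=b"error", regex=rb"error \d+")
```

Because the stages are chained generators, each record flows through the whole pipeline as its chunk arrives and only matching records are retained, which mirrors in software the idea of filtering data before it crosses the I/O path.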
Traditionally, file systems were implemented as part of OS kernels. As the complexity of file systems grew, many new file systems began to be developed in user space. Low performance is considered the main disadvantage of user-space file systems, but the extent of this problem has never been explored systematically. As a result, the topic of user-space file systems remains rather controversial: while some consider user-space file systems a "toy" not to be used in production, others develop full-fledged production file systems in user space. In this article we analyze the design and implementation of the most widely known user-space file system framework for Linux, FUSE. We then characterize its performance and resource utilization for a wide range of workloads. Our experiments indicate that, depending on the workload and hardware used, the throughput degradation caused by FUSE can be completely imperceptible or as high as 83%, even when optimized; the latency of FUSE file system operations can increase by anywhere from nothing to 4x compared to in-kernel Ext4. On the resource utilization side, FUSE can increase relative CPU utilization by up to 31% and underutilize disk bandwidth by as much as 80% compared to Ext4.
Introduction to the Special Issue on ACM International Systems and Storage Conference (SYSTOR) 2018
We present Lerna, an end-to-end tool that automatically and transparently detects and extracts parallelism from data-dependent sequential loops using speculation combined with a set of techniques including code profiling, dependency analysis, instrumentation, and adaptive execution. Speculation is needed to avoid overly conservative decisions and to detect only the conflicts that actually occur at run time. Lerna targets applications that are hard to parallelize due to data dependencies. Our experimental study involves the parallelization of 13 applications with data dependencies. Results on a 24-core machine show an average speedup of 2.7x for micro-benchmarks and 2.5x for macro-benchmarks.
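The general idea of speculating on a data-dependent loop can be sketched as follows. This is a minimal illustration of one common scheme (optimistic execution against a snapshot, in-order commit, and re-execution on a detected conflict), not Lerna's implementation; the names `TrackedView` and `speculative_for` and the batch size are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


class TrackedView:
    """Buffers writes and records which keys an iteration read
    from the shared state (hypothetical helper)."""

    def __init__(self, base):
        self.base = base
        self.reads = set()
        self.writes = {}

    def __getitem__(self, key):
        if key in self.writes:       # reading our own write is not a conflict
            return self.writes[key]
        self.reads.add(key)
        return self.base[key]

    def __setitem__(self, key, value):
        self.writes[key] = value


def speculative_for(body, n, state, batch=4):
    """Run body(i, view) for i in range(n) with the same result as a
    sequential loop. Returns how many iterations had to be re-executed."""
    reruns = 0
    with ThreadPoolExecutor(max_workers=batch) as pool:
        for start in range(0, n, batch):
            idxs = range(start, min(start + batch, n))
            snapshot = dict(state)   # all speculative reads see this copy

            def run(i):
                view = TrackedView(snapshot)
                body(i, view)
                return view

            views = list(pool.map(run, idxs))

            dirty = set()            # keys committed earlier in this batch
            for i, view in zip(idxs, views):
                if view.reads & dirty:
                    # Actual conflict: the iteration read a stale value,
                    # so re-execute it against the committed state.
                    view = TrackedView(state)
                    body(i, view)
                    reruns += 1
                state.update(view.writes)   # commit in loop order
                dirty.update(view.writes)
    return reruns


def body(i, view):
    view[f"sq{i}"] = i * i                  # independent work
    if i % 3 == 0:
        view["total"] = view["total"] + i   # occasional loop-carried dependency


state = {"total": 0}
reruns = speculative_for(body, 10, state, batch=4)
```

A static analysis would have to treat every iteration as possibly dependent on `total`; with speculation only the iterations that actually read a value written earlier in the same batch pay the cost of re-execution. In CPython the thread pool gives little real speedup for CPU-bound bodies, so the sketch shows the control flow rather than the performance.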