Shu Yin's research while affiliated with ShanghaiTech University and other places

Publications (46)

Article
User-level file systems are usually adopted to bridge the gap between efficacy and efficiency of file system development for new applications' I/O demands. The widely known user-space file system framework, FUSE, is commonly used to deploy user-level file systems. This article first uses a popular stackable file system as a case study t...
Preprint
Remote procedure call (RPC) latency becomes increasingly significant in a distributed file system. We propose BuffetFS, a user-level file system that optimizes I/O performance by eliminating the RPCs caused by the open() operation. By leveraging open() from file servers to clients, BuffetFS can restrain the procedure calls...
Article
Since large-scale and data-intensive applications have been widely deployed, there is a growing demand for high-performance storage systems to support data-intensive applications. Compared with traditional storage systems, next-generation systems should embrace a dedicated processor to reduce the computational load of host machines and may have hyb...
Preprint
The lookup procedure in Linux accounts for a significant portion of file access time as the virtual file system (VFS) traverses the file path components one after another. The lookup procedure becomes more time-consuming when applications frequently access files, especially small ones. We propose Stage Lookup, which dynamically caches popul...
Article
Augmented reality (AR) applications that overlay a user's perception of the real world with digitally generated information are on the cusp of commercial viability. AR has appeared in several commercial platforms like Microsoft HoloLens and smartphones. They extend the user experience beyond two dimensions and supplement a user's normal 3D world. A...
Article
Full-text available
There have been significant advances in capturing gigapixel panoramas (GPP). However, solutions for viewing GPPs on head-mounted displays (HMDs) are lagging: an immersive experience requires ultra-fast rendering while directly loading a GPP onto the GPU is infeasible due to limited texture memory capacity. In this paper, we present a novel out-of-c...
Article
The last decade witnessed a dramatic advance in cloud computing research and techniques. One of the key challenges in this field is reducing the massive amount of energy consumption in cloud computing data centers. Many power-aware virtual machine (VM) allocation and consolidation approaches were proposed to reduce energy consumption efficiently. H...
Article
This paper addresses an issue of erasure-coded data archival, where ( ) erasure codes are employed to archive rarely accessed replicas. The traditional synchronous encoding process neither leverages the existence of replicas, nor handles encoding operations in a decentralized manner. To overcome these drawbacks, we exploit pipelined encoding proces...
Article
Full-text available
The Popular Disk Concentration (PDC) technique and the Massive Array of Idle Disks (MAID) technique are two effective energy conservation schemes for parallel disk systems. The goal of PDC and MAID is to skew I/O load toward a few disks so that other disks can be transitioned to low power states to conserve energy. I/O load skewing techniques like...
Article
Full-text available
MapReduce has become an important distributed processing model for large-scale data-intensive applications like data mining and web indexing. Hadoop, an open-source implementation of MapReduce, is widely used for short jobs requiring low response time. In this paper, we propose a new preshuffling strategy in Hadoop to reduce high network loads imp...
Conference Paper
Full-text available
Existing reliability models evaluate the lifetime of storage systems well. However, few models aim to analyze parallel storage systems, especially systems equipped with energy-efficient solutions. The MREED model provides an inspiring reliability analysis idea for energy-efficient parallel storage systems, especially RAIDs. Unfortunately, the MREED ha...
Article
Dynamic Voltage Scaling (DVS) is a key technique for embedded systems to exploit multiple voltage and frequency levels to reduce energy consumption and to extend battery life. There are many DVS-based algorithms proposed for periodic and aperiodic task models. However, there are few algorithms that support the sporadic task model. Moreover, existin...
Conference Paper
Full-text available
We present a multicore-enabled smart storage for clusters in general and MapReduce clusters in particular. The goal of this research is to improve performance of data-intensive parallel applications on clusters by offloading data processing to multicore processors in storage nodes. Compared with traditional storage devices, next-generation disks wi...
Chapter
Introduction; Modeling Reliability of Energy-Efficient Parallel Disks; Improving Reliability of MAID via Disk Swapping; Experimental Results and Evaluation; Related Work; Conclusions; References
Article
We develop a mathematical model, MREED, to quantitatively evaluate the failure rate of energy-efficient parallel storage systems. The Power-Aware Redundant Array of Inexpensive Disks (PARAID) aims to reduce energy use of commodity server-class disks without specialized hardware. The goal of PARAID is to use a skewed striping pattern to adapt to the syst...
Conference Paper
In this study we develop a secure allocating processing (SAP) algorithm for the S-FAS scheme [13] to improve the security level and consider its performance using the heterogeneous feature of a large distributed system. The SAP allocation algorithm considers load balancing, delayed effects caused by the workload variance of many consecutive requests...
Conference Paper
Full-text available
There is a growing demand for large-scale distributed storage systems to support resource sharing and fault tolerance. Although heterogeneity issues of distributed systems have been widely investigated, little attention has yet been paid to security solutions designed for distributed storage systems with heterogeneous vulnerabilities. This fact mot...
Article
Full-text available
A critical problem with parallel I/O systems is the fact that disks consume a significant amount of energy. To design economically attractive and environmentally friendly parallel I/O systems, we propose an energy-aware prefetching strategy (PRE-BUD) for parallel I/O systems with disk buffers. We introduce a new architecture that provides significa...
Article
Full-text available
Reducing power consumption of wireless networks has become a major goal in designing modern multimedia wireless systems. In an effort to reduce power consumption, this paper addresses the issue of scheduling real-time messages in multimedia wireless networks subject to both timing and power constraints. A power-consumption model is introduced to ca...
Article
In the past decade, parallel disk systems have been highly scalable and able to alleviate the problem of disk I/O bottleneck, thereby being widely used to support data-intensive applications. Although a variety of parallel disk systems were developed, most existing disk systems lack a means to adaptively control the quality of security for dynamica...
Conference Paper
Full-text available
In largely distributed clusters, computing nodes are geographically deployed in various computing sites. Information processed in a distributed cluster is shared among a group of distributed processes or users by virtue of message-passing protocols (e.g., the Message Passing Interface, MPI) running on the Internet. Because of the open accessible natur...
Article
Full-text available
Improving security and minimizing power consumption are crucial for large-scale data storage systems. Although a handful of studies have been focused on data security and energy efficiency, most of the existing approaches have concentrated on only one of these two metrics. In this paper, we present a new approach to integrating power optimization w...
Conference Paper
Full-text available
Energy-efficient computing is becoming increasingly important as the scale of parallel computing systems expands. As the processing power of parallel computing systems has grown, there has been an increased demand for large-scale storage systems to store the output of these parallel computing systems. Data centers are growing at an e...
Conference Paper
Full-text available
MapReduce has become an important distributed processing model for large-scale data-intensive applications like data mining and web indexing. Hadoop, an open-source implementation of MapReduce, is widely used for short jobs requiring low response time. The current Hadoop implementation assumes that computing nodes in a cluster are homogeneous in natu...
Article
Full-text available
Energy conservation has become a critical problem for real-time embedded storage systems. Although a variety of approaches for reducing energy consumption have been extensively studied, energy conservation for real-time embedded storage systems is still an open problem. In this article, we propose an energy management strategy, I/O Burstiness for E...
Conference Paper
Full-text available
The Popular Disk Concentration (PDC) technique and the Massive Array of Idle Disks (MAID) technique are two effective energy saving schemes for parallel disk systems. The goal of PDC and MAID is to skew I/O load towards a few disks so that other disks can be transitioned to low power states to conserve energy. I/O load skewing techniques like PDC a...
Conference Paper
Full-text available
Cluster storage systems are essential building blocks for many high-end computing infrastructures. Although energy conservation techniques have been intensively studied in the context of clusters and disk arrays, improving energy efficiency of cluster storage systems remains an open issue. To address this problem, we describe in this paper an appro...
Article
Full-text available
Cluster computing has emerged as a primary and cost-effective platform for running parallel applications, including communication-intensive applications that transfer a large amount of data among the nodes of a cluster via the interconnection network. Conventional load balancers have proven effective in increasing the utilization of CPU, memory, an...
Article
Full-text available
Load balancing for clusters has been investigated extensively, mainly focusing on the effective usage of global CPU and memory resources. However, previous CPU- or memory-centric load balancing schemes suffer a significant performance drop under I/O-intensive workloads due to the imbalance of I/O load. To solve this problem, we propose two simple yet...
Conference Paper
Full-text available
Many energy conservation techniques have been proposed to achieve high energy efficiency in disk systems. Unfortunately, growing evidence shows that energy-saving schemes in disk drives usually have negative impacts on storage systems. Existing reliability models are inadequate to estimate reliability of parallel disk systems equipped with energy c...
Conference Paper
Full-text available
In the past decade, parallel disk systems have been developed to address the problem of I/O performance. A critical challenge with modern parallel I/O systems is that parallel disks consume a significant amount of energy in servers and high performance computers. To conserve energy consumption in parallel I/O systems, one can immediately spin down...
Conference Paper
Full-text available
Parallel disk systems consume a significant amount of energy due to the large number of disks. To design economically attractive and environmentally friendly parallel disk systems, in this paper we design and evaluate an energy-aware prefetching strategy for parallel disk systems consisting of a small number of buffer disks and a large number of data...

Citations

... While the file system is running at the mount time, the FUSE daemon cannot be called by VFS without the use of FUSE. In some cases, FUSE may take another action, such as returning the requested data into the buffer, or it may perform some pre-processing by requesting data from the underlying file systems [18]. Eventually, when the mounting connection is no longer needed, the mount point directory contents automatically disappear as soon as the file system is unmounted. ...
... For example, attacks on wireless mobile ad hoc networks are investigated in [15], attacks on wireless sensor networks in [16], and attacks on cloud computing environments in [17]. Furthermore, due to a lack of novel end-to-end security solutions, several attacks have targeted evolving technologies, for instance, those depending on artificial intelligence (AI) [18], augmented reality (AR) [19], the Software-Defined Networking (SDN) paradigm [20], and Blockchain [21]. Attacks against new business models and e-health applications are presented in [22]. ...
... centimeter to decameter and beyond), coupled with the computational overhead of operating VR user interfaces with large textures or vertex counts (e.g. Lyu et al., 2019) offers further challenges in terms of latency. To capture heterogeneities at multiple length scales such that VR user experiences approach those of the field, it is clear that visualizations using locally varying texture resolution and/or vertex densities (e.g. ...
... Both for live and non-live migration schemes, there is a need for efficient resource allocation to carry out the migration process [34]- [37]. Migration can be performed either using a wired or wireless network. ...
... In HDDs, data manipulation may significantly affect the performance due to disk rotation to access different tracks. Although SSDs have better performance and the cost per gigabyte has decreased, solid-state drives are still more expensive than HDDs (about 33.33% [30]). Solid-state drives are devices that have a similar block I/O interface to hard disks [31]. ...
... Results indicate energy consumption is responsible for 5 to 28% of overall cost for databases placed in HDDs. ...
... When the data is prefetched using uniform distribution and Zipf distribution (SP_Uniform, Sp_Zipf, and Sc) [29,30], the average cache hits are calculated as 56.2%, 55% and 46.6%, respectively. Our present technique outperforms the others by providing an average cache hit ratio of 78.5%. Figure 10b compares the execution time taken by our proposed PAP framework with the Reliable Energy-Efficient Storage System (RESS) and Modified Parallel Log-Structured File System (PLFS) proposed in [31]. The results demonstrate that PAP performs much better for 100 and 200 queries by taking 23 and 30 s, respectively. ...
... Yin et al. propose an architecture for active storage systems and a hybrid disk configuration for the storage system in active storage. Termed HcDD, the disk architecture consists of dual buffers [8]: an HDD write buffer for deduplication of write requests and an SSD buffer for providing parallelism to write requests. Read requests are directly processed by the SSD, whereas write requests to an active storage system are sent to the HDD buffer for deduplication before processing. ...
... 90% of all data in the world was generated over the last 2 years [4], and global data are predicted to reach 163 zettabytes by 2025 [5]. Facebook and Google respectively process 20 [6] and 25 petabytes of data per day [7]. ...
... The concept of a pipelined encoding process to improve archival systems is explored in [12]. A novel system for archival storage [13], a novel erasure codes concept [14], a data placement scheme in distributed environments for data-intensive applications [15], decentralized erasure coding for efficient data archival systems [16], erasure coding with parity logging in cluster storage environments [17], the utility of Software Defined Network (SDN) to decouple data and control planes in distributed environments, and graph-based optimizations in the Hadoop framework for improving functions like Swappiness and Hugepage [18] are other important contributions found in the literature. ...