Pengfei You's research while affiliated with National University of Defense Technology and other places

Publications (15)

Article
Distributed storage systems, built on peer-to-peer networks, can provide large-scale data storage and high data reliability through redundancy. Data backup is the process of storing data on a set of redundant storage nodes. Rapid completion of this process is critical to maintaining system performance. In traditional data backup in distributed syste...
Article
Optimizing the Map process is important for improving MapReduce performance. Much effort has been devoted to designing more efficient scheduling strategies. However, there exists a class of MapReduce applications, called imprecise applications, in which imprecise results based on a subset of map tasks can satisfy the require...
Conference Paper
To provide timely results for ‘Big Data Analytics’, it is crucial to satisfy deadline requirements for MapReduce jobs in production environments. In this paper, we propose a deadline-oriented task scheduling approach, named Dart, to meet the given deadline and maximize the input size if only part of the dataset can be processed before the time limi...
Conference Paper
Distributed storage systems can provide large-scale data storage and high data reliability through redundancy schemes such as replication and erasure codes. Redundant data may be lost due to frequent node failures in the system. Lost data must be regenerated as soon as possible to maintain data availability and reliability. The direct way...
Article
Cloud storage systems provide reliable service to users by widely deploying redundancy schemes, which bring high reliability to data storage but also introduce significant overhead in the form of storage cost and energy consumption. The core of this issue is how to leverage the relationship between data...
Article
Quality of service (QoS) optimization for end-to-end (e2e) services always depends on performance analysis in the cloud-based service delivery industry. However, performance analysis of e2e services becomes difficult as the scale and complexity of virtualized computing environments increase. In this paper, the authors present a novel hierarchical stoch...
Conference Paper
Parallel computing can significantly improve data-processing efficiency. However, traditional approaches, such as MPI and MapReduce, require programming in a special environment. In this paper, a new distributed computing framework named MEX is proposed. Users just provide the input files and the name of an executable program to MEX. Then ME...
Article
The scheduling approach in MapReduce may result in the "long tail" problem because of unreasonable task assignment, and in high scheduling overhead because of the large number of task scheduling operations. To address these problems, a new task scheduling approach for MapReduce, named "Iterative Task Scheduling Algorithm", is proposed. The new approach tr...
Conference Paper
Due to their high storage efficiency, erasure codes have recently been used to provide high data reliability in distributed storage systems. When multiple data losses occur in the system, their regeneration time must be as short as possible to maintain data availability and reliability. The common approach is to repair them one by one, which prolongs the regeneration ti...
Chapter
The wireless sensor network (WSN) is a key technology extensively applied in many fields, such as transportation, health care, and environmental monitoring. Despite rapid development, the exponentially increasing data emanating from WSNs is not efficiently stored and used. Besides, the data from WSNs of multiple different types and locations needs to be we...
Article
A sensor information system is a specific distributed information management system for applying sensor data; it aims to effectively process, manage, and analyze data emanating from sensor networks. Recently, with the development of sensor networks, sensor information systems have encountered many challenges, such as huge and diverse data, heterogeneous cli...
Conference Paper
The wireless sensor network (WSN) is a critical technology for information gathering, covering many areas, including health care, transportation, air traffic control, and environmental monitoring. Despite wide use, the rapidly increasing data emanating from WSNs is not fully utilized due to the structural limitations of WSNs themselves. Along with the further de...
Conference Paper
Recently, cloud computing, as one of the hottest terms in the IT world, has drawn great attention. Many IT companies, such as IBM, Google, Amazon, Microsoft, Yahoo, and others, are vigorously developing cloud computing systems and related products for customers. However, there are still some difficulties for customers to adopt cloud computing, in which many secur...

Citations

... In [26], the authors proposed a general framework to find the optimal-cost feasible recovery in a dynamic network topology and investigated the gains for different codes from exploiting network awareness. In addition, some special network topologies are used to improve repair efficiency, as in [27][28][29][30][31]. Therefore, network topology plays an important role in DSSs, which inspires us to explore and exploit the relationships for failure recovery. ...
... Performance Prediction: In order to allocate the required resources and meet job deadlines, much related work focuses on exploiting historic [9,25,30,31,47,48,55,57] and runtime [9,23,25,26,36,47,48,56,57] job information, while other research [9,17,29,42,45,56] focuses on building job performance profiles and scalability models offline. Although effective in many situations, we show that approaches similar to these suffer when used under resource-constrained settings. ...
... Wang et al. [20] presented a method for computing "Imprecise Applications" using MR frameworks. Reduce is run after the Map step in predictable MapReduce applications. ...
... The Internet plus community work service system mainly exists in the form of platform in the ecosystem. For the business interaction system, it provides the trained online model prediction function, and for the full use of the company's cloud computing [28] resources and large-scale distributed storage [29], it can improve the model training speed by training the machine learning model on the cloud platform. Through this system, we encapsulate the original algorithm modeling process which needs manual processing in every step, realize the automation of the whole process, greatly simplify the process of machine learning model, reduce the threshold setting of machine learning model, make the knowledge background without server-side developers easily fit the business demand modeling, and greatly reduce the development cost model establishment. ...
... In contrast, SC-IPaaS does not consider context management and only considers stationary sensors, which influences data reliability. [124]: To integrate WSNs with cloud computing, a three-tier architecture is proposed. In multi-tier frameworks, the infrastructure layer comprises physical IT resources such as memory, mainframes, clusters, and computer networks for the deployment of a requested framework. ...
... A five-element QoS model including execution time, price and credit is proposed in the literature [3]; in the literature [4], a QoS description model for Web service is obtained; literatures [5,6] describe the QoS meta-model and QoS attribute of Web service, and use the service publication and acquisition mechanism with QoS constraint information, the QoS evaluation method and the three-dimensional QoS model to support the QoS-based service selection. For QoS models proposed for application in different fields, the QoS attributes concerned are also different [7]. This study on risk evaluation aims at taking the risk level as the core factor of QoS constraint model to provide risk controllable service for QoS service selection strategy under the SLA. ...
... Indeed, the Grid application to the municipality scale can help understand power balances among municipalities, thus providing an in-depth knowledge of its dynamics. On the other hand, it is oriented to reduce the Grid's redundancy, intended as the presence of more than one indicator providing the same piece of information (Huang et al. 2015). More in detail, this task mainly addresses temporal redundancy and, thus, rejects indicators occurring twice with different time horizons when their simultaneous presence does not help in understanding the ongoing territorial dynamics (Table 2). ...
... In addition, SMEs are at risk of losing a lot of their customers' data over the years because of a weakness in their conventional systems when compared to Cloud computing systems. Thus, using Cloud services that offer a back-up service, like Amazon's S3, would prevent such losses [5]. Using Cloud computing resources has become like using utility services such as electricity, water and gas, all of which are available upon request with payment for use [6]. ...
... Cloud model based on spatial data processing [43], Hadoop and Map Reduce based localized geospatial computing model [44] Storage and easy management of huge sets of geospatial data [45] Efficient sharing and integration of geospatial resources Air Traffic Management Software as a Service Global ATM system and standardized working procedures [46] Development of global ATM system and standardized working procedures [47] Big Data Processing Experimental results based on Hadoop and Map Reduce framework [48] Management of enormous data [49] and extracting useful information from the large data [50]. ...
Reference: IJCSIS pdf
... Sensors also find a variety of uses in military and industrial applications, and are typically micro devices which can be embedded in various places and are not detectable with the naked eye. They transmit huge amounts of data to the sink node, which collects the data from several nodes and subsequently sends it to the cloud for processing and extraction of useful information for the betterment of society [34][35][36][37][38][39][40][41][42][43][44][45][46][47][48][49][50]. ...