Jinjun Chen

Hunan University of Science and Technology, Xiangtan, Hunan, China

Publications (122)

  • Xiao Liu, Yun Yang, Dong Yuan, Jinjun Chen
    ABSTRACT: Scientific processes are usually time constrained with overall deadlines and local milestones. In scientific workflow systems, due to the dynamic nature of the underlying computing infrastructures such as grid and cloud, execution delays often take place and result in a large number of temporal violations. Since temporal violation handling is expensive in terms of both monetary costs and time overheads, an essential question arises: “do we need to handle every temporal violation in scientific workflow systems?” The answer would be “true” according to existing work on workflow temporal management, which adopts a philosophy similar to the handling of functional exceptions, that is, every temporal violation should be handled whenever it is detected. However, based on our observation, the phenomenon of self-recovery, where execution delays can be automatically compensated for by the saved execution time of subsequent workflow activities, has been entirely overlooked. Therefore, considering the nonfunctional nature of temporal violations, our answer is “not necessarily true.” To take advantage of self-recovery, this article proposes a novel adaptive temporal violation handling point selection strategy in which this phenomenon is effectively utilised to avoid unnecessary temporal violation handling. Based on simulations of both real-world scientific workflows and randomly generated test cases, the experimental results demonstrate that our strategy can significantly reduce the cost of temporal violation handling by over 96% while maintaining an extremely low violation rate under normal circumstances.
    ACM Transactions on Software Engineering and Methodology (TOSEM). 02/2014; 23(1).
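The self-recovery idea above lends itself to a simple illustration: only treat a detected delay as a handling point when the expected time savings of the remaining activities cannot absorb it. The sketch below is a toy interpretation of that idea, not the paper's strategy; the function name, the (mean, allowed) activity model, and the confidence discount are assumptions.

```python
def needs_handling(accumulated_delay, remaining_activities, confidence=0.9):
    """Decide whether a temporal violation must be handled now.

    remaining_activities: list of (mean_duration, allowed_duration) tuples.
    The slack a future activity is expected to give back is the gap between
    its allowed duration and its mean duration, discounted by `confidence`.
    """
    expected_recovery = sum(
        confidence * (allowed - mean) for mean, allowed in remaining_activities
    )
    return accumulated_delay > expected_recovery


# Example: three remaining activities with (mean, allowed) durations in time units.
remaining = [(10, 14), (20, 26), (15, 30)]
print(needs_handling(30, remaining))   # True  -> delay too large, invoke handling
print(needs_handling(5, remaining))    # False -> rely on self-recovery
```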
  • ABSTRACT: A large number of cloud services require users to share private data like electronic health records for data analysis or mining, bringing privacy concerns. Anonymizing data sets via generalization to satisfy certain privacy requirements such as k-anonymity is a widely used category of privacy-preserving techniques. At present, the scale of data in many cloud applications is increasing tremendously in accordance with the Big Data trend, making it a challenge for commonly used software tools to capture, manage, and process such large-scale data within a tolerable elapsed time. As a result, it is a challenge for existing anonymization approaches to achieve privacy preservation on privacy-sensitive large-scale data sets due to their insufficient scalability. In this paper, we propose a scalable two-phase top-down specialization (TDS) approach to anonymize large-scale data sets using the MapReduce framework on cloud. In both phases of our approach, we deliberately design a group of innovative MapReduce jobs to accomplish the specialization computation in a highly scalable way. Experimental evaluation results demonstrate that with our approach, the scalability and efficiency of TDS can be significantly improved over existing approaches.
    IEEE Transactions on Parallel and Distributed Systems 01/2014; 25(2):363-373. · 1.80 Impact Factor
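For readers unfamiliar with how MapReduce fits an anonymization workload, the toy sketch below runs a map and a reduce phase on a single machine to count quasi-identifier groups, which is the kind of aggregate a specialization step needs in order to check k-anonymity. The data, attribute names, and helper functions are illustrative; this is not the two-phase algorithm from the paper.

```python
from collections import defaultdict

def map_phase(records, qi_attrs):
    """Emit one (quasi-identifier tuple, 1) pair per record."""
    for rec in records:
        yield tuple(rec[a] for a in qi_attrs), 1

def reduce_phase(pairs):
    """Sum the counts per quasi-identifier group."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return counts

def satisfies_k_anonymity(records, qi_attrs, k):
    groups = reduce_phase(map_phase(records, qi_attrs))
    return all(count >= k for count in groups.values())

records = [
    {"age": "30-39", "zip": "200*", "disease": "flu"},
    {"age": "30-39", "zip": "200*", "disease": "cold"},
    {"age": "40-49", "zip": "201*", "disease": "flu"},
]
print(satisfies_k_anonymity(records, ["age", "zip"], k=2))   # False: one group has only 1 record
```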
  • ABSTRACT: MapReduce is regarded as an adequate programming model for large-scale data-intensive applications. The Hadoop framework is a well-known MapReduce implementation that runs MapReduce tasks on a cluster system. G-Hadoop is an extension of the Hadoop MapReduce framework that allows MapReduce tasks to run on multiple clusters in a Grid environment. However, G-Hadoop simply reuses the user authentication and job submission mechanism of Hadoop, which is designed for a single cluster and hence is not suited to the Grid environment. This work proposes a new security model for G-Hadoop. The security model is based on several security solutions, such as public key cryptography and the SSL protocol, and is specifically designed for distributed environments like the Grid. This security framework simplifies the user authentication and job submission process of the current G-Hadoop implementation with a single-sign-on approach. In addition, the designed security framework provides a number of different security mechanisms to protect the G-Hadoop system from traditional attacks as well as abuse and misuse.
    Journal of Computer and System Sciences 01/2014; · 1.00 Impact Factor
  • ABSTRACT: It is well known that processing big graph data can be costly on Cloud. Processing big graph data introduces complex and multiple iterations that raise challenges such as parallel memory bottlenecks, deadlocks, and inefficiency. To tackle these challenges, we propose a novel technique for effectively processing big graph data on Cloud. Specifically, the big graph data are compressed based on their spatiotemporal features on Cloud. By exploring spatial data correlation, we partition a graph data set into clusters. Within a cluster, the workload can be shared through inference based on time-series similarity. By exploiting temporal correlation, temporal data compression is conducted within each time series, i.e., along a single graph edge. A novel data-driven scheduling scheme is also developed for data processing optimization. The experimental results demonstrate that the spatiotemporal compression and scheduling achieve significant performance gains in terms of data size and data fidelity loss.
    Journal of Computer and System Sciences 01/2014; · 1.00 Impact Factor
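As a rough illustration of temporal compression along a single graph edge, the sketch below drops samples whose deviation from the last retained value stays within a tolerance, trading data size for bounded fidelity loss. The tolerance rule and values are assumptions, not the paper's compression scheme.

```python
def compress_series(values, tolerance=0.05):
    """Keep a sample only when it differs from the last kept one by more than `tolerance`."""
    if not values:
        return []
    kept = [values[0]]
    for v in values[1:]:
        if abs(v - kept[-1]) > tolerance:
            kept.append(v)
    return kept

edge_weights = [1.00, 1.01, 1.02, 1.30, 1.31, 1.29, 2.00]   # one edge's time series
print(compress_series(edge_weights))   # [1.0, 1.3, 2.0] -- 7 samples reduced to 3
```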
  • Jinjun Chen, Jianxun Liu
    ABSTRACT: From the text: This special issue of the Journal of Computer and System Sciences is devoted to papers on dependable and secure computing based on DASC2011, the 9th IEEE International Conference on Dependable, Autonomic and Secure Computing, held in Sydney, Australia, December 12–14, 2011.
    Journal of Computer and System Sciences 01/2014; · 1.00 Impact Factor
  • ABSTRACT: In big data applications, data privacy is one of the most pressing concerns because processing large-scale privacy-sensitive data sets often requires computation power provided by public cloud services. Sub-tree data anonymization, which achieves a good trade-off between data utility and information loss, is a widely adopted scheme to anonymize data sets for privacy preservation. Top-Down Specialization (TDS) and Bottom-Up Generalization (BUG) are two ways to fulfill sub-tree anonymization. However, existing approaches for sub-tree anonymization lack parallelization capability and thereby lack scalability in handling big data on cloud. Moreover, TDS and BUG each suffer from poor performance for certain values of the k-anonymity parameter. In this paper, we propose a hybrid approach that combines TDS and BUG for efficient sub-tree anonymization over big data. Further, we design MapReduce-based algorithms for the two components (TDS and BUG) to gain high scalability by exploiting the powerful computation capability of cloud. Experimental evaluation demonstrates that the hybrid approach significantly improves the scalability and efficiency of the sub-tree anonymization scheme over existing approaches.
    Journal of Computer and System Sciences 01/2014; · 1.00 Impact Factor
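A minimal sketch of the hybrid dispatch idea described above: choose between the TDS and BUG components according to the k-anonymity parameter. Which component performs better over which range of k is an empirical question, so the threshold, its direction, and the two stub functions below are placeholders, not the paper's decision rule.

```python
def top_down_specialization(dataset, k):
    # Placeholder for the MapReduce-based TDS component.
    return f"TDS run on {len(dataset)} records with k={k}"

def bottom_up_generalization(dataset, k):
    # Placeholder for the MapReduce-based BUG component.
    return f"BUG run on {len(dataset)} records with k={k}"

def anonymize_hybrid(dataset, k, k_threshold=50):
    """Dispatch to whichever component is expected to perform better for this k;
    the threshold (and its direction) would be tuned empirically in practice."""
    if k >= k_threshold:
        return top_down_specialization(dataset, k)
    return bottom_up_generalization(dataset, k)

print(anonymize_hybrid(list(range(1000)), k=100))   # dispatches to the TDS stub
```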
  • ABSTRACT: Due to its advantages of cost-effectiveness, on-demand provisioning, and ease of sharing, cloud computing has grown in popularity with the research community for deploying scientific applications such as workflows. Although such interest continues to grow and scientific workflows are widely deployed in collaborative cloud environments that consist of a number of data centers, there is an urgent need for strategies that can place application datasets across globally distributed data centers and schedule tasks according to the data layout to reduce both latency and makespan for workflow execution. In this paper, by utilizing dependencies among datasets and tasks, we propose an efficient data and task co-scheduling strategy that places input datasets in a load-balanced way and, meanwhile, groups the most closely related datasets and tasks together. Moreover, data staging is used to overlap task execution with data transmission in order to shorten the start time of tasks. We build a simulation environment on the Tianhe supercomputer to evaluate the proposed strategy and run simulations with both random and realistic workflows. The results demonstrate that the proposed strategy can effectively improve scheduling performance while reducing the total volume of data transfer across data centers.
    Concurrency and Computation Practice and Experience 12/2013; 25(18). · 0.85 Impact Factor
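The placement half of such a co-scheduling strategy can be illustrated with a simple greedy rule: put each dataset in the data center already holding the largest volume of the data it depends on, subject to a capacity cap for load balance. The sketch below is only that greedy illustration, with made-up dataset sizes and data center names; it is not the strategy evaluated in the paper.

```python
from collections import defaultdict

def place_datasets(datasets, dependencies, centres, capacity):
    """datasets: {name: size}; dependencies: {name: set of related dataset names}."""
    placement, load = {}, defaultdict(int)
    for name, size in sorted(datasets.items(), key=lambda kv: -kv[1]):
        def affinity(c):
            # Volume of this dataset's dependencies already placed at centre c.
            return sum(datasets[d] for d in dependencies.get(name, ())
                       if placement.get(d) == c)
        candidates = [c for c in centres if load[c] + size <= capacity]
        best = max(candidates, key=affinity)
        placement[name] = best
        load[best] += size
    return placement

datasets = {"d1": 40, "d2": 30, "d3": 20}
deps = {"d2": {"d1"}, "d3": {"d1", "d2"}}
print(place_datasets(datasets, deps, ["dc1", "dc2"], capacity=70))
# {'d1': 'dc1', 'd2': 'dc1', 'd3': 'dc2'} -- related data co-located until dc1 is full
```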
  • Jinjun Chen, Jianxun Liu
    Concurrency and Computation Practice and Experience 12/2013; 25(18). · 0.85 Impact Factor
  • ABSTRACT: Big data and cloud computing are two disruptive trends nowadays, providing numerous opportunities to the current information technology industry and research communities while also posing significant challenges. Cloud computing provides powerful and economical infrastructural resources for cloud users to handle ever-increasing data sets in big data applications. However, processing or sharing privacy-sensitive data sets on cloud can engender severe privacy concerns because of multi-tenancy. Data encryption and anonymization are two widely adopted ways to combat privacy breaches. However, encryption is not suitable for data that are processed and shared frequently, and anonymizing big data and managing numerous anonymized data sets are still challenges for traditional anonymization approaches. As such, we propose a scalable and cost-effective framework for privacy preservation over big data on cloud in this paper. The key idea of the framework is that it leverages cloud-based MapReduce to conduct data anonymization and manage anonymized data sets before releasing data to others. The framework provides a holistic conceptual foundation for privacy preservation over big data. Further, a corresponding proof-of-concept prototype system is implemented. Empirical evaluations demonstrate that the framework can anonymize large-scale data sets and manage anonymized data sets in a highly flexible, scalable, efficient, and cost-effective fashion.
    Concurrency and Computation Practice and Experience 12/2013; 25(18). · 0.85 Impact Factor
  • ABSTRACT: We are entering a new era of data explosion, which introduces new problems for big data processing. Current methods for querying streaming XML big data are mostly based on event filtering techniques. It is well known that during filtering, some data items have to be buffered before the filter can make the proper decision on how to deal with them. Furthermore, for a single-filter system, the buffer size often increases exponentially in real applications. Cloud is an ideal platform for big XML data processing with its massive storage and powerful computation capability. In this paper, we propose a new multi-filter strategy for querying streaming XML big data on Cloud. We show that the proposed multi-filter strategy can effectively share and reduce the filtering space and time consumption by fully exploiting the scalability of Cloud. Furthermore, by deploying our multi-filter collaboration technique, the querying systems can together break the limitation of the theoretical concurrency lower bound. The empirical study presented in this paper demonstrates that our multi-filter strategy significantly outperforms single-filter querying.
    Proceedings of the 2013 IEEE 16th International Conference on Computational Science and Engineering; 12/2013
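The core of a multi-filter arrangement is spreading the query workload so that no single filter must buffer the whole candidate set. The sketch below hash-partitions a handful of path queries across two filters and routes a toy element stream through them; the Filter class, the prefix-based matching, and the partitioning rule are simplifications invented for illustration.

```python
class Filter:
    def __init__(self):
        self.queries, self.buffer = [], []

    def accepts(self, element_path):
        # Simplified match: keep an element if any assigned query is a prefix of its path.
        return any(element_path.startswith(q) for q in self.queries)

def partition_queries(queries, n_filters):
    filters = [Filter() for _ in range(n_filters)]
    for q in queries:
        filters[hash(q) % n_filters].queries.append(q)   # hash-based workload split
    return filters

filters = partition_queries(["/a/b", "/a/c", "/d"], n_filters=2)
stream = ["/a/b/e", "/d/f", "/x/y"]
for elem in stream:
    for f in filters:
        if f.accepts(elem):
            f.buffer.append(elem)
print([f.buffer for f in filters])   # each filter buffers only its own share of matches
```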
  • ABSTRACT: In this paper, we investigate how to deal with events arising from uncertain demand in a cloud computing environment involving the Software-as-a-Service provider, the software user, and the cloud resource provider. The variation considered is in the supply and demand of resources. Handling events of uncertain demand ensures resource provisioning and guards against demand fluctuation. We propose resource negotiation and use fuzzy optimization to address demand fluctuation. The results relate to events of uncertain demand in the cloud. The core issue is to model the uncertainties in demand so as to guarantee quality of service.
    Proceedings of the 2013 IEEE 16th International Conference on Computational Science and Engineering; 12/2013
  • ABSTRACT: Data integrity is an important factor to ensure in almost any data- and computation-related context. It is not only one of the qualities of service but also an important part of data security and privacy. With the proliferation of cloud computing and the increasing needs of big data analytics, verification of data integrity becomes increasingly important, especially for outsourced data. Therefore, research topics related to data integrity verification have attracted tremendous interest. Among all the metrics, efficiency and security are two of the most important measures. In this paper, we provide an analysis of authenticator-based efficient data integrity verification. We survey the main aspects of this research problem, summarize the research motivations, methodologies, and main achievements of several representative approaches, and then bring forth a blueprint for possible future developments.
    Proceedings of the 2013 IEEE 16th International Conference on Computational Science and Engineering; 12/2013
  • ABSTRACT: Batch processing in workflows models and schedules activity instances across multiple workflow cases of the same workflow type to optimize business process execution dynamically. Although our previous work has preliminarily investigated its model and implementation, the model design problem still needs to be addressed. Process mining techniques allow for the automated discovery of process models from event logs and have recently received notable attention in research. Following this line of research, this paper proposes an approach to mine batch processing workflow models from event logs by considering the batch processing relations among activity instances in multiple workflow cases. The notion of a batch processing feature and its corresponding mining algorithm are also presented for discovering the batch processing area in the model, using the input and output data of activity instances in events. The algorithms presented in this paper can help to enhance the applicability of existing process mining approaches and broaden the process mining spectrum.
    Concurrency and Computation Practice and Experience 09/2013; 25(13). · 0.85 Impact Factor
  • ABSTRACT: Traditionally, service discovery is carried out in a centralized fashion, which typically suffers from a single point of failure, poor reliability, and poor scalability, to name a few issues. In view of this challenge, a QoS-aware service discovery method for elastic cloud computing in an unstructured peer-to-peer network is investigated in this paper. Concretely speaking, the method operates in two phases, i.e., a service registration phase and a service discovery phase. More specifically, a peer node in the unstructured peer-to-peer network first registers its functional and non-functional information with its neighbors by flooding. With the registered information, QoS-aware service discovery is then performed by probabilistic flooding according to the network traffic. Finally, extensive simulations are conducted to evaluate the feasibility of our method.
    Concurrency and Computation Practice and Experience 09/2013; 25(13). · 0.85 Impact Factor
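A minimal sketch of probabilistic flooding, assuming a forwarding probability that decays with observed traffic load: each neighbour receives the query with that probability, which bounds discovery overhead relative to plain flooding. The decay rule and node names are assumptions, not the paper's protocol.

```python
import random

def forward_query(query, neighbours, traffic_load, visited, max_prob=1.0):
    """Forward `query` to a random subset of neighbours; the heavier the observed
    traffic, the lower the forwarding probability (assumed decay rule)."""
    prob = max_prob / (1.0 + traffic_load)
    targets = []
    for n in neighbours:
        if n not in visited and random.random() < prob:
            targets.append(n)
            visited.add(n)
    return targets

visited = {"n0"}   # the querying node itself
print(forward_query("cpu>=8,qos=gold", ["n1", "n2", "n3"],
                    traffic_load=0.5, visited=visited))   # a random subset of n1..n3
```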
  • Wanchun Dou, Qi Chen, Jinjun Chen
    ABSTRACT: Distributed Denial-of-Service (DDoS) attacks are a major threat to cloud environments. Traditional defense approaches cannot be easily applied to cloud security due to their relatively low efficiency and large storage requirements, to name a few issues. In view of this challenge, a Confidence-Based Filtering method, named CBF, is investigated for the cloud computing environment in this paper. Concretely speaking, the method operates in two periods, i.e., the non-attack period and the attack period. More specifically, legitimate packets are collected in the non-attack period to extract attribute pairs and generate a nominal profile. With the nominal profile, the CBF method calculates the score of each particular packet in the attack period to determine whether to discard it or not. Finally, extensive simulations are conducted to evaluate the feasibility of the CBF method. The results show that CBF has a high scoring speed, a small storage requirement, and acceptable filtering accuracy. It specifically satisfies the real-time filtering requirements of the cloud environment.
    Future Generation Computer Systems 09/2013; 29(7):1838–1850. · 1.86 Impact Factor
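To make the two-period idea concrete, the simplified sketch below builds a nominal profile of attribute-pair frequencies from legitimate traffic and scores a packet by the confidence of its pairs, discarding it when the score falls below a threshold. The attribute choices, the scoring formula, and the threshold are illustrative stand-ins, not the exact CBF definitions.

```python
from collections import Counter
from itertools import combinations

def build_profile(legitimate_packets, attrs):
    """Record how often attribute-value pairs co-occur in legitimate packets."""
    profile = Counter()
    for pkt in legitimate_packets:
        for a, b in combinations(attrs, 2):
            profile[(a, pkt[a], b, pkt[b])] += 1
    total = sum(profile.values())
    return {pair: count / total for pair, count in profile.items()}

def score(packet, profile, attrs):
    """Average confidence of the packet's attribute pairs; unseen pairs score zero."""
    pairs = [(a, packet[a], b, packet[b]) for a, b in combinations(attrs, 2)]
    return sum(profile.get(p, 0.0) for p in pairs) / len(pairs)

attrs = ["ttl", "proto", "pkt_size"]
legit = [{"ttl": 64, "proto": "TCP", "pkt_size": 512}] * 90 + \
        [{"ttl": 128, "proto": "UDP", "pkt_size": 64}] * 10
profile = build_profile(legit, attrs)

suspicious = {"ttl": 1, "proto": "UDP", "pkt_size": 1500}
print(score(suspicious, profile, attrs) < 0.01)   # True -> discard during an attack period
```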
  • ABSTRACT: Cloud computing provides massive computation power and storage capacity that enable users to deploy applications without infrastructure investment. Many privacy-sensitive applications like health services are built on cloud for economic benefits and operational convenience. Usually, data sets in these applications are anonymized to ensure data owners' privacy, but the privacy requirements can be potentially violated when new data join over time. Most existing approaches address this problem by re-anonymizing all data sets from scratch after an update or by anonymizing the new data incrementally according to the already anonymized data sets. However, privacy preservation over incremental data sets is still challenging in the context of cloud because most data sets are of huge volume and distributed across multiple storage nodes. Existing approaches suffer from poor scalability and inefficiency because they are centralized and access all data frequently when updates occur. In this paper, we propose an efficient quasi-identifier-index-based approach to ensure privacy preservation and achieve high data utility over incremental and distributed data sets on cloud. Quasi-identifiers, which represent the groups of anonymized data, are indexed for efficiency. An algorithm is designed to realize our approach accordingly. Evaluation results demonstrate that with our approach, the efficiency of privacy preservation on large-volume incremental data sets can be improved significantly over existing approaches.
    Journal of Computer and System Sciences 08/2013; 79(5):542–555. · 1.00 Impact Factor
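A hypothetical sketch of the indexing idea: key anonymized groups by their generalized quasi-identifier so that an incremental insert touches only the affected group, and only under-populated groups need further generalization or suppression. The class, the generalization rule, and k are assumptions for illustration, not the paper's algorithm.

```python
class QIIndex:
    def __init__(self, k):
        self.k = k
        self.groups = {}                       # generalized QI tuple -> record list

    def insert(self, record, generalize):
        key = generalize(record)               # e.g. ("30-39", "200*")
        self.groups.setdefault(key, []).append(record)

    def unsafe_groups(self):
        """Groups still below k that need further generalization or suppression."""
        return [key for key, recs in self.groups.items() if len(recs) < self.k]

# Assumed generalization: 10-year age bands and 3-digit zip prefixes.
generalize = lambda r: (f"{r['age'] // 10 * 10}-{r['age'] // 10 * 10 + 9}",
                        r["zip"][:3] + "*")
index = QIIndex(k=2)
for rec in ({"age": 34, "zip": "2007"}, {"age": 37, "zip": "2008"},
            {"age": 52, "zip": "2019"}):
    index.insert(rec, generalize)
print(index.unsafe_groups())                   # [('50-59', '201*')]
```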
  • Chang Liu, Xuyun Zhang, Chi Yang, Jinjun Chen
    ABSTRACT: Instead of purchasing and maintaining their own computing infrastructure, scientists can now run data-intensive scientific applications in a hybrid environment such as cloud computing by leveraging its vast storage and computation capabilities. During the scheduling of such scientific applications for execution, various computation data flows occur between the controller and the computing server instances. Amongst the various quality-of-service (QoS) metrics, data security is always one of the greatest concerns to scientists because their data may be intercepted or stolen by malicious parties during those data flows, especially in less secure hybrid cloud systems. A typical existing method for addressing this issue is to apply the Internet Key Exchange (IKE) scheme to generate and exchange session keys, and then to use these keys for symmetric-key encryption of those data flows. However, the IKE scheme suffers from low efficiency due to its asymmetric-key cryptographic operations, given the large amount of data and high-density operations that characterize scientific applications. In this paper, we propose Cloud Computing Background Key Exchange (CCBKE), a novel authenticated key exchange scheme that aims at efficient security-aware scheduling of scientific applications. Our scheme is designed based on the randomness-reuse strategy and the IKE scheme. Theoretical analyses and experimental results demonstrate that, compared with the IKE scheme, our CCBKE scheme can significantly improve efficiency by dramatically reducing time consumption and computation load without sacrificing the level of security.
    Future Generation Computer Systems 07/2013; 29(5):1300–1308. · 1.86 Impact Factor
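The efficiency gain from randomness reuse can be seen in a toy Diffie-Hellman sketch: the controller generates one ephemeral secret and reuses it with every server instance, so its expensive exponentiation happens once per round rather than once per server. The parameters below are deliberately tiny and insecure, the server names are made up, and this shows only the reuse idea, not the CCBKE protocol itself.

```python
import secrets

P = 2**61 - 1       # toy prime for illustration only; real use needs a standard DH group
G = 5

controller_secret = secrets.randbelow(P - 2) + 2
controller_public = pow(G, controller_secret, P)       # computed once, reused for all servers

session_keys = {}
for server in ("worker-1", "worker-2", "worker-3"):    # assumed server instance names
    server_secret = secrets.randbelow(P - 2) + 2
    server_public = pow(G, server_secret, P)
    # Both sides derive the same shared secret; the controller reuses its one secret.
    session_keys[server] = pow(server_public, controller_secret, P)
    assert session_keys[server] == pow(controller_public, server_secret, P)

print(len(session_keys))                               # 3 per-server session keys
```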
  • Shaoqian Zhang, Wanchun Dou, Jinjun Chen
    ABSTRACT: Web service composition lets users create value-added composite Web services from existing services, where top-k composite services help users find a satisfying composite service efficiently. However, with an increasing number of Web services and users' various composition preferences, computing top-k composite services dynamically for different users is difficult. In view of this challenge, a top-k composite service selection method is proposed, based on a preference-aware service dominance relationship. Concretely speaking, user preferences are first modeled with preference-aware service dominance; then, in local service selection, a multi-index-based algorithm, named Multi-Index, is proposed for computing the candidate services of each task dynamically. Next, in global optimization, combined with a service lattice, top-k composite services are selected under a dominant-number-aware service ranking. Finally, an experiment is presented to verify our method.
    Web Services (ICWS), 2013 IEEE 20th International Conference on; 01/2013
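The dominance relation at the heart of such a method is easy to state: service A dominates B if A is at least as good on every QoS attribute and strictly better on at least one. The sketch below ranks candidates by how many others they dominate and returns the top k; the attribute vectors, the "higher is better" normalization, and the ranking rule are assumptions rather than the paper's Multi-Index algorithm.

```python
def dominates(a, b):
    """A dominates B: no worse on every attribute, strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def top_k_by_dominance(services, k):
    """services: {name: QoS vector, all attributes normalized so higher is better}."""
    ranked = sorted(
        services,
        key=lambda s: sum(dominates(services[s], services[o])
                          for o in services if o != s),
        reverse=True,
    )
    return ranked[:k]

services = {
    "s1": (0.9, 0.8, 0.7),    # e.g. (availability, throughput, 1 - normalized price)
    "s2": (0.6, 0.9, 0.5),
    "s3": (0.5, 0.4, 0.3),
}
print(top_k_by_dominance(services, k=2))   # ['s1', 's2']
```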
  • ABSTRACT: The massive computation power and storage capacity of cloud computing systems allow scientists to deploy computation- and data-intensive applications without infrastructure investment, where large application data sets can be stored in the cloud. Based on the pay-as-you-go model, storage strategies and benchmarking approaches have been developed for cost-effectively storing large volumes of generated application data sets in the cloud. However, they are either insufficiently cost-effective for storage or impractical to use at runtime. In this paper, toward achieving the minimum cost benchmark, we propose a novel, highly cost-effective and practical storage strategy that can automatically decide at runtime whether a generated data set should be stored in the cloud. The main focus of this strategy is local optimization of the trade-off between computation and storage, while secondarily also taking users' (optional) storage preferences into consideration. Both theoretical analysis and simulations conducted on general (random) data sets as well as specific real-world applications with Amazon's cost model show that the cost-effectiveness of our strategy is close to or even the same as the minimum cost benchmark, and its efficiency is very high for practical runtime use in the cloud.
    IEEE Transactions on Parallel and Distributed Systems 01/2013; 24(6):1234-1244. · 1.80 Impact Factor
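The runtime decision described above boils down to comparing two cost rates. A minimal sketch under an assumed cost model: keep a generated data set if its monthly storage cost is below the expected monthly cost of regenerating it on demand, with an optional user preference override. The price constant and example numbers are illustrative, not the paper's benchmark algorithm.

```python
def should_store(size_gb, usage_rate_per_month, regeneration_cost_dollars,
                 storage_price_per_gb_month=0.023, user_prefers_storage=False):
    """Return True if storing the data set is the cheaper (or user-preferred) option."""
    monthly_storage_cost = size_gb * storage_price_per_gb_month
    expected_monthly_regen_cost = usage_rate_per_month * regeneration_cost_dollars
    return user_prefers_storage or monthly_storage_cost <= expected_monthly_regen_cost

# A 500 GB intermediate data set used once a month that costs $8 to regenerate:
print(should_store(500, usage_rate_per_month=1, regeneration_cost_dollars=8))
# False -> cheaper to delete it and regenerate on demand
```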
  • Jinjun Chen, Rajiv Ranjan
    Concurrency and Computation Practice and Experience 01/2013; · 0.85 Impact Factor

Publication Stats

628 Citations
44.32 Total Impact Points

Institutions

  • 2014
    • Hunan University of Science and Technology
      Xiangtan, Hunan, China
  • 2012–2014
    • University of Technology Sydney 
      • Faculty of Engineering and Information Technology
      Sydney, New South Wales, Australia
    • Indiana University Bloomington
      Bloomington, Indiana, United States
  • 2004–2012
    • Swinburne University of Technology
      • Faculty of Information & Communication Technologies
      • Centre for Internet Computing and E-Commerce
      Melbourne, Victoria, Australia
  • 2011
    • University of Pittsburgh
      Pittsburgh, Pennsylvania, United States
  • 2010–2011
    • Nanjing University
      • State Key Laboratory for Novel Software Technology
      Nanjing, Jiangsu, China
    • Rochester Institute of Technology
      Rochester, New York, United States