Jinjun Chen

Hunan University of Science and Technology, Hunan, China


Publications (111) · 34.58 Total Impact Points

  • ABSTRACT: A large number of cloud services require users to share private data like electronic health records for data analysis or mining, bringing privacy concerns. Anonymizing data sets via generalization to satisfy certain privacy requirements such as k-anonymity is a widely used category of privacy preserving techniques. At present, the scale of data in many cloud applications increases tremendously in accordance with the Big Data trend, making it a challenge for commonly used software tools to capture, manage, and process such large-scale data within a tolerable elapsed time. As a result, existing anonymization approaches struggle to preserve privacy on privacy-sensitive large-scale data sets due to their lack of scalability. In this paper, we propose a scalable two-phase top-down specialization (TDS) approach to anonymize large-scale data sets using the MapReduce framework on cloud. In both phases of our approach, we deliberately design a group of innovative MapReduce jobs to accomplish the specialization computation in a highly scalable way. Experimental evaluation results demonstrate that with our approach, the scalability and efficiency of TDS can be significantly improved over existing approaches.
    IEEE Transactions on Parallel and Distributed Systems 01/2014; 25(2):363-373. · 1.80 Impact Factor
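    A minimal, self-contained Python sketch of the k-anonymity property that the specialization process must maintain; the taxonomy, attribute values, and data below are illustrative assumptions, not the paper's MapReduce implementation:

    ```python
    from collections import Counter

    # Hypothetical one-level taxonomy mapping specific values to generalized ones.
    TAXONOMY = {"engineer": "professional", "lawyer": "professional",
                "dancer": "artist", "writer": "artist"}

    def generalize(record):
        """Replace each quasi-identifier value with its parent in the taxonomy."""
        return tuple(TAXONOMY.get(v, v) for v in record)

    def is_k_anonymous(records, k):
        """A data set is k-anonymous if every quasi-identifier group has >= k rows."""
        groups = Counter(generalize(r) for r in records)
        return all(count >= k for count in groups.values())

    data = [("engineer",), ("lawyer",), ("dancer",), ("writer",)]
    print(is_k_anonymous(data, k=2))  # True: both generalized groups have 2 rows
    ```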
  • ABSTRACT: MapReduce is regarded as an adequate programming model for large-scale data-intensive applications. The Hadoop framework is a well-known MapReduce implementation that runs MapReduce tasks on a cluster system. G-Hadoop extends the Hadoop MapReduce framework with the ability to run MapReduce tasks on multiple clusters in a Grid environment. However, G-Hadoop simply reuses the user authentication and job submission mechanism of Hadoop, which is designed for a single cluster and is therefore unsuited to the Grid environment. This work proposes a new security model for G-Hadoop. The model builds on security solutions such as public key cryptography and the SSL protocol, and is designed specifically for distributed environments like the Grid. This security framework simplifies the user authentication and job submission process of the current G-Hadoop implementation with a single-sign-on approach. In addition, it provides a number of different security mechanisms to protect the G-Hadoop system from traditional attacks as well as from abuse and misuse.
    Journal of Computer and System Sciences 01/2014; · 1.00 Impact Factor
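    The single-sign-on idea can be sketched as follows. This is an illustration only: an HMAC over a token stands in for the public-key signatures and SSL channels of the actual design, and all names and the key are hypothetical:

    ```python
    import hashlib
    import hmac
    import json
    import time

    # Hypothetical shared secret standing in for the real public-key material.
    MASTER_KEY = b"demo-master-key"

    def issue_token(user, clusters, ttl=3600):
        """Issue one single-sign-on token valid for a set of clusters."""
        payload = json.dumps({"user": user, "clusters": clusters,
                              "exp": time.time() + ttl}).encode()
        tag = hmac.new(MASTER_KEY, payload, hashlib.sha256).hexdigest()
        return payload, tag

    def verify_token(payload, tag):
        """Each cluster verifies the token instead of re-authenticating the user."""
        expected = hmac.new(MASTER_KEY, payload, hashlib.sha256).hexdigest()
        return hmac.compare_digest(tag, expected) and \
            json.loads(payload)["exp"] > time.time()

    payload, tag = issue_token("alice", ["cluster-a", "cluster-b"])
    print(verify_token(payload, tag))  # True: one sign-on, many clusters
    ```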
  • ABSTRACT: In big data applications, data privacy is one of the most pressing concerns because processing large-scale privacy-sensitive data sets often requires computation power provided by public cloud services. Sub-tree data anonymization, which achieves a good trade-off between data utility and information loss, is a widely adopted scheme to anonymize data sets for privacy preservation. Top-Down Specialization (TDS) and Bottom-Up Generalization (BUG) are two ways to fulfill sub-tree anonymization. However, existing approaches for sub-tree anonymization fall short of parallelization capability, and therefore lack scalability in handling big data on cloud. Moreover, TDS and BUG each suffer from poor performance for certain values of the k-anonymity parameter when used individually. In this paper, we propose a hybrid approach that combines TDS and BUG for efficient sub-tree anonymization over big data. Further, we design MapReduce based algorithms for the two components (TDS and BUG) to gain high scalability by exploiting the powerful computation capability of cloud. Experimental evaluation demonstrates that the hybrid approach significantly improves the scalability and efficiency of the sub-tree anonymization scheme over existing approaches.
    Journal of Computer and System Sciences 01/2014; · 1.00 Impact Factor
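    A minimal sketch of the hybrid dispatch idea, with placeholder TDS/BUG components; the direction of the comparison and the threshold on k are our illustrative reading, not the paper's MapReduce algorithms:

    ```python
    def top_down_specialize(records, k):
        # Placeholder: the real component iteratively refines the most
        # general domain values while k-anonymity still holds.
        return records

    def bottom_up_generalize(records, k):
        # Placeholder: the real component iteratively generalizes raw
        # values until every quasi-identifier group reaches size k.
        return records

    def anonymize_hybrid(records, k, k_threshold=20):
        """Dispatch sketch: for small k the result stays close to the raw
        data, so bottom-up generalization needs few steps; for large k the
        result sits near the most general domain, so top-down
        specialization needs few steps. The threshold is illustrative."""
        if k <= k_threshold:
            return bottom_up_generalize(records, k)
        return top_down_specialize(records, k)

    print(anonymize_hybrid([("engineer", 34)], k=5))
    ```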
  • Jinjun Chen, Jianxun Liu
    Journal of Computer and System Sciences 01/2014; · 1.00 Impact Factor
  • Jinjun Chen, Jianxun Liu
    Concurrency and Computation Practice and Experience 12/2013; 25(18). · 0.85 Impact Factor
  • ABSTRACT: Batch processing in workflows models and schedules activity instances across multiple workflow cases of the same workflow type to optimize business process execution dynamically. Although our previous work preliminarily investigated its model and implementation, the model design problem still needs to be addressed. Process mining techniques allow for the automated discovery of process models from event logs and have recently received notable attention. Following this line of research, this paper proposes an approach to mine batch processing workflow models from event logs by considering the batch processing relations among activity instances in multiple workflow cases. The notion of a batch processing feature and its corresponding mining algorithm are also presented for discovering the batch processing area in the model, using the input and output data of activity instances in events. The algorithms presented in this paper can help enhance the applicability of existing process mining approaches and broaden the process mining spectrum.
    Concurrency and Computation Practice and Experience 09/2013; 25(13). · 0.85 Impact Factor
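    As an illustration of grouping activity instances by shared input data to locate candidate batch processing areas, here is a small Python sketch over a hypothetical event log; the field names and grouping rule are assumptions, not the paper's log format or mining algorithm:

    ```python
    from collections import defaultdict

    # Hypothetical event log: (case_id, activity, input_data) triples.
    log = [
        ("c1", "approve", "dept=sales"),
        ("c2", "approve", "dept=sales"),
        ("c3", "approve", "dept=hr"),
        ("c1", "archive", "dept=sales"),
    ]

    def batch_candidates(events, min_size=2):
        """Group instances of the same activity that share input data;
        groups of at least min_size are candidate batch processing areas."""
        groups = defaultdict(list)
        for case, activity, data in events:
            groups[(activity, data)].append(case)
        return {key: cases for key, cases in groups.items()
                if len(cases) >= min_size}

    print(batch_candidates(log))  # {('approve', 'dept=sales'): ['c1', 'c2']}
    ```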
  • Wanchun Dou, Qi Chen, Jinjun Chen
    ABSTRACT: Distributed Denial-of-Service (DDoS) attacks are a major threat to cloud environments. Traditional defense approaches cannot be easily applied in cloud security due to their relatively low efficiency and large storage requirements, among other drawbacks. In view of this challenge, this paper investigates a Confidence-Based Filtering method, named CBF, for the cloud computing environment. The method operates over two periods: a non-attack period and an attack period. During the non-attack period, legitimate packets are collected and attribute pairs are extracted from them to generate a nominal profile. During the attack period, the CBF method calculates the score of each incoming packet against the nominal profile to determine whether to discard it. Finally, extensive simulations are conducted to evaluate the feasibility of the CBF method. The results show that CBF has a high scoring speed, a small storage requirement, and acceptable filtering accuracy, satisfying the real-time filtering requirements of cloud environments.
    Future Generation Computer Systems 09/2013; 29(7):1838–1850. · 1.86 Impact Factor
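    The scoring idea can be sketched in a few lines of Python: build a nominal profile of attribute-pair frequencies from legitimate packets, then score an incoming packet by the mean frequency of its pairs. The attributes, the uniform weighting, and the data are illustrative, not CBF's exact formulas:

    ```python
    from collections import Counter
    from itertools import combinations

    def build_profile(legitimate_packets):
        """Nominal profile: relative frequency of each attribute pair
        observed during the non-attack period."""
        counts = Counter()
        for pkt in legitimate_packets:
            for pair in combinations(sorted(pkt.items()), 2):
                counts[pair] += 1
        total = sum(counts.values())
        return {pair: n / total for pair, n in counts.items()}

    def score(pkt, profile):
        """Confidence score: mean profile frequency of the packet's pairs;
        packets scoring below a chosen threshold would be discarded."""
        pairs = list(combinations(sorted(pkt.items()), 2))
        return sum(profile.get(p, 0.0) for p in pairs) / len(pairs)

    legit = [{"proto": "tcp", "flags": "ack", "ttl": 64}] * 9 + \
            [{"proto": "udp", "flags": "syn", "ttl": 32}]
    profile = build_profile(legit)
    print(score({"proto": "tcp", "flags": "ack", "ttl": 64}, profile))  # high
    print(score({"proto": "udp", "flags": "syn", "ttl": 1}, profile))   # low
    ```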
  • ABSTRACT: Cloud computing provides massive computation power and storage capacity which enable users to deploy applications without infrastructure investment. Many privacy-sensitive applications like health services are built on cloud for economic benefits and operational convenience. Usually, data sets in these applications are anonymized to ensure data owners' privacy, but the privacy requirements can be potentially violated when new data join over time. Most existing approaches address this problem by re-anonymizing all data sets from scratch after each update or by anonymizing the new data incrementally according to the already anonymized data sets. However, privacy preservation over incremental data sets is still challenging in the context of cloud because most data sets are of huge volume and distributed across multiple storage nodes. Existing approaches suffer from poor scalability and inefficiency because they are centralized and access all data frequently when updates occur. In this paper, we propose an efficient quasi-identifier index based approach to ensure privacy preservation and achieve high data utility over incremental and distributed data sets on cloud. Quasi-identifiers, which represent the groups of anonymized data, are indexed for efficiency. An algorithm is designed to realize our approach accordingly. Evaluation results demonstrate that with our approach, the efficiency of privacy preservation on large-volume incremental data sets can be improved significantly over existing approaches.
    Journal of Computer and System Sciences 08/2013; 79(5):542–555. · 1.00 Impact Factor
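    A minimal sketch of the index idea, assuming a plain in-memory dict in place of the paper's distributed index; the grouping, names, and data are illustrative:

    ```python
    # QI index: generalized quasi-identifier tuple -> list of record ids.
    qi_index = {}
    K = 3  # required group size for k-anonymity

    def insert(record_id, generalized_qi):
        """Incremental update: only the record's own group is touched,
        instead of re-anonymizing the whole data set from scratch."""
        qi_index.setdefault(generalized_qi, []).append(record_id)

    def unsafe_groups():
        """Groups below size K would need further generalization or
        suppression before release."""
        return {qi: ids for qi, ids in qi_index.items() if len(ids) < K}

    for i, qi in enumerate([("adult", "NY"), ("adult", "NY"),
                            ("adult", "NY"), ("senior", "CA")]):
        insert(i, qi)
    print(unsafe_groups())  # {('senior', 'CA'): [3]}
    ```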
  • Chang Liu, Xuyun Zhang, Chi Yang, Jinjun Chen
    ABSTRACT: Instead of purchasing and maintaining their own computing infrastructure, scientists can now run data-intensive scientific applications in a hybrid environment such as cloud computing by leveraging its vast storage and computation capabilities. During the scheduling of such scientific applications for execution, various computation data flows occur between the controller and computing server instances. Amongst various quality-of-service (QoS) metrics, data security is always one of the greatest concerns to scientists because their data may be intercepted or stolen by malicious parties during those data flows, especially in less secure hybrid cloud systems. A typical existing method for addressing this issue is to apply the Internet Key Exchange (IKE) scheme to generate and exchange session keys, and then to apply these keys to symmetric-key encryption of those data flows. However, the IKE scheme suffers from low efficiency because its asymmetric-key cryptographic operations are costly for the large data volumes and high-density operations that characterize scientific applications. In this paper, we propose Cloud Computing Background Key Exchange (CCBKE), a novel authenticated key exchange scheme that aims at efficient security-aware scheduling of scientific applications. Our scheme is designed based on the randomness-reuse strategy and the IKE scheme. Theoretical analyses and experimental results demonstrate that, compared with the IKE scheme, our CCBKE scheme can significantly improve efficiency by dramatically reducing time consumption and computation load without sacrificing the level of security.
    Future Generation Computer Systems 07/2013; 29(5):1300–1308. · 1.86 Impact Factor
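    The randomness-reuse idea can be illustrated with toy Diffie-Hellman exchanges: the controller reuses one ephemeral exponent across all server instances instead of generating a fresh one per IKE session. The group parameters are toy values and this is a sketch of the strategy, not CCBKE's actual protocol:

    ```python
    import secrets

    # Toy modulus and generator for illustration only; a real deployment
    # would use a standardized, much larger DH group.
    P = 2**127 - 1
    G = 3

    def controller_round(server_publics):
        """Reuse ONE ephemeral secret across all servers, replacing n
        independent exponent generations with a single one."""
        x = secrets.randbelow(P - 2) + 1   # single reused ephemeral secret
        gx = pow(G, x, P)                  # broadcast to every server
        return gx, [pow(gy, x, P) for gy in server_publics]

    # Each server instance holds its own ephemeral pair (y, g^y).
    ys = [secrets.randbelow(P - 2) + 1 for _ in range(3)]
    gx, controller_keys = controller_round([pow(G, y, P) for y in ys])
    server_keys = [pow(gx, y, P) for y in ys]
    print(controller_keys == server_keys)  # True: both sides agree
    ```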
  • ABSTRACT: In big data applications, data privacy is one of the most pressing concerns because processing large-scale privacy-sensitive data sets often requires computation power provided by public cloud services. Sub-tree data anonymization, which achieves a good trade-off between data utility and distortion, is a widely adopted scheme to anonymize data sets for privacy preservation. Top-Down Specialization (TDS) and Bottom-Up Generalization (BUG) are two ways to fulfill sub-tree anonymization. However, existing approaches for sub-tree anonymization fall short of parallelization capability, and therefore lack scalability in handling big data on cloud. Moreover, both TDS and BUG suffer from poor performance for certain values of the k-anonymity parameter if they are utilized individually. In this paper, we propose a hybrid approach that combines TDS and BUG for efficient sub-tree anonymization over big data. Further, we design MapReduce based algorithms for the two components (TDS and BUG) to gain high scalability by exploiting the powerful computation capability of cloud. Experimental evaluations demonstrate that the hybrid approach significantly improves the scalability and efficiency of the sub-tree anonymization scheme over existing approaches.
    Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on; 01/2013
  • ABSTRACT: As the new-generation distributed computing platform, cloud computing environments offer high efficiency and low cost for data-intensive computation in big data applications. Cloud resources and services are available in a pay-as-you-go mode, which brings extraordinary flexibility and cost-effectiveness with zero investment in one's own computing infrastructure. However, these advantages come at a price: people no longer have direct control over their own data. Consequently, data security becomes a major concern in the adoption of cloud computing. Authenticated Key Exchange (AKE) is essential to a security system that is based on high-efficiency symmetric-key encryption. With virtualization technology applied, existing key exchange schemes such as Internet Key Exchange (IKE) become time-consuming when deployed directly in the cloud computing environment. In this paper we propose a novel hierarchical key exchange scheme, namely Cloud Background Hierarchical Key Exchange (CBHKE). Building on our previous work, CBHKE aims at providing secure and efficient scheduling for the cloud computing environment. In our new scheme, we design a two-phase, layer-by-layer iterative key exchange strategy to achieve more efficient AKE without sacrificing the level of data security. Both theoretical analysis and experimental results demonstrate that when deployed in the cloud computing environment, the proposed scheme is dramatically more efficient than its predecessors, the CCBKE and IKE schemes.
    Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on; 01/2013
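    A sketch of the layer-by-layer intuition, using HMAC-based key derivation as a stand-in for the scheme's actual per-layer exchanges; all identifiers and the topology are hypothetical:

    ```python
    import hashlib
    import hmac

    def derive(parent_key, node_id):
        """Derive a child key from its layer's parent key (HKDF-style sketch)."""
        return hmac.new(parent_key, node_id.encode(), hashlib.sha256).digest()

    # Phase 1: the controller exchanges one key per layer head, not per node.
    root = b"controller-session-key"  # from a single real key exchange
    layer_heads = {h: derive(root, h) for h in ("rack-0", "rack-1")}

    # Phase 2: each head derives keys for its own children locally, so the
    # controller's exchange cost grows with layers, not with total nodes.
    nodes = {(h, n): derive(k, n)
             for h, k in layer_heads.items()
             for n in ("vm-0", "vm-1", "vm-2")}
    print(len(nodes), "node keys from 1 controller exchange and",
          len(layer_heads), "head keys")
    ```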
  • Shaoqian Zhang, Wanchun Dou, Jinjun Chen
    ABSTRACT: Web service composition lets users create value-added composite Web services from existing services, and top-k composite services help users find a satisfactory composite service efficiently. However, with an increasing number of Web services and varied user composition preferences, computing top-k composite services dynamically for different users is difficult. In view of this challenge, a top-k composite service selection method is proposed based on a preference-aware service dominance relationship. Concretely speaking, user preferences are first modeled with the preference-aware service dominance relationship; then, in local service selection, a multi-index based algorithm named Multi-Index is proposed for dynamically computing the candidate services of each task. In global optimization, combined with a service lattice, top-k composite services are selected under a dominance-number-aware service ranking. Finally, an experiment is presented to verify our method.
    Web Services (ICWS), 2013 IEEE 20th International Conference on; 01/2013
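    The dominance relationship at the core of the ranking can be sketched as follows; the QoS attributes and the dominated-count ranking are illustrative simplifications of the paper's preference-aware dominance and service lattice machinery:

    ```python
    def dominates(a, b):
        """Service a dominates b if it is no worse on every QoS attribute
        (lower is better here) and strictly better on at least one."""
        return (all(x <= y for x, y in zip(a, b)) and
                any(x < y for x, y in zip(a, b)))

    def top_k(services, k):
        """Rank candidates by how many other candidates they dominate."""
        counts = [(sum(dominates(s, t) for t in services if t is not s), s)
                  for s in services]
        return [s for _, s in sorted(counts, key=lambda c: -c[0])[:k]]

    # QoS vectors: (latency_ms, cost)
    candidates = [(120, 0.9), (80, 1.2), (80, 0.7), (200, 2.0)]
    print(top_k(candidates, k=2))  # [(80, 0.7), (120, 0.9)]
    ```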
  • ABSTRACT: The massive increase in computing power and data storage capacity provisioned by cloud computing, as well as advances in big data mining and analytics, have expanded the scope of information available to businesses, government, and individuals by orders of magnitude. Meanwhile, privacy protection is one of the most pressing concerns in big data and cloud applications, requiring strong preservation of customer privacy and attracting considerable attention from both the IT industry and academia. Data anonymization provides an effective way to preserve data privacy, and the multidimensional anonymization scheme is widely adopted among existing anonymization schemes. However, existing multidimensional anonymization approaches suffer from severe scalability or IT cost issues when handling big data due to their incapability of fully leveraging cloud resources or being cost-effectively adapted to cloud environments. As such, we propose a scalable multidimensional anonymization approach for big data privacy preservation using MapReduce on cloud. In the approach, a highly scalable median-finding algorithm combining the idea of the median of medians with the histogram technique is proposed, and the recursion granularity is controlled to achieve cost-effectiveness. Corresponding MapReduce jobs are dedicatedly designed, and experimental evaluations demonstrate that with our approach, the scalability and cost-effectiveness of the multidimensional scheme can be improved significantly over existing approaches.
    Cloud and Green Computing (CGC), 2013 Third International Conference on; 01/2013
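    The classic median-of-medians selection that the median-finding algorithm builds on can be sketched in Python; the histogram refinement and the MapReduce decomposition from the paper are omitted, and the data are illustrative:

    ```python
    def median_of_medians(values):
        """Approximate-median pivot with O(n) worst case: the kind of
        pivot used to split each dimension in multidimensional cuts."""
        if len(values) <= 5:
            return sorted(values)[len(values) // 2]
        medians = [sorted(values[i:i + 5])[len(values[i:i + 5]) // 2]
                   for i in range(0, len(values), 5)]
        return median_of_medians(medians)

    ages = [71, 34, 5, 88, 19, 42, 60, 27, 3, 55, 47, 9]
    pivot = median_of_medians(ages)
    left = [a for a in ages if a <= pivot]
    right = [a for a in ages if a > pivot]
    print(pivot, len(left), len(right))  # 42 7 5: a roughly balanced split
    ```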
  • ABSTRACT: Cloud computing provides massive computation power and storage capacity which enable users to deploy computation- and data-intensive applications without infrastructure investment. During the processing of such applications, a large volume of intermediate data sets will be generated, and these are often stored to save the cost of recomputing them. However, preserving the privacy of intermediate data sets becomes a challenging problem because adversaries may recover privacy-sensitive information by analyzing multiple intermediate data sets. Encrypting all data sets in cloud is widely adopted in existing approaches to address this challenge. But we argue that encrypting all intermediate data sets is neither efficient nor cost-effective because it is very time-consuming and costly for data-intensive applications to en/decrypt data sets frequently while performing operations on them. In this paper, we propose a novel upper bound privacy leakage constraint-based approach to identify which intermediate data sets need to be encrypted and which do not, so that the privacy-preserving cost can be reduced while the privacy requirements of data holders can still be satisfied. Evaluation results demonstrate that the privacy-preserving cost of intermediate data sets can be significantly reduced with our approach over existing ones where all data sets are encrypted.
    IEEE Transactions on Parallel and Distributed Systems 01/2013; 24(6):1192-1202. · 1.80 Impact Factor
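    A greedy sketch of the constraint idea: keep unencrypted the data sets that are expensive to encrypt but leak little, as long as the total leakage of unencrypted sets stays under an upper bound. The scoring, the independence assumption between data sets, and all numbers are illustrative, not the paper's leakage model:

    ```python
    def choose_plaintext(datasets, leakage_bound):
        """Leave unencrypted the sets with the best cost-per-leakage ratio
        while their cumulative leakage stays within the bound."""
        order = sorted(datasets, key=lambda d: d["cost"] / d["leakage"],
                       reverse=True)
        plaintext, leaked = [], 0.0
        for d in order:
            if leaked + d["leakage"] <= leakage_bound:
                plaintext.append(d["name"])
                leaked += d["leakage"]
        return plaintext  # everything else gets encrypted

    sets = [{"name": "d1", "leakage": 0.10, "cost": 8.0},
            {"name": "d2", "leakage": 0.40, "cost": 2.0},
            {"name": "d3", "leakage": 0.05, "cost": 5.0}]
    print(choose_plaintext(sets, leakage_bound=0.2))  # ['d3', 'd1']
    ```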
  • ABSTRACT: The massive computation power and storage capacity of cloud computing systems allow scientists to deploy computation- and data-intensive applications without infrastructure investment, where large application data sets can be stored in the cloud. Based on the pay-as-you-go model, storage strategies and benchmarking approaches have been developed for cost-effectively storing large volumes of generated application data sets in the cloud. However, they are either insufficiently cost-effective for storage or impractical to use at runtime. In this paper, toward achieving the minimum cost benchmark, we propose a novel, highly cost-effective and practical storage strategy that can automatically decide at runtime whether a generated data set should be stored in the cloud. The main focus of this strategy is local optimization of the tradeoff between computation and storage, while secondarily taking users' (optional) storage preferences into consideration. Both theoretical analysis and simulations conducted on general (random) data sets as well as specific real world applications with Amazon's cost model show that the cost-effectiveness of our strategy is close to or even the same as the minimum cost benchmark, and its efficiency is high enough for practical runtime use in the cloud.
    IEEE Transactions on Parallel and Distributed Systems 01/2013; 24(6):1234-1244. · 1.80 Impact Factor
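    The local tradeoff at the heart of such a strategy can be stated as a one-line cost comparison; the rates below are illustrative, and the real strategy also accounts for dependencies between data sets and user preferences:

    ```python
    def should_store(storage_cost_per_hour, generation_cost, accesses_per_hour):
        """Store a data set if keeping it is cheaper than regenerating it
        on every access (a minimal sketch of the store-vs-recompute rule)."""
        regeneration_cost_per_hour = generation_cost * accesses_per_hour
        return storage_cost_per_hour < regeneration_cost_per_hour

    # $0.01/hour to store vs a $0.50 computation accessed 0.1 times/hour.
    print(should_store(0.01, 0.50, 0.1))    # True: 0.01 < 0.05
    # The same data set accessed once a week tips the other way.
    print(should_store(0.01, 0.50, 1/168))  # False: 0.01 > ~0.003
    ```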
  • Jinjun Chen, Rajiv Ranjan
    Concurrency and Computation Practice and Experience 01/2013; · 0.85 Impact Factor
  • Gaofeng Zhang, Yun Yang, Jinjun Chen
    ABSTRACT: Cloud computing promises a service-oriented environment where customers can utilise IT services in a pay-as-you-go fashion while saving huge capital investments on their own IT infrastructures. Due to the openness of these environments, malicious service providers may exist. Such providers could record service data about a customer across cloud service processes and then collectively deduce the customer's private information without authorisation. Noise obfuscation is an effective approach in this regard: for example, it can generate and inject noise service requests among real customer service requests so that service providers cannot distinguish which requests are real. However, typical existing noise obfuscations do not consider a customer-defined privacy-leakage-tolerance in the obfuscation process. Specifically, cloud customers could define a bound on the probability of privacy leakage that noise obfuscation must respect. Under this privacy-leakage-tolerance boundary, the efficiency of noise obfuscation can be improved, for example by reducing the number of noise service requests injected among real ones. The customer then incurs a lower cost for noise data in the pay-as-you-go fashion, while retaining reasonably effective privacy protection. To address this concern, we present a novel noise enhancing strategy. We first analyse the privacy-leakage-tolerance of cloud customers in terms of noise generation. Then we construct a noise generation set based on the privacy-leakage-tolerance, which guides and enhances existing noise generation strategies within this boundary. Finally, we present our privacy-leakage-tolerance based noise enhancing strategy for privacy protection in cloud computing. Simulation results demonstrate that our strategy can significantly improve the efficiency of privacy protection over existing noise obfuscations in cloud environments.
    Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on; 01/2013
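    The relationship between the leakage tolerance and the amount of noise can be sketched under the simplest possible model, in which a provider's chance of picking out a real request equals the real-request fraction; this simplification is ours, not the paper's model:

    ```python
    import math

    def noise_needed(real_requests, tolerance):
        """If a provider sees r real and n noise requests, the chance that
        a given request is real is r / (r + n); return the smallest n
        keeping that probability at or below the tolerance."""
        return max(0, math.ceil(real_requests / tolerance) - real_requests)

    for tol in (0.5, 0.2, 0.05):
        n = noise_needed(100, tol)
        print(f"tolerance={tol}: inject {n} noise requests "
              f"(real fraction={100 / (100 + n):.2f})")
    ```

    A looser tolerance thus directly translates into fewer injected noise requests, which is exactly the cost saving the strategy targets.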
  • ABSTRACT: With the increasing popularity of cloud computing technologies, more and more service composition processes are enacted and executed in cloud environments. Compared with the varied and virtually unlimited application requirements of end users, the Web services held by a cloud platform are usually limited. Therefore, developing a service composition is often challenging when only some of the functionally qualified candidate services can be found inside a cloud platform. In this situation, the absent services must be invoked in a cross-platform way from outside the cloud platform. In view of this challenge, a QoS-aware composition method is investigated for supporting cross-platform service invocation in the cloud environment. Furthermore, experiments are conducted to evaluate the method presented in this paper.
    Journal of Computer and System Sciences 09/2012; 78(5):1316–1329. · 1.00 Impact Factor
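    A minimal sketch of penalizing cross-platform invocations during candidate selection; the penalty factor, score form, and service data are illustrative assumptions, not the paper's QoS model:

    ```python
    def pick_service(candidates, external_penalty=1.5):
        """Choose the candidate with the best penalized QoS score;
        cross-platform (external) invocations pay a penalty factor."""
        def score(c):
            base = c["latency"] + 100 * c["cost"]
            return base * (external_penalty if c["external"] else 1.0)
        return min(candidates, key=score)

    task_candidates = [
        {"name": "s1-internal", "latency": 90, "cost": 0.5, "external": False},
        {"name": "s2-external", "latency": 40, "cost": 0.3, "external": True},
    ]
    print(pick_service(task_candidates)["name"])  # s2-external wins despite penalty
    ```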
  • Jinjun Chen, Lizhe Wang
    Journal of Computer and System Sciences 09/2012; 78(5):1279. · 1.00 Impact Factor

Publication Stats

492 Citations
2k Downloads
34.58 Total Impact Points

Institutions

  • 2014
    • Hunan University of Science and Technology
Hunan, China
  • 2012–2014
    • University of Technology Sydney 
      • Faculty of Engineering and Information Technology
      Sydney, New South Wales, Australia
    • Indiana University Bloomington
      Bloomington, Indiana, United States
  • 2004–2012
    • Swinburne University of Technology
      • Faculty of Information & Communication Technologies
      • Centre for Internet Computing and E-Commerce
      Melbourne, Victoria, Australia
  • 2011
    • University of Pittsburgh
      Pittsburgh, Pennsylvania, United States
  • 2010
    • Nanjing University
      • State Key Laboratory for Novel Software Technology
Nanjing, Jiangsu, China
    • Rochester Institute of Technology
      Rochester, New York, United States