Jinjun Chen

Qufu Normal University, Qufu, Shandong, China

Publications (135) · 82.59 Total Impact

  • Source
    ABSTRACT: Social networks, though started as software tools enabling people to connect with each other, have emerged in recent times as platforms for businesses, individuals and government agencies to conduct activities ranging from marketing to emergency situation management. As a result, a large number of social network analytics tools have been developed for a variety of applications. A snapshot of a social network at a particular time, called a social graph, represents the connectivity of nodes and potentially the flow of information amongst the nodes (or vertices) in the graph. Understanding the flow of information in a social graph plays an important role in social network applications. Two specific problems related to information flow have implications for many social network applications: (a) finding a minimum set of nodes one has to know to recover the whole graph (also known as the vertex cover problem) and (b) determining the minimum set of nodes required to reach all nodes in the graph within a specific number of hops (we refer to this as the vertex reach problem). Finding an optimal solution to these problems is NP-hard. In this paper, we propose approximation-based approaches and show that they outperform existing approaches through both theoretical analysis and experimental results.
    IEEE BigData Congress 2015, New York, USA; 06/2015
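    A minimal sketch of a baseline for the vertex cover problem named in the entry above: the textbook greedy 2-approximation that picks any uncovered edge and takes both endpoints. This is not the authors' approximation approach, only a standard reference point; the edge-list representation is an assumption.

      def greedy_vertex_cover(edges):
          """Return a set of vertices covering every edge in `edges`.

          edges: iterable of (u, v) pairs from the social graph.
          The result is at most twice the size of an optimal cover.
          """
          cover = set()
          for u, v in edges:
              # An edge is already covered if either endpoint is in the cover.
              if u not in cover and v not in cover:
                  cover.add(u)
                  cover.add(v)
          return cover

      # Toy social graph:
      toy_edges = [("alice", "bob"), ("bob", "carol"), ("carol", "dave")]
      print(greedy_vertex_cover(toy_edges))  # -> {'alice', 'bob', 'carol', 'dave'} (order may vary)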
  •
    ABSTRACT: Big sensor data is prevalent in both industry and scientific research applications, where data is generated with such high volume and velocity that it is difficult to process using on-hand database management tools or traditional data processing applications. Cloud computing provides a promising platform for addressing this challenge, as it offers a flexible stack of massive computing, storage, and software services in a scalable manner at low cost. Some techniques have been developed in recent years for processing sensor data on cloud, such as sensor-cloud. However, these techniques do not efficiently support fast detection and location of errors in big sensor data sets. In this paper, we develop a novel data error detection approach for big sensor data sets which exploits the full computation potential of the cloud platform and the network features of WSNs. First, a set of sensor data error types is classified and defined. Based on this classification, the network features of a clustered WSN are introduced and analysed to support fast error detection and location. Specifically, in our proposed approach, error detection is based on the scale-free network topology, and most detection operations can be conducted on limited temporal or spatial data blocks instead of the whole big data set. Hence the detection and location process can be dramatically accelerated. Furthermore, the detection and location tasks can be distributed to the cloud platform to fully exploit its computation power and massive storage. Experiments on our U-Cloud cloud computing platform demonstrate that the proposed approach can significantly reduce the time for error detection and location in big data sets generated by large-scale sensor network systems, with acceptable error detection accuracy.
    IEEE Transactions on Parallel and Distributed Systems 02/2015; 26(2):329-339. DOI:10.1109/TPDS.2013.2295810 · 2.17 Impact Factor
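    A toy illustration of the block-local idea in the entry above: readings are scanned in small temporal blocks and flagged when they deviate strongly from the block mean, so no pass over the whole data set is needed. The paper's error taxonomy and detection rules are richer; the block size and threshold below are arbitrary assumptions.

      from statistics import mean, pstdev

      def detect_block_errors(readings, block_size=32, k=3.0):
          """Return indices of suspicious readings, examining one block at a time."""
          suspects = []
          for start in range(0, len(readings), block_size):
              block = readings[start:start + block_size]
              mu, sigma = mean(block), pstdev(block)
              if sigma == 0:
                  continue
              for i, value in enumerate(block):
                  if abs(value - mu) > k * sigma:  # spike-like outlier within the block
                      suspects.append(start + i)
          return suspects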
  •
    ABSTRACT: Service recommender systems have been shown to be valuable tools for providing appropriate recommendations to users. In the last decade, the number of customers and services and the amount of online information have grown rapidly, creating a big data analysis problem for service recommender systems. Consequently, traditional service recommender systems often suffer from scalability and inefficiency problems when processing or analysing such large-scale data. Moreover, most existing service recommender systems present the same ratings and rankings of services to different users without considering their diverse preferences, and therefore fail to meet users' personalized requirements. In this paper, we propose a Keyword-Aware Service Recommendation method, named KASR, to address these challenges. It aims to present a personalized service recommendation list and to recommend the most appropriate services to users effectively. Specifically, keywords are used to indicate users' preferences, and a user-based Collaborative Filtering algorithm is adopted to generate appropriate recommendations. To improve its scalability and efficiency in big data environments, KASR is implemented on Hadoop, a widely adopted distributed computing platform, using the MapReduce parallel processing paradigm. Finally, extensive experiments are conducted on real-world data sets, and the results demonstrate that KASR significantly improves the accuracy and scalability of service recommender systems over existing approaches.
    IEEE Transactions on Parallel and Distributed Systems 12/2014; 25(12):3221-3231. DOI:10.1109/TPDS.2013.2297117 · 2.17 Impact Factor
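    A minimal, single-machine sketch of the keyword-aware idea from the entry above: users are compared by the overlap of their preference keywords (Jaccard similarity) and a service score is predicted from similar users' ratings. KASR runs this kind of computation as MapReduce jobs on Hadoop; the data layout below is a simplifying assumption.

      def jaccard(a, b):
          a, b = set(a), set(b)
          return len(a & b) / len(a | b) if a | b else 0.0

      def predict_rating(target_keywords, target_service, users):
          """users: list of dicts {'keywords': [...], 'ratings': {service: score}}."""
          num = den = 0.0
          for u in users:
              sim = jaccard(target_keywords, u["keywords"])
              if sim > 0 and target_service in u["ratings"]:
                  num += sim * u["ratings"][target_service]  # similarity-weighted rating
                  den += sim
          return num / den if den else None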
  •
    ABSTRACT: It is well known that processing big graph data can be costly on cloud. Processing big graph data involves complex and repeated iterations that raise challenges such as parallel memory bottlenecks, deadlocks, and inefficiency. To tackle these challenges, we propose a novel technique for effectively processing big graph data on cloud. Specifically, the big data is compressed using its spatiotemporal features on cloud. By exploiting spatial data correlation, we partition a graph data set into clusters; within a cluster, the workload can be shared by inference based on time-series similarity. By exploiting temporal correlation, temporal data compression is conducted within each time series of a single graph edge. A novel data-driven scheduling scheme is also developed to optimise data processing. The experimental results demonstrate that the spatiotemporal compression and scheduling achieve significant performance gains in terms of data size and data fidelity loss.
    Journal of Computer and System Sciences 12/2014; 80(8). DOI:10.1016/j.jcss.2014.04.022 · 1.09 Impact Factor
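    A small sketch of lossy temporal compression on a single edge's time series, illustrating only the temporal half of the scheme in the entry above: a value is stored only when it drifts from the last stored value by more than a tolerance, trading data size against bounded fidelity loss. The tolerance and encoding are assumptions, not the paper's exact method.

      def compress_series(values, tolerance=0.05):
          """Return [(index, value)] keeping only points that move beyond `tolerance`."""
          if not values:
              return []
          kept = [(0, values[0])]
          for i, v in enumerate(values[1:], start=1):
              if abs(v - kept[-1][1]) > tolerance:
                  kept.append((i, v))
          return kept

      def decompress_series(kept, length):
          """Rebuild an approximate series by holding the last kept value."""
          out, j = [], 0
          for i in range(length):
              if j + 1 < len(kept) and kept[j + 1][0] <= i:
                  j += 1
              out.append(kept[j][1])
          return out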
  •
    ABSTRACT: Big data is one of the most frequently cited key words in the recent information and communications technology industry. As the new generation of distributed computing platforms, cloud environments offer high efficiency and low cost for the data-intensive storage and computation required by big data applications. Cloud resources and services are available in a pay-as-you-go mode, which brings extraordinary flexibility and cost-effectiveness with minimal investment in users' own computing infrastructure. However, these advantages come at a price: people no longer have direct control over their own data. Data security therefore becomes a major concern in the adoption of cloud computing. Authenticated key exchange is essential to any security system based on high-efficiency symmetric-key encryption. With virtualisation technology applied, existing key exchange schemes such as Internet key exchange become time-consuming when deployed directly in cloud computing environments, especially for large-scale tasks that involve intensive user–cloud interactions, such as scheduling and data auditing. In this paper, we propose a novel hierarchical key exchange scheme, hierarchical key exchange for big data in cloud, which aims to provide efficient security-aware scheduling and auditing for cloud environments. In this scheme, we develop a two-phase, layer-by-layer iterative key exchange strategy to achieve more efficient authenticated key exchange without sacrificing the level of data security. Both theoretical analysis and experimental results demonstrate that, when deployed in cloud environments with diverse server layouts, the efficiency of the proposed scheme is dramatically superior to its predecessors, the cloud computing background key exchange and Internet key exchange schemes.
    Concurrency and Computation Practice and Experience 11/2014; DOI:10.1002/cpe.3426 · 0.78 Impact Factor
  • Source
    Chang Liu, Chi Yang, Xuyun Zhang, Jinjun Chen
    ABSTRACT: As cloud computing is widely adopted for big data processing, data security is becoming one of the major concerns of data owners. Data integrity is an important property to ensure in almost any data- and computation-related context. It is not only one of the qualities of service, but also an important part of data security and privacy. With the proliferation of cloud computing and the increasing need for analytics over big data such as that generated by the Internet of Things, verification of data integrity becomes increasingly important, especially for outsourced data. Research on external data integrity verification has therefore attracted tremendous interest in recent years. Among all the metrics, efficiency and security are the two of greatest concern. In this paper, we provide a big-picture analysis of authenticator-based data integrity verification techniques for cloud and Internet of Things data, examining multiple aspects of the research problem. First, we illustrate the problem by summarizing research motivations and methodologies. Second, we summarize and compare the current achievements of several representative approaches. Finally, we present our view of possible future developments.
    Future Generation Computer Systems 08/2014; DOI:10.1016/j.future.2014.08.007 · 2.64 Impact Factor
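    Authenticator-based verification, in its simplest private form, can be illustrated as below: the data owner keeps a keyed tag (here an HMAC) for each outsourced block and later checks returned blocks against those tags. The schemes surveyed in the entry above (e.g. publicly verifiable or blockless ones) are far more sophisticated; this is only a baseline illustration.

      import hmac, hashlib

      def make_tags(blocks, key):
          """blocks: list of byte strings; key: secret bytes kept by the data owner."""
          return [hmac.new(key, b, hashlib.sha256).digest() for b in blocks]

      def verify_block(block, tag, key):
          """Check a block retrieved from the cloud against its stored tag."""
          return hmac.compare_digest(hmac.new(key, block, hashlib.sha256).digest(), tag)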
  • Source
    Laurence T. Yang, Jinjun Chen
    08/2014; 1:2–3. DOI:10.1016/j.bdr.2014.08.001
  • Source
    ABSTRACT: MapReduce is regarded as an adequate programming model for large-scale data-intensive applications. The Hadoop framework is a well-known MapReduce implementation that runs MapReduce tasks on a cluster system. G-Hadoop extends the Hadoop MapReduce framework so that MapReduce tasks can run on multiple clusters in a Grid environment. However, G-Hadoop simply reuses the user authentication and job submission mechanism of Hadoop, which is designed for a single cluster and hence does not suit the Grid environment. This work proposes a new security model for G-Hadoop. The model builds on established security solutions such as public key cryptography and the SSL protocol, and is designed specifically for distributed environments like the Grid. The security framework simplifies the user authentication and job submission process of the current G-Hadoop implementation with a single sign-on approach. In addition, it provides a number of security mechanisms to protect the G-Hadoop system from traditional attacks as well as from abuse and misuse.
    Journal of Computer and System Sciences 08/2014; DOI:10.1016/j.jcss.2014.02.006 · 1.09 Impact Factor
  •
    ABSTRACT: In big data applications, data privacy is one of the most pressing concerns because processing large-scale privacy-sensitive data sets often requires computation power provided by public cloud services. Sub-tree data anonymization, which achieves a good trade-off between data utility and information loss, is a widely adopted scheme for anonymizing data sets for privacy preservation. Top-Down Specialization (TDS) and Bottom-Up Generalization (BUG) are two ways to fulfil sub-tree anonymization. However, existing approaches to sub-tree anonymization lack parallelization capability and therefore do not scale when handling big data on cloud. Moreover, either TDS or BUG alone suffers from poor performance for certain values of the k-anonymity parameter. In this paper, we propose a hybrid approach that combines TDS and BUG for efficient sub-tree anonymization over big data. Further, we design MapReduce-based algorithms for the two components (TDS and BUG) to gain high scalability by exploiting the powerful computation capability of cloud. Experimental evaluation demonstrates that the hybrid approach significantly improves the scalability and efficiency of the sub-tree anonymization scheme over existing approaches.
    Journal of Computer and System Sciences 08/2014; 80(5). DOI:10.1016/j.jcss.2014.02.007 · 1.09 Impact Factor
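    The property that both TDS and BUG in the entry above aim to satisfy is k-anonymity: every combination of quasi-identifier values must be shared by at least k records. A minimal checker is sketched below; the generalization/specialization search itself (and its MapReduce formulation) is the paper's contribution and is not reproduced here. Record and attribute layouts are assumptions.

      from collections import Counter

      def is_k_anonymous(records, quasi_identifiers, k):
          """records: list of dicts; quasi_identifiers: list of attribute names."""
          groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
          return all(count >= k for count in groups.values())

      # Generalizing an attribute (e.g. replacing exact ages with age ranges) merges
      # small groups until this check passes, at the cost of some information loss.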
  • Jinjun Chen, Jianxun Liu
    ABSTRACT: From the text: This special issue of the Journal of Computer and System Sciences is devoted to papers on dependable and secure computing based on DASC 2011, the 9th IEEE International Conference on Dependable, Autonomic and Secure Computing, held in Sydney, Australia, December 12–14, 2011.
    Journal of Computer and System Sciences 08/2014; 80(5). DOI:10.1016/j.jcss.2014.02.009 · 1.09 Impact Factor
  •
    ABSTRACT: Community cloud computing is an emerging and promising computing model for a specific community with common concerns, such as security, compliance and jurisdiction. It utilizes the spare resources of networked computers to provide facilities so that the community gains services from the cloud. Effective collaboration among community clouds offers powerful computing capacity for complex tasks containing subtasks that need to exchange data. Selecting the group of community clouds that is the most cost-efficient, communication-efficient, secure, and trusted to accomplish a complex task is very challenging. To address this problem, we first formulate a computational model for multi-community-cloud collaboration, namely MC³. The proposed model is then optimized from four aspects: minimizing the sum of access cost and monetary cost, and maximizing the security-level agreement and trust among the community clouds. Furthermore, an efficient and comprehensive selection algorithm is devised to extract the best group of community clouds in MC³. Finally, extensive simulation experiments and performance analysis of the proposed algorithm are conducted. The results demonstrate that the proposed algorithm outperforms the minimal-set-covering-based algorithm and the random algorithm. Moreover, the proposed comprehensive community cloud selection algorithm guarantees good global performance in terms of access cost, monetary cost, security level and trust between user and community clouds.
    IEEE Transactions on Services Computing 07/2014; 7(3):346-358. DOI:10.1109/TSC.2014.2304728 · 1.99 Impact Factor
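    A brute-force sketch of the selection objective in the entry above: every candidate group of community clouds is scored by combining access cost, monetary cost, security level and trust, and the best-scoring group is returned. The MC³ model and its efficient selection algorithm avoid this enumeration; the attribute names and weights here are assumptions.

      from itertools import combinations

      def best_group(clouds, group_size, w_cost=1.0, w_sec=1.0, w_trust=1.0):
          """clouds: list of dicts with 'access_cost', 'monetary_cost', 'security', 'trust'."""
          def score(group):
              cost = sum(c["access_cost"] + c["monetary_cost"] for c in group)
              benefit = sum(w_sec * c["security"] + w_trust * c["trust"] for c in group)
              return benefit - w_cost * cost   # higher is better
          return max(combinations(clouds, group_size), key=score)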
  •
    ABSTRACT: A large number of cloud services require users to share private data like electronic health records for data analysis or mining, bringing privacy concerns. Anonymizing data sets via generalization to satisfy certain privacy requirements such as k-anonymity is a widely used category of privacy-preserving techniques. At present, the scale of data in many cloud applications is increasing tremendously in accordance with the big data trend, making it a challenge for commonly used software tools to capture, manage, and process such large-scale data within a tolerable elapsed time. As a result, existing anonymization approaches struggle to preserve privacy on privacy-sensitive large-scale data sets due to their insufficient scalability. In this paper, we propose a scalable two-phase Top-Down Specialization (TDS) approach to anonymize large-scale data sets using the MapReduce framework on cloud. In both phases of our approach, we deliberately design a group of innovative MapReduce jobs to accomplish the specialization computation in a highly scalable way. Experimental evaluation results demonstrate that with our approach, the scalability and efficiency of TDS can be significantly improved over existing approaches.
    IEEE Transactions on Parallel and Distributed Systems 02/2014; 25(2):363-373. DOI:10.1109/TPDS.2013.48 · 2.17 Impact Factor
  • Xiao Liu, Yun Yang, Dong Yuan, Jinjun Chen
    ABSTRACT: Scientific processes are usually time-constrained, with overall deadlines and local milestones. In scientific workflow systems, due to the dynamic nature of the underlying computing infrastructures such as grid and cloud, execution delays often take place and result in a large number of temporal violations. Since temporal violation handling is expensive in terms of both monetary cost and time overhead, an essential question arises: do we need to handle every temporal violation in scientific workflow systems? The answer would be "true" according to existing work on workflow temporal management, which adopts a philosophy similar to the handling of functional exceptions, namely that every temporal violation should be handled whenever it is detected. However, based on our observation, the phenomenon of self-recovery, where execution delays are automatically compensated for by the saved execution time of subsequent workflow activities, has been entirely overlooked. Therefore, considering the non-functional nature of temporal violations, our answer is "not necessarily true." To take advantage of self-recovery, this article proposes a novel adaptive temporal violation handling point selection strategy in which this phenomenon is effectively utilised to avoid unnecessary temporal violation handling. Based on simulations of both real-world scientific workflows and randomly generated test cases, the experimental results demonstrate that our strategy can reduce the cost of temporal violation handling by over 96% while maintaining an extremely low violation rate under normal circumstances.
    ACM Transactions on Software Engineering and Methodology 02/2014; 23(1). DOI:10.1145/2559938 · 1.47 Impact Factor
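    A hedged sketch of the self-recovery intuition from the entry above: a detected delay only needs handling if the expected time savings of the remaining activities cannot absorb it. The activity model and the probabilistic estimates used by the paper's adaptive selection strategy are simplified away; the function and field names are assumptions.

      def needs_handling(accumulated_delay, remaining_activities):
          """remaining_activities: list of (mean_duration, allocated_duration) pairs.

          If allocated time exceeds the mean duration, the surplus is expected to
          compensate part of the delay (self-recovery); handle only if it cannot."""
          expected_savings = sum(max(allocated - mean, 0.0)
                                 for mean, allocated in remaining_activities)
          return accumulated_delay > expected_savings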
  • Source
    ABSTRACT: With the increase in cloud service providers and the increasing number of compute services offered, migrating an information system to the cloud demands selecting the best mix of compute services and virtual machine (VM) images from an abundance of possibilities. A migration process for web applications therefore has to automate evaluation and, in doing so, ensure that Quality of Service (QoS) requirements are met while satisfying conflicting selection criteria such as throughput and cost. When selecting compute services for multiple connected software components, web application engineers must consider heterogeneous sets of criteria and complex dependencies across multiple layers, which is impossible to resolve manually. The previously proposed CloudGenius framework has proven its capability to support migrations of single-component web applications. In this paper, we address the additional complexity of supporting the migration of multi-component web applications. In particular, we present an evolutionary migration process for web application clusters distributed over multiple locations, and clearly identify the most important criteria relevant to the selection problem. Moreover, we present a multi-criteria selection algorithm based on the Analytic Hierarchy Process (AHP). Because the solution space grows exponentially, we developed a Genetic Algorithm (GA) based approach to cope with the computational complexity of a growing cloud market. Furthermore, a use case example demonstrates CloudGenius' applicability. To conduct experiments, we implemented CumulusGenius, a prototype of the selection algorithm and the GA deployable on Hadoop clusters. Experiments with CumulusGenius give insights into time complexity and the quality of the GA.
    IEEE Transactions on Computers 01/2014; 64(5):1-1. DOI:10.1109/TC.2014.2317188 · 1.47 Impact Factor
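    The entry above relies on AHP for multi-criteria selection. The core AHP step is turning a pairwise comparison matrix of criteria into a weight vector via its principal eigenvector; a standard implementation of that step is sketched below (the surrounding CloudGenius selection model and the GA are not shown, and the example criteria are assumptions).

      import numpy as np

      def ahp_weights(pairwise):
          """pairwise: reciprocal matrix, pairwise[i][j] = importance of i over j."""
          a = np.asarray(pairwise, dtype=float)
          eigvals, eigvecs = np.linalg.eig(a)
          principal = eigvecs[:, np.argmax(eigvals.real)].real  # principal eigenvector
          return principal / principal.sum()                    # normalize to weights

      # Example: three criteria (cost, throughput, latency) compared pairwise.
      w = ahp_weights([[1, 3, 5],
                       [1/3, 1, 2],
                       [1/5, 1/2, 1]])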
  • Lianyong Qi, Wanchun Dou, Jinjun Chen
    ABSTRACT: Cloud computing has demonstrated ever-increasing advantages in flexible service provisioning, attracting attention from large-scale enterprise applications down to small-scale smart uses. For example, more and more multimedia services are moving towards the cloud to better accommodate people's daily use of various cloud-enabled smart devices; many of these services are similar or equivalent in functionality (e.g., more than 1,000 video services sharing similar "video-play" functionality are present in the App Store). In this situation, it is necessary to discriminate between these functionally equivalent multimedia services based on their Quality of Service (QoS) information. However, due to the abundant information in multimedia content, dozens of QoS criteria are often needed to evaluate a multimedia service, which places a heavy burden on users' multimedia service selection. Besides, the QoS criteria of multimedia services are usually not independent but correlated, which traditional selection methods, e.g., simple weighting methods, cannot accommodate well. In view of these challenges, we put forward a multimedia service selection method based on weighted Principal Component Analysis (PCA), i.e., the Weighted PCA-based Multimedia Service Selection Method (W_PCA_MSSM). The advantage of our proposal is two-fold. First, weighted PCA reduces the number of QoS criteria used for evaluation, which simplifies the service selection process. Second, PCA eliminates the correlations between different QoS criteria, which can yield a more accurate service selection result. Finally, the feasibility of W_PCA_MSSM is validated by a set of experiments on the real-world service quality data set QWS Dataset.
    Computing 01/2014; DOI:10.1007/s00607-014-0413-x · 1.06 Impact Factor
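    A compact sketch of the weighted-PCA idea in the entry above: QoS criteria are scaled by user-assigned weights, PCA removes correlations and reduces the criteria to a few components, and services are ranked by their projected scores. The column semantics (larger is better), the weighting scheme and the composite score are assumptions; W_PCA_MSSM is defined more precisely in the paper.

      import numpy as np

      def rank_services(qos, weights, n_components=2):
          """qos: (services x criteria) matrix, larger is better; weights: one per criterion."""
          x = np.asarray(qos, float) * np.asarray(weights, float)   # weight the criteria
          x = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-12)        # standardize columns
          _, s, vt = np.linalg.svd(x, full_matrices=False)          # principal directions
          vt = vt * np.sign(vt.sum(axis=1, keepdims=True))          # orient so "better" is positive
          scores = (x @ vt[:n_components].T) @ s[:n_components]     # variance-weighted composite
          return np.argsort(-scores)                                # indices, best service first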
  • Source
    Jinjun Chen, Jianxun Liu
    Concurrency and Computation Practice and Experience 12/2013; 25(18). DOI:10.1002/cpe.3092 · 0.78 Impact Factor
  •
    ABSTRACT: Due to its cost-effectiveness, on-demand provisioning and ease of sharing, cloud computing has grown in popularity with the research community for deploying scientific applications such as workflows. Although such interest continues to grow and scientific workflows are widely deployed in collaborative cloud environments consisting of a number of data centers, there is an urgent need for strategies that can place application data sets across globally distributed data centers and schedule tasks according to the data layout to reduce both latency and makespan for workflow execution. In this paper, by utilizing the dependencies among data sets and tasks, we propose an efficient data and task co-scheduling strategy that places input data sets in a load-balanced way while grouping the most related data sets and tasks together. Moreover, data staging is used to overlap task execution with data transmission in order to shorten the start time of tasks. We build a simulation environment on the Tianhe supercomputer to evaluate the proposed strategy and run simulations with both random and realistic workflows. The results demonstrate that the proposed strategy can effectively improve scheduling performance while reducing the total volume of data transferred across data centers.
    Concurrency and Computation Practice and Experience 12/2013; 25(18). DOI:10.1002/cpe.3084 · 0.78 Impact Factor
  •
    ABSTRACT: Big data and cloud computing are two disruptive trends, providing numerous opportunities to the current information technology industry and research communities while also posing significant challenges. Cloud computing provides powerful and economical infrastructural resources for cloud users to handle the ever-increasing data sets in big data applications. However, processing or sharing privacy-sensitive data sets on cloud can engender severe privacy concerns because of multi-tenancy. Data encryption and anonymization are two widely adopted ways to combat privacy breaches. However, encryption is not suitable for data that are processed and shared frequently, and anonymizing big data and managing numerous anonymized data sets remain challenges for traditional anonymization approaches. We therefore propose a scalable and cost-effective framework for privacy preservation over big data on cloud. The key idea of the framework is to leverage cloud-based MapReduce to conduct data anonymization and to manage anonymized data sets before releasing data to others. The framework provides a holistic conceptual foundation for privacy preservation over big data, and a corresponding proof-of-concept prototype system has been implemented. Empirical evaluations demonstrate that the framework can anonymize large-scale data sets and manage anonymized data sets in a highly flexible, scalable, efficient, and cost-effective fashion.
    Concurrency and Computation Practice and Experience 12/2013; 25(18). DOI:10.1002/cpe.3083 · 0.78 Impact Factor
  • Source
    ABSTRACT: Data integrity is an important property to ensure in almost any data- and computation-related context. It serves not only as one of the qualities of service, but also as an important part of data security and privacy. With the proliferation of cloud computing and the increasing needs of big data analytics, verification of data integrity becomes increasingly important, especially for outsourced data. Research topics related to data integrity verification have therefore attracted tremendous interest. Among all the metrics, efficiency and security are the two of greatest concern. In this paper, we provide an analysis of authenticator-based efficient data integrity verification. We analyse and survey the main aspects of this research problem, summarize the research motivations, methodologies and main achievements of several representative approaches, and then sketch a blueprint for possible future developments.
    Proceedings of the 2013 IEEE 16th International Conference on Computational Science and Engineering; 12/2013
  • Xinzhi Wang, Xiangfeng Luo, Jinjun Chen
    ABSTRACT: Sentiment analysis of the public has been attracting increasing attention from researchers. This paper focuses on the problem of social sentiment detection, which aims to identify the sentiments of the public evoked by online microblogs. A general social sentiment model is proposed for this task; the model combines sociological and psychological knowledge to measure the social sentiment state. We then detail the computation of sentiment vectors to extract the sentiment distribution of bloggers on events. Social sentiment states for events are computed based on the general social sentiment model and the sentiment vectors. Furthermore, we show that social sentiments are not independent but correlated with each other heterogeneously across different events; the dependencies between sentiments can provide guidance for decision-making by governments and organizations. Finally, experiments on two real-world collections of event microblogs are conducted to demonstrate the performance of our method.
    Proceedings of the 2013 IEEE 16th International Conference on Computational Science and Engineering; 12/2013
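    A toy version of the "sentiment vector" step from the entry above: each microblog is mapped to a normalized distribution over sentiment categories by counting lexicon hits. The lexicon, the category set, and the aggregation into a social sentiment state are assumptions; the paper's model is grounded in sociological and psychological theory.

      LEXICON = {            # hypothetical seed lexicon
          "joy":   {"happy", "great", "love"},
          "anger": {"angry", "furious", "hate"},
          "fear":  {"afraid", "scared", "worried"},
      }

      def sentiment_vector(text):
          """Return a normalized sentiment distribution for one microblog."""
          words = text.lower().split()
          counts = {label: sum(w in vocab for w in words) for label, vocab in LEXICON.items()}
          total = sum(counts.values())
          return {label: (c / total if total else 0.0) for label, c in counts.items()}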

Publication Stats

1k Citations
82.59 Total Impact Points

Institutions

  • 2014
    • Qufu Normal University
      Qufu, Shandong, China
  • 2011–2014
    • University of Technology Sydney
      • Faculty of Engineering and Information Technology
      Sydney, New South Wales, Australia
  • 2004–2013
    • Swinburne University of Technology
      • Faculty of Information & Communication Technologies
      • Centre for Internet Computer and E-Commerce
      Melbourne, Victoria, Australia
  • 2010–2011
    • Nanjing University
      • State Key Laboratory for Novel Software Technology
      Nanjing, Jiangsu, China