Jun Li

Soochow University (PRC), Wu-hsien, Jiangsu Sheng, China

Publications (5) · 2.39 Total impact

  • ABSTRACT: This paper proposes a join query processing algorithm, CoLocationHashMapJoin (CHMJ). First, the study designs a multi-copy consistency hash algorithm. The algorithm distributes table data over the cluster according to the hash values of the join property, which improves data locality while ensuring data availability. Second, based on the multi-copy consistency hash algorithm, the study proposes a parallel join query processing algorithm called HashMapJoin. HashMapJoin significantly improves the efficiency of join queries. CHMJ has been used in Tencent's data warehouse system and plays an important role in Tencent's daily analysis tasks. The results show that CHMJ improves the efficiency of join query processing by a factor of five compared to Hive.
    Journal of Software 08/2012; 23(8):2032-2041. DOI:10.3724/SP.J.1001.2012.04124
  • ABSTRACT: As organizations start to use data-intensive cluster computing systems like Hadoop MapReduce to handle large-scale data, job scheduling becomes very important for achieving efficiency. In the default implementation of Hadoop MapReduce, jobs are scheduled in FIFO order. This easily starves small jobs when resources are occupied by large jobs, while the Fair Scheduler is inefficient when handling large jobs and leads to the sticky slots problem. In this paper, we propose a new job scheduling algorithm, TDWS. The scheduling algorithm takes into account the characteristics of different applications to meet their different needs. In addition, it is highly robust to heterogeneity and makes optimal data locality easy to achieve. The experiments demonstrate the feasibility and efficiency of our solution.
    Networking, Architecture and Storage (NAS), 2012 IEEE 7th International Conference on; 06/2012
  • ABSTRACT: There is growing interest in designing high-speed network devices that perform packet processing at the stream layer. However, TCP processing for 10G backbone traffic must not only address performance problems but also cope with abnormal conditions. Some characteristics of real traffic, especially the lack of a finish tag for many streams and the complexity of packet reordering, will result in memory exhaustion for a hardware-based TCP subsystem, which is less flexible in exceptional processing. In this paper, we present a hardware design for backbone traffic that is capable of processing 10G with TCP reassembly and tracking the states of millions of parallel TCP streams. The solution has several features: (1) an effective stream replacement algorithm for a massive stream table that is easy to implement in hardware; (2) fast one-round access to the global stream table, which enables 10MPPS processing; (3) an active release policy for managing out-of-order data buffers; and (4) a linkless data structure design that ensures a time limit for worst-case processing. The simulation results show that the system can process over 99% of the 10G backbone traffic using reasonable storage resources. An FPGA-based prototype is also implemented for evaluation.
    Networking, Architecture and Storage (NAS), 2012 IEEE 7th International Conference on; 01/2012
  • ABSTRACT: Data-intensive applications are increasingly designed to execute on large computing clusters. Our previous observations of Tencent production systems indicate that the join query is one of the most important queries in large-scale data processing. When a join query runs on the Hive system, its job is divided into a map phase and a reduce phase and requires transferring large amounts of intermediate results over the network, which is inefficient. In this paper, we propose an algorithm called CHMJ, whose general idea is to take advantage of data locality to accelerate calculation. It includes four parts: a data distribution strategy, the parallel HashMapJoin algorithm, CoLocation scheduling, and a delay scheduling strategy. CHMJ has been adopted in the Tencent data warehouse and plays an important role in Tencent's daily operations. Our experiments demonstrate the feasibility and efficiency of our solution.
    Computers and Communications (ISCC), 2012 IEEE Symposium on; 01/2012
  • ABSTRACT: ZFPs (zinc finger proteins) play important roles in various cellular functions, including transcriptional activation, transcriptional repression, cell proliferation, and development. C2H2 (Cys-Cys-His-His motif) ZFPs are the most abundant proteins among the founding members of the ZFP superfamily in eukaryotes. In this study, we isolate a novel C2H2 ZNF (zinc finger) gene, ZNFD. It contains an ORF (open reading frame) with a length of 990 bp, encoding 329 amino acids. The predicted protein contains a C2H2 zinc finger. RT-PCR analysis of 18 human adult tissues indicated that it was expressed in five of them. Green fluorescent protein localization analysis showed that human ZNFD was located in the nucleus of HeLa cells. Overexpression of ZNFD in COS7 cells activates the transcriptional activities of AP1(PMA) (activator protein 1, which responds specifically to phorbol ester). Together, the data indicate that ZNFD is probably a new type of C2H2 ZFP and that the ZNFD protein may act as a transcriptional activator in the PKC (protein kinase C) signaling pathway to mediate cellular functions.
    Molecular and Cellular Biochemistry 02/2010; 340(1-2):63-71. DOI:10.1007/s11010-010-0401-1 · 2.39 Impact Factor