Zhihui Du

Breakthrough Prize Laureate
New Jersey Institute of Technology | NJIT · Department of Computer Science

Ph.D.


271
Publications
95,155
Reads
33,909
Citations


Publications (271)
Conference Paper
Full-text available
This paper introduces a novel, parallel, and scalable implementation of the VF2 algorithm for subgraph monomorphism developed in the high-productivity language Chapel. Efficient graph analysis in large and complex network datasets is crucial across numerous scientific domains. We address this need through our enhanced VF2-PS implementation, widely...
Presentation
Full-text available
This presentation explains the paper "Parallel Longest Common SubSequence Analysis In Chapel".
Conference Paper
Full-text available
One of the most critical problems in the field of string algorithms is the longest common subsequence problem (LCS). The problem is NP-hard for an arbitrary number of strings but can be solved in polynomial time for a fixed number of strings. In this paper, we select a typical parallel LCS algorithm and integrate it into our large-scale st...
Chapter
The suffix array is a fundamental data structure to support string analysis efficiently. It took about 26 years for the sequential suffix array construction algorithm to achieve \(\mathcal{O}(n)\) time complexity and in-place sorting. In this paper, we develop the D-Limited Parallel Induce (DLPI) algorithm, the first \(\mathcal{O}(\frac{n}{p})\)...
Preprint
Full-text available
Counting and finding triangles in graphs is often used in real-world analytics for characterizing the cohesiveness and identifying communities in graphs. In this paper, we present novel sequential and parallel triangle counting algorithms based on identifying horizontal-edges in a breadth-first search (BFS) traversal of the graph. The BFS allows ou...
Article
Full-text available
This paper presents the Summed Parallel Infinite Impulse Response (SPIIR) pipeline used for public alerts during the third advanced LIGO and Virgo observation run (O3 run). The SPIIR pipeline uses infinite impulse response (IIR) filters to perform extremely low-latency matched filtering and this process is further accelerated with graphics processi...
Article
Full-text available
Detecting valuable anomalies with high accuracy and low latency from large amounts of streaming data is a challenge. This article focuses on a special kind of stream, the catalog stream, which has a high-level structure to analyze the stream effectively. We first formulate the anomaly detection in catalog streams as a constrained optimization probl...
Conference Paper
Full-text available
The transitive closure of a graph is a new graph where every vertex is directly connected to all vertices to which it had a path in the original graph. Transitive closures are useful for reachability and relationship querying. Finding the transitive closure can be computationally expensive and requires a large memory footprint as the output is typi...
Article
Identifying anomalies, especially weak anomalies in constantly changing targets, is more difficult than in stable targets. In this article, we borrow the dynamics metrics and propose the concept of dynamics signature (DS) in multi‐dimensional feature space to efficiently distinguish the abnormal event from the normal behaviors of a variable star. T...
Article
Full-text available
Data from emerging applications, such as cybersecurity and social networking, can be abstracted as graphs whose edges are updated sequentially in the form of a stream. The central challenge of interactive graph stream analytics is responding quickly to end-user queries over graph stream data at terabyte scale and beyond. In this paper, a succinct...
Preprint
Full-text available
This paper presents the SPIIR pipeline used for public alerts during the third advanced LIGO and Virgo run (O3 run). The SPIIR pipeline uses infinite impulse response (IIR) filters to perform zero-latency template filtering and this process is accelerated with graphics processing units (GPUs). It is the first online pipeline to use the coherent statistic,...
Chapter
As emerging applications become more and more distributed and decentralized, it has become a more challenging problem to design and build fault-tolerant network systems with high Quality of Service (QoS) guarantee. In this paper, an optimal replica placement problem is formulated in terms of minimizing the replica placement cost subject to both QoS...
Chapter
Time-domain astronomical observation is moving toward very large fields of view and very high-cadence sampling, which requires that terabytes of star tables be handled in real time and petabytes of offline data be explored efficiently. A large class of typical scientific applications represented by time domain astronomy poses new challenges...
Article
Full-text available
The IoT environment is dynamic, with high risks of confidentiality, integrity, and availability violations. The loss of information, denial of access, information leakage, collusion, technical failures, and data security breaches are difficult to predict and anticipate in advance. These types of non-stationarity are one of the main issues in t...
Article
Full-text available
Modern algorithms for symmetric and asymmetric encryptions are not suitable to provide security of data that needs data processing. They cannot perform calculations over encrypted data without first decrypting it when risks are high. Residue Number System (RNS) as a homomorphic encryption allows ensuring the confidentiality of the stored informatio...
Article
Early warning during sky survey provides a crucial opportunity to detect low-mass, free-floating planets. In particular, to search for short-timescale microlensing (ML) events from high-cadence, wide-field surveys in real time, a hybrid method which combines ARIMA (Autoregressive Integrated Moving Average) with LSTM (Long Short-Term Memory) and GRU...
Chapter
Big data analysis has become an important method in nearly every field today. The value of data, and of the products obtained by analyzing it, is broadly recognized. The data analysis pipeline has multiple steps, which can be abstracted as a framework that provides universal, parallel, high-performance data analysis. Based on Ray, this paper propose...
Chapter
Astronomers hope to give early warnings, based on light-detection data, when celestial bodies may behave abnormally in the near future, which provides a new method to detect low-mass, free-floating planets. In particular, to search for short-timescale microlensing (ML) events from high-cadence, wide-field surveys in real time, we combined ARIMA with...
Chapter
In time-domain astronomy, the STLF (Short-Timescale and Large Field-of-view) sky survey is the latest way of observing the sky. Compared to traditional sky surveys, which can only find astronomical phenomena, an STLF sky survey can even reveal how short astronomical phenomena evolve. The difference leads not only to new survey data but also to new analysi...
Chapter
In time-domain astronomy, we need to use a relational database to manage star catalog data. With the development of sky survey technology, star catalog data is growing larger and being generated faster. In this paper, we therefore give a systematic and comprehensive introduction to processing the data in time-domain astronomy, and val...
Article
Full-text available
Number comparison is an important operation in data processing. In the Residue Number System (RNS), it consists of two steps: the computation of the positional characteristic of the number in RNS representation and comparison of its positional characteristics in the positional number system. In this paper, we propose a new efficient method to comput...
Article
Full-text available
Early warning during sky survey provides a crucial opportunity to detect low-mass, free-floating planets. In particular, to search for short-timescale microlensing (ML) events from high-cadence, wide-field surveys in real time, a hybrid method which combines ARIMA (Autoregressive Integrated Moving Average) with LSTM (Long Short-Term Memory) and GRU...
Chapter
It is important for big data systems to identify their performance bottlenecks. However, popular indicators, such as resource utilization, are often misleading and not comparable with each other. In this paper, a novel indicator framework which can directly compare the impact of different indicators with each other is proposed to identify and anal...
Article
Full-text available
Honeypots are designed to trap attackers in order to investigate their malicious behaviour. Owing to the increasing variety and sophistication of cyber attacks, capturing high-quality attack data has become a challenge in the honeypot area. All-round honeypots, which mean a significant improvement in sensibility, counterm...
Book
This book constitutes the refereed proceedings of the First International Conference on Big Scientific Data Management, BigSDM 2018, held in Beijing, China, in November/December 2018. The 24 full papers presented together with 7 short papers were carefully reviewed and selected from 86 submissions. The topics involved application cases in the big...
Article
Directed networks find many applications in computer science, social science and biomedicine, among others. In this paper we propose a new graph mining algorithm that is capable of locating all frequent induced subgraphs in a given set of directed networks. We present an incremental coding scheme for representing the canonical form of a graph, stud...
Preprint
In time-domain astronomy, we need to use a relational database to manage star catalog data. With the development of sky survey technology, star catalog data is growing larger and being generated faster. In this paper, we therefore give a systematic and comprehensive introduction to processing the data in time-domain astronomy, and val...
Preprint
Astronomy is well recognized as a big-data-driven science. As novel observation infrastructures are developed, sky survey cycles have been shortened from a few days to a few seconds, causing data processing pressure to shift from offline to online. However, existing scientific databases focus on offline analysis of long-term historical data,...
Preprint
It is important for big data systems to identify their performance bottlenecks. However, popular indicators, such as resource utilization, are often misleading and not comparable with each other. In this paper, a novel indicator framework which can directly compare the impact of different indicators with each other is proposed to identify and anal...
Preprint
In time-domain astronomy, the STLF (Short-Timescale and Large Field-of-view) sky survey is the latest way of observing the sky. Compared to traditional sky surveys, which can only find astronomical phenomena, an STLF sky survey can even reveal how short astronomical phenomena evolve. The difference leads not only to new survey data but also to new analysi...
Conference Paper
Full-text available
Lightweight virtualization technology has emerged as an alternative to traditional hypervisor-based virtualization. Containers based on an operating system level virtualization have shown superior performance and more flexibility than virtual machines. Both factors encourage their fast adoption and wide use in cloud environments. Container technolo...
Article
To search short-timescale microlensing (ML) events (TE < 1 day) from high-cadence, wide-field survey in real time, we present an algorithm called NFD (normalized feature deviation) to monitor all the observed light curves and to alert abnormal deviation in the data stream of light curves. The NFD algorithm framework consists of three main modules:...
Article
Full-text available
Atomistic characterization of chemical element distribution is crucial to understanding the role of alloying elements for strengthening mechanism of superalloy. In the present work, the site preferences of two alloying elements X–Y in γ-Ni of Ni-based superalloy are systematically studied using first-principles calculations with and without spin-po...
Experiment Findings
A prediction for the World Cup match
Chapter
The latest astronomy projects observe spacial objects with astronomical cameras that generate images continuously. To identify transient objects, the positions of these objects on the images need to be compared against a reference table for the same portion of the sky, a complex search task called cross match. We designed Euclidean-Zone (E-Zone...
Article
Low-latency detections of gravitational waves (GWs) from compact stellar binary coalescences are crucial to enable prompt follow-up observations to astrophysical transients by conventional telescopes, as demonstrated by the first joint GW and electromagnetic observations on August 17, 2017. Searching over the GW parameter space with the requirement o...
Article
Power profiling tools based on fast and accurate workload analysis can be useful for job scheduling and resource allocation aiming to optimize the power consumption of large-scale, high-performance computer systems. In this article, we propose a novel method for predicting the power consumption of a complete workload or application by extrapolating...
Conference Paper
Full-text available
Cloud data storage operates under risks to confidentiality, integrity, and availability related to the loss of information, prolonged denial of access, information leakage, conspiracy, and technical failures. In this paper, we provide analysis of reliable, scalable, and confidential distributed data storage based on...
Article
Full-text available
Learners participating in Massive Open Online Courses (MOOC) have a wide range of backgrounds and motivations. Many MOOC learners enroll in the courses to take a brief look; only a few go through the entire content, and even fewer are able to eventually obtain a certificate. We discovered this phenomenon after having examined 92 courses on both xue...
Article
Full-text available
A honeypot is a type of security facility deliberately created to be probed, attacked, and compromised. It is often used for protecting production systems by detecting and deflecting unauthorized accesses. It is also useful for investigating the behavior of attackers, and in particular, unknown attacks. For the past 17 years plenty of effort has bee...
Article
Full-text available
We describe a family of power models that can capture the nonuniform power effects of speed scaling among homogeneous cores on multicore processors. These models depart from traditional ones, which assume that individual cores contribute to power consumption as independent entities. In our approach, we remove this independence assumption and employ...
Article
With the wide deployment of cloud computing in many business enterprises as well as science and engineering domains, high quality security services are increasingly critical for processing workflow applications with sensitive intermediate data. Unfortunately, most existing workflow scheduling approaches disregard the security requirements of the in...
Preprint
A honeypot is a type of security facility deliberately created to be probed, attacked and compromised. It is often used for protecting production systems by detecting and deflecting unauthorized accesses. It is also useful for investigating the behaviour of attackers, and in particular, unknown attacks. For the past 17 years much effort has been in...
Presentation
Supplementary Materials of Scheduling for Workflows with Security-Sensitive Intermediate Data by Selective Tasks Duplication in Clouds
Article
Full-text available
Low-latency detections of gravitational waves (GWs) are crucial to enable prompt follow-up observations to astrophysical transients by conventional telescopes. We have developed a low-latency pipeline using a technique called Summed Parallel Infinite Impulse Response (SPIIR) filtering, realized by a Graphic Processing Unit (GPU). In this paper, we...
Article
The first direct detection of gravitational waves was achieved by LIGO 100 years after Einstein’s theoretical prediction. It opens a new window for humans to observe our Universe and initiates the age of Gravitational Wave Astronomy. The data analysis of gravitational wave detection is a typical signal extraction problem and the matched filter...
Research
Supplementary Materials of Scheduling for Workflows with Security-Sensitive Intermediate Data by Selective Tasks Duplication in Clouds
Research
Supplementary materials of Scheduling for Workflows with Security-Sensitive Intermediate Data by Selective Tasks Duplication in Clouds
Article
Full-text available
We present the results from an all-sky search for short-duration gravitational waves in the data of the first run of the Advanced LIGO detectors between September 2015 and January 2016. The search algorithms use minimal assumptions on the signal morphology, so they are sensitive to a wide range of sources emitting gravitational waves. The analyses...
Article
Full-text available
A gravitational-wave transient was identified in data recorded by the Advanced LIGO detectors on 2015 September 14. The event, initially designated G184098 and later given the name GW150914, is described in detail elsewhere. By prior arrangement, preliminary estimates of the time, significance, and sky location of the event were shared with 63 team...
Article
Full-text available
This Supplement provides supporting material for Abbott et al. (2016a). We briefly summarize past electromagnetic (EM) follow-up efforts as well as the organization and policy of the current EM follow-up program. We compare the four probability sky maps produced for the gravitational-wave transient GW150914, and provide additional details of the EM...
Article
Full-text available
On 14 September 2015, a gravitational wave signal from a coalescing black hole binary system was observed by the Advanced LIGO detectors. This paper describes the transient noise backgrounds used to determine the significance of the event (designated GW150914) and presents the results of investigations into potential correlated or uncorrelated sour...
Article
Full-text available
PPMLR-MHD is a new magnetohydrodynamics (MHD) model used to simulate the interactions of the solar wind with the magnetosphere, which has been proved to be the key element of the space weather cause-and-effect chain process from the Sun to Earth. Compared to existing MHD methods, PPMLR-MHD achieves the advantage of high order spatial accuracy and l...
Article
We present an archival search for transient gravitational-wave bursts in coincidence with 27 single-pulse triggers from Green Bank Telescope pulsar surveys, using the LIGO, Virgo, and GEO interferometer network. We also discuss a check for gravitational-wave signals in coincidence with Parkes fast radio bursts using similar methods. Data analyzed i...