Mengxia Zhu

Oak Ridge National Laboratory, Oak Ridge, TN, USA

Publications (34) · 10.77 Total impact

  • Conference Proceeding: On optimization of scientific workflows to support streaming applications in distributed network environments
    ABSTRACT: Large-scale data-intensive streaming applications in various science fields feature complex DAG-structured workflows comprised of distributed computing modules with intricate inter-module dependencies. Supporting such workflows in high-performance network environments and optimizing their throughput are crucial to collaborative scientific exploration and discovery. We formulate workflow mapping as a frame rate optimization problem and propose an efficient heuristic solution, which is integrated into the Condor-based Scientific Workflow Automation and Management Platform (SWAMP) in place of Condor's default mapping scheme. The SWAMP system is also augmented with several new components to improve the workflow management process. The performance superiority of the proposed solution is verified using both simulations and a real-life scientific workflow for climate modeling deployed in a distributed heterogeneous network environment.
    Workflows in Support of Large-Scale Science (WORKS), 2010 5th Workshop on; 12/2010
  • Conference Proceeding: Automation and management of scientific workflows in distributed network environments
    ABSTRACT: Large-scale computation-intensive applications in various science fields feature complex DAG-structured workflows comprised of distributed computing modules with intricate inter-module dependencies. Supporting such workflows in heterogeneous network environments and optimizing their end-to-end performance are crucial to the success of large-scale collaborative scientific applications. We design and develop a generic Scientific Workflow Automation and Management Platform (SWAMP), which contains a set of easy-to-use computing and networking toolkits for application scientists to conveniently assemble, execute, monitor, and control complex computing workflows in distributed network environments. The current version of SWAMP integrates the graphical user interface of Kepler to compose abstract workflows and employs Condor DAGMan for workflow dispatch and execution. SWAMP provides a web-based user interface to automate and manage workflow executions and uses a special workflow mapper to optimize the end-to-end workflow performance. A case study of the workflow for Spallation Neutron Source datasets in real networks is presented to show the efficacy of the proposed platform.
    Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010 IEEE International Symposium on; 05/2010
  • Article: Fusion of threshold rules for target detection in wireless sensor networks.
    TOSN. 01/2010; 6.
  • Article: Integration of sensing and computing in an intelligent decision support system for homeland security defense.
    Pervasive and Mobile Computing. 01/2009; 5:182-200.
  • Conference Proceeding: Computational monitoring and steering using network-optimized visualization and Ajax web server
    Mengxia Zhu, Qishi Wu, N.S.V. Rao
    ABSTRACT: We describe a system for computational monitoring and steering of an on-going computation or visualization on a remote host such as workstation or supercomputer. Unlike the conventional "launch-and-leave" batch computations, this system enables: (i) continuous monitoring of variables of an on-going remote computation using visualization tools, and (ii) interactive specification of chosen computational parameters to steer the computation. The visualization and control streams are supported over wide-area networks using transport protocols based on stochastic approximation methods to provide stable throughput. Using performance models for transport channels and visualization modules, we develop a visualization pipeline configuration solution that minimizes end-to-end delay over wide-area connections. The user interface utilizes Asynchronous JavaScript and XML (Ajax) technologies to provide an interactive environment that can be accessed by multiple remote users using web browsers. We present experimental results on a geographically distributed deployment to illustrate the effectiveness of the proposed system.
    Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on; 05/2008
  • Conference Proceeding: Optimizing network performance of computing pipelines in distributed environments
    Qishi Wu, Yi Gu, Mengxia Zhu, N.S.V. Rao
    ABSTRACT: Supporting high performance computing pipelines over wide-area networks is critical to enabling large-scale distributed scientific applications that require fast responses for interactive operations or smooth flows for data streaming. We construct analytical cost models for computing modules, network nodes, and communication links to estimate the computing times on nodes and the data transport times over connections. Based on these time estimates, we present the efficient linear pipeline configuration method based on dynamic programming that partitions the pipeline modules into groups and strategically maps them onto a set of selected computing nodes in a network to achieve minimum end-to-end delay or maximum frame rate. We implemented this method and evaluated its effectiveness with experiments on a large set of simulated application pipelines and computing networks. The experimental results show that the proposed method outperforms the streamline and greedy algorithms. These results, together with polynomial computational complexity, make our method a potential scalable solution for large practical deployments.
    Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on; 05/2008
  • Conference Proceeding: On design of scheduling algorithms for advance bandwidth reservation in dedicated networks
    Yunyue Lin, Qishi Wu, N.S.V. Rao, Mengxia Zhu
    ABSTRACT: There are an increasing number of high-performance networks that provision dedicated channels through circuit-switching or MPLS/GMPLS techniques to support large-scale data transfer. The available bandwidths on these dedicated links vary over time and therefore efficient bandwidth scheduling algorithms are needed to improve the utilization of network resources and satisfy diverse user requirements. Based on different path and bandwidth constraints, we formulate four instant scheduling problems for a data transfer request: (i) variable path with variable bandwidth (VPVB), (ii) fixed path with variable bandwidth (FPVB), (iii) variable path with fixed bandwidth (VPFB), and (iv) fixed path with fixed bandwidth (FPFB), with the common objective to minimize transfer end time for a given data size. We design an optimal algorithm for each of these scheduling problems with polynomial- or pseudo-polynomial-time complexity with respect to the network size and total number of time slots in a bandwidth reservation table.
    INFOCOM Workshops 2008, IEEE; 05/2008
  • Article: Self-Adaptive Configuration of Visualization Pipeline Over Wide-Area Networks
    ABSTRACT: Next-generation scientific applications require the capability to visualize large archival data sets or on-going computer simulations of physical and other phenomena over wide-area network connections. To minimize the latency in interactive visualizations across wide-area networks, we propose an approach that adaptively decomposes and maps the visualization pipeline onto a set of strategically selected network nodes. This scheme is realized by grouping the modules that implement visualization and networking subtasks and mapping them onto computing nodes with possibly disparate computing capabilities and network connections. Using estimates for communication and processing times of subtasks, we present a polynomial-time algorithm to compute a decomposition and mapping to achieve minimum end-to-end delay of the visualization pipeline. We present experimental results using geographically distributed deployments to demonstrate the effectiveness of this method in visualizing data sets from three application domains.
    IEEE Transactions on Computers 02/2008; · 1.10 Impact Factor
  • Article: Assessment of data processing to improve reliability of microarray experiments using genomic DNA reference.
    ABSTRACT: Using genomic DNA as common reference in microarray experiments has recently been tested by different laboratories. Conflicting results have been reported with regard to the reliability of microarray results using this method. To explain it, we hypothesize that data processing is a critical element that impacts the data quality. Microarray experiments were performed in a gamma-proteobacterium Shewanella oneidensis. Pair-wise comparison of three experimental conditions was obtained either with two labeled cDNA samples co-hybridized to the same array, or by employing Shewanella genomic DNA as a standard reference. Various data processing techniques were exploited to reduce the amount of inconsistency between both methods and the results were assessed. We discovered that data quality was significantly improved by imposing the constraint of minimal number of replicates, logarithmic transformation and random error analyses. These findings demonstrate that data processing significantly influences data quality, which provides an explanation for the conflicting evaluation in the literature. This work could serve as a guideline for microarray data analysis using genomic DNA as a standard reference.
    BMC Genomics 02/2008; 9 Suppl 2:S5. · 4.07 Impact Factor
  • Conference Proceeding: Efficient pipeline configuration in distributed heterogeneous computing environments.
    Proceedings of the Twenty-Seventh Annual ACM Symposium on Principles of Distributed Computing, PODC 2008, Toronto, Canada, August 18-21, 2008; 01/2008
  • Article: Optimal pipeline decomposition and adaptive network mapping to support distributed remote visualization.
    J. Parallel Distrib. Comput. 01/2007; 67:947-956.
  • Conference Proceeding: A Parallel Computing Approach to Decipher Transcription Network for Large-scale Microarray Datasets.
    Mengxia Zhu, Qishi Wu
    International Conference on Bioinformatics & Computational Biology, BIOCOMP 2007, Volume II, June 25-28, 2007, Las Vegas Nevada, USA; 01/2007
  • Conference Proceeding: A New Approach to Identify Functional Modules Using Random Matrix Theory
    ABSTRACT: The advance in high-throughput genomic technologies including microarrays has generated a tremendous amount of gene expression data for the entire genome. Deciphering transcriptional networks that convey information on members of gene clusters and cluster interactions is a crucial analysis task in the post-sequence era. Most of the existing analysis methods for large-scale genome-wide gene expression profiles involve several steps that often require human intervention. We propose a random matrix theory-based approach to analyze the cross correlations of gene expression data in an entirely automatic and objective manner to eliminate the ambiguities and subjectivity inherent to human decisions. The correlations calculated from experimental measurements typically contain both "genuine" and "random" components. In the proposed approach, we remove the "random" component by testing the statistics of the eigenvalues of the correlation matrix against a "null hypothesis" - a truly random correlation matrix obtained from mutually uncorrelated expression data series. Our investigation on the components of deviating eigenvectors using varimax orthogonal rotation reveals distinct functional modules. We apply the proposed approach to the publicly available yeast cycle expression data and produce a transcriptional network that consists of interacting functional modules. The experimental results nicely conform to those obtained in previously published literature.
    Computational Intelligence and Bioinformatics and Computational Biology, 2006. CIBCB '06. 2006 IEEE Symposium on; 10/2006
  • Conference Proceeding: A Scalable Framework for Distributed Virtual Reality Using Heterogeneous Processors.
    Qishi Wu, Jinzhu Gao, Mengxia Zhu
    Advances in Artificial Reality and Tele-Existence, 16th International Conference on Artificial Reality and Telexistence, ICAT 2006, Hangzhou, China, November 29 - December 1, 2006, Proceedings; 01/2006
  • Conference Proceeding: System Design for On-Line Distributed Computational Visualization and Steering.
    Technologies for E-Learning and Digital Entertainment, First International Conference, Edutainment 2006, Hangzhou, China, April 16-19, 2006, Proceedings; 01/2006
  • Article: CHEETAH: circuit-switched high-speed end-to-end transport architecture testbed
    IEEE Communications Magazine 09/2005; 43(8):s11- s17. · 3.79 Impact Factor
  • Article: On optimal mapping of visualization pipeline onto linear arrangement of network nodes
    ABSTRACT: This paper discusses algorithmic and implementation issues of optimally mapping a visualization pipeline onto a linear arrangement of wide-area network nodes to minimize the total delay. The first network node typically is a data source, the last node could be a display device ranging from a personal computer to a powerwall, and each intermediate node could be a workstation or computational cluster. This mapping scheme appropriately distributes the filtering, geometry generation, rendering, and display modules of the visualization pipeline to the linear arrangement of network nodes to make efficient use of the computing resources at end nodes and also the network bandwidth between them. A regression based network daemon is developed to measure the available bandwidth on a transport link. We present an analytical formulation of this problem by taking into account the computational power of nodes, the bandwidths between them, and the sizes of messages exchanged between visualization modules. We propose a polynomial-time optimal algorithm that uses the dynamic programming method to compute the mapping with the minimum total delay. An OpenGL-based remote visualization system is implemented and deployed at three geographically distributed nodes for preliminary experiments.
    02/2005;
  • Conference Proceeding: Adaptive visualization pipeline decomposition and mapping onto computer networks
    Mengxia Zhu, Qishi Wu, N.S.V. Rao, S. Iyengar
    ABSTRACT: This paper discusses algorithmic and implementation aspects of a remote visualization system, which adaptively decomposes and maps the visualization pipeline onto a wide-area network. Visualization pipeline modules such as filtering, geometry extraction, rendering, and display are dynamically assigned to network nodes to achieve minimal total delay or maximal frame rate. Polynomial-time optimal algorithms using the dynamic programming method to compute the optimal decomposition and mapping are proposed. We implemented an OpenGL-based remote visualization system. We evaluated its performance using a deployment at three geographically distributed nodes.
    Image and Graphics, 2004. Proceedings. Third International Conference on; 01/2005
  • Article: Fusion of Threshold Rules for Target Detection in Sensor Networks
    ABSTRACT: We consider a network of sensors distributed in a target area providing environmental measurements that are subject to normally distributed, independent additive noise. Each sensor node applies a threshold rule to the measurements to decide the presence of a target; the distance to the target together with the threshold value determines its hit and false alarm probabilities or rates using a signal attenuation model. We propose a centralized threshold-OR fusion rule for combining the individual sensor node decisions. Under the statistical independence of sensor measurements, we derive fusion threshold bounds using Chebyshev's inequality based on individual hit and false alarm probabilities but without requiring a priori knowledge of the underlying probability distributions. We derive conditions to ensure that the fused method achieves a higher hit rate and lower false alarm rate compared to the weighted averages of individual sensor parameters. The simulations using Monte Carlo method illustrate significant detection performance improvements of the proposed fusion approach. Categories and Subject Descriptors: I.2.9 (Artificial Intelligence): Robotics-Sensors
    ACM Transactions on Sensor Networks - TOSN. 01/2005;
  • Article: Aspect-oriented design of sensor networks.
    J. Parallel Distrib. Comput. 01/2004; 64:853-865.

Institutions

  • 2008
    • Oak Ridge National Laboratory
      • Biosciences Division
      Oak Ridge, TN, USA
    • The University of Memphis
      • Department of Computer Science
      Memphis, TN, USA
  • 2006–2008
    • Southern Illinois University Carbondale
      • Department of Computer Science
      Carbondale, IL, USA
  • 2005
    • Louisiana State University
      • Department of Computer Science
      Baton Rouge, LA, USA