-
ABSTRACT: Large-scale data-intensive streaming applications in various science fields feature complex DAG-structured workflows comprised of distributed computing modules with intricate inter-module dependencies. Supporting such workflows in high-performance network environments and optimizing their throughput are crucial to collaborative scientific exploration and discovery. We formulate workflow mapping as a frame rate optimization problem and propose an efficient heuristic solution, which is integrated into the Condor-based Scientific Workflow Automation and Management Platform (SWAMP) in place of Condor's default mapping scheme. The SWAMP system is also augmented with several new components to improve the workflow management process. The performance superiority of the proposed solution is verified using both simulations and a real-life scientific workflow for climate modeling deployed in a distributed heterogeneous network environment.
Workflows in Support of Large-Scale Science (WORKS), 2010 5th Workshop on; 12/2010
-
ABSTRACT: Large-scale computation-intensive applications in various science fields feature complex DAG-structured workflows comprised of distributed computing modules with intricate inter-module dependencies. Supporting such workflows in heterogeneous network environments and optimizing their end-to-end performance are crucial to the success of large-scale collaborative scientific applications. We design and develop a generic Scientific Workflow Automation and Management Platform (SWAMP), which contains a set of easy-to-use computing and networking toolkits for application scientists to conveniently assemble, execute, monitor, and control complex computing workflows in distributed network environments. The current version of SWAMP integrates the graphical user interface of Kepler to compose abstract workflows and employs Condor DAGMan for workflow dispatch and execution. SWAMP provides a web-based user interface to automate and manage workflow executions and uses a special workflow mapper to optimize the end-to-end workflow performance. A case study of the workflow for Spallation Neutron Source datasets in real networks is presented to show the efficacy of the proposed platform.
Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010 IEEE International Symposium on; 05/2010
-
TOSN. 01/2010; 6.
-
Pervasive and Mobile Computing. 01/2009; 5:182-200.
-
ABSTRACT: We describe a system for computational monitoring and steering of an on-going computation or visualization on a remote host such as a workstation or supercomputer. Unlike the conventional "launch-and-leave" batch computations, this system enables: (i) continuous monitoring of variables of an on-going remote computation using visualization tools, and (ii) interactive specification of chosen computational parameters to steer the computation. The visualization and control streams are supported over wide-area networks using transport protocols based on stochastic approximation methods to provide stable throughput. Using performance models for transport channels and visualization modules, we develop a visualization pipeline configuration solution that minimizes end-to-end delay over wide-area connections. The user interface utilizes Asynchronous JavaScript and XML (Ajax) technologies to provide an interactive environment that can be accessed by multiple remote users using web browsers. We present experimental results on a geographically distributed deployment to illustrate the effectiveness of the proposed system.
Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on; 05/2008
-
ABSTRACT: Supporting high-performance computing pipelines over wide-area networks is critical to enabling large-scale distributed scientific applications that require fast responses for interactive operations or smooth flows for data streaming. We construct analytical cost models for computing modules, network nodes, and communication links to estimate the computing times on nodes and the data transport times over connections. Based on these time estimates, we present an efficient linear pipeline configuration method based on dynamic programming that partitions the pipeline modules into groups and strategically maps them onto a set of selected computing nodes in a network to achieve minimum end-to-end delay or maximum frame rate. We implemented this method and evaluated its effectiveness with experiments on a large set of simulated application pipelines and computing networks. The experimental results show that the proposed method outperforms the streamline and greedy algorithms. These results, together with polynomial computational complexity, make our method a potential scalable solution for large practical deployments.
Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on; 05/2008
-
ABSTRACT: There are an increasing number of high-performance networks that provision dedicated channels through circuit-switching or MPLS/GMPLS techniques to support large-scale data transfer. The available bandwidths on these dedicated links vary over time and therefore efficient bandwidth scheduling algorithms are needed to improve the utilization of network resources and satisfy diverse user requirements. Based on different path and bandwidth constraints, we formulate four instant scheduling problems for a data transfer request: (i) variable path with variable bandwidth (VPVB), (ii) fixed path with variable bandwidth (FPVB), (iii) variable path with fixed bandwidth (VPFB), and (iv) fixed path with fixed bandwidth (FPFB), with the common objective to minimize transfer end time for a given data size. We design an optimal algorithm for each of these scheduling problems with polynomial- or pseudo-polynomial-time complexity with respect to the network size and total number of time slots in a bandwidth reservation table.
INFOCOM Workshops 2008, IEEE; 05/2008
-
ABSTRACT: Next-generation scientific applications require the capability to visualize large archival data sets or on-going computer simulations of physical and other phenomena over wide-area network connections. To minimize the latency in interactive visualizations across wide-area networks, we propose an approach that adaptively decomposes and maps the visualization pipeline onto a set of strategically selected network nodes. This scheme is realized by grouping the modules that implement visualization and networking subtasks and mapping them onto computing nodes with possibly disparate computing capabilities and network connections. Using estimates for communication and processing times of subtasks, we present a polynomial-time algorithm to compute a decomposition and mapping to achieve minimum end-to-end delay of the visualization pipeline. We present experimental results using geographically distributed deployments to demonstrate the effectiveness of this method in visualizing data sets from three application domains.
IEEE Transactions on Computers 02/2008; · 1.10 Impact Factor
-
ABSTRACT: Using genomic DNA as common reference in microarray experiments has recently been tested by different laboratories. Conflicting results have been reported with regard to the reliability of microarray results using this method. To explain it, we hypothesize that data processing is a critical element that impacts the data quality.
Microarray experiments were performed in a gamma-proteobacterium Shewanella oneidensis. Pair-wise comparison of three experimental conditions was obtained either with two labeled cDNA samples co-hybridized to the same array, or by employing Shewanella genomic DNA as a standard reference. Various data processing techniques were exploited to reduce the amount of inconsistency between both methods and the results were assessed. We discovered that data quality was significantly improved by imposing the constraint of minimal number of replicates, logarithmic transformation and random error analyses.
These findings demonstrate that data processing significantly influences data quality, which provides an explanation for the conflicting evaluation in the literature. This work could serve as a guideline for microarray data analysis using genomic DNA as a standard reference.
BMC Genomics 02/2008; 9 Suppl 2:S5. · 4.07 Impact Factor
-
Proceedings of the Twenty-Seventh Annual ACM Symposium on Principles of Distributed Computing, PODC 2008, Toronto, Canada, August 18-21, 2008; 01/2008
-
J. Parallel Distrib. Comput. 01/2007; 67:947-956.
-
International Conference on Bioinformatics & Computational Biology, BIOCOMP 2007, Volume II, June 25-28, 2007, Las Vegas Nevada, USA; 01/2007
-
ABSTRACT: The advance in high-throughput genomic technologies including microarrays has generated a tremendous amount of gene expression data for the entire genome. Deciphering transcriptional networks that convey information on members of gene clusters and cluster interactions is a crucial analysis task in the post-sequence era. Most of the existing analysis methods for large-scale genome-wide gene expression profiles involve several steps that often require human intervention. We propose a random matrix theory-based approach to analyze the cross correlations of gene expression data in an entirely automatic and objective manner to eliminate the ambiguities and subjectivity inherent to human decisions. The correlations calculated from experimental measurements typically contain both "genuine" and "random" components. In the proposed approach, we remove the "random" component by testing the statistics of the eigenvalues of the correlation matrix against a "null hypothesis" - a truly random correlation matrix obtained from mutually uncorrelated expression data series. Our investigation of the components of deviating eigenvectors using varimax orthogonal rotation reveals distinct functional modules. We apply the proposed approach to the publicly available yeast cycle expression data and produce a transcriptional network that consists of interacting functional modules. The experimental results conform well to those reported in previously published literature.
Computational Intelligence and Bioinformatics and Computational Biology, 2006. CIBCB '06. 2006 IEEE Symposium on; 10/2006
-
Advances in Artificial Reality and Tele-Existence, 16th International Conference on Artificial Reality and Telexistence, ICAT 2006, Hangzhou, China, November 29 - December 1, 2006, Proceedings; 01/2006
-
Technologies for E-Learning and Digital Entertainment, First International Conference, Edutainment 2006, Hangzhou, China, April 16-19, 2006, Proceedings; 01/2006
-
IEEE Communications Magazine 09/2005; 43(8):s11- s17. · 3.79 Impact Factor
-
ABSTRACT: This paper discusses algorithmic and implementation issues of optimally mapping a visualization pipeline onto a linear arrangement of wide-area network nodes to minimize the total delay. The first network node typically is a data source, the last node could be a display device ranging from a personal computer to a powerwall, and each intermediate node could be a workstation or computational cluster. This mapping scheme appropriately distributes the filtering, geometry generation, rendering, and display modules of the visualization pipeline to the linear arrangement of network nodes to make efficient use of the computing resources at end nodes and also the network bandwidth between them. A regression based network daemon is developed to measure the available bandwidth on a transport link. We present an analytical formulation of this problem by taking into account the computational power of nodes, the bandwidths between them, and the sizes of messages exchanged between visualization modules. We propose a polynomial-time optimal algorithm that uses the dynamic programming method to compute the mapping with the minimum total delay. An OpenGL-based remote visualization system is implemented and deployed at three geographically distributed nodes for preliminary experiments.
02/2005;
-
ABSTRACT: This paper discusses algorithmic and implementation aspects of a remote visualization system, which adaptively decomposes and maps the visualization pipeline onto a wide-area network. Visualization pipeline modules such as filtering, geometry extraction, rendering, and display are dynamically assigned to network nodes to achieve minimal total delay or maximal frame rate. Polynomial-time optimal algorithms using the dynamic programming method to compute the optimal decomposition and mapping are proposed. We implemented an OpenGL-based remote visualization system. We evaluated its performance using a deployment at three geographically distributed nodes.
Image and Graphics, 2004. Proceedings. Third International Conference on; 01/2005
-
ABSTRACT: We consider a network of sensors distributed in a target area providing environmental measurements that are subject to normally distributed, independent additive noise. Each sensor node applies a threshold rule to the measurements to decide the presence of a target; the distance to the target together with the threshold value determines its hit and false alarm probabilities or rates using a signal attenuation model. We propose a centralized threshold-OR fusion rule for combining the individual sensor node decisions. Under the statistical independence of sensor measurements, we derive fusion threshold bounds using Chebyshev's inequality based on individual hit and false alarm probabilities but without requiring a priori knowledge of the underlying probability distributions. We derive conditions to ensure that the fused method achieves a higher hit rate and lower false alarm rate compared to the weighted averages of individual sensor parameters. The simulations using Monte Carlo method illustrate significant detection performance improvements of the proposed fusion approach. Categories and Subject Descriptors: I.2.9 (Artificial Intelligence): Robotics-Sensors
ACM Transactions on Sensor Networks - TOSN. 01/2005;
-
J. Parallel Distrib. Comput. 01/2004; 64:853-865.