Kenneth Chiu

Binghamton University, Binghamton, NY, United States

Publications (82) · 11.07 Total Impact

  • ABSTRACT: Computationally demanding, parallel coupled models are crucial to understanding many important multiphysics/multiscale phenomena. Load-balancing such simulations on large clusters is often done through offline, static means that require significant manual input. Our previous work showed dynamic, runtime load balancing to be effective, but it still relied on a manually generated performance predictor to guide load-balancing decisions. In this paper, we show how timing and interaction information obtained by instrumenting the middleware can be used to automatically generate a performance predictor that relates the overall execution time to the execution time of each individual sub-model. The performance predictor is evaluated on a new coupled-model benchmark, employing five constituent sub-models, that simulates the CCSM coupled climate model.
    Cluster, Cloud and Grid Computing (CCGrid), 2013 13th IEEE/ACM International Symposium on; 01/2013
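    A minimal sketch of how such a predictor might be generated from the instrumentation data, assuming a simple linear model fit by least squares; the names, the linear form, and the synthetic data are illustrative assumptions, not the paper's actual formulation:

    ```python
    # Sketch: regress middleware-instrumented sub-model timings against the
    # measured overall iteration time to obtain a performance predictor
    # automatically. The linear-model form is an illustrative assumption.
    import numpy as np

    def fit_predictor(sub_times, overall_times):
        """sub_times: (samples, submodels) timings; overall_times: (samples,)."""
        X = np.column_stack([sub_times, np.ones(len(overall_times))])  # intercept
        coef, *_ = np.linalg.lstsq(X, overall_times, rcond=None)
        return coef

    def predict(coef, sub_times):
        return float(np.append(sub_times, 1.0) @ coef)

    # Synthetic data: five sub-models, as in the CCSM-like benchmark above.
    rng = np.random.default_rng(0)
    t_sub = rng.uniform(1.0, 4.0, size=(50, 5))
    t_all = t_sub @ np.array([1.0, 0.5, 0.2, 0.8, 0.3]) + 0.3
    coef = fit_predictor(t_sub, t_all)
    print(predict(coef, t_sub[0]), t_all[0])  # prediction matches measurement
    ```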
  • Daihee Kim, J. Walter Larson, Kenneth Chiu
    ABSTRACT: Achieving ultrascalability in coupled multiphysics and multiscale models requires dynamic load balancing both within and between their constituent subsystems. Interconstituent dynamic load balance requires runtime resizing, or malleability, of subsystem processing element (PE) cohorts. We enhance the Malleable Model Coupling Toolkit's Load Balance Manager (LBM) to incorporate prediction of a coupled system's constituent computation times and coupled-model global iteration time. The prediction system employs piecewise linear and cubic spline interpolation of timing measurements to guide constituent cohort resizing. Performance studies of the new LBM using a simplified coupled-model test bed similar to a coupled climate model show dramatic improvement (77%) in the LBM's convergence rate.
    01/2012;
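    A minimal sketch of the interpolation-based prediction, assuming SciPy's CubicSpline and NumPy's piecewise-linear interp over hypothetical timing samples:

    ```python
    # Sketch: interpolate measured constituent compute times as a function of
    # PE-cohort size, then query the interpolants when evaluating candidate
    # resizings. The sample points below are illustrative assumptions.
    import numpy as np
    from scipy.interpolate import CubicSpline

    pe_counts = np.array([2, 4, 8, 16, 32])             # measured cohort sizes
    times = np.array([40.1, 21.0, 11.2, 6.3, 4.0])      # measured times (s)

    def linear(p):                                       # piecewise linear
        return np.interp(p, pe_counts, times)

    spline = CubicSpline(pe_counts, times)               # cubic spline

    # Predicted compute time if the constituent's cohort were resized to 12 PEs.
    print(linear(12), float(spline(12)))
    ```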
  • ABSTRACT: Dynamic load balancing both within and between constituent subsystems is required to achieve ultrascalability in coupled multiphysics and multiscale models. Interconstituent dynamic load balancing requires runtime resizing, or malleability, of subsystem processing element (PE) cohorts. In our previous work, we introduced the Malleable Model Coupling Toolkit, whose load balance manager implements a runtime load-balancing algorithm based on PE reallocation across subsystems. In this paper, we extend that work by adding the ability to adapt to coupled models whose loads change during execution. We evaluate the algorithm through a synthetic coupled-model benchmark that uses the LogP performance model as applied to parallel LU decomposition.
    Parallel and Distributed Processing with Applications (ISPA), 2012 IEEE 10th International Symposium on; 01/2012
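    A minimal sketch of a LogP-style cost model for parallel LU, in the spirit of the benchmark described above; the decomposition scheme (1-D block rows, linear broadcast) and all constants are illustrative assumptions:

    ```python
    # Sketch: LogP cost model (L = latency, o = overhead, g = gap, P = procs)
    # applied to a 1-D block-row parallel LU decomposition.
    def lu_logp_time(n, P, flop_time, L, o, g):
        comp = (2.0 / 3.0) * n**3 * flop_time / P       # ideal parallel flops
        # One pivot-row broadcast per elimination step, modeled as P-1 messages:
        # the sender pays o+g per message; the last receiver pays L+o.
        comm = n * ((P - 1) * (o + g) + L + o)
        return comp + comm

    for P in (2, 4, 8, 16):
        print(P, lu_logp_time(n=2048, P=P, flop_time=1e-9,
                              L=5e-6, o=1e-6, g=2e-6))
    ```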
  • Daihee Kim, Jay Walter Larson, Kenneth Chiu
    ABSTRACT: Model coupling is a method to simulate complex multiphysics and multiscale phenomena. Most approaches involve static data distribution among processes, without consideration of top-level dynamic load balancing. Malleability, the ability to change the number of processes during execution, allows applications to configure themselves to better utilize available system resources. To date, however, malleability has been applied primarily to monolithic applications. We have extended the Model Coupling Toolkit (MCT) to support processing-element malleability for coupled models, resulting in the Malleable Model Coupling Toolkit (MMCT). MMCT consists of a load balance manager (LBM) implementing a practical dynamic load-balancing algorithm and a malleable model registry that manages dynamically evolving MPI communicators. MMCT requires only standard MPI-2, sockets, and MCT. We benchmark MMCT using a synthetic, simplified coupled-model application similar to the Community Climate System Model. Preliminary performance data demonstrate the efficacy of the LBM and a low (≈3%) monitoring overhead.
    Procedia CS. 01/2011; 4:312-321.
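    A minimal sketch of cohort malleability with standard MPI, using mpi4py's Comm.Split to reassign PEs between two sub-models at runtime; MMCT's actual registry and LBM interfaces are not reproduced here, and the resize rule is an illustrative assumption:

    ```python
    # Run under mpirun, e.g.: mpirun -n 8 python resize.py
    from mpi4py import MPI

    world = MPI.COMM_WORLD
    rank, size = world.Get_rank(), world.Get_size()

    def resize_cohorts(split_point):
        """Ranks [0, split_point) form sub-model A's cohort; the rest form B's."""
        color = 0 if rank < split_point else 1
        return world.Split(color, key=rank)

    sub = resize_cohorts(size // 2)      # initial even split
    # ... run a coupled iteration, let the LBM measure load imbalance ...
    sub.Free()
    sub = resize_cohorts(size // 3)      # LBM decides to shift PEs from A to B
    print(f"world rank {rank} -> cohort rank {sub.Get_rank()}/{sub.Get_size()}")
    sub.Free()
    ```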
  • ABSTRACT: XML has seen wide acceptance in a number of application domains and has contributed to the success of wide-scale grid and scientific computing environments. Performance, however, is still an issue, and limits adoption in situations where XML might otherwise provide significant interoperability, flexibility, and extensibility. As CPUs increasingly have multiple cores, parallel XML parsing can help to address this concern. This paper explores the use of speculation to improve the performance of parallel XML parsing. Building on previous work, we use an initial preparsing stage to build a sketch of the document, which we call the skeleton. This skeleton contains enough information that the full parse can then proceed in parallel using unmodified libxml2. The preparsing itself is parallelized using product machines, which we call p-DFAs. During execution, statistics are gathered to identify unlikely possibilities, which are discarded in favor of more likely ones. The results show good performance and scalability on both a 30-CPU Sun E6500 machine running Solaris and a Linux machine with two Intel Xeon L5320 CPUs, for a total of 8 physical cores.
    High Performance Computing (HiPC), 2009 International Conference on; 01/2010
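    A minimal sketch of the speculation: scan each chunk only from the start states that gathered statistics deem likely, and re-scan on a misprediction; the two-state toy DFA and the most-common-state heuristic are illustrative assumptions, far simpler than the real preparser:

    ```python
    from collections import Counter

    STATES = ("text", "tag")

    def step(state, ch):                 # toy preparser: inside/outside a tag
        if state == "text" and ch == "<": return "tag"
        if state == "tag" and ch == ">": return "text"
        return state

    def scan(chunk, start_states):
        """Map each assumed start state to the state reached after the chunk."""
        out = {}
        for s0 in start_states:
            s = s0
            for ch in chunk:
                s = step(s, ch)
            out[s0] = s
        return out

    stats = Counter({"text": 1})         # observed chunk-boundary states so far
    state = "text"
    for chunk in ["<a><b>x", "y</b><c", ">z</c></a>"]:
        likely = {s for s, _ in stats.most_common(1)}
        maps = scan(chunk, likely)       # the real system scans chunks in parallel
        if state not in maps:            # misspeculation: discard, re-scan
            maps = scan(chunk, {state})
        state = maps[state]
        stats[state] += 1
    print("final state:", state)
    ```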
  • Yibo Sun, Beilan Wang, Kenneth Chiu
    ABSTRACT: In the technique known as network coordinates, the network latency between nodes is modeled as the distance between points in a metric space. Actual network latencies, however, exhibit numerous triangle inequality violations, which result in significant error between the actual latency and the distance as determined by the network coordinates. In this work, we show how graph clustering techniques can be used to find regions of the network that exhibit low triangle inequality violation within the region. By using techniques to increase the relative edge density in these regions, we improve the accuracy of network coordinates there. Compared to a single spring relaxation over the whole network, we reduce the relative error within a cluster by 15% on average for the Meridian dataset, and by 7% overall.
    Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, PDP 2010, Pisa, Italy, February 17-19, 2010; 01/2010
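    A minimal sketch of the quantity the clustering targets: counting triangle inequality violations (TIVs) in a latency matrix; the violation definition and the synthetic data are illustrative assumptions:

    ```python
    import itertools
    import numpy as np

    def tiv_fraction(lat):
        """Fraction of ordered triples (i, j, k) where the direct path i-j is
        slower than the detour through k: lat[i][j] > lat[i][k] + lat[k][j]."""
        n, triples, violations = len(lat), 0, 0
        for i, j, k in itertools.permutations(range(n), 3):
            triples += 1
            if lat[i][j] > lat[i][k] + lat[k][j]:
                violations += 1
        return violations / triples

    rng = np.random.default_rng(1)
    pts = rng.uniform(size=(20, 2))
    euclid = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
    print(tiv_fraction(euclid))                        # metric space: 0.0
    noisy = euclid * rng.uniform(0.5, 1.5, euclid.shape)
    print(tiv_fraction((noisy + noisy.T) / 2))         # real-world-like: > 0
    ```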
  • Ying Zhang, Yinfei Pan, Kenneth Chiu
    ABSTRACT: The importance of XPath in XML filtering systems has led to a significant body of research on improving the processing performance of XPath queries. Most of the work, however, has been in the context of a single processing core. Given the prevalence of multicore processors, we believe that a parallel approach can provide significant benefits for a number of application scenarios. In this paper we thus investigate the use of multiple threads to concurrently process XPath queries on a shared incoming XML document. Using an approach that builds on YFilter, we divide the NFA into several smaller ones for concurrent processing. We implement and test two strategies for load balancing: a static approach and a dynamic approach. We test our approach on an eight-core machine, and show that it provides reasonable speedup up to eight cores.
    IEEE 16th International Conference on Parallel and Distributed Systems, ICPADS 2010, 8-10 Dec. 2010, Shanghai, China; 01/2010
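    A minimal sketch of the two load-balancing strategies, distributing whole XPath queries across threads over one shared document; this stands in for, rather than reproduces, the paper's NFA splitting, and the document and queries are illustrative:

    ```python
    import queue
    import threading
    from xml.etree import ElementTree as ET

    doc = ET.fromstring("<root><a><b/></a><a/><c><b/></c></root>")
    queries = [".//a", ".//b", ".//c", ".//a/b"] * 50

    def run_static(nthreads):            # fixed round-robin partition
        results = [[] for _ in range(nthreads)]
        def worker(part, out):
            out.extend((q, len(doc.findall(q))) for q in part)
        threads = [threading.Thread(target=worker,
                                    args=(queries[i::nthreads], results[i]))
                   for i in range(nthreads)]
        for t in threads: t.start()
        for t in threads: t.join()
        return sum(results, [])

    def run_dynamic(nthreads):           # threads pull work from a shared queue
        work, out = queue.Queue(), []
        for q in queries: work.put(q)
        def worker():
            while True:
                try: q = work.get_nowait()
                except queue.Empty: return
                out.append((q, len(doc.findall(q))))  # append is atomic in CPython
        threads = [threading.Thread(target=worker) for _ in range(nthreads)]
        for t in threads: t.start()
        for t in threads: t.join()
        return out

    print(len(run_static(4)), len(run_dynamic(4)))
    ```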
  • Aquatic Biology 01/2010; 9(9):193-202. · 1.45 Impact Factor
  • ABSTRACT: Research in large-scale distributed systems, such as P2P systems, often relies critically on simulations to validate research results. Though systems such as PlanetLab can be used to test on real networks in some cases, there are still significant practical challenges to evaluating large-scale distributed systems on actual hardware. Measured datasets, such as those obtained with the King method, also have an important role, but often do not provide enough scale or are not representative of the network for which the research is intended. Typically, in such cases, tools such as GT-ITM are used to generate network topologies for the evaluation. These tools work reasonably well at generating physical topologies that are representative of real systems. The behavior of distributed systems, however, depends not just on the physical topology, but also on the routing policies and other factors that affect latency and bandwidth. These aspects may have a considerable impact on any evaluation performed on the generated network, and can lead to significant differences between simulated and actual performance. In particular, triangle inequality violations and path inflation can adversely impact large-scale distributed systems. In this paper, we present a parameterized approach for adding such real-world effects to generated networks. We show that the parameters can be varied to generate a variety of networks with different characteristics, and compare the results to measured datasets.
    IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2010, Taipei, Taiwan, 6-9 September 2010; 01/2010
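    A minimal sketch of one such parameterized effect: inflating a configurable fraction of node-pair latencies in a generated topology, producing routing-policy-like detours and TIVs; the inflation distribution and parameter names are illustrative assumptions:

    ```python
    import numpy as np

    def inflate_paths(lat, fraction=0.2, factor_range=(1.1, 2.0), seed=0):
        """lat: (n, n) shortest-path latency matrix, e.g. from GT-ITM output.
        Symmetrically inflates a random `fraction` of node pairs by a factor
        drawn uniformly from `factor_range`."""
        rng = np.random.default_rng(seed)
        out = lat.astype(float).copy()
        rows, cols = np.triu_indices(len(lat), k=1)
        pick = rng.random(len(rows)) < fraction
        factors = rng.uniform(*factor_range, size=pick.sum())
        out[rows[pick], cols[pick]] *= factors
        out[cols[pick], rows[pick]] = out[rows[pick], cols[pick]]
        return out

    base = np.random.default_rng(2).uniform(10, 100, (50, 50))
    base = (base + base.T) / 2                         # symmetric base latencies
    print(inflate_paths(base).mean() / base.mean())    # net inflation applied
    ```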
  • Market-Oriented Grid and Utility Computing, 11/2009: pages 29-48; ISBN: 9780470455432
  • Grid Computing: Infrastructure, Service, and Application, Edited by L. Wang and W. Jie, 01/2009: chapter Cyberinfrastructure in New York State: pages 31-54; CRC Press.
  • ABSTRACT: Scientific applications like neuroscience data analysis are usually compute- and data-intensive. By using the additional capacity offered by distributed resources and suitable middleware, we can achieve much shorter execution times, distribute compute and storage load, and add greater flexibility to the execution of these scientific applications than we could ever achieve on a single compute resource. In this paper, we present the processing of image registration (IR) for functional magnetic resonance imaging studies on Global Grids. We characterize the application, list its requirements, and then transform it into a workflow. We use the Gridbus Broker and Gridbus Workflow Engine technologies for executing the neuroscience application on the Grid. We developed a complete web-based portal integrating a GUI-based workflow editor, execution management, monitoring, and visualization of tasks and resources. We describe each component of the system in detail. We then execute the application on the Grid'5000 platform and present extensive performance results. We show that the IR application can achieve (1) significantly improved makespan, (2) distribution of compute and storage load among the resources used, and (3) flexibility when executing multiple times on Grid resources. Copyright © 2009 John Wiley & Sons, Ltd.
    Concurrency and Computation Practice and Experience 01/2009; 21:2118-2139. · 0.85 Impact Factor
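    A minimal sketch of the kind of task-dependency workflow the application is transformed into, executed wave by wave; the task graph is an illustrative stand-in, not the paper's actual IR workflow or the Gridbus APIs:

    ```python
    from concurrent.futures import ThreadPoolExecutor
    from graphlib import TopologicalSorter

    deps = {                             # task -> prerequisite tasks
        "preprocess": set(),
        "align_1": {"preprocess"}, "align_2": {"preprocess"},
        "average": {"align_1", "align_2"},
        "report": {"average"},
    }

    def run(task):
        print("submitting", task)        # stand-in for a grid job submission

    ts = TopologicalSorter(deps)
    ts.prepare()
    with ThreadPoolExecutor(max_workers=4) as pool:
        while ts.is_active():
            ready = list(ts.get_ready())
            list(pool.map(run, ready))   # dispatch this wave and wait for it
            for t in ready:
                ts.done(t)
    ```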
  • Kenneth Chiu, Geoffrey Fox
    ABSTRACT: Grid workflows are executed on diverse resources whose interactions are highly complex and hard to predict. Often the user and the workflow middleware services want to be informed about the performance behavior of workflows as early as possible, so ...
    Future Generation Comp. Syst. 01/2009; 25:444-445.
  • ABSTRACT: Scientific applications like neuroscience data analysis are usually compute- and data-intensive. With the use of the additional capacity offered by distributed resources and suitable middleware, we can achieve much shorter execution time, distribute compute and storage load, and add greater flexibility to the execution of these scientific applications than we could ever achieve in a single compute resource. In this paper, we present the processing of Image Registration (IR) for Functional Magnetic Resonance Imaging (fMRI) studies on Global Grids. We characterize the application, list its requirements and then transform it to a workflow. We then execute the application on the Grid'5000 platform and present extensive performance results. We show that the IR application can have (1) significantly improved makespan, (2) distribution of compute and storage load among resources used, and (3) flexibility when executing multiple times on the Grid with the use of suitable middleware.
    The IEEE 23rd International Conference on Advanced Information Networking and Applications, AINA 2009, Bradford, United Kingdom, May 26-29, 2009; 01/2009
  • ABSTRACT: It is clear from the number of e-science projects involving remote access to instruments that remote access and facility sharing is an emerging area of concern. In this chapter we discuss the Common Instrument Middleware Architecture (CIMA) project, funded by the US National Science Foundation (NSF) through the NSF Middleware Initiative. CIMA aims to provide a middleware set that facilitates making instruments and sensors remotely accessible as community resources and building collaborations around shared facilities. We provide an overview of CIMA, discuss its relationship to other instrument remote-access projects, examine our experiences in developing and applying the architecture, and look ahead to future development directions for CIMA and its applications.
    12/2008: pages 393-407;
  • ABSTRACT: By leveraging the growing prevalence of multicore CPUs, parallel XML parsing (PXP) can significantly improve the performance of XML, enhancing its suitability for scientific data, which is often dominated by floating-point numbers. One approach is to divide the XML document into equal-sized chunks and parse each chunk in parallel. XML parsing is inherently sequential, however, because the state of an XML parser when reading a given character depends potentially on all preceding characters. In previous work, we addressed this by using a fast preparsing scan to build an outline of the document, which we call the skeleton. The skeleton is then used to guide the parallel full parse. The preparse is a sequential phase that limits scalability, however, so in this paper we show how the preparse itself can be parallelized using a mechanism we call a meta-DFA. For each state q of the original preparser, the meta-DFA incorporates a complete copy of the preparser state machine as a sub-DFA that starts in state q. The meta-DFA thus runs multiple instances of the preparser simultaneously when parsing a chunk, with each possible preparser state at the beginning of a chunk represented by an instance. By pursuing all possibilities simultaneously, the meta-DFA allows each chunk to be preparsed independently in parallel. The parallel full parse following the preparse is performed using libxml2, and outputs DOM trees that are fully compatible with existing applications that use libxml2. Our implementation scales well on a 30-CPU Sun E6500 machine.
    e-Science and Grid Computing, IEEE International Conference on; 01/2008
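    A minimal sketch of the meta-DFA scheme: each chunk is scanned once per possible preparser state, yielding a start-state-to-end-state mapping, and the mappings are composed sequentially to recover the true states; the two-state toy machine is an illustrative assumption standing in for the real preparser:

    ```python
    from concurrent.futures import ThreadPoolExecutor

    STATES = ("text", "tag")

    def step(state, ch):                 # toy preparser: inside/outside a tag
        if state == "text" and ch == "<": return "tag"
        if state == "tag" and ch == ">": return "text"
        return state

    def chunk_mapping(chunk):
        """Simulate the sub-DFA for every start state: one meta-DFA transition."""
        mapping = {}
        for s0 in STATES:
            s = s0
            for ch in chunk:
                s = step(s, ch)
            mapping[s0] = s
        return mapping

    chunks = ["<roo", "t at", "tr='v'", "><chi", "ld/></root>"]
    with ThreadPoolExecutor() as pool:
        mappings = list(pool.map(chunk_mapping, chunks))  # parallel over chunks

    state = "text"                       # stitching: cheap sequential composition
    for m in mappings:
        state = m[state]
    print("final preparser state:", state)
    ```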
  • ABSTRACT: A two-component portal system is being developed for collaborative remote instrument and data control and monitoring. The system builds on and enhances the Common Instrument Middleware Architecture (CIMA) model for Web-services-based monitoring of remote scientific instruments and sensors. The architecture supports remote access to multiple instruments from a single portal. Plugin modules are used to provide flexibility and reuse, and the notion of plugin control is being developed. Web 2.0 Pushlet and AJAX technologies have been introduced for push-based portlet refresh and updating. An X3D-based 3D virtual representation of the instrument provides data collection simulation and (pseudo) real-time instrument representation. An important component of the system is a Web-services-driven portlet for collaborative image viewing.
    e-Science and Grid Computing, IEEE International Conference on; 01/2008
  • ABSTRACT: Wireless sensor networks can be embedded within complex environments for a wide range of monitoring and control applications, including the study of fire spread through woods, contamination spread in water bodies, and diffusion of toxic gases through air. In these applications, the collected data is used to drive a forecasting model, typically consisting of a CFD simulation. Traditionally, the data collection process is fixed. Active coupling between the sensors and the model, however, can significantly improve the accuracy, timeliness, and efficiency of forecasting. While there has been significant work in this area, there has not yet been a systematic analysis of how to represent the complex environments, how to represent the model, and how to architect the system. In this paper, we present a generalization of data-coupling applications and describe a framework for decomposing models, which we apply to a simple example of water contamination.
    Intelligent Sensors, Sensor Networks and Information, 2007. ISSNIP 2007. 3rd International Conference on; 01/2008
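    A minimal sketch of such active coupling for the water-contamination example: each sensor reading nudges the model state before the next simulation step; the 1-D diffusion model, the nudging weight, and the sensor stand-in are illustrative assumptions:

    ```python
    import numpy as np

    n, dt, alpha = 50, 0.1, 0.5
    conc = np.zeros(n)                   # modeled contaminant concentration
    sensor_at = 25                       # grid cell holding a physical sensor

    def read_sensor(t):                  # stand-in for a real sensor driver
        return 1.0 if t < 2.0 else 0.2

    def diffuse(c):                      # explicit 1-D diffusion step
        lap = np.roll(c, 1) + np.roll(c, -1) - 2 * c
        return c + alpha * dt * lap

    for k in range(100):
        obs = read_sensor(k * dt)
        conc[sensor_at] += 0.5 * (obs - conc[sensor_at])  # nudge toward obs
        conc = diffuse(conc)
    print("peak modeled concentration:", float(conc.max()))
    ```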
  • ABSTRACT: Freshwater lakes provide a number of important ecosystem services, such as supply of drinking water, support of biotic diversity, transportation of commercial goods, and opportunities for recreation. Wireless sensor networks allow continuous, fine-grained, in situ measurements of key variables such as water temperature, dissolved gases, pH, conductivity, and chlorophyll. Instrumenting lakes with sensors capable of sampling environmental variables is becoming standard practice. Furthermore, many limnologists around the world are interested in accessing and performing research on data collected from lakes around the globe, to provide local, regional, and even global understanding of lake ecosystems. To that end, a number of limnologists, information technology experts, and engineers have joined forces to create a new, grassroots, international network: the Global Lake Ecological Observatory Network (GLEON). One of our goals is to build a globally scalable, persistent network of lake ecology observatories. However, designing and implementing technology that meets the requirements of a large-scale distributed observing system such as GLEON has, thus far, been challenging and instructive. In this paper, we describe several key conceptual challenges in building the GLEON network. We also describe several practical issues and lessons learned during operation of a typical GLEON site.
    Intelligent Sensors, Sensor Networks and Information, 2007. ISSNIP 2007. 3rd International Conference on; 01/2008
  • ABSTRACT: In this paper, we present cyberinfrastructure and grid computing efforts in New York State. In particular, we focus on foundational efforts in Binghamton and Buffalo, including the design, development, and deployment of the New York State Grid, as well as a grassroots New York State initiative.
    22nd IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2008, Miami, Florida USA, April 14-18, 2008; 01/2008

Publication Stats

1k Citations
11.07 Total Impact Points

Institutions

  • 2006–2010
    • Binghamton University
      • Department of Computer Science
      Binghamton, NY, United States
  • 2009
    • Victoria University of Wellington
      Wellington, Wellington, New Zealand
  • 2006–2008
    • State University of New York
      New York City, New York, United States
  • 2005
    • Indiana University South Bend
      South Bend, Indiana, United States
  • 2002
    • Indiana University Bloomington
      • Department of Computer Science
      Bloomington, IN, United States