Conference Paper

Comparative Studies Simplified in GPFlow.

DOI: 10.1007/978-3-540-69389-5_56 Conference: Computational Science - ICCS 2008, 8th International Conference, Kraków, Poland, June 23-25, 2008, Proceedings, Part III
Source: DBLP


We present a novel, web-accessible scientific workflow system that makes large-scale comparative studies accessible without programming or excessive configuration. GPFlow allows a workflow defined on single input values to be automatically lifted to operate over collections of input values, and supports the formation and processing of collections of values without explicit iteration constructs. We introduce a new model for collection processing, based on key aggregation and slicing, which guarantees processing integrity and facilitates automatic association of inputs, allowing scientific users to manage the combinatorial explosion of data values inherent in large-scale comparative studies. The approach builds on our previous work supporting combined interactive and batch operation through a lightweight web-based user interface, and is demonstrated using a core task from comparative genomics.
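The abstract's central idea, lifting a step defined on single values so that it applies across keyed collections with inputs associated automatically, can be sketched as follows. This is a minimal illustration only: the function and variable names (`lift`, `pair`, `genomes_x`) are hypothetical and do not come from GPFlow; real GPFlow collection processing with key aggregation and slicing is richer than this.

```python
# Hypothetical sketch of automatic lifting with key-based association.
# A "step" is written for single values; lift() applies it across keyed
# collections, joining inputs that share a key, so the user never writes
# an explicit loop.

def lift(step):
    """Lift a single-value function to operate over keyed collections."""
    def lifted(*collections):
        # Associate inputs by key: only keys present in every input
        # collection produce an output, so values are never mis-paired.
        shared = set(collections[0]).intersection(*collections[1:])
        return {k: step(*(c[k] for c in collections)) for k in sorted(shared)}
    return lifted

# A step defined on single values, e.g. pairing two genome sequences
# ahead of a comparison.
def pair(a, b):
    return (a, b)

genomes_x = {"g1": "ACGT", "g2": "TTGA"}
genomes_y = {"g1": "ACGA", "g2": "TTGC", "g3": "CCCC"}

# "g3" has no counterpart in genomes_x, so it is dropped rather than
# silently matched to the wrong input.
paired = lift(pair)(genomes_x, genomes_y)
```

In this reading, the key structure is what tames the combinatorial explosion: aggregation groups values under shared key components, while slicing selects subsets by key, and the join above is performed by the system rather than by user-written iteration.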
