Article

Special Issue: Concurrency and Computation: Practice and Experience from the Microsoft eScience Workshop

Abstract

Using computing to shorten scientists' time to discovery is an idea that has been around for a long time. Often, however, domain scientists and computer scientists work in parallel rather than in collaboration. A goal of this workshop (the fifth of its kind) has been to showcase examples of such collaborations, as well as tools that enable these collaborations to flourish. This editorial provides an overview of papers requested from a sampling of applied scientific collaborations presented at the 2008 Microsoft eScience Workshop. Copyright © 2010 John Wiley & Sons, Ltd.

Article
In large-scale geographic information system (GIS) applications, distributed file systems are the key approach for storing and retrieving data, but traditional distributed spatial indexes are inefficient. We present a dynamic pyramid R-tree (DPR) index for managing massive numbers of geospatial files. The method combines the pyramid spatial index with the R-tree index and can be adjusted dynamically as the data distribution changes. We describe index building and data access; experimental results show that, for range queries in a distributed file system, our method is more efficient than the Hilbert tree index and the quadtree index.
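The abstract does not give the DPR algorithm itself. As a rough illustration of what a pyramid spatial index does during a range query, the sketch below uses a hypothetical multi-level grid in which level k splits a unit extent into 2^k x 2^k cells, and each file's bounding box is stored at the deepest level where it still fits inside one cell. It omits the R-tree layer, the dynamic adjustment and the distribution across a file system, so it is a simplified stand-in rather than the authors' method; `PyramidIndex` and its methods are invented names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


class PyramidIndex:
    """Hypothetical pyramid grid index over the extent [0, 1] x [0, 1].

    Level k splits the extent into 2**k x 2**k cells. Each item is stored at
    the deepest level at which its bounding box lies inside a single cell.
    """

    def __init__(self, max_level: int = 8) -> None:
        self.max_level = max_level
        # (level, col, row) -> list of (item_id, bbox)
        self.cells: Dict[Tuple[int, int, int], List[Tuple[str, BBox]]] = defaultdict(list)

    @staticmethod
    def _cell(level: int, x: float, y: float) -> Tuple[int, int]:
        n = 2 ** level
        # Clamp so that coordinates equal to 1.0 still map to the last cell.
        return min(int(x * n), n - 1), min(int(y * n), n - 1)

    def insert(self, item_id: str, bbox: BBox) -> None:
        min_x, min_y, max_x, max_y = bbox
        level = 0
        # Descend while both corners still fall in the same cell one level down.
        while level < self.max_level and (
            self._cell(level + 1, min_x, min_y) == self._cell(level + 1, max_x, max_y)
        ):
            level += 1
        col, row = self._cell(level, min_x, min_y)
        self.cells[(level, col, row)].append((item_id, bbox))

    def range_query(self, window: BBox) -> List[str]:
        qx0, qy0, qx1, qy1 = window
        hits = []
        for level in range(self.max_level + 1):
            c0, r0 = self._cell(level, qx0, qy0)
            c1, r1 = self._cell(level, qx1, qy1)
            # Visit only the cells at this level that overlap the query window,
            # then test each stored bounding box against the window exactly.
            for col in range(c0, c1 + 1):
                for row in range(r0, r1 + 1):
                    for item_id, (x0, y0, x1, y1) in self.cells.get((level, col, row), []):
                        if x0 <= qx1 and x1 >= qx0 and y0 <= qy1 and y1 >= qy0:
                            hits.append(item_id)
        return hits


idx = PyramidIndex(max_level=4)
idx.insert("tile_a", (0.10, 0.10, 0.20, 0.20))
idx.insert("tile_b", (0.40, 0.40, 0.60, 0.60))  # straddles the centre, so stays at level 0
idx.insert("tile_c", (0.80, 0.80, 0.90, 0.90))
print(sorted(idx.range_query((0.0, 0.0, 0.5, 0.5))))  # ['tile_a', 'tile_b']
```

A range query inspects only the cells at each level that overlap the query window, which is what lets a pyramid-style index skip most stored files instead of scanning them all.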
Article
We are now seeing governments and funding agencies look for ways to increase the value and pace of scientific research through increased or open access to both data and publications. In this point-of-view article, we look at another aspect of these twin revolutions: how to enable developers, designers and researchers to build intuitive, multimodal, user-centric scientific applications that can aid and enable scientific research.
Conference Paper
Bioinformatics is dominated by online databases and sophisticated web-accessible tools. As such, it is ideally placed to benefit from the rapid, purpose-specific combination of services and tools achievable via web mashups. The recent introduction of a number of sophisticated frameworks has greatly simplified the mashup creation process, making mashups accessible to scientists with limited programming expertise. We investigate the feasibility of mashups as a new approach to bioinformatic experimentation, focusing on an exploratory niche between interactive web usage and robust workflows, and attempting to identify the range of computations for which mashups may be employed. While we treat each of the major frameworks, we illustrate the ideas with a series of examples developed under the Popfly framework.
Article
By making research content more reusable, and providing a social infrastructure which facilitates sharing, the human aspects of the scholarly knowledge cycle may be accelerated and ‘time-to-discovery’ reduced. We propose that the key to this is the sharing of methods and processes. We present myExperiment, a social web site for discovering, sharing and curating Scientific Workflows and experiment plans, and describe how myExperiment facilitates the management and sharing of research workflows, supports a social model for content curation tailored to the researcher and community, and supports Open Science by exposing content and functionality to the users’ tools and applications. Based on this we introduce the notion of the Research Object – the work objects that are built, transformed and published in the course of scientific experiments – and suggest that by encapsulating methods with results we can achieve research that is more reusable and repeatable and hence rapid and robust.
Conference Paper
This presentation will set out the eScience agenda by explaining the current scientific data deluge and the case for a “Fourth Paradigm” for scientific exploration. Examples of data intensive science will be used to illustrate the explosion of data and the associated new challenges for data capture, curation, analysis, and sharing. The role of cloud computing, collaboration services, and research repositories will be discussed.
Article
Bioinformatics is dominated by online databases and sophisticated web-accessible tools. As such, it is ideally placed to benefit from the rapid, purpose-specific combination of services achievable via web mashups. The recent introduction of a number of sophisticated frameworks has greatly simplified the mashup creation process, making mashups accessible to scientists with limited programming expertise. In this paper we investigate the feasibility of mashups as a new approach to bioinformatic experimentation, focusing on an exploratory niche between interactive web usage and robust workflows, and attempting to identify the range of computations for which mashups may be employed. While we treat each of the major frameworks, we illustrate the ideas with a series of examples developed under the Popfly framework. Copyright © 2010 John Wiley & Sons, Ltd.
Article
The cancer Biomedical Informatics Grid (caBIG) is revolutionizing the way medical researchers share information and collaborate. A key to caBIG's continued success will be interoperability. However, to date, only a single code base (in Java) has been used to create a set of tools and run-time services for caBIG. This paper presents the first significant exploration into the use of Microsoft's .NET Framework and Visual Studio for caBIG. Given its substantial existing community, a .NET-based set of tools for caBIG can significantly increase the pool of qualified software designers and developers for caBIG. Arguably more importantly, a second development foundation could facilitate revisiting a broad set of design decisions made to date in caBIG that have perhaps been unduly based directly or indirectly on a single underlying software technology. We begin by describing issues we have encountered in building relatively simple .NET-based clients to existing caBIG services. Next, we describe how we leverage Microsoft ADO.NET Data Services as the foundation for caBIG Data Services, in particular for the caBIO data set. We find that ADO.NET Data Services has a uniquely strong potential to facilitate rapid development and deployment. We conclude with a discussion of the roadmap of our project's future activities. Copyright © 2010 John Wiley & Sons, Ltd.
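ADO.NET Data Services exposes data as REST resources addressed by URIs, with query options such as `$filter`, `$orderby`, `$top` and `$skip` appended to an entity-set path. The sketch below only composes such a query URL and sends no request; the service root `http://example.org/caBIO.svc`, the `Genes` entity set and the `Symbol` property are hypothetical placeholders, not the actual caBIO service layout described in the paper.

```python
from typing import Optional
from urllib.parse import quote


def build_query_url(
    service_root: str,
    entity_set: str,
    filter_expr: Optional[str] = None,
    orderby: Optional[str] = None,
    top: Optional[int] = None,
    skip: Optional[int] = None,
) -> str:
    """Compose an ADO.NET Data Services (OData-style) query URL.

    Builds the string only; it does not contact the service.
    """
    options = []
    if filter_expr is not None:
        # Percent-encode spaces and other reserved characters, but leave the
        # single quotes around string literals readable.
        options.append("$filter=" + quote(filter_expr, safe="'"))
    if orderby is not None:
        options.append("$orderby=" + quote(orderby, safe="'"))
    if top is not None:
        options.append(f"$top={int(top)}")
    if skip is not None:
        options.append(f"$skip={int(skip)}")
    url = service_root.rstrip("/") + "/" + entity_set
    return url + ("?" + "&".join(options) if options else "")


url = build_query_url(
    "http://example.org/caBIO.svc",  # hypothetical service root
    "Genes",                         # hypothetical entity set
    filter_expr="Symbol eq 'BRCA1'", # hypothetical property and value
    top=10,
)
print(url)  # http://example.org/caBIO.svc/Genes?$filter=Symbol%20eq%20'BRCA1'&$top=10
```

Because the query is carried entirely in the URL, any HTTP-capable client, not only a .NET one, can consume such a service, which is part of the interoperability argument made above.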
Article
Carbon-climate science, like other environmental sciences, has been changing. Large-scale synthesis studies are becoming more common. These synthesis studies are often conducted by geographically distributed science teams on data sets that are global in scale. A broad array of collaboration and data analytics tools is now available that could support these science teams. However, building tools that scientists actually use is difficult. Also, moving scientists from an informal collaboration structure to one mediated by technology often exposes inconsistencies in how collaborators understand the rules of engagement. We have developed a scientific collaboration portal, called fluxdata.org, which serves the community of scientists providing and analyzing the global FLUXNET carbon-flux synthesis data set. The key things we learned or re-learned during our portal development include: minimize the barrier to entry; provide features on a just-in-time basis; treat requirements development as an ongoing process; provide incentives to change leaders and leverage the opportunity they represent; automate as much as possible; and recognize that you can only learn how to make the portal better if people depend on it enough to give you feedback. In addition, we learned that splitting the portal roles between scientists and computer scientists improved user adoption and trust. The fluxdata.org portal has now been in operation for about 2 years and has become central to the FLUXNET synthesis efforts. Published in 2010 by John Wiley & Sons, Ltd.
Article
Scientists face many severe challenges in extracting value from the increasingly large volumes of data they generate. In this paper we describe the requirements we have derived from working across a wide range of e-science projects. In particular, the CARMEN neuroinformatics project has exposed a range of challenges due to a need to analyse and share large volumes of data. We have identified the four key activities required by scientists with whom we work, and designed an integrated system - e-Science Central - to provide them. This exploits three emerging technologies: Software as a Service to avoid the need for users to deploy and maintain any of their own software; Social Networking to allow users to collaborate by sharing data, services and workflows in a controlled manner; and Cloud Computing to provide scalable compute resources. The system can be used through any web browser, but also provides an API so applications can build on the core functionality. We describe the requirements, and the design that flows from them. This includes data storage with in-built versioning and signing, an in-browser workflow editor, and a job scheduling system that allows workflows to be run both on local "private" clouds and the Microsoft Azure Cloud.
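The abstract mentions data storage with in-built versioning and signing but does not describe the mechanism. A minimal sketch, assuming each upload creates a new numbered version whose SHA-256 digest is signed with a server-held key, might look like the following; HMAC-SHA256 stands in for whatever signing scheme e-Science Central actually uses, and `VersionedStore` and `Version` are hypothetical names, not part of its API.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Version:
    number: int     # 1-based version number
    data: bytes
    sha256: str     # hex digest of data
    signature: str  # HMAC-SHA256 over name, version number and digest


class VersionedStore:
    """Hypothetical in-memory store: every put() adds a new signed version."""

    def __init__(self, signing_key: bytes) -> None:
        self._key = signing_key
        self._versions: Dict[str, List[Version]] = {}

    def _sign(self, name: str, number: int, digest: str) -> str:
        message = f"{name}:{number}:{digest}".encode("utf-8")
        return hmac.new(self._key, message, hashlib.sha256).hexdigest()

    def put(self, name: str, data: bytes) -> Version:
        history = self._versions.setdefault(name, [])
        number = len(history) + 1
        digest = hashlib.sha256(data).hexdigest()
        version = Version(number, data, digest, self._sign(name, number, digest))
        history.append(version)  # earlier versions are kept, never overwritten
        return version

    def get(self, name: str, number: Optional[int] = None) -> Version:
        history = self._versions[name]
        if number is None:
            return history[-1]  # latest version
        if number < 1:
            raise ValueError("version numbers start at 1")
        return history[number - 1]

    def verify(self, name: str, version: Version) -> bool:
        # Recompute the digest from the stored bytes, then check the signature.
        digest = hashlib.sha256(version.data).hexdigest()
        expected = self._sign(name, version.number, digest)
        return digest == version.sha256 and hmac.compare_digest(expected, version.signature)


store = VersionedStore(signing_key=b"demo-key")  # hypothetical key
v1 = store.put("recording.dat", b"raw spikes v1")
v2 = store.put("recording.dat", b"raw spikes v2")
print(store.get("recording.dat").number)  # 2
print(store.verify("recording.dat", v1))  # True
```

HMAC is symmetric, so anyone able to verify can also sign; a public-key signature would be needed if third parties must check provenance without holding the key.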
Article
The combination of low-cost in situ sensors, internet connectivity, and commodity computing is changing earth science research. This unprecedented data availability is enabling science synthesis: studies that span disciplines, bridge local field, modeling and remote sensing methodologies, and/or span local, regional, and global scales. Data discovery, retrieval, and heterogeneity act as cyberinfrastructure barriers to synthesis. Much of the available data is discovered by serendipity or word of mouth from independent, disconnected nodes with different interfaces and semantics. Before scientific analysis, data must be harmonized to a common, understandable format. As the number of publishers increases, particularly when the publishers are small groups, this problem can only worsen. We have been developing SciScope (www.sciscope.org), a search tool that aggregates data set information and presents a simple map-based interface across diverse data publishers. SciScope unites data catalog, semantic, structural and syntactic mediation, and Geographical Information Systems (GIS) functions. The current catalog contains 1.7 million observation sites from five different sources in the United States and Australia. SciScope encourages participatory science. Data consumers can easily find, retrieve, and annotate data sets. Data contributors can register and upload data sets as well as extend the semantic mediation to enable richer data searches. SciScope instances can be federated to bridge science communities that may not be willing or able to share a single catalog. Copyright © 2010 John Wiley & Sons, Ltd.
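Semantic mediation, as described for SciScope, means mapping each publisher's variable names and units onto shared concepts so that a single search spans every catalog. The sketch below shows the idea with a hard-coded mediation table; the publisher names, variable codes and concept labels are hypothetical, and SciScope's own mediation layer is not described here in enough detail to reproduce.

```python
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, str, str, float]  # (publisher, site, source variable, value)

# Hypothetical mediation table:
# (publisher, source variable) -> (shared concept, converter to the shared unit)
MEDIATION: Dict[Tuple[str, str], Tuple[str, Callable[[float], float]]] = {
    ("publisher_a", "Temp_C"): ("water_temperature_degC", lambda v: v),
    ("publisher_b", "WTEMP_F"): ("water_temperature_degC", lambda v: (v - 32.0) * 5.0 / 9.0),
    # 1 cubic foot is approximately 0.0283168 cubic metres.
    ("publisher_b", "Q_cfs"): ("discharge_m3s", lambda v: v * 0.0283168),
}


def harmonize(records: List[Record]) -> List[Tuple[str, str, float]]:
    """Translate publisher-specific records into (site, concept, value) triples."""
    out = []
    for publisher, site, variable, value in records:
        mapping = MEDIATION.get((publisher, variable))
        if mapping is None:
            continue  # unmapped variables are skipped in this sketch
        concept, convert = mapping
        out.append((site, concept, convert(value)))
    return out


def search(records: List[Record], concept: str) -> List[Tuple[str, str, float]]:
    """Return harmonized records matching one shared concept, across publishers."""
    return [r for r in harmonize(records) if r[1] == concept]


records = [
    ("publisher_a", "site1", "Temp_C", 20.0),
    ("publisher_b", "site2", "WTEMP_F", 68.0),
    ("publisher_b", "site2", "Q_cfs", 100.0),
]
print(search(records, "water_temperature_degC"))
```

Because contributors can extend the mediation, a new publisher's variables become searchable by adding table entries rather than by changing the search code.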
Article
Languages are among the most complex systems that evolution has created. Many of these unique results of evolution are now disappearing at unforeseen speed: every two weeks one of the 6500 languages still spoken dies, and many others are subject to extreme changes due to globalization. Experts have recognized the need to document these languages and preserve the cultural and linguistic treasures embedded in them for future generations. Linguistic theory will also need to consider the variation of the linguistic systems encoded in languages to improve our understanding of how human minds process language material, so accessibility to all types of resources is increasingly crucial. Deeper insights into human language processing and a higher degree of integration and interoperability between resources will also improve our language processing technology. The DOBES programme focuses on the documentation and preservation of language material. The Max Planck Institute developed the Language Archiving Technology to help researchers create, archive and access language resources. The main goals of the recently started CLARIN research infrastructure are broad visibility and easy accessibility of language resources.
Beran B, van Ingen C, Fatland R. SciScope: A Participatory Geoscientific Web Application. Special Issue of Concurrency and Computation: Practice and Experience 2008. DOI: 10.1002/cpe.1597. Microsoft eScience Workshop.
Watson P, Hiden H, Woodman S. e-Science Central for CARMEN: Science as a Service. Special Issue of Concurrency and Computation: Practice and Experience 2008. DOI: 10.1002/cpe.1611. Microsoft eScience Workshop.
Hogan J, Sumitomo J, Roe P, Newell F. Biomashups: The New World of Exploratory Bioinformatics? Special Issue of Concurrency and Computation: Practice and Experience 2008. DOI: 10.1002/cpe.1598. Microsoft eScience Workshop.
Agarwal D, Humphrey M, Beekwilder N, Jackson K, Goode M, van Ingen C. A Data-centered Collaboration Portal to Support Global Carbon-Flux Analysis. Special Issue of Concurrency and Computation: Practice and Experience 2008. DOI: 10.1002/cpe.1600. Microsoft eScience Workshop.
Humphrey M, Li J, Beekwilder N. Publication and consumption of caBIG data services using .NET. Special Issue of Concurrency and Computation: Practice and Experience 2008. DOI: 10.1002/cpe.1599. Microsoft eScience Workshop.
De Roure D, Goble C, Aleksejevs S, Bechhofer S, Bhagat J, Cruickshank D, Fisher P, Hull D, Michaelides D, Newman D, Proctor R, Lin Y, Poschen M. Towards Open Science: The myExperiment approach. Special Issue of Concurrency and Computation: Practice and Experience 2008. DOI: 10.1002/cpe.1601. Microsoft eScience Workshop.
Wittenburg P. Archiving and accessing language resources. Special Issue of Concurrency and Computation: Practice and Experience 2008. DOI: 10.1002/cpe.1605. Microsoft eScience Workshop.