Astro-WISE: Chaining to the Universe

Source: arXiv

ABSTRACT The recent explosion of recorded digital data and its processed derivatives
threatens to overwhelm researchers when analysing their experimental data or
when looking up data items in archives and file systems. While current hardware
developments make it possible to acquire, process and store hundreds of terabytes of data at
the cost of a modern sports car, the software systems to handle these data are
lagging behind. This general problem is recognized and addressed by various
scientific communities, e.g., DATAGRID/EGEE federates compute and storage power
across the high-energy physics community, while the astronomical community is
building an Internet-geared Virtual Observatory that connects archival data.
These large projects either focus on a specific distribution aspect or aim to
connect many sub-communities and have a relatively long trajectory for setting
standards and a common layer. Here, we report "first light" of a very different
solution to the problem initiated by a smaller astronomical IT community. It
provides the abstract "scientific information layer" which integrates
distributed scientific analysis with distributed processing and federated
archiving and publishing. By designing new abstractions and mixing in old ones,
a Science Information System with fully scalable cornerstones has been
achieved, transforming data systems into knowledge systems. This breakthrough
is facilitated by the full end-to-end linking of all dependent data items,
which allows full backward chaining from the observer/researcher to the
experiment. Key is the notion that information is intrinsic in nature, and thus
so is the data acquired by a scientific experiment. The new abstraction is that
software systems guide the user to that intrinsic information by forcing full
backward and forward chaining in the data modelling.
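
The end-to-end linking described in the abstract can be illustrated with a minimal Python sketch. It is not the Astro-WISE data model or API: the `DataItem` class, the `derive`, `backward_chain` and `forward_chain` helpers, and the frame names are all hypothetical, chosen only to show how recording dependencies on every derived item lets one trace back to the raw experiment data and forward to every dependent product.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class DataItem:
    """Hypothetical data item that records what it was derived from."""
    name: str
    parents: list = field(default_factory=list)   # items this one depends on
    children: list = field(default_factory=list)  # items derived from this one

    def derive(self, name, *others):
        """Create a new item derived from this item and any other inputs."""
        child = DataItem(name, parents=[self, *others])
        for parent in child.parents:
            parent.children.append(child)
        return child


def _chain(item, attribute):
    """Follow `attribute` links ("parents" or "children") transitively."""
    seen, stack = [], list(getattr(item, attribute))
    while stack:
        current = stack.pop()
        if current.name not in seen:
            seen.append(current.name)
            stack.extend(getattr(current, attribute))
    return seen


def backward_chain(item):
    """Names of all items that `item` depends on, back to the raw data."""
    return _chain(item, "parents")


def forward_chain(item):
    """Names of all items that were derived, directly or not, from `item`."""
    return _chain(item, "children")


# Illustrative lineage: raw frame + bias frame -> reduced frame -> catalogue.
raw = DataItem("raw_frame")
bias = DataItem("bias_frame")
reduced = raw.derive("reduced_frame", bias)
catalog = reduced.derive("source_catalog")

print(sorted(backward_chain(catalog)))
print(sorted(forward_chain(raw)))
```

In this sketch, backward chaining from the catalogue reaches both the raw and the bias frame, while forward chaining from the raw frame lists every product that would need reprocessing if that frame changed.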

  • Source
    ABSTRACT: Most often, astronomers are interested in a source (e.g., moving, variable, or extreme in some colour index) that lies on a few pixels of an image. However, the classical approach in astronomical data processing is the processing of the entire image or set of images even when the sole source of interest may exist on only a few pixels of one or a few images. This is because pipelines have been written and designed for instruments with fixed detector properties (e.g., image size, calibration frames, overscan regions, etc.). Furthermore, all metadata and processing parameters are based on an instrument or a detector. Accordingly, out of many thousands of images for a survey, this can lead to unnecessary processing of data that is both time-consuming and wasteful. We describe the architecture and an implementation of sub-image processing in Astro-WISE. The architecture enables a user to select, retrieve and process only the relevant pixels in an image where the source exists. We show that lineage data collected during the processing and analysis of datasets can be reused to perform selective reprocessing (at sub-image level) on datasets while the remainder of the dataset is untouched, a difficult process to automate without lineage.
    Experimental Astronomy 01/2012; 35(1-2). DOI:10.1007/s10686-012-9295-0 · 2.66 Impact Factor
  • Source
    ABSTRACT: The paper reviews the Astro-WISE infrastructure and demonstrates that the Astro-WISE Information System provides a Grid itself. We describe the integration of Astro-WISE with an external Grid infrastructure (BiGGrid). The integration is performed on all infrastructural layers (data storage, metadata and processing layers) with Astro-WISE as a “master” infrastructure. We report the use of the integrated infrastructure for the processing of Astro-WISE hosted data and for the future development of Astro-WISE and Target projects.
    Experimental Astronomy 01/2012; 35(1-2). DOI:10.1007/s10686-012-9293-2 · 2.66 Impact Factor
  • Source
    ABSTRACT: The Astro-WISE information system was developed to handle data processing for the KIDS survey. In this paper we describe the adaptation of the WISE concept to allow scaling to support archives containing tens of petabytes of stored data and the changes we introduced to accommodate the system for the LOFAR Long Term Archive. With this we provide an example of how Astro-WISE technology can be adapted to support a wider range and scale of data.
    Experimental Astronomy 01/2012; 35(1-2). DOI:10.1007/s10686-012-9305-2 · 2.66 Impact Factor