The Palomar‐Quest digital synoptic sky survey

Astronomische Nachrichten (Impact Factor: 1.4). 02/2008; 329(3):263 - 265. DOI: 10.1002/asna.200710948
Source: arXiv

ABSTRACT We describe briefly the Palomar-Quest (PQ) digital synoptic sky survey, including its parameters, data processing, status, and plans. Exploration of the time domain is now the central scientific and technological focus of the survey. To this end, we have developed a real-time pipeline for detection of transient sources.We describe some of the early results, and lessons learned which may be useful for other, similar projects, and time-domain astronomy in general. Finally, we discuss some issues and challenges posed by the real-time analysis and scientific exploitation of massive data streams from modern synoptic sky surveys. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)

  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: We review the current state of data mining and machine learning in astronomy. 'Data Mining' can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black-box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines, applications from a broad range of astronomy, emphasizing those where data mining techniques directly resulted in improved science, and important current and future directions, including probability density functions, parallel algorithms, petascale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm, and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box. Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra figures, some minor additions to the text
    International Journal of Modern Physics D 06/2009; · 1.03 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Astronomy has been at the forefront of the development of the techniques and methodologies of data intensive science for over a decade with large sky surveys and distributed efforts such as the Virtual Observatory. However, it faces a new data deluge with the next generation of synoptic sky surveys which are opening up the time domain for discovery and exploration. This brings both new scientific opportunities and fresh challenges, in terms of data rates from robotic telescopes and exponential complexity in linked data, but also for data mining algorithms used in classification and decision making. In this paper, we describe how an informatics-based approach-part of the so-called "fourth paradigm" of scientific discovery-is emerging to deal with these. We review our experiences with the Palomar-Quest and Catalina Real-Time Transient Sky Surveys; in particular, addressing the issue of the heterogeneity of data associated with transient astronomical events (and other sensor networks) and how to manage and analyze it.
    Distributed and Parallel Databases 08/2012; 30(5-6). · 0.81 Impact Factor
  • [Show abstract] [Hide abstract]
    ABSTRACT: New modes of discovery are enabled by the growth of data and computational resources (i.e., cyberinfrastructure) in the sciences. This cyberinfrastructure includes structured databases, virtual observatories (distributed data, as described in Section 20.2.1 of this chapter), high-performance computing (petascale machines), distributed computing (e.g., the Grid, the Cloud, and peer-to-peer networks), intelligent search and discovery tools, and innovative visualization environments. Data streams from experiments, sensors, and simulations are increasingly complex and growing in volume. This is true in most sciences, including astronomy, climate simulations, Earth observing systems, remote sensing data collections, and sensor networks. At the same time, we see an emerging confluence of new technologies and approaches to science, most clearly visible in the growing synergism of the four modes of scientific discovery: sensors-modeling-computing-data (Eastman et al. 2005). This has been driven by numerous developments, including the information explosion, development of large-array sensors, acceleration in high-performance computing (HPC) power, advances in algorithms, and efficient modeling techniques. Among these, the most extreme is the growth in new data. Specifically, the acquisition of data in all scientific disciplines is rapidly accelerating and causing a data glut (Bell et al. 2007). It has been estimated that data volumes double every year—for example, the NCSA (National Center for Supercomputing Applications) reported that their users cumulatively generated one petabyte of data over the first 19 years of NCSA operation, but they then generated their next one petabyte in the next year alone, and the data production has been growing by almost 100% each year after that (Butler 2008). The NCSA example is just one of many demonstrations of the exponential (annual data-doubling) growth in scientific data collections. In general, this putative data-doubling is an inevitable result of several compounding factors: the proliferation of data-generating devices, sensors, projects, and enterprises; the 18-month doubling of the digital capacity of these microprocessor-based sensors and devices (commonly referred to as "Moore’s law"); the move to digital for nearly all forms of information; the increase in human-generated data (both unstructured information on the web and structured data from experiments, models, and simulation); and the ever-expanding capability of higher density media to hold greater volumes of data (i.e., data production expands to fill the available storage space). These factors are consequently producing an exponential data growth rate, which will soon (if not already) become an insurmountable technical challenge even with the great advances in computation and algorithms. This technical challenge is compounded by the ever-increasing geographic dispersion of important data sources—the data collections are not stored uniformly at a single location, or with a single data model, or in uniform formats and modalities (e.g., images, databases, structured and unstructured files, and XML data sets)—the data are in fact large, distributed, heterogeneous, and complex. The greatest scientific research challenge with these massive distributed data collections is consequently extracting all of the rich information and knowledge content contained therein, thus requiring new approaches to scientific research. This emerging data-intensive and data-oriented approach to scientific research is sometimes called discovery informatics or X-informatics (where X can be any science, such as bio, geo, astro, chem, eco, or anything; Agresti 2003; Gray 2003; Borne 2010). This data-oriented approach to science is now recognized by some (e.g., Mahootian and Eastman 2009; Hey et al. 2009) as the fourth paradigm of research, following (historically) experiment/observation, modeling/analysis, and computational science.
    Advances in Machine Learning and Data Mining for Astronomy. 03/2012;

Full-text (2 Sources)

Available from
May 27, 2014