The LSST data mining research agenda

AIP Conference Proceedings 11/2008; 1082(1). DOI: 10.1063/1.3059074
Source: arXiv


We describe features of the LSST science database that are amenable to scientific data mining, object classification, outlier identification, anomaly detection, image quality assurance, and survey science validation. The data mining research agenda includes: scalability (at petabyte scales) of existing machine learning and data mining algorithms; development of grid-enabled parallel data mining algorithms; designing a robust system for brokering classifications from the LSST event pipeline (which may produce 10,000 or more event alerts per night); multi-resolution methods for exploration of petascale databases; indexing of multi-attribute multi-dimensional astronomical databases (beyond spatial indexing) for rapid querying of petabyte databases; and more.

Comment: 5 pages. Presented at the "Classification and Discovery in Large Astronomical Surveys" meeting, Ringberg Castle, 14-17 October 2008.

  • ABSTRACT: We review the current state of data mining and machine learning in astronomy. 'Data Mining' can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black-box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines, applications from a broad range of astronomy, emphasizing those where data mining techniques directly resulted in improved science, and important current and future directions, including probability density functions, parallel algorithms, petascale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm, and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.
    Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra figures, some minor additions to the text.
    International Journal of Modern Physics D 06/2009; 19(7). DOI:10.1142/S0218271810017160 · 1.74 Impact Factor
  • ABSTRACT: We report the serendipitous discovery of a heavily reddened Wolf-Rayet star that we name WR142b. While photometrically monitoring a cataclysmic variable, we detected weak variability in a nearby field star. Low-resolution spectroscopy revealed a strong emission line at 7100 Å, suggesting an unusual object and prompting further study. A spectrum taken with the Hobby-Eberly Telescope confirms strong He II emission and a N IV 7112 Å line consistent with a nitrogen-rich Wolf-Rayet star of spectral class WN6. Analysis of the He II line strengths reveals no detectable hydrogen in WR142b. A blue-sensitive spectrum obtained with the Large Binocular Telescope shows no evidence for a hot companion star. The continuum shape and emission line ratios imply a reddening of E(B-V) = 2.2 to 2.5 mag. If not for the dust extinction, this new Wolf-Rayet star could be visible to the naked eye.
    The Astronomical Journal 11/2011; 143(6). DOI:10.1088/0004-6256/143/6/136 · 4.02 Impact Factor