
# From Sky to Earth: Data Science Methodology Transfer

## Abstract

We describe the parallels between astronomy and earth science datasets and their analyses, and the opportunities for methodology transfer from astroinformatics to geoinformatics. Using the example of hydrology, we emphasize how metadata and ontologies are crucial in such an undertaking. Using the infrastructure being designed for EarthCube, the Virtual Observatory for the earth sciences, we discuss essential steps for better transfer of tools and techniques in the future, for example domain adaptation. Finally, we point out that the transfer is never a one-way process: astroinformatics has much to learn from geoinformatics as well.

Systematic Review of Methodologies in Data Science. Abstract: This paper addresses the problem of finding outstanding research works on methodologies of Data Science. Although there are survey-type publications on this science, these do not explicitly highlight the use of a state-of-the-art review methodology that would help in the construction of the survey. Within the present investigation, a systematic review was applied to identify the relevant publications concerning the methodological issue in Data Science. A total of 3,451 articles were reviewed by title using the search strings established in the systematic review. This set of articles was then filtered to select only those that met the inclusion and exclusion criteria. As a result of applying these criteria, only 24 articles were identified as relevant to the methodological issue of data science. Index Terms: Data Science, Methodology, Systematic Review
We present a review of data types and statistical methods often encountered in astronomy. The aim is to provide an introduction to statistical applications in astronomy for statisticians and computer scientists. We highlight the complex, often hierarchical, nature of many astronomy inference problems and advocate for cross-disciplinary collaborations to address these challenges.
We present ~47,000 periodic variables found during the analysis of 5.4 million variable star candidates within a 20,000 square degree region covered by the Catalina Surveys Data Release-1 (CSDR1). Combining these variables with type-ab RR Lyrae from our previous work, we produce an on-line catalog containing periods, amplitudes, and classifications for ~61,000 periodic variables. By cross-matching these variables with those from prior surveys, we find that more than 90% of the ~8,000 known periodic variables in the survey region are recovered. For these sources we find excellent agreement between our catalog and prior values of luminosity, period and amplitude, as well as classification. We investigate the rate of confusion between objects classified as contact binaries and type-c RR Lyrae (RRc's) based on periods, colours, amplitudes, metallicities, radial velocities and surface gravities. We find that no more than a few percent of the variables in these classes are misidentified. By deriving distances for this clean sample of ~5,500 RRc's, we trace the path of the Sagittarius tidal streams within the Galactic halo. Selecting 146 outer-halo RRc's with SDSS radial velocities, we confirm the presence of a coherent halo structure that is inconsistent with current N-body simulations of the Sagittarius tidal stream. We also find numerous long-period variables that are very likely associated with the Sagittarius tidal stream system. Based on the examination of 31,000 contact binary light curves, we find evidence for two subgroups exhibiting irregular light curves. One subgroup presents significant variations in mean brightness that are likely due to chromospheric activity. The other subgroup shows stable modulations over more than a thousand days and thereby provides evidence that the O'Connell effect is not due to stellar spots.
With increasing size and complexity of the implementations of information systems, it is necessary to use some logical construct (or architecture) for defining and controlling the interfaces and the integration of all of the components of the system. This paper defines information systems architecture by creating a descriptive framework from disciplines quite independent of information systems, then by analogy specifies information systems architecture based upon the neutral, objective framework. Also, some preliminary conclusions about the implications of the resultant descriptive framework are drawn. The discussion is limited to architecture and does not include a strategic planning methodology.
The goal of having networks of seamlessly connected people, software agents and IT systems remains elusive. Early integration efforts focused on connectivity at the physical and syntactic layers. Great strides were made; there are many commercial tools available, for example to assist with enterprise application integration. It is now recognized that physical and syntactic connectivity is not adequate. A variety of research systems have been developed addressing some of the semantic issues. In this paper, we argue that ontologies in particular and semantics-based technologies in general will play a key role in achieving seamless connectivity. We give a detailed introduction to ontologies, summarize the current state of the art for applying ontologies to achieve semantic connectivity and highlight some key challenges.
The Flexible Image Transport System (FITS) has been used by astronomers for over 30 years as a data interchange and archiving format; FITS files are now handled by a wide range of astronomical software packages. Since the FITS format definition document (the "standard") was last printed in this journal in 2001, several new features have been developed and standardized, notably support for 64-bit integers in images and tables, variable-length arrays in tables, and new world coordinate system conventions which provide a mapping from an element in a data array to a physical coordinate on the sky or within a spectrum. The FITS Working Group of the International Astronomical Union has therefore produced this new version 3.0 of the FITS standard, which is provided here in its entirety. In addition to describing the new features in FITS, numerous editorial changes were made to the previous version to clarify and reorganize many of the sections. Also included are some appendices which are not formally part of the standard. The FITS standard is likely to undergo further evolution, in which case the latest version may be found on the FITS Support Office Web site at http://fits.gsfc.nasa.gov/, which also provides many links to FITS-related resources.
Iris is an extensible application that provides astronomers with a user-friendly interface capable of ingesting broad-band data from many different sources in order to build, explore, and model spectral energy distributions (SEDs). Iris takes advantage of the standards defined by the International Virtual Observatory Alliance, but hides the technicalities of such standards by implementing different layers of abstraction on top of them. Such intermediate layers provide hooks that users and developers can exploit in order to extend the capabilities provided by Iris. For instance, custom Python models can be combined in arbitrary ways with the Iris built-in models or with other custom functions. As such, Iris offers a platform for the development and integration of SED data, services, and applications, either from the user's system or from the web. In this paper we describe the built-in features provided by Iris for building and analyzing SEDs. We also explore in some detail the Iris framework and software development kit, showing how astronomers and software developers can plug their code into an integrated SED analysis environment.
The Kepler Science Operations Center (SOC) is responsible for the configuration and management of the SOC Science Processing Pipeline, processing of the science data, distributing data and reports to the Science Office, exporting processed data for archiving to the Data Management Center at the Space Telescope Science Institute, and generation and management of the target and aperture definitions. We present an overview of the SOC procedures and workflows for the data the SOC manages and processes. There are several levels of reviews, approvals, and processing for the various types of data. We describe the process flow from data receipt through data processing and export, as well as the procedures in place for accomplishing the tasks. The tools used to accomplish the goals of Kepler science operations will be presented and discussed as well. These include command-line tools and graphical user interfaces, as well as commercial products. The tools provide a wide range of functionality for the SOC including pipeline operation, configuration management, and process workflow implementation. For a demonstration of the Kepler Science Operations Center's processes, procedures, and tools, we present the life of a quarter's worth of data, from target and aperture table generation through archiving the data collected with those tables.
We report on the methodology and first results from the Deep Lens Survey transient search. We utilize image subtraction on survey data to yield all sources of optical variability down to 24th magnitude. Images are analyzed immediately after acquisition, at the telescope and in near-real time, to allow for follow-up in the case of time-critical events. All classes of transients are posted to the web upon detection. Our observing strategy allows sensitivity to variability over several decades in timescale. The DLS is the first survey to classify and report all types of photometric and astrometric variability detected, including solar system objects, variable stars, supernovae, and short timescale phenomena. Three unusual optical transient events were detected, flaring on thousand-second timescales. All three events were seen in the B passband, suggesting blue color indices for the phenomena. One event (OT 20020115) is determined to be from a flaring Galactic dwarf star of spectral type dM4. From the remaining two events, we find an overall rate of $\eta = 1.4$ events deg$^{-2}$ day$^{-1}$ on thousand-second timescales, with a 95% confidence limit of $\eta < 4.3$. One of these events (OT 20010326) originated from a compact precursor in the field of galaxy cluster Abell 1836, and its nature is uncertain. For the second (OT 20030305) we find strong evidence for an extended extragalactic host. A dearth of such events in the R passband yields an upper 95% confidence limit on short timescale astronomical variability between $19.5 < R < 23.4$ of $\eta_R < 5.2$. We report also on our ensemble of astrometrically variable objects, as well as an example of photometric variability with an undetected precursor. Accepted for publication in ApJ. Variability data available at http://dls.bell-labs.com/transients.html