David W. Hogg

NYU Langone Medical Center, New York, New York, United States

Publications (295) · 940.24 Total impact

  • [Show abstract] [Hide abstract] ABSTRACT: Chemical tagging promises to use detailed abundance measurements to identify spatially separated stars that were in fact born together (in the same molecular cloud), long ago. This idea has not previously yielded scientific successes, probably because of the noise and incompleteness in chemical-abundance measurements. However, we have succeeded in substantially improving spectroscopic measurements with The Cannon, which has delivered 15 individual abundances for 100,000 stars observed as part of the APOGEE spectroscopic survey, with precisions around 0.04 dex. We test the chemical-tagging hypothesis by looking at clusters in abundance space and confirming that they are clustered in phase space. We identify (by the k-means algorithm) overdensities of stars in the 15-dimensional chemical-abundance space delivered by The Cannon, and plot the associated stars in phase space. We use only abundance-space information (no positional information) to identify stellar groups. We find that clusters in abundance space are indeed clusters in phase space. We recover some known phase-space clusters and find other interesting structures. This confirms the chemical-tagging hypothesis and verifies the precision of the abundance measurements delivered by The Cannon. This is the first-ever project to identify phase-space structures by blind search purely in abundance space; the prospects for future data sets are very good.
    No preview · Article · Jan 2016
  • Source
    ABSTRACT: We present an algorithm capable of detecting diffuse, dim sources of any size in an astronomical image. These sources often defeat traditional methods for source finding, which expand regions around points of high intensity. Extended sources often have no bright points and are only detectable when viewed as a whole, so a more sophisticated approach is required. Our algorithm operates at all scales simultaneously by considering a tree of nested candidate bounding boxes, and inverts a hierarchical Bayesian generative model to obtain the probability of sources existing at given locations and sizes. This model naturally accommodates the detection of nested sources, and no prior knowledge of the distribution of a source, or even the background, is required. The algorithm scales nearly linearly with the number of pixels, making it feasible to run on large images, and requires minimal parameter tweaking to be effective. We demonstrate the algorithm on several types of astronomical and artificial images.
    Full-text · Article · Jan 2016
  • Source
    [Show abstract] [Hide abstract] ABSTRACT: $K2$'s Campaign 9 ($K2$C9) will conduct a $\sim$3.4 deg$^{2}$ survey toward the Galactic bulge from 7/April through 1/July of 2016 that will leverage the spatial separation between $K2$ and the Earth to facilitate measurement of the microlens parallax $\pi_{\rm E}$ for $\gtrsim$120 microlensing events, including several planetary in nature as well as many short-timescale microlensing events, which are potentially indicative of free-floating planets (FFPs). These satellite parallax measurements will in turn allow for the direct measurement of the masses of and distances to the lensing systems. In this white paper we provide an overview of the $K2$C9 space- and ground-based microlensing survey. Specifically, we detail the demographic questions that can be addressed by this program, including the frequency of FFPs and the Galactic distribution of exoplanets, the observational parameters of $K2$C9, and the array of ground-based resources dedicated to concurrent observations. Finally, we outline the avenues through which the larger community can become involved, and generally encourage participation in $K2$C9, which constitutes an important pathfinding mission and community exercise in anticipation of $WFIRST$.
    Full-text · Article · Dec 2015
  • Source
    [Show abstract] [Hide abstract] ABSTRACT: The mass of a star is arguably its most fundamental parameter. For red giant stars, tracers luminous enough to be observed across the Galaxy, mass implies a stellar evolution age. It has proven to be extremely difficult to infer ages and masses directly from red giant spectra using existing methods. From the KEPLER and APOGEE surveys, samples of several thousand stars exist with high-quality spectra and asteroseismic masses. Here we show that from these data we can build a data-driven spectral model using The Cannon, which can determine stellar masses to $\sim$ 0.07 dex from APOGEE DR12 spectra of red giants; these imply age estimates accurate to $\sim$ 0.2 dex (40 percent). We show that The Cannon constrains these ages foremost from spectral regions with CN absorption lines, elements whose surface abundances reflect mass-dependent dredge-up. We deliver an unprecedented catalog of 80,000 giants (including 20,000 red-clump stars) with mass and age estimates, spanning the entire disk (from the Galactic center to R $\sim$ 20 kpc). We show that the age information in the spectra is not simply a corollary of the birth-material abundances [Fe/H] and [$\alpha$/Fe], and that even within a mono-abundance population of stars, there are age variations that vary sensibly with Galactic position. Such stellar age constraints across the Milky Way open up new avenues in Galactic archeology.
    Preview · Article · Nov 2015
  • Source
    [Show abstract] [Hide abstract] ABSTRACT: In area and depth, the Pan-STARRS1 (PS1) 3$\pi$ survey is unique among many-epoch, multi-band surveys and has enormous potential for all-sky identification of variable sources. PS1 has observed the sky typically seven times in each of its five bands ($grizy$) over 3.5 years, but unlike SDSS not simultaneously across the bands. Here we develop a new approach for quantifying statistical properties of non-simultaneous, sparse, multi-color lightcurves through light-curve structure functions, effectively turning PS1 into a $\sim 35$-epoch survey. We use this approach to estimate variability amplitudes and timescales $(\omega_r, \tau)$ for all point-sources brighter than $r_{\mathrm{P1}}=21.5$ mag in the survey. With PS1 data on SDSS Stripe 82 as ``ground truth", we use a Random Forest Classifier to identify QSOs and RR Lyrae based on their variability and their mean PS1 and WISE colors. We find that, aside from the Galactic plane, QSO and RR Lyrae samples of purity $\sim$75\% and completeness $\sim$92\% can be selected. On this basis we have identified a sample of $\sim 1,000,000$ QSO candidates, as well as an unprecedentedly large and deep sample of $\sim$150,000 RR Lyrae candidates with distances from $\sim$10 kpc to $\sim$120 kpc. Within the Draco dwarf spheroidal, we demonstrate a distance precision of 6\% for RR Lyrae candidates. We provide a catalog of all likely variable point sources and likely QSOs in PS1, a total of $25.8\times 10^6$ sources.
    Preview · Article · Nov 2015 · The Astrophysical Journal
  • ABSTRACT: Mapping Nearby Galaxies at Apache Point Observatory (MaNGA), one of three core programs in the Sloan Digital Sky Survey-IV (SDSS-IV), is an integral-field spectroscopic (IFS) survey of roughly 10,000 nearby galaxies. It employs dithered observations using 17 hexagonal bundles of 2 arcsec fibers to obtain resolved spectroscopy over a wide wavelength range of 3,600-10,300 Å. To map the internal variations within each galaxy, we need to perform accurate spectral surface photometry, which is to calibrate the specific intensity at every spatial location sampled by each individual aperture element of the integral field unit. The calibration must correct only for the flux loss due to atmospheric throughput and the instrument response, but not for losses due to the finite geometry of the fiber aperture. This requires the use of standard star measurements to strictly separate these two flux loss factors (throughput versus geometry), a difficult challenge with standard single-fiber spectroscopy techniques due to various practical limitations. Therefore, we developed a technique for spectral surface photometry using multiple small fiber-bundles targeting standard stars simultaneously with galaxy observations. We discuss the principles of our approach and how they compare to previous efforts, and we demonstrate the precision and accuracy achieved. MaNGA's relative calibration between the wavelengths of H$\alpha$ and H$\beta$ has a root-mean-square (RMS) of 1.7%, while that between [NII] $\lambda$6583 Å and [OII] $\lambda$3727 Å has an RMS of 4.7%. Using extinction-corrected star formation rates and gas-phase metallicities as an illustration, this level of precision guarantees that flux calibration errors will be sub-dominant when estimating these quantities. The absolute calibration is better than 5% for more than 89% of MaNGA's wavelength range.
    No preview · Article · Nov 2015 · The Astronomical Journal
  • ABSTRACT: We present a modular, extensible likelihood framework for spectroscopic inference based on synthetic model spectra. The subtraction of an imperfect model from a continuously sampled spectrum introduces covariance between adjacent datapoints (pixels) into the residual spectrum. For the high signal-to-noise data with large spectral range that is commonly employed in stellar astrophysics, that covariant structure can lead to dramatically underestimated parameter uncertainties (and, in some cases, biases). We construct a likelihood function that accounts for the structure of the covariance matrix, utilizing the machinery of Gaussian process kernels. This framework specifically addresses the common problem of mismatches in model spectral line strengths (with respect to data) due to intrinsic model imperfections (e.g., in the atomic/molecular databases or opacity prescriptions) by developing a novel local covariance kernel formalism that identifies and self-consistently downweights pathological spectral line "outliers." By fitting many spectra in a hierarchical manner, these local kernels provide a mechanism to learn about and build data-driven corrections to synthetic spectral libraries. An open-source software implementation of this approach is available at http://iancze.github.io/Starfish, including a sophisticated probabilistic scheme for spectral interpolation when using model libraries that are sparsely sampled in the stellar parameters. We demonstrate some salient features of the framework by fitting the high-resolution V-band spectrum of WASP-14, an F5 dwarf with a transiting exoplanet, and the moderate-resolution K-band spectrum of Gliese 51, an M5 field dwarf.
    No preview · Article · Oct 2015 · The Astrophysical Journal
  • ABSTRACT: Tidal streams of globular clusters are ideal tracers of the Galactic gravitational potential. Compared to the few known, complex and diffuse dwarf-galaxy streams, they are kinematically cold, have thin morphologies and are abundant in the halo of the Milky Way. Their coldness and thinness, in combination with potential epicyclic substructure in the vicinity of the stream progenitor, turn them into high-precision scales. With the example of Palomar 5, we demonstrate how modeling of a globular cluster stream allows us to simultaneously measure the properties of the disrupting globular cluster, its orbital motion, and the gravitational potential of the Milky Way.
    No preview · Article · Sep 2015
  • Source
    [Show abstract] [Hide abstract] ABSTRACT: We map the distribution of dust in M31 at 25pc resolution, using stellar photometry from the Panchromatic Hubble Andromeda Treasury. We develop a new mapping technique that models the NIR color-magnitude diagram (CMD) of red giant branch (RGB) stars. The model CMDs combine an unreddened foreground of RGB stars with a reddened background population viewed through a log-normal column density distribution of dust. Fits to the model constrain the median extinction, the width of the extinction distribution, and the fraction of reddened stars. The resulting extinction map has >4 times better resolution than maps of dust emission, while providing a more direct measurement of the dust column. There is superb morphological agreement between the new map and maps of the extinction inferred from dust emission by Draine et al. 2014. However, the widely-used Draine & Li (2007) dust models overpredict the observed extinction by a factor of ~2.5, suggesting that M31's true dust mass is lower and that dust grains are significantly more emissive than assumed in Draine et al. (2014). The discrepancy we identify is consistent with similar findings in the Milky Way by the Planck Collaboration (2015), but has a more complex dependence on parameters from the Draine & Li (2007) dust models. We also show that the discrepancy with the Draine et al. (2014) map is lowest where the interstellar radiation field has a harder spectrum than average. We discuss possible improvements to the CMD dust mapping technique, and explore further applications.
    Full-text · Article · Sep 2015 · The Astrophysical Journal
  • Source
    [Show abstract] [Hide abstract] ABSTRACT: Astronomical observations are affected by several kinds of noise, each with its own causal source; there is photon noise, stochastic source variability, and residuals coming from imperfect calibration of the detector or telescope. The precision of NASA Kepler photometry for exoplanet science---the most precise photometric measurements of stars ever made---appears to be limited by unknown or untracked variations in spacecraft pointing and temperature, and unmodeled stellar variability. Here we present the Causal Pixel Model (CPM) for Kepler data, a data-driven model intended to capture variability but preserve transit signals. The CPM works at the pixel level so that it can capture very fine-grained information about the variation of the spacecraft. The CPM predicts each target pixel value from a large number of pixels of other stars sharing the instrument variabilities while not containing any information on possible transits in the target star. In addition, we use the target star's future and past (auto-regression). By appropriately separating, for each data point, the data into training and test sets, we ensure that information about any transit will be perfectly isolated from the model. The method has four hyper-parameters (the number of predictor stars, the auto-regressive window size, and two L2-regularization amplitudes for model components), which we set by cross-validation. We determine a generic set of hyper-parameters that works well for most of the stars and apply the method to a corresponding set of target stars. We find that we can consistently outperform (for the purposes of exoplanet detection) the Kepler Pre-search Data Conditioning (PDC) method for exoplanet discovery.
    Preview · Article · Aug 2015
  • Source
    [Show abstract] [Hide abstract] ABSTRACT: Several long, dynamically cold stellar streams have been observed around the Milky Way Galaxy, presumably formed from the tidal disruption of globular clusters. In integrable potentials---where all orbits are dynamically regular---tidal debris phase-mixes close to the orbit of the progenitor system. However, cosmological simulations of structure formation suggest that the Milky Way's dark matter halo is expected not to be fully integrable; an appreciable fraction of orbits will be chaotic. This paper examines the influence of chaos on the phase-space morphology of cold tidal streams. We find very stark results: Streams in chaotic regions look very different from those in regular regions. We find that streams (simulated using test particle ensembles of nearby orbits) can be sensitive to chaos on a much shorter time-scale than any standard prediction (from the Lyapunov or frequency-diffusion times). For example, on a weakly chaotic orbit with a chaotic timescale predicted to be >1000 orbital periods (>1000 Gyr), the resulting stellar stream is, after just a few 10's of orbits, substantially more diffuse than any formed on a nearby but regular orbit. We find that the enhanced diffusion of the stream stars can be understood by looking at the variance in orbital frequencies of orbit ensembles centered around the parent (progenitor) orbit. Our results suggest that long, cold streams around our Galaxy must exist only on regular (or very nearly regular) orbits; they potentially provide a map of the regular regions of the Milky Way potential. This suggests a promising new direction for the use of tidal streams to constrain the distribution of dark matter around our Galaxy.
    Preview · Article · Jul 2015 · Monthly Notices of the Royal Astronomical Society
  • [Show abstract] [Hide abstract] ABSTRACT: We describe a method for removing the effect of confounders in order to reconstruct a latent quantity of interest. The method, referred to as half-sibling regression, is inspired by recent work in causal inference using additive noise models. We provide a theoretical justification and illustrate the potential of the method in a challenging astronomy application.
    No preview · Article · May 2015
  • Source
    [Show abstract] [Hide abstract] ABSTRACT: The extended Kepler mission, K2, is now providing photometry of new fields every three months in a search for transiting planets. In a recent study, Foreman-Mackey and collaborators presented a list of 36 planet candidates orbiting 31 stars in K2 Campaign 1. In this contribution, we present stellar and planetary properties for all systems. We combine ground-based seeing-limited survey data and adaptive optics imaging with an automated transit analysis scheme to validate 18 candidates as planets and identify 6 candidates as likely false positives. Of particular interest is EPIC 201912552, a bright (K=8.9) M2 dwarf hosting a 2.24 \pm 0.25 Earth radius planet with an equilibrium temperature of 271 \pm 16 K and an orbital period of 33 days. We also present two new open-source software packages that enabled this analysis: isochrones, a flexible tool for fitting theoretical stellar models to observational data to determine stellar properties, and vespa, a new general-purpose procedure to calculate false positive probabilities and statistically validate transiting exoplanets.
    Preview · Article · Mar 2015 · The Astrophysical Journal
  • Source
    ABSTRACT: We have undertaken the largest systematic study of the high-mass stellar initial mass function (IMF) to date using the optical color-magnitude diagrams (CMDs) of 85 resolved, young (4 Myr < t < 25 Myr), intermediate mass star clusters (10^3-10^4 Msun), observed as part of the Panchromatic Hubble Andromeda Treasury (PHAT) program. We fit each cluster's CMD to measure its mass function (MF) slope for stars >2 Msun. For the ensemble of clusters, the distribution of stellar MF slopes is best described by $\Gamma=+1.45^{+0.03}_{-0.06}$ with a very small intrinsic scatter. The data also imply no significant dependencies of the MF slope on cluster age, mass, and size, providing direct observational evidence that the measured MF represents the IMF. This analysis implies that the high-mass IMF slope in M31 clusters is universal with a slope ($\Gamma=+1.45^{+0.03}_{-0.06}$) that is steeper than the canonical Kroupa (+1.30) and Salpeter (+1.35) values. Using our inference model on select Milky Way (MW) and LMC high-mass IMF studies from the literature, we find $\Gamma_{\rm MW} \sim+1.15\pm0.1$ and $\Gamma_{\rm LMC} \sim+1.3\pm0.1$, both with intrinsic scatter of ~0.3-0.4 dex. Thus, while the high-mass IMF in the Local Group may be universal, systematics in literature IMF studies preclude any definitive conclusions; homogeneous investigations of the high-mass IMF in the local universe are needed to overcome this limitation. Consequently, the present study represents the most robust measurement of the high-mass IMF slope to date. We have grafted the M31 high-mass IMF slope onto widely used sub-solar mass Kroupa and Chabrier IMFs and show that commonly used UV- and Halpha-based star formation rates should be increased by a factor of ~1.3-1.5 and the number of stars with masses >8 Msun is ~25% fewer than expected for a Salpeter/Kroupa IMF. [abridged]
    Full-text · Article · Feb 2015 · The Astrophysical Journal
  • Source
    [Show abstract] [Hide abstract] ABSTRACT: Photometry of stars from the K2 extension of NASA's Kepler mission is afflicted by systematic effects caused by small (few-pixel) drifts in the telescope pointing and other spacecraft issues. We present a method for searching K2 light curves for evidence of exoplanets by simultaneously fitting for these systematics and the transit signals of interest. This method is more computationally expensive than standard search algorithms but we demonstrate that it can be efficiently implemented and used to discover transit signals. We apply this method to the full Campaign 1 dataset and report a list of 36 planet candidates transiting 31 stars, along with an analysis of the pipeline performance and detection efficiency based on artificial signal injections and recoveries. For all planet candidates, we present posterior distributions on the properties of each system based strictly on the transit observables.
    Preview · Article · Feb 2015 · The Astrophysical Journal
  • Source
    [Show abstract] [Hide abstract] ABSTRACT: Using the example of the tidal stream of the Milky Way globular cluster Palomar 5 (Pal 5), we demonstrate how observational data on streams can be efficiently reduced in dimensionality and modeled in a Bayesian framework. Our approach combines detection of stream overdensities by a Difference-of-Gaussians process with fast streakline models, a continuous likelihood function built from these models, and inference with MCMC. By generating $\approx10^7$ model streams, we show that the geometry of the Pal 5 debris yields powerful constraints on the solar position and motion, the Milky Way and Pal 5 itself. All 10 model parameters were allowed to vary over large ranges without additional prior information. Using only SDSS data and a few radial velocities from the literature, we find that the distance of the Sun from the Galactic Center is $8.30\pm0.25$ kpc, and the transverse velocity is $253\pm16$ km/s. Both estimates are in excellent agreement with independent measurements of these quantities. Assuming a standard disk and bulge model, we determine the Galactic mass within Pal 5's apogalactic radius of 19 kpc to be $(2.1\pm0.4)\times10^{11}$ M$_\odot$. Moreover, we find the potential of the dark halo with a flattening of $q_z = 0.95^{+0.16}_{-0.12}$ to be essentially spherical within the radial range that is effectively probed by Pal 5. We also determine Pal 5's mass, distance and proper motion independently from other methods, which enables us to perform vital cross-checks. We conclude that with more observational data and by using additional prior information, the precision of this method can be significantly increased.
    Full-text · Article · Feb 2015 · The Astrophysical Journal
  • Source
    [Show abstract] [Hide abstract] ABSTRACT: New spectroscopic surveys offer the promise of consistent stellar parameters and abundances ('stellar labels') for hundreds of thousands of stars in the Milky Way: this poses a formidable spectral modeling challenge. In many cases, there is a sub-set of reference objects for which the stellar labels are known with high(er) fidelity. We take advantage of this with The Cannon, a new data-driven approach for determining stellar labels from spectroscopic data. The Cannon learns from the 'known' labels of reference stars how the continuum-normalized spectra depend on these labels by fitting a flexible model at each wavelength; then, The Cannon uses this model to derive labels for the remaining survey stars. We illustrate The Cannon by training the model on only 543 stars in 19 clusters as reference objects, with Teff, log g and [Fe/H] as the labels, and then applying it to the spectra of 56,000 stars from APOGEE DR10. The Cannon is very accurate. Its stellar labels compare well to the stars for which APOGEE pipeline (ASPCAP) labels are provided in DR10, with rms differences that are basically identical to the stated ASPCAP uncertainties. Beyond the reference labels, The Cannon makes no use of stellar models nor any line-list, but needs a set of reference objects that span label-space. The Cannon performs well at lower signal-to-noise, as it delivers comparably good labels even at one ninth the APOGEE observing time. We discuss the limitations of The Cannon and its future potential, particularly, to bring different spectroscopic surveys onto a consistent scale of stellar labels.
    Preview · Article · Jan 2015 · The Astrophysical Journal
  • Source
    [Show abstract] [Hide abstract] ABSTRACT: Neutron stars are a prime laboratory for testing physical processes under conditions of strong gravity, high density, and extreme magnetic fields. Among the zoo of neutron star phenomena, magnetars stand out for their bursting behaviour, ranging from extremely bright, rare giant flares to numerous, less energetic recurrent bursts. The exact trigger and emission mechanisms for these bursts are not known; favoured models involve either a crust fracture and subsequent energy release into the magnetosphere, or explosive reconnection of magnetic field lines. In the absence of a predictive model, understanding the physical processes responsible for magnetar burst variability is difficult. Here, we develop an empirical model that decomposes magnetar bursts into a superposition of small spike-like features with a simple functional form, where the number of model components is itself part of the inference problem. The cascades of spikes that we model might be formed by avalanches of reconnection, or crust rupture aftershocks. Using Markov Chain Monte Carlo (MCMC) sampling augmented with reversible jumps between models with different numbers of parameters, we characterise the posterior distributions of the model parameters and the number of components per burst. We relate these model parameters to physical quantities in the system, and show for the first time that the variability within a burst does not conform to predictions from ideas of self-organised criticality. We also examine how well the properties of the spikes fit the predictions of simplified cascade models for the different trigger mechanisms.
    Full-text · Article · Jan 2015 · The Astrophysical Journal
  • Source
    [Show abstract] [Hide abstract] ABSTRACT: The third generation of the Sloan Digital Sky Survey (SDSS-III) took data from 2008 to 2014 using the original SDSS wide-field imager, the original and an upgraded multi-object fiber-fed optical spectrograph, a new near-infrared high-resolution spectrograph, and a novel optical interferometer. All the data from SDSS-III are now made public. In particular, this paper describes Data Release 11 (DR11) including all data acquired through 2013 July, and Data Release 12 (DR12) adding data acquired through 2014 July (including all data included in previous data releases), marking the end of SDSS-III observing. Relative to our previous public release (DR10), DR12 adds one million new spectra of galaxies and quasars from the Baryon Oscillation Spectroscopic Survey (BOSS) over an additional 3000 sq. deg of sky, more than triples the number of H-band spectra of stars as part of the Apache Point Observatory (APO) Galactic Evolution Experiment (APOGEE), and includes repeated accurate radial velocity measurements of 5500 stars from the Multi-Object APO Radial Velocity Exoplanet Large-area Survey (MARVELS). The APOGEE outputs now include measured abundances of 15 different elements for each star. In total, SDSS-III added 5200 sq. deg of ugriz imaging; 155,520 spectra of 138,099 stars as part of the Sloan Exploration of Galactic Understanding and Evolution 2 (SEGUE-2) survey; 2,497,484 BOSS spectra of 1,372,737 galaxies, 294,512 quasars, and 247,216 stars over 9376 sq. deg; 618,080 APOGEE spectra of 156,593 stars; and 197,040 MARVELS spectra of 5,513 stars. Since its first light in 1998, SDSS has imaged over 1/3 the Celestial sphere in five bands and obtained over five million astronomical spectra.
    Full-text · Article · Jan 2015 · The Astrophysical Journal Supplement Series
  • ABSTRACT: A number of problems in probability and statistics can be addressed using the multivariate normal (Gaussian) distribution. In the one-dimensional case, computing the probability for a given mean and variance simply requires the evaluation of the corresponding Gaussian density. In the n-dimensional setting, however, it requires the inversion of an $n \times n$ covariance matrix, $C$, as well as the evaluation of its determinant, $\det(C)$. In many cases, such as regression using Gaussian processes, the covariance matrix is of the form $C = \sigma^2 I + K$, where $K$ is computed using a specified covariance kernel which depends on the data and additional parameters (hyperparameters). The matrix $C$ is typically dense, causing standard direct methods for inversion and determinant evaluation to require $O(n^3)$ work. This cost is prohibitive for large-scale modeling. Here, we show that for the most commonly used covariance functions, the matrix $C$ can be hierarchically factored into a product of block low-rank updates of the identity matrix, yielding an $O(n \log^2 n)$ algorithm for inversion. More importantly, we show that this factorization enables the evaluation of the determinant $\det(C)$, permitting the direct calculation of probabilities in high dimensions under fairly broad assumptions on the kernel defining $K$. Our fast algorithm brings many problems in marginalization and the adaptation of hyperparameters within practical reach using a single CPU core. The combination of nearly optimal scaling in terms of problem size with high-performance computing resources will permit the modeling of previously intractable problems. We illustrate the performance of the scheme on standard covariance kernels.
    No preview · Article · Jan 2015 · IEEE Transactions on Pattern Analysis and Machine Intelligence
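
The last abstract in the list concerns Gaussian likelihoods whose covariance has the form C = σ²I + K. For orientation, here is a minimal sketch of the standard dense evaluation that the abstract describes as requiring cubic work; it is not the hierarchical factorization the paper proposes. The function names `sq_exp_kernel` and `gp_log_likelihood`, the choice of a squared-exponential kernel, and the zero-mean assumption are all illustrative assumptions, not taken from the paper.

```python
import numpy as np


def sq_exp_kernel(x, amp=1.0, scale=1.0):
    """Squared-exponential kernel matrix K for 1-D inputs x (illustrative choice)."""
    d = x[:, None] - x[None, :]
    return amp**2 * np.exp(-0.5 * (d / scale) ** 2)


def gp_log_likelihood(y, K, sigma):
    """Zero-mean Gaussian log-likelihood with covariance C = sigma^2 I + K.

    Uses a dense Cholesky factorization, so the cost grows as n^3.
    """
    n = y.size
    C = sigma**2 * np.eye(n) + K
    L = np.linalg.cholesky(C)  # C = L L^T, with L lower triangular
    # Solve C alpha = y through the two triangular factors.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # log det(C) = 2 * sum(log(diag(L)))
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (y @ alpha + log_det + n * np.log(2.0 * np.pi))


if __name__ == "__main__":
    x = np.linspace(0.0, 10.0, 200)
    y = np.sin(x)
    print(gp_log_likelihood(y, sq_exp_kernel(x), sigma=0.1))
```

Both the linear solve and the determinant come from the same Cholesky factor, which is why the abstract treats inversion and determinant evaluation as a single cost.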

Publication Stats

26k Citations
940.24 Total Impact Points

Institutions

  • 2008-2015
    • NYU Langone Medical Center
      New York, New York, United States
  • 2001-2015
    • CUNY Graduate Center
      New York, New York, United States
    • Institute for Advanced Study
      Princeton Junction, New Jersey, United States
    • Fermi National Accelerator Laboratory (Fermilab)
      Batavia, Illinois, United States
  • 2014
    • Columbia University
      New York City, New York, United States
    • The Ohio State University
      Columbus, Ohio, United States
  • 2012
    • Pennsylvania State University
      • Department of Astronomy and Astrophysics
      University Park, Pennsylvania, United States
    • Harvard-Smithsonian Center for Astrophysics
      • Smithsonian Astrophysical Observatory
      Cambridge, Massachusetts, United States
  • 2011-2012
    • Max Planck Institute for Astronomy
      Heidelberg, Baden-Württemberg, Germany
    • Universität Heidelberg
      Heidelberg, Baden-Württemberg, Germany
    • New York University
      New York, New York, United States
    • Vanderbilt University
      • Department of Physics and Astronomy
      Nashville, Tennessee, United States
  • 2009
    • York University
      • Department of Physics and Astronomy
      Toronto, Ontario, Canada
  • 2007
    • The University of Arizona
      • Department of Astronomy
      Tucson, AZ, United States
  • 2002-2007
    • Johns Hopkins University
      • Department of Physics and Astronomy
      Baltimore, Maryland, United States
  • 1995-2007
    • California Institute of Technology
      • Department of Astronomy
      • Jet Propulsion Laboratory
      Pasadena, California, United States
  • 2006
    • The Catholic University of America
      • Department of Physics
      Washington, D.C., United States
  • 1999
    • National Research Council
      Roma, Latium, Italy
  • 1991
    • University of Toronto
      Toronto, Ontario, Canada