David W. Hogg

Columbia University, New York City, New York, United States

Publications (271) · 910.44 Total Impact

  • Source
    ABSTRACT: We present first results from the third GRavitational lEnsing Accuracy Testing (GREAT3) challenge, the third in a sequence of challenges for testing methods of inferring weak gravitational lensing shear distortions from simulated galaxy images. GREAT3 was divided into experiments to test three specific questions, and included simulated space- and ground-based data with constant or cosmologically-varying shear fields. The simplest (control) experiment included parametric galaxies with a realistic distribution of signal-to-noise, size, and ellipticity, and a complex point spread function (PSF). The other experiments tested the additional impact of realistic galaxy morphology, multiple exposure imaging, and the uncertainty about a spatially-varying PSF; the last two questions will be explored in Paper II. The 24 participating teams competed to estimate lensing shears to within systematic error tolerances for upcoming Stage-IV dark energy surveys, making 1525 submissions overall. GREAT3 saw considerable variety and innovation in the types of methods applied. Several teams now meet or exceed the targets in many of the tests conducted (to within the statistical errors). We conclude that the presence of realistic galaxy morphology in simulations changes shear calibration biases by $\sim 1$ per cent for a wide range of methods. Other effects such as truncation biases due to finite galaxy postage stamps, and the impact of galaxy type as measured by the Sérsic index, are quantified for the first time. Our results generalize previous studies regarding sensitivities to galaxy size and signal-to-noise, and to PSF properties such as seeing and defocus. Almost all methods' results support the simple model in which additive shear biases depend linearly on PSF ellipticity.
    MNRAS accepted. 12/2014;
  • Source
    ABSTRACT: Point estimators for the shearing of galaxy images induced by gravitational lensing involve a complex inverse problem in the presence of noise, pixelization, and model uncertainties. We present a probabilistic forward modeling approach to gravitational lensing inference that has the potential to mitigate the biased inferences in most common point estimators and is practical for upcoming lensing surveys. The first part of our statistical framework requires specification of a likelihood function for the pixel data in an imaging survey given parameterized models for the galaxies in the images. We derive the lensing shear posterior by marginalizing over all intrinsic galaxy properties that contribute to the pixel data (i.e., not limited to galaxy ellipticities) and learn the distributions for the intrinsic galaxy properties via hierarchical inference with a suitably flexible conditional probability distribution specification. We use importance sampling to separate the modeling of small imaging areas from the global shear inference, thereby rendering our algorithm computationally tractable for large surveys. With simple numerical examples we demonstrate the improvements in accuracy from our importance sampling approach, as well as the significance of the conditional distribution specification for the intrinsic galaxy properties when the data are generated from an unknown number of distinct galaxy populations with different morphological characteristics.
    11/2014;
  • Source
    Dustin Lang, David W. Hogg, David J. Schlegel
    ABSTRACT: We present photometry of images from the Wide-Field Infrared Survey Explorer (WISE; Wright et al. 2010) of over 400 million sources detected by the Sloan Digital Sky Survey (SDSS; York et al. 2000). We use a "forced photometry" technique, using measured SDSS source positions, star-galaxy separation and galaxy profiles to define the sources whose fluxes are to be measured in the WISE images. We perform photometry with The Tractor image modeling code, working on our "unWISE" coadds and taking account of the WISE point-spread function and a noise model. The result is a measurement of the flux of each SDSS source in each WISE band. Many sources have little flux in the WISE bands, so often the measurements we report are consistent with zero. However, for many sources we get three- or four-sigma measurements; these sources would not be reported by the WISE pipeline and will not appear in the WISE catalog, yet they can be highly informative for some scientific questions. In addition, these small-signal measurements can be used in stacking analyses at catalog level. The forced photometry approach has the advantage that we measure a consistent set of sources between SDSS and WISE, taking advantage of the resolution and depth of the SDSS images to interpret the WISE images; objects that are resolved in SDSS but blended together in WISE still have accurate measurements in our photometry. Our results, and the code used to produce them, are publicly available at http://unwise.me.
    10/2014;
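The core of the forced-photometry measurement, fitting a single flux for a source whose position and profile are fixed by the deeper survey, reduces to linear least squares. Below is a minimal sketch under simplifying assumptions (uniform pixel noise, a unit-sum Gaussian stand-in for the WISE PSF); The Tractor's actual modeling is far more complete:

```python
import numpy as np

def forced_flux(image, psf_model):
    """Best-fit flux for a fixed, unit-sum PSF model under uniform pixel noise.

    With the linear model D ~ f * M, the maximum-likelihood flux is the
    projection f = (M . D) / (M . M)."""
    m = psf_model.ravel()
    d = image.ravel()
    return float(m @ d / (m @ m))

# Build a unit-sum Gaussian "PSF" on a small postage stamp (illustrative only).
y, x = np.mgrid[-7:8, -7:8]
psf = np.exp(-(x**2 + y**2) / (2 * 2.0**2))
psf /= psf.sum()

# Simulate an image of a source with known flux plus pixel noise.
rng = np.random.default_rng(0)
true_flux = 150.0
image = true_flux * psf + rng.normal(0.0, 0.01, psf.shape)

flux = forced_flux(image, psf)
```

Because the position and profile are held fixed, the fit stays well-posed even for sources with little flux, which is why near-zero and few-sigma measurements remain meaningful.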
  • Source
    ABSTRACT: High dynamic-range imagers aim to block out or null light from a very bright primary star to make it possible to detect and measure far fainter companions; in real systems a small fraction of the primary light is scattered, diffracted, and unocculted. We introduce S4, a flexible data-driven model for the unocculted (and highly speckled) light in the P1640 spectroscopic coronagraph. The model uses Principal Components Analysis (PCA) to capture the spatial structure and wavelength dependence of the speckles but not the signal produced by any companion. Consequently, the residual typically includes the companion signal. The companion can thus be found by filtering this error signal with a fixed companion model. The approach is sensitive to companions that are of order a percent of the brightness of the speckles, or up to $10^{-7}$ times the brightness of the primary star. This outperforms existing methods by a factor of 2-3 and is close to the shot-noise physical limit.
    The Astrophysical Journal 08/2014; 794(2). · 6.73 Impact Factor
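The S4 idea in miniature: build a PCA basis for the speckle pattern from frames that capture only speckles, project that basis out of a science frame, and search the residual for the companion signal. A toy sketch with synthetic one-dimensional "frames" (the real model also captures wavelength dependence and uses far more structure):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic speckle training frames: a few spatial modes with random amplitudes.
npix, nframes, k = 400, 60, 3
modes = rng.normal(size=(k, npix))
train = rng.normal(size=(nframes, k)) @ modes + 0.01 * rng.normal(size=(nframes, npix))

# PCA basis from the training stack (mean-subtracted SVD); the top-k right
# singular vectors capture the speckle structure.
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
basis = vt[:k]

# Science frame = speckles + a faint "companion" confined to one pixel.
science = rng.normal(size=k) @ modes + 0.01 * rng.normal(size=npix)
companion_pix, companion_flux = 123, 0.5
science[companion_pix] += companion_flux

# Project out the speckle model; the residual retains the companion signal
# because the localized companion is nearly orthogonal to the speckle modes.
centered = science - mean
residual = centered - basis.T @ (basis @ centered)
```

The residual peaks at the companion pixel even though the companion is much fainter than the raw speckle pattern, which is the property the matched filtering step then exploits.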
  • Source
    ABSTRACT: We present a new method for determining the Galactic gravitational potential based on forward modeling of tidal stellar streams. We use this method to test the performance of smooth and static analytic potentials in representing realistic dark matter halos, which have substructure and are continually evolving by accretion. Our FAST-FORWARD method uses a Markov Chain Monte Carlo algorithm to compare, in 6D phase space, an "observed" stream to models created in trial analytic potentials. We analyze a large sample of streams evolved in the Via Lactea II (VL2) simulation, which represents a realistic Galactic halo potential. The recovered potential parameters are in agreement with the best fit to the global, present-day VL2 potential. However, merely assuming an analytic potential limits the dark matter halo mass measurement to an accuracy of 5 to 20%, depending on the choice of analytic parametrization. Collectively, mass estimates using streams from our sample reach this fundamental limit, but individually they can be highly biased. Individual streams can both under- and overestimate the mass, and the bias is progressively worse for those with smaller perigalacticons, motivating the search for tidal streams at galactocentric distances larger than 70 kpc. We estimate that the assumption of a static and smooth dark matter potential in modeling of the GD-1 and Pal5-like streams introduces an error of up to 50% in the Milky Way mass estimates.
    The Astrophysical Journal 06/2014; 795(1). · 6.73 Impact Factor
  • Source
    ABSTRACT: No true extrasolar Earth analog is known. Hundreds of planets have been found around Sun-like stars that are either Earth-sized but on shorter periods, or else on year-long orbits but somewhat larger. Under strong assumptions, exoplanet catalogs have been used to make an extrapolated estimate of the rate at which Sun-like stars host Earth analogs. These studies are complicated by the fact that every catalog is censored by non-trivial selection effects and detection efficiencies, and every property (period, radius, etc.) is measured noisily. Here we present a general hierarchical probabilistic framework for making justified inferences about the population of exoplanets, taking into account survey completeness and, for the first time, observational uncertainties. We are able to make fewer assumptions about the distribution than previous studies; we only require that the occurrence rate density be a smooth function of period and radius (employing a Gaussian process). By applying our method to synthetic catalogs, we demonstrate that it produces more accurate estimates of the whole population than standard procedures based on weighting by inverse detection efficiency. We apply the method to an existing catalog of small planet candidates around G dwarf stars (Petigura et al. 2013). We confirm a previous result that the radius distribution changes slope near Earth's radius. We find that the rate density of Earth analogs is about 0.02 (per star per natural logarithmic bin in period and radius) with large uncertainty. This number is much smaller than previous estimates made with the same data but stronger assumptions.
    The Astrophysical Journal 06/2014; 795(1). · 6.73 Impact Factor
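The standard procedure this hierarchical method is compared against, weighting detections by inverse detection efficiency, can be sketched as follows. The completeness function and catalog numbers here are invented for illustration and are not from Petigura et al. (2013):

```python
import numpy as np

def idem_rate(periods, radii, efficiency, n_stars):
    """Occurrence rate per star: sum 1/efficiency over detections, per star."""
    return float(np.sum(1.0 / efficiency(periods, radii)) / n_stars)

# Toy completeness model: short-period, large planets are easier to detect.
def efficiency(P, R):
    return np.clip(R / 4.0, 0.05, 1.0) * np.clip(100.0 / P, 0.05, 1.0)

rng = np.random.default_rng(4)
n_stars, true_rate = 40000, 0.1

# Simulate a planet population, then censor it with the detection efficiency.
n_true = rng.poisson(true_rate * n_stars)
P = rng.uniform(10.0, 300.0, n_true)     # periods in days (illustrative)
R = rng.uniform(1.0, 4.0, n_true)        # radii in Earth radii (illustrative)
detected = rng.random(n_true) < efficiency(P, R)

rate_hat = idem_rate(P[detected], R[detected], efficiency, n_stars)
```

This estimator is unbiased when the efficiency is known exactly and measurement noise is negligible; the paper's point is that it degrades once the observed periods and radii are themselves noisy, which the hierarchical treatment handles.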
  •
    ABSTRACT: We describe a system that builds a high dynamic-range and wide-angle image of the night sky by combining a large set of input images. The method makes use of pixel-rank information in the individual input images to improve a "consensus" pixel rank in the combined image. Because it only makes use of ranks and the complexity of the algorithm is linear in the number of images, the method is useful for large sets of uncalibrated images that might have undergone unknown non-linear tone mapping transformations for visualization or aesthetic reasons. We apply the method to images of the night sky (of unknown provenance) discovered on the Web. The method permits discovery of astronomical objects or features that are not visible in any of the input images taken individually. More importantly, however, it permits scientific exploitation of a huge source of astronomical images that would not be available to astronomical research without our automatic system.
    06/2014;
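The rank-based combination step can be sketched directly: each input image is reduced to per-pixel ranks, which are invariant under any monotonic tone mapping, and the ranks are averaged into a consensus image. A toy version, assuming already-registered images of identical footprint (the real system also handles registration and varying coverage):

```python
import numpy as np
from scipy.stats import rankdata

def consensus_rank(images):
    """Mean per-pixel rank across registered images of identical shape."""
    ranks = np.stack([rankdata(im.ravel()).reshape(im.shape) for im in images])
    return ranks.mean(axis=0)

rng = np.random.default_rng(2)
scene = rng.random((16, 16))                      # the "true" sky
# Unknown, nonlinear but monotonic per-image tone mappings plus small noise.
tone_maps = [np.sqrt, np.log1p, lambda v: v**3]
images = [f(scene) + 0.001 * rng.normal(size=scene.shape) for f in tone_maps]

combined = consensus_rank(images)
```

Because only orderings enter, the cost is linear in the number of images and no photometric calibration of the inputs is needed, which is what makes web-scraped imagery of unknown provenance usable.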
  • Source
    ABSTRACT: The dark matter halo of the Milky Way is expected to be triaxial and filled with substructure. It is hoped that streams or shells of stars produced by tidal disruption of stellar systems will provide precise measures of the gravitational potential to test these predictions. We develop a method for inferring the Galactic potential with tidal streams based on the idea that the stream stars were once close in phase space. Our method can flexibly adapt to any form for the Galactic potential: it works in phase-space rather than action-space and hence relies neither on our ability to derive actions nor on the integrability of the potential. Our model is probabilistic, with a likelihood function and priors on the parameters. The method can properly account for finite observational uncertainties and missing data dimensions. We test our method on synthetic datasets generated from N-body simulations of satellite disruption in a static, multi-component Milky Way including a triaxial dark matter halo with observational uncertainties chosen to mimic current and near-future surveys of various stars. We find that with just four well-measured stream stars, we can infer properties of a triaxial potential with precisions of order 5-7 percent. Without proper motions we obtain 15 percent constraints on potential parameters and precisions around 25 percent for recovering missing phase-space coordinates. These results are encouraging for the eventual goal of using flexible, time-dependent potential models combined with larger data sets to unravel the detailed shape of the dark matter distribution around the Milky Way.
    The Astrophysical Journal 05/2014; 794(1). · 6.73 Impact Factor
  • Source
    ABSTRACT: The Ly$\alpha$ forest flux probability distribution function (PDF) is an established probe of the intergalactic medium (IGM) astrophysics, especially the temperature-density relationship of the IGM. We measure the flux PDF from 3393 Baryon Oscillations Spectroscopic Survey (BOSS) quasars from SDSS Data Release 9, and compare with mock spectra that include careful modeling of the noise, continuum, and astrophysical uncertainties. The BOSS flux PDFs, measured at $\langle z \rangle = [2.3,2.6,3.0]$, are compared with PDFs created from mock spectra drawn from a suite of hydrodynamical simulations that sample the IGM temperature-density relationship, $\gamma$, and temperature at mean-density, $T_0$, where $T(\Delta) = T_0 \Delta^{\gamma-1}$. We find that a significant population of partial Lyman-limit systems with a column-density distribution slope of $\beta_\mathrm{pLLS} \sim -2$ are required to explain the data at the low-flux end of the flux PDF, while uncertainties in the mean Ly$\alpha$ forest transmission affect the high-flux end. After modeling the LLSs and marginalizing over mean-transmission uncertainties, we find that $\gamma=1.6$ best describes the data over our entire redshift range, although constraints on $T_0$ are affected by systematic uncertainties. Isothermal or inverted temperature-density relationships ($\gamma \leq 1$) are disfavored at a significance of over 4$\sigma$.
    05/2014;
  •
    ABSTRACT: We present a new method for constraining the Milky Way halo gravitational potential by simultaneously fitting multiple tidal streams. This method requires full three-dimensional positions and velocities for all stars in the streams, but does not require identification of any specific stream, nor determination of stream membership for any star. We exploit the principle that the action distribution of stream stars is most clustered (that is, most informative) when the potential used to calculate the actions is closest to the true potential. We measure the amount of clustering with the Kullback-Leibler Divergence (KLD) or relative entropy, a statistical measure of information which also provides uncertainties for our parameter estimates. We show, for toy Gaia-like data in a spherical isochrone potential, that maximizing the KLD of the action distribution relative to a smoother distribution recovers the true values of the potential parameters. The precision depends on the observational errors and the number and type of streams in the sample; we find that with the phase-space structure and observational uncertainties expected in the Gaia red-giant-star data set, we measure the enclosed mass at the average radius of the sample stars accurate to 3% and precise to a factor of two. Recovery of the scale radius is also precise to roughly a factor of two, and is biased 50% high by the small galactocentric distance range of stars in our mock sample (1-25 kpc, or about three scale radii). About 15 streams with at least 100 stars, including 2-3 large streams, are needed to place limits on the enclosed mass; about 40 are required to obtain bounds on the scale radius, primarily to get sufficient distance range. This finding underlines the need for ground-based spectroscopic follow-up to complete the radial velocity catalog for faint stars ($V>17$) observed by Gaia.
    04/2014;
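The clustering principle can be illustrated in one dimension: stream stars share (nearly) the same integral of motion in the true potential, so a conserved quantity computed with the correct potential parameter is maximally clustered. The paper scores clustering with the KLD; this sketch substitutes the sample variance as a crude stand-in, and the logarithmic potential with one free parameter p is an invented toy, not the paper's isochrone setup:

```python
import numpy as np

rng = np.random.default_rng(3)
p_true = 2.0
r = rng.uniform(1.0, 25.0, size=500)             # galactocentric radii
E_star = 8.0 + rng.normal(0.0, 0.01, size=500)   # nearly common "energy"
# Velocities consistent with E = v^2/2 + p*ln(r) in the true toy potential.
v = np.sqrt(2.0 * (E_star - p_true * np.log(r)))

def clustering_score(p):
    """Higher = more clustered; -variance stands in for the KLD measure."""
    E = 0.5 * v**2 + p * np.log(r)
    return -float(np.var(E))

# Scan the potential parameter; the score peaks at the true value.
p_grid = np.linspace(1.0, 3.0, 41)
p_best = p_grid[np.argmax([clustering_score(p) for p in p_grid])]
```

Any wrong p leaves a residual correlation between the computed "energy" and position, inflating the spread, which is the same signal the KLD picks up in action space.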
  • Source
    ABSTRACT: A number of problems in probability and statistics can be addressed using the multivariate normal (or multivariate Gaussian) distribution. In the one-dimensional case, computing the probability for a given mean and variance simply requires the evaluation of the corresponding Gaussian density. In the $n$-dimensional setting, however, it requires the inversion of an $n \times n$ covariance matrix, $C$, as well as the evaluation of its determinant, $\det(C)$. In many cases, the covariance matrix is of the form $C = \sigma^2 I + K$, where $K$ is computed using a specified kernel, which depends on the data and additional parameters (called hyperparameters in Gaussian process computations). The matrix $C$ is typically dense, causing standard direct methods for inversion and determinant evaluation to require $\mathcal O(n^3)$ work. This cost is prohibitive for large-scale modeling. Here, we show that for the most commonly used covariance functions, the matrix $C$ can be hierarchically factored into a product of block low-rank updates of the identity matrix, yielding an $\mathcal O (n\log^2 n) $ algorithm for inversion, as discussed in Ambikasaran and Darve (2013). More importantly, we show that this factorization enables the evaluation of the determinant $\det(C)$, permitting the direct calculation of probabilities in high dimensions under fairly broad assumptions about the kernel defining $K$. Our fast algorithm brings many problems in marginalization and the adaptation of hyperparameters within practical reach using a single CPU core. The combination of nearly optimal scaling in terms of problem size with high-performance computing resources will permit the modeling of previously intractable problems. We illustrate the performance of the scheme on standard covariance kernels, and apply it to a real data set obtained from the Kepler Mission.
    03/2014;
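The quantity being accelerated is the multivariate normal log-density with $C = \sigma^2 I + K$. A direct Cholesky-based evaluation, the $\mathcal O(n^3)$ baseline that the hierarchical factorization replaces, looks like this (the squared-exponential kernel is an example choice, not the only one the method supports):

```python
import numpy as np

def mvn_logpdf(x, t, sigma=0.1, amp=1.0, ell=1.0):
    """log N(x | 0, C) with C = sigma^2 I + K, K a squared-exponential kernel."""
    n = len(t)
    K = amp * np.exp(-0.5 * (t[:, None] - t[None, :])**2 / ell**2)
    C = sigma**2 * np.eye(n) + K
    L = np.linalg.cholesky(C)                  # the O(n^3) step
    alpha = np.linalg.solve(L, x)              # gives x^T C^{-1} x via L^{-1} x
    logdet = 2.0 * np.sum(np.log(np.diag(L)))  # det(C) from the Cholesky factor
    return -0.5 * (alpha @ alpha + logdet + n * np.log(2 * np.pi))

# Example: log-likelihood of a noisy sinusoid under this Gaussian-process prior.
t_demo = np.linspace(0.0, 10.0, 200)
x_demo = np.sin(t_demo) + 0.1 * np.random.default_rng(0).normal(size=200)
lp_demo = mvn_logpdf(x_demo, t_demo)
```

Both the quadratic form and $\log\det(C)$ come from the same factorization, which is exactly the pair of quantities the hierarchical $\mathcal O(n\log^2 n)$ scheme produces.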
  • Source
    ABSTRACT: Recoiling supermassive black holes (SMBHs) are considered one plausible physical mechanism to explain high velocity shifts between narrow and broad emission lines sometimes observed in quasar spectra. If the sphere of influence of the recoiling SMBH is such that only the accretion disc is bound, the dusty torus would be left behind, hence the SED should then present distinctive features (i.e. a mid-infrared deficit). Here we present results from fitting the Spectral Energy Distributions (SEDs) of 32 Type-1 AGN with high velocity shifts between broad and narrow lines. The aim is to find peculiar properties in the multi-wavelength SEDs of such objects by comparing their physical parameters (torus and disc luminosity, intrinsic reddening, and size of the 12$\mu$m emitter) with those estimated from a control sample of $\sim1000$ typical quasars selected from the Sloan Digital Sky Survey in the same redshift range. We find that all sources analysed here, with the possible exception of J1154+0134, present a significant amount of 12 $\mu$m emission. This is in contrast with a scenario of an SMBH displaced from the center of the galaxy, as expected for an ongoing recoil event.
    03/2014; 441(1).
  • Fengji Hou, Jonathan Goodman, David W. Hogg
    ABSTRACT: The fully marginalized likelihood, or Bayesian evidence, is of great importance in probabilistic data analysis, because it is involved in calculating the posterior probability of a model or re-weighting a mixture of models conditioned on data. It is, however, extremely challenging to compute. This paper presents a geometric-path Monte Carlo method, inspired by multi-canonical Monte Carlo, to evaluate the fully marginalized likelihood. We show that the algorithm is very fast and easy to implement and produces a justified uncertainty estimate on the fully marginalized likelihood. The algorithm performs efficiently on a trial problem and multi-companion model fitting for radial velocity data. For the trial problem, the algorithm returns the correct fully marginalized likelihood, and the estimated uncertainty is also consistent with the standard deviation of results from multiple runs. We apply the algorithm to the problem of fitting radial velocity data from HIP 88048 ($\nu$ Oph) and Gliese 581. We evaluate the fully marginalized likelihood of 1, 2, 3, and 4-companion models given data from HIP 88048 and various choices of prior distributions. We consider prior distributions with three different minimum radial velocity amplitudes $K_{\mathrm{min}}$. Under all three priors, the 2-companion model has the largest marginalized likelihood, but the detailed values depend strongly on $K_{\mathrm{min}}$. We also evaluate the fully marginalized likelihood of 3, 4, 5, and 6-planet models given data from Gliese 581 and find that the fully marginalized likelihood of the 5-planet model is too close to that of the 6-planet model for us to confidently decide between them.
    01/2014;
  • Source
    ABSTRACT: We present AGNfitter: a Markov Chain Monte Carlo algorithm developed to fit the spectral energy distributions (SEDs) of active galactic nuclei (AGN) with different physical models of AGN components. This code is well suited to determine in a robust way multiple parameters and their uncertainties, which quantify the physical processes responsible for the panchromatic nature of active galaxies and quasars. We describe the technicalities of the code and test its capabilities in the context of X-ray selected obscured AGN using multiwavelength data from the XMM-COSMOS survey.
    Proceedings of the International Astronomical Union 01/2014; 9(S304).
  •
    ABSTRACT: Near-future data from ESA's Gaia mission will provide precise, full phase-space information for hundreds of millions of stars out to heliocentric distances of ~10 kpc. This "horizon" for full phase-space measurements is imposed by the Gaia parallax errors degrading to worse than 10%, and could be significantly extended by an accurate distance indicator. Recent work has demonstrated how Spitzer observations of RR Lyrae stars can be used to make distance estimates accurate to 2%, effectively extending the Gaia, precise-data horizon by a factor of ten in distance and a factor of 1000 in volume. This Letter presents one approach to exploit data of such accuracy to measure the Galactic potential using small samples of stars associated with debris from satellite destruction. The method is tested with synthetic observations of 100 stars from the end point of a simulation of satellite destruction: the shape, orientation, and depth of the potential used in the simulation are recovered to within a few percent. The success of this simple test with such a small sample in a single debris stream suggests that constraints from multiple streams could be combined to examine the Galaxy's dark matter halo in even more detail --- a truly unique opportunity that is enabled by the combination of Spitzer and Gaia with our intimate perspective on the Galaxy.
    01/2014;
  • Source
    ABSTRACT: In today's mailing, Hogg et al. propose image modeling techniques to maintain 10-ppm-level precision photometry in Kepler data with only two working reaction wheels. While these results are relevant to many scientific goals for the repurposed mission, all modeling efforts so far have used a toy model of the Kepler telescope. Because the two-wheel performance of Kepler remains to be determined, we advocate for the consideration of an alternate strategy for a >1 year program that maximizes the science return from the "low-torque" fields across the ecliptic plane. Assuming we can reach the precision of the original Kepler mission, we expect to detect 800 new planet candidates in the first year of such a mission. Our proposed strategy has benefits for transit timing variation and transit duration variation studies, especially when considered in concert with the future TESS mission. We also expect to help address the first key science goal of Kepler: the frequency of planets in the habitable zone as a function of spectral type.
    09/2013;
  • Source
    ABSTRACT: Kepler's immense photometric precision to date was maintained through satellite stability and precise pointing. In this white paper, we argue that image modeling--fitting the Kepler-downlinked raw pixel data--can vastly improve the precision of Kepler in pointing-degraded two-wheel mode. We argue that a non-trivial modeling effort may permit continuance of photometry at 10-ppm-level precision. We demonstrate some baby steps towards precise models in both data-driven (flexible) and physics-driven (interpretably parameterized) modes. We demonstrate that the expected drift or jitter in positions in the two-wheel era will help with constraining calibration parameters. In particular, we show that we can infer the device flat-field at higher than pixel resolution; that is, we can infer pixel-to-pixel variations in intra-pixel sensitivity. These results are relevant to almost any scientific goal for the repurposed mission; image modeling ought to be a part of any two-wheel repurpose for the satellite. We make other recommendations for Kepler operations, but fundamentally advocate that the project stick with its core mission of finding and characterizing Earth analogs. [abridged]
    09/2013;
  • Source
    ABSTRACT: The Sloan Digital Sky Survey (SDSS) has been in operation since 2000 April. This paper presents the tenth public data release (DR10) from its current incarnation, SDSS-III. This data release includes the first spectroscopic data from the Apache Point Observatory Galaxy Evolution Experiment (APOGEE), along with spectroscopic data from the Baryon Oscillation Spectroscopic Survey (BOSS) taken through 2012 July. The APOGEE instrument is a near-infrared R ~ 22,500, 300-fiber spectrograph covering 1.514--1.696 microns. The APOGEE survey is studying the chemical abundances and radial velocities of roughly 100,000 red giant star candidates in the bulge, bar, disk, and halo of the Milky Way. DR10 includes 178,397 spectra of 57,454 stars, each typically observed three or more times, from APOGEE. Derived quantities from these spectra (radial velocities, effective temperatures, surface gravities, and metallicities) are also included. DR10 also roughly doubles the number of BOSS spectra over those included in the ninth data release. DR10 includes a total of 1,507,954 BOSS spectra, comprising 927,844 galaxy spectra; 182,009 quasar spectra; and 159,327 stellar spectra, selected over 6373.2 square degrees.
    The Astrophysical Journal Supplement Series 07/2013; 211(2). · 16.24 Impact Factor
  •
    ABSTRACT: Theoretically, bound binaries of massive black holes are expected as the natural outcome of mergers of massive galaxies. From the observational side, however, massive black hole binaries remain elusive. Velocity shifts between narrow and broad emission lines in quasar spectra are considered a promising observational tool to search for spatially unresolved, dynamically bound binaries. In this series of papers we investigate the nature of such candidates through analyses of their spectra, images and multi-wavelength spectral energy distributions. Here we investigate the properties of the optical spectra, including the evolution of the broad line profiles, of all the sources identified in our previous study. We find a diverse phenomenology of broad and narrow line luminosities, widths, shapes, ionization conditions and time variability, which we can broadly ascribe to 4 classes based on the shape of the broad line profiles: 1) Objects with bell-shaped broad lines with big velocity shifts (>1000 km/s) compared to their narrow lines show a variety of broad line widths and luminosities, modest flux variations over a few years, and no significant change in the broad line peak wavelength. 2) Objects with double-peaked broad emission lines tend to show very luminous and broadened lines, and little time variability. 3) Objects with asymmetric broad emission lines show a broad range of broad line luminosities and significant variability of the line profiles. 4) The remaining sources tend to show moderate to low broad line luminosities, and can be ascribed to diverse phenomena. We discuss the implications of our findings in the context of massive black hole binary searches.
    Monthly Notices of the Royal Astronomical Society 05/2013; 433(2). · 5.52 Impact Factor

Publication Stats

9k Citations
910.44 Total Impact Points

Institutions

  • 2014
    • Columbia University
      New York City, New York, United States
    • Carnegie Mellon University
      • Department of Physics
      Pittsburgh, Pennsylvania, United States
  • 2001–2014
    • CUNY Graduate Center
      New York City, New York, United States
  • 1996–2013
    • California Institute of Technology
      • Department of Astronomy
      Pasadena, California, United States
  • 2012
    • Pennsylvania State University
      • Department of Astronomy and Astrophysics
      University Park, Pennsylvania, United States
  • 2008–2012
    • Max Planck Institute for Astronomy
      Heidelberg, Baden-Württemberg, Germany
  • 2011
    • Universität Heidelberg
      Heidelberg, Baden-Württemberg, Germany
    • Vanderbilt University
      • Department of Physics and Astronomy
      Nashville, Tennessee, United States
  • 2009
    • York University
      • Department of Physics and Astronomy
      Toronto, Ontario, Canada
  • 2007–2009
    • Princeton University
      • Department of Astrophysical Sciences
      Princeton, New Jersey, United States
  • 2001–2008
    • Fermi National Accelerator Laboratory (Fermilab)
      Batavia, Illinois, United States
  • 2002–2007
    • Johns Hopkins University
      • Department of Physics and Astronomy
      Baltimore, Maryland, United States
  • 2006
    • The Catholic University of America
      • Department of Physics
      Washington, D.C., United States
  • 2005–2006
    • New York University
      • Department of Physics
      New York City, New York, United States
    • The University of Arizona
      • Department of Astronomy
      Tucson, Arizona, United States
  • 1999–2001
    • Institute for Advanced Study
      Princeton, New Jersey, United States