    ABSTRACT: Given a large collection of epidemiological data consisting of the counts of d contagious diseases for l locations over n time-ticks, how can we find patterns, rules and outliers? For example, Project Tycho provides open access to infection counts for U.S. states from 1888 to 2013, for 56 contagious diseases (e.g., measles, influenza); the data include missing values, possible recording errors, sudden spikes (or dives) of infections, etc. So how can we find a combined model for all these diseases, locations, and time-ticks? In this paper, we present FUNNEL, a unifying analytical model for large-scale epidemiological data, as well as a novel fitting algorithm, FUNNELFIT, which solves the above problem. Our method has the following properties: (a) Sense-making: it detects important patterns of epidemics, such as periodicities, the appearance of vaccines, external shock events, and more; (b) Parameter-free: our modeling framework frees the user from providing parameter values; (c) Scalable: FUNNELFIT is carefully designed to be linear in the input size; (d) General: our model is general and practical, and can be applied to various types of epidemics, including computer-virus propagation as well as human diseases. Extensive experiments on real data demonstrate that FUNNELFIT does indeed discover important properties of epidemics: (P1) disease seasonality, e.g., influenza spikes in January, Lyme disease spikes in July, and the absence of yearly periodicity for gonorrhea; (P2) disease reduction effects, e.g., the appearance of vaccines; (P3) local/state-level sensitivity, e.g., many measles cases in NY; (P4) external shock events, e.g., historical flu pandemics; (P5) incongruous values, i.e., data reporting errors.