Sergey Kirshner

Purdue University, West Lafayette, Indiana, United States

Publications (35)

  • ABSTRACT: The recent interest in modeling complex networks has fueled the development of generative graph models, such as Kronecker Product Graph Model (KPGM) and mixed KPGM (mKPGM). The Kronecker family of models are appealing because of their elegant fractal structure, as well as their ability to capture important network characteristics such as degree, diameter, and (in the case of mKPGM) clustering and population variance. In addition, scalable sampling algorithms for KPGMs made the analysis of large-scale, sparse networks feasible for the first time. In this work, we show that the scalable sampling methods, in contrast to prior belief, do not in fact sample from the underlying KPGM distribution and often result in sampling graphs that are very unlikely. To address this issue, we develop a new representation that exploits the structure of Kronecker models and facilitates the development of novel grouped sampling methods that are provably correct. In this paper, we outline efficient algorithms to sample from mKPGMs and KPGMs based on these ideas. Notably, our mKPGM algorithm is the first available scalable sampling method for this model and our KPGM algorithm is both faster and more accurate than previous scalable methods. We conduct both theoretical analysis and empirical evaluation to demonstrate the strengths of our algorithms and show that we can sample a network with 75 million edges in 87 seconds on a single processor.
    Conference Paper · Dec 2014
  • Guy Feldman · Anindya Bhadra · Sergey Kirshner
    ABSTRACT: We consider the problem of feature selection in a high-dimensional multiple predictors, multiple responses regression setting. Assuming that regression errors are i.i.d. when they are in fact dependent leads to inconsistent and inefficient feature estimates. We relax the i.i.d. assumption by allowing the errors to exhibit a tree-structured dependence. This allows a Bayesian problem formulation with the error dependence structure treated as an auxiliary variable that can be integrated out analytically with the help of the matrix-tree theorem. Mixing over trees results in a flexible technique for modelling the graphical structure for the regression errors. Furthermore, the analytic integration results in a collapsed Gibbs sampler for feature selection that is computationally efficient. Our approach offers significant performance gains over the competing methods in simulations, especially when the features themselves are correlated. In addition to comprehensive simulation studies, we apply our method to a high-dimensional breast cancer data set to identify markers significantly associated with the disease. Copyright © 2014 John Wiley & Sons, Ltd.
    Article · Mar 2014
  • Guobin Fu · Stephen P. Charles · Sergey Kirshner
    ABSTRACT: An ensemble of stochastic daily rainfall projections has been generated for 30 stations across south‐eastern Australia using the downscaling nonhomogeneous hidden Markov model, which was driven by atmospheric predictors from four climate models for three IPCC emissions scenarios (A1B, A2, and B1) and for two periods (2046–2065 and 2081–2100). The results indicate that the annual rainfall is projected to decrease for both periods for all scenarios and climate models, with the exception of a few scenarios of no statistically significant changes. However, there is a seasonal difference: two downscaled GCMs consistently project a decline of summer rainfall, and two an increase. In contrast, all four downscaled GCMs show a decrease of winter rainfall. Because winter rainfall accounts for two‐thirds of the annual rainfall and produces the majority of streamflow for this region, this decrease in winter rainfall would cause additional water availability concerns in the southern Murray–Darling basin, given that water shortage is already a critical problem in the region. In addition, the annual maximum daily rainfall is projected to intensify in the future, particularly by the end of the 21st century; the maximum length of consecutive dry days is projected to increase, and correspondingly, the maximum length of consecutive wet days is projected to decrease. These changes in daily sequencing, combined with fewer events of reduced amount, could lead to drier catchment soil profiles and further reduce runoff potential and, hence, also have streamflow and water availability implications. Copyright © 2012 John Wiley & Sons, Ltd.
    Article · Dec 2013 · Hydrological Processes
  • ABSTRACT: Droughts are characterized by drought indexes that measure the departures of meteorological and hydrological variables, such as precipitation and streamflow, from their long-term averages. Although many drought indexes have been proposed in the literature, most use predefined thresholds for identifying drought classes, ignoring the inherent uncertainties in characterizing droughts. This study employs a hidden Markov model (HMM) for the probabilistic classification of drought states. Apart from explicitly accounting for the time dependence in the drought states, the HMM-based drought index (HMM-DI) provides model uncertainty in drought classification. The proposed HMM-DI is used to assess drought characteristics in Indiana by using monthly precipitation and streamflow data. The HMM-DI results were compared to those from standard indexes and the differences in classification results from the two models were examined. In addition to providing the probabilistic classification of drought states, the HMM is suited for analyzing the spatio-temporal characterization of droughts of different severities.
    Article · Jul 2013 · Journal of Hydrologic Engineering
  • Guobin Fu · Stephen P. Charles · Francis H.S. Chiew · [...] · Sergey Kirshner
    ABSTRACT: Statistical downscaling has mainly been used for site (point) scales to provide daily rainfall series for climate change impact studies. The objectives of this study are to compare three methods of applying statistical downscaling to catchment rainfall and to evaluate their hydrological response with a hydrological model: (a) statistically downscaling to sites and then interpolating to gridded rainfall which is accumulated to catchment average rainfall; (b) statistically downscaling to catchment average rainfall directly; and (c) statistically downscaling to grid cells and then accumulating to catchment average rainfall. Results indicate that statistical downscaling can be successfully applied at catchment average and grid cell scales. All three methods of application performed similarly for a range of rainfall characteristics, with directly downscaled catchment average rainfall producing a relatively better result for extreme daily rainfall indices. However, hydrological simulation indicated that the direct downscaling of catchment average rainfall did not have any advantages over the other two downscaling application methods in terms of the runoff statistics evaluated. In addition, all three methods of downscaling application could simulate the spatial correlation of daily and annual runoff across the nine focus catchments investigated. The advantages and limitations of applying statistical downscaling to the assessment of hydrological response to climate change are also discussed.
    Article · Jun 2013 · Journal of Hydrology
  • Guobin Fu · S.P. Charles · F.H.S. Chiew · [...] · Sergey Kirshner
    Article · Mar 2013 · Journal of Hydrology
  • Sebastian I. Moreno · Jennifer Neville · Sergey Kirshner
    Conference Paper · Jan 2013
  • Lin Yuan · Sergey Kirshner · Robert Givan
    ABSTRACT: We propose a novel approach for density estimation with exponential families for the case when the true density may not fall within the chosen family. Our approach augments the sufficient statistics with features designed to accumulate probability mass in the neighborhood of the observed points, resulting in a non-parametric model similar to kernel density estimators. We show that under mild conditions, the resulting model uses only the sufficient statistics if the density is within the chosen exponential family, and asymptotically, it approximates densities outside of the chosen exponential family. Using the proposed approach, we modify the exponential random graph model, commonly used for modeling small-size graph distributions, to address the well-known issue of model degeneracy.
    Article · Jun 2012
  • Sergey Kirshner
    ABSTRACT: We propose a new approach for estimation of joint densities for continuous observations using latent tree models for copulas, joint distributions with uniform U(0, 1) marginals. Latent tree copulas combine the advantages of the parametrization of the joint density using only bivariate distributions with the ability to approximate complex dependencies with the help of latent variables. The proposed model can also be used to organize the variables in a tree hierarchy. We describe algorithms for estimating binary latent tree copulas from data for both Gaussian and non-Gaussian copulas.
    Article · Jan 2012
  • A. M. Greene · A. W. Robertson · S. Kirshner
    ABSTRACT: The recognition that local extremes are often associated with particular synoptic weather types opens a new pathway for the estimation of weather-within-climate probability distribution functions and the analysis of linkages between global warming, low-frequency modes of climate variability and the statistics of extreme weather. Probabilistic network models, which characterize daily weather sequences in terms of Markovian transitions among a small set of discrete states, can serve to implement this paradigm, in addition providing improved inference via stochastic simulation. Such models enable both the quantification of sampling variability and the testing of assumptions inherent in the asymptotic framework of classical extreme-value theory, as applied in particular settings. Here we deploy a nonhomogeneous hidden Markov model (NHMM) for the analysis of daily Indian summer monsoon rainfall both during the 20th century and with respect to projected changes, as inferred from the global climate models constituting the CMIP3 ensemble. Utilizing observation-based return levels as criteria, we examine the spatiotemporal distribution of return levels estimated from stochastic simulations produced by the NHMM, with a focus on their association with the NHMM-defined weather states. We assess the sensitivity of parameter (and thus return-level) uncertainty to sample size and highlight advantages of the state-based approach. We also discuss estimates of past and projected monsoon rainfall extremes, as inferred through the agency of the NHMM.
    Article · Dec 2011
  • Dalton Lunga · Sergey Kirshner
    ABSTRACT: We propose a novel model for generating graphs similar to a given example graph. Unlike standard approaches that compute features of graphs in Euclidean space, our approach obtains features on a surface of a hypersphere. We then utilize a von Mises-Fisher distribution, an exponential family distribution on the surface of a hypersphere, to define a model over possible feature values. While our approach bears similarity to a popular exponential random graph model (ERGM), unlike ERGMs, it does not suffer from degeneracy, a situation when a significant probability mass is placed on unrealistic graphs. We propose a parameter estimation approach for our model, and a procedure for drawing samples from the distribution. We evaluate the performance of our approach both on the small domain of all 8-node graphs as well as larger real-world social networks.
    Article · May 2011
  • ABSTRACT: Much of the past work on mining and modeling networks has focused on understanding the observed properties of single example graphs. However, in many real-life applications it is important to characterize the structure of populations of graphs. In this work, we investigate the distributional properties of Kronecker product graph models (KPGMs). Specifically, we examine whether these models can represent the natural variability in graph properties observed across multiple networks and find surprisingly that they cannot. By considering KPGMs from a new viewpoint, we can show the reason for this lack of variance theoretically - which is primarily due to the generation of each edge independently from the others. Based on this understanding we propose a generalization of KPGMs that uses tied parameters to increase the variance of the model, while preserving the expectation. We then show experimentally, that our mixed-KPGM can adequately capture the natural variability across a population of networks.
    Conference Paper · Nov 2010
  • Barnabás Póczos · Sergey Kirshner · Csaba Szepesvári
    ABSTRACT: We propose a new method for a non-parametric estimation of Renyi and Shannon information for a multivariate distribution using a corresponding copula, a multivariate distribution over normalized ranks of the data. As the information of the distribution is the same as the negative entropy of its copula, our method estimates this information by solving a Euclidean graph optimization problem on the empirical estimate of the distribution's copula. Owing to the properties of the copula, we show that the resulting estimator of Renyi information is strongly consistent and robust. Further, we demonstrate its applicability in image registration in addition to simulated experiments.
    Article · Jan 2010 · Journal of Machine Learning Research
  • Sergey Kirshner · Padhraic Smyth
    Article · Dec 2008
  • A. M. Greene · A. W. Robertson · S. Kirshner
    ABSTRACT: A 70-year record of daily monsoon-season rainfall at a network of 13 stations in central western India is analyzed using a 4-state homogeneous hidden Markov model. The diagnosed states are seen to play distinct roles in the seasonal march of the monsoon, can be associated with ‘active’ and ‘break’ monsoon phases and capture the northward propagation of convective disturbances associated with the intraseasonal oscillation. Interannual variations in station rainfall are found to be associated with the alternation, from year to year, in the frequency of occurrence of wet and dry states; this mode of variability is well correlated with both all-India monsoon rainfall and an index characterizing the strength of the El Niño Southern Oscillation. Analysis of low-passed time series suggests that variations in state frequency are responsible for the modulation of monsoon rainfall on multidecadal time-scales as well. Copyright © 2008 Royal Meteorological Society
    Article · Apr 2008 · Quarterly Journal of the Royal Meteorological Society
  • Sergey Kirshner · Barnabás Póczos
    ABSTRACT: We propose a new algorithm for independent component and independent subspace analysis problems. This algorithm uses a contrast based on the Schweizer-Wolff measure of pairwise dependence (Schweizer & Wolff, 1981), a non-parametric measure computed on pairwise ranks of the variables. Our algorithm frequently outperforms state of the art ICA methods in the normal setting, is significantly more robust to outliers in the mixed signals, and performs well even in the presence of noise. Our method can also be used to solve independent subspace analysis (ISA) problems by grouping signals recovered by ICA methods. We provide an extensive empirical evaluation using simulated, sound, and image data.
    Conference Paper · Jan 2008
  • Sergey Kirshner · Padhraic Smyth
    ABSTRACT: Finite mixtures of tree-structured distributions have been shown to be efficient and effective in modeling multivariate distributions. Using Dirichlet processes, we extend this approach to allow countably many tree-structured mixture components. The resulting Bayesian framework allows us to deal with the problem of selecting the number of mixture components by computing the posterior distribution over the number of components and integrating out the components by Bayesian model averaging. We apply the proposed framework to identify the number and the properties of predominant precipitation patterns in historical archives of climate data.
    Conference Paper · Jun 2007
  • Sergey Kirshner
    ABSTRACT: We utilize the ensemble of trees framework, a tractable mixture over super-exponential number of tree-structured distributions (1), to develop a new model for multivariate density estimation. The model is based on a construction of tree-structured copulas - multivariate distributions with uniform on (0,1) marginals. By averaging over all possible tree structures, the new model can approximate distributions with complex variable dependencies. We propose an EM algorithm to estimate the parameters for these tree-averaged models for both the real-valued and the categorical case. Based on the tree-averaged framework, we propose a new model for joint precipitation amounts data on networks of rain stations.
    Conference Paper · Jan 2007
  • Andrew W. Robertson · Sergey Kirshner · Padhraic Smyth · [...] · Bryson C. Bates
    ABSTRACT: Daily rainfall occurrence and amount at 11 stations over North Queensland are examined for summers 1958–1998, using a Hidden Markov Model (HMM). Daily rainfall variability is described in terms of the occurrence of five discrete ‘weather states’, identified by the HMM. Three states are characterized respectively by very wet, moderately wet, and dry conditions at most stations; two states have enhanced rainfall along the coast and dry conditions inland. Each HMM rainfall state is associated with a distinct atmospheric circulation regime. The two wet states are accompanied by monsoonal circulation patterns with large-scale ascent, low-level inflow from the north-west, and a phase reversal with height; the dry state is characterized by circulation anomalies of the opposite sense. Two of the states show significant associations with midlatitude synoptic waves. Variability of the monsoon on time-scales from subseasonal to interdecadal is interpreted in terms of changes in the frequency of occurrence of the five HMM rainfall states. Large subseasonal variability is identified in terms of active and break phases, and a highly variable monsoon onset date. The occurrence of the very wet and dry states is somewhat modulated by the Madden–Julian oscillation. On interannual time-scales, there are clear relationships with the El Niño–Southern Oscillation and Indian Ocean sea surface temperatures (SSTs). Interdecadal monsoonal variability is characterized by stronger monsoons during the 1970s, and weaker monsoons plus an increased prevalence of drier states in the later part of the record. Stochastic simulations of daily rainfall occurrence and amount at the 11 stations are generated by introducing predictors based on large-scale precipitation from (a) reanalysis data, (b) an atmospheric general circulation model (GCM) run with observed SST forcing and (c) antecedent June–August Pacific SST anomalies. The reanalysis large-scale precipitation yields relatively accurate station-level simulations of the interannual variability of daily rainfall amount and occurrence, with rainfall intensity less well simulated. At some stations, interannual variations in 10-day dry-spell frequency are also simulated reasonably well. The interannual quality of the simulations is markedly degraded when the GCM simulations are used as inputs, while antecedent Pacific SST inputs yield an anomaly correlation skill comparable to that of the GCM. Copyright © 2006 Royal Meteorological Society
    Article · Dec 2006 · Quarterly Journal of the Royal Meteorological Society
  • A. W. Robertson · A. Ihler · S. Kirshner · [...] · M. Ghil
    ABSTRACT: The northward-propagating intraseasonal oscillation is a prominent feature of the Indian summer monsoon, leading to breaks and active phases of the monsoon. Recent studies suggest that it plays an important role in modulating monsoon seasonal rainfall totals, and that it may convey some sub-seasonal predictability to rainfall. We present a multichannel singular spectrum analysis (MSSA) of NOAA interpolated daily outgoing longwave radiation fields over the domain (0-30N, 65E-100E) for the June-September season, 1974-04. The MSSA is applied to the unfiltered daily OLR fields, by firstly decomposing the fields into the leading principal components that then form the channels of the MSSA. A northward-propagating oscillatory mode with a period of about 35 days is clearly isolated by the analysis, but accounting for only about 5% of the total daily OLR variance. To determine the significance of this mode for rainfall variability over India, we compute the mutual information between it and daily station rainfall occurrence and amount. Results are interpreted on weekly and seasonal time scales.
    Article · May 2006