Sergey Kirshner

Purdue University, West Lafayette, Indiana, United States

Publications (32) · 26.7 Total impact

  • the IEEE International Conference on Data Mining; 12/2014
  • Guy Feldman, Anindya Bhadra, Sergey Kirshner
    ABSTRACT: We consider the problem of feature selection in a high-dimensional multiple predictors, multiple responses regression setting. Assuming that regression errors are i.i.d. when they are in fact dependent leads to inconsistent and inefficient feature estimates. We relax the i.i.d. assumption by allowing the errors to exhibit a tree-structured dependence. This allows a Bayesian problem formulation with the error dependence structure treated as an auxiliary variable that can be integrated out analytically with the help of the matrix-tree theorem. Mixing over trees results in a flexible technique for modelling the graphical structure for the regression errors. Furthermore, the analytic integration results in a collapsed Gibbs sampler for feature selection that is computationally efficient. Our approach offers significant performance gains over the competing methods in simulations, especially when the features themselves are correlated. In addition to comprehensive simulation studies, we apply our method to a high-dimensional breast cancer data set to identify markers significantly associated with the disease. Copyright © 2014 John Wiley & Sons, Ltd.
    03/2014; 3(1). DOI:10.1002/sta4.60
  • Guobin Fu, Stephen P. Charles, Sergey Kirshner
    ABSTRACT: An ensemble of stochastic daily rainfall projections has been generated for 30 stations across south‐eastern Australia using the downscaling nonhomogeneous hidden Markov model, which was driven by atmospheric predictors from four climate models for three IPCC emissions scenarios (A1B, A2, and B1) and for two periods (2046–2065 and 2081–2100). The results indicate that the annual rainfall is projected to decrease for both periods for all scenarios and climate models, with the exception of a few scenarios of no statistically significant changes. However, there is a seasonal difference: two downscaled GCMs consistently project a decline of summer rainfall, and two an increase. In contrast, all four downscaled GCMs show a decrease of winter rainfall. Because winter rainfall accounts for two‐thirds of the annual rainfall and produces the majority of streamflow for this region, this decrease in winter rainfall would cause additional water availability concerns in the southern Murray–Darling basin, given that water shortage is already a critical problem in the region. In addition, the annual maximum daily rainfall is projected to intensify in the future, particularly by the end of the 21st century; the maximum length of consecutive dry days is projected to increase, and correspondingly, the maximum length of consecutive wet days is projected to decrease. These changes in daily sequencing, combined with fewer events of reduced amount, could lead to drier catchment soil profiles and further reduce runoff potential and, hence, also have streamflow and water availability implications. Copyright © 2012 John Wiley & Sons, Ltd.
    Hydrological Processes 12/2013; 27(25). DOI:10.1002/hyp.9483 · 2.70 Impact Factor
  • Journal of Hydrologic Engineering 07/2013; 18(7):834-845. DOI:10.1061/(ASCE)HE.1943-5584.0000699 · 1.62 Impact Factor
  • ABSTRACT: Statistical downscaling has mainly been used for site (point) scales to provide daily rainfall series for climate change impact studies. The objectives of this study are to compare three methods of applying statistical downscaling to catchment rainfall and to evaluate their hydrological response with a hydrological model: (a) statistically downscaling to sites and then interpolating to gridded rainfall which is accumulated to catchment average rainfall; (b) statistically downscaling to catchment average rainfall directly; and (c) statistical downscaling to grid cells and then accumulating to catchment average rainfall. Results indicate that statistical downscaling can be successfully applied at catchment average and grid cell scales. All three methods of application performed similarly for a range of rainfall characteristics, with directly downscaled catchment average rainfall producing a relatively better result for extreme daily rainfall indices. However, hydrological simulation indicated that the direct downscaling of catchment average rainfall did not have any advantages over the other two downscaling application methods in terms of the runoff statistics evaluated. In addition, all three methods of downscaling application could simulate the spatial correlation of daily and annual runoff across the nine focus catchments investigated. The advantages and limitations of applying statistical downscaling to the assessment of hydrological response to climate change are also discussed.
    Journal of Hydrology 06/2013; 492:254–265. DOI:10.1016/j.jhydrol.2013.03.041 · 2.69 Impact Factor
  • Journal of Hydrology 03/2013; · 2.69 Impact Factor
  • Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, Chicago, Illinois, USA; 01/2013
  • Lin Yuan, Sergey Kirshner, Robert Givan
    ABSTRACT: We propose a novel approach for density estimation with exponential families for the case when the true density may not fall within the chosen family. Our approach augments the sufficient statistics with features designed to accumulate probability mass in the neighborhood of the observed points, resulting in a non-parametric model similar to kernel density estimators. We show that under mild conditions, the resulting model uses only the sufficient statistics if the density is within the chosen exponential family, and asymptotically, it approximates densities outside of the chosen exponential family. Using the proposed approach, we modify the exponential random graph model, commonly used for modeling small-size graph distributions, to address the well-known issue of model degeneracy.
  • ABSTRACT: The recognition that local extremes are often associated with particular synoptic weather types opens a new pathway for the estimation of weather-within-climate probability distribution functions and the analysis of linkages between global warming, low-frequency modes of climate variability and the statistics of extreme weather. Probabilistic network models, which characterize daily weather sequences in terms of Markovian transitions among a small set of discrete states, can serve to implement this paradigm, in addition providing improved inference via stochastic simulation. Such models enable both the quantification of sampling variability and the testing of assumptions inherent in the asymptotic framework of classical extreme-value theory, as applied in particular settings. Here we deploy a nonhomogeneous hidden Markov model (NHMM) for the analysis of daily Indian summer monsoon rainfall both during the 20th century and with respect to projected changes, as inferred from the global climate models constituting the CMIP3 ensemble. Utilizing observation-based return levels as criteria, we examine the spatiotemporal distribution of return levels estimated from stochastic simulations produced by the NHMM, with a focus on their association with the NHMM-defined weather states. We assess the sensitivity of parameter (and thus return-level) uncertainty to sample size and highlight advantages of the state-based approach. We also discuss estimates of past and projected monsoon rainfall extremes, as inferred through the agency of the NHMM.
  • Dalton Lunga, Sergey Kirshner
    ABSTRACT: We propose a novel model for generating graphs similar to a given example graph. Unlike standard approaches that compute features of graphs in Euclidean space, our approach obtains features on a surface of a hypersphere. We then utilize a von Mises-Fisher distribution, an exponential family distribution on the surface of a hypersphere, to define a model over possible feature values. While our approach bears similarity to a popular exponential random graph model (ERGM), unlike ERGMs, it does not suffer from degeneracy, a situation when a significant probability mass is placed on unrealistic graphs. We propose a parameter estimation approach for our model, and a procedure for drawing samples from the distribution. We evaluate the performance of our approach both on the small domain of all 8-node graphs as well as larger real-world social networks.
  • ABSTRACT: Much of the past work on mining and modeling networks has focused on understanding the observed properties of single example graphs. However, in many real-life applications it is important to characterize the structure of populations of graphs. In this work, we investigate the distributional properties of Kronecker product graph models (KPGMs). Specifically, we examine whether these models can represent the natural variability in graph properties observed across multiple networks and find surprisingly that they cannot. By considering KPGMs from a new viewpoint, we can show the reason for this lack of variance theoretically - which is primarily due to the generation of each edge independently from the others. Based on this understanding we propose a generalization of KPGMs that uses tied parameters to increase the variance of the model, while preserving the expectation. We then show experimentally, that our mixed-KPGM can adequately capture the natural variability across a population of networks.
    Communication, Control, and Computing (Allerton), 2010 48th Annual Allerton Conference on; 11/2010
  • ABSTRACT: We propose a new method for a non-parametric estimation of Renyi and Shannon information for a multivariate distribution using a corresponding copula, a multivariate distribution over normalized ranks of the data. As the information of the distribution is the same as the negative entropy of its copula, our method estimates this information by solving a Euclidean graph optimization problem on the empirical estimate of the distribution's copula. Owing to the properties of the copula, we show that the resulting estimator of Renyi information is strongly consistent and robust. Further, we demonstrate its applicability in image registration in addition to simulated experiments.
  • Sergey Kirshner, Padhraic Smyth
    12/2008; 3(4). DOI:10.1214/08-BA326B
  • ABSTRACT: A 70-year record of daily monsoon-season rainfall at a network of 13 stations in central western India is analyzed using a 4-state homogeneous hidden Markov model. The diagnosed states are seen to play distinct roles in the seasonal march of the monsoon, can be associated with ‘active’ and ‘break’ monsoon phases and capture the northward propagation of convective disturbances associated with the intraseasonal oscillation. Interannual variations in station rainfall are found to be associated with the alternation, from year to year, in the frequency of occurrence of wet and dry states; this mode of variability is well correlated with both all-India monsoon rainfall and an index characterizing the strength of the El Niño Southern Oscillation. Analysis of low-passed time series suggests that variations in state frequency are responsible for the modulation of monsoon rainfall on multidecadal time-scales as well. Copyright © 2008 Royal Meteorological Society
    Quarterly Journal of the Royal Meteorological Society 04/2008; 134(633):875 - 887. DOI:10.1002/qj.254 · 5.13 Impact Factor
  • Sergey Kirshner, Barnabás Póczos
    ABSTRACT: We propose a new algorithm for independent component and independent subspace analysis problems. This algorithm uses a contrast based on the Schweizer-Wolff measure of pairwise dependence (Schweizer & Wolff, 1981), a non-parametric measure computed on pairwise ranks of the variables. Our algorithm frequently outperforms state of the art ICA methods in the normal setting, is significantly more robust to outliers in the mixed signals, and performs well even in the presence of noise. Our method can also be used to solve independent subspace analysis (ISA) problems by grouping signals recovered by ICA methods. We provide an extensive empirical evaluation using simulated, sound, and image data.
    Machine Learning, Proceedings of the Twenty-Fifth International Conference (ICML 2008), Helsinki, Finland, June 5-9, 2008; 01/2008
  • Sergey Kirshner, Padhraic Smyth
    ABSTRACT: Finite mixtures of tree-structured distributions have been shown to be efficient and effective in modeling multivariate distributions. Using Dirichlet processes, we extend this approach to allow countably many tree-structured mixture components. The resulting Bayesian framework allows us to deal with the problem of selecting the number of mixture components by computing the posterior distribution over the number of components and integrating out the components by Bayesian model averaging. We apply the proposed framework to identify the number and the properties of predominant precipitation patterns in historical archives of climate data.
    Machine Learning, Proceedings of the Twenty-Fourth International Conference (ICML 2007), Corvallis, Oregon, USA, June 20-24, 2007; 06/2007
  • Sergey Kirshner
    ABSTRACT: We utilize the ensemble of trees framework, a tractable mixture over super-exponential number of tree-structured distributions (1), to develop a new model for multivariate density estimation. The model is based on a construction of tree-structured copulas - multivariate distributions with uniform on (0,1) marginals. By averaging over all possible tree structures, the new model can approximate distributions with complex variable dependencies. We propose an EM algorithm to estimate the parameters for these tree-averaged models for both the real-valued and the categorical case. Based on the tree-averaged framework, we propose a new model for joint precipitation amounts data on networks of rain stations.
    Advances in Neural Information Processing Systems 20, Proceedings of the Twenty-First Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 3-6, 2007; 01/2007
  • ABSTRACT: Daily rainfall occurrence and amount at 11 stations over North Queensland are examined for summers 1958–1998, using a Hidden Markov Model (HMM). Daily rainfall variability is described in terms of the occurrence of five discrete ‘weather states’, identified by the HMM. Three states are characterized respectively by very wet, moderately wet, and dry conditions at most stations; two states have enhanced rainfall along the coast and dry conditions inland. Each HMM rainfall state is associated with a distinct atmospheric circulation regime. The two wet states are accompanied by monsoonal circulation patterns with large-scale ascent, low-level inflow from the north-west, and a phase reversal with height; the dry state is characterized by circulation anomalies of the opposite sense. Two of the states show significant associations with midlatitude synoptic waves. Variability of the monsoon on time-scales from subseasonal to interdecadal is interpreted in terms of changes in the frequency of occurrence of the five HMM rainfall states. Large subseasonal variability is identified in terms of active and break phases, and a highly variable monsoon onset date. The occurrence of the very wet and dry states is somewhat modulated by the Madden–Julian oscillation. On interannual time-scales, there are clear relationships with the El Niño–Southern Oscillation and Indian Ocean sea surface temperatures (SSTs). Interdecadal monsoonal variability is characterized by stronger monsoons during the 1970s, and weaker monsoons plus an increased prevalence of drier states in the later part of the record. Stochastic simulations of daily rainfall occurrence and amount at the 11 stations are generated by introducing predictors based on large-scale precipitation from (a) reanalysis data, (b) an atmospheric general circulation model (GCM) run with observed SST forcing and (c) antecedent June–August Pacific SST anomalies. The reanalysis large-scale precipitation yields relatively accurate station-level simulations of the interannual variability of daily rainfall amount and occurrence, with rainfall intensity less well simulated. At some stations, interannual variations in 10-day dry-spell frequency are also simulated reasonably well. The interannual quality of the simulations is markedly degraded when the GCM simulations are used as inputs, while antecedent Pacific SST inputs yield an anomaly correlation skill comparable to that of the GCM. Copyright © 2006 Royal Meteorological Society
    Quarterly Journal of the Royal Meteorological Society 12/2006; 132(615):519 - 542. DOI:10.1256/qj.05.75 · 5.13 Impact Factor
  • ABSTRACT: The northward-propagating intraseasonal oscillation is a prominent feature of the Indian summer monsoon, leading to breaks and active phases of the monsoon. Recent studies suggest that it plays an important role in modulating monsoon seasonal rainfall totals, and that it may convey some sub-seasonal predictability to rainfall. We present a multichannel singular spectrum analysis (MSSA) of NOAA interpolated daily outgoing longwave radiation fields over the domain (0-30N, 65E-100E) for the June-September season, 1974-04. The MSSA is applied to the unfiltered daily OLR fields, by firstly decomposing the fields into the leading principal components that then form the channels of the MSSA. A northward-propagating oscillatory mode with a period of about 35 days is clearly isolated by the analysis, but accounting for only about 5% of the total daily OLR variance. To determine the significance of this mode for rainfall variability over India, we compute the mutual information between it and daily station rainfall occurrence and amount. Results are interpreted on weekly and seasonal time scales.
  • ABSTRACT: Daily rainfall occurrence and amount at 11 stations over North Queensland are examined during summer 1958–1997, using a Hidden Markov Model (HMM). Daily rainfall variability is described in terms of the occurrence of five discrete "weather states," identified by the HMM. Three states are characterized respectively by very wet, moderately wet, and dry conditions at most stations; two states have enhanced rainfall along the coast and dry conditions inland. Each HMM rainfall state is associated with a distinct atmospheric circulation regime. The two wet states are accompanied by monsoonal circulation patterns, with large-scale ascent, low-level inflow from the northwest, and a phase reversal with height. An upper-level monsoon trough to the east depresses the tropopause, especially for the very-wet state. The dry state is characterized by the opposite circulation anomalies. The coastal rainfall states are characterized by low-level southeasterlies from the ocean, and NW–SE midlatitude troughs. Variability of the monsoon on daily time scales and longer is interpreted in terms of the estimated daily sequence of five HMM rainfall states. Large sub-seasonal variability is identified in terms of active and break phases, and a highly variable monsoon onset date. The occurrence of the very-wet and dry states is found to be somewhat modulated by the Madden-Julian oscillation. Stochastic simulations of daily rainfall occurrence and amount at the 11 stations are generated by introducing predictors based on large-scale precipitation fields from reanalysis data, and an atmospheric general circulation model.

Publication Stats

245 Citations
26.70 Total Impact Points

Institutions

  • 2013–2014
    • Purdue University
      • Department of Statistics
      West Lafayette, Indiana, United States
  • 2005–2008
    • University of Alberta
      • Department of Computing Science
      Edmonton, Alberta, Canada
  • 2002–2006
    • University of California, Irvine
      • Donald Bren School of Information and Computer Sciences
      • Department of Computer Science
      Irvine, CA, United States