Rahul Mazumder’s research while affiliated with Massachusetts Institute of Technology and other places


Publications (14)


Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares
  • Article

October 2014 · 555 Reads · 507 Citations · Journal of Machine Learning Research

Rahul Mazumder · [...] · Reza Zadeh

The matrix-completion problem has attracted a lot of attention, largely as a result of the celebrated Netflix competition. Two popular approaches for solving the problem are nuclear-norm-regularized matrix approximation (Candès and Tao, 2009; Mazumder, Hastie and Tibshirani, 2010) and maximum-margin matrix factorization (Srebro, Rennie and Jaakkola, 2005). These two procedures are in some cases solving equivalent problems, but with quite different algorithms. In this article we bring the two approaches together, leading to an efficient algorithm for large matrix factorization and completion that outperforms both of them. We develop a software package "softImpute" in R for implementing our approaches, and a distributed version for very large matrices using the "Spark" cluster programming environment.
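The alternating-least-squares idea at the heart of the algorithm can be sketched in a few lines of NumPy: hold one factor fixed and solve a small ridge regression for the other over the observed entries of each row, then swap roles. This is an illustrative sketch under our own assumptions (function name, ridge term, random initialization), not the softImpute package itself:

```python
import numpy as np

def als_complete(X, mask, rank=2, lam=1e-2, n_iters=50, seed=0):
    """Alternate ridge regressions for A and B so that X ~ A @ B.T on observed entries."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    A = rng.standard_normal((m, rank))
    B = rng.standard_normal((n, rank))
    ridge = lam * np.eye(rank)
    for _ in range(n_iters):
        # Update each row of A from that row's observed entries (B fixed).
        for i in range(m):
            obs = mask[i]
            Bo = B[obs]
            A[i] = np.linalg.solve(Bo.T @ Bo + ridge, Bo.T @ X[i, obs])
        # Update each row of B symmetrically (A fixed).
        for j in range(n):
            obs = mask[:, j]
            Ao = A[obs]
            B[j] = np.linalg.solve(Ao.T @ Ao + ridge, Ao.T @ X[obs, j])
    return A @ B.T
```

Each inner solve is a tiny rank-by-rank linear system, so the per-iteration cost is linear in the number of observed entries; the ridge term also keeps the system solvable when a row has few observations.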


Assessing the Significance of Global and Local Correlations under Spatial Autocorrelation: A Nonparametric Approach

January 2014 · 114 Reads · 48 Citations · Biometrics

Júlia Viladomat · Rahul Mazumder · [...]

We propose a method to test the correlation of two random fields when both are spatially autocorrelated. In this scenario, the independence assumption for pairs of observations in the standard test does not hold, and as a result we reject in many cases where there is no effect (the precision of the null distribution is overestimated). Our method recovers the null distribution while taking the autocorrelation into account. It uses Monte-Carlo methods: one of the variables is permuted, then smoothed and scaled to destroy its correlation with the other while preserving its initial autocorrelation. With this simulation model, any test based on the independence of two (or more) random fields can be constructed. This research was motivated by a project in biodiversity and conservation in the Biology Department at Stanford University.
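The permutation skeleton of such a test is straightforward. The sketch below (hypothetical function name) permutes one field to build a Monte-Carlo null distribution for the correlation, with a hook where the paper's smoothing-and-scaling step would re-impose the spatial autocorrelation; the hook is left as a pass-through here:

```python
import numpy as np

def perm_null_corr(x, y, n_perm=999, smooth=None, seed=0):
    """Monte-Carlo null for corr(x, y) obtained by permuting y.

    `smooth` is an optional callable re-imposing spatial structure on the
    permuted field (the paper smooths and rescales; identity if None)."""
    rng = np.random.default_rng(seed)
    null = np.empty(n_perm)
    for b in range(n_perm):
        yb = rng.permutation(y)
        if smooth is not None:
            yb = smooth(yb)
        null[b] = np.corrcoef(x, yb)[0, 1]
    obs = np.corrcoef(x, y)[0, 1]
    # Two-sided Monte-Carlo p-value with the usual +1 correction.
    p = (1 + np.sum(np.abs(null) >= abs(obs))) / (n_perm + 1)
    return obs, p, null
```

Without the smoothing step this reduces to a standard permutation test, which is exactly the test the paper argues is miscalibrated under autocorrelation; the `smooth` hook is where the correction enters.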


Turbulence, suspension and downstream fining over a sand-gravel mixture bed
[Fig. 2: Grain-size frequency distribution of the bed material]
  • Article
  • Full-text available

June 2013 · 165 Reads · 16 Citations · International Journal of Sediment Research

Flume experiments were carried out to study turbulence and its impact on the suspension and segregation of grain sizes under unidirectional flow over a sand-gravel mixture bed. The components of fluid velocity, together with their fluctuations, were measured along the vertical using a 3-D micro-acoustic Doppler velocimeter (ADV). Theoretical models for velocity and sediment suspension were developed based on the mixing-length concept, including the damping of turbulence by sediment suspension in flow over the sand-gravel bed. The downstream segregation of grain sizes was analyzed statistically as an unsupervised learning (clustering) problem. Exploratory data analysis suggests a progressive downstream fining of sediment sizes, with selective deposition of gravel, sand-gravel and sand materials along the stream, which may be segmented into three regions: upstream, transitional and downstream. This contribution is relevant to understanding the direction of ancient rivers, the character of bed material in river forms, the sorting process, and its role in controlling sediment flux through the landscape.
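The segmentation of the bed into a few grain-size regions can be illustrated with a plain 1-D k-means pass. The abstract only says "unsupervised learning or clustering", so this is a generic sketch with a hypothetical function name, not the authors' specific procedure:

```python
import numpy as np

def kmeans_1d(x, k=3, n_iters=100, seed=0):
    """Plain k-means on a 1-D feature (e.g. median grain size along the flume)."""
    rng = np.random.default_rng(seed)
    centers = np.sort(rng.choice(x, size=k, replace=False))
    for _ in range(n_iters):
        # Assign each sample to its nearest center.
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # Recompute centers as cluster means; keep old center if a cluster empties.
        new = np.array([x[labels == j].mean() if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```

With k = 3 the recovered clusters would play the role of the upstream, transitional and downstream regions described above.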


The Graphical Lasso: New Insights and Alternatives
[Figure 1: Primal and dual objective values produced by the glasso algorithm per column/row update. The successive primal differences cross zero (non-monotone), while the dual differences do not, indicating that glasso produces a monotone sequence of dual objective values.]

November 2011 · 506 Reads · 284 Citations · Electronic Journal of Statistics

The graphical lasso [5] is an algorithm for learning the structure in an undirected Gaussian graphical model, using ℓ1 regularization to control the number of zeros in the precision matrix Θ = Σ⁻¹ [2, 11]. The R package GLASSO [5] is popular, fast, and allows one to efficiently build a path of models for different values of the tuning parameter. Convergence of GLASSO can be tricky: the converged precision matrix might not be the inverse of the estimated covariance, and the algorithm occasionally fails to converge with warm starts. In this paper we explain this behavior and propose new algorithms that appear to outperform GLASSO. By studying the "normal equations" we see that GLASSO solves the dual of the graphical lasso penalized likelihood by block coordinate ascent, a result that can also be found in [2]. In this dual, the target of estimation is Σ, the covariance matrix, rather than the precision matrix Θ. We propose similar primal algorithms P-GLASSO and DP-GLASSO, which also operate by block-coordinate descent but take Θ as the optimization target. We study all of these algorithms, and in particular different approaches to solving their coordinate sub-problems, and conclude that DP-GLASSO is superior from several points of view.
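For concreteness, the primal criterion in question is min over Θ of -log det(Θ) + tr(SΘ) + λ‖Θ‖₁ (off-diagonal penalty). The sketch below minimizes it with a simple proximal-gradient (ISTA-style) loop; this is a generic solver for illustration, not the block-coordinate P-GLASSO/DP-GLASSO algorithms of the paper:

```python
import numpy as np

def glasso_objective(Theta, S, lam):
    """Primal criterion: -logdet(Theta) + tr(S @ Theta) + lam * off-diagonal L1."""
    _, logdet = np.linalg.slogdet(Theta)
    off = np.abs(Theta).sum() - np.abs(np.diag(Theta)).sum()
    return -logdet + np.trace(S @ Theta) + lam * off

def glasso_ista(S, lam, step=0.1, n_iters=500):
    """Proximal-gradient sketch on the primal graphical-lasso problem."""
    p = S.shape[0]
    Theta = np.eye(p)
    mask = 1.0 - np.eye(p)                 # penalize off-diagonals only
    for _ in range(n_iters):
        grad = S - np.linalg.inv(Theta)    # gradient of the smooth part
        Z = Theta - step * grad
        # Soft-threshold the off-diagonal entries (prox of the L1 term).
        Theta = np.sign(Z) * np.maximum(np.abs(Z) - step * lam * mask, 0.0)
        # Nudge back to positive definiteness if a step overshoots.
        w_min = np.linalg.eigvalsh(Theta).min()
        if w_min < 1e-8:
            Theta += (1e-8 - w_min) * np.eye(p)
    return Theta
```

Since this loop optimizes Θ directly, its objective values are monotone in the primal criterion, in contrast to the dual-ascent behavior of GLASSO discussed above.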


Modeling item--item similarities for personalized recommendations on Yahoo! front page

November 2011 · 151 Reads · 16 Citations · The Annals of Applied Statistics

We consider the problem of algorithmically recommending items to users on a Yahoo! front page module. Our approach is based on a novel multilevel hierarchical model that we refer to as a User Profile Model with Graphical Lasso (UPG). The UPG provides personalized recommendations by simultaneously incorporating both user covariates and historical user interactions with items in a model-based way. Specifically, we build a per-item regression model based on a rich set of user covariates and estimate individual user affinity to items by introducing a latent random vector for each user. The vector random effects are assumed to be drawn from a prior with a precision matrix that measures residual partial associations among items. To ensure better estimates of the precision matrix in high dimensions, its elements are constrained through a Lasso penalty. Our model is fitted through a penalized quasi-likelihood procedure coupled with a scalable EM algorithm. We employ several computational strategies, such as multi-threading and conjugate gradients, and heavily exploit problem structure to scale our computations in the E-step. For the M-step we use a scalable variant of the Graphical Lasso algorithm for covariance selection. Through extensive experiments on a new data set obtained from the Yahoo! front page and a benchmark data set from a movie recommender application, we show that our UPG model significantly improves performance compared to several state-of-the-art methods in the literature, especially those based on a bilinear random effects model (BIRE). In particular, we show that the gains of UPG are significant compared to BIRE when the number of users is large and the number of items to select from is small. For large item sets and relatively small user sets the results of UPG and BIRE are comparable. The UPG leads to faster model building and produces outputs which are interpretable.


A Flexible, Scalable and Efficient Algorithmic Framework for Primal Graphical Lasso

October 2011 · 46 Reads · 6 Citations

We propose a scalable, efficient and statistically motivated computational framework for Graphical Lasso (Friedman et al., 2007b), a covariance regularization framework that has received significant attention in the statistics community over the past few years. Existing algorithms have trouble scaling to dimensions larger than a thousand. Our proposal significantly enhances the state of the art for such moderate-sized problems and gracefully scales to larger problems where other algorithms become practically infeasible. This requires a few key new ideas: we operate on the primal problem and use a subtle variation of block-coordinate methods that reduces the computational complexity by orders of magnitude. We provide rigorous theoretical guarantees on the convergence and complexity of our algorithm and demonstrate the effectiveness of our proposal via experiments. We believe that our framework extends the applicability of Graphical Lasso to large-scale modern applications like bioinformatics, collaborative filtering and social networks, among others.


Exact Covariance Thresholding into Connected Components for Large-Scale Graphical Lasso

August 2011 · 113 Reads · 174 Citations · Journal of Machine Learning Research

We consider the sparse inverse covariance regularization problem, or graphical lasso, with regularization parameter ρ. Suppose the covariance graph formed by thresholding the entries of the sample covariance matrix at ρ is decomposed into connected components. We show that the vertex-partition induced by the thresholded covariance graph is exactly equal to that induced by the estimated concentration graph. This simple rule, when used as a wrapper around existing algorithms, leads to enormous performance gains. For large values of ρ, our proposal splits a large graphical lasso problem into smaller tractable problems, making it possible to solve an otherwise infeasible large-scale graphical lasso problem.
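The thresholding rule is easy to state in code: build the graph with an edge wherever |S_ij| > ρ (i ≠ j), take its connected components, and solve each component as an independent graphical-lasso problem. A minimal sketch with a hypothetical function name, using an explicit depth-first search instead of a graph library:

```python
import numpy as np

def threshold_components(S, rho):
    """Component label for each variable of the graph with an edge (i, j)
    iff |S_ij| > rho, i != j. Each component can then be solved separately."""
    p = S.shape[0]
    adj = (np.abs(S) > rho) & ~np.eye(p, dtype=bool)
    labels = -np.ones(p, dtype=int)
    comp = 0
    for start in range(p):
        if labels[start] >= 0:
            continue                      # already visited
        stack = [start]
        labels[start] = comp
        while stack:                      # depth-first search over the component
            i = stack.pop()
            for j in np.flatnonzero(adj[i]):
                if labels[j] < 0:
                    labels[j] = comp
                    stack.append(j)
        comp += 1
    return labels
```

The wrapper gain described above comes from running any graphical-lasso solver on each component's sub-matrix of S separately, which is far cheaper than one solve on the full matrix.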


Spectral Regularization Algorithms for Learning Large Incomplete Matrices

March 2010 · 1,091 Reads · 1,303 Citations · Journal of Machine Learning Research

We use convex relaxation techniques to provide a sequence of regularized low-rank solutions for large-scale matrix completion problems. Using the nuclear norm as a regularizer, we provide a simple and very efficient convex algorithm for minimizing the reconstruction error subject to a bound on the nuclear norm. Our algorithm Soft-Impute iteratively replaces the missing elements with those obtained from a soft-thresholded SVD. With warm starts this allows us to efficiently compute an entire regularization path of solutions on a grid of values of the regularization parameter. The computationally intensive part of our algorithm is computing a low-rank SVD of a dense matrix. Exploiting the problem structure, we show that this task can be performed with a complexity linear in the matrix dimensions. Our semidefinite-programming algorithm is readily scalable to large matrices: for example, it can obtain a rank-80 approximation of a 10^6 × 10^6 incomplete matrix with 10^5 observed entries in 2.5 hours, and can fit a rank-40 approximation to the full Netflix training set in 6.6 hours. Our methods show very good performance in both training and test error when compared to other competitive state-of-the-art techniques.
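The core Soft-Impute iteration fits in a few lines: replace the missing entries with the current low-rank fit, then soft-threshold the singular values of the completed matrix. A dense NumPy sketch (the paper's implementation instead uses structured low-rank SVDs to avoid forming dense matrices):

```python
import numpy as np

def soft_impute(X, mask, lam, n_iters=100):
    """Soft-Impute sketch: alternate imputation with a soft-thresholded SVD."""
    Z = np.where(mask, X, 0.0)              # start with zeros in the missing slots
    low_rank = Z
    for _ in range(n_iters):
        U, d, Vt = np.linalg.svd(Z, full_matrices=False)
        d = np.maximum(d - lam, 0.0)        # soft-threshold the singular values
        low_rank = (U * d) @ Vt
        Z = np.where(mask, X, low_rank)     # keep observed entries, impute the rest
    return low_rank
```

Decreasing λ along a grid and warm-starting each solve from the previous one yields the regularization path described above.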


SparseNet: Coordinate Descent With Nonconvex Penalties
[Figure 3: The penalized least-squares criterion (7) with the log-penalty (11)]

January 2010 · 618 Reads · 473 Citations

We address the problem of sparse selection in linear models. A number of non-convex penalties have been proposed for this purpose, along with a variety of convex-relaxation algorithms for finding good solutions. In this paper we pursue the coordinate-descent approach for optimization and study its convergence properties. We characterize the properties of penalties suitable for this approach, study their corresponding threshold functions, and describe a df-standardizing reparametrization that assists our pathwise algorithm. The MC+ penalty (Zhang 2010) is ideally suited to this task, and we use it to demonstrate the performance of our algorithm.
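The threshold function for the MC+ penalty that drives each coordinate update has a closed form. A sketch (with γ > 1 and a hypothetical function name) of the univariate solution, which interpolates between soft thresholding as γ → ∞ and hard thresholding as γ → 1:

```python
import numpy as np

def mcp_threshold(z, lam, gamma):
    """Minimizer of 0.5*(z - b)**2 + MC+ penalty(b; lam, gamma), gamma > 1.

    Inside |z| <= gamma*lam the solution is a rescaled soft threshold;
    outside that region the penalty is flat and the solution is z itself."""
    z = np.asarray(z, dtype=float)
    soft = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)
    inner = soft / (1.0 - 1.0 / gamma)
    return np.where(np.abs(z) <= gamma * lam, inner, z)
```

The two branches agree at |z| = γλ, so the threshold function is continuous, one of the properties the paper identifies as important for the convergence of coordinate descent.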


Regularization methods for learning incomplete matrices

June 2009 · 141 Reads · 8 Citations

We use convex relaxation techniques to provide a sequence of solutions to the matrix completion problem. Using the nuclear norm as a regularizer, we provide simple and very efficient algorithms for minimizing the reconstruction error subject to a bound on the nuclear norm. Our algorithm iteratively replaces the missing elements with those obtained from a thresholded SVD. With warm starts this allows us to efficiently compute an entire regularization path of solutions.


Citations (13)


... Although easy to apply, these methods may result in information loss or introduce significant bias [29]. In contrast, algorithms such as k-Nearest Neighbor, Expectation-Maximization, Matrix Factorization, and Multiple Imputation using Chained Equations [20, 30, 31, 32] consider multiple influencing parameters of the real system and their interrelationships as comprehensively as possible, thereby reducing imputation bias. Recently, interpolation models based on Generative Adversarial Networks (GANs) [33] have achieved higher accuracy; however, their training process may encounter issues such as mode collapse and difficulty in convergence. ...

Reference:

Reconstruction and Prediction of Chaotic Time Series with Missing Data: Leveraging Dynamical Correlations Between Variables
Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares
  • Citing Article
  • October 2014

Journal of Machine Learning Research

... According to the data from Demographia (2023), there are a total of 44 megalopolises worldwide, with approximately half of them located in China and India. According to the data from the 19.87 million in its urban area, has become the first megacity in China [7,8]. However, the allocation of medical resources in the suburbs does not match the population size. ...

Assessing the Significance of Global and Local Correlations under Spatial Autocorrelation: A Nonparametric Approach
  • Citing Article
  • January 2014

Biometrics

... Hajek et al. (2010) attempted to determine the statistical characterization of grain-size distributions in natural rivers. Ghoshal et al. (2013) and Ghoshal and Pal (2014) conducted experiments on a sand-gravel mixture to understand the bed material character in the river form, the sorting process, and its role in controlling the sediment flux through the landscape. Pal and Ghoshal (2014a, 2014b, 2015), in a series of papers, carried out experiments on sand-gravel mixture beds under different flow velocities to study the influence of bed roughness, flow velocity, and suspension height on the grain-size distribution in suspension. ...

Turbulence, suspension and downstream fining over a sand-gravel mixture bed

International Journal of Sediment Research

... In turbulent boundary layers, coherent structures with large flux events have been proposed to explain the "bursting" phenomena responsible for two types of eddy motions named "ejections" and "sweeps" (Cantwell, 1981; Robinson, 1991). These events are traditionally detected by conditional sampling through quadrant analysis in the (x, z)-plane (Willmarth & Lu, 1972) and their statistics have been investigated for a variety of flows and wall-roughness conditions, for example, experiments in open-channel flows (Hurther & Lemmin, 2000; Hurther, Lemmin, & Terray, 2007; Mazumder, 2007; Mazumder, Pal, Ghoshal, & Ojha, 2009; Nakagawa & Nezu, 1977; Nelson, Shreve, McLean, & Drake, 1995; Ojha & Mazumder, 2008; Venditti & Bauer, 2005); in wind tunnels (Raupach, 1981); in an under-ice boundary layer (Fer, McPhee, & Sirevaag, 2004); in atmospheric boundary layers (Hurther & Lemmin, 2003; Katul, Kuhn, Schieldge, & Hsieh, 1997; Katul, Poggi, Cava, & Finnigan, 2006; Sterk, Jacobs, & van Boxel, 1998); and in scour around vertical circular cylinders (Debnath, Manik, & Mazumder, 2012; Kirkil, Constantinscu, & Ettema, 2008; Sarkar, Chakraborty, & Mazumder, 2015; Sarkar, Chakraborty, & Mazumder, 2016). Besides the dominant role of the sweeps close to a rough wall, an "equilibrium region" is often observed in fully developed turbulent flows (Krogstad, Antonia, & Browne, 1992). ...

Clustering based on geometry and interactions of turbulence bursting rate processes in a trough region
  • Citing Article
  • June 2007

Environmetrics

... (Lyn [4], Bennett and Best [5], Parsons et al. [6], Best [7], Poggi et al. [8], Ojha and Mazumder [9], Peet et al. [10], Stoesser et al. [11], Mazumder et al. [12], Keshavarzi et al. [13]). The turbulent flow and related bursting phenomena over an isolated asymmetric waveform structure were studied statistically by Mazumder and Mazumder [14] using the analysis of multivariate normal distribution. Mazumder [15,16] studied the turbulence of fluid flow in the trough region between a pair of adjacent asymmetric waveform structures using a statistical clustering technique based on geometry and interactions of turbulence bursting rate. ...

Statistical characterization of circulation patterns and direction of turbulent flow over a waveform structure
  • Citing Article
  • August 2006

Environmetrics

... In terms of DC structure in sparse regularization, several works have analyzed the use of the DC function x → ∥x∥ 1 − ∥x∥ 2 as a sparsity inducing regularizer [3,51] as its zeros correspond to 1-sparse vectors. Many popular nonconvex regularizers have also been shown to have a DC decomposition [12], such as SCAD [18], MCP [53], or the Logarithmic penalty [35]. ...

SparseNet: Coordinate Descent With Nonconvex Penalties

... In recent works, there are two main approaches for joint graphical modeling, regularization-based and Bayesian methods [5]. Similar to the graphical lasso, sparsity in regularization-based approaches is induced by L 1 regularization on entries of the inverse covariance matrix, also referred to as the precision matrix [9]: ...

The Graphical Lasso: New Insights and Alternatives

Electronic Journal of Statistics

... Hence, using this information as user covariates helps in improving predictions for explicit ratings. Further, one can derive an item graph where edge weights represent movie similarities that are based on global "who-rated-what" matrix (Kouki et al., 2015;Wang et al., 2015;Agarwal et al., 2011;Mazumder and Agarwal, 2011). Imposing sparsity on such a graph and finding its fair communities is attractive since it is intuitive that an item is generally related to only a few other items. ...

Modeling item--item similarities for personalized recommendations on Yahoo! front page

The Annals of Applied Statistics

... We follow the experimental setting in [15,12,11] to generate data and perform synthetic experiments on multivariate Gaussians. Each off-diagonal entry of the precision matrix is drawn from a uniform distribution, i.e., Θ * ij ∼ U(−1, 1), and then set to zero with probability p = 1 − s, where s means the sparsity level (refer to Appendix C.1). ...

A Flexible, Scalable and Efficient Algorithmic Framework for Primal Graphical Lasso
  • Citing Article
  • October 2011

... This is of interest because L R = O q implies that there are no across-block edges so that the two groups are independent. For the case where λ 2 = 0, i.e. in the graphical lasso, this problem was considered by Witten et al. (2011) and Mazumder and Hastie (2012) where it is proved that such block diagonal structure is obtained whenever λ 1 ≥ max i, j∈L |s i j |. Here, we show that the latter result is still true in the pdglasso problem for any λ 2 ≥ 0. ...

Exact Covariance Thresholding into Connected Components for Large-Scale Graphical Lasso

Journal of Machine Learning Research