Måns Magnusson

Uppsala University | UU · Department of Statistics

Doctor of Philosophy

About

23 Publications · 3,927 Reads · 156 Citations


Publications (23)
Preprint
As large corpora of digitized text and novel methodologies become increasingly available, researchers are rediscovering textual data’s potential fruitfulness for inquiries into social and cultural phenomena. While textual corpora show great promise to enrich our knowledge of the social, avoiding problems related to data quality remains a challenge...
Preprint
Full-text available
We consider nearest neighbor weighted random walks on the $d$-dimensional box $[n]^d$ that are governed by some function $g:[0,1] \to [0,\infty)$, by which we mean that standing at $x$, a neighbor $y$ of $x$ is picked at random and the walk then moves there with probability $(1/2)g(n^{-1}y)/(g(n^{-1}y)+g(n^{-1}x))$. We do this for $g$ of the form $f^{...
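The move rule stated in the abstract can be sketched as a single simulation step (a minimal illustration, assuming $g$ is strictly positive and, for concreteness, evaluated on the scaled position $n^{-1}x$ as a tuple; all names are hypothetical):

```python
import random

def weighted_rw_step(x, n, d, g):
    """One step of the nearest-neighbor weighted random walk on the box [n]^d.

    Standing at x, pick one of the 2d neighbors y uniformly at random, then
    move there with probability (1/2) * g(y/n) / (g(y/n) + g(x/n)); otherwise
    stay. Proposals outside the box {1, ..., n}^d are rejected.
    """
    # Pick a coordinate and a direction uniformly among the 2d neighbors.
    i = random.randrange(d)
    y = list(x)
    y[i] += random.choice([-1, 1])
    if not (1 <= y[i] <= n):  # proposed neighbor falls outside the box
        return x
    gx = g(tuple(c / n for c in x))
    gy = g(tuple(c / n for c in y))
    if random.random() < 0.5 * gy / (gy + gx):
        return tuple(y)
    return x
```

Iterating this step (e.g. with `g = lambda u: 1.0 + sum(u)`) keeps the walk inside the box while biasing it toward regions where `g` is large.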
Preprint
Full-text available
We consider the problem of fitting variational posterior approximations using stochastic optimization methods. The performance of these approximations depends on (1) how well the variational family matches the true posterior distribution, (2) the choice of divergence, and (3) the optimization of the variational objective. We show that even in the bes...
Preprint
Full-text available
When evaluating and comparing models using leave-one-out cross-validation (LOO-CV), the uncertainty of the estimate is typically assessed using the variance of the sampling distribution. It is known, however, that no unbiased estimator for the variance can be constructed in a general case. While it has not been discussed before, it could be possibl...
Preprint
Full-text available
Leave-one-out cross-validation (LOO-CV) is a popular method for comparing Bayesian models based on their estimated predictive performance on new, unseen, data. Estimating the uncertainty of the resulting LOO-CV estimate is a complex task and it is known that the commonly used standard error estimate is often too small. We analyse the frequency prop...
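The "commonly used standard error estimate" this abstract analyses is typically computed from the pointwise elpd values by treating them as an i.i.d. sample. A minimal sketch of that naive estimator (illustrative only, not the paper's proposed method; names are hypothetical):

```python
import math

def loo_summary(elpd_pointwise):
    """Naive LOO-CV summary: total elpd and its commonly used standard error.

    elpd_pointwise: per-observation log predictive densities, each evaluated
    with that observation left out. The SE treats the n pointwise values as
    an i.i.d. sample: SE = sqrt(n * sample variance). This is the estimate
    often reported alongside elpd_loo, and the one known to be too small in
    some settings.
    """
    n = len(elpd_pointwise)
    total = sum(elpd_pointwise)
    mean = total / n
    var = sum((e - mean) ** 2 for e in elpd_pointwise) / (n - 1)
    return total, math.sqrt(n * var)
```

For example, `loo_summary([-1.0, -2.0, -3.0])` returns a total elpd of `-6.0` with SE `sqrt(3)`.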
Preprint
Full-text available
Bayesian model comparison is often based on the posterior distribution over the set of compared models. This distribution is often observed to concentrate on a single model even when other measures of model fit or forecasting ability indicate no strong preference. Furthermore, a moderate change in the data sample can easily shift the posterior mode...
Article
Full-text available
We introduce Diagonal Orthant Latent Dirichlet Allocation (DOLDA), a supervised topic model for multi-class classification that can handle both many classes as well as many covariates. To handle many classes we use the recently proposed Diagonal Orthant (DO) probit model together with an efficient horseshoe prior for variable selection/shrinkage. A...
Preprint
Full-text available
Recently, new methods for model assessment, based on subsampling and posterior approximations, have been proposed for scaling leave-one-out cross-validation (LOO) to large datasets. Although these methods work well for estimating predictive performance for individual models, they are less powerful in model comparison. We propose an efficient method...
Preprint
Full-text available
Word embeddings have demonstrated strong performance on NLP tasks. However, lack of interpretability and the unsupervised nature of word embeddings have limited their use within computational social science and digital humanities. We propose the use of informative priors to create interpretable and domain-informed dimensions for probabilistic word...
Preprint
Full-text available
Nonparametric extensions of topic models such as Latent Dirichlet Allocation, including Hierarchical Dirichlet Process (HDP), are often studied in natural language processing. Training these models generally requires use of serial algorithms, which limits scalability to large data sets and complicates acceleration via use of parallel and distribute...
Preprint
Full-text available
Model inference, such as model comparison, model checking, and model selection, is an important part of model development. Leave-one-out cross-validation (LOO) is a general approach for assessing the generalizability of a model, but unfortunately, LOO does not scale well to large datasets. We propose a combination of using approximate inference tec...
Preprint
Full-text available
In this paper we study the effects of a radical right party's entry into a national parliament on parliamentary discourse. We follow the classification developed by Meguid (2008) and use a probabilistic topic model approach to analyze the 300,000 speeches delivered in the Swedish parliament between 1994 and 2017. Our results indicate that immigration...
Article
Topic models, and more specifically the class of Latent Dirichlet Allocation (LDA), are widely used for probabilistic modeling of text. MCMC sampling from the posterior distribution is typically performed using a collapsed Gibbs sampler. We propose a parallel sparse partially collapsed Gibbs sampler and compare its speed and efficiency to state-of-...
Article
Full-text available
Latent Dirichlet Allocation (LDA) is a topic model widely used in natural language processing and machine learning. Most approaches to training the model rely on iterative algorithms, which makes it difficult to run LDA on big data sets that are best analyzed in parallel and distributed computational environments. Indeed, current approaches to para...
Article
Full-text available
The amount of public data is increasing in most countries. At the same time, fine-grained data, such as municipal- or county-level data, are often underutilized by local newsrooms. There are many reasons for this: the financial situation of local newsrooms, the large scale of public datasets, and the large amount of random noise in the data. We use a Baye...
Article
Full-text available
Latent Dirichlet allocation (LDA) is a model widely used for unsupervised probabilistic modeling of text and images. MCMC sampling from the posterior distribution is typically performed using a collapsed Gibbs sampler that integrates out all model parameters except the topic indicators for each word. The topic indicators are Gibbs sampled iterative...
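The iterative resampling of topic indicators described here follows the standard collapsed-Gibbs full conditional for LDA. A minimal sketch of one such update (the textbook sampler, not this paper's parallel variant; count-matrix names are illustrative):

```python
import random

def sample_topic(d, w, z_old, ndk, nkw, nk, alpha, beta, V):
    """Resample one topic indicator in collapsed Gibbs sampling for LDA.

    Full conditional, with the current token's counts removed:
        p(z = k) ∝ (ndk[d][k] + alpha) * (nkw[k][w] + beta) / (nk[k] + V*beta)
    ndk: document-topic counts, nkw: topic-word counts, nk: topic totals,
    V: vocabulary size, alpha/beta: symmetric Dirichlet hyperparameters.
    """
    K = len(nk)
    # Remove the token's current assignment from the count matrices.
    ndk[d][z_old] -= 1
    nkw[z_old][w] -= 1
    nk[z_old] -= 1
    # Unnormalized full-conditional weights for each topic.
    weights = [(ndk[d][k] + alpha) * (nkw[k][w] + beta) / (nk[k] + V * beta)
               for k in range(K)]
    z_new = random.choices(range(K), weights=weights)[0]
    # Add the token back under its new assignment.
    ndk[d][z_new] += 1
    nkw[z_new][w] += 1
    nk[z_new] += 1
    return z_new
```

A full sweep applies this update to every token in the corpus; the sparsity and parallelism discussed in the articles above come from restructuring exactly this inner loop.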
Article
Full-text available
The head louse, Pediculus humanus capitis, is an obligate ectoparasite that causes infestations of humans. Studies have demonstrated a correlation between sales figures for over-the-counter (OTC) treatment products and the number of humans with head lice. The deregulation of the Swedish pharmacy market on July 1, 2009, decreased the possibility to...

Projects (1)