Matthew J. Beal’s research while affiliated with State University of New York and other places


Publications (39)


Machine Learning for Signature Verification
  • Chapter

December 2007 · 409 Reads · 13 Citations

Studies in Computational Intelligence

Siyuan Chen · Matthew J. Beal

Signature verification is a common task in forensic document analysis. Its aim is to determine whether a questioned signature matches known signature samples. From the viewpoint of automating the task, it can be viewed as one that involves machine learning from a population of signatures. There are two types of learning tasks to be accomplished: person-independent (or general) learning and person-dependent (or special) learning. General learning is from a population of genuine and forged signatures of several individuals, where the differences between genuines and forgeries across all individuals are learnt. The general learning model allows a questioned signature to be compared to a single genuine signature. In special learning, a person's signature is learnt from multiple samples of only that person's signature, where within-person similarities are learnt. When a sufficient number of samples are available, special learning performs better than general learning (5% higher accuracy). With special learning, verification accuracy increases with the number of samples. An interactive software implementation of signature verification involving both the learning and performance phases is described.
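The general/special distinction above maps naturally onto two different decision rules. As a rough illustrative sketch (not the authors' implementation; the feature representation, the pair classifier, and the threshold are placeholder assumptions):

```python
import numpy as np

def verify_general(questioned_feat, genuine_feat, pair_classifier):
    """General (person-independent) learning: a classifier trained on
    same-writer / different-writer feature differences from many people
    scores a single questioned-vs-genuine pair. `pair_classifier` is any
    fitted binary classifier exposing predict_proba (assumption)."""
    diff = np.abs(questioned_feat - genuine_feat)
    return pair_classifier.predict_proba(diff.reshape(1, -1))[0, 1]  # P(same writer)

def verify_special(questioned_feat, known_feats, threshold=2.0):
    """Special (person-dependent) learning: model only this person's known
    signatures and accept if the questioned sample is close to them."""
    mean = known_feats.mean(axis=0)
    std = known_feats.std(axis=0) + 1e-6
    z = np.abs((questioned_feat - mean) / std)   # per-feature z-scores
    return z.mean() < threshold                  # within-person similarity test
```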


Segmentation and labeling of documents using conditional random fields

March 2007 · 268 Reads · 45 Citations

Proceedings of SPIE - The International Society for Optical Engineering

The paper describes the use of Conditional Random Fields (CRF) utilizing contextual information in automatically labeling extracted segments of scanned documents as Machine-print, Handwriting, and Noise. The result of such a labeling can serve as an indexing step for a context-based image retrieval system or a biometric signature verification system. A simple region growing algorithm is first used to segment the document into a number of patches. A label for each such segmented patch is inferred using a CRF model. The model is flexible enough to include signatures as a type of handwriting and to isolate them from machine-print and noise. The robustness of the model is due to the inherent nature of modeling neighboring spatial dependencies in the labels, as well as in the observed data, using the CRF. Maximum pseudo-likelihood estimates for the parameters of the CRF model are learnt using conjugate gradient descent. Inference of labels is done by computing the probability of the labels under the model with Gibbs sampling. Experimental results show that this approach assigns correct labels to 95.75% of the data. The CRF-based model is shown to be superior to Neural Networks and Naive Bayes.
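A rough sketch of the Gibbs-sampling inference step described above, over labels of segmented patches. The label set, potential functions, and neighborhood structure below are illustrative assumptions, not the paper's actual model:

```python
import numpy as np

LABELS = ["machine-print", "handwriting", "noise"]   # assumed label set

def gibbs_label_patches(unary, neighbors, pairwise, n_sweeps=100, rng=None):
    """Sample patch labels under a CRF-like model.
    unary[i, l]     : log-potential of label l for patch i (from local features)
    neighbors[i]    : list of patch indices adjacent to patch i
    pairwise[l, m]  : log-potential for a neighboring pair taking labels (l, m)
    """
    rng = rng or np.random.default_rng(0)
    n, L = unary.shape
    labels = rng.integers(0, L, size=n)               # random initialisation
    for _ in range(n_sweeps):
        for i in range(n):
            logp = unary[i].copy()
            for j in neighbors[i]:                    # add neighbor compatibility
                logp += pairwise[:, labels[j]]
            p = np.exp(logp - logp.max())
            p /= p.sum()
            labels[i] = rng.choice(L, p=p)            # resample label of patch i
    return labels
```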


Reconstructing Transcriptional Networks Using Gene Expression Profiling and Bayesian State-Space Models

January 2007 · 23 Reads · 3 Citations

A major challenge in systems biology is the ability to model complex regulatory interactions. This chapter is concerned with the use of linear-Gaussian state-space models (SSMs), also known as linear dynamical systems (LDS) or Kalman filter models, to “reverse engineer” regulatory networks from high-throughput data sources, such as microarray gene expression profiling. LDS models are a subclass of dynamic Bayesian networks used for modeling time series data and have been used extensively in many areas of control and signal processing. We describe results from simulation studies based on synthetic mRNA data generated from a model that contains definite nonlinearities in the dynamics of the hidden factors (arising from the oligomerization of transcription factors). Receiver operating characteristic (ROC) analysis demonstrates an overall accuracy in transcriptional network reconstruction from the mRNA time series measurements alone of approximately a 68% area under the curve (AUC) for 12 time points, and better still for data sampled at a higher rate. A key ingredient of these models is the inclusion of “hidden factors” that help to explain the correlation structure of the observed measurements. These factors may correspond to unmeasured quantities that were not captured during the experiment and may represent underlying biological processes. Results from the modeling of the synthetic data also indicate that our method is capable of capturing the temporal nature of the data and of explaining it using these hidden processes, some of which may plausibly reflect dynamic aspects of the underlying biological reality.
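For orientation, a linear-Gaussian state-space model of the kind referred to above has the generic form x_t = A x_{t-1} + w_t, y_t = C x_t + v_t with Gaussian noise. A minimal simulation-and-filtering sketch (the dimensions and parameter values are arbitrary placeholders, not the chapter's model):

```python
import numpy as np

rng = np.random.default_rng(1)
k, p, T = 2, 5, 12            # hidden factors, observed genes, time points (placeholders)
A = 0.9 * np.eye(k)           # hidden-state dynamics
C = rng.normal(size=(p, k))   # loading of hidden factors onto observed expression
Q, R = 0.1 * np.eye(k), 0.2 * np.eye(p)

# simulate hidden factors and mRNA-like observations
x = np.zeros((T, k)); y = np.zeros((T, p))
for t in range(T):
    x[t] = (A @ x[t - 1] if t else 0) + rng.multivariate_normal(np.zeros(k), Q)
    y[t] = C @ x[t] + rng.multivariate_normal(np.zeros(p), R)

# standard Kalman filter recursion for the hidden factors
mu, P = np.zeros(k), np.eye(k)
for t in range(T):
    mu, P = A @ mu, A @ P @ A.T + Q            # predict
    S = C @ P @ C.T + R                        # innovation covariance
    K = P @ C.T @ np.linalg.inv(S)             # Kalman gain
    mu = mu + K @ (y[t] - C @ mu)              # update mean
    P = (np.eye(k) - K @ C) @ P                # update covariance
```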


Variational Bayesian Learning of Directed Graphical Models with Hidden Variables

December 2006 · 72 Reads · 101 Citations

Bayesian Analysis

A key problem in statistics and machine learning is inferring suitable structure of a model given some observed data. A Bayesian approach to model comparison makes use of the marginal likelihood of each candidate model to form a posterior distribution over models; unfortunately, for most models of interest, notably those containing hidden or latent variables, the marginal likelihood is intractable to compute. We present the variational Bayesian (VB) algorithm for directed graphical models, which optimises a lower bound approximation to the marginal likelihood in a procedure similar to the standard EM algorithm. We show that for a large class of models, which we call conjugate exponential, the VB algorithm is a straightforward generalisation of the EM algorithm that incorporates uncertainty over model parameters. In a thorough case study using a small class of bipartite DAGs containing hidden variables, we compare the accuracy of the VB approximation to existing asymptotic-data approximations such as the Bayesian Information Criterion (BIC) and the Cheeseman-Stutz (CS) criterion, and also to a sampling-based gold standard, Annealed Importance Sampling (AIS). We find that the VB algorithm is empirically superior to CS and BIC, and much faster than AIS. Moreover, we prove that a VB approximation can always be constructed in such a way that it is guaranteed to be more accurate than the CS approximation.
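For reference, the lower bound alternately optimised by the VB algorithm takes the standard form below (generic notation, assumed here rather than copied from the paper), maximised by coordinate ascent on the variational posteriors over hidden variables z and parameters θ:

```latex
% VB lower bound and VB-EM coordinate ascent (generic form)
\ln p(\mathbf{x}) \;\ge\; \mathcal{F}(q_z, q_\theta)
  = \iint q_z(\mathbf{z})\, q_\theta(\theta)\,
    \ln \frac{p(\mathbf{x}, \mathbf{z} \mid \theta)\, p(\theta)}
             {q_z(\mathbf{z})\, q_\theta(\theta)} \; d\mathbf{z}\, d\theta
\\[4pt]
\text{VBE step: } q_z(\mathbf{z}) \propto \exp\!\big(\mathbb{E}_{q_\theta}[\ln p(\mathbf{x}, \mathbf{z} \mid \theta)]\big),
\qquad
\text{VBM step: } q_\theta(\theta) \propto p(\theta)\,\exp\!\big(\mathbb{E}_{q_z}[\ln p(\mathbf{x}, \mathbf{z} \mid \theta)]\big)
```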


[Figure 3: Results for document topic modeling. (a) Comparison of LDA and HDP mixtures, with results averaged over 10 runs (error bars are one standard error); (b) histogram of the number of topics for the hierarchical Dirichlet process mixture over 100 posterior samples.]
[Figure 4: Comparison of the iHMM (horizontal line) with ML-, MAP-, and VB-trained HMMs. Error bars are one standard error (those for the iHMM are too small to see).]
Hierarchical Dirichlet Processes
  • Article
  • Full-text available

December 2006 · 2,197 Reads · 2,882 Citations

We consider problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this setting it is natural to consider sets of Dirichlet processes, one for each group, where the well-known clustering property of the Dirichlet process provides a nonparametric prior for the number of mixture components within each group. Given our desire to tie the mixture models in the various groups, we consider a hierarchical model, specifically one in which the base measure for the child Dirichlet processes is itself distributed according to a Dirichlet process. Such a base measure being discrete, the child Dirichlet processes necessarily share atoms. Thus, as desired, the mixture models in the different groups necessarily share mixture components. We discuss representations of hierarchical Dirichlet processes in terms of a stick-breaking process, and a generalization of the Chinese restaurant process that we refer to as the "Chinese restaurant franchise." We present Markov chain Monte Carlo algorithms for posterior inference in hierarchical Dirichlet process mixtures and describe applications to problems in information retrieval and text modeling.
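As a loose illustration of the stick-breaking representation mentioned above: with a discrete base measure over K atoms, each group's weights are a Dirichlet re-weighting of the shared global weights, so all groups reuse the same mixture components. The concentration parameters and truncation level below are arbitrary placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking(alpha, n_atoms, rng):
    """Truncated GEM(alpha) stick-breaking weights."""
    v = rng.beta(1.0, alpha, size=n_atoms)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    return v * remaining

gamma, alpha0, K = 1.0, 1.0, 50        # placeholder concentrations / truncation
beta = stick_breaking(gamma, K, rng)   # global weights over shared atoms

def group_weights(beta, alpha0, rng):
    # For a discrete base measure over K atoms, pi_j | beta ~ Dirichlet(alpha0 * beta)
    return rng.dirichlet(alpha0 * beta)

pi_group1 = group_weights(beta, alpha0, rng)
pi_group2 = group_weights(beta, alpha0, rng)
# observations in group j pick component k with probability pi_groupj[k],
# then draw from the k-th shared atom, so components are shared across groups
```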


On profiling mobility and predicting locations of wireless users

May 2006 · 82 Reads · 66 Citations

In this paper, we analyze a year-long wireless network users' mobility trace data collected on the ETH Zurich campus. Unlike earlier work in [4, 19], we profile the movement patterns of wireless users and predict their locations. More specifically, we show that each network user regularly visits a list of places such as buildings (also referred to as "hubs") with some probability. The daily list of hubs, along with their corresponding visit probabilities, is referred to as a mobility profile. We also show that over a period of time (e.g., a week), a user may repeatedly follow a mixture of mobility profiles with certain probabilities associated with each of the profiles. Our analysis of the mobility trace data not only validates the existence of our so-called sociological orbits [8], but also demonstrates the advantages of exploiting them in performing hub-level location predictions. In particular, we show that such profile-based location predictions are more precise than common statistical approaches based on observed hub visitation frequencies alone.
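A toy sketch of the kind of comparison being made (the hub names, stored profiles, and matching rule are illustrative assumptions, not the paper's method): predict the next hub either from overall visit frequencies or from the stored profile that best matches the current day.

```python
import numpy as np

HUBS = ["library", "cafeteria", "lab", "dorm"]        # hypothetical hub set

# each row: one day's empirical hub-visit distribution for a user (toy numbers)
days = np.array([
    [0.6, 0.1, 0.3, 0.0],   # "weekday"-like days
    [0.5, 0.2, 0.3, 0.0],
    [0.1, 0.1, 0.0, 0.8],   # "weekend"-like days
])

# frequency-only baseline: predict the single most visited hub overall
freq = days.mean(axis=0)
baseline_prediction = HUBS[int(freq.argmax())]

# profile-based: pick the stored profile closest to today's partial observations,
# then predict the most probable hub under that profile
profiles = np.array([[0.55, 0.15, 0.30, 0.0],         # assumed learned profiles
                     [0.10, 0.10, 0.00, 0.8]])
today_so_far = np.array([0.5, 0.2, 0.3, 0.0])
best = int(np.linalg.norm(profiles - today_so_far, axis=1).argmin())
profile_prediction = HUBS[int(profiles[best].argmax())]

print(baseline_prediction, profile_prediction)
```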


Comparison of ROC-based and likelihood methods for fingerprint verification

April 2006 · 49 Reads · 3 Citations

Proceedings of SPIE - The International Society for Optical Engineering

Matthew J. Beal · [...]

The fingerprint verification task answers the question of whether or not two fingerprints belong to the same finger. The paper focuses on the classification aspect of fingerprint verification. Classification is the third and final step after the two earlier steps of feature extraction, where a known set of features (minutiae points) has been extracted from each fingerprint, and scoring, where a matcher has determined a degree of match between the two sets of features. Since this is a binary classification problem involving a single variable, the commonly used threshold method is related to the so-called receiver operating characteristics (ROC). In the ROC approach, the optimal threshold on the score is determined so as to decide match or non-match. Such a method works well when there is a well-registered fingerprint image. On the other hand, more sophisticated methods are needed when there exists only a partial imprint of a finger, as in the case of latent prints in forensics or due to limitations of the biometric device. In such situations it is useful to consider classification methods based on computing the likelihood ratio of match/non-match. Such methods are commonly used in some biometric and forensic domains, such as speaker verification, where there is a much higher degree of uncertainty. This paper compares the two approaches empirically for the fingerprint classification task when the number of available minutiae is varied. In both the ROC-based and likelihood ratio methods, learning is from a general population of an ensemble of pairs, each of which is labeled as being from the same finger or from different fingers. In the ROC-based method, the best operating point is derived from the ROC curve. In the likelihood method, the distributions of same-finger and different-finger scores are modeled using Gaussian and Gamma distributions. The performances of the two methods are compared for varying numbers of available minutiae points. Results show that the likelihood method performs better than the ROC-based method when fewer minutiae points are available. Both methods converge to the same accuracy as more minutiae points become available.
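A rough sketch of the two decision rules being compared. Only the Gaussian/Gamma modeling of same-finger and different-finger scores follows the abstract; the score arrays, threshold-selection rule, and decision cut-off are placeholder assumptions:

```python
import numpy as np
from scipy import stats

# placeholder matcher scores for known same-finger and different-finger pairs
same_scores = np.random.default_rng(0).normal(8.0, 1.5, size=500)
diff_scores = np.random.default_rng(1).gamma(2.0, 1.5, size=500)

# likelihood-ratio rule: fit a Gaussian to same-finger scores and a Gamma to
# different-finger scores, then compare their densities at the questioned score
mu, sigma = stats.norm.fit(same_scores)
a, loc, scale = stats.gamma.fit(diff_scores, floc=0)

def likelihood_ratio_decision(score):
    lr = stats.norm.pdf(score, mu, sigma) / stats.gamma.pdf(score, a, loc, scale)
    return lr > 1.0                     # "match" if the same-finger model is more likely

# ROC-based rule: pick the single threshold maximising TPR - FPR on labeled pairs
thresholds = np.linspace(min(diff_scores.min(), same_scores.min()),
                         max(diff_scores.max(), same_scores.max()), 200)
tpr = np.array([(same_scores >= t).mean() for t in thresholds])
fpr = np.array([(diff_scores >= t).mean() for t in thresholds])
best_t = thresholds[int(np.argmax(tpr - fpr))]

def roc_decision(score):
    return score >= best_t
```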


Automatically Extracting Nominal Mentions of Events with a Bootstrapped Probabilistic Classifier

January 2006 · 25 Reads · 12 Citations

Most approaches to event extraction focus on mentions anchored in verbs. However, many mentions of events surface as noun phrases. Detecting them can increase the recall of event extraction and provide the foundation for detecting relations between events. This paper describes a weakly-supervised method for detecting nominal event mentions that combines techniques from word sense disambiguation (WSD) and lexical acquisition to create a classifier that labels noun phrases as denoting events or non-events. The classifier uses bootstrapped probabilistic generative models of the contexts of events and non-events. The contexts are the lexically-anchored semantic dependency relations that the NPs appear in. Our method dramatically improves with bootstrapping, and comfortably outperforms lexical lookup methods which are based on very much larger hand-crafted resources.
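A minimal sketch of the bootstrapping loop described above, using a naive Bayes generative model over bag-of-context features as a stand-in. The seed sets, feature representation, and confidence threshold are assumptions; the paper's models over lexically-anchored dependency contexts are richer than this:

```python
import numpy as np
from scipy.sparse import vstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# toy seed examples: context strings for known event / non-event noun phrases
seed_texts = ["announced the merger yesterday", "attended the ceremony",
              "the wooden table", "a tall building"]
seed_labels = np.array([1, 1, 0, 0])                  # 1 = event, 0 = non-event
unlabeled = ["announced the election results", "the tall tower", "attended the wedding"]

vec = CountVectorizer()
X_train = vec.fit_transform(seed_texts)
X_unl = vec.transform(unlabeled)
labels = seed_labels.copy()

for _ in range(3):                                    # bootstrapping iterations
    clf = MultinomialNB().fit(X_train, labels)
    probs = clf.predict_proba(X_unl)
    confident = probs.max(axis=1) > 0.8               # self-label confident NPs only
    if not confident.any():
        break
    X_train = vstack([X_train, X_unl[confident]])     # grow the training set
    labels = np.concatenate([labels, probs[confident].argmax(axis=1)])
    X_unl = X_unl[~confident]                         # remove newly labeled examples
```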


Machine Learning for Signature Verification

January 2006 · 135 Reads · 28 Citations

Lecture Notes in Computer Science

Signature verification is a common task in forensic document analysis. It is the task of determining whether a questioned signature matches known signature samples. From the viewpoint of automating the task, it can be viewed as one that involves machine learning from a population of signatures. There are two types of learning to be accomplished. In the first, the training set consists of genuines and forgeries from a general population. In the second, there are genuine signatures in a given case. The two learning tasks are called person-independent (or general) learning and person-dependent (or special) learning. General learning is from a population of genuine and forged signatures of several individuals, where the differences between genuines and forgeries across all individuals are learnt. The general learning model allows a questioned signature to be compared to a single genuine signature. In special learning, a person's signature is learnt from multiple samples of only that person's signature, where within-person similarities are learnt. When a sufficient number of samples are available, special learning performs better than general learning (5% higher accuracy). With special learning, verification accuracy increases with the number of samples. An interactive software implementation of signature verification involving both the learning and performance phases is described.


[Figure 1: Model of the competitive classifier.]
Competitive Mixtures of Simple Neurons

January 2006 · 35 Reads · 1 Citation

We propose a competitive finite mixture of neurons (or perceptrons) for solving binary classification problems. Our classifier includes a prior for the weights between different neurons such that it prefers mixture models made up from neurons having classification boundaries as orthogonal to each other as possible. We derive an EM algorithm for learning the mixing proportions and weights of each neuron, consisting of an exact E step and a partial M step, and show that our model covers the regions of high posterior probability in weight space and tends to reduce overfitting. We demonstrate the way in which our mixture classifier works using a toy 2-dimensional data set, showing the effective use of strategically positioned components in the mixture. We further compare its performance against SVMs and one-hidden-layer neural networks on four real-world data sets from the UCI repository, and show that even a relatively small number of neurons with appropriate competitive priors can achieve superior classification accuracies on held-out test data.
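A bare-bones sketch of the EM structure mentioned above (exact E step, partial gradient-based M step) for a mixture of logistic-regression "neurons". The orthogonality prior and the paper's exact update rules are omitted, so this only illustrates the alternation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def em_mixture_of_neurons(X, y, K=2, n_iter=50, lr=0.1, rng=None):
    """X: (n, d) inputs, y: (n,) binary labels. Returns weights (K, d), mixing pi (K,)."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(K, d))
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E step (exact): responsibility of neuron k for example i
        p = sigmoid(X @ W.T)                            # (n, K) predicted P(y = 1)
        lik = np.where(y[:, None] == 1, p, 1.0 - p)     # per-neuron likelihoods
        r = pi * lik
        r /= r.sum(axis=1, keepdims=True)
        # partial M step: one responsibility-weighted gradient step per neuron
        for k in range(K):
            grad = X.T @ (r[:, k] * (y - p[:, k])) / n  # weighted logistic gradient
            W[k] += lr * grad
        pi = r.mean(axis=0)                             # update mixing proportions
    return W, pi
```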


Citations (32)


... The parameters of the rest of the DGP depend on the discrete state ψ_t. The objective is to infer the sequence of underlying discrete states that best "explains" the observed data (Ostendorf et al., 1996; Ghahramani & Hinton, 2000; Beal et al., 2001; Fox et al., 2007; Van Gael et al., 2008; Linderman et al., 2017). In this context, non-stationarity arises from the switching behaviour of the underlying discrete process. ...

Reference:

BONE: a unifying framework for Bayesian online learning in non-stationary environments
The Infinite Hidden Markov Model
  • Citing Chapter
  • November 2002

... Instead of evaluating the likelihood, the algorithms in this category operate under the assumption that simulating data under the model (or a surrogate thereof) facilitates an understanding of the likelihood. Representatives for these algorithms are Bayesian synthetic likelihood (Price et al. 2018), specific versions of Variational Bayes (Beal and Ghahramani 2003; Jordan et al. 1999; Blei et al. 2017), Integrated nested Laplace (Rue et al. 2009), and, possibly the most popular one, Approximate Bayesian computation (ABC) (Tavaré et al. 1997; Pritchard et al. 1999; Beaumont et al. 2002; Marjoram et al. 2003; Csilléry et al. 2010; Beaumont 2010; Sisson et al. 2007, 2019). In this work, we focus exclusively on ABC, which has proven to facilitate successful calibration in the context of ABMs in biological applications, e.g., (Lambert et al. 2018; Wang et al. 2024). ...

The Variational Bayesian EM Algorithm for Incomplete Data: With Application to Scoring Graphical Model Structures
  • Citing Chapter
  • July 2003

... Online Bayesian learning [25] and joint Bayesian learning of HMM structures and transformation parameters have also been extensively studied [24], [26], [27]. Besides point estimation, the entire posterior distributions can also be approximated by other Bayesian approaches, such as Markov Chain Monte Carlo (MCMC) [28], assumed density filtering [29], and stochastic variational Bayes (VB) [30], [31]. The VB approach, in particular, performs an estimation on the entire posterior distribution via a stochastic variational inference method and transforms an estimation problem into an optimization one that can be solved numerically by leveraging a variational distribution. ...

Variational algorithms for approximate Bayesian inference
  • Citing Thesis
  • January 2003

... This is the conventional negative evidence lower bound objective (ELBO) (Beal and Ghahramani, 2000) up to a constant. In contrast to variational latent variable models such as variational autoencoders (VAEs) (Kingma and Welling, 2014), here the space modeled by the prior and variational posterior is grounded by observed data. ...

Gatsby Computational Neuroscience Unit
  • Citing Article
  • January 2000

... To reduce the complexity of learning and evaluation, the Lei team combined standard recursive feature elimination with principal component analysis based on SVM to generate a multi-class SVM framework. Experiments showed that this method can improve the evaluation speed of SVM by an order of magnitude while maintaining considerable accuracy [10]. ...

Speeding Up Multi-class SVM Evaluation via Principle Component Analysis and Recursive Feature Elimination

... VB algorithm for HSMM allows making inference on parameters, hidden states and models by approximating the joint posterior of hidden states and parameters, with a simpler variational density. The usual Mean Field Approximation (Ghahramani et al., 2000) requires the approximate posterior to factorise over subsets of parameters and hidden variables: ...

Graphical model and variational methods
  • Citing Chapter
  • January 2001

... In other cases, the total error rate ε_t, defined as ε_t = FRR · P(ω_1) + FAR · P(ω_2), where P(ω_1) and P(ω_2) are the a priori probabilities of the classes of genuine signatures (ω_1) and forgeries (ω_2), is used [281]-[283]. The receiver operating characteristic (ROC) curve analysis is also applied to FRR versus FAR evaluation since it shows the ability of a system to discriminate genuine signatures from forged ones [see Fig. 10(b)] [309], [311]. ...

Machine learning approaches for person identification and verification
  • Citing Article
  • May 2005

Proceedings of SPIE - The International Society for Optical Engineering

... Goodness of fit tests can be successfully used in various areas, such as signature verification, automatic speaker identification, detection of radio frequency, economics, and data reconstruction (Biswas et al. 2008;Cho et al. 2013;Güner et al. 2009;Srinivasan et al. 2005). ...

Signature verification using kolmogorov-smirnov statistic
  • Citing Article
  • February 2005

... Low-level visual cues (e.g., intensity, contrast, homogeneity, etc.) in combination with shallow machine learning techniques have been used for specific object classification tasks [22], [23] with fair performance, but these methods are mainly for classification as they tend to aggregate global features in compact representations which are less suitable for performing object detection. Performance improvement has also been sought by resorting to graphical models [24], [25], [26], such as Conditional Random Fields (CRFs), but despite their capabilities to capture fine edge details, these methods are still not as effective as expected. Our hypothesis is that the main reason for unsatisfactory performance is that tables (mainly) and charts usually cover large areas and, as such, they need methods able to account for long-range dependencies. ...

Segmentation and labeling of documents using conditional random fields
  • Citing Article
  • March 2007

Proceedings of SPIE - The International Society for Optical Engineering

... Our approach [...] users with similar behaviors, originating sets of trajectories that share characteristic traits. A recommendation agent analyzes contextual information and the current user's past interactions in order to establish its most significant coupling with one of the clusters of trajectories, following a Multi-Armed Bandit policy, and to provide the most appropriate suggestions. ...

On profiling mobility and predicting locations of wireless users
  • Citing Article
  • May 2006