Mehdi Mirza’s research while affiliated with Université de Montréal and other places


Publications (13)


Generalizable Features From Unsupervised Learning
  • Article

December 2016 · 118 Reads · 10 Citations

Mehdi Mirza

Humans learn a predictive model of the world and use this model to reason about future events and the consequences of actions. In contrast to most machine predictors, we exhibit an impressive ability to generalize to unseen scenarios and reason intelligently in these settings. One important aspect of this ability is physical intuition (Lake et al., 2016). In this work, we explore the potential of unsupervised learning to find features that promote better generalization to settings outside the supervised training distribution. Our task is predicting the stability of towers of square blocks. We demonstrate that an unsupervised model, trained to predict future frames of a video sequence of stable and unstable block configurations, can yield features that support extrapolating stability prediction to block configurations outside the training set distribution.


FIG. 1. Interactive graph visualization with d3viz. Profiling colors have been activated, with redder nodes corresponding to longer computation times. Blue arrows indicate a node returns a view of the input, red arrows indicate a destroyed input.  
FIG. 2. Processing time for convolutional networks on Imagenet (milliseconds per batch, lower is better). Dark colors show forward computation time, pale colors show backward time.  
Theano: A Python framework for fast computation of mathematical expressions
  • Article
  • Full-text available

May 2016 · 6,679 Reads · 1,291 Citations

The Theano Development Team · Rami Al-Rfou · [...]

Theano is a Python library that allows one to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers, especially in the machine learning community, and has shown steady performance improvements. Theano has been actively and continuously developed since 2008; multiple frameworks have been built on top of it, and it has been used to produce many state-of-the-art machine learning models. The present article is structured as follows. Section I provides an overview of the Theano software and its community. Section II presents the principal features of Theano and how to use them, and compares them with other similar projects. Section III focuses on recently introduced functionalities and improvements. Section IV compares the performance of Theano against Torch7 and TensorFlow on several machine learning models. Section V discusses current limitations of Theano and potential ways of improving it.
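As a rough illustration of the workflow the abstract describes (define a symbolic expression, let Theano optimize the graph, compile it for CPU or GPU, and differentiate automatically), here is a minimal sketch; the logistic expression and variable names are illustrative choices, not taken from the article:

```python
# Minimal sketch, assuming Theano roughly as described in the article (circa 0.8).
import numpy as np
import theano
import theano.tensor as T

# Symbolic inputs: a matrix of examples, a weight vector, and a bias.
X = T.matrix('X')
w = T.vector('w')
b = T.scalar('b')

# Build a symbolic expression; Theano optimizes this graph before execution.
p = T.nnet.sigmoid(T.dot(X, w) + b)

# Automatic differentiation: gradient of the mean output w.r.t. the weights.
grad_w = T.grad(p.mean(), w)

# Compile the graph into a callable function (runs on CPU or GPU depending on config).
f = theano.function(inputs=[X, w, b], outputs=[p, grad_w])

probs, g = f(np.random.randn(4, 3), np.random.randn(3), 0.0)
```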


EmoNets: Multimodal deep learning approaches for emotion recognition in video

March 2015 · 165 Reads · 99 Citations

The task of the Emotion Recognition in the Wild (EmotiW) Challenge is to assign one of seven emotions to short video clips extracted from Hollywood-style movies. The videos depict acted-out emotions under realistic conditions with a large degree of variation in attributes such as pose and illumination, making it worthwhile to explore approaches which consider combinations of features from multiple modalities for label assignment. In this paper we present our approach to learning several specialist models using deep learning techniques, each focusing on one modality. Among these are a convolutional neural network, focusing on capturing visual information in detected faces; a deep belief net, focusing on the representation of the audio stream; a K-Means-based "bag-of-mouths" model, which extracts visual features around the mouth region; and a relational autoencoder, which addresses spatio-temporal aspects of videos. We explore multiple methods for combining cues from these modalities into one common classifier, which achieves considerably greater accuracy than predictions from our strongest single-modality classifier. Our method was the winning submission in the 2013 EmotiW challenge and achieved a test set accuracy of 47.67% on the 2014 dataset.


Figure 2: Visualization of samples from the model. Rightmost column shows the nearest training example of the neighboring sample, in order to demonstrate that the model has not memorized the training set. Samples are fair random draws, not cherry-picked. Unlike most other visualizations of deep generative models, these images show actual samples from the model distributions, not conditional means given samples of hidden units. Moreover, these samples are uncorrelated because the sampling process does not depend on Markov chain mixing. a) MNIST b) TFD c) CIFAR-10 (fully connected model) d) CIFAR-10 (convolutional discriminator and "deconvolutional" generator)
Figure 3: Digits obtained by linearly interpolating between coordinates in z space of the full model. 
Generative Adversarial Networks

June 2014 · 52,412 Reads · 16,697 Citations

Advances in Neural Information Processing Systems

We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
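The two-player game described above can be summarized by the value function given in the paper, which the discriminator D is trained to maximize and the generator G to minimize:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\bigl[\log D(x)\bigr] +
  \mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

Here p_z(z) is the prior over the generator's input noise; at the game's equilibrium the generator's distribution matches p_data and D(x) = 1/2 everywhere, as the abstract states.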


Generative Adversarial Nets

June 2014 · 9,836 Reads · 50,521 Citations

We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.


An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks

December 2013 · 1,091 Reads · 865 Citations

Catastrophic forgetting is a problem faced by many machine learning models and algorithms. When trained on one task, then trained on a second task, many machine learning models "forget" how to perform the first task. This is widely believed to be a serious problem for neural networks. Here, we investigate the extent to which the catastrophic forgetting problem occurs for modern neural networks, comparing both established and recent gradient-based training algorithms and activation functions. We also examine the effect of the relationship between the first task and the second task on catastrophic forgetting.


Combining modality specific deep neural networks for emotion recognition in video

December 2013 · 606 Reads · 400 Citations

In this paper we present the techniques used for the University of Montréal's team submissions to the 2013 Emotion Recognition in the Wild Challenge. The challenge is to classify the emotions expressed by the primary human subject in short video clips extracted from feature-length movies. This involves the analysis of video clips of acted scenes lasting approximately one to two seconds, including the audio track, which may contain human voices as well as background music. Our approach combines multiple deep neural networks for different data modalities, including: (1) a deep convolutional neural network for the analysis of facial expressions within video frames; (2) a deep belief net to capture audio information; (3) a deep autoencoder to model the spatio-temporal information produced by the human actions depicted within the entire scene; and (4) a shallow network architecture focused on extracted features of the mouth of the primary human subject in the scene. We discuss each of these techniques, their performance characteristics, and different strategies to aggregate their predictions. Our best single model was a convolutional neural network trained to predict emotions from static frames using two large data sets, the Toronto Face Database and our own set of face images harvested from Google image search, followed by a per-frame aggregation strategy that used the challenge training data. This yielded a test set accuracy of 35.58%. Using our best strategy for aggregating our top-performing models into a single predictor, we were able to produce an accuracy of 41.03% on the challenge test set. These results compare favorably to the challenge baseline test set accuracy of 27.56%.


Pylearn2: A machine learning research library

August 2013 · 1,678 Reads · 173 Citations

Pylearn2 is a machine learning research library. This does not just mean that it is a collection of machine learning algorithms that share a common API; it means that it has been designed for flexibility and extensibility in order to facilitate research projects that involve new or unusual use cases. In this paper we give a brief history of the library, an overview of its basic philosophy, a summary of the library's architecture, and a description of how the Pylearn2 community functions socially.


Table 1. Private test set accuracy on FER-13
Fig. 1. Histogram of accuracies obtained by different submissions on the BBL2013 dataset. Organizer-provided baselines shown in red.
Challenges in Representation Learning: A Report on Three Machine Learning Contests

July 2013 · 3,936 Reads · 1,074 Citations

Neural Networks

The ICML 2013 Workshop on Challenges in Representation Learning (http://deeplearning.net/icml2013-workshop-competition) focused on three challenges: the black box learning challenge, the facial expression recognition challenge, and the multimodal learning challenge. We describe the datasets created for these challenges and summarize the results of the competitions. We provide suggestions for organizers of future challenges and some comments on what kind of knowledge can be gained from machine learning competitions.


Table 3. Test set misclassification rates for the best methods on the CIFAR-10 dataset.
Figure 4. Example
Table 5. Test set misclassification rates for the best methods on the SVHN dataset.
Maxout Networks

February 2013 · 2,577 Reads · 595 Citations

We consider the problem of designing models to leverage a recently introduced approximate model averaging technique called dropout. We define a simple new model called maxout (so named because its output is the max of a set of inputs, and because it is a natural companion to dropout) designed to both facilitate optimization by dropout and improve the accuracy of dropout's fast approximate model averaging technique. We empirically verify that the model successfully accomplishes both of these tasks. We use maxout and dropout to demonstrate state of the art classification performance on four benchmark datasets: MNIST, CIFAR-10, CIFAR-100, and SVHN.
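To make the definition concrete, here is a minimal NumPy sketch of a maxout hidden layer, in which each unit outputs the maximum of k affine pieces of its input; the names and shapes (W, b, k) are illustrative assumptions, not the authors' code:

```python
# Minimal sketch of a maxout layer (illustrative, not the paper's implementation).
import numpy as np

def maxout_layer(x, W, b):
    """x: (batch, d_in); W: (d_in, d_out, k); b: (d_out, k).

    Each output unit j computes k affine pieces and returns their maximum:
        h_j(x) = max_i (x . W[:, j, i] + b[j, i])
    """
    z = np.einsum('nd,dok->nok', x, W) + b  # (batch, d_out, k) affine pieces
    return z.max(axis=-1)                   # take the max over the k pieces

# Toy usage: 5 examples, 10 input features, 3 maxout units with k = 4 pieces each.
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 10))
W = rng.standard_normal((10, 3, 4))
b = rng.standard_normal((3, 4))
h = maxout_layer(x, W, b)  # shape (5, 3)
```

Dropout then acts on such layers during training, and the max operation keeps the resulting network piecewise linear, which is what the abstract means by facilitating both optimization and dropout's approximate model averaging.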


Citations (13)


... Therefore, Biswas et al. [41] used the smooth function to approximate the |x| function. They found a general approximation formula of the maximum function from the smooth approximation of the |x| function, which can smoothly approximate the general maxout [42] family, ReLU, leaky ReLU, or its variants, such as Swish, etc. In addition, the author also proves that the GELU function is a special case of the SMU. ...

Reference:

Optical Flow-Aware-Based Multi-Modal Fusion Network for Violence Detection
Maxout Networks
  • Citing Conference Paper
  • February 2013

... We consider the SOTA watermarking schemes in Table 1, covering all categories from §2, selected for appearing in top venues, with several remaining unbroken by a query-free black-box attacker. Among the generatorspecific schemes, Yu1 [21], Yu2 [22], and PTW [23] watermark Generative Adversarial Networks (GANS) [53], while StableSignature [20] and TRW [18] operate on LDMs [46]. We are not aware of semantic schemes from top venues other than those we consider, which are the focus of previous work [29], [31], [35]. ...

Generative Adversarial Nets
  • Citing Article
  • June 2014

... They use CNN based image classifiers taking as input an image of a block tower and returning a probability for the tower to fall. Lerer et al. (2016); Mirza et al. (2017) also include a decoding module to predict final positions of these blocks. Groth et al. (2018) investigate the ability of such a model to actively position shapes in stable tower configurations. ...

Generalizable Features From Unsupervised Learning
  • Citing Article
  • December 2016

... Theano (Al-Rfou et al. 2016) is closely integrated with expressive Python syntax. It provides automatic differentiation capabilities, which eases flexibility to modify architecture for research development (Bahrampour et al. 2015). ...

Theano: A Python framework for fast computation of mathematical expressions

... However, a key distinction is that in DBMs, all connections between the RBMs are undirected [56] Figure 11, allowing for more flexible learning of feature hierarchies. This undirected nature enables DBMs to better represent the complex dependencies [121,122] between different activities and their corresponding sensor readings in HAR tasks. By leveraging their ability to learn high-level features from raw sensor data, DBMs can improve the accuracy of activity classification and enhance the interpretability of the underlying patterns, ultimately facilitating more robust HAR systems. ...

Multi-prediction deep Boltzmann machines
  • Citing Article
  • January 2013

Advances in Neural Information Processing Systems

... Content-based psycholinguistic features have also been studied to predict social media messages, and supervised machine learning has been proposed to overcome the limitations of human and computer coding procedures [44]. Although facial expression recognition has improved over the years, it still remains challenging due to the subtle and variable nature of facial expressions [45]. Effective feature extraction techniques such as Dlib-ml, which identifies key features of a face that contribute to generating emotions, have been proposed to overcome these difficulties [46]. ...

Disentangling Factors of Variation for Facial Expression Recognition
  • Citing Conference Paper
  • October 2012

Lecture Notes in Computer Science

... Of the total, 50 images (25 malignant and 25 benign) were used to train the neuronal network, and the remaining 10 were reserved to assess the accuracy of the tests. Five convolutional neural networks were used: AlexNet, LeNet5, LeNet5-Like, VGGNet16, and ZFNet, which have various applications such as facial emotion recognition [12], object recognition [13], and medical applications [14], which is the subject of this study. ...

Combining modality specific deep neural networks for emotion recognition in video
  • Citing Conference Paper
  • December 2013

... To evaluate the performance of quantum-inspired swarm optimization in addressing complex problem-solving, we consider the following datasets: a hierarchical text generation dataset that facilitates learning representations of dialogue messages [31], a dataset offering human-centered face similarity judgments designed for creating low-dimensional embeddings [32], a study on catastrophic forgetting in neural networks highlighting the utility of the dropout algorithm [33], the TWEETQA dataset for question answering focused on social media content [34], and an analysis of single-layer networks that underscores the importance of hidden nodes and feature extraction for high performance in unsupervised learning tasks [35]. ...

An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks