Mehdi Mirza's research while affiliated with Université de Montréal and other places

Publications (13)

Article
Humans learn a predictive model of the world and use this model to reason about future events and the consequences of actions. In contrast to most machine predictors, humans exhibit an impressive ability to generalize to unseen scenarios and reason intelligently in these settings. One important aspect of this ability is physical intuition (Lake et al.,...
Article
Full-text available
Theano is a Python library that allows one to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano is being...
Conference Paper
The task of the Emotion Recognition in the Wild (EmotiW) Challenge is to assign one of seven emotions to short video clips extracted from Hollywood-style movies. The videos depict acted-out emotions under realistic conditions with a large degree of variation in attributes such as pose and illumination, making it worthwhile to explore approaches whi...
Article
Full-text available
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximiz...
Article
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximiz...
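The adversarial framework described in the two abstracts above has a well-known analytical property: for a fixed generator, the optimal discriminator is D*(x) = p_data(x) / (p_data(x) + p_g(x)), and when the generator matches the data distribution the value of the minimax game is -log 4. The sketch below illustrates this numerically for two identical Gaussians; the function names (`gaussian_pdf`, `value_at_optimal_d`) and the integration bounds are illustrative choices, not from the paper.

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of a 1D Gaussian, used as a stand-in for p_data and p_g."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def value_at_optimal_d(p_data, p_g, lo=-8.0, hi=8.0, n=4000):
    """Numerically integrate V(D*, G) = E_pdata[log D*] + E_pg[log(1 - D*)],
    where D*(x) = p_data(x) / (p_data(x) + p_g(x)) is the optimal discriminator."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        pd, pg = p_data(x), p_g(x)
        d_star = pd / (pd + pg)
        total += (pd * math.log(d_star) + pg * math.log(1.0 - d_star)) * dx
    return total

# When p_g == p_data, D* is 1/2 everywhere and the value is -log 4 (the global optimum).
v = value_at_optimal_d(gaussian_pdf, gaussian_pdf)
```

This is the quantity the generator's training procedure drives toward: G improves until D can do no better than chance.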
Article
Full-text available
Catastrophic forgetting is a problem faced by many machine learning models and algorithms. When trained on one task, then trained on a second task, many machine learning models "forget" how to perform the first task. This is widely believed to be a serious problem for neural networks. Here, we investigate the extent to which the catastrophic forge...
Conference Paper
In this paper we present the techniques used for the University of Montréal's team submissions to the 2013 Emotion Recognition in the Wild Challenge. The challenge is to classify the emotions expressed by the primary human subject in short video clips extracted from feature length movies. This involves the analysis of video clips of acted scenes la...
Article
Full-text available
Pylearn2 is a machine learning research library. This does not just mean that it is a collection of machine learning algorithms that share a common API; it means that it has been designed for flexibility and extensibility in order to facilitate research projects that involve new or unusual use cases. In this paper we give a brief history of the lib...
Article
Full-text available
The ICML 2013 Workshop on Challenges in Representation Learning (http://deeplearning.net/icml2013-workshop-competition) focused on three challenges: the black box learning challenge, the facial expression recognition challenge, and the multimodal learning challenge. We describe the datasets created for these challenges and summarize the results o...
Article
Full-text available
We consider the problem of designing models to leverage a recently introduced approximate model averaging technique called dropout. We define a simple new model called maxout (so named because its output is the max of a set of inputs, and because it is a natural companion to dropout) designed to both facilitate optimization by dropout and improve t...
Conference Paper
We consider the problem of designing models to leverage a recently introduced approximate model averaging technique called dropout. We define a simple new model called maxout (so named because its output is the max of a set of inputs, and because it is a natural companion to dropout) designed to both facilitate optimization by dropout and improve t...
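The two maxout abstracts above describe the unit directly: its output is the maximum over a set of affine functions of the input. A minimal pure-Python sketch (weight values are made up for the demo; the function name `maxout_unit` is illustrative):

```python
def maxout_unit(x, weights, biases):
    """Maxout hidden unit: h(x) = max_j (x . W_j + b_j).

    x is a list of input features; weights is a list of k weight vectors
    (one per affine piece); biases is the matching list of k scalars.
    """
    return max(
        sum(xi * wi for xi, wi in zip(x, w)) + b
        for w, b in zip(weights, biases)
    )

# Two pieces with weights (1,) and (-1,) and zero biases recover |x|:
abs_of_neg2 = maxout_unit([-2.0], [[1.0], [-1.0]], [0.0, 0.0])  # 2.0
# Pieces (1,) and (0,) recover ReLU, showing maxout generalizes it:
relu_of_neg15 = maxout_unit([-1.5], [[1.0], [0.0]], [0.0, 0.0])  # 0.0
```

Because the unit is a max over linear pieces, it is piecewise linear and convex in its input, which is part of what makes it a natural companion to dropout's approximate model averaging.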
Article
We introduce the multi-prediction deep Boltzmann machine (MP-DBM). The MP-DBM can be seen as a single probabilistic model trained to maximize a variational approximation to the generalized pseudolikelihood, or as a family of recurrent nets that share parameters and approximately solve different inference problems. Prior methods of training DBMs eith...
Conference Paper
We propose a semi-supervised approach to solve the task of emotion recognition in 2D face images using recent ideas in deep learning for handling the factors of variation present in data. An emotion classification algorithm should be robust both to (1) remaining variations due to the pose of the face in the image after centering and alignment, and (2)...

Citations

... Therefore, Biswas et al. [41] used a smooth function to approximate the |x| function. They found a general approximation formula for the maximum function from the smooth approximation of |x|, which can smoothly approximate the general maxout [42] family, ReLU, leaky ReLU, and variants such as Swish. In addition, the authors also prove that the GELU function is a special case of the SMU. ...
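The construction in the snippet above rests on the identity max(a, b) = (a + b + |a - b|) / 2: any smooth surrogate for |x| immediately yields a smooth maximum. The sketch below uses the generic surrogate sqrt(x² + ε²) purely for illustration; this is not the exact SMU formula from [41], and the names `smooth_abs` and `smooth_max` are my own.

```python
import math

def smooth_abs(x, eps=1e-3):
    """Smooth surrogate for |x| (illustrative choice, not the SMU formula)."""
    return math.sqrt(x * x + eps * eps)

def smooth_max(a, b, eps=1e-3):
    """Smooth maximum via max(a, b) = (a + b + |a - b|) / 2."""
    return (a + b + smooth_abs(a - b, eps)) / 2.0

# With a = x and b = 0 this smoothly approximates ReLU(x) = max(x, 0);
# shrinking eps tightens the approximation everywhere.
near_three = smooth_max(1.0, 3.0)   # ~3.0
near_zero = smooth_max(-2.0, 0.0)   # ~0.0
```

This is the same trick the snippet attributes to [41]: substituting different smooth |x| surrogates reproduces different activation families.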
... A. Ferchichi et al. To deal with the challenges mentioned above, we explored one of the most significant achievements in DL: Generative Adversarial Networks (GANs) (Goodfellow et al., 2014), which can implicitly learn rich distributions over ST data and work with multi-modal outputs (Gao et al., 2022). The GAN framework was recently presented as a way to create generative DL models using adversarial training. ...
... processing videos is more memory- and compute-intensive. One simple solution is to apply a CNN directly to individual frames for either prediction or feature extraction and aggregate the network outputs along the temporal dimension to obtain a video-level classification [19,20]. However, handling sequential data is not natural for CNNs, and frame-level aggregation cannot explicitly exploit the temporal dependency among frames in the video. ...
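The frame-level aggregation described in the snippet above amounts to mean-pooling per-frame classifier outputs over time. A minimal sketch, where `frame_scores` stands in for a CNN's per-frame class probabilities (the values and the function name are hypothetical):

```python
def aggregate_video(frame_scores):
    """Mean-pool per-frame class scores over the temporal dimension.

    frame_scores: list of per-frame score lists, one entry per class.
    Returns a single video-level score list.
    """
    n_frames = len(frame_scores)
    n_classes = len(frame_scores[0])
    return [sum(f[c] for f in frame_scores) / n_frames for c in range(n_classes)]

clip = [[0.7, 0.3], [0.6, 0.4], [0.8, 0.2]]  # 3 frames, 2 classes
video_pred = aggregate_video(clip)  # ~ [0.7, 0.3]
```

As the snippet notes, this pooling discards frame order entirely, which is exactly why it cannot model temporal dependencies between frames.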
... They use CNN based image classifiers taking as input an image of a block tower and returning a probability for the tower to fall. Lerer et al. (2016); Mirza et al. (2017) also include a decoding module to predict final positions of these blocks. Groth et al. (2018) investigate the ability of such a model to actively position shapes in stable tower configurations. ...
... DLTS was implemented in Python 3 using Keras 1.1.0 (Ketkar 2017) with Theano 0.8.2 (Al-Rfou et al. 2016) as the backend for the implementation of the deep neural networks. While the neural networks were trained using six cores for a few days, the results reported in the tables use a single core. ...
... DBMs are known to fail to train from random initialization, which is called the joint training problem (Goodfellow et al., 2016), and a widely known solution to this is greedy layer-wise pretraining. Although there have been attempts to train without pretraining (Montavon & Müller, 2012; Goodfellow et al., 2013), it is still difficult to show good generative performance. Our method achieves high generative performance through end-to-end training without pretraining and thus can be considered a potential solution to the joint training problem. ...
... (1) Generative Adversarial Networks (GANs) (Goodfellow et al., 2020) consist of two competing neural networks: a generator creating realistic data samples and a discriminator distinguishing between real and generated samples (Pan et al., 2019). Both networks are trained in tandem, resulting in an adversarial competition in which the data generation capability improves over time (Janiesch et al., 2021). ...
... CV researchers use the term "disentanglement" to illustrate the extraction of object features, such as shape, appearance, pose, or specific parts of the object, from the image data [46]. Salah Rifai and his colleague computer scientists [47] stated, "A central challenge in CV is to disentangle the various factors of variation that explain an image, such as object pose, identity, or various other attributes" (p. 808). ...
... The background information is removed during preprocessing. The next step is to normalise the face image to 227×227 pixels [19][20]. Figure 3 shows different samples after image pre-processing. ...
... However, current SSL methods mainly optimize global-level objectives for encoders, and remain suboptimal for downstream tasks [20], [23], [24]. Besides, catastrophic forgetting [25], [26] can occur in two-stage training, where model generality may be lost when fine-tuning the downstream task on a few labeled images. This motivates us to learn better semantically sensitive local representations for both the encoder and decoder in a semi-supervised setting. ...