Markus Gross

ETH Zurich | ETH Zürich · Department of Computer Science

About

467
Publications
110,418
Reads
24,826
Citations
Citations since 2016
103 Research Items
11,363 Citations
Additional affiliations
March 1998 - present
Disney Research
Position
  • Managing Director
July 1994 - present
ETH Zurich
Position
  • Founder and Director

Publications

Conference Paper
We propose a 3D+time framework for modeling dynamic sequences of 3D facial shapes, representing realistic non-rigid motion during a performance. Our work extends neural 3D morphable models by learning a motion manifold using a transformer architecture. More specifically, we derive a novel transformer-based autoencoder that can model and synthesize...
Article
Active soft bodies can affect their shape through an internal actuation mechanism that induces a deformation. Similar to recent work, this paper utilizes a differentiable, quasi-static, and physics-based simulation layer to optimize for actuation signals parameterized by neural networks. Our key contribution is a general and implicit formulation to...
Article
Generating realistic facial animation for CG characters and digital doubles is one of the hardest tasks in animation. A typical production workflow involves capturing the performance of a real actor using mo-cap technology, and transferring the captured motion to the target digital character. This process, known as retargeting, has been used for o...
Article
We propose a method for constructing feature sets that significantly improve the quality of neural denoisers for Monte Carlo renderings with volumetric content. Starting from a large set of hand‐crafted features, we propose a feature selection process to identify significantly pruned near‐optimal subsets. While a naive approach would require traini...
Conference Paper
Knowledge of users’ affective states can improve their interaction with smartphones by providing more personalized experiences (e.g., search results and news articles). We present an affective state classification model based on data gathered on smartphones in real-world environments. From touch events during keystrokes and the signals from the ine...
Conference Paper
Parametric 3D shape models (e.g., for faces) are heavily utilized in computer graphics and vision applications to provide priors on the observed variability of an object’s geometry. Original models were linear and operated on the entire shape at once. They were later enhanced to provide localized control on different shape parts separately. In deep...
Preprint
Recently, significant progress has been made in learned image and video compression. In particular, the use of Generative Adversarial Networks has led to impressive results in the low bit rate regime. However, the model size remains an important issue in current state-of-the-art proposals and existing solutions require significant computation eff...
Conference Paper
For several decades, researchers have been advancing techniques for creating and rendering 3D digital faces, where a lot of the effort has gone into geometry and appearance capture, modeling and rendering techniques. This body of research work has largely focused on facial skin, with much less attention devoted to peripheral components like hair, e...
Article
The number of online news articles available nowadays is rapidly increasing. When exploring articles on online news portals, navigation is mostly limited to the most recent ones. The spatial context and the history of topics are not immediately accessible. To support readers in the exploration or research of articles in large datasets, we developed...
Article
We present Heapcraft: an open-source suite of tools for monitoring and improving collaboration in Minecraft. At the core of our system is a data collection and analysis framework for recording gameplay. We collected over 3451 player-hours of game behavior from 908 different players, and performed a general study of online collaboration. To make our...
Article
In order to use computational intelligence for automated narrative synthesis, domain knowledge of the story world must be defined, a task which is currently confined to experts. This paper discusses the benefits and tradeoffs between agent-centric and event-centric approaches towards authoring the domain knowledge of story worlds. In an effort to d...
Conference Paper
Style transfer between images is an artistic application of CNNs, where the 'style' of one image is transferred onto another image while preserving the latter's content. The state of the art in neural style transfer is based on Adaptive Instance Normalization (AdaIN), a technique that transfers the statistical properties of style features to a cont...
Conference Paper
We present a new method for designing high quality denoisers that are robust to varying noise characteristics of input images. Instead of taking a conventional blind denoising approach or relying on explicit noise parameter estimation networks as well as invertible camera imaging pipeline models, we propose a two-stage model that first processes an...
Preprint
DuctTake is a system designed to enable practical compositing of multiple takes of a scene into a single video. Current industry solutions are based around object segmentation, a hard problem that requires extensive manual input and cleanup, making compositing an expensive part of the film-making process. Our method instead composites shots togethe...
Conference Paper
Face models built from 3D face databases are often used in computer vision and graphics tasks such as face reconstruction, replacement, tracking and manipulation. For such tasks, commonly used multi-linear morphable models, which provide semantic control over facial identity and expression, often lack quality and expressivity due to their linear na...
Preprint
Image restoration has seen great progress in recent years thanks to advances in deep neural networks. Most of these existing techniques are trained using full supervision with suitable image pairs to tackle a specific degradation. However, in a blind setting with unknown degradations this is not possible and a good prior remains crucial. Rece...
Preprint
Deep learning based image compression has recently witnessed exciting progress and in some cases even managed to surpass transform coding based approaches that have been established and refined over many decades. However, state-of-the-art solutions for deep image compression typically employ autoencoders which map the input to a lower dimensional l...
Preprint
Understanding video content and generating captions with context is an important and challenging task. Unlike prior methods that typically attempt to generate generic video captions without context, our architecture contextualizes captioning by infusing extracted information from relevant text data. We propose an end-to-end sequence-to-sequence mode...
Article
Performance capture of expressive subjects, particularly facial performances acquired with high spatial resolution, will inevitably incorporate some fraction of motion that is due to inertial effects and dynamic overshoot due to ballistic motion. This is true in most natural capture environments where the actor is able to move freely during their p...
Article
Artistically controlling the shape, motion and appearance of fluid simulations poses major challenges in visual effects production. In this paper, we present a neural style transfer approach from images to 3D fluids formulated in a Lagrangian viewpoint. Using particles for style transfer has unique benefits compared to grid-based techniques. Attribu...
Conference Paper
Decades of research in psychology on the formal measurement of emotions led to the concept of affective states. Visualizing the measured affective state can be useful in education, as it allows teachers to adapt lessons based on the affective state of students. In the entertainment industry, game mechanics can be adapted based on the boredom and fr...
Conference Paper
Front camera data from tablets used in educational settings offer valuable clues to student behavior, attention, and affective state. Due to the camera's angle of view, the face of the student is partially occluded and skewed. This hinders the ability of experts to adequately capture the learning process and student states. In this paper, we prese...
Preprint
Structured pruning is a well-known technique to reduce the storage size and inference cost of neural networks. The usual pruning pipeline consists of ranking the network internal filters and activations with respect to their contributions to the network performance, removing the units with the lowest contribution, and fine-tuning the network to red...
Article
Dynamical systems are commonly used to describe the state of time‐dependent systems. In many engineering and control problems, the state space is high‐dimensional making it difficult to analyze and visualize the behavior of the system for varying input conditions. We present a novel dimensionality reduction technique that is tailored to high‐dimens...
Conference Paper
Facial landmark detection is a fundamental task for many consumer and high-end applications and is almost entirely solved by machine learning methods today. Existing datasets used to train such algorithms are primarily made up of only low resolution images, and current algorithms are limited to inputs of comparable quality and resolution as the tra...
Conference Paper
Gaining awareness of the user's affective states enables smartphones to support enriched interactions that are sensitive to the user's context. To accomplish this on smartphones, we propose a system that analyzes the user's text typing behavior using a semi-supervised deep learning pipeline for predicting affective states measured by valence, arous...
Preprint
Artistically controlling the shape, motion and appearance of fluid simulations poses major challenges in visual effects production. In this paper, we present a neural style transfer approach from images to 3D fluids formulated in a Lagrangian viewpoint. Using particles for style transfer has unique benefits compared to grid-based techniques. Attribu...
Article
Artistically controlling fluids has always been a challenging task. Optimization techniques rely on approximating simulation states towards target velocity or density field configurations, which are often handcrafted by artists to indirectly control smoke dynamics. Patch synthesis techniques transfer image textures or simulation features to a targe...
Article
We propose to use deep neural networks for generating samples in Monte Carlo integration. Our work is based on non-linear independent components estimation (NICE), which we extend in numerous ways to improve performance and enable its application to integration problems. First, we introduce piecewise-polynomial coupling transforms that greatly incr...
Article
Corporate meetings are a crucial part of business activities. While numerous academic papers investigated how to make the scheduling process of meetings faster or even automatic, little work has been done yet to facilitate the retrospective reasoning about how time is spent on meetings. Traditional calendar applications do not allow users to extrac...
Chapter
The problem of explaining complex machine learning models, including Deep Neural Networks, has gained increasing attention over the last few years. While several methods have been proposed to explain network predictions, the definition itself of explanation is still debated. Moreover, only a few attempts to compare explanation methods from a theore...
Article
Statement of problem: Three-dimensional visualization for pretreatment diagnostics and treatment planning is necessary for surgical and prosthetic rehabilitations. The reliability of a novel 3D facial camera is unclear. Purpose: The purpose of this clinical study was to evaluate the reliability of a novel medical facial camera system in capturin...
Preprint
Facial animation is one of the most challenging problems in computer graphics, and it is often solved using linear heuristics like blend-shape rigging. More expressive approaches like physical simulation have emerged, but these methods are very difficult to tune, especially when simulating a real actor's face. We propose to use a simple finite elem...
Article
We present the first method to accurately track the invisible jaw based solely on the visible skin surface, without the need for any markers or augmentation of the actor. As such, the method can readily be integrated with off-the-shelf facial performance capture systems. The core idea is to learn a non-linear mapping from the skin deformation to th...
Conference Paper
The role of affective states in learning has recently attracted considerable attention in education research. The accurate prediction of affective states can help increase the learning gain by incorporating targeted interventions that are capable of adjusting to changes in the individual affective states of students. Until recently, most work on th...
Preprint
Artistically controlling fluids has always been a challenging task. Optimization techniques rely on approximating simulation states towards target velocity or density field configurations, which are often handcrafted by artists to indirectly control smoke dynamics. Patch synthesis techniques transfer image textures or simulation features to a targe...
Article
This paper presents a novel generative model to synthesize fluid simulations from a set of reduced parameters. A convolutional neural network is trained on a collection of discrete, parameterizable fluid simulation velocity fields. Due to the capability of deep learning architectures to learn representative features of the data, our generative mode...
Article
Statement of problem. Three-dimensional visualization for pretreatment diagnostics and treatment planning is necessary for surgical and prosthetic rehabilitations. The reliability of a novel 3D facial camera is unclear. Purpose. The purpose of this clinical study was to evaluate the reliability of a novel medical facial camera system...
Article
Over the past decades, scientific visualization became a fundamental aspect of modern scientific data analysis. Across all data-intensive research fields, ranging from structural biology to cosmology, data sizes increase rapidly. Dealing with the growing large scale data is one of the top research challenges of this century. For the visual explorat...
Preprint
The problem of explaining the behavior of deep neural networks has gained a lot of attention over the last years. While several attribution methods have been proposed, most come without strong theoretical foundations. This raises the question of whether the resulting attributions are reliable. On the other hand, the literature on cooperative game t...
Preprint
We propose an end-to-end approach for phrase grounding in images. Unlike prior methods that typically attempt to ground each phrase independently by building an image-text embedding, our architecture formulates grounding of multiple phrases as a sequential and contextual process. Specifically, we encode region proposals and all phrases into two sta...
Technical Report
Facial landmark points capture rigid and non-rigid deformation of faces in a very compact description and are therefore valuable for many different face analysis tasks. For face recognition or different categorisation tasks such as gender, age, ethnicity or expressions, a rough pose normalisation is needed in order to apply other algorithms. For 3...
Preprint
We present a deep generative model that learns disentangled static and dynamic representations of data from unordered input. Our approach exploits regularities in sequential data that exist regardless of the order in which the data is viewed. The result of our factorized graphical model is a well-organized and coherent latent space for data dynamic...
Conference Paper
Multi-variate visualizations of geospatial data often use combinations of different visual cues, such as color and texture. For textures, different point distributions (blue noise, regular grids, etc.) can encode nominal data. In this paper, we study the suitability of point distribution interpolation to encode quantitative information. For the int...
Preprint
We propose to use deep neural networks for generating samples in Monte Carlo integration. Our work is based on non-linear independent component analysis, which we extend in numerous ways to improve performance and enable its application to integration problems. First, we introduce piecewise-polynomial coupling transforms that greatly increase the m...
Preprint
Traditional approaches for color propagation in videos rely on some form of matching between consecutive video frames. Using appearance descriptors, colors are then propagated both spatially and temporally. These methods, however, are computationally expensive and do not take advantage of semantic information of the scene. In this work we propose a...
Article
Statement of problem. An evaluation of user satisfaction and image quality of a novel handheld purpose-built mobile camera system for 3-dimensional (3D) facial acquisition is lacking. Purpose. The purpose of this pilot clinical study was to assess and compare the effectiveness between a handheld mobile camera system designed for facial acquisition...
Preprint
This paper presents a novel generative model to synthesize fluid simulations from a set of reduced parameters. A convolutional neural network is trained on a collection of discrete, parameterizable fluid simulation velocity fields. Due to the capability of deep learning architectures to learn representative features of the data, our generative mode...
Article
We present a framework to manage cerebral aneurysms. Rupture risk evaluation is based on manually extracted descriptors, which is time-consuming. Thus, we provide an automatic solution by considering several questions: How can expert knowledge be integrated? How should meta data be defined? Which interaction techniques are needed for data explorati...
Article
With the widespread use of 3D acquisition devices, there is an increasing need to consolidate captured noisy and sparse point cloud data for accurate representation of the underlying structures. There are numerous algorithms that rely on a variety of assumptions such as local smoothness to tackle this ill‐posed problem. However, such priors lead...
Article
In this work, we present a method to vectorize raster images of line art. Inverting the rasterization procedure is inherently ill‐conditioned, as there exist many possible vector images that could yield the same raster image. However, not all of these vector images are equally useful to the user, especially if performing further edits is desired. W...
Article
Most approaches for video frame interpolation require accurate dense correspondences to synthesize an in-between frame. Therefore, they do not perform well in challenging scenarios with e.g. lighting changes or motion blur. Recent deep learning approaches that rely on kernels to represent motion can only alleviate these problems to some extent. In...
Conference Paper
The alignment of heterogeneous sequential data (video to text) is an important and challenging problem. Standard techniques for such alignment, including Dynamic Time Warping (DTW) and Conditional Random Fields (CRFs), suffer from inherent drawbacks. Mainly, the Markov assumption implies that, given the immediate past, future alignment decisions ar...
Article
Understanding the flow of information in Deep Neural Networks (DNNs) is a challenging problem that has gained increasing attention over the last few years. While several methods have been proposed to explain network predictions, there have been only a few attempts to compare them from a theoretical perspective. What is more, no exhaustive empirical c...
Article
Understanding the flow of information in Deep Neural Networks is a challenging problem that has gained increasing attention over the last few years. While several methods have been proposed to explain network predictions, only a few attempts to analyze them from a theoretical perspective have been made in the past. In this work we analyze various state...
Conference Paper
PhysicsForests is a fluid-simulation framework that can simulate large scenes with several million particles in real time, including surface generation, foam, rigid-body coupling, and rendering. Instead of solving the underlying Navier-Stokes equations, it predicts particle behavior approximately using a machine-learning-based regression-forest method th...
Conference Paper
Despite the maturity in solutions for animating expressive virtual characters, innovations realizing the creative intent of story writers have yet to make the same strides. This problem is further exacerbated for interactive narrative content, such as games. The key challenge is to provide an accessible, yet expressive interface for story authoring...
Conference Paper
Painterly stylization is the cornerstone of non-photorealistic rendering. Inspired by the versatility of paint as a physical medium, existing methods target intuitive interfaces that mimic physical brushes, providing artists the ability to intuitively place paint strokes in a digital scene. Other work focuses on physical simulation of the interacti...
Conference Paper
In this paper we present an optimization-based approach for the design of cable-driven kinematic chains and trees. Our system takes as input a hierarchical assembly consisting of rigid links jointed together with hinges. The user also specifies a set of target poses or keyframes using inverse kinematics. Our approach places torsional springs at the...
Article
We present a computational tool for designing compliant mechanisms. Our method takes as input a conventional, rigidly-articulated mechanism defining the topology of the compliant design. This input can be either planar or spatial, and we support a number of common joint types which, whenever possible, are automatically replaced with parameterized fle...
Article
We present a robust, unbiased technique for intelligent light-path construction in path-tracing algorithms. Inspired by existing path-guiding algorithms, our method learns an approximate representation of the scene's spatio-directional radiance field in an unbiased and iterative manner. To that end, we propose an adaptive spatio-directional hybrid...