Markus Gross

  • ETH Zurich

About

487
Publications
146,572
Reads
30,070
Citations
Current institution
ETH Zurich
Additional affiliations
March 1998 - present
Disney Research
Position
  • Managing Director
July 1994 - present
ETH Zurich
Position
  • Founder and Director

Publications

Article
Full-text available
Recent progress in physics‐based character control has made it possible to learn policies from unstructured motion data. However, it remains challenging to train a single control policy that works with diverse and unseen motions, and can be deployed to real‐world physical robots. In this paper, we propose a two‐stage technique that enables the cont...
Article
Large language models such as GPT-3 and ChatGPT can mimic human-to-human conversation with unprecedented fidelity, which enables many applications such as conversational agents for education and non-player characters in video games. In this work, we investigate the underlying personality structure that a GPT-3-based chatbot expresses during convers...
Chapter
We present a novel pipeline that takes smartphone videos of the intraoral region of newborn cleft patients as input and produces a 3D mesh. The mesh can be used to facilitate the plate treatment of the cleft and support surgery planning. A retrained LoFTR-based method creates an initial sparse point cloud. Next, we utilize our collection of existin...
Article
As governed by personality trait theory, humans tackle problems differently depending on their long-term behavioral characteristics. Computational awareness of personality traits fuels affective computing research, which investigates how to reliably recognize and utilize personality traits. Applications are diverse, including therapy monitoring, le...
Article
Simulation representations of robots have advanced in recent years. Yet, there remain significant sim-to-real gaps because of modeling assumptions and hard-to-model behaviors such as friction. In this paper, we propose to augment common simulation representations with a transformer-inspired architecture, by training a network to predict the true st...
Article
Full-text available
Purpose: Presurgical orthopedic plates are widely used for the treatment of cleft lip and palate, which is the most common craniofacial birth defect. For the traditional plate fabrication, an impression is taken under airway-endangering conditions, which recent digital alternatives overcome via intraoral scanners. However, these alternatives deman...
Preprint
Face recognition models embed a face image into a low-dimensional identity vector containing abstract encodings of identity-specific facial features that allow individuals to be distinguished from one another. We tackle the challenging task of inverting the latent space of pre-trained face recognition models without full model access (i.e. black-bo...
Article
Full-text available
We propose a 3D+time framework for modeling dynamic sequences of 3D facial shapes, representing realistic non‐rigid motion during a performance. Our work extends neural 3D morphable models by learning a motion manifold using a transformer architecture. More specifically, we derive a novel transformer‐based autoencoder that can model and synthesize...
Chapter
Full-text available
Augmented reality (AR) technologies can enhance the user’s experience of visiting attractions, shops, and restaurants by using AR-based virtual elements and additional information about the places they are visiting. In this work, we transform the city landscape or iconic buildings into a unique experience by bringing iconic characters onto the buil...
Chapter
Full-text available
Recent advances in augmented reality have enabled new ways of generating and presenting item recommendations. In tourism, AR applications can, for example, enhance points of interest (POIs) with virtual elements in AR and provide tourists with personalized recommendations for places to visit. In this paper, we present our prototype: a touristic AR...
Article
Photorealistic digital re-aging of faces in video is becoming increasingly common in entertainment and advertising. But the predominant 2D painting workflow often requires frame-by-frame manual work that can take days to accomplish, even by skilled artists. Although research on facial image re-aging has attempted to automate and solve this problem,...
Conference Paper
We propose a 3D+time framework for modeling dynamic sequences of 3D facial shapes, representing realistic non-rigid motion during a performance. Our work extends neural 3D morphable models by learning a motion manifold using a transformer architecture. More specifically, we derive a novel transformer-based autoencoder that can model and synthesize...
Article
Full-text available
We propose a method for constructing feature sets that significantly improve the quality of neural denoisers for Monte Carlo renderings with volumetric content. Starting from a large set of hand‐crafted features, we propose a feature selection process to identify significantly pruned near‐optimal subsets. While a naive approach would require traini...
Article
Active soft bodies can affect their shape through an internal actuation mechanism that induces a deformation. Similar to recent work, this paper utilizes a differentiable, quasi-static, and physics-based simulation layer to optimize for actuation signals parameterized by neural networks. Our key contribution is a general and implicit formulation to...
Article
Generating realistic facial animation for CG characters and digital doubles is one of the hardest tasks in animation. A typical production workflow involves capturing the performance of a real actor using mo-cap technology, and transferring the captured motion to the target digital character. This process, known as retargeting, has been used for o...
Conference Paper
Full-text available
Knowledge of users’ affective states can improve their interaction with smartphones by providing more personalized experiences (e.g., search results and news articles). We present an affective state classification model based on data gathered on smartphones in real-world environments. From touch events during keystrokes and the signals from the ine...
Conference Paper
Parametric 3D shape models (e.g., for faces) are heavily utilized in computer graphics and vision applications to provide priors on the observed variability of an object’s geometry. Original models were linear and operated on the entire shape at once. They were later enhanced to provide localized control on different shape parts separately. In deep...
Preprint
Recently, significant progress has been made in learned image and video compression. In particular, the usage of Generative Adversarial Networks has led to impressive results in the low bit rate regime. However, the model size remains an important issue in current state-of-the-art proposals and existing solutions require significant computation eff...
Conference Paper
For several decades, researchers have been advancing techniques for creating and rendering 3D digital faces, where a lot of the effort has gone into geometry and appearance capture, modeling and rendering techniques. This body of research work has largely focused on facial skin, with much less attention devoted to peripheral components like hair, e...
Article
The number of online news articles available nowadays is rapidly increasing. When exploring articles on online news portals, navigation is mostly limited to the most recent ones. The spatial context and the history of topics are not immediately accessible. To support readers in the exploration or research of articles in large datasets, we developed...
Article
We present Heapcraft: an open-source suite of tools for monitoring and improving collaboration in Minecraft. At the core of our system is a data collection and analysis framework for recording gameplay. We collected over 3451 player-hours of game behavior from 908 different players, and performed a general study of online collaboration. To make our...
Article
In order to use computational intelligence for automated narrative synthesis, domain knowledge of the story world must be defined, a task which is currently confined to experts. This paper discusses the benefits and tradeoffs between agent-centric and event-centric approaches towards authoring the domain knowledge of story worlds. In an effort to d...
Conference Paper
Style transfer between images is an artistic application of CNNs, where the 'style' of one image is transferred onto another image while preserving the latter's content. The state of the art in neural style transfer is based on Adaptive Instance Normalization (AdaIN), a technique that transfers the statistical properties of style features to a cont...
Conference Paper
We present a new method for designing high quality denoisers that are robust to varying noise characteristics of input images. Instead of taking a conventional blind denoising approach or relying on explicit noise parameter estimation networks as well as invertible camera imaging pipeline models, we propose a two-stage model that first processes an...
Preprint
Full-text available
DuctTake is a system designed to enable practical compositing of multiple takes of a scene into a single video. Current industry solutions are based around object segmentation, a hard problem that requires extensive manual input and cleanup, making compositing an expensive part of the film-making process. Our method instead composites shots togethe...
Conference Paper
Face models built from 3D face databases are often used in computer vision and graphics tasks such as face reconstruction, replacement, tracking and manipulation. For such tasks, commonly used multi-linear morphable models, which provide semantic control over facial identity and expression, often lack quality and expressivity due to their linear na...
Preprint
Image restoration has seen great progress in the last years thanks to the advances in deep neural networks. Most of these existing techniques are trained using full supervision with suitable image pairs to tackle a specific degradation. However, in a blind setting with unknown degradations this is not possible and a good prior remains crucial. Rece...
Preprint
Full-text available
Deep learning based image compression has recently witnessed exciting progress and in some cases even managed to surpass transform coding based approaches that have been established and refined over many decades. However, state-of-the-art solutions for deep image compression typically employ autoencoders which map the input to a lower dimensional l...
Preprint
Full-text available
Understanding video content and generating captions with context is an important and challenging task. Unlike prior methods that typically attempt to generate generic video captions without context, our architecture contextualizes captioning by infusing extracted information from relevant text data. We propose an end-to-end sequence-to-sequence mode...
Article
Full-text available
Dynamical systems are commonly used to describe the state of time‐dependent systems. In many engineering and control problems, the state space is high‐dimensional making it difficult to analyze and visualize the behavior of the system for varying input conditions. We present a novel dimensionality reduction technique that is tailored to high‐dimens...
Article
Performance capture of expressive subjects, particularly facial performances acquired with high spatial resolution, will inevitably incorporate some fraction of motion that is due to inertial effects and dynamic overshoot due to ballistic motion. This is true in most natural capture environments where the actor is able to move freely during their p...
Article
Artistically controlling the shape, motion and appearance of fluid simulations poses major challenges in visual effects production. In this paper, we present a neural style transfer approach from images to 3D fluids formulated in a Lagrangian viewpoint. Using particles for style transfer has unique benefits compared to grid-based techniques. Attribu...
Conference Paper
Full-text available
Decades of research in psychology on the formal measurement of emotions led to the concept of affective states. Visualizing the measured affective state can be useful in education, as it allows teachers to adapt lessons based on the affective state of students. In the entertainment industry, game mechanics can be adapted based on the boredom and fr...
Conference Paper
Full-text available
Front camera data from tablets used in educational settings offer valuable clues to student behavior, attention, and affective state. Due to the camera's angle of view, the face of the student is partially occluded and skewed. This hinders the ability of experts to adequately capture the learning process and student states. In this paper, we prese...
Preprint
Structured pruning is a well-known technique to reduce the storage size and inference cost of neural networks. The usual pruning pipeline consists of ranking the network internal filters and activations with respect to their contributions to the network performance, removing the units with the lowest contribution, and fine-tuning the network to red...
Conference Paper
Facial landmark detection is a fundamental task for many consumer and high-end applications and is almost entirely solved by machine learning methods today. Existing datasets used to train such algorithms are primarily made up of only low resolution images, and current algorithms are limited to inputs of comparable quality and resolution as the tra...
Conference Paper
Full-text available
Gaining awareness of the user's affective states enables smartphones to support enriched interactions that are sensitive to the user's context. To accomplish this on smartphones, we propose a system that analyzes the user's text typing behavior using a semi-supervised deep learning pipeline for predicting affective states measured by valence, arous...
Preprint
Artistically controlling the shape, motion and appearance of fluid simulations poses major challenges in visual effects production. In this paper, we present a neural style transfer approach from images to 3D fluids formulated in a Lagrangian viewpoint. Using particles for style transfer has unique benefits compared to grid-based techniques. Attribu...
Article
Artistically controlling fluids has always been a challenging task. Optimization techniques rely on approximating simulation states towards target velocity or density field configurations, which are often handcrafted by artists to indirectly control smoke dynamics. Patch synthesis techniques transfer image textures or simulation features to a targe...
Article
We propose to use deep neural networks for generating samples in Monte Carlo integration. Our work is based on non-linear independent components estimation (NICE), which we extend in numerous ways to improve performance and enable its application to integration problems. First, we introduce piecewise-polynomial coupling transforms that greatly incr...
Article
Corporate meetings are a crucial part of business activities. While numerous academic papers investigated how to make the scheduling process of meetings faster or even automatic, little work has been done yet to facilitate the retrospective reasoning about how time is spent on meetings. Traditional calendar applications do not allow users to extrac...
Chapter
The problem of explaining complex machine learning models, including Deep Neural Networks, has gained increasing attention over the last few years. While several methods have been proposed to explain network predictions, the definition itself of explanation is still debated. Moreover, only a few attempts to compare explanation methods from a theore...
Article
Statement of problem: Three-dimensional visualization for pretreatment diagnostics and treatment planning is necessary for surgical and prosthetic rehabilitations. The reliability of a novel 3D facial camera is unclear. Purpose: The purpose of this clinical study was to evaluate the reliability of a novel medical facial camera system in capturin...
Preprint
Facial animation is one of the most challenging problems in computer graphics, and it is often solved using linear heuristics like blend-shape rigging. More expressive approaches like physical simulation have emerged, but these methods are very difficult to tune, especially when simulating a real actor's face. We propose to use a simple finite elem...
Article
We present the first method to accurately track the invisible jaw based solely on the visible skin surface, without the need for any markers or augmentation of the actor. As such, the method can readily be integrated with off-the-shelf facial performance capture systems. The core idea is to learn a non-linear mapping from the skin deformation to th...
Conference Paper
Full-text available
The role of affective states in learning has recently attracted considerable attention in education research. The accurate prediction of affective states can help increase the learning gain by incorporating targeted interventions that are capable of adjusting to changes in the individual affective states of students. Until recently, most work on th...
Article
Full-text available
This paper presents a novel generative model to synthesize fluid simulations from a set of reduced parameters. A convolutional neural network is trained on a collection of discrete, parameterizable fluid simulation velocity fields. Due to the capability of deep learning architectures to learn representative features of the data, our generative mode...
Preprint
Full-text available
Artistically controlling fluids has always been a challenging task. Optimization techniques rely on approximating simulation states towards target velocity or density field configurations, which are often handcrafted by artists to indirectly control smoke dynamics. Patch synthesis techniques transfer image textures or simulation features to a targe...
Article
Statement of problem. Three-dimensional visualization for pretreatment diagnostics and treatment planning is necessary for surgical and prosthetic rehabilitations. The reliability of a novel 3D facial camera is unclear. Purpose. The purpose of this clinical study was to evaluate the reliability of a novel medical facial camera system...
Article
Over the past decades, scientific visualization became a fundamental aspect of modern scientific data analysis. Across all data-intensive research fields, ranging from structural biology to cosmology, data sizes increase rapidly. Dealing with the growing large scale data is one of the top research challenges of this century. For the visual explorat...
Preprint
Full-text available
The problem of explaining the behavior of deep neural networks has gained a lot of attention over the last years. While several attribution methods have been proposed, most come without strong theoretical foundations. This raises the question of whether the resulting attributions are reliable. On the other hand, the literature on cooperative game t...
Preprint
We propose an end-to-end approach for phrase grounding in images. Unlike prior methods that typically attempt to ground each phrase independently by building an image-text embedding, our architecture formulates grounding of multiple phrases as a sequential and contextual process. Specifically, we encode region proposals and all phrases into two sta...
Technical Report
Full-text available
Facial landmark points capture rigid and non-rigid deformation of faces in a very compact description and are therefore valuable for many different face analysis tasks. For face recognition or different categorisation tasks such as gender, age, ethnicity or expressions, a rough pose normalisation is needed in order to apply other algorithms. For 3...
Preprint
We present a deep generative model that learns disentangled static and dynamic representations of data from unordered input. Our approach exploits regularities in sequential data that exist regardless of the order in which the data is viewed. The result of our factorized graphical model is a well-organized and coherent latent space for data dynamic...
Conference Paper
Multi-variate visualizations of geospatial data often use combinations of different visual cues, such as color and texture. For textures, different point distributions (blue noise, regular grids, etc.) can encode nominal data. In this paper, we study the suitability of point distribution interpolation to encode quantitative information. For the int...
Preprint
We propose to use deep neural networks for generating samples in Monte Carlo integration. Our work is based on non-linear independent component analysis, which we extend in numerous ways to improve performance and enable its application to integration problems. First, we introduce piecewise-polynomial coupling transforms that greatly increase the m...
Preprint
Traditional approaches for color propagation in videos rely on some form of matching between consecutive video frames. Using appearance descriptors, colors are then propagated both spatially and temporally. These methods, however, are computationally expensive and do not take advantage of semantic information of the scene. In this work we propose a...
Article
Statement of problem. An evaluation of user satisfaction and image quality of a novel handheld purpose-built mobile camera system for 3-dimensional (3D) facial acquisition is lacking. Purpose. The purpose of this pilot clinical study was to assess and compare the effectiveness between a handheld mobile camera system designed for facial acquisition...
Preprint
Full-text available
This paper presents a novel generative model to synthesize fluid simulations from a set of reduced parameters. A convolutional neural network is trained on a collection of discrete, parameterizable fluid simulation velocity fields. Due to the capability of deep learning architectures to learn representative features of the data, our generative mode...
Article
We present a framework to manage cerebral aneurysms. Rupture risk evaluation is based on manually extracted descriptors, which is time-consuming. Thus, we provide an automatic solution by considering several questions: How can expert knowledge be integrated? How should meta data be defined? Which interaction techniques are needed for data explorati...
Article
With the widespread use of 3D acquisition devices, there is an increasing need of consolidating captured noisy and sparse point cloud data for accurate representation of the underlying structures. There are numerous algorithms that rely on a variety of assumptions such as local smoothness to tackle this ill‐posed problem. However, such priors lead...
Article
In this work, we present a method to vectorize raster images of line art. Inverting the rasterization procedure is inherently ill‐conditioned, as there exist many possible vector images that could yield the same raster image. However, not all of these vector images are equally useful to the user, especially if performing further edits is desired. W...
Article
Most approaches for video frame interpolation require accurate dense correspondences to synthesize an in-between frame. Therefore, they do not perform well in challenging scenarios with e.g. lighting changes or motion blur. Recent deep learning approaches that rely on kernels to represent motion can only alleviate these problems to some extent. In...
Conference Paper
Full-text available
The alignment of heterogeneous sequential data (video to text) is an important and challenging problem. Standard techniques for such alignment, including Dynamic Time Warping (DTW) and Conditional Random Fields (CRFs), suffer from inherent drawbacks. Mainly, the Markov assumption implies that, given the immediate past, future alignment decisions ar...
Article
Full-text available
Understanding the flow of information in Deep Neural Networks (DNNs) is a challenging problem that has gained increasing attention over the last few years. While several methods have been proposed to explain network predictions, there have been only a few attempts to compare them from a theoretical perspective. What is more, no exhaustive empirical c...
Article
Full-text available
Understanding the flow of information in Deep Neural Networks is a challenging problem that has gained increasing attention over the last few years. While several methods have been proposed to explain network predictions, only a few attempts to analyze them from a theoretical perspective have been made in the past. In this work we analyze various state...
Conference Paper
PhysicsForests is a fluid-simulation framework that can simulate large scenes with several million particles in real time, including surface generation, foam, rigid-body coupling, and rendering. Instead of solving underlying Navier-Stokes equations, it predicts particle behavior approximately using machine-learning-based regression-forest method th...
Conference Paper
Despite the maturity in solutions for animating expressive virtual characters, innovations realizing the creative intent of story writers have yet to make the same strides. This problem is further exacerbated for interactive narrative content, such as games. The key challenge is to provide an accessible, yet expressive interface for story authoring...
Conference Paper
Painterly stylization is the cornerstone of non-photorealistic rendering. Inspired by the versatility of paint as a physical medium, existing methods target intuitive interfaces that mimic physical brushes, providing artists the ability to intuitively place paint strokes in a digital scene. Other work focuses on physical simulation of the interacti...
Conference Paper
In this paper we present an optimization-based approach for the design of cable-driven kinematic chains and trees. Our system takes as input a hierarchical assembly consisting of rigid links jointed together with hinges. The user also specifies a set of target poses or keyframes using inverse kinematics. Our approach places torsional springs at the...
Article
We present a computational tool for designing compliant mechanisms. Our method takes as input a conventional, rigidly-articulated mechanism defining the topology of the compliant design. This input can be either planar or spatial, and we support a number of common joint types which, whenever possible, are automatically replaced with parameterized fle...
Article
We present a robust, unbiased technique for intelligent light-path construction in path-tracing algorithms. Inspired by existing path-guiding algorithms, our method learns an approximate representation of the scene's spatio-directional radiance field in an unbiased and iterative manner. To that end, we propose an adaptive spatio-directional hybrid...
Conference Paper
The strong interest children show for mobile robots makes these devices potentially powerful to teach programming. Moreover, the tangibility of physical objects and the sociability of interacting with them are added benefits. A key skill that novices in programming have to acquire is the ability to mentally trace program execution. However, because...
Conference Paper
We show how the novel use of a semantic representation based on Osgood’s semantic differential scales can lead to effective features in predicting short- and long-term learning in students using a vocabulary learning system. Previous studies in students’ intermediate knowledge states during vocabulary acquisition did not provide much information on...
Article
Intelligent tutoring systems adapt the curriculum to the needs of the individual student. Therefore, an accurate representation and prediction of student knowledge is essential. Bayesian Knowledge Tracing (BKT) is a popular approach for student modeling. The structure of BKT models, however, makes it impossible to represent the hierarchy and relati...
Article
Full-text available
Digital restoration of film content that has historical value is crucial for the preservation of cultural heritage. Also, digital restoration is not only a relevant application area of various video processing technologies that have been developed in computer graphics literature but also involves a multitude of unresolved research challenges. Curre...
Article
Corrective lenses introduce distortion caused by the refraction effect, which changes the wearer’s appearance. To give users a more realistic experience, a virtual try-on system for prescription eyeglasses modifies an input video and virtually inserts prescription eyeglasses, producing an output similar to a virtual mirror. The proposed system genera...
Conference Paper
Most head-mounted displays for Virtual Reality (VR) are designed for users with perfect eyesight. Wearing prescription eyeglasses inside such a headset can be uncomfortable, or even impossible if the glasses do not fit (Fig. 1, left). While some headsets offer manual focus adjustment, they need to be manually adjusted for each user through trial an...
