Hans-Peter Seidel’s research while affiliated with Max Planck Institute for Informatics and other places


Publications (218)


Surrogate‐NFP misalignment. The IMD's surrogate predicts that A and B are good matches (low surrogate error) for the given target properties, while C is a poor match. However, when put through the NFP, A and B are in fact poor matches (high NFP error), while C is a good match.
The framework of TrustMol. a) Existing surrogate‐based IMD often finds solutions in high‐uncertainty regions that are far away from the training distribution, in which the surrogate predictions are most unreliable. This could lead to molecules that are invalid or have high NFP‐error. TrustMol directs the IMD process into low‐uncertainty regions where the surrogate can be trusted. b) Improvement in the forward modeling comes from the SGP‐VAE, which encourages similar latents to exhibit similar properties. Moreover, the surrogate model is trained with latent‐property pairs that are representative of the learned latent space. c) During inversion, TrustMol optimizes a latent design by minimizing the predicted surrogate error and the epistemic uncertainty. The optimal latent design will then be decoded back into SELFIES by the pretrained SGP‐VAE decoder.
The architecture for the latent‐to‐property subnetwork. A (x, y) block represents an nn.Linear layer with an input dimensionality of x and an output dimensionality of y.
Additional regularizations can be easily incorporated into TrustMol. Here, we add molecular mass to the optimization objectives, penalizing molecular designs with high masses. We can see that the distribution of the generated molecular designs shifts toward molecules with lower molecular mass.
Visualization of the hypervolume of MAE for LIMO and TrustMol. We can clearly see the smaller space covered by the hypervolume of TrustMol.


Trustworthy Inverse Molecular Design via Alignment with Molecular Dynamics
  • Article
  • Full-text available

May 2025 · 3 Reads

Kevin Tirta Wijaya · [...] · Hans‐Peter Seidel · Vahid Babaei

Data‐driven inverse molecular design (IMD) has attracted significant attention in recent years. Despite the remarkable progress, existing IMD methods lag behind in terms of trustworthiness, as indicated by their misalignment with the ground‐truth function that models the molecular dynamics. Here, TrustMol, an IMD method built to be trustworthy, is proposed; it designs molecules by inverting a reliable molecular property predictor. TrustMol first constructs a latent space with a novel variational autoencoder (VAE) and trains an ensemble of property predictors to learn the mapping from the latent space to the property space. The training samples for the ensemble are obtained with a new reacquisition method that ensures the samples are representative of the learned latent space. To generate a desired molecule, TrustMol optimizes a latent design by minimizing both the predictive error and the uncertainty quantified by the ensemble. As a result, TrustMol achieves state‐of‐the‐art IMD accuracy and, more importantly, is aligned with the ground‐truth function, which indicates trustworthiness.
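As a rough illustration of the inversion step described above, the following sketch optimizes a latent design by gradient descent against an ensemble of latent-to-property networks, using the ensemble variance as an epistemic-uncertainty penalty. All names (`ensemble`, `target`) are hypothetical stand-ins; the paper's exact objective, SGP-VAE, and reacquisition scheme are not reproduced here.

```python
import torch

def optimize_latent(ensemble, target, z_dim=64, steps=500, lam=0.1, lr=0.05):
    """Uncertainty-aware latent inversion (sketch): find a latent whose
    predicted properties match `target` while staying in regions where
    the ensemble of property predictors agrees (low epistemic uncertainty).
    `ensemble`: list of torch modules mapping (1, z_dim) -> (1, n_props);
    `target`: tensor of shape (n_props,)."""
    z = torch.randn(1, z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        preds = torch.stack([m(z) for m in ensemble])  # (n_models, 1, n_props)
        mean, var = preds.mean(dim=0), preds.var(dim=0)
        # predicted property error + uncertainty penalty
        loss = (mean - target).pow(2).sum() + lam * var.sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()
```

The optimized latent would then be decoded into a molecule (e.g., a SELFIES string) by the pretrained VAE decoder.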


Fig. 2. (a) A classical 3-class example, where the red and blue point subsets represent one uniform-density objective (class) each and their union is another objective, all three with equal optimization priority. (b) We represent this three-objective problem using two staircase functions, one per color, defined on an extra dimension (e.g., the point indices). The overlap between the functions implicitly specifies the third (union) objective.
Fig. 6. Illustration of image synthesis where the grey box represents the sampling space, i.e. the unit hypercube H^(2+d); the horizontal axis represents the image subspace where reconstruction from the samples X is performed. (a) When using a box reconstruction kernel r, the sample sets estimating different pixels are disjoint. (b) A Gaussian kernel introduces overlaps, making each sample contribute to multiple pixel estimates. (c) The human visual system applies additional filtering on the reconstructed image with a generally wider kernel g. The convolution g * r acts as an effective reconstruction kernel for the perceived image, and introduces even more overlaps.
Scalable multi-class sampling via filtered sliced optimal transport

November 2022 · 42 Reads

We propose a multi-class point optimization formulation based on continuous Wasserstein barycenters. Our formulation is designed to handle hundreds to thousands of optimization objectives and comes with a practical optimization scheme. We demonstrate the effectiveness of our framework on various sampling applications such as stippling, object placement, and Monte Carlo integration. We derive a multi-class error bound for perceptual rendering error, which can be minimized using our optimization. We provide source code at https://github.com/iribis/filtered-sliced-optimal-transport.
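As background, the core sliced-optimal-transport building block used by such methods can be sketched in a few lines: project both point sets onto random 1D directions, where optimal transport reduces to sorting. This is only the generic distance, not the paper's filtered, multi-class barycenter formulation.

```python
import numpy as np

def sliced_wasserstein(x, y, n_slices=128, seed=None):
    """Monte-Carlo estimate of the sliced 2-Wasserstein distance between
    two equal-size point sets x, y of shape (n, d)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_slices):
        theta = rng.normal(size=x.shape[1])
        theta /= np.linalg.norm(theta)                   # random unit direction
        px, py = np.sort(x @ theta), np.sort(y @ theta)  # 1D projections
        total += np.mean((px - py) ** 2)                 # 1D OT = sorted matching
    return np.sqrt(total / n_slices)
```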


Physics Informed Neural Fields for Smoke Reconstruction with Sparse Data

June 2022 · 204 Reads · 31 Citations

ACM Transactions on Graphics

Lingjie Liu · Quan Zheng · [...] · Rhaleb Zayer

High-fidelity reconstruction of fluids from sparse multiview RGB videos remains a formidable challenge due to the complexity of the underlying physics as well as complex occlusion and lighting in captures. Existing solutions either assume knowledge of obstacles and lighting, or only focus on simple fluid scenes without obstacles or complex lighting, and are thus unsuitable for real-world scenes with unknown lighting or arbitrary obstacles. We present the first method to reconstruct dynamic fluids by leveraging the governing physics (i.e., Navier-Stokes equations) in an end-to-end optimization from sparse videos, without taking lighting conditions, geometry information, or boundary conditions as input. We provide a continuous spatio-temporal scene representation using neural networks as the ansatz of the density and velocity solution functions for fluids, as well as the radiance field for static objects. With a hybrid architecture that separates static and dynamic contents, fluid interactions with static obstacles are reconstructed for the first time without additional geometry input or human labeling. By augmenting time-varying neural radiance fields with physics-informed deep learning, our method benefits from the supervision of both images and physical priors. To achieve robust optimization from sparse views, we introduce a layer-by-layer growing strategy to progressively increase the network capacity. Using progressively growing models with a new regularization term, we manage to disentangle the density-color ambiguity in radiance fields without overfitting. A pretrained density-to-velocity fluid model is additionally leveraged as a data prior to avoid suboptimal velocity fields that underestimate vorticity but trivially fulfill the physical equations. Our method exhibits high-quality results with relaxed constraints and strong flexibility on a representative set of synthetic and real flow captures.
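To make the physics-informed supervision concrete, here is a minimal sketch of one such residual loss: the density-transport term, computed by automatic differentiation through hypothetical `density_net` and `velocity_net` fields. The full method additionally enforces the momentum equation, rendering losses, and the growing-network schedule.

```python
import torch

def transport_residual(density_net, velocity_net, xyzt):
    """Physics-informed residual of the transport equation
    d(rho)/dt + u . grad(rho) = 0 at space-time samples xyzt of shape (n, 4)."""
    xyzt = xyzt.clone().requires_grad_(True)
    rho = density_net(xyzt)                                         # (n, 1)
    g = torch.autograd.grad(rho.sum(), xyzt, create_graph=True)[0]  # (n, 4)
    d_rho_dxyz, d_rho_dt = g[:, :3], g[:, 3:]
    u = velocity_net(xyzt)                                          # (n, 3)
    residual = d_rho_dt + (u * d_rho_dxyz).sum(dim=1, keepdim=True)
    return residual.pow(2).mean()
```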


Learning HDR video reconstruction for dual-exposure sensors with temporally-alternating exposures

April 2022 · 45 Reads · 13 Citations

Computers & Graphics

High dynamic range (HDR) video reconstruction using conventional single-exposure sensors can be achieved by temporally alternating exposures. This, in turn, requires computing exposure alignment, which is difficult because the exposure differences notoriously create problems for moving content, in particular in larger saturated and dis-occluded regions. An attractive alternative is dual-exposure sensors that capture, in a single shot, differently exposed and spatially interleaved half-frames, which are therefore perfectly spatially and temporally (up to varying motion blur) aligned by construction. In this work, we demonstrate that we can successfully compensate for the reduced spatial resolution and aliasing in such sensors, and improve the overall quality and dynamic range of reconstructed HDR video with respect to single-exposure sensors for a given number of alternating exposures. Specifically, we consider low, mid, and high exposures, and we propose that the mid exposure is captured for every frame and serves as a spatial and temporal reference. We capitalize on neural networks for the denoising, deblurring, and upsampling tasks, so that we effectively obtain two clean, sharp, and full-resolution exposures for every frame, which are then complemented by warping in the missing third exposure. High-quality warping is achieved by learning an optical flow that merges the individual flows found for each specific exposure. Such flow merging is instrumental in handling saturated and dis-occluded image regions, while the dense temporal sampling of the mid exposure improves motion reproduction between the more sparsely sampled exposures. We demonstrate that by capturing only a limited amount of sensor-specific data, and through a novel use of histograms instead of common parametric noise statistics, we are able to generate synthetic training data that lead to better denoising and deblurring quality than existing state-of-the-art methods achieve. As there is not enough high-quality HDR video available, we devise a method to learn from LDR video instead. Our approach compares favorably to several strong baselines, and can boost existing HDR image and video methods when they are re-trained on our data.
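For orientation, the sketch below shows the classical exposure-merging step that such a pipeline ends with, assuming the exposures have already been linearized, denoised, deblurred, and aligned. The paper's learned stages (denoising, deblurring, upsampling, flow merging) are not shown.

```python
import numpy as np

def merge_exposures(images, exposure_times, eps=1e-6):
    """Merge aligned, linearized exposures (float arrays in [0, 1]) into an
    HDR radiance map; a hat weight de-emphasizes badly exposed pixels."""
    num = np.zeros_like(images[0])
    den = np.zeros_like(images[0])
    for img, t in zip(images, exposure_times):
        w = 1.0 - np.abs(2.0 * img - 1.0)  # peaks at mid-gray, 0 at extremes
        num += w * img / t                 # per-exposure radiance estimate
        den += w
    return num / (den + eps)
```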


Mixed Integer Neural Inverse Design

September 2021 · 34 Reads

In computational design and fabrication, neural networks are becoming important surrogates for bulky forward simulations. A long-standing, intertwined question is that of inverse design: how to compute a design that satisfies a desired target performance? Here, we show that the piecewise-linear property, very common in everyday neural networks, allows for an inverse design formulation based on mixed-integer linear programming. Our mixed-integer inverse design uncovers globally optimal or near-optimal solutions in a principled manner. Furthermore, our method significantly facilitates emerging, but challenging, combinatorial inverse design tasks, such as material selection. For problems where finding the optimal solution is not desirable or tractable, we develop an efficient yet near-optimal hybrid optimization. Finally, our method is able to find solutions that are provably robust to possible fabrication perturbations among multiple designs with similar performances.
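The key enabler is that a ReLU neuron is exactly representable by linear constraints plus one binary variable (the standard big-M encoding), so the whole network becomes a mixed-integer program. Below is a minimal sketch for a toy one-neuron "network" using the PuLP package; the big-M constant and all numbers are illustrative assumptions.

```python
# Sketch of the big-M ReLU encoding for MILP-based inversion (pip install pulp)
from pulp import LpProblem, LpVariable, LpMinimize, LpBinary, lpSum, value

M = 100.0                 # big-M bound on the pre-activation (assumption)
w, b = [1.5, -2.0], 0.5   # toy weights of a single ReLU neuron
target = 3.0              # desired network output

prob = LpProblem("relu_inverse", LpMinimize)
x = [LpVariable(f"x{i}", -10, 10) for i in range(2)]  # design variables
y = LpVariable("y", 0, M)                             # ReLU output (y >= 0)
z = LpVariable("z", cat=LpBinary)                     # active/inactive phase
pre = lpSum(wi * xi for wi, xi in zip(w, x)) + b

prob += y >= pre                # together with y >= 0 and the two below:
prob += y <= pre + M * (1 - z)  # y == pre when z == 1
prob += y <= M * z              # y == 0   when z == 0

s = LpVariable("s", 0)          # slack for |y - target|
prob += y - target <= s
prob += target - y <= s
prob += s                       # objective: minimize |y - target|

prob.solve()
print([value(v) for v in x], value(y))
```

Encoding every neuron of a surrogate this way yields globally optimal designs subject to the big-M bounds; a hybrid optimization, as in the paper, trades some of that optimality for speed.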



Learning Complete 3D Morphable Face Models from Images and Videos

October 2020 · 72 Reads

Most 3D face reconstruction methods rely on 3D morphable models, which disentangle the space of facial deformations into identity geometry, expressions, and skin reflectance. These models are typically learned from a limited number of 3D scans and thus do not generalize well across different identities and expressions. We present the first approach to learn complete 3D models of face identity geometry, albedo, and expression just from images and videos. The virtually endless collection of such data, in combination with our self-supervised learning-based approach, allows for learning face models that generalize beyond the span of existing approaches. Our network design and loss functions ensure a disentangled parameterization of not only identity and albedo, but also, for the first time, an expression basis. Our method also allows for in-the-wild monocular reconstruction at test time. We show that our learned models generalize better and lead to higher-quality image-based reconstructions than existing approaches.
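As background, the linear morphable-model evaluation that such approaches parameterize takes only a few lines; the sketch below uses hypothetical array names and shows the standard mean-plus-bases form for geometry (albedo follows the same pattern with its own basis). The paper's contribution, learning these bases self-supervised from images and videos, is not shown.

```python
import numpy as np

def morphable_face(mean_shape, id_basis, expr_basis, alpha, beta):
    """Linear 3DMM evaluation: vertices = mean + identity and expression
    offsets. Shapes: mean (3n,), bases (3n, k_id) and (3n, k_expr),
    codes alpha (k_id,) and beta (k_expr,). Returns (n, 3) vertices."""
    return (mean_shape + id_basis @ alpha + expr_basis @ beta).reshape(-1, 3)
```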


Hierarchical Rasterization of Curved Primitives for Vector Graphics Rendering on the GPU

June 2019 · 206 Reads · 10 Citations

In this paper, we introduce the CPatch, a curved primitive that can be used to construct arbitrary vector graphics. A CPatch is a generalization of a 2D polygon: any number of curves up to a cubic degree bound a primitive. We show that a CPatch can be rasterized efficiently in a hierarchical manner on the GPU, locally discarding irrelevant portions of the curves. Our rasterizer is fast and scalable, works on all patches in parallel, and does not require any approximations. We show a parallel implementation of our rasterizer, which naturally supports all kinds of color spaces, blending, and super-sampling. Additionally, we show how vector graphics input can be converted efficiently to a CPatch representation, solving challenges like patch self-intersections and false inside-outside classification. Results indicate that our approach is faster than the state of the art, more flexible, and could potentially be implemented in hardware.
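One ingredient such a rasterizer needs is an inside-outside test against the bounding curves. The sketch below, a per-point CPU stand-in and not the paper's hierarchical GPU algorithm, counts crossings of a horizontal ray with one cubic Bézier curve via root finding; a point is inside when the total count over all bounding curves is odd.

```python
import numpy as np

def bezier_ray_crossings(p0, p1, p2, p3, point):
    """Count crossings of a +x ray from `point` with a cubic Bezier curve
    given by 2D control points p0..p3 (tangential cases ignored)."""
    ys = [p0[1], p1[1], p2[1], p3[1]]
    # cubic Bernstein -> power-basis coefficients of y(t) - point_y
    coeffs = [-ys[0] + 3*ys[1] - 3*ys[2] + ys[3],
              3*ys[0] - 6*ys[1] + 3*ys[2],
              -3*ys[0] + 3*ys[1],
              ys[0] - point[1]]
    count = 0
    for t in np.roots(coeffs):
        if abs(t.imag) < 1e-9 and 0.0 <= t.real <= 1.0:
            t = t.real
            bern = np.array([(1-t)**3, 3*t*(1-t)**2, 3*t**2*(1-t), t**3])
            if bern @ np.array([p0[0], p1[0], p2[0], p3[0]]) > point[0]:
                count += 1
    return count
```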


A Perception-driven Hybrid Decomposition for Multi-layer Accommodative Displays

February 2019 · 71 Reads · 10 Citations

IEEE Transactions on Visualization and Computer Graphics

Multi-focal plane and multi-layered light-field displays are promising solutions for addressing all visual cues observed in the real world. Unfortunately, these devices usually require expensive optimizations to compute a suitable decomposition of the input light field or focal stack to drive individual display layers. Although these methods provide near-correct image reconstruction, a significant computational cost prevents real-time applications. A simple alternative is a linear blending strategy which decomposes a single 2D image using depth information. This method provides real-time performance, but it generates inaccurate results at occlusion boundaries and on glossy surfaces. This paper proposes a perception-based hybrid decomposition technique which combines the advantages of the above strategies and achieves both real-time performance and high-fidelity results. The fundamental idea is to apply expensive optimizations only in regions where it is perceptually superior, e.g., depth discontinuities at the fovea, and fall back to less costly linear blending otherwise. We present a complete, perception-informed analysis and model that locally determine which of the two strategies should be applied. The prediction is later utilized by our new synthesis method which performs the image decomposition. The results are analyzed and validated in user experiments on a custom multi-plane display.
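To clarify the cheap baseline the hybrid method falls back to, here is a minimal sketch of depth-based linear blending of a grayscale image onto a stack of focal planes; the plane depths are illustrative, and pixel depths are assumed to lie within the plane range.

```python
import numpy as np

def linear_blend(image, depth, plane_depths):
    """Split each pixel's intensity between the two focal planes bracketing
    its depth, proportionally to proximity. `image`, `depth`: (H, W) arrays;
    `plane_depths`: increasing list of plane depths."""
    layers = [np.zeros_like(image) for _ in plane_depths]
    for i in range(len(plane_depths) - 1):
        near, far = plane_depths[i], plane_depths[i + 1]
        mask = (depth >= near) & (depth < far)
        w = (far - depth) / (far - near)  # 1 at the near plane, 0 at the far
        layers[i]     += np.where(mask, w * image, 0.0)
        layers[i + 1] += np.where(mask, (1 - w) * image, 0.0)
    return layers
```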


Dataset and metrics for predicting local visible differences

August 2018 · 34 Reads · 5 Citations

ACM Transactions on Graphics

A large number of imaging and computer graphics applications require localized information on the visibility of image distortions. Existing image quality metrics are not suitable for this task, as they provide a single quality value per image. Existing visibility metrics produce visual difference maps and are specifically designed to detect just-noticeable distortions, but their predictions are often inaccurate. In this work, we argue that the key reason for this problem is the lack of large image collections with good coverage of the possible distortions that occur in different applications. To address the problem, we collect an extensive dataset of reference and distorted image pairs, together with user markings indicating whether distortions are visible or not. We propose a statistical model designed for the meaningful interpretation of such data, which is affected by visual search and the imprecision of manual marking. We use our dataset to train existing metrics and demonstrate that their performance significantly improves. We show that our dataset with the proposed statistical model can be used to train a new CNN-based metric, which outperforms the existing solutions. We demonstrate the utility of such a metric in visually lossless JPEG compression, super-resolution, and watermarking.
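A visibility metric of this kind can be prototyped as a small fully convolutional network mapping a reference/distorted pair to a per-pixel detection-probability map; the sketch below is a generic stand-in, not the paper's architecture or its statistical observer model.

```python
import torch
import torch.nn as nn

class VisibilityNet(nn.Module):
    """Toy fully-convolutional visibility metric: two RGB images stacked
    along channels in, per-pixel detection probability out."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, ref, dist):
        return self.net(torch.cat([ref, dist], dim=1))

# train with per-pixel binary cross-entropy against user visibility markings
```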


Citations (90)


... Liang et al. [86] proposed a two-stage double-CNN-involved pipeline to segment and identify particles on the sub-pixel level. For flow visualization and velocity field reconstruction, there are some emerging methods: reconstructing 2D time-varying flow fields with particle tracing and Lagrangian representations [87,88], using deep learning to reconstruct vector fields from streamlines [89,90], and with no particles involved but purely applying a hybrid neural network (HyFluid) to process sparse multi-view videos and infer fluid density and velocity fields [91][92][93]. Other than neural network models, transformer models [94,95] are also becoming widely used in the domain of complex flow prediction. ...

Reference:

Micro-Scale Particle Tracking: From Conventional to Data-Driven Methods
Physics Informed Neural Fields for Smoke Reconstruction with Sparse Data

ACM Transactions on Graphics

... Note that the radiance ranges below the black level and over 1 are covered just in a single exposure EV+1 and EV-1, respectively, while for EV+0, radiance information is clamped on both sides of the range. Dark image regions are also contaminated with sensor noise, whose characteristics may differ between exposures, which makes consistent denoising difficult [Chang et al. 2020;Cogalan et al. 2022;Mustaniemi et al. 2020]. Some camera manufacturers introduce hard clamping at a black-level radiance, assuming that there is no reliable image information below this threshold due to noise. ...

Learning HDR video reconstruction for dual-exposure sensors with temporally-alternating exposures
  • Citing Article
  • April 2022

Computers & Graphics

... Originally, approaches used camera viewpoint, 2D keypoints, and masks to predict the 3D shape [24,59] and texture [22,64]. Several works achieved to mitigate the requirement of camera viewpoint and 2D keypoint annotations by exploiting symmetries [12,18,31,60], by adding multi-view viewpoint hypothesis [12], by requiring consistent self-supervised co-part segmentation [31] using [19], by cycle-consistent surface maps [27,28,60], or by temporal consistency for video clips [56,57,68]. Few works even achieve to not require mask annotations, [38,58,66], by being constrained to front views of faces [66], or coarseto-fine learning of texture and shape [38]. ...

Learning Complete 3D Morphable Face Models from Images and Videos
  • Citing Conference Paper
  • June 2021

... The description of the stroke operator by Gosling et al. (1989) indicates that early on, stroking was implemented by generating a fillable region corresponding to the stroked region of a path and then drawing that derived fillable region. Other recent path rendering systems explicitly state they take this approach (Dokter et al. 2019; Ganacim et al. 2014; Li et al. 2016). ...

Hierarchical Rasterization of Curved Primitives for Vector Graphics Rendering on the GPU

... Multifocal displays address several focal planes simultaneously, forming a volume within which near-correct focus cues can be delivered. Common implementations use stacked display layers [34,55,57], microlens arrays [32], and high-speed projectors with focus adjusting optics [47,50]. However, approaches based on high-speed projections with synchronized optics demand complex setups, while approaches based on microlens arrays commonly suffer from the loss of spatial resolution for presenting multi-view images [51]. ...

A Perception-driven Hybrid Decomposition for Multi-layer Accommodative Displays
  • Citing Article
  • February 2019

IEEE Transactions on Visualization and Computer Graphics

... pixels mainly exists in low luminance range, which means they are of little help to viewing experience. Our method is able to recover adequate HDR/WCG volume, meanwhile reasonably enhance the brightness and saturation. We also provide conventional PSNR, SSIM, ∆E (color difference [79]) and VDP3 (HDR-VDP-3 [80]), but they mostly represent output's closer value with GT (For example, result both dimmer (e.g. Deep SR-ITM [4]) and more vivid (ours) than GT will have a similar low score.), ...

Dataset and metrics for predicting local visible differences
  • Citing Article
  • August 2018

ACM Transactions on Graphics

... While hashing is also used in our framework, we do not need any learning-based approach to generate representations of the merge trees. There is another set of comparison measures (such as those based on histograms [55,56] and the extended branch decomposition [54]). They are not metrics by definition but are simple, intuitive, and easy to compute. ...

Fast Similarity Search in Scalar Fields using Merging Histograms

Mathematics and Visualization

... Transformation-invariance Surprisingly, results produced by our approach can turn out to be better than their own supervision, as our method is forced to come up with strategies to detect problems without seeing the reference. This makes it immune to a common issue of many image metrics: misalignment [KRMS16]. Even a simple shift in image content will result in many false positives for classic metrics (Fig. 6). ...

Transformation-aware perceptual image metric
  • Citing Article
  • September 2016

Journal of Electronic Imaging

... Procedural models "encapsulate a large variety of shapes into a concise formal description that can be efficiently parametrized" [Krs et al. 2021], which lends them to a variety of tasks including 2D textures and shaders [Cook 1984;Hu et al. 2022a;Perlin 1985;Shi et al. 2020] and virtual world modeling [Prusinkiewicz and Lindenmayer 2004;Smelik et al. 2014;Whiting et al. 2009]. Graph-based models are of particular interest, as they are widely used in practice (e.g., SideFX Houdini, Blender, Adobe Substance Designer), and they are amenable to performance optimization [Boechat et al. 2016] and the intuitive specification of edits and constraints [Krs et al. 2021;Michel and Boubekeur 2021]. Our procedural graph builds on these ideas toward concise, intuitive metamaterial design. ...

Representing and Scheduling Procedural Generation using Operator Graphs

ACM Transactions on Graphics

... Somewhat more closely related to ours are data-driven and feature-based interpolation methods. These include interpolation based on hand-crafted features [91,125] or on exploring various local shape spaces obtained by analyzing a shape collection [92,238,210]. Such techniques work well if the input shapes are sufficiently similar, but require triangle meshes and dense point-wise correspondences, or a single template that is fitted to all input data to build a statistical model, e.g. ...

Optimized Subspaces for Deformation-Based Modeling and Shape Interpolation
  • Citing Article
  • May 2016

Computers & Graphics