Hans-Peter Seidel’s research while affiliated with Max Planck Institute for Informatics and other places
Data‐driven inverse molecular design (IMD) has attracted significant attention in recent years. Despite the remarkable progress, existing IMD methods lag behind in terms of trustworthiness, as indicated by their misalignment with the ground‐truth function that models the molecular dynamics. Here, TrustMol, an IMD method built to be trustworthy, is proposed by inverting a reliable molecular property predictor. TrustMol first constructs a latent space with a novel variational autoencoder (VAE) and trains an ensemble of property predictors to learn the mapping from the latent space to the property space. The training samples for the ensemble are obtained with a new reacquisition method to ensure that the samples are representative of the latent space. To generate a desired molecule, TrustMol optimizes a latent design by minimizing both the predictive error and the uncertainty quantified by the ensemble. As a result, TrustMol achieves state‐of‐the‐art performance in terms of IMD accuracy and, more importantly, is aligned with the ground‐truth function, which indicates trustworthiness.
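To make the last two steps concrete, here is a minimal, hypothetical sketch of ensemble-guided latent optimization: an ensemble of property predictors maps a latent code to a property, and the latent design is optimized to hit a target value while penalizing ensemble disagreement as the uncertainty term. All names, network shapes, and hyperparameters are placeholders, not the paper's implementation.

```python
import torch

def optimize_latent(ensemble, z_init, y_target, steps=500, lam=0.1, lr=1e-2):
    """Minimize (mean prediction - target)^2 + lam * ensemble variance."""
    z = z_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        preds = torch.stack([f(z) for f in ensemble])      # (n_models, ...)
        err = (preds.mean(dim=0) - y_target).pow(2).sum()   # predictive error
        uncertainty = preds.var(dim=0).sum()                # ensemble disagreement
        loss = err + lam * uncertainty
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()

# Toy usage with small MLP predictors on an 8-D latent space.
ensemble = [torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.ReLU(),
                                torch.nn.Linear(32, 1)) for _ in range(5)]
z_star = optimize_latent(ensemble, torch.randn(8), torch.tensor([1.5]))
```

In the full method, the optimized latent code would then be decoded back to a molecule by the VAE decoder.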
We propose a multi-class point optimization formulation based on continuous Wasserstein barycenters. Our formulation is designed to handle hundreds to thousands of optimization objectives and comes with a practical optimization scheme. We demonstrate the effectiveness of our framework on various sampling applications such as stippling, object placement, and Monte Carlo integration. We derive a multi-class error bound for perceptual rendering error, which can be minimized using our optimization. We provide source code at https://github.com/iribis/filtered-sliced-optimal-transport.
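As a rough, self-contained illustration of the kind of point optimization involved, the sketch below performs plain sliced optimal transport in 2D: points are repeatedly projected onto random directions and advected toward the sorted projections of a reference point set. The multi-class filtering and barycentric objectives of the actual method (see the repository above) are omitted, and all parameters are placeholders.

```python
import numpy as np

def sliced_ot_step(points, reference, rng, step=0.5):
    """Move `points` toward `reference` along one random 1D projection."""
    d = points.shape[1]
    theta = rng.normal(size=d)
    theta /= np.linalg.norm(theta)
    p_proj = points @ theta
    r_proj = np.sort(reference @ theta)
    order = np.argsort(p_proj)
    # 1D optimal transport pairs the k-th smallest projection of the points
    # with the k-th smallest projection of the reference set.
    disp = np.zeros_like(p_proj)
    disp[order] = r_proj - p_proj[order]
    return points + step * disp[:, None] * theta

rng = np.random.default_rng(0)
pts = rng.random((1024, 2))          # point set to optimize
ref = rng.random((1024, 2))          # samples of the target distribution
for _ in range(200):
    pts = sliced_ot_step(pts, ref, rng)
```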
High-fidelity reconstruction of fluids from sparse multi-view RGB videos remains a formidable challenge due to the complexity of the underlying physics as well as complex occlusion and lighting in captures. Existing solutions either assume knowledge of obstacles and lighting, or only focus on simple fluid scenes without obstacles or complex lighting, and are thus unsuitable for real-world scenes with unknown lighting or arbitrary obstacles. We present the first method to reconstruct dynamic fluid by leveraging the governing physics (i.e., the Navier-Stokes equations) in an end-to-end optimization from sparse videos without taking lighting conditions, geometry information, or boundary conditions as input. We provide a continuous spatio-temporal scene representation using neural networks as the ansatz of the density and velocity solution functions for fluids as well as the radiance field for static objects. With a hybrid architecture that separates static and dynamic contents, fluid interactions with static obstacles are reconstructed for the first time without additional geometry input or human labeling. By augmenting time-varying neural radiance fields with physics-informed deep learning, our method benefits from the supervision of images and physical priors. To achieve robust optimization from sparse views, we introduce a layer-by-layer growing strategy to progressively increase the network capacity. Using progressively growing models with a new regularization term, we manage to disentangle the density-color ambiguity in radiance fields without overfitting. In addition, a pretrained density-to-velocity fluid model is leveraged as a data prior to avoid suboptimal velocity estimates that underestimate vorticity while trivially fulfilling the physical equations. Our method exhibits high-quality results with relaxed constraints and strong flexibility on a representative set of synthetic and real flow captures.
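The physics-informed part of such an optimization can be sketched as follows: density and velocity are represented by coordinate networks, and residuals of the density transport and incompressibility equations are penalized at random space-time samples via automatic differentiation. Network sizes, the choice of residual terms, and the sampling are illustrative assumptions; in the paper these terms are coupled with the image supervision of the radiance field.

```python
import torch

density = torch.nn.Sequential(torch.nn.Linear(4, 64), torch.nn.Tanh(),
                              torch.nn.Linear(64, 1))    # (x, y, z, t) -> density
velocity = torch.nn.Sequential(torch.nn.Linear(4, 64), torch.nn.Tanh(),
                               torch.nn.Linear(64, 3))   # (x, y, z, t) -> velocity

def physics_residual(xyzt):
    xyzt = xyzt.clone().requires_grad_(True)
    d = density(xyzt)
    u = velocity(xyzt)
    grad_d = torch.autograd.grad(d.sum(), xyzt, create_graph=True)[0]
    # density transport residual: d_t + u . grad(d) = 0
    transport = grad_d[:, 3] + (u * grad_d[:, :3]).sum(dim=1)
    # incompressibility residual: div(u) = 0
    div = 0.0
    for i in range(3):
        div = div + torch.autograd.grad(u[:, i].sum(), xyzt,
                                        create_graph=True)[0][:, i]
    return transport.pow(2).mean() + div.pow(2).mean()

loss_physics = physics_residual(torch.rand(256, 4))   # random space-time samples
loss_physics.backward()
```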
High-dynamic-range (HDR) video reconstruction using conventional single-exposure sensors can be achieved by temporally alternating exposures. This, in turn, requires computing exposure alignment, which is difficult due to the exposure differences that notoriously create problems for moving content, in particular in larger saturated and dis-occluded regions. An attractive alternative is dual-exposure sensors that capture, in a single shot, differently exposed and spatially interleaved half-frames, so that they are perfectly spatially and temporally (up to varying motion blur) aligned by construction. In this work, we demonstrate that we successfully compensate for the reduced spatial resolution and aliasing in such sensors, and we improve the overall quality and dynamic range of reconstructed HDR video with respect to single-exposure sensors for a given number of alternating exposures. Specifically, we consider low, mid, and high exposures, and we propose that the mid exposure is captured for every frame and serves as a spatial and temporal reference. We capitalize here on neural networks for denoising, deblurring, and upsampling tasks, so that we effectively obtain two clean, sharp, and full-resolution exposures for every frame, which are then complemented by warping a missing third exposure. High-quality warping is achieved by learning an optical flow that merges the individual flows found for each specific exposure. Such flow merging is instrumental in handling saturated/dis-occluded image regions, while dense temporal sampling of the mid exposure improves the reproduction of motion between the more sparsely sampled exposures. We demonstrate that by capturing only a limited amount of sensor-specific data and through a novel use of histograms, instead of common parametric noise statistics, we are able to generate synthetic training data that lead to better denoising and deblurring quality than can be achieved by existing state-of-the-art methods. As there is not enough high-quality HDR video available, we devise a method to learn from LDR video instead. Our approach compares favorably to several strong baselines and can boost existing HDR image and video methods when they are re-trained on our data.
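For orientation, the sketch below shows the classical exposure-merging step that such a pipeline ends with: three clean, aligned, linear exposures are combined into HDR radiance with weights that distrust under- and over-exposed pixels. The learned denoising, deblurring, upsampling, and flow merging described above happen before this step; the exposure values and the hat-shaped weight are placeholder choices.

```python
import numpy as np

def merge_exposures(frames, exposures, eps=1e-4):
    """frames: linear images in [0,1]; exposures: relative exposure times."""
    hdr = np.zeros_like(frames[0], dtype=np.float64)
    wsum = np.zeros_like(hdr)
    for img, t in zip(frames, exposures):
        w = 1.0 - np.abs(2.0 * img - 1.0)   # hat weight: distrust values near 0 and 1
        hdr += w * (img / t)                # bring each frame to a common radiance scale
        wsum += w
    return hdr / np.maximum(wsum, eps)

low, mid, high = (np.random.rand(64, 64, 3) for _ in range(3))
radiance = merge_exposures([low, mid, high], exposures=[0.25, 1.0, 4.0])
```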
In computational design and fabrication, neural networks are becoming important surrogates for bulky forward simulations. A long-standing, intertwined question is that of inverse design: how to compute a design that satisfies a desired target performance? Here, we show that the piecewise-linear property, very common in everyday neural networks, allows for an inverse design formulation based on mixed-integer linear programming. Our mixed-integer inverse design uncovers globally optimal or near-optimal solutions in a principled manner. Furthermore, our method significantly facilitates emerging, but challenging, combinatorial inverse design tasks, such as material selection. For problems where finding the optimal solution is not desirable or tractable, we develop an efficient yet near-optimal hybrid optimization. Finally, among multiple designs with similar performance, our method is able to find solutions that are provably robust to possible fabrication perturbations.
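The key observation can be illustrated on a single ReLU unit: because y = max(0, w·x + b) admits an exact big-M encoding with one binary variable, a whole piecewise-linear network, and hence the inverse design problem, can be written as a mixed-integer linear program. The sketch below encodes one such unit with PuLP and asks for an input that reaches a target output; weights, bounds, and the solver call are placeholder choices and not the paper's formulation.

```python
import pulp

M = 100.0                                    # valid bound on |w.x + b|
prob = pulp.LpProblem("relu_inverse_design", pulp.LpMinimize)

x = [pulp.LpVariable(f"x{i}", lowBound=-1, upBound=1) for i in range(3)]
y = pulp.LpVariable("y", lowBound=0)
a = pulp.LpVariable("a", cat="Binary")       # activation indicator

w, b = [0.7, -1.2, 0.4], 0.1
pre = pulp.lpSum(wi * xi for wi, xi in zip(w, x)) + b

# Big-M encoding of y = max(0, pre): a = 1 iff the unit is active.
prob += y >= pre
prob += y <= pre + M * (1 - a)
prob += y <= M * a

# Inverse-design objective: reach a target output y* = 0.5 (|y - y*| via two constraints).
target = 0.5
t = pulp.LpVariable("t", lowBound=0)
prob += t >= y - target
prob += t >= target - y
prob.setObjective(t)
prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("design:", [pulp.value(v) for v in x], "output:", pulp.value(y))
```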
Most 3D face reconstruction methods rely on 3D morphable models, which disentangle the space of facial deformations into identity geometry, expressions, and skin reflectance. These models are typically learned from a limited number of 3D scans and thus do not generalize well across different identities and expressions. We present the first approach to learn complete 3D models of face identity geometry, albedo, and expression just from images and videos. The virtually endless collection of such data, in combination with our self-supervised learning-based approach, allows for learning face models that generalize beyond the span of existing approaches. Our network design and loss functions ensure a disentangled parameterization of not only identity and albedo, but also, for the first time, an expression basis. Our method also allows for in-the-wild monocular reconstruction at test time. We show that our learned models generalize better and lead to higher-quality image-based reconstructions than existing approaches.
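The disentangled parameterization referred to above can be pictured as a linear morphable model with separate identity, expression, and albedo bases; the sketch below only shows this decoding step, with placeholder dimensions, whereas in the paper the bases themselves are learned from images and videos in a self-supervised way.

```python
import numpy as np

n_verts, k_id, k_expr, k_alb = 5000, 80, 64, 80
mean_shape   = np.zeros((n_verts, 3))
id_basis     = np.random.randn(k_id,   n_verts, 3) * 1e-3
expr_basis   = np.random.randn(k_expr, n_verts, 3) * 1e-3
mean_albedo  = np.full((n_verts, 3), 0.5)
albedo_basis = np.random.randn(k_alb, n_verts, 3) * 1e-3

def decode_face(alpha_id, alpha_expr, alpha_alb):
    """Map the three disentangled code vectors to per-vertex geometry and albedo."""
    geometry = (mean_shape
                + np.tensordot(alpha_id,   id_basis,   axes=1)
                + np.tensordot(alpha_expr, expr_basis, axes=1))
    albedo = mean_albedo + np.tensordot(alpha_alb, albedo_basis, axes=1)
    return geometry, albedo

geom, alb = decode_face(np.random.randn(k_id),
                        np.random.randn(k_expr),
                        np.random.randn(k_alb))
```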
In this paper, we introduce the CPatch, a curved primitive that can be used to construct arbitrary vector graphics. A CPatch is a generalization of a 2D polygon: any number of curves of degree up to cubic bound a primitive. We show that a CPatch can be rasterized efficiently in a hierarchical manner on the GPU, locally discarding irrelevant portions of the curves. Our rasterizer is fast and scalable, works on all patches in parallel, and does not require any approximations. We show a parallel implementation of our rasterizer, which naturally supports all kinds of color spaces, blending, and super‐sampling. Additionally, we show how vector graphics input can efficiently be converted to a CPatch representation, solving challenges like patch self-intersections and false inside‐outside classification. Results indicate that our approach is faster than the state‐of‐the‐art, more flexible, and could potentially be implemented in hardware.
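To make the notion of a curve-bounded primitive concrete, the brute-force sketch below classifies a point against a region bounded by cubic Bezier curves with an even-odd crossing count; it is only a toy stand-in for the hierarchical, GPU-parallel classification the paper describes, and it ignores numerical corner cases such as ray-endpoint intersections.

```python
import numpy as np

def bezier_point(ctrl, t):
    """Evaluate a cubic Bezier with control points ctrl (4, 2) at parameters t."""
    c = np.asarray(ctrl, dtype=float)
    u = 1.0 - t
    return ((u**3)[..., None] * c[0] + 3 * (u**2 * t)[..., None] * c[1]
            + 3 * (u * t**2)[..., None] * c[2] + (t**3)[..., None] * c[3])

def crossings(ctrl, p):
    """Count crossings of the ray {x >= p_x, y = p_y} with one cubic curve."""
    c = np.asarray(ctrl, dtype=float)
    y = c[:, 1]
    # cubic polynomial in t for curve_y(t) - p_y, in power basis
    coeffs = [-y[0] + 3*y[1] - 3*y[2] + y[3],
              3*y[0] - 6*y[1] + 3*y[2],
              -3*y[0] + 3*y[1],
              y[0] - p[1]]
    roots = np.roots(coeffs)
    ts = roots[np.abs(roots.imag) < 1e-9].real
    ts = ts[(ts >= 0.0) & (ts <= 1.0)]
    return int(np.sum(bezier_point(c, ts)[:, 0] > p[0])) if ts.size else 0

def inside(curves, p):
    """Even-odd rule over all boundary curves of the patch."""
    return sum(crossings(c, p) for c in curves) % 2 == 1
```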
Multi-focal-plane and multi-layered light-field displays are promising solutions for addressing all visual cues observed in the real world. Unfortunately, these devices usually require expensive optimizations to compute a suitable decomposition of the input light field or focal stack to drive individual display layers. Although these methods provide near-correct image reconstruction, their significant computational cost prevents real-time applications. A simple alternative is a linear blending strategy, which decomposes a single 2D image using depth information. This method provides real-time performance, but it generates inaccurate results at occlusion boundaries and on glossy surfaces. This paper proposes a perception-based hybrid decomposition technique which combines the advantages of the above strategies and achieves both real-time performance and high-fidelity results. The fundamental idea is to apply expensive optimizations only in regions where they are perceptually superior, e.g., at depth discontinuities in the fovea, and to fall back to less costly linear blending otherwise. We present a complete, perception-informed analysis and model that locally determine which of the two strategies should be applied. The prediction is later utilized by our new synthesis method, which performs the image decomposition. The results are analyzed and validated in user experiments on a custom multi-plane display.
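The linear blending baseline mentioned above is simple enough to state in a few lines: each pixel's intensity is distributed between the two focal planes that bracket its depth, in proportion to the dioptric distance. The sketch below is such a placeholder implementation for a grayscale image; the plane depths and the clipping of out-of-range depths are illustrative choices, and the paper's contribution lies in deciding where this cheap path suffices and where the expensive optimization must be used instead.

```python
import numpy as np

def linear_blend_decomposition(image, depth, planes):
    """image, depth: (H, W); planes: focal distances in diopters, ascending."""
    planes = np.asarray(planes, dtype=float)
    d = np.clip(depth, planes[0], planes[-1])
    hi = np.clip(np.searchsorted(planes, d), 1, len(planes) - 1)
    lo = hi - 1
    w = (d - planes[lo]) / (planes[hi] - planes[lo])    # 0 -> plane lo, 1 -> plane hi
    layers = np.zeros((len(planes),) + image.shape)
    rows, cols = np.indices(image.shape)
    np.add.at(layers, (lo, rows, cols), (1.0 - w) * image)
    np.add.at(layers, (hi, rows, cols), w * image)
    return layers                                        # layers.sum(axis=0) == image

img = np.random.rand(4, 4)
dpt = np.random.uniform(0.0, 3.0, (4, 4))                # depth in diopters
layers = linear_blend_decomposition(img, dpt, planes=[0.0, 1.0, 2.0, 3.0])
```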
A large number of imaging and computer graphics applications require localized information on the visibility of image distortions. Existing image quality metrics are not suitable for this task, as they provide a single quality value per image. Existing visibility metrics produce visual difference maps and are specifically designed for detecting just-noticeable distortions, but their predictions are often inaccurate. In this work, we argue that the key reason for this problem is the lack of large image collections with a good coverage of the distortions that occur in different applications. To address the problem, we collect an extensive dataset of reference and distorted image pairs together with user markings indicating whether distortions are visible or not. We propose a statistical model that is designed for the meaningful interpretation of such data, which is affected by visual search and the imprecision of manual marking. We use our dataset for training existing metrics and demonstrate that their performance significantly improves. We show that our dataset with the proposed statistical model can be used to train a new CNN-based metric, which outperforms existing solutions. We demonstrate the utility of such a metric in visually lossless JPEG compression, super-resolution, and watermarking.
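A CNN-based visibility metric of the kind described above can be pictured as a small fully convolutional network that maps a reference/distorted pair to a per-pixel probability of the distortion being visible, trained against the collected user markings. The architecture and loss below are placeholders to show the input/output contract, not the paper's network or its statistical observer model.

```python
import torch
import torch.nn as nn

class VisibilityCNN(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1), nn.Sigmoid(),   # P(visible) per pixel
        )

    def forward(self, reference, distorted):
        return self.net(torch.cat([reference, distorted], dim=1))

model = VisibilityCNN()
ref, dist = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
marking = (torch.rand(1, 1, 64, 64) > 0.5).float()          # stand-in user markings
loss = nn.functional.binary_cross_entropy(model(ref, dist), marking)
loss.backward()
```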
Citations (90)
... Liang et al. [86] proposed a two-stage double-CNN-involved pipeline to segment and identify particles on the sub-pixel level. For flow visualization and velocity field reconstruction, there are some emerging methods: reconstructing 2D time-varying flow fields with particle tracing and Lagrangian representations [87,88], using deep learning to reconstruct vector fields from streamlines [89,90], and with no particles involved but purely applying a hybrid neural network (HyFluid) to process sparse multi-view videos and infer fluid density and velocity fields [91][92][93]. Other than neural network models, transformer models [94,95] are also becoming widely used in the domain of complex flow prediction. ...
... Note that the radiance ranges below the black level and over 1 are covered just in a single exposure EV+1 and EV-1, respectively, while for EV+0, radiance information is clamped on both sides of the range. Dark image regions are also contaminated with sensor noise, whose characteristics may differ between exposures, which makes consistent denoising difficult [Chang et al. 2020;Cogalan et al. 2022;Mustaniemi et al. 2020]. Some camera manufacturers introduce hard clamping at a black-level radiance, assuming that there is no reliable image information below this threshold due to noise. ...
... Originally, approaches used camera viewpoint, 2D keypoints, and masks to predict the 3D shape [24,59] and texture [22,64]. Several works managed to mitigate the requirement of camera viewpoint and 2D keypoint annotations by exploiting symmetries [12,18,31,60], by adding multi-view viewpoint hypotheses [12], by requiring consistent self-supervised co-part segmentation [31] using [19], by cycle-consistent surface maps [27,28,60], or by temporal consistency for video clips [56,57,68]. A few works even manage to not require mask annotations [38,58,66], by being constrained to front views of faces [66] or by coarse-to-fine learning of texture and shape [38]. ...
... The description of the stroke operator by Gosling et al. (1989) indicates that early on, stroking was implemented by generating a fillable region corresponding to the stroked region of a path and then drawing that derived fillable region. Other recent path rendering systems explicitly state they take this approach (Dokter et al. 2019; Ganacim et al. 2014; Li et al. 2016). ...
... Multifocal displays address several focal planes simultaneously, forming a volume within which near-correct focus cues can be delivered. Common implementations use stacked display layers [34,55,57], microlens arrays [32], and high-speed projectors with focus adjusting optics [47,50]. However, approaches based on high-speed projections with synchronized optics demand complex setups, while approaches based on microlens arrays commonly suffer from the loss of spatial resolution for presenting multi-view images [51]. ...
... pixels mainly exist in the low-luminance range, which means they are of little help to the viewing experience. Our method is able to recover an adequate HDR/WCG volume while reasonably enhancing the brightness and saturation. (We also provide conventional PSNR, SSIM, ∆E (color difference [79]), and VDP3 (HDR-VDP-3 [80]), but they mostly measure how close the output is to the GT; for example, results both dimmer (e.g., Deep SR-ITM [4]) and more vivid (ours) than the GT will receive similarly low scores.) ...
... While hashing is also used in our framework, we do not need any learning-based approach to generate representations of the merge trees. There is another set of comparison measures (such as those based on histograms [55,56] and the extended branch decomposition [54]). They are not metrics by definition but are simple, intuitive, and easy to compute. ...
... Transformation-invariance Surprisingly, results produced by our approach can turn out to be better than their own supervision, as our method is forced to come up with strategies to detect problems without seeing the reference. This makes it immune to a common issue of many image metrics: misalignment [KRMS16]. Even a simple shift in image content will result in many false positives for classic metrics (Fig. 6). ...
... Procedural models "encapsulate a large variety of shapes into a concise formal description that can be efficiently parametrized" [Krs et al. 2021], which lends them to a variety of tasks including 2D textures and shaders [Cook 1984;Hu et al. 2022a;Perlin 1985;Shi et al. 2020] and virtual world modeling [Prusinkiewicz and Lindenmayer 2004;Smelik et al. 2014;Whiting et al. 2009]. Graph-based models are of particular interest, as they are widely used in practice (e.g., SideFX Houdini, Blender, Adobe Substance Designer), and they are amenable to performance optimization [Boechat et al. 2016] and the intuitive specification of edits and constraints [Krs et al. 2021;Michel and Boubekeur 2021]. Our procedural graph builds on these ideas toward concise, intuitive metamaterial design. ...
... Somewhat more closely related to ours are data-driven and feature-based interpolation methods. These include interpolation based on hand-crafted features [91,125] or on exploring various local shape spaces obtained by analyzing a shape collection [92,238,210]. Such techniques work well if the input shapes are sufficiently similar, but require triangle meshes and dense point-wise correspondences, or a single template that is fitted to all input data to build a statistical model, e.g. ...