Luigi Di Stefano
University of Bologna | UNIBO · "Guglielmo Marconi" Department of Electrical, Electronic and Information Engineering DEI

About

220
Publications
56,785
Reads
9,380
Citations

Publications (220)
Article
Full-text available
In this paper, we investigate the effectiveness of shape completion neural networks as clinical aids in maxillofacial surgery planning. We present a pipeline to apply shape completion networks to automatically reconstruct complete eumorphic 3D meshes starting from a partial input mesh, easily obtained from CT data routinely acquired for surgery pla...
Preprint
Full-text available
Motivated by efficiency requirements, most anomaly detection and segmentation (AD&S) methods focus on processing low-resolution images, e.g., $224\times 224$ pixels, obtained by downsampling the original input images. In this setting, downsampling is typically applied also to the provided ground-truth defect masks. Yet, as numerous industrial appli...
Preprint
Multimodal Large Language Models (MLLMs) have demonstrated an excellent understanding of images and 3D data. However, both modalities have shortcomings in holistically capturing the appearance and geometry of objects. Meanwhile, Neural Radiance Fields (NeRFs), which encode information within the weights of a simple Multi-Layer Perceptron (MLP), hav...
Article
Full-text available
We propose a framework that combines traditional, hand-crafted algorithms and recent advances in deep learning to obtain high-quality, high-resolution disparity maps from stereo images. By casting the refinement process as a continuous feature sampling strategy, our neural disparity refinement network can estimate an enhanced disparity map at any o...
Article
Full-text available
The design of patient-specific implants for cranioplasty surgery is time-consuming and challenging. Hence, the 2021 AutoImplant II challenge, consisting of the SkullBreak and SkullFix datasets, was organized to foster research on computer vision techniques pursuing automation of the cranial implant design task. Data-driven methods working on Comput...
Article
Full-text available
In recent years, Neural Fields (NFs) have emerged as an effective tool for encoding diverse continuous signals such as images, videos, audio, and 3D shapes. When applied to 3D data, NFs offer a solution to the fragmentation and limitations associated with prevalent discrete representations. However, given that NFs are essentially neural ne...
Article
Full-text available
Estimating depth from images nowadays yields outstanding results, both in terms of in-domain accuracy and generalization. However, we identify two main challenges that remain open in this field: dealing with non-Lambertian materials and effectively processing high-resolution images. Purposely, we propose a novel dataset that includes accurate and d...
Preprint
Inferring the depth of transparent or mirror (ToM) surfaces represents a hard challenge for either sensors, algorithms, or deep networks. We propose a simple pipeline for learning to estimate depth properly for such surfaces with neural networks, without requiring any ground-truth annotation. We unveil how to obtain reliable pseudo labels by in-pai...
Preprint
In this paper, we focus on the problem of rendering novel views from a Neural Radiance Field (NeRF) under unobserved light conditions. To this end, we introduce a novel dataset, dubbed ReNe (Relighting NeRF), framing real world objects under one-light-at-time (OLAT) conditions, annotated with accurate ground-truth camera and light poses. Our acquis...
Preprint
Full-text available
3D semantic segmentation is a critical task in many real-world applications, such as autonomous driving, robotics, and mixed reality. However, the task is extremely challenging due to ambiguities coming from the unstructured, sparse, and uncolored nature of the 3D point clouds. A possible solution is to combine the 3D information with others coming...
Preprint
Implicit Neural Representations (INRs) have emerged in the last few years as a powerful tool to encode continuously a variety of different signals like images, videos, audio and 3D shapes. When applied to 3D shapes, INRs allow to overcome the fragmentation and shortcomings of the popular discrete representations used so far. Yet, considering that I...
Article
Availability of labelled data is the major obstacle to the deployment of deep learning algorithms for computer vision tasks in new domains. The fact that many frameworks adopted to solve different tasks share the same architecture suggests that there should be a way of reusing the knowledge learned in a specific setting to solve novel tasks with li...
Preprint
Availability of labelled data is the major obstacle to the deployment of deep learning algorithms for computer vision tasks in new domains. The fact that many frameworks adopted to solve different tasks share the same architecture suggests that there should be a way of reusing the knowledge learned in a specific setting to solve novel tasks with li...
Preprint
Estimating depth from images nowadays yields outstanding results, both in terms of in-domain accuracy and generalization. However, we identify two main challenges that remain open in this field: dealing with non-Lambertian materials and effectively processing high-resolution images. Purposely, we propose a novel dataset that includes accurate and d...
Article
Full-text available
LiDAR semantic segmentation is receiving increased attention due to its deployment in autonomous driving applications. As LiDARs come often with other sensors such as RGB cameras, multi-modal approaches for this task have been developed, which however suffer from the domain shift problem as other deep learning approaches. To address this, we propos...
Preprint
In this paper, we propose the first-ever real benchmark designed for evaluating Neural Radiance Fields (NeRFs) and, in general, Neural Rendering (NR) frameworks. We design and implement an effective pipeline for scanning real objects in quantity and effortlessly. Our scan station is built with a hardware budget of less than $500 and can collect roughly 4...
Preprint
Full-text available
Point cloud classification is a popular task in 3D vision. However, previous works usually assume that point clouds at test time are obtained with the same procedure or sensor as those at training time. Unsupervised Domain Adaptation (UDA), instead, breaks this assumption and tries to solve the task on an unlabeled target domain, leveraging only on...
Preprint
We propose X-NeRF, a novel method to learn a Cross-Spectral scene representation given images captured from cameras with different light spectrum sensitivity, based on the Neural Radiance Fields formulation. X-NeRF optimizes camera poses across spectra during training and exploits Normalized Cross-Device Coordinates (NXDC) to render images of diffe...
Article
Image retrieval usually faces scale-variance issues as the amount of image data is rapidly increasing, which calls for more accurate retrieval technology. Besides, existing methods usually treat pair-image similarity as a binary value which indicates whether two images share either at least one common label or none of shared labels. However, such s...
Preprint
We address the problem of registering synchronized color (RGB) and multi-spectral (MS) images featuring very different resolution by solving stereo matching correspondences. Purposely, we introduce a novel RGB-MS dataset framing 13 different scenes in indoor environments and providing a total of 34 image pairs annotated with semi-dense, high-resolu...
Preprint
Embedding of large but redundant data, such as images or text, in a hierarchy of lower-dimensional spaces is one of the key features of representation learning approaches, which nowadays provide state-of-the-art solutions to problems once believed hard or impossible to solve. In this work, in a plot twist with a strong meta aftertaste, we show how...
Preprint
We present a novel high-resolution and challenging stereo dataset framing indoor scenes annotated with dense and accurate ground-truth disparities. Peculiar to our dataset is the presence of several specular and transparent surfaces, i.e. the main causes of failures for state-of-the-art stereo networks. Our acquisition pipeline leverages a novel de...
Article
Correspondences between 3D keypoints generated by matching local descriptors are a key step in 3D computer vision and graphic applications. Learned descriptors are rapidly evolving and outperforming the classical handcrafted approaches in the field. Yet, to learn effective representations they require supervision through labeled data, which are cum...
Preprint
Full-text available
We introduce a novel architecture for neural disparity refinement aimed at facilitating deployment of 3D computer vision on cheap and widespread consumer devices, such as mobile phones. Our approach relies on a continuous formulation that enables to estimate a refined disparity map at any arbitrary output resolution. Thereby, it can handle effectiv...
Preprint
Unsupervised Domain Adaptation (UDA) for point cloud classification is an emerging research problem with relevant practical motivations. Reliance on multi-task learning to align features across domains has been the standard way to tackle it. In this paper, we take a different path and propose RefRec, the first approach to investigate pseudo-labels...
Preprint
Although recent semantic segmentation methods have made remarkable progress, they still rely on large amounts of annotated training data, which are often infeasible to collect in the autonomous driving scenario. Previous works usually tackle this issue with Unsupervised Domain Adaptation (UDA), which entails training a network on synthetic images a...
Preprint
Although deep neural networks have achieved remarkable results for the task of semantic segmentation, they usually fail to generalize towards new domains, especially when performing synthetic-to-real adaptation. Such domain shift is particularly noticeable along class boundaries, invalidating one of the main goals of semantic segmentation that cons...
Article
Full-text available
Self-aware robots rely on depth sensing to interact with the surrounding environment, e.g. to pursue object grasping. Yet, dealing with tiny items, often occurring in industrial robotics scenarios, may represent a challenge due to lack of sensors yielding sufficiently accurate depth measurements. Existing active sensors fail at measuring details of...
Article
Full-text available
Matching surfaces is a challenging 3D Computer Vision problem typically addressed by local features. Although a plethora of 3D feature detectors and descriptors have been proposed in literature, it is quite difficult to identify the most effective detector-descriptor pair in a certain application. Yet, it has been shown in recent works that machine...
Article
Full-text available
Depth estimation from stereo images is carried out with unmatched results by convolutional neural networks trained end-to-end to regress dense disparities. Like for most tasks, this is possible if large amounts of labelled samples are available for training, possibly covering the whole data distribution encountered at deployment time. Being such an...
Article
Precision Livestock Farming relies on several technological approaches to acquire in the most efficient way precise and up-to-date data concerning individual animals. In dairy farming, particular attention is paid to the automatic cow detection and tracking, as such information is closely related to animal welfare and thus to possible health issues...
Preprint
In this paper, we investigate how to effectively deploy deep learning in practical industrial settings, such as robotic grasping applications. When a deep-learning-based solution is proposed, it usually lacks any simple method to generate the training data. In the industrial field, where automation is the main goal, not bridging this gap is one of t...
Preprint
Defining and reliably finding a canonical orientation for 3D surfaces is key to many Computer Vision and Robotics applications. This task is commonly addressed by handcrafted algorithms exploiting geometric cues deemed as distinctive and robust by the designer. Yet, one might conjecture that humans learn the notion of the inherent orientation of 3D...
Chapter
Surface matching is a fundamental task in 3D computer vision, typically tackled by describing and matching local features computed from the 3D surface. As a result, description of local features lays the foundations for a variety of applications processing 3D data, such as 3D object recognition, 3D registration and reconstruction, and SLAM. A varie...
Preprint
Full-text available
Depth estimation from stereo images is carried out with unmatched results by convolutional neural networks trained end-to-end to regress dense disparities. Like for most tasks, this is possible if large amounts of labelled samples are available for training, possibly covering the whole data distribution encountered at deployment time. Being such an...
Preprint
Full-text available
Whole understanding of the surroundings is paramount to autonomous systems. Recent works have shown that deep neural networks can learn geometry (depth) and motion (optical flow) from a monocular video without any explicit supervision from ground truth annotations, particularly hard to source for these two tasks. In this paper, we take an additiona...
Preprint
Ambiguity is inherently present in many machine learning tasks, but especially for sequential models seldom accounted for, as most only output a single prediction. In this work we propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data, which is of special importance, as often mult...
Article
Ambiguity is inherently present in many machine learning tasks, but especially for sequential models seldom accounted for, as most only output a single prediction. In this work we propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data, which is of special importance, as often mult...
Preprint
Full-text available
Object recognition in 3D point clouds is a challenging task, mainly when time is an important factor to deal with, such as in industrial applications. Local descriptors are an amenable choice whenever the 6 DoF pose of recognized objects should also be estimated. However, the pipeline for this kind of descriptors is highly time-consuming. In this w...
Preprint
Availability of a few, large-size, annotated datasets, like ImageNet, Pascal VOC and COCO, has led deep learning to revolutionize computer vision research by achieving astonishing results in several vision tasks. We argue that new tools to facilitate generation of annotated datasets may help spreading data-driven AI throughout applications and dom...
Article
In this article, we propose augmented reality semiautomatic labeling (ARS), a semiautomatic method which leverages on moving a 2-D camera by means of a robot, providing precise camera tracking, and an augmented reality pen (ARP) to define the initial object bounding box, to create large labeled data sets with minimal human intervention. By removing th...
Preprint
Matching surfaces is a challenging 3D Computer Vision problem typically addressed by local features. Although a variety of 3D feature detectors and descriptors has been proposed in literature, they have seldom been proposed together and it is yet not clear how to identify the most effective detector-descriptor pair for a specific application. A pro...
Preprint
Establishing correspondences between 3D shapes is a fundamental task in 3D Computer Vision, typically addressed by matching local descriptors. Recently, a few attempts at applying the deep learning paradigm to the task have shown promising results. Yet, the only explored way to learn rotation invariant descriptors has been to feed neural networks w...
Article
Full-text available
State-of-the-art methods to infer dense and accurate depth measurements from images rely on deep CNN models trained in an end-to-end fashion on a significant amount of data. However, despite the outstanding performance achieved, these frameworks suffer a drastic drop in accuracy when dealing with unseen environments much different, concerning appea...
Preprint
Full-text available
State-of-the-art approaches to infer dense depth measurements from images rely on CNNs trained end-to-end on a vast amount of data. However, these approaches suffer a drastic drop in accuracy when dealing with environments much different in appearance and/or context from those observed at training time. This domain shift issue is usually addressed...
Chapter
Object recognition in 3D point clouds is a challenging task, mainly when time is an important factor to deal with, such as in industrial applications. Local descriptors are an amenable choice whenever the 6 DoF pose of recognized objects should also be estimated. However, the pipeline for this kind of descriptors is highly time-consuming. In this w...
Chapter
Matching surfaces is a challenging 3D Computer Vision problem typically addressed by local features. Although a variety of 3D feature detectors and descriptors has been proposed in literature, they have seldom been proposed together and it is yet not clear how to identify the most effective detector-descriptor pair for a specific application. A pro...
Preprint
Full-text available
In this paper, we propose Augmented Reality Semi-automatic labeling (ARS), a semi-automatic method which leverages on moving a 2D camera by means of a robot, providing precise camera tracking, and an augmented reality pen to define the initial object bounding box, to create large labeled datasets with minimal human intervention. By removing the burden of...
Preprint
Full-text available
Obtaining highly accurate depth from stereo images in real time has many applications across computer vision and robotics, but in some contexts, upper bounds on power consumption constrain the feasible hardware to embedded platforms such as FPGAs. Whilst various stereo algorithms have been deployed on these platforms, usually cut down to better mat...
Chapter
While robotic manipulation of rigid objects is quite straightforward, coping with deformable objects is an open issue. More specifically, tasks like tying a knot, wiring a connector or even surgical suturing deal with the domain of Deformable Linear Objects (DLOs). In particular the detection of a DLO is a non-trivial problem especially under clutt...
Chapter
Full-text available
Depth estimation from a single image represents a very exciting challenge in computer vision. While other image-based depth sensing techniques leverage on the geometry between different viewpoints (e.g., stereo or structure from motion), the lack of these cues within a single image renders ill-posed the monocular depth estimation task. For inferenc...
Article
Camera relocalisation is a key computer vision problem. Common techniques match the current image against keyframes with known poses, directly regress the pose, or estimate the pose using camera-world correspondences. Regression forests have become popular for establishing correspondences. They are accurate, but previously needed to be trained offl...
Preprint
Full-text available
Recent works have proven that many relevant visual tasks are closely related one to another. Yet, this connection is seldom deployed in practice due to the lack of practical methodologies to transfer learned concepts across different trains. In this work, we introduce a novel adaptation framework that can operate across both tasks and domains. Our f...
Preprint
Full-text available
Real world applications of stereo depth estimation require models that are robust to dynamic variations in the environment. Even though deep learning based stereo methods are successful, they often fail to generalize to unseen variations in the environment, making them less suitable for practical applications such as autonomous driving. In this wor...
Article
Full-text available
Obtaining highly accurate depth from stereo images in real time has many applications across computer vision and robotics, but in some contexts, upper bounds on power consumption constrain the feasible hardware to embedded platforms such as FPGAs. Whilst various stereo algorithms have been deployed on these platforms, usually cut down to better mat...
Article
Full-text available
Recognizing packaged grocery products based solely on appearance is still an open issue for modern computer vision systems due to peculiar challenges. Firstly, the number of different items to be recognized is huge (i.e., in the order of thousands) and rapidly changing over time. Moreover, there exists a significant domain shift between the images t...
Preprint
Full-text available
Recognizing packaged grocery products based solely on appearance is still an open issue for modern computer vision systems due to peculiar challenges. Firstly, the number of different items to be recognized is huge (i.e., in the order of thousands) and rapidly changing over time. Moreover, there exists a significant domain shift between the images t...