Josep R Casas

Universitat Politècnica de Catalunya | UPC · Department of Signal Theory and Communications (TSC)

PhD in Telecommunications Engineering

About

133
Publications
12,954
Reads
1,300
Citations
Introduction
I do not completely agree with ResearchGate's self-archiving policy, as I cannot consider ResearchGate my own web page... You can download my self-archived papers from https://imatge.upc.edu/web/josep
Additional affiliations
January 1991 - present
Universitat Politècnica de Catalunya

Publications (133)
Article
Full-text available
Wendelstein 7-X (W7-X) is the leading experiment on the path of demonstrating that stellarators are a feasible concept for a future power plant. One of its major goals is to prove quasi-steady-state operation in a reactor-relevant parameter regime. The surveillance and protection of the water-cooled plasma-facing components (PFCs) against overheati...
Preprint
Full-text available
Detecting pedestrians is a crucial task in autonomous driving systems to ensure the safety of drivers and pedestrians. The technologies involved in these algorithms must be precise and reliable, regardless of environment conditions. Relying solely on RGB cameras may not be enough to recognize road environments in situations where cameras cannot cap...
Article
Full-text available
This paper presents a novel calibration method for solid-state LiDAR devices based on a geometrical description of their scanning system, which has variable angular resolution. Determining this distortion across the entire Field-of-View of the system yields accurate and precise measurements which enable it to be combined with other sensors. On the...
Preprint
We propose a novel 3D segmentation method for RGB-D stream data to deal with the 3D object segmentation task in a generic scenario with frequent object interactions. It contributes in two main aspects, while being generic and not requiring initialization: firstly, a novel tree structure representation for the point cloud of the scene is proposed. Then...
Article
Full-text available
Semantic segmentation and depth estimation are two important tasks in computer vision, and many methods have been developed to tackle them. Commonly these two tasks are addressed independently, but recently the idea of merging these two problems into a sole framework has been studied under the assumption that integrating two highly correlated tasks...
Article
This document presents a novel method based on Convolutional Neural Networks (CNN) to obtain correspondence matchings between sets of keypoints of several unorganized 3D point cloud captures, independently of the sensor used. The proposed technique extends a state-of-the-art method for correspondence matching in standard 2D images to sets of unorga...
Article
Video segmentation is an important building block for high level applications such as scene understanding and interaction analysis. While outstanding results are achieved in this field by state-of-the-art learning and model based methods, they are restricted to certain types of scenes or require a large amount of annotated training data to achieve...
Article
Full-text available
Humanoid robots introduce instabilities during biped march that complicate the process of estimating their position and orientation over time. Tracking humanoid robots may be useful not only in typical applications such as navigation, but also in tasks that require benchmarking the multiple processes that involve registering measures about the performa...
Conference Paper
Commercial depth sensors represent an opportunity for automation of certain 3D production and analysis tasks. One way to overcome some of their inherent limitations is by capturing the same scene with several depth sensors and merging their data, i.e. by performing 3D data fusion, which requires the registration of point clouds from different senso...
Chapter
Given the widespread availability of point cloud data from consumer depth sensors, 3D segmentation becomes a promising building block for high level applications such as scene understanding and interaction analysis. It benefits from the richer information contained in actual world 3D data compared to apparent (projected) data in 2D images. This als...
Conference Paper
Given the widespread availability of point cloud data from consumer depth sensors, 3D point cloud segmentation becomes a promising building block for high level applications such as scene understanding and interaction analysis. It benefits from the richer information contained in real world 3D data compared to 2D images. This also implies that the...
Conference Paper
End-effectors are usually related to the location of the free end of a kinematic chain. Each of them contains rich structure information about the entity. Hence, estimating stable end-effectors of different entities enables robust tracking as well as a generic representation. In this paper, we present a system for end-effector estimation from RGB-D...
Article
Full-text available
In this paper, we propose a gesture-based interface designed to interact with panoramic scenes. The system combines novel static gestures with a fast hand tracking method. Our proposal is to use static gestures as shortcuts to activate functionalities of the system (e.g. volume up/down, mute, pause, etc.), and hand tracking to freely explore the pa...
Article
A method to obtain accurate hand gesture classification and fingertip localization from depth images is proposed. The Oriented Radial Distribution feature is utilized, exploiting its ability to globally describe hand poses, but also to locally detect fingertip positions. Hence, hand gesture and fingertip locations are characterized with a single fe...
Conference Paper
Full-text available
This work aims to promote collaboration and coordination among image/video processing courses, with the goal of enhancing learning outcomes. The main contributions are a) the creation of a shared bank of materials: graphical demonstrators, problem collections, quizzes, etc., and b) the establishment of strategies to...
Article
Full-text available
This article provides an alternative solution for the costly representation of multi-view video data, which can be used for both rendering and scene analysis. First, a new, efficient Monte Carlo discrete surface reconstruction method for foreground objects with static background is presented, which outperforms volumetric techniques and is suitable...
Conference Paper
The 'old' remote falls short of requirements when confronted with digital convergence for living room displays. Enriched options to watch, manage and interact with content on large displays demand improved means of interaction. Concurrently, gesture recognition is increasingly present in human-computer interaction for gaming applications. In this p...
Conference Paper
In this demo we present INTAIRACT, an online hand-based touchless interaction system. Interactions are based on easy-to-learn hand gestures which, combined with translations and rotations, render a user-friendly and highly configurable system. The main advantage with respect to existing approaches is that we are able to robustly locate and identify f...
Conference Paper
Full-text available
Temporal clustering of human motion into semantically meaningful behaviors is a challenging task. While unsupervised methods do well to some extent, the obtained clusters often lack a semantic interpretation. In this paper, we propose to learn what makes a sequence of human poses different from others such that it should be annotated as an action...
Conference Paper
Full-text available
In this paper we present a novel approach to markerless human motion capture that robustly integrates body part detections in multiple views. The proposed method fuses cues from multiple views to enhance the propagation and observation model of particle filtering methods aiming at human motion capture. We particularize our method to improve arm tra...
Article
This paper presents a human action recognition framework based on the theory of nonlinear dynamical systems. The ultimate aim of our method is to recognize actions from multi-view video. We estimate and represent human motion by means of a virtual skeleton model providing the basis for a view-invariant representation of human actions. Actions are m...
Conference Paper
Full-text available
End-effectors are considered to be the main topological extremities of a given 3D body. Even if the nature of such a body is not restricted, this paper focuses on the human body case. Detection of human extremities is a key issue in the human motion capture domain, being needed to initialize and update the tracker. Therefore, the effectiveness of hum...
Article
This document presents the implemented algorithms for human motion analysis within ACTIBIO. According to the systems developed in WP3 two main branches of analysis have been defined: model-based and model-free analysis. Model-based analysis targets methods that employ high-level parameters obtained from a body model. These parameters are basically...
Article
Full-text available
Acoustic event detection (AED) aims at determining the identity of sounds and their temporal position in audio signals. When applied to spontaneously generated acoustic events, AED based only on audio information shows a large number of errors, which are mostly due to temporal overlaps. Actually, temporal overlaps accounted for more than 70% of erro...
Article
Full-text available
This article presents a new approach to the problem of simultaneous tracking of several people in low-resolution sequences from multiple calibrated cameras. Redundancy among cameras is exploited to generate a discrete 3D colored representation of the scene, being the starting point of the processing chain. We review how the initiation and terminati...
Conference Paper
Full-text available
This paper presents a methodology for obtaining a 3D reconstruction of a dynamic scene in multi-camera settings. Our target is to derive a compact representation of the 3D scene which is effective and accurate, whatever the number of cameras and even for very-wide baseline settings. Easing realtime 3D scene capture has outstanding applications in 2...
Conference Paper
We present a novel method for upper body pose estimation with online initialization of pose and the anthropometric profile. Our method is based on a Hierarchical Particle Filter that defines its likelihood function with a single view depth map provided by a range sensor. We use Connected Operators on range data to detect hand and head candidates th...
Article
This paper presents a general analysis framework towards exploiting the underlying hierarchical and scalable structure of an articulated object for pose estimation and tracking. Scalable human body models are introduced as an ordered set of articulated models fulfilling an inclusive hierarchy. The concept of annealing is applied to derive a generic...
Conference Paper
Full-text available
In this paper we present a novel foreground segmentation and 3D reconstruction system for multi-view scenarios. The system achieves correct 3D object reconstruction even when foreground segmentation presents critical misses in some of the views. We introduce the spatial redundancy of the multi-view data into the foreground segmentation process by co...
Conference Paper
Full-text available
A novel real-time algorithm for head and hand tracking is proposed in this paper. This approach is based on data from a range camera, which is exploited to resolve ambiguities and overlaps. The position of the head is estimated with a depth-based template matching, its robustness being reinforced with an adaptive search zone. Hands are detected in...
Conference Paper
We present a real-time human body tracking system for a single user in a Smart Room scenario. In this paper we propose a novel system that involves a silhouette-based cost function using variable windows, a hierarchical optimization method, parallel implementations of pixel-based algorithms and efficient usage of a low-cost hardware structure. Resu...
Conference Paper
This paper presents a novel method for filtering and extraction of human body features from 3D data, either from multi-view images or range sensors. The proposed algorithm consists in processing the geodesic distances on a 3D surface representing the human body in order to find prominent maxima representing salient points of the human body. We intr...
Conference Paper
This paper presents a model-based hierarchical particle filtering algorithm to estimate the pose and anthropometric parameters of humans in multi-view environments. Our method incorporates a novel likelihood measurement approach consisting of an approximate partitioning of observations. Provided that a partitioning of the human body model has been...
Article
This paper presents the use of Time-of-Flight (ToF) cameras in smart-rooms and how this leads to improved results in segmenting the people in the room from the background and consequently better 3D reconstruction of foreground objects. A calibrated rig consisting of one Swissranger SR3100 Time-of-Flight range camera and a high resolution standard C...
Conference Paper
Full-text available
We address the problem of reconstructing 3D shapes from color data available in a sparse set of views from all directions of a scene. As an advantage when compared to multiview stereo approaches, our method is able to reconstruct object surfaces from a small number of views in wide-baseline setups. This introduces a trade-off between reconstruction...
Conference Paper
Full-text available
This paper considers the problem of reconstructing a surface from a point set. More specifically, we propose a method which focuses on obtaining fast surface reconstructions for visual purposes. The proposed scheme is based on propagation in a voxelized space, which is performed in the directions defined by a propagation pattern, during an optimal...
Conference Paper
This paper presents a fast algorithm for recovering shape and motion of dynamic scenes from sequences of silhouettes in calibrated multi-camera settings in wide-baseline configurations. The proposed method captures the static shape on the first frame by deforming an arbitrary set of points located on an initial surface enclosing the actual scene. T...
Conference Paper
This paper presents a system for 3D reconstruction from video sequences acquired in multi-camera environments. In particular, the 3D surfaces of foreground objects in the scene are extracted and represented by polygon meshes. Three stages are concatenated to process multi-view data. First, a foreground segmentation method extracts silhouettes of ob...
Article
Full-text available
This document defines the overall requirements that the FascinatE system should meet. The deliverable proposes three scenarios, depending on the configuration and functionality of the complete delivery chain, where to develop and study the possible FascinatE requirements. Peer Reviewed Postprint (published version)
Conference Paper
Full-text available
The current paper presents a low-complexity approach to the problem of simultaneous tracking of several people in low resolution sequences from multiple calibrated cameras. Redundancy among cameras is exploited to generate a discrete 3D colored representation of the scene. The proposed filtering technique estimates the centroid of a target using on...
Conference Paper
Full-text available
This paper presents a view-invariant approach to gait recognition in multi-camera scenarios exploiting a joint spatio-temporal data representation and analysis. First, multi-view information is employed to generate a 3D voxel reconstruction of the scene under study. The analyzed subject is tracked and its centroid and orientation allow recentering...
Chapter
Full-text available
It is a common experience in our modern world, for us humans to be overwhelmed by the complexities of technological artifacts around us, and by the attention they demand. While technology provides wonderful support and helpful assistance, it also causes an increased preoccupation with technology itself and a related fragmentation of attention. But...
Conference Paper
Full-text available
We propose a view-invariant representation of human appearance in multi-view scenarios consisting in a new set of views that overcome the view-dependency and moderate occlusion problems of fixed cameras. First, a 3D reconstruction of the scene is generated, from which we can track multiple persons in the scenario. For each tracked subject, we defin...
Article
The present document aims at reviewing the state-of-the-art algorithms for human detection and tracking that are relevant to ACTIBIO objectives. The document is split into two main parts: Human Detection and Human Body Tracking Preprint
Article
Full-text available
This paper presents a low-cost real-time alternative to available commercial human motion capture systems. First, a set of distinguishable markers are placed on several human body landmarks, and the scene is captured by a number of calibrated and synchronized cameras. In order to establish a physical relation among markers, a human body model is de...
Conference Paper
Full-text available
Illumination changes may lead to false foreground (FG) segmentation and tracking results. Most of the existing FG extraction algorithms obtain a background (BG) estimation from temporal statistical parameters. Such algorithms consider a quasi-static BG which changes only slowly. Therefore, fast illumination changes are not taken into account...
Conference Paper
Full-text available
This paper presents a low cost real-time alternative to available commercial human motion capture systems. First, a set of distinguishable markers are placed on several human body landmarks and the scene is captured by a number of calibrated and synchronized cameras. In order to establish a physical relation among markers, a human body model (HBM)...
Article
Full-text available
Tracking of unrestricted human movement has received great attention by the computer vision community basically fostered by the number of applications that benefit from it. Despite this research focus, there are few established mechanisms for evaluating and comparing the performance of reported solutions. Existing metrics to quantify the performa...
Conference Paper
Full-text available
This paper presents a view-independent approach to markerless human motion capture in low resolution sequences from multiple calibrated and synchronized cameras. Redundancy among cameras is exploited to generate a 3D voxelized representation of the scene and a human body model (HBM) is introduced towards analyzing these data. An annealed particle f...
Article
Full-text available
Acoustic events produced in meeting environments may contain useful information for perceptually aware interfaces and multimodal behavior analysis. In this paper, a system to detect and recognize these events from a multi-modal perspective is presented combining information from multiple cameras and microphones. First, spectral and temporal featu...
Article
Full-text available
At the Technical University of Catalonia (UPC), a smart room has been equipped with 85 microphones and 8 cameras. This paper describes the setup of the sensors, gives an overview of the underlying hardware and software infrastructure and indicates possibilities for high- and low-level multi-modal interaction. An example of usage of the information co...
Article
Full-text available
With this grant, the networked application LAVICAD (LAboratori VIrtual de COmunicacions Analògiques i Digitals, a virtual laboratory of analog and digital communications), which is offered in integrated form within the COM@WEB e-learning platform, has been extended. LAVICAD is a tool programmed in Java and Matlab and consists of a set of simulators of the physical layer of communication systems...
Article
Full-text available
The introduction of active (pan-tilt-zoom or PTZ) cameras in Smart Rooms in addition to fixed static cameras makes it possible to improve resolution in volumetric reconstruction, adding the capability to track smaller objects with higher precision in actual 3D world coordinates. To accomplish this goal, precise camera calibration data should be available for...
Conference Paper
Full-text available
The detection of the acoustic events (AEs) that are naturally produced in a meeting room may help to describe the human and social activity that takes place in it. When applied to spontaneous recordings, the detection of AEs from only audio information shows a large amount of errors, which are mostly due to temporal overlapping of sounds. In this p...
Chapter
When a person enters a room, he or she immediately develops a mental concept about “what is going on” in the room; for example, people may be working in the room, people may be engaged in a conversation, or the room may be empty. The CHIL services depend on just the same kind of semantic description, which is termed activity in the following. The “...
Chapter
The CHIL Memory Jog service focuses on facilitating the collaboration of participants in meetings, lectures, presentations, and other human interactive events, occurring in indoor CHIL spaces. It exploits the whole set of the perceptual components that have been developed by the CHIL Consortium partners (e.g., person tracking, face identification,...
Chapter
One of the most basic building blocks for the understanding of human actions and interactions is the accurate detection and tracking of persons in a scene. In constrained scenarios involving at most one subject, or in situations where persons can be confined to a controlled monitoring space or required to wear markers, sensors, or microphones, thes...