Vítězslav Beran's research while affiliated with Brno University of Technology and other places

Publications (31)

Conference Paper
In this work, we investigate the application of the Bag-of-Words approach to the object search task in the 3D domain. Image retrieval solutions, operating on datasets of thousands to millions of images, have proved the effectiveness of the Bag-of-Words approach. The availability of low-cost RGB-D cameras has led to a rise of large datasets of 3D data similar to im...
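A minimal sketch of the Bag-of-Words representation mentioned in this abstract: local descriptors are quantized to their nearest "visual word" in a codebook, and the image (or point cloud) is summarized by the word-frequency histogram. The codebook and descriptors below are toy values for illustration, not data from the paper.

```python
def nearest_word(descriptor, codebook):
    """Index of the codebook centroid closest to the descriptor (squared L2)."""
    dists = [sum((d - c) ** 2 for d, c in zip(descriptor, word))
             for word in codebook]
    return dists.index(min(dists))

def bow_histogram(descriptors, codebook):
    """Normalized visual-word histogram for a set of local descriptors."""
    counts = [0] * len(codebook)
    for desc in descriptors:
        counts[nearest_word(desc, codebook)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

# Toy 2-D codebook with three visual words.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.1), (0.9, 0.1), (0.1, 0.9), (0.0, 0.8)]
print(bow_histogram(descriptors, codebook))  # [0.25, 0.25, 0.5]
```

Two images (or 3D scans) can then be compared by a simple distance between their histograms, which is what makes the representation scale to large retrieval datasets.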
Article
Despite the remarkable progress of service robotics in recent years, a fully autonomous robot able to solve everyday household tasks in a safe and reliable manner still seems unachievable. Under certain circumstances, a robot’s abilities might be supported by a remote operator. In order to allow such support, we present a user...
Conference Paper
To date, methods from Human-Computer Interaction (HCI) have not been widely adopted in the development of Human-Robot Interaction (HRI) systems. In this paper, we describe a system prototype and a use case. The prototype is an augmented reality-based collaborative workspace. The envisioned solution is focused on small and medium enterprises (SME...
Conference Paper
The segmentation of sensory data from various domains is often a crucial pre-processing step in many computer vision methods and applications. In this work, we propose a method that leverages the quantization of local features' distributions for depth and temporal information. Three variants of the segmentation method are designed and evaluated...
Conference Paper
This paper presents a novel method that exploits depth information to boost the tracking performance of traditional RGB trackers for arbitrary objects (objects not known in advance), using object segmentation/separation supported by depth. The main focus is on real-time applications, such as robotics or surveillance, where exploitation of...
Conference Paper
Full-text available
In modern digital cinema production, extremely large volumes (on the order of tens of TB) of footage data are captured every day. Cataloging and reviewing such footage is nowadays a largely manual and time-consuming process. In our work, we aim at technical quality aspects, such as correct exposure and color compatibility of adjacent shots...
Conference Paper
This paper deals with a scene pre-processing task: depth image segmentation. The efficiency and accuracy of several methods for depth map segmentation are explored. To meet real-time constraints, state-of-the-art techniques had to be modified. Along with these modifications, new segmentation approaches are presented which aim at optimizing...
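One common baseline for the depth-segmentation task described here is region growing: neighbouring pixels are merged into one segment while their depth values stay close. A minimal sketch with a toy 3x3 depth map (the threshold and data are illustrative, not the paper's method):

```python
from collections import deque

def segment_depth(depth, max_step=0.1):
    """Label 4-connected regions of a depth map; a neighbouring pixel joins
    the current segment when its depth differs by at most max_step."""
    h, w = len(depth), len(depth[0])
    labels = [[-1] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != -1:
                continue
            labels[sy][sx] = next_label
            queue = deque([(sy, sx)])
            while queue:  # breadth-first region growing
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1
                            and abs(depth[ny][nx] - depth[y][x]) <= max_step):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels

depth = [[1.00, 1.05, 3.00],
         [1.02, 1.08, 3.05],
         [1.04, 3.05, 3.00]]
print(segment_depth(depth))  # [[0, 0, 1], [0, 0, 1], [0, 1, 1]]
```

The single pass over pixels is what makes such approaches attractive for the real-time constraints the abstract mentions.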
Conference Paper
This paper introduces an approach to real-time camera localization by capturing the plane of a chessboard pattern. This task has already been addressed by several different approaches, but we present a novel method of chessboard reconstruction from an incomplete image of it, which enables successful camera localization even if the captured chessbo...
Conference Paper
This paper explores techniques in the pipeline of image description based on visual codebooks suitable for on-line video processing. The pipeline components are (i) extraction and description of local image features, (ii) translation of each high-dimensional feature descriptor to the several most appropriate visual words selected from the discrete code...
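The second pipeline step, translating a descriptor to its several most appropriate visual words, is often implemented as soft assignment: the descriptor's weight is split over its k nearest codebook words instead of a single one. A hedged sketch (the inverse-distance weighting and toy values are illustrative assumptions, not the paper's exact scheme):

```python
def soft_assign(descriptor, codebook, k=2):
    """Distribute a descriptor's unit weight over its k nearest visual
    words, with weights inversely proportional to squared distance."""
    dists = [(sum((d - c) ** 2 for d, c in zip(descriptor, word)), i)
             for i, word in enumerate(codebook)]
    nearest = sorted(dists)[:k]                      # k closest (dist, index)
    weights = [1.0 / (d + 1e-9) for d, _ in nearest]  # avoid division by zero
    total = sum(weights)
    return {i: w / total for (_, i), w in zip(nearest, weights)}

# A descriptor exactly between words 0 and 1 gets half its weight on each.
print(soft_assign((0.5,), [(0.0,), (1.0,), (2.0,)]))  # {0: 0.5, 1: 0.5}
```

Soft assignment trades a slightly larger description for robustness near codebook-cell boundaries, which matters in on-line settings where each frame is quantized only once.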
Conference Paper
Full-text available
In this paper we describe our experiments in the High-level feature extraction (HLF) and Search tasks of the 2009 TRECVid evaluation. This year, we have concentrated mainly on the local (affine covariant) image features and their transformation into a searchable form, especially using the indexing techniques. In brief, we have submitted the following r...
Conference Paper
This paper describes the video summarization system built for the TRECVID 2008 evaluation by the Brno team. Motivations for the system design and its overall structure are described, followed by a more detailed description of the critical parts of the system. Low-level features, which are extracted from each frame, are clustered to group visually simi...
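The clustering of per-frame low-level features mentioned above can be sketched with a simple greedy on-line scheme: each frame joins the first existing cluster whose representative is close enough, otherwise it starts a new cluster. This is an illustrative baseline, not the Brno system's actual algorithm; the 1-D "feature vectors" are toy data.

```python
def cluster_frames(features, threshold=0.5):
    """Greedy on-line clustering of per-frame feature vectors: a frame joins
    the first cluster whose representative (its first frame) is within
    `threshold` in L1 distance; otherwise it opens a new cluster."""
    clusters = []  # list of (representative_feature, [frame indices])
    for idx, feat in enumerate(features):
        for rep, members in clusters:
            if sum(abs(a - b) for a, b in zip(feat, rep)) < threshold:
                members.append(idx)
                break
        else:
            clusters.append((feat, [idx]))
    return [members for _, members in clusters]

# Toy per-frame features: frames 0-1 look alike, frames 2-3 look alike.
frames = [(0.1,), (0.2,), (0.9,), (1.0,)]
print(cluster_frames(frames, threshold=0.3))  # [[0, 1], [2, 3]]
```

A summary can then keep one representative frame per cluster, which is the basic idea behind grouping visually similar frames.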
Conference Paper
Full-text available
This paper describes the video summarization system built for the TRECVID 2007 evaluation by the Brno team. Motivations for the system design and its overall structure are described, followed by a more detailed description of the critical parts of the system, which are feature extraction and clustering of frames (shots, sub-shots) in the time domain. Many...
Conference Paper
Classifiers used in image processing and computer vision are a frequent subject of research and are widely exploited in applications. This contribution does not directly involve research in classification itself but rather introduces a systematic approach to the evaluation of image classifiers, comparison between classifiers, and "tuning" of the classifiers...
Conference Paper
Full-text available
In this paper, we present the findings of the Augmented Multiparty Interaction (AMI) project investigation on the localization and tracking of 2D head positions in meetings. The focus of the study was to test and evaluate various multi-person tracking methods developed in the project using a standardized data set and evaluation methodology.
Conference Paper
Full-text available
In this paper, we present the findings of the Augmented Multiparty Interaction (AMI) project investigation on the localization and tracking of 2D head positions in meetings. The focus of the study was to test and evaluate various multi-person tracking methods developed in the project using a standardized data set and evaluation methodology. One of...
Article
Full-text available
In this paper, we present the findings of the Augmented Multiparty Interaction (AMI) project investigation on the localization and tracking of 2D head positions in meetings. The focus of the study was to test and evaluate various multi-person tracking methods developed in the project using a standardized data set and evaluation methodology.
Conference Paper
This paper presents improvements carried out to enhance the visual interaction of computer users in existing communication systems. These include the usage of augmented reality techniques and the modification of a method for user model reconstruction according to the particular requirements of such applications. The promised achievement is to prepare the bac...
Article
Visual cues, such as gesturing, looking at each other, or monitoring each other's facial expressions, play an important role in meetings. Such information can be used for indexing multimedia meeting recordings, a task receiving strong attention nowadays. The use of an omnidirectional system in such situations brings many advantages, such as portability,...
Article
This paper presents an approach to on-line video motion segmentation. Common methods were designed for off-line processing, where the time to process one frame is not critical and varies from minutes to hours. The motivation for our work was an application in robotic perception, where high computational speed is required. The main contribution of...
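The simplest on-line motion-segmentation baseline against which such work is usually measured is per-pixel frame differencing, which needs only one pass per frame. A minimal sketch on toy grayscale frames (this illustrates the general task, not the paper's method):

```python
def motion_mask(prev, curr, threshold=10):
    """Binary motion mask by per-pixel frame differencing: a pixel is
    marked as moving when its intensity changed by more than `threshold`."""
    return [[1 if abs(c - p) > threshold else 0
             for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]

prev = [[10, 10, 10],
        [10, 10, 10]]
curr = [[10, 50, 10],
        [10, 60, 12]]
print(motion_mask(prev, curr))  # [[0, 1, 0], [0, 1, 0]]
```

Because the whole computation is a single per-pixel comparison, it runs comfortably within the real-time budget that robotic perception demands; more elaborate methods then trade speed for robustness to noise and camera motion.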
Article
Full-text available
This paper presents a procedure for on-line visual-content-based video synchronization. The motivation for our pioneering work is that several off-line video processing systems exist for video classification or summarization applications, but there is no evidence of on-line solutions for video analysis. In some applications, the video streams...
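Content-based synchronization of two streams can be sketched by sliding one per-frame feature signal against the other and picking the lag with the best match. The alignment cost below (mean squared difference over the overlap) and the toy signals are illustrative assumptions, not the paper's actual procedure:

```python
def best_offset(sig_a, sig_b, max_lag=5):
    """Frame offset that best aligns two per-frame feature signals,
    chosen by minimizing the mean squared difference over the overlap
    for each candidate lag in [-max_lag, max_lag]."""
    best, best_cost = 0, float("inf")
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(sig_a[i], sig_b[i + lag])
                 for i in range(len(sig_a))
                 if 0 <= i + lag < len(sig_b)]
        if not pairs:
            continue
        cost = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
        if cost < best_cost:
            best, best_cost = lag, cost
    return best

# sig_b is sig_a delayed by two frames, so the recovered offset is 2.
sig_a = [0, 1, 4, 9, 4, 1, 0, 0]
sig_b = [0, 0, 0, 1, 4, 9, 4, 1]
print(best_offset(sig_a, sig_b))  # 2
```

In an on-line setting the same idea is applied over a sliding window, so the offset estimate is refined as new frames arrive rather than after the whole video is available.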

Citations

... The best way in which to use collected data to improve simulations in both a computationally tractable and useful manner remains an open question. Virtual reality (VR) and augmented reality (AR) systems, among other types of simulations, demonstrate promise in terms of reducing integration times and shortening the simulation-to-programming pipeline by enabling line workers and others to quickly prototype potential solutions without needing to program robots at a low level. Burghardt et al. (2020) introduced an integrated VR and digital twin system for programming industrial robots, while results from work by Kapinus et al. (2020) and Gadre et al. (2019) suggested that mixed- and augmented-reality interfaces using head-mounted displays reduce integration time and user workload and improve usability compared with traditional programming interfaces. However, implementation challenges remain for both VR and AR systems, including extensive infrastructure requirements and human factors-related issues (such as motion sickness). ...
... [Flattened table from the citing survey: robot platforms grouped as Robotic Arms, Drones, Mobile Robots, Humanoid Robots, Vehicles, Actuated Objects, Combinations, and Other Types; interaction ratios 1:1, 1:m, n:1, n:m; and scale/distance groupings (Small to Large, Near to Far). The bracketed numbers are entries in the citing work's bibliography.] ...
... Auditory cues are used to make the robot's behavior more transparent [3,7], or to assist in localization [9]. Bazilinskyy et al. [5] investigated different methods of auditory representation of distance, including Beep Repetition Rate (BRR), where beep time and inter-beep time were a function of distance; Sound Intensity (SI), where the sound volume was a function of distance; and Sound Fundamental Frequency (SFF), where the frequency was a function of distance. ...
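The three distance-to-sound mappings named in this snippet (BRR, SI, SFF) can be sketched as simple monotone functions of distance. The specific ranges and linear form below are illustrative assumptions, not values from the cited study:

```python
def auditory_cues(distance, d_max=5.0):
    """Toy versions of three auditory distance representations: beep
    repetition rate, sound intensity, and fundamental frequency, all
    increasing as the measured distance shrinks. Constants are
    illustrative, not taken from Bazilinskyy et al."""
    d = max(0.0, min(distance, d_max))
    closeness = 1.0 - d / d_max               # 0.0 far .. 1.0 at contact
    brr_hz = 1.0 + 9.0 * closeness            # BRR: 1..10 beeps per second
    intensity_db = 40.0 + 40.0 * closeness    # SI: 40..80 dB
    freq_hz = 200.0 + 800.0 * closeness       # SFF: 200..1000 Hz
    return brr_hz, intensity_db, freq_hz

print(auditory_cues(5.0))  # (1.0, 40.0, 200.0)   -- far away
print(auditory_cues(0.0))  # (10.0, 80.0, 1000.0) -- at the obstacle
```

In practice the three cues are alternatives rather than a bundle: a system would drive its beeper or tone generator with whichever single mapping users discriminate best.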
... Other solutions worth exploring are those that integrate new emerging technologies such as Augmented Reality (AR) [96], Virtual Reality (VR) [97], or Digital Twins (DT) [98], to name a few. These technologies increase the safety of testing environments [99,100] and decrease the ergonomic load perceived by the user, encouraging more natural human-robot interactions [101][102][103]. In particular, the use of DT in industrial applications shows promising results in increased safety and interaction [104][105][106], but also in reduced implementation time and costs [107]. ...
... Variants of the "bag-of-words", such as a "bag-of-keypoints" [21] or "bag-of-features" [22], have also been used in image recognition and, more specifically, in related tasks such as 3D object classification [23] and video analysis [24]. In these domains, the methodology has proven to be quite versatile, concerning, e.g., the ability to work with arbitrarily sized images [25]. ...
... In teleoperation systems, the movements of the slave robot can be controlled by various interfaces, such as a joystick [8], a haptic interface [9], [10], a 3-D mouse [11], [12], or motion capture systems [13], [14], [15], [16], as in our case. Among these, direct motion mapping has been found to be the most intuitive and effective method for the user to operate a robot [17]. ...
... Figure 10 suggests that there are less-explored application domains that can be investigated in the future, including design and creative tasks, remote collaboration, and workspace applications. [Flattened table from the citing survey: application domains grouped as Manufacturing, Maintenance, Safety and Inspection, Automation and Teleoperation, Logistics, Aerospace, Household Tasks, Photography, Advertisement, Wearables and Interactive Devices, Assistance and Companionship, Tour and Exhibition Guides, Games, Storytelling, Enhanced Displays, Music, Festivals, Aquarium, Remote Teaching, and Training, each with example applications; the bracketed numbers are entries in the citing work's bibliography.] ...
... Based on a literature review and the current state of the technology, we see SAR as the most suitable instrument to visualize a user interface within a task context. While previous research has shown that gesture control is the preferred input modality for setting the parameters of common industrial tasks, we decided to use a touch-enabled table, which was also rated highly [11] and which is much more reliable. Moreover, together with SAR, it creates a user experience similar to tablets and smartphones, the usage of which is well known to the general public. ...
... Chrapek et al. [31] extended the RGB tracker TLD (Tracking-Learning-Detection) [32] to depth sequences. They used the depth image as an additional feature in the tracking phase to improve feature quality. ...
... Various parameter evaluations and system rankings based on mosaic creation have been proposed in the literature, such as ranking of orientation tracking systems [9], ranking of electro-optical (EO) systems [10], radiographic quality [11], ranking of radiographic digital systems [12], quality of size, shape and position of the image layer in panoramic radiographic images [13], quality of video compression [14], and ranking of tracking methods [15]. ...