
Włodzimierz Kasprzak
- Ph.D., D.Sc., Prof.
- Professor at Warsaw University of Technology
About
107
Publications
22,212
Reads
857
Citations
Publications (107)
Research results on human activity classification in video are described, based on initial human skeleton estimation in video frames. Both single person actions and two-person interactions are considered. The initial skeleton data is estimated in selected video frames by OpenPose, HRNet or other dedicated library. Important contributions of present...
A “long short-term memory” (LSTM)-based human activity classifier is presented for skeleton data estimated in video frames. A strong feature engineering step precedes the deep neural network processing. The video was analyzed in short-time chunks created by a sliding window. A fixed number of video frames was selected for every chunk and human skel...
A method of human skeleton-tracking and -refinement, and feature extraction for two-person interaction recognition in video is proposed. Its purpose is to properly reassign the same person-representing skeletons, approximate the missing joints and extract meaningful relational features. In addition, based on the created feature streams, two differe...
An approach to human action classification in videos is presented, based on knowledge-aware initial features extracted from human skeleton data and on further processing by convolutional networks. The proposed smart tracking of skeleton joints, approximation of missing joints and normalization of skeleton data are important steps of feature extract...
We consider the problem of image set comparison, i.e., to determine whether two image sets show the same unique object (approximately) from the same viewpoints. Our proposition is to solve it by a multi-stream fusion of several image recognition paths. Immediate applications of this method can be found in fraud detection, deduplication procedure, o...
A unique problem is considered: how to automatically determine whether two images (pictures, drawings, or photos) show the same 3D object pictured in different views. Similarity or equality of the objects seen despite the spotting, size or angle of view is one of the biggest challenges. Thus, a unique proposition of solving this task is proposed. I...
The goal of the research reported here was to investigate whether the design methodology utilising embodied agents can be applied to produce a multi-modal human–computer interface for cyberspace events visualisation control. This methodology requires that the designed system structure be defined in terms of cooperating agents having well-defined in...
Police and various security services use video analysis for securing public space, mass events, and when investigating criminal activity. Due to a huge amount of data supplied to surveillance systems, some automatic data processing is a necessity. In one typical scenario, an operator marks an object in an image frame and searches for all occurrence...
Nutations of plant organs are significantly affected by the circatidal modulation in the gravitational force exerted by the Moon and Sun (lunisolar tidal acceleration, Etide). In a previous study on nutational rotations of stem apices, we observed abrupt alterations in their direction and irregularities of the recorded trajectories. Such transition...
A computational framework for 3D object recognition in RGB-D images is presented. The focus is on computer vision applications in indoor autonomous robotics, where objects need to be recognized either for the purpose of being grasped and manipulated by the robot, or where the entire scene must be recognized to allow high-level cognitive tasks to be...
This paper presents a method of designing variable structure control systems for robots. As the on-board robot computational resources are limited, but in some cases the demands imposed on the robot by the user are virtually limitless, the solution is to produce a variable structure system. The task dependent part has to be exchanged, however the t...
This paper proposes and evaluates a watermarking-based approach to certify the authenticity of iris images when they are captured by genuine equipment. In the proposed method, the iris images are secretly signed before being used in biometric processes, and the resulting signature is embedded into the JPEG carrier image in the DCT domain in a dat...
This paper proposes an architecture for tactile-based
fabric learning and classification. The architecture is based on
a number of SVM-based learning units, which we call fabric
classification cores, specifically trained to discriminate between
two fabrics. Each core is based on a specific subset of the fully
available set of features, on the basis...
The article presents a comprehensive strategy of door parts (locks, handles, doorplates etc.) examination as a paradigm of active sensing. It covers the whole process – from segmentation, through initial hypothesis generation based on fuzzy inference, to final recognition and precise localization of the keyholes in a robot base coordinate system. T...
The article focuses on the problem of building dense 3D occupancy maps using commercial RGB-D sensors and the SLAM approach. In particular, it addresses the problem of 3D map representations, that must be able to both store millions of points and to offer efficient update mechanisms. The proposed solution consists of two such key elements, the visu...
The analysis of infrared (IR) images obtained from a robot-mounted camera is presented, with the purpose to reconstruct the 3D surface of texture-less objects located in close range to the camera. The prospective application of this approach is object’s pose recognition with the aim of object grasping by a robot hand. Algorithms are developed that...
A concept learning algorithm is developed, which uses the visual information generated by a virtual receptor in a robotic system (e.g. symbolic image segments) to create learning examples. Its goal is to detect similarities in the training data and to create an appropriate object model. The version-space, intended to describe the possible concept h...
Steganography methods are proposed for the authentication of the holder's photo in an ICAO-consistent (travel) document. The embedded message is heavily influenced by the print-scan process, as the electronic image is first printed to be included into the document (or identity card) and is scanned next to constitute the reference template in an aut...
Robots have to perform diverse and complex tasks. To face the limitations of the computational capabilities of robot on-board control computer, it is required to split the control systems between the robot embedded and the cloud computational resources. This paper presents a reconfigurable control architecture for a robot designed to meet this requ...
To operate autonomously a robot system needs among others to perceive the environment and to recognize the scene objects. In particular, nowadays an RGB-D sensor can be applied for vision-based perception. In this paper, two data-driven RGB-D image analysis steps, required for a reliable 3D object recognition process, are studied and appropriate al...
Cloud robotics is becoming a trend in the modern robotics field, as it became evident that true artificial intelligence can be achieved only by sharing collective knowledge. In the ICT area, the most common way to formulate knowledge is via the ontology concept, where different meanings connect semantically. Additionally, a considerable effort to m...
There is a growing use of RGB-D sensors in vision-based robot perception. Reliable 3D object recognition requires the integration of image-driven and model-based analysis. Only then can the low-level image-like representation be successfully transformed into a symbolic description with equivalent semantics, considered by the ontology-level repres...
Nowadays, the research on robot on-map localization while using landmarks is more intensively dealing with visual code recognition. One of the most popular landmarks of this type is the QR-code. This paper is devoted to the experimental evaluation of vision-based on-map localization procedures that apply QR-codes or NAO marks, as implemented in ser...
The article is devoted to the evaluation of performance of image features with binary descriptors for the purpose of their utilization in recognition of objects by service robots. In the conducted experiments we used the dataset and followed the methodology proposed by Mikolajczyk and Schmid. The performance analysis takes into account the discrimi...
The paper presents the application of artificial intelligence tools for the path planning of complex multi-agent robotic systems. In particular, a solution is proposed to the planning problem for the conjoint operation of two or more mobile robotic fixtures used for the manufacturing of large workpieces, like those used in the aerospace industry. S...
The paper presents a method of placement for the movable supporting heads of a self-reconfigurable robotic fixture system. The whole system consists of two mobile platforms carrying parallel type manipulators equipped with deformable heads. The heads are providing the dynamic support for large flexible workpieces during machining. The two heads con...
The paper presents the application of artificial intelligence tools for the path planning of complex multi-agent robotic systems. In particular, a solution is proposed to the planning problem for the conjoint operation of two or more mobile robotic fixtures used for the manufacturing of large workpieces, like those used in the aerospace industry. S...
The paper presents a method of placement and relocation of the movable components of a self-reconfigurable robotic fixture system. The system consists of a set of mobile platforms carrying parallel type manipulators equipped with deformable heads. The heads support large flexible workpieces during machining. The machined workpieces have complex spa...
We propose a general-purpose virtual receptor for 3D robot vision based on RGB-D sensor data. The application independent robot vision framework performs two basic tasks: it creates a 3D metric map of the environment and it recognizes basic 3D solids and 2D textures and shapes. The design methodology follows the principle of knowledge-based systems...
The paper deals with the problem of recognition of 3D objects for the purpose of their subsequent grasping and manipulation by a two-handed robot. We describe the idea of a general framework for object recognition rooted in the compositional model of the world. This approach treats complex objects as entities constructed of simpler, elementary one...
The work solves the problem of task and motion planning of a self-reconfigurable fixture system. A feasible solution is a key requirement for the viability of such systems, which have raised hopes of overcoming the deficiencies that more traditional fixtures are recognized to have in the dynamic conditions of modern manufacturing, with its increasi...
Current developments in electronics and computer science make it possible to replace different types of paper or plastic personal documents with their electronic counterparts. Data security and authenticity are of primary concern here. State-of-the-art protection is provided by PKI (according to the ICAO 9303 norm for e-passports); however, for older, n...
Purpose
Machining fixtures must fit the work piece exactly to support it appropriately. Even a slight change in the design of the work piece renders the costly fixture useless. Substitution of traditional fixtures by a programmable multi‐robot system supporting the work pieces requires a specific control system and a specific programming method enabl...
This paper presents an approach to letter-to-sound translation for the Polish language that is a part of a speech recognition system. It describes the process of automatic generation of Polish letter-to-sound (LTS) rules. The LTS rules were trained with a Polish phonetic lexicon, that was extracted from the “wictionary” - a Polish on-line dictionar...
The use of steganography and watermarking techniques for a secure identification and automatic authentication of the holder’s photo in an ICAO-consistent (travel) document is proposed. A specific distortion of the hidden watermarks is caused by the print-scan process, as a printed photo is scanned to constitute the reference pattern in face verific...
Computer vision plays an increasing role in robotics, as the computing power of modern computers grows year by year allowing more advanced algorithms to be implemented. Along with visual information, depth is also widely used in navigation of mobile robots, for example for obstacle detection. As cheap depth sensors become popular nowadays, there is...
An integrated segmentation approach for color images and depth maps is proposed. The 3D pointclouds are characterized by normal vectors and then grouped into planar, concave or convex faces. The empty regions in the depth map are filled by segments of the associated color image. In the experimental part two types of depth maps are analysed: generat...
We propose two methods for the disambiguation of results in time-delay based detection and localization of sound sources, when a triangle of microphones is applied for signal acquisition. A standard approach is to create histograms of time differences of arrival (TDOA) for each microphone pair in a triangular array and to create an averaged histogr...
A model-based object recognition in video and depth images is proposed for the purpose of semantic map creation in mobile robotics. Three types of objects are modeled: a human silhouette, a chair/table and corridor walls. A bi-driven hypothesis generation and verification strategy is outlined. The object model includes hierarchic semantic nets, c...
Hand gesture recognition based on free-form contours and probabilistic inference
A computer vision system is described that captures color image sequences, detects and recognizes static hand poses (i.e., "letters") and interprets pose sequences in terms of gestures (i.e., "words"). The hand object is detected with a double-active contour-based meth...
Estimation and tracking of fundamental, 2nd and 3d harmonic frequencies for spectrogram normalization in speech recognition
A stable and accurate estimation of the fundamental frequency (pitch, F0) is an important requirement in speech and music signal analysis, in tasks like automatic speech recognition and extraction of target signal in noisy en...
The paper presents a stochastic approach to articulated hand (palm shape) tracking in images. The gesture model is given in terms of a Dynamic Bayesian network that incorporates a Hidden Markov Model in order to utilize prior information on gesture structure in the tracking task. The Deformable Templates methodology is applied for hand shape modeli...
A stochastic approach to spoken sentence recognition is proposed for the purpose of an automatic voice-based dialogue system.
Three main tasks are distinguished: word recognition, word chain filtering and sentence recognition. The first task is solved
by typical acoustic processing followed by phonetic word recognition with the use of Hidden Markov...
We are developing two crucial improvements on the time-frequency masking approach to the blind speech separation of underdetermined mixtures when processing anechoic and echoic mixtures. First, the proposed method copes with the usually large amount of delay estimation error that appears in a low frequency band. This step generates a restrictive ma...
A general (application independent) computer vision framework is proposed. It follows the methodology of knowledge-based systems, dividing a system into knowledge base and control. We choose procedural semantic networks for object-oriented modelling of the world. It is basically a non-monotonic logical system. Several inference rules are proposed t...
A general (application independent) framework for the recognition of partially hidden 3-D objects in images is presented.
It views the model-to-image matching as a constraint satisfaction problem (CSP) supported by Bayesian net-based evaluation
of partial variable assignments. A modified incremental search for CSP is designed that allows partial so...
The time-frequency masking approach in blind speech extraction consists of two main steps: feature clustering in a space spanned over delay-time and attenuation rate, and spectrogram masking in order to reconstruct the sources. Usually a binary mask is generated under the strong W-disjoint orthogonal (WDO) assumption (disjoint orthogonal representa...
A planner for a self adaptable and reconfigurable fixture system is proposed. The system is composed of mobile support agents
that support thin sheet metal parts to minimize part dimensional deformation during drilling and milling operations. Compliant
sheet metal parts are widely used in various manufacturing processes including automotive and aer...
Image and speech recognition - a textbook in Polish (published in 2009).
See lecture notes and exercises in English (published in 2012)
In this paper constrained contour models are applied for hand posture recognition in single color images. In particular, the proposed algorithm utilizes a class of physics-based modelling methods called Deformable Templates [1],[2],[3]. After color-based image segmentation a contour hypothesis is detected and some features are extracted, suitable f...
The Rubik's cube puzzle is seen as a benchmark for service robots. In such an application, a computer vision subsystem is required to locate the object in space and to determine the configuration of its colored cells. This paper presents a robust algorithm for Rubik's cube reconstruction from a single view in real time. An issue of special interest...
The technique of independent component analysis (ICA) in Fourier space is proposed for the detection of base functions, that can be applied both in speech coding and contour-based shape description in digital images. Our aim is to compare our coding scheme with the Mel cepstrum features of speech and the complex contour Fourier features of image co...
We propose an approach to hand sign interpretation in images that is based on active contour tracking. We can decompose our approach into 5 steps: a color-based skin pixel detection, a double hand contour detection, the localization of fingers and palm (the hand description generation), the detection of a final position (with respect to considered sig...
An important source of information about digital image content is the texture of image regions. This paper presents a feature extraction approach that is based on independent component analysis (ICA). In ICA a transformation of measured vectored time series is discovered via blind signal processing that gives statistically independent source s...
We apply the technique of independent component analysis to Fourier power coefficients of speech signal frames for a blind
detection of basic vectors (sources). A subset of sources corresponding to the noisy influence of basic frequency is identified
and its corresponding features could be eliminated. The mixing coefficients for such sources are th...
The technique of independent component analysis (ICA) is applied for texture feature detection. In ICA an optimal transformation
(with respect to the statistical structure of the image samples set) is discovered via blind signal processing. Any texture
is considered as a mixture of independent sources (basic functions of detected transformation). E...
We propose a blind signal extraction approach to the extraction of binary and sparse images from their under-determined mixtures,
i.e. when the number of sensors is lower by one than the number of unknown sources. A practically feasible solution is proposed
for constrained classes of images, i.e. sparse, binary-valued and dynamically-constrained sou...
In autonomous indoor navigation some number of localizations and orientations of the vehicle can be learned in advance. No
artificial landmarks are required to exist. We describe and compare the detection of several global features of color images
(sensor data). This constitutes the measurement process in a self-localization approach that is based...
In order to understand commands given through voice by an operator, user or any human, a robot needs to focus on a single source, to acquire a clear speech sample and to recognize it. A two-step approach to the deconvolution of speech and sound mixtures in the time-domain is proposed. At first, we apply a deconvolution procedure, constrained in the...
An approach to speech feature detection is developed, which uses the technique of independent component analysis for a blind (unsupervised learning) detection of basic vectors in the Fourier space. This kind of features could replace the Mel Frequency Cepstrum Coefficient (MFCC) features, widely used today for phoneme-based speech recognition. Alt...
In multichannel blind deconvolution (MBD) the goal is to calculate possibly scaled and delayed estimates of source signals from their convoluted mixtures, using approximate knowledge of the source characteristics only. Nearly all of the solutions to MBD proposed so far require source signals to be pair-wise statistically independent and to...
An approach to multi-channel blind deconvolution is developed, which uses an adaptive filter that performs blind source separation in the Fourier space. The approach keeps (during the learning process) the same permutation and provides appropriate scaling of components for all frequency bins in the frequency space. Experiments verify a proper blind...
Natural landmarks are assumed to exist in the environment. Global color image features are extracted from sensor data to feed the robot’s self-localization approach. The color features correspond to natural landmarks, that are learned by the navigation sub-system. During the localization process, which is a Bayes filtering of a Markov environment,...
A method for discrete self-localization of an autonomous mobile system was proposed. One of its many possible implementations was designed, that uses a camera subsystem, which delivers sensor information about the environment reduced to an n-elementary measurement vector. Three different algorithms of image analysis were proposed and implemented. T...
An application-oriented vision-based traffic scene sensor system is designed. Its most important vision modules are identified
and their algorithms are described in detail: the on-line auto-calibration modules and three optional modules for 2-D measurement
tasks (i.e. queue length detection, license plate identification and vehicle classification)...
Computer vision applications for traffic scene analysis and autonomous navigation (driver support) require highly sophisticated sensors and computation methods – they constitute a real challenge for image analysis systems. Common to both applications is the moving object detection/tracking task. In this paper we study this task on four different d...
of image analysis, i.e. adaptive algorithms for the problem of many image source restoration from their mixtures (on the signal processing level), an adaptive image compression and classification approach (on the iconic level) and a visual motion detection/estimation (on the segmentation level). In the second part of this work, a general scheme for the...
Blind source separation problems have recently drawn a lot of attention in unsupervised neural learning. In the current approaches, the number of sources is typically assumed to be known in advance, but this does not usually hold in practical applications. In this paper, various neural network architectures and associated adaptive learning algorith...
A domain-independent tree search algorithm for semantic network-based image understanding systems is proposed. The basic transition operators for this search, that provide search space expansion, have been designed for a (hierarchical) model-to-image match. In this paper two operators for data-dependent matching are additionally defined. The...
An approach to road recognition and ego-state tracking in monocular image sequences of traffic scenes is described. The main contribution of this paper is the adaptive recognition scheme, which deals with competitive road hypotheses, and its application in several processing steps of an image sequence analysis system. No manual initialization of th...
On-line adaptive learning algorithms for cancellation of additive, convolutive noise from linear mixtures of sources with a simultaneous blind source separation are developed. Associated neural network architectures are proposed. A simple convolutive noise model is assumed, i.e. the unknown additive noise in each channel is a (FIR) filtering versio...
Noise is an unavoidable factor in real sensor signals. We study how additive and convolutive noise can be reduced or even eliminated in the blind source separation (BSS) problem. Particular attention is paid to cases in which the number of sensors is larger than the number of sources. We propose various methods and associated adaptive learning algo...
In this paper an adaptive approach to the cancellation of
additive, convolutional noise from many-source mixtures with
simultaneous blind source separation is proposed. Associated neural
network learning algorithms are developed on the basis of the
decorrelation principle and energy minimization of the output signals.
The reference noise is transfo...
It is known that the independent component analysis (ICA) (also
called blind source separation) can be applied only if the number of
received signals (sensors) is at least equal to the number of mixed
sources, contained in the sensor signals. In this paper an application
of the ICA is proposed for hidden (secured) image transmission by
communicatio...
In this paper a robust method for visual motion estimation under ego-motion is developed. The possible application of this method is image sequence analysis of road traffic or airport runway/taxiway scenes, where the camera is located in a moving vehicle. The method combines an application independent estimation of visual motion with specific meth...
Novel on-line learning algorithms with self adaptive learning
rates (parameters) for blind separation of signals are proposed. The
main motivation for development of new learning rules is to improve
convergence speed and to reduce cross-talk, especially for
non-stationary signals. Furthermore, we have discovered that under some
conditions the propo...
In this paper a robust method for visual motion estimation under ego-motion is developed. The possible application of this method is image sequence analysis of road traffic or airport runway/taxiway scenes, where the camera is located in a moving vehicle. The method combines an application independent estimation of visual motion with specific metho...
In this paper we propose improved, high speed convergence algorithms for principal subspace analysis (PSA) and related principal component analysis (PCA). We have confirmed by computer simulations that applying the recursive least squares (RLS) technique together with deflation preprocessing dramatically improves the performance and reduces the train...
In this paper a neural network approach for reconstruction of natural highly correlated images from linear (additive) mixture of them is proposed. A multi-layer architecture with local on-line learning rules is developed to solve the problem of blind separation of sources. The main motivation for using a multi-layer network instead of a single...
In this paper a fast and efficient adaptive learning algorithm for estimation of the principal components is developed. It seems to be especially useful in applications with
changing environment, where the learning process has to be repeated in on-line manner.
The approach can be called the cascade recursive least square (CRLS) method, as it
com...
The recurrent least squares (RLS) learning approach is proposed for controlling the learning rate in parallel principal subspace analysis (PSA) and in a wide class of principal component analysis (PCA) associated algorithms with a quasi-parallel extraction ability. The purpose is to provide a useful tool for applications where the learning process...
An adaptive object recognition scheme for image sequences of many object scenes is described. The scheme is applied for traffic object recognition under ego-motion. The recursive estimation of object states is performed by an extended Kalman Filter with modified error estimation, which is a neural network learning process. This new feature allows...
The common task of several processing steps in a vision system for
autonomous road vehicle guidance is to stabilize the single image
measurements of the following image or scene features: contour motion,
vanishing point, road class, moving object state and egomotion. An
adaptive estimation procedure with linear or extended Kalman filter is
applied for...
A general analysis strategy is expressed in terms of state space
search. The advantages of optimal tree search and truth maintenance
systems in the context of dynamic analysis are discussed. A consecutive
tree search algorithm is proposed for the control mechanism in image
sequence analysis. It combines the focusing strength of tree search with
foca...