
Muhammad Haroon Yousaf
- PhD Computer Engineering
- Professor at University of Engineering and Technology Taxila
Professor of Computer Engineering
Director, Swarm Robotics Lab - National Centre for Robotics and Automation (NCRA)
About
121 Publications
75,226 Reads
1,808 Citations
Introduction
Educator | Researcher | Mentor | Reformer
Current institution
Additional affiliations
June 2018 - present
December 2015 - present
July 2007 - December 2015
Education
November 2007 - November 2012
Publications (121)
Urban transportation management increasingly relies on Intelligent Transportation Systems (ITS), where Vehicle Make and Model Recognition (VMMR) plays a vital role in surveillance, traffic monitoring, and infrastructure planning. However, traffic conditions in developing nations such as Pakistan present unique challenges due to unstructured driving...
Multimodal networks have demonstrated remarkable performance improvements over their unimodal counterparts. Existing multimodal networks are designed in a multi-branch fashion that, due to the reliance on fusion strategies, exhibit deteriorated performance if one or more modalities are missing. In this work, we propose a modality invariant multimod...
The advancements of technology have led to the use of multimodal systems in various real-world applications. Among them, the audiovisual systems are one of the widely used multimodal systems. In the recent years, associating face and voice of a person has gained attention due to presence of unique correlation between them. The Face-voice Associatio...
Motorbikes are an integral part of transportation in emerging countries, but unfortunately, motorbike users are also among the most vulnerable road users (VRUs) and are involved in a large number of accidents every year. Motorbike detection is therefore very important for proper traffic surveillance, road safety, and security. Most of the work related to bike d...
Background
Melanoma is one of the deadliest skin cancers that originate from melanocytes due to sun exposure, causing mutations. Early detection boosts the cure rate to 90%, but misclassification drops survival to 15–20%. Clinical variations challenge dermatologists in distinguishing benign nevi and melanomas. Current diagnostic methods, including...
The generation of a large human-labelled facial expression dataset is challenging due to ambiguity in labelling the facial expression class, and annotation cost. However, facial expression recognition (FER) systems demand discriminative feature representation, and require many training samples to establish stronger decision boundaries. Recently, FE...
Vehicle make and model recognition (VMMR) is an important aspect of intelligent transportation systems (ITS). In VMMR systems, surveillance cameras capture vehicle images for real-time vehicle detection and recognition. These captured images pose challenges, including shadows, reflections, changes in weather and illumination, occlusions, and perspe...
With recent technological advancements, surveillance cameras are installed in crowded areas to ensure public protection. In video surveillance, content showing suspicious actions makes up only a small fraction of the surveillance stream. Therefore, manual monitoring of suspicious actions may become very exhausting, which affects...
Object detection is a critical task that becomes difficult when dealing with onboard detection using aerial images and computer vision techniques. The main challenges with aerial images are small target sizes, low resolution, occlusion, attitude, and scale variations, which affect the performance of many object detectors. The accuracy of the detecti...
A swarm of robots is the coordination of multiple robots that can perform a collective task and solve a problem more efficiently than a single robot. Over the last decade, this area of research has received significant interest from scientists due to its large field of applications in military or civil, including area exploration, target search and...
With the rapid growth of social media platforms, users are sharing billions of multimedia posts containing audio, images, and text. Researchers have focused on building autonomous systems capable of processing such multimedia data to solve challenging multimodal tasks including cross-modal retrieval, matching, and verification. Existing works use s...
In recent years, an association is established between faces and voices of celebrities leveraging large scale audio-visual information from YouTube. The availability of large scale audio-visual datasets is instrumental in developing speaker recognition methods based on standard Convolutional Neural Networks. Thus, the aim of this paper is to levera...
Automatic speech recognition (ASR) has ensured a convenient and fast mode of communication between humans and computers. It has become more accurate over time. However, in the majority of ASR systems, the models have been trained using native English accents. While they serve best for native English speakers, their accuracy drops drastic...
Recent years have seen an increased interest in establishing association between faces and voices of celebrities leveraging audio-visual information from YouTube. Prior works adopt metric learning methods to learn an embedding space that is amenable for associated matching and verification tasks. Albeit showing some progress, such formulations are,...
Asphalt pavement distresses are a major concern for both underdeveloped and developed nations, as they disrupt the smooth running of the daily commute. Among the various pavement failures, potholes have received considerable research attention because they damage automobiles, endanger passengers, and may cause accidents. This work is intended to explore the potential o...
Classification models for human action recognition require robust features and large training sets for good generalization. However, data augmentation methods are employed for imbalanced training sets to achieve higher accuracy. These samples generated using data augmentation only reflect existing samples within the training set, their feature repr...
We study the problem of learning association between face and voice, which is gaining interest in the computer vision community lately. Prior works adopt pairwise or triplet loss formulations to learn an embedding space amenable for associated matching and verification tasks. Albeit showing some progress, such loss formulations are, however, restri...
In the current era of technological development, human actions can be recorded in public places like airports, shopping malls, and educational institutes, etc., to monitor suspicious activities like terrorism, fighting, theft, and vandalism. Surveillance videos contain adequate visual and motion information for events that occur within a camera’s v...
Melanoma malignancy recognition is a challenging task due to the existence of intraclass similarity, natural or clinical artefacts, skin contrast variation, and higher visual similarity among the normal or melanoma-affected skin. To overcome these problems, we propose a novel solution by leveraging “region-extreme convolutional neural network” for...
In the work presented here, a new technique based upon stereo vision is proposed to acquire three-dimensional time-resolved trajectories with labeling support for multi-species insects under unconstrained flying conditions. A low-cost, off-the-shelf depth camera is used for stereo vision which is equipped with two wide-angle global-shutter IR image...
Classroom communication involves teacher’s behavior and student’s responses. Extensive research has been done on the analysis of student’s facial expressions, but the impact of instructor’s facial expressions is yet an unexplored area of research. Facial expression recognition has the potential to predict the impact of teacher’s emotions in a class...
Robust view-invariant human action recognition (HAR) requires effective representation of its temporal structure in multi-view videos. This study explores a view-invariant action representation based on convolutional features. Action representation over long video segments is computationally expensive, whereas features in short video segments limit...
Recent years have seen a surge in finding association between faces and voices within a cross-modal biometric application along with speaker recognition. Inspired from this, we introduce a challenging task in establishing association between faces and voices across multiple languages spoken by the same set of persons. The aim of this paper is to an...
The analysis and detection of skin cancer diseases from skin lesion have always been tedious when done manually. The complex nature of skin lesion images is one of the key reasons for this. The skin lesion images contain noise and artifacts such as hairs, oil and bubbles, blood vessels, and skin lines. They also have variegated colors, low contrast...
The development of a kinematic model of the hand can play a vital role in hand gesture recognition and Human Computer Interaction (HCI) applications. This paper proposes an algorithm for finger identification and joint localization, thus generating a kinematic model of the human hand by means of image processing techniques. Skin tone analysis and backgrou...
Segmentation of foreground objects is an important issue in computer vision. Because there are no predefined classes in unsupervised learning, the task is computationally expensive. The contrast between foreground and background also needs to be handled effectively. To overcome the aforementioned problems, in this paper an efficient approach named...
Human action recognition has emerged as a challenging research domain for video understanding and analysis. Subsequently, extensive research has been conducted to achieve the improved performance for recognition of human actions. Human activity recognition has various real time applications, such as patient monitoring in which patients are being mo...
Melanoma is a skin cancer caused by ultraviolet radiation from the Sun and has a survival rate of only 15-20%. Late diagnosis of melanoma leads to severe malignancy, and metastasis spreads to other body organs, i.e. the liver, lungs and brain. Dermatologists analyze the pigmented lesions on the skin to discriminate melanoma...
Road accidents are a major cause of death, which has increased by 46% since 1990. In recent years, significant efforts have been invested in four-wheeled vehicle detection that improved intelligent transportation systems and decreased the calamity rate. However, the automatic detection of two-wheeled vehicles remains challenging due to occlusion,...
Detection of human actions in long untrimmed videos is an important but challenging task due to the unconstrained nature of actions present in untrimmed videos. We argue that untrimmed videos contain multiple snippets from actions and the background classes having significant correlation with each other, which results in imprecise detection of star...
Human action recognition has gathered significant attention in recent years due to its high demand in various application domains. In this work, we propose a novel codebook generation and hybrid encoding scheme for classification of action videos. The proposed scheme develops a discriminative codebook and a hybrid feature vector by encoding the fea...
The recognition of human actions recorded in a multi-camera environment faces the challenging issue of viewpoint variation. Multi-view methods employ videos from different views to generate a compact view-invariant representation of human actions. This paper proposes a novel multi-view human action recognition approach that uses multiple low-dimens...
Implementing an accurate and reliable passenger detection and counting system is an important task for the correct distribution of the available transport system. The aim of this paper is to develop an accurate computer vision-based system to track and count passengers. The proposed passenger detection system incorporates the ideas of well-established det...
Vehicle make and model recognition (VMMR) is a key task for automated vehicular surveillance (AVS) and various intelligent transport system (ITS) applications. In this paper, we propose and study the suitability of the bag of expressions (BoE) approach for VMMR-based applications. The method includes neighborhood information in addition to visual w...
Human action recognition (HAR) has emerged as a core research domain for video understanding and analysis, thus attracting many researchers. Although significant results have been achieved in simple scenarios, HAR is still a challenging task due to issues associated with view independence, occlusion and inter-class variation observed in realistic s...
Human action recognition has the potential to predict the activities of an instructor within the lecture room. Evaluation of lecture delivery can help teachers analyze shortcomings and plan lectures more effectively. However, manual or peer evaluation is time-consuming, tedious and sometimes it is difficult to remember all the details of the lectur...
In this work, we propose a new method to generate temporal action proposals from long untrimmed videos named Temporally Aggregated Bag-of-Discriminant-Words (TAB). TAB is based on the observation that there are many overlapping frames in action and background temporal regions of untrimmed videos, which cause difficulties in segmenting actions from...
Deep learning has led to a series of breakthroughs in the human action recognition field. Given the powerful representational ability of residual networks (ResNet), performance in many computer vision tasks including human action recognition has improved. Motivated by the success of ResNet, we use the residual network and its variations to obtain fe...
Detecting human actions in long untrimmed videos is a challenging problem. Existing temporal action detection methods have difficulties in finding the precise starting and ending time of the actions in untrimmed videos. In this letter, we propose a temporal action detection framework based on a Bag of Discriminant Snippets (BoDS) that can detect mu...
Video surveillance systems have become one of the most useful entities in our routine life. Surveillance videos contain plenty of visual information about criminal actions happening in the field-of-view. With the increase in criminal activities, it is essential to develop an accurate criminal recognition system. Our paper aims to propose and evalu...
An automatic content-based video retrieval (CBVR) system utilises visual data of videos to search for user’s desired query from a large collection of videos. The need for CBVR can be attributed to the massive growth of video data, which made the manual-based retrieval of videos a challenging task. CBVR (for human action videos) relies on action rec...
Real-time vehicle detection is one of the challenging problems for automotive and autonomous driving applications. Object detection using Deformable Parts Model (DPM) proved to be a promising approach providing higher detection accuracy. But the baseline DPM scheme spends 98% of its execution time in loop processing thus highlighting the drawback o...
Identifying and restoring distresses in asphalt pavement is key to the durability and long life of roads and highways. A vast number of accidents occur on roads and highways due to pavement distresses. This paper aims to detect and localize one of the critical roadway distresses, potholes, using computer vision. We have proc...
Human action recognition has become a popular field for computer vision researchers in the recent decade. This paper presents a human action recognition scheme based on a textual information concept inspired by document retrieval systems. Videos are represented using a commonly used local feature representation. In addition, we formulate a new weig...
The Bag of Words (BoW) approach has been widely used for human action recognition in recent state-of-the-art methods. In this paper, we introduce what we call a Bag of Expression (BoE) framework, based on the bag of words method, for recognizing human action in simple and realistic scenarios. The proposed approach includes space time neighborhood i...
The detection of the spatial-temporal interest points has a key role in human action recognition algorithms. This research work aims to exploit the existing strength of bag-of-visual features and presents a method for automatic action recognition in realistic and complex scenarios. This paper provides a better feature representation by combining th...
This letter proposes a method for the generation of temporal action proposals for the segmentation of long uncut video sequences. The presence of consecutive multiple actions in video sequences makes the temporal segmentation a challenging problem due to the unconstrained nature of actions in space and time. To address this issue, we exploit the no...
Human action recognition in realistic scenarios is an important yet challenging task. In this paper, we propose a new method, Inter and Intra class correlation analysis (IICCA), to handle inter and intra class variations observed in realistic scenarios. Our contribution includes learning the class specific visual representation that efficiently rep...
Finding an accurate and computationally efficient vehicle detection and classification algorithm for urban environment is challenging due to large video datasets and complexity of the task. Many algorithms have been proposed but there is no efficient algorithm due to various real-time issues. This paper proposes an algorithm which addresses shadow...
This paper proposes an improvement in the Bag of Words (BoW) model by introducing spatial position of features in interaction representation stage for Human Vehicle Interaction Recognition. The spatial positions of features are incorporated along with feature descriptor to obtain the structural information required to correctly classify different k...
Image forgery detection is one of the prominent areas from research and development perspective. This research work aims to propose a scheme for the detection of multiple types of image forgeries. In this paper, a generic passive image forgery scheme is proposed using spatial rich model (SRM) in combination with textural feature i.e. local binary p...
This chapter discusses traffic flow analysis in the context of technologies for intelligent transportation systems (ITS) from three important perspectives. First, we look at traffic flow from a transport engineering perspective so as to set the context in terms of the application domain. Second, we consider how flow could be observed using computer...
In this paper, a silhouette-based view-independent human action recognition scheme is proposed for multi-camera dataset. To overcome the high-dimensionality issue, incurred due to multi-camera data, the low-dimensional representation based on Motion History Image (MHI) was extracted. A single MHI is computed for each view/action video. For efficien...
Instructor activity recognition can certainly play its part as an important parameter in evaluating and improving the performance of an instructor. This paper presents a single-layered sequential approach for instructor activity recognition in the lecture room environment. A hidden Markov model (HMM) scheme is selected as a sequential approach for...
This work presented a more secure least-significant-bit (LSB) steganography technique with variable information-hiding capacity and signal-to-noise ratio (SNR). Hiding capacity was enhanced by sacrificing SNR, and vice versa. A trade-off was made between capacity and SNR depending on the situation in which it was used...
Abstract: A human body consists of a complex 3D structure. Conversion of 3D structures into 2D leads to a loss of information and may result in incorrect disease diagnosis. This issue has attracted the attention of researchers involved in 3D modeling. MRI scans consist of a large number of 2D slices, which makes 3D reconstruction a complex and time-...
Audio segmentation is a basis for multimedia content analysis which is the most important and widely used application nowadays. An optimized audio classification and segmentation algorithm is presented in this paper that segments a superimposed audio stream on the basis of its content into four main audio types: pure-speech, music, environment soun...
Performance analysis of instructors in the lecture room plays a significant role in maintaining higher education quality and standards. This paper presents a novel approach for the evaluation of an instructor's performance and behavior in the lecture room. The proposed approach employs the lecture video using face recognition and pose estimation of in...
Determining a collision-free path for a robot between start and goal positions in an environment filled with obstacles is a very challenging task in autonomous robot path planning. This paper aims to select an optimal path planning algorithm for a mobile robot in a structured environment. To achieve the goal, comprehensive strengths...
Questions (9)
Hi, I would like to know whether any datasets are available in the precision agriculture domain.
I would like to know the latest trends and challenges in image steganography. A variety of algorithms already exist for capacity enhancement and efficient steganography, so what is new in this domain?
Dear Fellows,
I need some information. Is there any video/image dataset on street crime scenes?
If I want to perform quality inspection of ceramic tiles, which computer vision technique is best suited to finding surface defects on the tiles?
I am working on pothole detection algorithms using image processing. Is an appropriate pothole dataset available? Has anyone else worked on this?
How do you interpret the bag-of-words scheme and its function in human action / activity recognition?
If we implement the SIFT algorithm on an FPGA, what kinds of optimization are possible?
Can images be segmented using a hybrid technique carrying texture and stereo information?
I want to evaluate different texture analysis techniques. For this, I need a texture image dataset.