Ardhendu Behera

Edge Hill University · Computer Science

PhD in Computer Science

About

65 Publications
20,170 Reads
1,754 Citations
Additional affiliations
September 2014 - November 2015
Edge Hill University
Position
  • Professor (Associate)

Publications

Article
Full-text available
This paper presents a novel approach for Fine-Grained Visual Classification (FGVC) by exploring Graph Neural Networks (GNNs) to facilitate high-order feature interactions, with a specific focus on constructing both inter- and intra-region graphs. Unlike previous FGVC techniques that often isolate global and local features, our method combines both...
Article
Full-text available
Background Optical coherence tomography angiography (OCTA) enables fast and non-invasive high-resolution imaging of retinal microvasculature and is suggested as a potential tool in the early detection of retinal microvascular changes in Alzheimer’s Disease (AD). We developed a standardised OCTA analysis framework and compared their extracted parame...
Chapter
Current activity recognition approaches have achieved great success due to advances in deep learning and the availability of huge public benchmark datasets. These datasets focus on highly distinctive actions involving discriminative body movements, body-object and/or human-human interactions. However, in real-world scenarios, e.g., functio...
Article
Over the past few years, significant progress has been made in deep convolutional neural network (CNN)-based image recognition. This is mainly due to the strong ability of such networks to mine discriminative object pose and parts information from texture and shape. This is often inappropriate for fine-grained visual classification (FGVC) sin...
Preprint
Full-text available
Over the past few years, significant progress has been made in deep convolutional neural network (CNN)-based image recognition. This is mainly due to the strong ability of such networks to mine discriminative object pose and parts information from texture and shape. This is often inappropriate for fine-grained visual classification (FGVC) sin...
Article
Full-text available
Automated detection of retinal structures, such as retinal vessels (RV), the foveal avascular zone (FAZ), and retinal vascular junctions (RVJ), is of great importance for understanding diseases of the eye and for clinical decision-making. In this paper, we propose a novel Voting-based Adaptive Feature Fusion multi-task network (VAFF-Net) for joint seg...
Preprint
Full-text available
Automated detection of retinal structures, such as retinal vessels (RV), the foveal avascular zone (FAZ), and retinal vascular junctions (RVJ), is of great importance for understanding diseases of the eye and for clinical decision-making. In this paper, we propose a novel Voting-based Adaptive Feature Fusion multi-task network (VAFF-Net) for joint seg...
Chapter
Full-text available
This paper proposes a novel method for detecting the symmetrical axis of the cropped face, which is required for estimating the aesthetic outcome from facial images of patients after cleft treatment. It first applies a Gaussian filter to smooth the images and suppress noise for the subsequent tasks, then the bilateral semantic seg...
Preprint
Full-text available
This paper presents a benchmarking study of some of the state-of-the-art reinforcement learning algorithms used for solving two simulated vision-based robotics problems. The algorithms considered in this study include soft actor-critic (SAC), proximal policy optimization (PPO), interpolated policy gradients (IPG), and their variants with Hindsight...
Preprint
Full-text available
Convolutional Neural Networks (CNNs) have revolutionized the understanding of visual content. This is mainly due to their ability to break down an image into smaller pieces, extract multi-scale localized features and compose them to construct highly expressive representations for decision making. However, the convolution operation is unable to capt...
Preprint
Full-text available
This paper presents a novel keypoints-based attention mechanism for visual recognition in still images. Deep Convolutional Neural Networks (CNNs) for recognizing images with distinctive classes have shown great success, but their performance in discriminating fine-grained changes is not at the same level. We address this by proposing an end-to-end...
Article
Full-text available
The computer vision community has extensively researched the area of human motion analysis, which primarily focuses on pose estimation, activity recognition, pose or gesture recognition and so on. However, for many applications, like monitoring of functional rehabilitation of patients with musculoskeletal or physical impairments, the requirement is...
Article
Full-text available
Deep convolutional neural networks (CNNs) have shown a strong ability in mining discriminative object pose and parts information for image recognition. For fine-grained recognition, context-aware rich feature representation of object/scene plays a key role since it exhibits significant variance within the same subcategory and subtle variance among di...
Article
This paper presents a novel keypoints-based attention mechanism for visual recognition in still images. Deep Convolutional Neural Networks (CNNs) for recognizing images with distinctive classes have shown great success, but their performance in discriminating fine-grained changes is not at the same level. We address this by proposing an end-to-end...
Chapter
Head pose is a vital indicator of human attention and behavior. Therefore, automatic estimation of head pose from images is key to many applications. In this paper, we propose a novel approach for head pose estimation from a single RGB image. Many existing approaches often predict head poses by localizing facial landmarks and then solving 2D to 3D co...
Article
Full-text available
It is well-known that numerical weather prediction (NWP) models require considerable computer power to solve complex mathematical equations to obtain a forecast based on current weather conditions. In this article, we propose a novel lightweight data-driven weather forecasting model by exploring temporal modelling approaches of long short-term memo...
Preprint
Full-text available
Affect is often expressed via non-verbal body language such as actions/gestures, which are vital indicators for human behaviors. Recent studies on recognition of fine-grained actions/gestures in monocular images have mainly focused on modeling spatial configuration of body parts representing body pose, human-object interactions and variations in l...
Preprint
Full-text available
There is significant progress in recognizing traditional human activities from videos focusing on highly distinctive actions involving discriminative body movements, body-object and/or human-human interactions. Driver activities are different since they are executed by the same subject with similar body-part movements, resulting in subtle change...
Preprint
Full-text available
Deep convolutional neural networks (CNNs) have shown a strong ability in mining discriminative object pose and parts information for image recognition. For fine-grained recognition, context-aware rich feature representation of object/scene plays a key role since it exhibits significant variance within the same subcategory and subtle variance among di...
Article
Full-text available
Affect is often expressed via non-verbal body language such as actions/gestures, which are vital indicators for human behaviors. Recent studies on recognition of fine-grained actions/gestures in monocular images have mainly focused on modeling spatial configuration of body parts representing body pose, human-object interactions and variations in l...
Article
Full-text available
Non-predictive or inaccurate weather forecasting can severely impact user communities such as farmers. Numerical weather prediction models run in major weather forecasting centers with several supercomputers to solve simultaneous complex nonlinear mathematical equations. Such models provide the medium-range weather forecasts, i.e., every 6 h...
Chapter
Full-text available
In this paper, we look into the problem of estimating per-pixel depth maps from unconstrained RGB monocular night-time images which is a difficult task that has not been addressed adequately in the literature. The state-of-the-art day-time depth estimation methods fail miserably when tested with night-time images due to a large domain shift between...
Article
Automatic recognition and prediction of in-vehicle human activities has a significant impact on the next generation of driver assistance and intelligent autonomous vehicles. In this article, we present a novel single image driver action recognition algorithm inspired by human perception that often focuses selectively on parts of the images to acqui...
Preprint
Full-text available
In this paper, we look into the problem of estimating per-pixel depth maps from unconstrained RGB monocular night-time images which is a difficult task that has not been addressed adequately in the literature. The state-of-the-art day-time depth estimation methods fail miserably when tested with night-time images due to a large domain shift between...
Article
Full-text available
Autonomous vehicles (AVs) are undergoing rapid worldwide development. They will only become a success if they are accepted by their users. Therefore, user acceptance of these vehicles needs to be understood. Previous studies on acceptance of AVs have identified several predictors. Inspired by these studies, the authors’ investigation is aimed at socio...
Preprint
Full-text available
This article proposes a novel attention-based body pose encoding for human activity recognition. The encoding is combined with RGB video data and 3D human pose information to yield a novel end-to-end trainable network. Most of the existing human activity recognition approaches based on 3D pose data often enrich the input data using additional han...
Conference Paper
Full-text available
In the real world, out-of-distribution samples, noise and distortions exist in test data. Existing deep networks developed for point cloud data analysis are prone to overfitting and a partial change in test data leads to unpredictable behaviour of the networks. In this paper, we propose a smart yet simple deep network for analysis of 3D models usin...
Preprint
Full-text available
In the real world, out-of-distribution samples, noise and distortions exist in test data. Existing deep networks developed for point cloud data analysis are prone to overfitting and a partial change in test data leads to unpredictable behaviour of the networks. In this paper, we propose a smart yet simple deep network for analysis of 3D models usin...
Conference Paper
In this paper, we look into the problem of estimating per-pixel depth maps from unconstrained RGB monocular night-time images which is a difficult task that has not been addressed adequately in the literature. The state-of-the-art daytime depth estimation methods fail miserably when tested with night-time images due to a large domain shift between...
Article
Full-text available
Learning involves a wide range of cognitive, social and emotional states. Therefore, recognizing and understanding these states in the context of learning is key to designing informed interventions and addressing the needs of the individual student to provide personalized education. In this paper, we explore the automatic detection of learn...
Article
Full-text available
Background Medical robots are increasingly used for a variety of applications in healthcare. Robots have mainly been used to support surgical procedures, and for a variety of assistive uses in dementia and elderly care. To date, there has been limited debate about the potential opportunities and risks of robotics in other areas of palliative, suppo...
Article
Full-text available
In this paper, we propose a compressive sensing-based method to pan-sharpen the low-resolution multispectral (LRM) data, with the help of high-resolution panchromatic (HRP) data. In order to successfully implement the compressive sensing theory in pan-sharpening, two requirements should be satisfied: (i) forming a comprehensive dictionary in which...
Chapter
Numerical Weather Prediction (NWP) requires considerable computer power to solve complex mathematical equations to obtain a forecast based on current weather conditions. In this article, we propose a lightweight data-driven weather forecasting model by exploring state-of-the-art deep learning techniques based on Artificial Neural Network (ANN). Wea...
Chapter
Full-text available
Automatic recognition of in-vehicle activities has a significant impact on next-generation intelligent vehicles. In this paper, we present a novel Multi-stream Long Short-Term Memory (M-LSTM) network for recognizing driver activities. We bring together ideas from recent works on LSTMs, transfer learning for object detection and body pose by explo...
Article
Full-text available
Today, the workflows involved in industrial assembly and production activities are becoming increasingly complex. Performing these workflows efficiently and safely is demanding for the workers, particularly when it comes to infrequent or repetitive tasks. This burden on the workers can be eased by introducing smart assistance systems. This...
Article
In this paper, we present a novel method to explore semantically meaningful visual information and identify the discriminative spatiotemporal relationships between them for real-time activity recognition. Our approach infers human activities using continuous egocentric (first-person-view) videos of object manipulations in an industrial setup. In or...
Conference Paper
This paper presents an approach for recognising activities using video from an egocentric (first-person view) setup. Our approach infers activity from the interactions of objects and hands. In contrast to previous approaches to activity recognition, we do not require an intermediate step such as object detection or pose estimation. Recently, i...
Article
Full-text available
Perception of scenes has typically been investigated by using static or simplified visual displays. How attention is used to perceive and evaluate dynamic, realistic scenes is more poorly understood, in part due to the problem of comparing eye fixations to moving stimuli across observers. When the task and stimulus are common across observers, consi...
Technical Report
Full-text available
The major goal of COGNITO was to develop enabling technologies for intelligent user assistance systems with focus on industrial manual tasks. This comprises technology for capturing user activity in relation to industrial workspaces through on-body sensors, algorithms for linking the captured information to underlying workflow patterns and adaptive...
Conference Paper
Full-text available
This paper presents a novel approach for real-time egocentric activity recognition in which component atomic events are characterised in terms of binary relationships between parts of the body and manipulated objects. The key contribution is to summarise, within a histogram, the relationships that hold over a fixed time interval. This histogram is...
Conference Paper
Full-text available
We present a method for real-time monitoring of workflows in a constrained environment. The monitoring system should not only be able to recognise the current step but also provide instructions about the possible next steps in an ongoing workflow. In this paper, we address this issue by using a robust approach (HMM-pLSA) which relies on a Hidden Ma...
Conference Paper
Full-text available
Live workflow monitoring and the resulting user interaction in industrial settings face a number of challenges. A formal workflow may be unknown or implicit, data may be sparse and certain isolated actions may be undetectable given current visual feature extraction technology. This paper attempts to address these problems by inducing a structural...
Article
Full-text available
Low-level stimulus salience and task relevance together determine the human fixation priority assigned to scene locations (Fecteau and Munoz in Trends Cogn Sci 10(8):382-390, 2006). However, surprisingly little is known about the contribution of task relevance to eye movements during real-world visual search where stimuli are in constant motion and...
Article
The presence of a visuo-motor buffer of around half a second to a second has been proposed to account for performance in real-world tasks such as motor racing (Land & Tatler, 2001), hitting the ball in cricket (Land & MacLeod, 2000), making tea (Land, Mennie & Rusted, 1999) and sandwiches (Hayhoe, 2000). In addition, the magnitude of this buffer ap...
Article
Full-text available
This paper describes the DocMIR system, which automatically captures, analyzes and indexes meetings, conferences, lectures, etc. by taking advantage of the documents projected (e.g. slideshows, budget tables, figures, etc.) during the events. For instance, the system can automatically apply the above-mentioned procedures to a lecture and automatical...
Conference Paper
Full-text available
The paper describes a method by which documents captured with low-resolution handheld devices can be used to retrieve the originals of those documents from a document store. The method jointly considers two complementary feature sets. First, the geometrical distribution of the color in the document's 2D image plane is preferred. Secondl...
Conference Paper
Full-text available
This paper proposes a method, combining color and layout features, for identifying documents captured from low-resolution handheld devices. On one hand, the document image color density surface is estimated and represented with an equivalent ellipse and on the other hand, the document shallow layout structure is computed and hierarchica...
Conference Paper
Full-text available
This paper proposes a multi-signature document identification method that works robustly with low-resolution documents captured from handheld devices. The proposed method is based on the extraction of a visual signature containing both (a) the color content distribution in the image plane of the document, i.e. the color signature, and (b) the shall...
Conference Paper
Full-text available
In this paper, we present (a) a method for identifying documents captured from low-resolution devices such as web-cams, digital cameras or mobile phones and (b) a technique for extracting their textual content without performing OCR. The first method associates a hierarchically structured visual signature to the low-resolution document image and fu...
Conference Paper
Full-text available
In the context of a multimodal application, the article proposes an image-based method for bridging the gap between document excerpts and video extracts. The approach, called document image alignment, takes advantage of the observable events related to documents that are visible during meetings. In particular, the article presents a new method for...
Conference Paper
Full-text available
Static documents play a central role in multimodal applications such as meeting recording and browsing. They provide a variety of structures, in particular thematic, for segmenting meetings, structures that are often hard to extract from audio and video. In this article, we present four steps for creating a strong link between static documents and...
Article
Full-text available
The recording, multimodal analysis and archiving of meetings introduce new challenges for research in multimedia information management. Meetings involve multiple media that can be aligned together. This requires a global annotation framework. In particular, meetings often deal with documents, either projected or discussed, which can be aligned wit...
Article
Full-text available
This paper proposes a method, combining color and layout features, for identifying documents captured from low-resolution handheld devices. On one hand, the document image color density surface is estimated and represented with an equivalent ellipse and on the other hand, the document shallow layout structure is computed and hierarchically represen...
Article
This thesis concerns the development of a complete system for the automatic, document-centred indexing (DocMIR) of multimedia data from multimodal environments such as meetings, conferences, etc. Image processing, video segmentation and document analysis methods are all used to link...
