Ardhendu Behera · Edge Hill University · Computer Science
Ardhendu Behera
PhD in Computer Science
About
65 Publications
20,170 Reads
1,754 Citations
Introduction
Additional affiliations
September 2014 - November 2015
Publications (65)
This paper presents a novel approach for Fine-Grained Visual Classification (FGVC) by exploring Graph Neural Networks (GNNs) to facilitate high-order feature interactions, with a specific focus on constructing both inter- and intra-region graphs. Unlike previous FGVC techniques that often isolate global and local features, our method combines both...
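To make the graph-based formulation concrete, here is a minimal sketch of a single message-passing step over CNN region features; the fully connected toy adjacency and random weights are placeholders for illustration and do not reproduce the paper's inter- and intra-region graph construction.

import numpy as np

regions = np.random.rand(6, 128)                  # 6 region features, e.g. pooled from a CNN
A = np.ones((6, 6))                               # toy adjacency: fully connected region graph
A_hat = A / A.sum(axis=1, keepdims=True)          # row-normalised propagation matrix
W = np.random.rand(128, 128) * 0.01               # stand-in for a learnable weight matrix
updated = np.maximum(A_hat @ regions @ W, 0)      # one message-passing step followed by ReLU
print(updated.shape)                              # (6, 128): each region now mixes in its neighbours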
Background
Optical coherence tomography angiography (OCTA) enables fast and non-invasive high-resolution imaging of retinal microvasculature and is suggested as a potential tool in the early detection of retinal microvascular changes in Alzheimer’s Disease (AD). We developed a standardised OCTA analysis framework and compared their extracted parame...
Current activity recognition approaches have achieved great success due to advances in deep learning and the availability of huge public benchmark datasets. These datasets focus on highly distinctive actions involving discriminative body movements, body-object and/or human-human interactions. However, in real-world scenarios, e.g., functio...
Over the past few years, significant progress has been made in deep convolutional neural network (CNN)-based image recognition. This is mainly due to the strong ability of such networks in mining discriminative object pose and parts information from texture and shape. This is often inappropriate for fine-grained visual classification (FGVC) sin...
Automated detection of retinal structures, such as retinal vessels (RV), the foveal avascular zone (FAZ), and retinal vascular junctions (RVJ), is of great importance for understanding diseases of the eye and clinical decision-making. In this paper, we propose a novel Voting-based Adaptive Feature Fusion multi-task network (VAFF-Net) for joint seg...
This paper proposes a novel method for detecting the symmetrical axis of the cropped face, which is required for estimating the aesthetic outcome from facial images of patients after cleft treatment. It first applies a Gaussian filter to smooth the images in order to suppress noise in the subsequent tasks, then the bilateral semantic seg...
This paper presents a benchmarking study of some of the state-of-the-art reinforcement learning algorithms used for solving two simulated vision-based robotics problems. The algorithms considered in this study include soft actor-critic (SAC), proximal policy optimization (PPO), interpolated policy gradients (IPG), and their variants with Hindsight...
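As a hedged illustration of how such a benchmarking loop can be set up, the sketch below trains and evaluates two of the listed algorithms with the stable-baselines3 library on a standard continuous-control task; the library, the Pendulum-v1 environment and the step budget are assumptions for illustration, not the simulated vision-based robotics setups used in the paper.

import gymnasium as gym
from stable_baselines3 import SAC, PPO
from stable_baselines3.common.evaluation import evaluate_policy

for algo in (SAC, PPO):
    env = gym.make("Pendulum-v1")                 # stand-in task, not the paper's robotics envs
    model = algo("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=10_000)           # illustrative budget only
    mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
    print(f"{algo.__name__}: {mean_reward:.1f} +/- {std_reward:.1f}")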
Convolutional Neural Networks (CNNs) have revolutionized the understanding of visual content. This is mainly due to their ability to break down an image into smaller pieces, extract multi-scale localized features and compose them to construct highly expressive representations for decision making. However, the convolution operation is unable to capt...
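One widely used way to add the long-range, pairwise interactions that plain convolution misses is a self-attention (non-local) block over the spatial positions of a feature map; the PyTorch sketch below illustrates that general idea and is not necessarily the formulation proposed in the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalBlock(nn.Module):
    """Self-attention over the spatial positions of a CNN feature map."""
    def __init__(self, channels, reduced=None):
        super().__init__()
        reduced = reduced or channels // 2
        self.theta = nn.Conv2d(channels, reduced, 1)
        self.phi = nn.Conv2d(channels, reduced, 1)
        self.g = nn.Conv2d(channels, reduced, 1)
        self.out = nn.Conv2d(reduced, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # (b, hw, r) queries
        k = self.phi(x).flatten(2)                     # (b, r, hw) keys
        v = self.g(x).flatten(2).transpose(1, 2)       # (b, hw, r) values
        attn = F.softmax(q @ k, dim=-1)                # affinities between all position pairs
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                         # residual connection keeps local CNN features

feats = torch.randn(2, 64, 14, 14)                     # e.g. a CNN feature map
print(NonLocalBlock(64)(feats).shape)                  # torch.Size([2, 64, 14, 14])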
This paper presents a novel keypoints-based attention mechanism for visual recognition in still images. Deep Convolutional Neural Networks (CNNs) for recognizing images with distinctive classes have shown great success, but their performance in discriminating fine-grained changes is not at the same level. We address this by proposing an end-to-end...
The computer vision community has extensively researched the area of human motion analysis, which primarily focuses on pose estimation, activity recognition, pose or gesture recognition and so on. However, for many applications, such as monitoring the functional rehabilitation of patients with musculoskeletal or physical impairments, the requirement is...
Deep convolutional neural networks (CNNs) have shown a strong ability in mining discriminative object pose and parts information for image recognition. For fine-grained recognition, context-aware rich feature representation of object/scene plays a key role since it exhibits a significant variance in the same subcategory and subtle variance among di...
Head pose is a vital indicator of human attention and behavior. Therefore, automatic estimation of head pose from images is key to many applications. In this paper, we propose a novel approach for head pose estimation from a single RGB image. Many existing approaches predict head poses by localizing facial landmarks and then solving 2D-to-3D co...
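For context, the conventional pipeline the paper contrasts with recovers head pose by fitting detected 2D facial landmarks to a generic 3D face model with a PnP solver; the OpenCV sketch below uses illustrative placeholder coordinates rather than values from the paper.

import numpy as np
import cv2

model_3d = np.array([          # generic 3D face points: nose tip, chin, eye and mouth corners (mm)
    [0.0, 0.0, 0.0], [0.0, -63.6, -12.5],
    [-43.3, 32.7, -26.0], [43.3, 32.7, -26.0],
    [-28.9, -28.9, -24.1], [28.9, -28.9, -24.1]], dtype=np.float64)
landmarks_2d = np.array([      # corresponding detected 2D landmarks (pixels), placeholders
    [359, 391], [399, 561], [337, 297], [513, 301], [345, 465], [453, 469]], dtype=np.float64)
focal, center = 640.0, (320.0, 240.0)
K = np.array([[focal, 0.0, center[0]], [0.0, focal, center[1]], [0.0, 0.0, 1.0]])
ok, rvec, tvec = cv2.solvePnP(model_3d, landmarks_2d, K, None)   # solve 2D-to-3D correspondence
R, _ = cv2.Rodrigues(rvec)     # rotation matrix, from which yaw/pitch/roll can be read off
print(ok, rvec.ravel())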
It is well-known that numerical weather prediction (NWP) models require considerable computer power to solve complex mathematical equations to obtain a forecast based on current weather conditions. In this article, we propose a novel lightweight data-driven weather forecasting model by exploring temporal modelling approaches of long short-term memo...
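A minimal sketch of the kind of lightweight temporal model the abstract refers to is given below: a single-layer LSTM that maps a window of past readings to the next value. The layer sizes, window length and PyTorch implementation are assumptions for illustration only.

import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    def __init__(self, n_features=1, hidden=32, horizon=1):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, horizon)

    def forward(self, x):                    # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])         # forecast from the last time step

model = LSTMForecaster()
history = torch.randn(8, 24, 1)              # 8 samples of 24 past hourly readings (synthetic)
loss = nn.functional.mse_loss(model(history), torch.randn(8, 1))
loss.backward()                              # one gradient step of a training loop would follow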
Affect is often expressed via non-verbal body language such as actions/gestures, which are vital indicators for human behaviors. Recent studies on recognition of fine-grained actions/gestures in monocular images have mainly focused on modeling spatial configuration of body parts representing body pose, human-objects interactions and variations in l...
There is significant progress in recognizing traditional human activities from videos focusing on highly distinctive actions involving discriminative body movements, body-object and/or human-human interactions. Drivers' activities are different since they are executed by the same subject with similar body-part movements, resulting in subtle change...
Non-predictive or inaccurate weather forecasting can severely impact communities of users such as farmers. Numerical weather prediction models run in major weather forecasting centers with several supercomputers to solve simultaneous complex nonlinear mathematical equations. Such models provide medium-range weather forecasts, i.e., every 6 h...
In this paper, we look into the problem of estimating per-pixel depth maps from unconstrained RGB monocular night-time images which is a difficult task that has not been addressed adequately in the literature. The state-of-the-art day-time depth estimation methods fail miserably when tested with night-time images due to a large domain shift between...
Automatic recognition and prediction of in-vehicle human activities has a significant impact on the next generation of driver assistance and intelligent autonomous vehicles. In this article, we present a novel single image driver action recognition algorithm inspired by human perception that often focuses selectively on parts of the images to acqui...
Autonomous vehicles (AVs) are undergoing rapid worldwide development. They will only become a success if they are accepted by their users; therefore, there is a need to understand user acceptance of these vehicles. Previous studies on acceptance of AVs have identified several predictors. Inspired by these studies, the authors' investigation is aimed at socio...
This article proposes a novel attention-based body pose encoding for human activity recognition. The approach combines RGB video data and 3D human pose information to yield a novel end-to-end trainable network. Most of the existing human activity recognition approaches based on 3D pose data often enrich the input data using additional han...
In the real world, out-of-distribution samples, noise and distortions exist in test data. Existing deep networks developed for point cloud data analysis are prone to overfitting and a partial change in test data leads to unpredictable behaviour of the networks. In this paper, we propose a smart yet simple deep network for analysis of 3D models usin...
Learning involves a substantial amount of cognitive, social and emotional states. Therefore, recognizing and understanding these states in the context of learning is key in designing informed interventions and addressing the needs of the individual student to provide personalized education. In this paper, we explore the automatic detection of learn...
Background
Medical robots are increasingly used for a variety of applications in healthcare. Robots have mainly been used to support surgical procedures, and for a variety of assistive uses in dementia and elderly care. To date, there has been limited debate about the potential opportunities and risks of robotics in other areas of palliative, suppo...
In this paper, we propose a compressive sensing-based method to pan-sharpen the low-resolution multispectral (LRM) data, with the help of high-resolution panchromatic (HRP) data. In order to successfully implement the compressive sensing theory in pan-sharpening, two requirements should be satisfied: (i) forming a comprehensive dictionary in which...
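To illustrate the dictionary requirement mentioned in (i), the sketch below learns a patch dictionary and sparse-codes new patches with scikit-learn; the synthetic patches, patch size and solver settings are placeholders and not the paper's coupled high-resolution/low-resolution dictionary.

import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode

rng = np.random.default_rng(0)
patches = rng.standard_normal((200, 64))                     # 200 flattened 8x8 patches (synthetic)
dico = DictionaryLearning(n_components=32, alpha=1.0, max_iter=20, random_state=0)
D = dico.fit(patches).components_                            # learned dictionary atoms (32 x 64)
codes = sparse_encode(patches[:5], D, algorithm="omp", n_nonzero_coefs=5)
print(codes.shape, (codes != 0).sum(axis=1))                 # each patch is represented by 5 atoms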
Numerical Weather Prediction (NWP) requires considerable computer power to solve complex mathematical equations to obtain a forecast based on current weather conditions. In this article, we propose a lightweight data-driven weather forecasting model by exploring state-of-the-art deep learning techniques based on Artificial Neural Network (ANN). Wea...
Automatic recognition of in-vehicle activities has a significant impact on next-generation intelligent vehicles. In this paper, we present a novel Multi-stream Long Short-Term Memory (M-LSTM) network for recognizing driver activities. We bring together ideas from recent works on LSTMs, transfer learning for object detection and body pose by explo...
Today, the workflows involved in industrial assembly and production activities are becoming increasingly complex. Performing these workflows efficiently and safely is demanding for workers, in particular when it comes to infrequent or repetitive tasks. This burden on the workers can be eased by introducing smart assistance systems. This...
In this paper, we present a novel method to explore semantically meaningful visual information and identify the discriminative spatiotemporal relationships between them for real-time activity recognition. Our approach infers human activities using continuous egocentric (first-person-view) videos of object manipulations in an industrial setup. In or...
This paper presents an approach for recognising activities using video from an egocentric (first-person view) setup. Our approach infers activity from the interactions of objects and hands. In contrast to previous approaches to activity recognition, we do not require intermediate steps such as object detection, pose estimation, etc. Recently, i...
Perception of scenes has typically been investigated by using static or simplified visual displays. How attention is used to perceive and evaluate dynamic, realistic scenes is more poorly understood, in part due to the problem of comparing eye fixations to moving stimuli across observers. When the task and stimulus is common across observers, consi...
The major goal of COGNITO was to develop enabling technologies for intelligent user assistance systems with focus on industrial manual tasks. This comprises technology for capturing user activity in relation to industrial workspaces through on-body sensors, algorithms for linking the captured information to underlying workflow patterns and adaptive...
This paper presents a novel approach for real-time egocentric activity recognition in which component atomic events are characterised in terms of binary relationships between parts of the body and manipulated objects. The key contribution is to summarise, within a histogram, the relationships that hold over a fixed time interval. This histogram is...
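A minimal sketch of the summarisation step described here, i.e. accumulating into a fixed-length histogram the binary relationships that hold between body parts and manipulated objects over a time interval, is shown below; the relation and entity vocabularies are illustrative assumptions, not the paper's.

from collections import Counter
from itertools import product

RELATIONS = ["touching", "near", "above", "left_of"]          # hypothetical relation vocabulary
ENTITIES = ["right_hand", "left_hand", "tool", "workpiece"]   # hypothetical tracked entities

def relation_histogram(frames):
    """Summarise, over a fixed time interval, how often each (entity pair, relation) held.
    `frames` is a list of sets of (entity_a, entity_b, relation) tuples, one set per frame."""
    bins = list(product([(a, b) for a in ENTITIES for b in ENTITIES if a != b], RELATIONS))
    counts = Counter()
    for frame in frames:
        counts.update(frame)
    # Fixed-length descriptor: one bin per (pair, relation), normalised by interval length
    return [counts[(a, b, r)] / max(len(frames), 1) for (a, b), r in bins]

interval = [
    {("right_hand", "tool", "touching"), ("tool", "workpiece", "near")},
    {("right_hand", "tool", "touching"), ("tool", "workpiece", "above")},
]
hist = relation_histogram(interval)
print(len(hist), hist[:4])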
We present a method for real-time monitoring of workflows in a constrained environment. The monitoring system should not only be able to recognise the current step but also provide instructions about the possible next steps in an ongoing workflow. In this paper, we address this issue by using a robust approach (HMM-pLSA) which relies on a Hidden Ma...
Live workflow monitoring and the resulting user interaction in industrial settings faces a number of challenges. A formal workflow may be unknown or implicit, data may be sparse and certain isolated actions may be undetectable given current visual feature extraction technology. This paper attempts to address these problems by inducing a structural...
Low-level stimulus salience and task relevance together determine the human fixation priority assigned to scene locations (Fecteau and Munoz in Trends Cogn Sci 10(8):382-390, 2006). However, surprisingly little is known about the contribution of task relevance to eye movements during real-world visual search where stimuli are in constant motion and...
The presence of a visuo-motor buffer of around half a second to a second has been proposed to account for performance in real-world tasks such as motor racing (Land & Tatler, 2001), hitting the ball in cricket (Land & MacLeod, 2000), making tea (Land, Mennie & Rusted, 1999) and sandwiches (Hayhoe, 2000). In addition, the magnitude of this buffer ap...
This paper describes the DocMIR system, which automatically captures, analyzes and indexes meetings, conferences, lectures, etc., by taking advantage of the documents projected (e.g. slideshows, budget tables, figures, etc.) during the events. For instance, the system can automatically apply the above-mentioned procedures to a lecture and automatical...
The paper describes a method by which one could use the documents captured from low-resolution handheld devices to retrieve the originals of those documents from a document store. The method considers conjunctively two complementary feature sets. First, the geometrical distribution of the color in the document's 2D image plane is preferred. Secondl...
This paper proposes a multi-signature document identification method that works robustly with low-resolution documents captured from handheld devices. The proposed method is based on the extraction of a visual signature containing both (a) the color content distribution in the image plane of the document, i.e. the color signature, and (b) the shall...
In this paper, we present (a) a method for identifying documents captured from low-resolution devices such as web-cams, digital cameras or mobile phones and (b) a technique for extracting their textual content without performing OCR. The first method associates a hierarchically structured visual signature to the low-resolution document image and fu...
In the context of a multimodal application, the article proposes an image-based method for bridging the gap between document excerpts and video extracts. The approach, called document image alignment, takes advantage of the observable events related to documents that are visible during meetings. In particular, the article presents a new method for...
Static documents play a central role in multimodal applications such as meeting recording and browsing. They provide a variety of structures, in particular thematic, for segmenting meetings, structures that are often hard to extract from audio and video. In this article, we present four steps for creating a strong link between static documents and...
The recording, multimodal analysis and archiving of meetings introduce new challenges for research in multimedia information management. Meetings involve multiple media that can be aligned together. This requires a global annotation framework. In particular, meetings often deal with documents, either projected or discussed, which can be aligned wit...
This paper proposes a method, combining color and layout features, for identifying documents captured from low-resolution handheld devices. On one hand, the document image color density surface is estimated and represented with an equivalent ellipse and on the other hand, the document shallow layout structure is computed and hierarchically represen...
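The equivalent-ellipse representation of the colour-density surface can be sketched as follows: treat the density as a 2D weight map, compute its weighted centroid and covariance, and take the ellipse axes and orientation from the covariance eigen-decomposition. The random density map below is a stand-in for the one the method derives from the captured document image.

import numpy as np

density = np.random.rand(120, 90)                  # stand-in colour-density surface
ys, xs = np.mgrid[0:density.shape[0], 0:density.shape[1]]
w = density / density.sum()                        # treat density as a probability weight map
cx, cy = (w * xs).sum(), (w * ys).sum()            # weighted centroid of the surface
cov = np.array([
    [(w * (xs - cx) ** 2).sum(), (w * (xs - cx) * (ys - cy)).sum()],
    [(w * (xs - cx) * (ys - cy)).sum(), (w * (ys - cy) ** 2).sum()],
])
evals, evecs = np.linalg.eigh(cov)                 # eigen-decomposition gives ellipse axes
major, minor = 2 * np.sqrt(evals[::-1])            # axis lengths (1-sigma scaled)
angle = np.degrees(np.arctan2(evecs[1, 1], evecs[0, 1]))
print((cx, cy), major, minor, angle)               # centre, axes and orientation of the ellipse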
This thesis concerns the development of a complete, document-centred system (DocMIR) for the automatic indexing of multimedia data from multimodal environments such as meetings, conferences, etc. Image processing, video segmentation and document analysis methods are all used to relate...