
A multi-modal lecture video indexing and retrieval framework with multi-scale residual attention network and multi-similarity computation


Abstract

Technological development has sharply increased the production of video and its storage on the Internet, so a huge number of videos from various sources is now available on websites. As a result, retrieving the relevant lecture videos from this multimedia content is difficult. This paper therefore proposes a deep learning method for indexing and retrieving videos that considers several similarities among the video features. Lecture videos are obtained from a standard dataset for training. The optimal keyframes are selected from these videos with the Adaptive Anti-Coronavirus Optimization Algorithm, and the video contents are then segmented and arranged on the basis of the optimized keyframes. Text in the frames, such as semantic words and keywords, is recognized by means of Optical Character Recognition (OCR), and image features are extracted from the segmented frames with a Multi-scale Residual Attention Network (MRAN). The resulting pool of features is arranged and stored in the database according to content. In the testing phase, text and video queries are given as input to the trained model, and the features of the text query and of the optimized keyframes from the video query are obtained with MRAN. The pooled query features are compared with the features stored in the database using the Cosine, Jaccard, and Euclidean similarity indices, and the resulting multi-similarity features are used to retrieve the videos relevant to the query. The experimental results show that the proposed system performs video indexing and retrieval better and more efficiently than existing video retrieval methods.
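The abstract names three similarity indices but does not say how they are combined. The following is a minimal sketch of one way to do it, not the authors' implementation. It assumes that each database entry holds a numeric feature vector (standing in for the MRAN features) and a keyword list (standing in for the OCR output), that Euclidean distance is mapped to a similarity as 1 / (1 + d), and that the three scores are fused by a plain average. The names `multi_similarity` and `retrieve` and the entry layout are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(tokens_a, tokens_b):
    """Size of the intersection over size of the union of two keyword sets."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def euclidean_similarity(a, b):
    """Assumption: turn Euclidean distance d into a similarity 1 / (1 + d)."""
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + dist)


def multi_similarity(query, entry):
    """Assumption: fuse the three indices with an unweighted mean."""
    scores = (
        cosine_similarity(query["vector"], entry["vector"]),
        jaccard_similarity(query["keywords"], entry["keywords"]),
        euclidean_similarity(query["vector"], entry["vector"]),
    )
    return sum(scores) / len(scores)


def retrieve(query, database, top_k=3):
    """Return the ids of the top_k entries ranked by fused similarity."""
    ranked = sorted(database, key=lambda e: multi_similarity(query, e), reverse=True)
    return [e["video_id"] for e in ranked[:top_k]]
```

A weighted or learned fusion could replace the plain average. The sketch only illustrates how the three indices can contribute to a single ranking.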
Signal, Image and Video Processing (2024) 18:1993–2006
https://doi.org/10.1007/s11760-023-02744-3
ORIGINAL PAPER
A. Debnath¹ · K. Sreenivasa Rao² · Partha P. Das²
Received: 30 June 2023 / Revised: 2 August 2023 / Accepted: 10 August 2023 / Published online: 23 December 2023
© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2023
Keywords: Video retrieval and indexing · Adaptive anti-coronavirus optimization algorithm · Optical character recognition · Multi-scale Residual Attention Network · Multi-similarity indices
✉ A. Debnath
abhijitdebnath@iitkgp.ac.in
¹ Indian Institute of Technology, Kharagpur, India
² Computer Science and Engineering, Indian Institute of Technology, Kharagpur, India
1 Introduction
In recent years, digital video has become a popular medium for data retention, transmission, and reception because of the fast growth in high-speed networks, transcribing, and down-sizing technology [1]. These advantages have led to the use of audio-visual recordings in online learning platforms [2]. Many schools, colleges, and research and educational institutes now record their classes using electronic gadgets and upload these lectures online so that students who require them can view or download them from anywhere at any time. Because of this, media files are deposited en masse on websites [3]. As a result, a person looking for a particular video cannot obtain the required data without searching the archives [4]. Moreover, judging whether a video is relevant to a search only by looking at its topic, or at the detailed data and description provided for the searched topic, is a complex process [5]. In addition, users have to go through the full video even if it contains only a few minutes of the requested information. How a user can effectively fetch the relevant data from an entire video is still an open issue [6].
Almost all methods for fetching a video use the search option to obtain the relevant data. It is impossible to obtain the required data from a large database without using the search option [7]. Moreover, it is a tremendous task
... Therefore, the design of insulation systems in stator windings plays a critical role in determining the overall insulation level of electrical machinery. To ensure sufficient mechanical strength for motor insulation, the thickness of the main insulation must be increased [3,4]. However, to improve heat dissipation and reduce the size of the alternator, the main insulation thickness should be minimized, creating a design conflict [5]. ...
... Equation (3) can be converted to Equation (4). ...
Article
Voltage withstand tests on stator bars can cause destructive phenomena such as thermal breakdown and flashover discharge on the surface of the anti-corona layer. This study optimizes the anti-corona structure at a stator bar’s end to prevent such failures using a 120 MW water-cooled turbogenerator with a rated voltage of 15.75 kV. For a well-designed anti-corona system, the maximum potential gradient of the stator bar should be lower than the discharge intensity of air corona. In our design, the electric field intensity is maintained below 3.1 kV/cm, and the maximum surface loss in the anti-corona layer is limited to less than 0.6 W/cm². Additionally, the terminal voltage is kept lower than the flashover voltage at rated conditions. Furthermore, the length of the anti-corona layer should be minimized. The optimization process involves determining the rotation angle of the stator bar, calculating the total length of the anti-corona layer, and analyzing the electric field and loss in the layer at different lengths. The results demonstrate that the optimized anti-corona design effectively reduces the risk of flashover and thermal failure, ensuring stable operation under rated conditions. This manuscript reports purely computational experiments. At present, electrical machinery with a rated power of 120 MW is in steady operation, and there is a growing requirement for anti-corona protection. In this manuscript, a computational method is used to assist the anti-corona structure design. The electrical machinery insulation is improved by better anti-corona materials. Therefore, the service life of electrical machinery can be prolonged, which is significant in engineering.
... The system shows high accuracy in segmentation and indexing but is computationally intensive. Debnath et al. [44] introduced a deep learning approach for indexing and retrieving relevant lecture videos from the internet. Utilizing the Adaptive Anti-Corona Virus Optimization Algorithm (AACOA) and a Multi-scale Residual Attention Network (MRAN), the system selects optimal keyframes and extracts textual and image features for efficient video segmentation. ...
Article
Video analysis has attracted the attention of many researchers because of the growing need for multimedia information retrieval for computer vision applications. Content based retrieval and object-based video retrieval are challenging because of the poor feature representation of the objects and the low number of multiclass video datasets available. Even though there have been significant advances in generic object detection & information retrieval tasks, performance of object based indexing & retrieval falls due to the low number of representative video dataset and lack of information on the frequency of objects. In order to facilitate the progress of multimedia information retrieval in the field of computer vision related application research, we present the multiclass object detection datasets. This dataset consists of 732 videos from YouTube with 5 object classes and each video is about 2 minutes long and total size of the dataset being approximately 8.5 GB. In this article, we will provide the baseline evaluation results of two of the latest object detection algorithms used to evaluate the newly created multidimensional object detection dataset. The paper presents the performance metrics of object detection methods at various time frames. Accuracy remains consistently high, ranging from 93.98% at the 5th frame to 84.17% at the 30th frame, suggesting a generally strong ability to make correct predictions. The dataset is publicly available at https://drive.google.com/file/d/1BHVsB38vbu9LUY03XlFWY2o_gxEiIKY7/view?usp=sharing.
Article
Affected by the Corona Virus Disease 2019 (COVID-19), online lecture videos have witnessed an explosive growth. In the face of massive videos, this paper proposes a method for extracting key frames of lecture videos based on spatio-temporal subtitles, which can efficiently and quickly obtain effective information. Firstly, the spatio-temporal slices of subtitle area of the video sequence are extracted and spliced along the time axis to construct the video spatio-temporal subtitle. Then, the video spatio-temporal subtitle is processed in binarization, and the projection method is used to construct the SSPA curve of the video spatio-temporal subtitle. Finally, a selection method for steady-state key frame is designed, that is, the key frame extraction is realized by combining curve edge detection and subtitle existence threshold, which ensures the robustness of the proposed method. The test results of 8 videos show that the average value of the comprehensive index F1-score of the key frame extracted by the algorithm can reach 0.97, the average precision is 0.97, and the average recall rate is 0.98. It can effectively extract the key frames in lecture videos, and compared with other algorithms, the average running time is reduced to 0.072 of the original, which is helpful to extract video information quickly and accurately.
Article
Huge quantities of audio and video material are available at universities and teaching institutions, but their use can be limited because of the lack of intelligent search tools. This paper describes a possible way to set up an indexing scheme that offers a smart search modality, that combines semantic analysis of video/audio transcripts with the exact time positioning of uttered words. The proposal leverages NLP methods for topic modeling with lexical analysis of lessons’ transcripts and builds a semantic hierarchical index into the corpus of lessons analyzed. Moreover, using abstracting summarization, the system can offer short summaries on the subject semantically implied by the search carried out.
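The "exact time positioning of uttered words" mentioned above can be pictured as an inverted index from words to (video, timestamp) pairs. The sketch below shows only that part, under stated assumptions. It takes a hypothetical transcript format of (start time in seconds, word) pairs per video and does not attempt the topic modeling or summarization that the abstract also describes. The names `build_index` and `search` are illustrative.

```python
from collections import defaultdict


def build_index(transcripts):
    """Build an inverted index: lowercased word -> list of (video_id, start_seconds).

    transcripts is assumed to map a video id to a list of
    (start_seconds, word) pairs, e.g. from a speech recognizer.
    """
    index = defaultdict(list)
    for video_id, words in transcripts.items():
        for start, word in words:
            index[word.lower()].append((video_id, start))
    return index


def search(index, term):
    """Return every (video_id, start_seconds) at which the term is uttered."""
    return sorted(index.get(term.lower(), []))
```

Given such an index, a player can jump straight to the moment a term is spoken instead of opening the lecture from the start.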
Article
Deep learning is particularly well suited for that kind of circumstance, but ensuring confidentiality and safety has turned into a major problem in IoT management. A principal component analysis (PCA) is included in this specific research to identify and extract features more effectively. Additionally, the primary purpose of this study project is to compile an in-depth survey mostly on various IoT installations, and concerns regarding security and privacy with something like a rapid percentage of detection. The achievement of a higher detection frequency in IoT image classification is another main objective of this research work. Mostly on IoT datasets, the CNN methodology was trained and validated for effectiveness by using a wide range of approaches. Investigating the use of deep learning with IoT capturing images might be the initial phase. Furthermore, the value of such a deep learning approach is mostly assessed for improving the suitability of image identification with continuous testing reliability whenever it pertains to IoT image registration. An image identification approach that provides a range of acceptable criteria summarizes the study findings on such use of deep learning inside the IoT platform.
Article
This paper introduces a new swarm intelligence strategy, the anti-coronavirus optimization (ACVO) algorithm. This algorithm is a multi-agent strategy in which each agent is a person who tries to stay healthy and slow down the spread of COVID-19 by observing the containment protocols. The algorithm is composed of three main steps: social distancing, quarantine, and isolation. In the social distancing phase, the algorithm attempts to maintain a safe physical distance between people and limit close contacts. In the quarantine phase, the algorithm quarantines the suspected people to prevent the spread of disease. Some people who have not followed the health protocols and are infected by the virus should be taken care of so that they recover fully. In the isolation phase, the algorithm cares for the infected people so that they recover their health. The algorithm iteratively applies these operators to the population to find the fittest and healthiest person. The proposed algorithm is evaluated on standard multi-variable single-objective optimization problems and compared with several counterpart algorithms. The results show the superiority of ACVO on most test problems compared with its counterparts.
Article
The conservation of biodiversity is crucial, as many plant species are at critical risk of extinction. The traditional medicinal system, an alternative to synthetic drugs, promotes healthy living and depends mainly on the wide repository of plants. A vision-based automatic medicinal plant identification system is proposed using different neural network techniques in computer vision and deep learning. The challenge lies in the unavailability of a medicinal herb dataset. The paper showcases a novel medicinal leaf dataset, the DeepHerb dataset, comprising 2515 leaf images from 40 varied species of Indian herbs. The efficacy of the dataset is shown by comparing pre-trained deep convolutional neural network architectures such as VGG16, VGG19, InceptionV3 and Xception. The work concentrates on applying transfer learning to the pre-trained models to extract features, which are then classified using an Artificial Neural Network (ANN) and a Support Vector Machine (SVM). The SVM hyperparameters are tuned further by Bayesian optimization to achieve a better-performing model. The proposed DeepHerb model, learned from Xception and ANN, performed best with 97.5% accuracy. A cross-platform mobile application, HerbSnap, developed by integrating the DeepHerb model, identifies the herb image with a prediction time of 1 second per image and shows the pertinent details of the herb from the database. This research will further focus on expanding the dataset to benefit stakeholders and thus enrich society with knowledge of herbs and their medicinal properties.
Article
In planetary science, it is an important basic work to recognize and classify the features of topography and geomorphology from the massive data of planetary remote sensing. Therefore, this paper proposes a lightweight model based on VGG-16, which can selectively extract some features of remote sensing images, remove redundant information, and recognize and classify remote sensing images. This model not only ensures the accuracy, but also reduces the parameters of the model. According to our experimental results, our model has a great improvement in remote sensing image classification, from the original accuracy of 85% to 98% now. At the same time, the model has a great improvement in convergence speed and classification performance. By inputting the remote sensing image data of ultra-low pixels (64 * 64) into our model, we prove that our model still has a high accuracy rate of 95% for the remote sensing image with ultra-low pixels and less feature points. Therefore, the model has a good application prospect in remote sensing image fine classification, very low pixel, less image classification.
Article
Purpose: The purpose of this research is to provide a framework in which new data quality dimensions are defined. The new dimensions provide new metrics for the assessment of lecture video indexing. As lecture video indexing involves various steps, the proposed framework containing new dimensions, introduces new integrated approach for evaluating an indexing method or algorithm from the beginning to the end.
Design/methodology/approach: The emphasis in this study is on the fifth step of design science research methodology (DSRM), known as evaluation. That is, the methods that are developed in the field of lecture video indexing as an artifact, should be evaluated from different aspects. In this research, nine dimensions of data quality including accuracy, value-added, relevancy, completeness, appropriate amount of data, concise, consistency, interpretability and accessibility have been redefined based on previous studies and nominal group technique (NGT).
Findings: The proposed dimensions are implemented as new metrics to evaluate a newly developed lecture video indexing algorithm, LVTIA and numerical values have been obtained based on the proposed definitions for each dimension. In addition, the new dimensions are compared with each other in terms of various aspects. The comparison shows that each dimension that is used for assessing lecture video indexing, is able to reflect a different weakness or strength of an indexing method or algorithm.
Originality/value: Despite development of different methods for indexing lecture videos, the issue of data quality and its various dimensions have not been studied. Since data with low quality can affect the process of scientific lecture video indexing, the issue of data quality in this process requires special attention.
Chapter
In this chapter, we put forward a new technique for lecture video segmentation and key frame extraction. In this chapter, the advantages of Histogram of Oriented Gradients (HOG) features and radiometric correlation with entropic measures are explored to detect the shot boundaries and the key frames of the lecture video sequences. In the initial stage of the algorithm, HOG feature is used to project all frames into an n-dimensional feature space. The similarities between the n-dimensional extracted HOG features for two consecutive frames are obtained using radiometric correlation measure. The radiometric correlation between the successive frames of the video is found to have a significant amount of uncertainty, due to variation in color, illumination, or object motion. We have used entropic measure to find the shot boundaries. The key frames are obtained after detection of the shot boundaries by analyzing the peaks and valleys of the radiometric correlation measures. The proposed scheme is tested on several lecture video sequences and compared against six existing state-of-the-art techniques by considering two evaluation measures: computational time and shot transitions.
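The idea of comparing features of consecutive frames to find shot boundaries can be sketched as follows. This is a simplified stand-in, not the chapter's method: it assumes each frame has already been reduced to a feature vector (for example a HOG descriptor), uses the Pearson correlation coefficient in place of the radiometric correlation measure, and uses a fixed threshold in place of the entropic measure. The function names and the default threshold of 0.5 are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length feature vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def shot_boundaries(features, threshold=0.5):
    """Return indices i where frame i correlates weakly with frame i - 1.

    Assumption: a fixed threshold stands in for the entropic measure
    that the chapter uses to decide where a shot ends.
    """
    return [
        i
        for i in range(1, len(features))
        if pearson(features[i - 1], features[i]) < threshold
    ]
```

One frame per detected shot (for example, the first frame after each boundary) could then serve as a candidate key frame.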
Article
E-learning is a rapidly growing field, which is giving rise to a massive amount of digital learning objects. Sorting these objects properly so that they are correctly indexed in searches and recommendation systems is a challenge. In this paper, we present a semi-supervised method of clustering and classifying learning objects in video format to extract their most relevant topics, specifically from lesson transcripts. These videos come from the educational video platform of the Universitat Politècnica de València. The proposed method also uses open content from Wikipedia to help build the labelled dataset.
Article
Traveling Salesman Problem (TSP) has been seen in diverse applications, which is proven to be NP-complete in most cases. Even though there are multiple heuristic techniques, the problem is still a complex combinatorial optimization problem. The candidate solutions are chosen by considering only a set of high values of the objective function which may not lead to the best solutions. Hence, this paper develops a hybrid optimization algorithm, named Earthworm-based DHOA (EW-DHOA) to solve the TSP problem by finding an optimal solution. The proposed EW-DHOA is developed by integrating the two well-performing meta-heuristic algorithms, such as Deer Hunting Optimization Algorithm (DHOA) and Earthworm Optimization Algorithm (EWA). The EW-DHOA intends to optimize the constraint as the number of cities traveled by the salesman in terms of an optimal path. The main process for attaining this objective is to minimize the distance traveled by the salesman concerning the entire cities. The effectiveness of the proposed hybrid meta-heuristic algorithm is validated over the benchmark dataset. Finally, the experimental results show that the convergence of the proposed hybrid optimization will be better while solving TSP with less computational complexity, and improved significantly in attaining optimal results.