Signal, Image and Video Processing (2024) 18:1993–2006
https://doi.org/10.1007/s11760-023-02744-3
ORIGINAL PAPER
A multi-modal lecture video indexing and retrieval framework
with multi-scale residual attention network and multi-similarity
computation
A. Debnath1 · K. Sreenivasa Rao2 · Partha P. Das2
Received: 30 June 2023 / Revised: 2 August 2023 / Accepted: 10 August 2023 / Published online: 23 December 2023
© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2023
Abstract
Technological advances have sharply increased the production of video and its storage on the Internet, so a
huge number of videos from various sources are now available on websites. Retrieving the relevant lecture videos
from this mass of multimedia is therefore difficult. This paper proposes a deep-learning-based approach for
indexing and retrieving lecture videos that considers several similarity measures over the video features.
Lecture videos for training are obtained from a standard dataset. The optimal keyframes are selected from these
videos using the Adaptive Anti-Coronavirus Optimization Algorithm, and the video contents are then segmented and
arranged on the basis of the optimized keyframes. Textual content, such as semantic words and keywords, is
recognized by means of Optical Character Recognition, and image features are extracted from the segmented frames
with a Multi-scale Residual Attention Network (MRAN). The resulting pool of features is arranged and stored in
the database according to content. For testing the trained model, text and video queries are given as input. In
the testing phase, the features of the text query and the features of the optimized keyframes of the video query
are obtained with MRAN. The pooled features from the text and video queries are compared with the features stored
in the database, and their similarity is measured using the Cosine, Jaccard, and Euclidean similarity indices.
The resulting multi-similarity features are used to retrieve the videos relevant to the given query. The
experimental results show that the proposed system for video indexing and retrieval performs better and is more
efficient than existing video retrieval methods.
Keywords Video retrieval and indexing · Adaptive anti-coronavirus optimization algorithm · Optical character recognition ·
Multi-scale Residual Attention Network · Multi-similarity indices
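The multi-similarity comparison named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the conversion of Euclidean distance to a similarity via 1/(1 + d), the use of Jaccard on keyword sets, and the unweighted average used to fuse the three scores are all assumptions made for illustration. The abstract names the three indices but does not state how they are combined.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_similarity(a, b):
    """Map Euclidean distance d to a similarity in (0, 1]; 1/(1+d) is an assumed mapping."""
    return 1.0 / (1.0 + math.dist(a, b))


def jaccard_similarity(words_a, words_b):
    """Overlap of two keyword collections: |A & B| / |A | B| (0.0 if both are empty)."""
    set_a, set_b = set(words_a), set(words_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def rank_videos(query_vec, query_words, database):
    """Rank stored videos by the mean of the three similarities (assumed fusion rule).

    `database` maps a video id to a (feature_vector, keyword_list) pair.
    """
    scored = []
    for video_id, (vec, words) in database.items():
        score = (
            cosine_similarity(query_vec, vec)
            + euclidean_similarity(query_vec, vec)
            + jaccard_similarity(query_words, words)
        ) / 3.0
        scored.append((score, video_id))
    scored.sort(reverse=True)  # highest combined similarity first
    return [video_id for _, video_id in scored]
```

In this sketch, a query with feature vector `[1.0, 0.0]` and keyword `"graph"` ranks a stored video with the same vector and overlapping keywords ahead of one with an orthogonal vector and no shared keywords.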
1 Introduction
In recent years, digital video has become a popular medium
for data retention, transmission, and reception because of
the fast growth in high-speed networks, transcribing, and
down-sizing technology [1]. These advantages have led to the
use of audio-visual recording in online learning platforms [2].
Many schools, colleges, and research and educational institutes
now record their classes using electronic devices and upload
these lectures online so that students who require them can
view or download them from anywhere at any time. As a result,
media files are deposited on websites in large numbers [3]. It
is therefore impossible to obtain the required data when a
person wants to find a video without searching for it in the
archives [4]. Moreover, judging whether a video is relevant to
a search by looking only at the topic of the video or at the
description provided for the searched topic is a complex
process [5]. In addition, users have to go through the full
video even if only a few minutes of it contain the requested
information. How a user can effectively fetch the relevant
data from an entire video is still an open issue [6].

A. Debnath
abhijitdebnath@iitkgp.ac.in
1 Indian Institute of Technology, Kharagpur, India
2 Computer Science and Engineering, Indian Institute of
Technology, Kharagpur, India

Almost all methods for fetching a video use a search
option to obtain the relevant data. It is impossible
to obtain the required data from a large database without
using the search option [7]. Moreover, it is a tremendous task