International Journal of Multimedia Information Retrieval (2022) 11:431–443
https://doi.org/10.1007/s13735-022-00238-5
REGULAR PAPER
How can users’ comments posted on social media videos be a source of effective tags?
Mehdi Ellouze (mehdi.ellouze@ieee.org)
Department of Computer Engineering, FSEG Sfax, Sfax University, Airport Road Km 4, 3018 Sfax, Tunisia
Received: 16 December 2021 / Revised: 15 April 2022 / Accepted: 22 April 2022 / Published online: 23 May 2022
© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2022
Abstract
This paper proposes a new approach for extracting tags from users’ comments on videos. Videos on social media platforms, like Facebook and YouTube, are usually accompanied by comments in which users give opinions about things evoked in the video. The main challenge is how to extract relevant tags from these comments. To the best of the authors’ knowledge, this is the first research work to present an approach for extracting tags from comments posted about videos on social media. We do not claim that comments are a perfect solution for tagging videos; rather, we investigate how reliable comments are as a source of tags and study how they can serve that purpose. The proposed approach filters the comments to retain only the words that could be possible tags. We rely on self-organizing map clustering, considering that the tags of a given video are semantically and contextually close. We tested our approach on the Google YouTube-8M dataset, and the achieved results show that we can rely on comments to extract tags. As a second area of application, the extracted tags could also be used to enrich and refine the uploaders’ existing tags, mitigating the bias of uploader tags, which are generally subjective.
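As a rough illustration of the pipeline this abstract describes, the sketch below filters a video’s comments down to candidate tag words and then groups their embeddings with a self-organizing map. It is a minimal sketch under stated assumptions, not the paper’s implementation: the filtering rules, the `word_vectors` lookup, the grid size, and the iteration count are all illustrative placeholders, and the SOM step uses the open-source MiniSom library rather than whatever the authors used.

```python
# Minimal sketch of the comment-to-tag pipeline outlined in the abstract.
# All names, thresholds, and grid sizes are illustrative assumptions.
import re
from collections import Counter

import numpy as np
from minisom import MiniSom  # pip install minisom

STOPWORDS = {"the", "a", "an", "is", "it", "this", "that", "and", "or", "of"}


def candidate_words(comments):
    """Filter comments down to words that could plausibly be tags:
    lowercase alphabetic tokens, not stopwords, seen in more than
    one comment (a stand-in for the paper's filtering step)."""
    per_comment = [set(re.findall(r"[a-z]+", c.lower())) for c in comments]
    counts = Counter(w for words in per_comment for w in words)
    return [w for w, n in counts.items() if n > 1 and w not in STOPWORDS]


def cluster_candidates(words, word_vectors, grid=(4, 4), iters=500):
    """Group candidate words by the SOM node that wins their embedding.
    `word_vectors` is any word -> vector mapping (e.g. a pre-trained
    embedding model); words without a vector are skipped."""
    known = [w for w in words if w in word_vectors]
    if not known:
        return {}
    data = np.array([word_vectors[w] for w in known])
    som = MiniSom(grid[0], grid[1], data.shape[1], sigma=1.0, learning_rate=0.5)
    som.train_random(data, iters)
    clusters = {}
    for word, vec in zip(known, data):
        clusters.setdefault(som.winner(vec), []).append(word)
    return clusters
```

Tags would then be drawn from the densest clusters, following the abstract’s premise that the tags of a given video are semantically and contextually close, while unrelated comment vocabulary scatters across the map.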
Keywords Video tagging · Social media comments · Self-organizing map · YouTube
Abbreviations
SOM Self-organizing map
GA Genetic algorithms
DTM Document term matrix
FS Frequency score
TF Term frequency
CF Comment frequency
TRECVID Text retrieval conference video retrieval evaluation
NGD Normalized Google distance
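Of the abbreviations above, NGD is the one with a standard closed form. For reference, this is the well-known Cilibrasi–Vitányi definition rather than a formula quoted from this preview, and its role here as the semantic-closeness measure between candidate tags is our reading of the abbreviation list:

\[
\operatorname{NGD}(x, y) = \frac{\max\{\log f(x),\, \log f(y)\} - \log f(x, y)}{\log N - \min\{\log f(x),\, \log f(y)\}}
\]

where \(f(x)\) is the number of pages containing \(x\), \(f(x, y)\) the number of pages containing both terms, and \(N\) the total number of pages indexed. NGD is small when two terms co-occur almost everywhere either one appears, which is what makes it usable as a signal that two candidate tags are semantically close.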
1 Introduction
The unprecedented spread of video content has generated the need to tag this content to make it easily browsable. Every second, thousands of videos are shared on the internet and on social media by uploaders. We are thus facing an important new issue: how to find out about the context and the story of a shared video. Some social media repositories, like YouTube, Flickr, or Instagram, offer the uploader the possibility to tag the content when it is uploaded. However, on some others, like Facebook, no tags are added to the video content; the only available option is to give the video a title to help viewers understand the original content. In the research community, important efforts are being made to create tools that allow exploring large numbers of videos. Most of these efforts have focused on automatically recognizing semantic concepts in the video through computer vision and machine learning techniques. Such semantic concepts can be entities, objects, events, and places. The TRECVID evaluation campaign has supported these efforts for many years, providing researchers with a huge dataset of annotated videos by means of which they train their visual concept detectors (Fig. 1).
However, automatic concept detection systems suffer from some limitations. First, the number of concept detectors is limited. Besides, the concepts contained in the original video must be known in advance in order to apply the suitable detectors. Finally, the reliability of the detectors should be good enough