Conference Paper

Towards AI-based Semantic Multimedia Indexing and Retrieval for Social Media on Smartphones

... The Multimedia Feature Graph (MMFG) [21] is a weighted and directed graph [22], whose nodes and edges represent features of multimedia assets. The Generic Multimedia Annotation Framework (GMAF) [23] is an MMIR system for multimedia assets that uses MMFGs as an index and access method. GMAF provides a flexible plugin architecture that allows the processing of various multimedia asset types to extract features that are stored as metadata in the MMFG. ...
... The feature metric M_F is based on the vocabulary, the feature relationship metric M_FR is based on the possible relationships, and the feature relationship type metric M_RT is based on the actual relationship types. Semantic Graph Codes (SGCs) [23] are an extension of GCs that incorporate additional semantic structures using annotation with semantic systems such as RDF, RDFS, ontologies, or Knowledge Organization Systems [28][29][30]. This additional semantic information can help to bridge the semantic gap between the technical representations of multimedia feature graphs and their human-understandable meanings. ...
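To illustrate how such a metric triple might be computed, the following Python sketch models a Graph Code as a sparse matrix keyed by feature-term pairs. Both this representation and the exact metric definitions are simplifying assumptions for illustration, not the formulation from the cited papers.

```python
def metric_triple(gc1, gc2):
    """Compute an illustrative (M_F, M_FR, M_RT) for two Graph Codes.

    A Graph Code is modelled here as a dict mapping (term_i, term_j)
    to a relationship-type id; a diagonal entry (t, t) marks that the
    feature term t is present in the vocabulary.
    """
    # M_F: overlap of the feature vocabularies (diagonal entries).
    feats1 = {i for (i, j) in gc1 if i == j}
    feats2 = {i for (i, j) in gc2 if i == j}
    m_f = len(feats1 & feats2) / max(len(feats1 | feats2), 1)

    # M_FR: overlap of the possible relationships (off-diagonal cells).
    edges1 = {k for k in gc1 if k[0] != k[1]}
    edges2 = {k for k in gc2 if k[0] != k[1]}
    shared = edges1 & edges2
    m_fr = len(shared) / max(len(edges1 | edges2), 1)

    # M_RT: among shared relationships, how many have the same type.
    same_type = sum(1 for k in shared if gc1[k] == gc2[k])
    m_rt = same_type / max(len(shared), 1)
    return m_f, m_fr, m_rt
```

For two codes sharing the term "dog" and one identically typed edge, the triple would rank vocabulary overlap lower than relationship agreement, which matches the intuition of refining similarity from vocabulary to relationship types.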
Article
Full-text available
The volume of multimedia assets in collections is growing exponentially, and the retrieval of information is becoming more complex. The indexing and retrieval of multimedia content is generally implemented by employing feature graphs. Feature graphs contain semantic information on multimedia assets. Machine learning can produce detailed semantic information on multimedia assets, reflected in a high volume of nodes and edges in the feature graphs. While increasing the effectiveness of the information retrieval results, the high level of detail and also the growing collections increase the processing time. Addressing this problem, Multimedia Feature Graphs (MMFGs) and Graph Codes (GCs) have been proven to be fast and effective structures for information retrieval. However, the huge volume of data requires more processing time. As Graph Code algorithms were designed to be parallelizable, different paths of parallelization can be employed to prove or evaluate the scalability options of Graph Code processing. These include horizontal and vertical scaling with the use of Graphic Processing Units (GPUs), Multicore Central Processing Units (CPUs), and distributed computing. In this paper, we show how different parallelization strategies based on Graph Codes can be combined to provide a significant improvement in efficiency. Our modeling work shows excellent scalability with a theoretical speedup of 16,711 on a top-of-the-line Nvidia H100 GPU with 16,896 cores. Our experiments with a mediocre GPU show that a speedup of 225 can be achieved and give credence to the theoretical speedup. Thus, Graph Codes provide fast and effective multimedia indexing and retrieval, even in billion-scale use cases.
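The parallelization argument rests on the cell-wise comparison of Graph Code matrices being independent per cell. The sketch below shows this decomposition with Python threads; the row-wise split and the toy cell-match similarity are illustrative assumptions (on a GPU, each cell would map to its own core rather than each row to a thread).

```python
from concurrent.futures import ThreadPoolExecutor

def row_matches(row_a, row_b):
    # Count cells whose relationship-type values agree; each row is
    # independent of all others, which is what makes the comparison
    # embarrassingly parallel.
    return sum(1 for a, b in zip(row_a, row_b) if a == b)

def parallel_similarity(gc_a, gc_b, workers=4):
    """Fraction of matching cells between two equally sized Graph Codes,
    with rows compared in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as ex:
        matches = sum(ex.map(row_matches, gc_a, gc_b))
    total = sum(len(r) for r in gc_a)
    return matches / total
```

The same decomposition extends to comparing one query against a whole collection, since each query/candidate pair is also an independent task.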
... It also identifies how our approach addresses open problems. An earlier instantiation of the ideas in this paper is given in [23]. ...
... A standard multimedia processing pipeline [54] exists for multimedia processing, to which our relevant contributions can be aligned as discussed previously in [23]. The general data processing pipeline can be mapped to this standard multimedia processing pipeline [54], providing a multimedia-specific representation of the eight typical processing steps of multimedia content (see Figure 1). ...
Article
Full-text available
To cope with the growing number of multimedia assets on smartphones and social media, an integrated approach for semantic indexing and retrieval is required. Here, we introduce a generic framework to fuse existing image and video analysis tools and algorithms into a unified semantic annotation, indexing and retrieval model resulting in a multimedia feature vector graph representing various levels of media content, media structures and media features. Utilizing artificial intelligence (AI) and machine learning (ML), these feature representations can provide accurate semantic indexing and retrieval. Here, we provide an overview of the generic multimedia analysis framework (GMAF) and the definition of a multimedia feature vector graph framework (MMFVGF). We also introduce AI4MMRA to detect differences, enhance semantics and refine weights in the feature vector graph. To address particular requirements on smartphones, we introduce an algorithm for fast indexing and retrieval of graph structures. Experiments to prove efficiency, effectiveness and quality of the algorithm are included. All in all, we describe a solution for highly flexible semantic indexing and retrieval that offers unique potential for applications such as social media or local applications on smartphones.
... In our previous work, we already introduced the Generic Multimedia Analysis Framework (GMAF) [5], [6], [7], [8] as an unifying framework, that is able to fuse various Multimedia features into a single data structure. The GMAF utilizes selected existing technologies as plugins to support various Multimedia feature detection algorithms for text (e.g. ...
Article
Full-text available
Multimedia Indexing and Retrieval is generally designed and implemented by employing feature graphs. These graphs typically contain a significant number of nodes and edges to reflect the level of detail in feature detection. A higher level of detail increases the effectiveness of the results but also leads to more complex graph structures. However, graph-traversal-based algorithms for similarity are quite inefficient and computation intensive, especially for large data structures. To deliver fast and effective retrieval, an efficient similarity algorithm, particularly for large graphs, is mandatory. Hence, in this paper, we define a graph-projection into a 2D space (Graph Code) as well as the corresponding algorithms for indexing and retrieval. We show that calculations in this space can be performed more efficiently than graph-traversals due to a simpler processing model and a high level of parallelisation. In consequence, we prove that the effectiveness of retrieval also increases substantially, as Graph Codes facilitate more levels of detail in feature fusion. Thus, Graph Codes provide a significant increase in efficiency and effectiveness (especially for Multimedia indexing and retrieval) and can be applied to images, videos, audio, and text information.
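The core idea of the projection can be sketched in a few lines: fix an ordering of the feature vocabulary, then encode node presence on the diagonal and edge types off the diagonal of a 2D matrix. The exact cell encoding below is a simplifying assumption for illustration.

```python
def graph_code(nodes, edges):
    """Project a feature graph into a 2D matrix (a Graph Code).

    nodes: list of feature terms (the vocabulary, fixing row/column order)
    edges: dict mapping (source, target) to a relationship-type id
    """
    idx = {t: i for i, t in enumerate(nodes)}
    n = len(nodes)
    matrix = [[0] * n for _ in range(n)]
    for t in nodes:
        matrix[idx[t]][idx[t]] = 1            # diagonal: node present
    for (src, dst), rel_type in edges.items():
        matrix[idx[src]][idx[dst]] = rel_type  # off-diagonal: edge type
    return matrix
```

Once graphs are in this form, similarity reduces to cell-wise matrix operations instead of graph traversal, which is what enables the simpler processing model and parallelization described above.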
... In previous work, we already introduced the Generic Multimedia Analysis Framework (GMAF) [7][8][9][10] as a unifying framework that can integrate various multimedia features into a single data structure. The GMAF utilizes selected existing technologies as plugins to support various multimedia feature detection algorithms for text (e.g., social media posts, descriptions, tag lines) [11][12][13], images (especially object detection and spatial relationships including the use of machine learning) [11,[14][15][16], audio (transcribed to text) [15,17,18], and video including metadata [19] and detected features [18,20,21]. ...
Article
Full-text available
Multimedia feature graphs are employed to represent features of images, video, audio, or text. Various techniques exist to extract such features from multimedia objects. In this paper, we describe the extension of such a feature graph to represent the meaning of such multimedia features and introduce a formal context-free PS-grammar (Phrase Structure grammar) to automatically generate human-understandable natural language expressions based on such features. To achieve this, we define a semantic extension to syntactic multimedia feature graphs and introduce a set of production rules for phrases of natural language English expressions. This explainability, which is founded on a semantic model provides the opportunity to represent any multimedia feature in a human-readable and human-understandable form, which largely closes the gap between the technical representation of such features and their semantics. We show how this explainability can be formally defined and demonstrate the corresponding implementation based on our generic multimedia analysis framework. Furthermore, we show how this semantic extension can be employed to increase the effectiveness in precision and recall experiments.
Thesis
Full-text available
Multimedia Feature Graphs (MMFGs) were developed for efficient Multimedia Information Retrieval (MMIR). In a further step, a method for computing similarity was presented: the graphs are converted into a proxy matrix called a Graph Code, and a similarity value for two Graph Codes can be calculated mathematically; the higher the value, the more similar the Graph Codes. This has been shown to be a fast method, but it does not use parallel processing. The goal of this work is to investigate ways to parallelize the algorithms and how users can use parallel Graph Code algorithms. To model one or more parallel algorithms, decomposition techniques were researched. These techniques were used to decompose Wagenpfeil's sequential method into several parallelizable tasks, from which several parallel algorithms were modeled and considered. POSIX Threads and CUDA were identified as suitable technologies, and the modeling of the algorithms for these technologies was performed and discussed. Integration with the Generic Multimedia Annotation Framework (GMAF) application was modeled so that the algorithms can be used, and a console application (gcsim) was modeled to evaluate them. A selection of the algorithms was implemented prototypically, and the gcsim application was also programmed for integration into GMAF. The evaluation showed that the computation of Graph Code similarity can be highly parallelized: in the experiments, the CUDA implementation of the algorithms achieved a speedup of 225. It was also shown that the algorithms scale with the number of processor cores in the processor used, so an even stronger speedup can be achieved on more powerful hardware. As a result, users have access to fast algorithms that can be flexibly integrated into existing applications.
Conference Paper
Full-text available
In this paper, we present a hybrid and interdisciplinary approach for the calculation of argument strength, based on the coupling, within the emerging Argument Recognition (eAR) framework, of two existing frameworks: (1) the emerging Named Entity Recognition Information Retrieval System (eNER-IRS) for textual medical articles, and (2) the Generic Multimedia Analysis Framework (GMAF) for semantic extraction of visual multimedia features. We focus on combining textual and visual features to increase the capability of both frameworks and facilitate applications with a higher level of confidence in argumentation.
Article
Full-text available
The indexing and retrieval of multimedia content is generally implemented by employing feature graphs. These graphs typically contain a significant number of nodes and edges to reflect the level of detail in feature detection. A higher level of detail increases the effectiveness of the results, but also leads to more complex graph structures. However, graph traversal-based algorithms for similarity are quite inefficient and computationally expensive, especially for large data structures. To deliver fast and effective retrieval especially for large multimedia collections and multimedia big data, an efficient similarity algorithm for large graphs in particular is desirable. Hence, in this paper, we define a graph projection into a 2D space (Graph Code) and the corresponding algorithms for indexing and retrieval. We show that calculations in this space can be performed more efficiently than graph traversals due to the simpler processing model and the high level of parallelization. As a consequence, we demonstrate experimentally that the effectiveness of retrieval also increases substantially, as the Graph Code facilitates more levels of detail in feature fusion. These levels of detail also support an increased trust prediction, particularly for fused social media content. In our mathematical model, we define a metric triple for the Graph Code, which also enhances the ranked result representations. Thus, Graph Codes provide a significant increase in efficiency and effectiveness, especially for multimedia indexing and retrieval, and can be applied to images, videos, text and social media information.
Conference Paper
Full-text available
This paper introduces an approach to developing an up-to-date information technology infrastructure that can support Advanced Visual User Interfaces for Distributed Big Data Analysis in Virtual Labs to be used in eScience, industrial research, and data science education. The paper introduces and motivates the current situation in this application area as the basis for a corresponding problem statement, which is utilized to derive the goals and objectives of the approach. Furthermore, the relevant state of the art is revisited and remaining challenges are identified. An exemplary set of use cases, corresponding user stereotypes, as well as a conceptual design model to address these challenges are introduced. The scenario and user stereotypes have been validated in an expert roundtable. A corresponding architectural system model is suggested as a conceptual reference architecture to support proof-of-concept implementations as well as interoperability in distributed infrastructures. Conclusions and an outlook on future work complete the paper.
Article
Full-text available
In today’s digital era, large volumes of long-duration videos from movies, documentaries, sports, and surveillance cameras circulate on the Internet and in video databases (e.g., YouTube). Since manual processing of these videos is difficult, time-consuming, and expensive, an automatic technique for abstracting long-duration videos is highly desirable. Against this backdrop, this paper presents a novel and efficient approach to video shot boundary detection and keyframe extraction, which subsequently leads to a summarized and compact video. The proposed method detects video shot boundaries by extracting the SIFT-point distribution histogram (SIFT-PDH) from the frames as a combination of local and global features. In the subsequent step, video shot boundaries are detected using the distance between the SIFT-PDHs of consecutive frames and an adaptive threshold. Further, the keyframes representing the salient content of each segmented shot are extracted using an entropy-based singular values measure. The summarized video is then generated by combining the extracted keyframes. The experimental results show that our method can efficiently detect shot boundaries under both abrupt and gradual transitions, and even under different levels of illumination, motion effects, and camera operations (zoom in, zoom out, and camera rotation). With the proposed method, the computational complexity is comparatively low and the video summarization is very compact.
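The adaptive-threshold step can be sketched compactly: given the sequence of inter-frame histogram distances, flag a boundary wherever a distance exceeds the sequence mean by some multiple of its standard deviation. The specific threshold formula and the factor `alpha` are illustrative assumptions, not the paper's exact rule.

```python
import statistics

def shot_boundaries(distances, alpha=1.5):
    """Flag frame transitions whose inter-frame histogram distance
    (e.g. between SIFT-PDHs of consecutive frames) exceeds an
    adaptive threshold of mean + alpha * stdev."""
    mean = statistics.mean(distances)
    stdev = statistics.pstdev(distances)
    threshold = mean + alpha * stdev
    return [i for i, d in enumerate(distances) if d > threshold]
```

Because the threshold adapts to the overall distance level of the clip, the same rule tolerates gradual illumination or motion changes while still catching abrupt cuts.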
Conference Paper
Full-text available
The volume of worldwide digital content has increased nine-fold within the last five years, and this immense growth is predicted to continue in the foreseeable future, reaching 8 ZB by 2015. Traditionally, in order to cope with the growing demand for storage capacity, organizations proactively built and managed their private storage facilities. Recently, with the proliferation of public cloud infrastructure offerings, many organizations have instead welcomed the alternative of outsourcing their storage needs to providers of public cloud storage services. The comparative cost-efficiency of these two alternatives depends on a number of factors, including the prices of public and private storage, the charging and storage acquisition intervals, and the predictability of the demand for storage. In this paper, we study how the cost-efficiency of private vs. public storage depends on the acquisition interval at which the organization re-assesses its storage needs and acquires additional private storage. The analysis in the paper suggests that the shorter the acquisition interval, the more likely it is that the private storage solution is less expensive than the public cloud infrastructure. This phenomenon is also illustrated numerically, using the storage needs encountered by a university back-up and archiving service as an example. Since the acquisition interval is determined by, among other factors, the organization's ability to foresee the growth of storage demand, the provisioning schedules of storage equipment providers, and the internal practices of the organization, an organization owning a private storage solution may want to control some of these factors in order to attain a shorter acquisition interval and thus make private storage (more) cost-efficient.
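The effect of the acquisition interval can be shown with a toy cost model: if unit storage prices decline over time, buying capacity in small, frequent increments defers purchases to cheaper periods, while a long interval forces early over-provisioning at higher prices. The price-decline model and parameters below are illustrative assumptions, not the paper's actual cost model.

```python
def private_cost(demand, p0, decline, interval):
    """Total purchase cost of private storage when capacity is
    re-assessed every `interval` periods.

    demand:   per-period required capacity
    p0:       unit price in period 0
    decline:  price multiplier per period (< 1 means prices fall)
    At each acquisition point, buy just enough extra capacity to
    cover the peak demand until the next acquisition.
    """
    total, owned = 0.0, 0.0
    for t in range(0, len(demand), interval):
        peak = max(demand[t:t + interval])
        if peak > owned:
            total += (peak - owned) * p0 * decline ** t
            owned = peak
    return total
```

With steadily growing demand and falling prices, a per-period interval is markedly cheaper than a single up-front acquisition, which mirrors the paper's conclusion that shorter acquisition intervals favor the private solution.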
Article
Full-text available
Creating compelling multimedia productions is a nontrivial task. This is as true for creating professional content as it is for nonprofessional editors. During the past 20 years, authoring networked content has been a part of the research agenda of the multimedia community. Unfortunately, authoring has been seen as an initial enterprise that occurs before ‘real’ content processing takes place. This limits the options open to authors and to viewers of rich multimedia content for creating and receiving focused, highly personal media presentations. This article reflects on the history of multimedia authoring. We focus on the particular task of supporting socially-aware multimedia, in which the relationships within particular social groups among authors and viewers can be exploited to create highly personal media experiences. We provide an overview of the requirements and characteristics of socially-aware multimedia authoring within the context of exploiting community content. We continue with a short historical perspective on authoring support for these types of situations. We then present an overview of a current system for supporting socially-aware multimedia authoring within the community content. We conclude with a discussion of the issues that we feel can provide a fruitful basis for future multimedia authoring support. We argue that providing support for socially-aware multimedia authoring can have a profound impact on the nature and architecture of the entire multimedia information processing pipeline.
Chapter
Full-text available
In order to retrieve and reuse non-textual media, media annotations must explain how a media object is composed of its parts and what the parts represent. Annotations need to link to background knowledge found in existing knowledge sources and to the creation and use of the media object. The representation and understanding of such facets of the media semantics is only possible through a formal language and a corresponding ontology. In this chapter, we analyze the requirements underlying the semantic representation of media objects, explain why the requirements are not fulfilled by most semantic multimedia ontologies and present COMM, a core ontology for multimedia, that has been built re-engineering the current de-facto standard for multimedia annotation, i.e. MPEG-7, and using DOLCE as its underlying foundational ontology to support conceptual clarity and soundness as well as extensibility towards new annotation requirements.
Article
Full-text available
The paper presents a review of 200 references in content-based image retrieval. The paper starts with discussing the working conditions of content-based retrieval: patterns of use, types of pictures, the role of semantics, and the sensory gap. Subsequent sections discuss computational steps for image retrieval systems. Step one of the review is image processing for retrieval sorted by color, texture, and local geometry. Features for retrieval are discussed next, sorted by: accumulative and global features, salient points, object and shape features, signs, and structural combinations thereof. Similarity of pictures and objects in pictures is reviewed for each of the feature types, in close connection to the types and means of feedback the user of the systems is capable of giving by interaction. We briefly discuss aspects of system engineering: databases, system architecture, and evaluation. In the concluding section, we present our view on: the driving force of the field, the heritage from computer vision, the influence on computer vision, the role of similarity and of interaction, the need for databases, the problem of evaluation, and the role of the semantic gap.
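One of the simplest instances of the accumulative global features discussed in the review is a color histogram compared by histogram intersection. The sketch below is a generic textbook formulation, assumed for illustration rather than taken from the survey itself.

```python
def color_histogram(pixels, bins=4):
    """Accumulative global color feature: a per-channel histogram over
    (r, g, b) tuples in 0..255, normalised so images of different
    sizes are comparable."""
    hist = [[0] * bins for _ in range(3)]
    for r, g, b in pixels:
        for ch, v in enumerate((r, g, b)):
            hist[ch][min(v * bins // 256, bins - 1)] += 1
    total = len(pixels) * 3
    return [count / total for channel in hist for count in channel]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalised histograms;
    1.0 means identical color distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

Such global features are cheap and robust but blind to spatial layout, which is one reason the review moves on to salient points, shape features, and their structural combinations.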
Article
Full-text available
Research on methods for detection and recognition of events and actions in videos is receiving an increasing attention from the scientific community, because of its relevance for many applications, from semantic video indexing to intelligent video surveillance systems and advanced human-computer interaction interfaces. Event detection and recognition requires to consider the temporal aspect of video, either at the low-level with appropriate features, or at a higher-level with models and classifiers than can represent time. In this paper we survey the field of event recognition, from interest point detectors and descriptors, to event modelling techniques and knowledge management technologies. We provide an overview of the methods, categorising them according to video production methods and video domains, and according to types of events and actions that are typical of these domains.
Article
A universal fusion framework for handling multi-realm image fusion reduces the cost of manual selection in varied applications. Addressing the generality of multiple realms and the sensitivity of a specific realm, we propose a novel universal framework for multi-realm image fusion through learning realm-specific and realm-general feature representations. A shared principle network, an adaptive realm feature extraction strategy, and a realm activation mechanism are designed to facilitate high across-realm generalization and specific-realm sensitivity simultaneously. In addition, we present realm-specific no-reference perceptual metric losses based on edge details and contrast for optimizing the learning process, making the fused image exhibit a more specific appearance. Moreover, we collect a new multi-realm image fusion dataset (MRIF), consisting of infrared and visual images, medical images, and multispectral images, to facilitate our training and testing. Experimental results show that the fused image obtained by the proposed method achieves superior performance compared with state-of-the-art methods on MRIF and on three other datasets comprising infrared and visual images, medical images, and remote sensing images, respectively.
Book
The representation and processing of any kind of knowledge, whether certain or uncertain, vague, incomplete, or even contradictory, is the central task of intelligent, knowledge-based systems in Artificial Intelligence. The authors present the different methods in an accessible way, so that this work is suitable both for self-study and as an accompanying text for corresponding lectures. The sixth edition has been reviewed and revised and, in particular, highlights connections to current developments. In addition, practice-oriented self-test exercises with detailed solutions available online have been integrated into the text; they facilitate learning and make the book an ideal companion for students of computer science and related fields. Contents: Knowledge-based systems at a glance; Logic-based knowledge representation and inference; Rule-based systems; Machine learning; Case-based reasoning; Truth maintenance systems; Default logics; Logic programming and answer sets; Argumentation; Actions and planning; Agents; Quantitative methods I: probabilistic networks; Quantitative methods II: Dempster-Shafer theory, fuzzy theory, and possibilistics; Probability and information; Graph-theoretic foundations; Application examples from medicine, genetics, and economics. The authors: Prof. Dr. Christoph Beierle is a professor of computer science/knowledge-based systems at the FernUniversität in Hagen. Prof. Dr. Gabriele Kern-Isberner is a professor of computer science/information engineering at the University of Dortmund.
Chapter
Local feature detectors and descriptors (hereinafter extractors) play a key role in the modern computer vision. Their scope is to extract, from any image, a set of discriminative patterns (hereinafter keypoints) present on some parts of background and/or foreground elements of the image itself. A prerequisite of a wide range of practical applications (e.g., vehicle tracking, person re-identification) is the design and development of algorithms able to detect, recognize and track the same keypoints within a video sequence. Smart cameras can acquire images and videos of an interesting scenario according to different intrinsic (e.g., focus, iris) and extrinsic (e.g., pan, tilt, zoom) parameters. These parameters can make the recognition of a same keypoint between consecutive images a hard task when some critical factors such as scale, rotation and translation are present. The aim of this chapter is to provide a comparative study of the most used and popular low-level local feature extractors: SIFT, SURF, ORB, PHOG, WGCH, Haralick and A-KAZE. At first, the chapter starts by providing an overview of the different extractors referenced in a concrete case study to show their potentiality and usage. Afterwards, a comparison of the extractors is performed by considering the Freiburg-Berkeley Motion Segmentation (FBMS-59) dataset, a well-known video data collection widely used by the computer vision community. Starting from a default setting of the local feature extractors, the aim of the comparison is to discuss their behavior and robustness in terms of invariance with respect to the most important critical factors. The chapter also reports comparative considerations about one of the basic steps based on the feature extractors: the matching process. Finally, the chapter points out key considerations about the use of the discussed extractors in real application domains.
Chapter
The chapter presents a brief introduction to the problem with the semantic gap in content-based image retrieval systems. It presents the complex process of image processing, leading from raw images, through subsequent stages to the semantic interpretation of the image. Next, the content of all chapters included in this book is shortly presented.
Chapter
Many algorithms for computer vision rely on locating interest points, or keypoints, in each image and calculating a feature description from the pixel region surrounding the interest point. This is in contrast to methods such as correlation, where a larger rectangular pattern is stepped over the image at pixel intervals and the correlation is measured at each location. The interest point is the anchor, and it often provides the scale, rotational, and illumination invariance attributes for the descriptor; the descriptor adds more detail and more invariance attributes. Groups of interest points and descriptors together describe the actual objects.
Chapter
Keywords: Local Binary Pattern; Interest Point; Scale Invariant Feature Transform; Sparse Code; Maximally Stable Extremal Region.
Conference Paper
Finding nearly semantic ‘visual units’ on which visual analysis can operate is a long-standing challenge in the computer vision community. Established powerful methods such as SIFT and BRISK often extract numerous redundant single keypoints with little information about semantic content. We propose a novel method, the Contour-Aware Regions detector (CAR), to find representative regions in images. Inspired by the recent conclusion of general object proposal research that contours are important in object localization, we first alleviate the problem of superpixels overlapping multi-object regions. Perceptual regions are then generated during the merging process using a data structure similar to MSER. Extensive experiments demonstrate that: (1) superpixels generated by our method significantly outperform those of state-of-the-art methods, as measured by boundary recall and under-segmentation error; and (2) our method can find fewer and more meaningful regions in 0.125 s per image while achieving promising repeatability.
Article
Providing accurate and scalable solutions to map low-level perceptual features to high-level semantics is essential for multimedia information organization and retrieval. In this paper, we propose a confidence-based dynamic ensemble (CDE) to overcome the shortcomings of the traditional static classifiers. In contrast to the traditional models, CDE can make dynamic adjustments to accommodate new semantics, to assist the discovery of useful low-level features, and to improve class-prediction accuracy. We depict two key components of CDE: a multi-level function that asserts class-prediction confidence, and the dynamic ensemble method based upon the confidence function. Through theoretical analysis and empirical study, we demonstrate that CDE is effective in annotating large-scale, real-world image datasets.
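A very loose reading of the dynamic-ensemble idea is that a confidence function gates which classifier's prediction is accepted, with a fallback when no prediction is confident enough. Everything in this sketch, including the function names, query order, and the confidence floor, is an assumed illustration, not the CDE method itself.

```python
def predict_dynamic(classifiers, confidence, x, floor=0.5):
    """Query classifiers in order; accept the first prediction whose
    confidence clears `floor`, else fall back to the single most
    confident prediction seen."""
    best = None
    for clf in classifiers:
        label = clf(x)
        c = confidence(clf, x)
        if c >= floor:
            return label          # confident enough: accept immediately
        if best is None or c > best[0]:
            best = (c, label)     # remember the best fallback so far
    return best[1]
```

The point of such gating is that the ensemble's behavior can change as new semantics (new classifiers or new confidence estimates) are added, without retraining a fixed, static combiner.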
Article
A pluralistic approach: designing computers from the user's point of view.
Conference Paper
The visual database query language QBE (Query By Example) is a classical, declarative query language based on the relational domain calculus. However, due to its insufficient support for vagueness, QBE is not an appropriate query language for formulating the similarity queries required in the context of multimedia databases. In this work we propose the query language WS-QBE, which combines a scheme for weighting query terms with concepts from fuzzy logic and QBE in one language. WS-QBE enables a visual, declarative formulation of complex similarity queries. The semantics of WS-QBE is defined by a mapping of WS-QBE queries onto the similarity domain calculus SDC, which is also proposed here.
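The flavor of a weighted fuzzy conjunction can be sketched in one function: a term with weight 0 is ignored, and a term with weight 1 acts as a full fuzzy AND (minimum). This Fagin-style weighted-min scoring merely mimics how weighted similarity conditions could combine; it is an assumed illustration, not WS-QBE's actual semantics.

```python
def ws_score(scores, weights):
    """Weighted fuzzy conjunction over per-term similarity scores.

    scores:  per-query-term similarity values in [0, 1]
    weights: per-term importance in [0, 1]
    Each score is relaxed toward 1 by (1 - weight), so down-weighted
    terms cannot drag the overall minimum down.
    """
    return min(max(s, 1 - w) for s, w in zip(scores, weights))
```

For example, a poorly matching term with weight 0 leaves the overall score determined entirely by the remaining terms, while full weights reduce the formula to a plain fuzzy AND.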
Article
Our project on digital video modeling aims to achieve efficient browsing and retrieval. Video is not merely a huge collection of still images, it is a complex temporal medium capturing high-level semantic ideas. For almost a decade researchers have worked on developing efficient techniques to model, index, and retrieve digital video information. To support intuitive video access, the video data must be properly modeled and indexed based on its characteristics and contents. One such annotation-based model is the stratification model, which focuses on segmenting the video's contextual information into multiple layers or strata. Each stratum describes the temporal occurrences of a simple concept such as the appearance of an anchor person in a news video. It thus focuses on segmenting the contextual information into chunks rather than dividing physically contiguous frames into shots, as is traditionally done
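The stratification model described above can be sketched directly: each stratum is a concept with a list of temporal intervals, and a query asks which concepts are active at a given time. The dict-of-intervals representation is an assumed encoding for illustration.

```python
def strata_at(strata, t):
    """Return the concepts whose strata cover time t.

    strata: dict mapping a concept name (e.g. "anchor") to a list of
    (start, end) intervals describing its temporal occurrences.
    """
    return sorted(name for name, spans in strata.items()
                  if any(start <= t < end for start, end in spans))
```

Because strata are layered over the timeline independently of shot boundaries, one instant can belong to several concepts at once, which is exactly the contextual chunking the model is after.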
Article
MPEG-7, formally known as the Multimedia Content Description Interface, includes standardized tools (descriptors, description schemes, and a description definition language) enabling structured, detailed descriptions of audio-visual information at different granularity levels (region, image, video segment, collection) and in different areas (content description, management, organization, navigation, and user interaction). It aims to support and facilitate a wide range of applications, such as media portals, content broadcasting, and ubiquitous multimedia. We present a high-level overview of the MPEG-7 standard. We first discuss its scope, basic terminology, and potential applications. Next, we discuss its constituent components. Then, we compare it with other standards to highlight its capabilities.
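To make the descriptor/description-scheme idea concrete, the sketch below builds a tiny MPEG-7-style XML description with the standard library. The nesting loosely follows the common Mpeg7/Description/MultimediaContent/Image pattern, but the exact element names and the omitted namespace and attributes are assumptions; the actual schema is defined in the ISO/IEC 15938 standard.

```python
# Hedged sketch of an MPEG-7-style image description. Element names
# approximate the standard's structure; consult ISO/IEC 15938 for the
# real schema (namespaces, types, required attributes).
import xml.etree.ElementTree as ET

def describe_image(uri, free_text):
    root = ET.Element("Mpeg7")
    desc = ET.SubElement(root, "Description")
    content = ET.SubElement(desc, "MultimediaContent")
    image = ET.SubElement(content, "Image")
    # Media locator: where the described asset lives.
    locator = ET.SubElement(image, "MediaLocator")
    ET.SubElement(locator, "MediaUri").text = uri
    # Free-text annotation: a human-readable content description.
    annotation = ET.SubElement(image, "TextAnnotation")
    ET.SubElement(annotation, "FreeTextAnnotation").text = free_text
    return ET.tostring(root, encoding="unicode")
```

The point of the standard is that such descriptions are interchangeable: any MPEG-7-aware portal or broadcaster can parse the same structured metadata regardless of who produced it.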
Local Detection and Identification of Visual Data
  • A Śluzek
A. Śluzek, Local Detection and Identification of Visual Data. LAP LAMBERT Academic Publishing, 2013.
SEO for Social Media: It ranked first in the search engines
  • S Sami
S. Sami, SEO for Social Media: It ranked first in the search engines. Amazon Press, 2020, vol. 1.
The MIRFLICKR retrieval evaluation
  • MIRFLICKR Dataset
MIRFLICKR dataset, "The MIRFLICKR retrieval evaluation", LIACS Medialab at Leiden University, http://press.liacs.nl/mirflickr, Tech. Rep., Jul. 2020.
Smartphones cause a photography boom
  • M Nudelman
M. Nudelman, "Smartphones cause a photography boom", Statista / Business Insider, http://www.businessinsider.com/12-trillion-photos-to-be-taken-in-2017-thanks-to-smartphones-chart-2017-8, Tech. Rep., Sep. 2020.
Deep Learning and Neural Networks
  • J Heaton
J. Heaton, Deep Learning and Neural Networks. Heaton Research Inc., 2015.
iCloud - the best place for photos, files and more
  • Apple.com
Apple.com, "iCloud - the best place for photos, files and more", Apple.com, http://www.apple.com/icloud/, Tech. Rep., Jun. 2020.
Google Vision AI - derive insights from images
  • Google.com
Google.com, "Google Vision AI - derive insights from images", Google.com, http://cloud.google.com/vision, Tech. Rep., Jul. 2020.
Description of Exif file format
  • MIT Media Lab
Massachusetts Institute of Technology, "Description of Exif file format", MIT Media Lab, http://media.mit.edu/pia/Research/deepview/exif.html, Tech. Rep., Jul. 2020.
FFmpeg documentation
  • FFmpeg.org
FFmpeg.org, "FFmpeg documentation", FFmpeg.org, http://ffmpeg.org, Tech. Rep., Jul. 2020.
Work with metadata in Adobe Bridge
  • Adobe.com
Adobe.com, "Work with metadata in Adobe Bridge", Adobe.com, http://helpx.adobe.com/bridge/using/metadata-adobe-bridge.html, Tech. Rep., Jul. 2020.
Material Exchange Format
  • EBU
EBU, "Material Exchange Format", EBU Recommendation R121, http://mxf.irt.de/information/eburecommendations/R121-2007.pdf, Tech. Rep., Jul. 2007.
Face recognition in Apple Photos
  • Apple.com
Apple.com, "Face recognition in Apple Photos", Apple.com, https://support.apple.com/de-de/guide/photos/phtad9d981ab/mac, Tech. Rep., Aug. 2020.
Apache Commons Imaging API
  • Apache Software Foundation
Apache Software Foundation, "Apache Commons Imaging API", https://commons.apache.org/proper/commons-imaging/, Tech. Rep., Aug. 2020.
Java Enterprise Edition
  • Oracle.com
Oracle.com, "Java Enterprise Edition", Oracle.com, https://www.oracle.com/de/java/technologies/java-ee-glance.html, Tech. Rep., Aug. 2020.