Figure 2 - uploaded by Fabrizio Falchi
VISIONE User Interface

Source publication
Technical Report
The Artificial Intelligence for Multimedia Information Retrieval (AIMIR) research group is part of the NeMIS laboratory of the Information Science and Technologies Institute "A. Faedo" (ISTI) of the Italian National Research Council (CNR). The AIMIR group has a long experience in topics related to: Artificial Intelligence, Multimedia Information...

Contexts in source publication

Context 1
... system supports four types of queries: query by keywords, query by object location, query by colors, and query by visual similarity. The user interface, shown in Figure 2, provides a text box for specifying keywords and a canvas for sketching the objects and colors to be found in the target video. For the object location search, the system leverages the image tagging system proposed in [33] to label images [34], and YOLO9000 [35], with about 9500 object tags, to detect objects in a video. ...
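The four query modes above could be combined into a single query payload. The following is a minimal sketch of what such a payload might look like; all field names, coordinate conventions, and values are illustrative assumptions, not VISIONE's actual API.

```python
# Hypothetical combined query covering the four modes described above.
# Boxes use normalized (x0, y0, x1, y1) coordinates -- an assumption.
query = {
    "keywords": ["beach", "sunset"],                      # query by keywords
    "objects": [                                          # query by object location
        {"tag": "person", "box": (0.1, 0.5, 0.3, 0.9)},
    ],
    "colors": [                                           # query by colors
        {"rgb": (20, 80, 200), "box": (0.0, 0.0, 1.0, 0.4)},
    ],
    "similar_to": "keyframe_01234.jpg",                   # query by visual similarity
}

active_modes = [k for k, v in query.items() if v]
```

In a real system each mode would likely be resolved by a different index (text, spatial, color, visual-feature) and the results fused into one ranked list.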
Context 2
... by the relation networks (RN) employed in relational visual question answering (R-VQA), we present novel architectures to explicitly capture relational information from images in the form of network activations that can be subsequently extracted and used as visual features. We describe a two-stage relation network module (2S-RN), trained on the R-VQA task, able to collect non-aggregated visual features. Then, we propose the aggregated visual features relation network (AVF-RN) module that is able to produce better relationship-aware features by learning the aggregation directly inside the network. ...
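The distinction drawn above, between non-aggregated relational features (2S-RN) and features aggregated inside the network (AVF-RN), can be illustrated with a toy relation-network step: a pairwise function g is applied to every ordered pair of object features, and the results are either kept as-is or summed. This is a simplified sketch of the generic RN formulation, not the authors' actual architecture; the function g and all shapes are assumptions.

```python
import numpy as np

def relation_features(objects, g, aggregate=True):
    """Apply g to every ordered pair of object feature vectors.

    aggregate=True sums the pairwise outputs (cf. aggregated features);
    aggregate=False returns them stacked (cf. non-aggregated features).
    """
    pairs = [g(oi, oj)
             for i, oi in enumerate(objects)
             for j, oj in enumerate(objects) if i != j]
    pairs = np.stack(pairs)                 # shape: (n_pairs, d_out)
    return pairs.sum(axis=0) if aggregate else pairs

# Hypothetical pairwise function: concatenate the two vectors, then a
# fixed random linear map standing in for a learned MLP.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
g = lambda a, b: W @ np.concatenate([a, b])

objs = [rng.standard_normal(4) for _ in range(3)]   # 3 toy object features
agg = relation_features(objs, g)                    # aggregated, shape (4,)
raw = relation_features(objs, g, aggregate=False)   # non-aggregated, shape (6, 4)
```

In the AVF-RN setting described above, the aggregation step itself would be learned rather than a plain sum, which is what this sketch cannot capture.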