About
217 Publications
44,401 Reads
3,279 Citations
Introduction
Marco Bertini currently works at the Dipartimento di Ingegneria dell'Informazione (Department of Information Engineering), University of Florence.
Additional affiliations
November 2010 - November 2012
Position
- euTV
December 2007 - October 2015
January 2002 - present
Publications (217)
In this paper, we address the problem of content-based image retrieval (CBIR) by learning image representations based on the activations of a Convolutional Neural Network. We propose an end-to-end trainable network architecture that exploits a novel multi-scale local pooling based on the trainable aggregation layer NetVLAD (Arandjelovic et al in P...
In this paper we propose a novel data augmentation approach for visual content domains that have scarce training datasets, compositing synthetic 3D objects within real scenes. We show the performance of the proposed system in the context of object detection in thermal videos, a domain where 1) training datasets are very limited compared to visible...
In this paper we propose a method for improving pedestrian detection in the thermal domain using two stages: first, a generative data augmentation approach is used, then a domain adaptation method using generated data adapts an RGB pedestrian detector. Our model, based on the Least-Squares Generative Adversarial Network, is trained to synthesize re...
This 8-volume set constitutes the refereed proceedings of the 25th International Conference on Pattern Recognition Workshops, ICPR 2020, held virtually in Milan, Italy, and rescheduled to January 10-11, 2021, due to the Covid-19 pandemic. The 416 full papers presented in these 8 volumes were carefully reviewed and selected from about 700 submissions. The 46 works...
The chapter will cover deep learning methodologies that can be employed to recover image and video quality. Most of the covered approaches will be based on conditional Generative Adversarial Networks (GANs), which have the benefit of producing images that look more natural. Looking at the inference phase, we will show how to perform such operations wit...
Lossy video stream compression is performed to reduce bandwidth and storage requirements. Image compression is likewise needed in many circumstances. Older archives are often stored at low resolution and with a compression rate suited to the technology available at the time the video was created. Unfortunat...
Pedestrian detection is a core problem in computer vision that sees broad application in video surveillance and, more recently, in advanced driving assistance systems. Despite its broad application and interest, it remains a challenging problem in part due to the vast range of conditions under which it must be robust. Pedestrian detection at nightt...
Open-source software (OSS) is a relevant topic in video game development. Looking at the game engines most frequently used for developing Android games [1], we can see that seven of the ten ranked engines are OSS. Over the last decade, more and more game studios and individual developers switched to open-source software. Oliver Franzke from Doub...
In this paper, we propose an automatic approach for localizing the inner eye canthus in thermal face images. We first coarsely detect 5 facial keypoints corresponding to the centers of the eyes, the nose tip, and the ears. Then we compute a sparse 2D-3D point correspondence using a 3D Morphable Face Model (3DMM). This correspondence is used to projec...
Pedestrian detection is a canonical problem for safety and security applications, and it remains a challenging problem due to the highly variable lighting conditions in which pedestrians must be detected. This paper investigates several domain adaptation approaches to adapt RGB-trained detectors to the thermal domain. Building on our earlier work o...
Pedestrian detection is a core problem in computer vision that sees broad application in video surveillance and, more recently, in advanced driving assistance systems. Despite its broad application and interest, it remains a challenging problem in part due to the vast range of conditions under which it must be robust. Pedestrian detection at night-...
Writing source code with lightweight text editors or fully featured integrated development environments (IDEs) is considered the main way of programming. Notebooks, however, are an extremely practical tool: in contrast to IDEs, they make projects easier to set up and allow programs to run in a read-eval-print loop (REPL) environment....
In this paper, we address the problem of image retrieval by learning image representations based on the activations of a Convolutional Neural Network. We present an end-to-end trainable network architecture that exploits a novel multi-scale local pooling based on NetVLAD and a triplet mining procedure based on sample difficulty to obtain an effect...
This paper describes an action classification pipeline for detecting and evaluating the correct execution of actions in video recorded by smartphone cameras; the use case is simplifying the monitoring of how physiotherapeutic exercises are performed by patients in the comfort of their own home, reducing the need for the physical presence of therapists....
Video compression algorithms reduce image quality because of the lossy approach they use to reduce the required bandwidth. This affects commercial streaming services such as Netflix or Amazon Prime Video, but also video conferencing and video surveillance systems. In all these cases it is possible to improve the video quality,...
Pedestrian detection is a core problem in computer vision, and is a problem that is gaining prominence due to its importance in assisted and autonomous driving applications. Many state-of-the-art approaches, especially those used for autonomous driving, combine thermal and visible spectrum imagery in order to robustly detect persons independent of...
Video stream compression, using lossy algorithms, is performed to reduce the bandwidth required for transmission. To improve the video quality, either for human viewing or for automatic video analysis, videos are post-processed to eliminate the introduced compression artifacts. Generative Adversarial Networks have been shown to obtain extremely high qu...
Having already discussed MatConvNet and Keras, let us continue with an open source framework for deep learning that takes a new and interesting approach. TensorFlow.js not only provides deep learning for JavaScript developers but also makes deep learning applications available in WebGL-enabled web browsers, or more specifically,...
Following the last column on MatConvNet, let us continue to look at open source frameworks for deep learning. In this column we are going to examine Keras, a Python API that allows the use of several different backends, such as TensorFlow and CNTK. It also supports Theano, although development of that framework has been halted by the original d...
MatConvNet is an open source MATLAB toolbox implementing Convolutional Neural Networks (CNNs) for computer vision and multimedia applications, developed by the authors of the well-known VLFeat library. Both libraries have associated papers that were presented in the Open Source Software Competition track of ACM Multimedia: "MatConvNet: Con...
"Open source software is software that can be freely accessed, used, changed, and shared (in modified or unmodified form) by anyone" (cf. https://opensource.org/osd). In other words, open source software (OSS) is something that one or more people can work on, improve, refine, change, adapt, and share or use. Why would anyone support such...
In this article, we address the problem of creating a smart audio guide that adapts to the actions and interests of museum visitors. As an autonomous agent, our guide perceives the context and is able to interact with users in an appropriate fashion. To do so, it understands what the visitor is looking at, if the visitor is moving inside the museum...
Given the huge number of hours of video available on video-sharing platforms such as YouTube and Vimeo, the development of automatic tools that help users find videos that fit their interests has attracted the attention of both the scientific and industrial communities. So far, most works have addressed semantic analysis, to identify obje...
Compression artifacts arise in images whenever a lossy compression algorithm is applied. These artifacts eliminate details present in the original image, or add noise and small structures; because of these effects they make images less pleasant for the human eye, and may also lead to decreased performance of computer vision algorithms such as objec...
Object detection is one of the most important tasks of computer vision. It is usually performed by evaluating a subset of the possible locations of an image that are more likely to contain the object of interest. Exhaustive approaches have now been superseded by object proposal methods. The interplay of detectors and proposal algorithms has not bee...
This book constitutes the thoroughly refereed proceedings of the 12th Italian Research Conference on Digital Libraries, IRCDL 2016, held in Florence, Italy, in February 2016.
The 15 papers presented were carefully selected from 23 submissions and cover topics such as formal methods, long-term preservation, metadata creation, management and curation,...
The goal of this work is to implement a real-time computer vision system that can run on wearable devices to perform object classification and artwork recognition, to improve the experience of a museum visit through understanding the interests of users. Object classification helps to understand the context of the visit, e.g. differentiating when a...
Where previous reviews on content-based image retrieval emphasize what can be seen in an image to bridge the semantic gap, this survey considers what people tag about an image. A comprehensive treatise on three closely linked problems, i.e., image tag assignment, refinement, and tag-based image retrieval, is presented. While existing works vary i...
Hundreds of hours of video are uploaded every minute to YouTube and other video-sharing sites: some will be viewed by millions of people and others will go unnoticed by all but the uploader. In this paper we propose to use visual sentiment and content features to predict the popularity of web videos. The proposed approach outperforms current state-...
In this paper we propose a method for video recommendation in Social Networks based on crowdsourced and automatic video annotations of salient frames. We show how two human factors, users' self-expression in user profiles and perception of visual saliency in videos, can be exploited in order to stimulate annotations and to obtain an efficient repre...
In this paper we present an efficient method for visual descriptors retrieval based on compact hash codes computed using a multiple k-means assignment. The method has been applied to the problem of approximate nearest neighbor (ANN) search of local and global visual content descriptors, and it has been tested on different datasets: three large scal...
This paper presents a novel method for efficient image retrieval, based on a simple and effective hashing of CNN features and the use of an indexing structure based on Bloom filters. These filters are used as gatekeepers for the database of image features, making it possible to skip a query if the query features are not stored in the database and...
This tutorial focuses on challenges and solutions for content-based image annotation and retrieval in the context of online image sharing and tagging. We present a unified review on three closely linked problems, i.e., tag assignment, tag refinement, and tag-based image retrieval. We introduce a taxonomy to structure the growing literature, under...