Marco Bertini
University of Florence | UNIFI · Dipartimento di Ingegneria dell'Informazione

Ph.D.

About

217
Publications
44,401
Reads
3,279
Citations
Introduction
Marco Bertini currently works at the Dipartimento di Ingegneria dell'Informazione, University of Florence.
Additional affiliations
November 2010 - November 2012
Position
  • euTV
December 2007 - October 2015
University of Florence
Position
  • Professor (Assistant)
January 2002 - present
University of Florence
Position
  • Senior Researcher

Publications (217)
Article
Full-text available
In this paper, we address the problem of content-based image retrieval (CBIR) by learning image representations based on the activations of a Convolutional Neural Network. We propose an end-to-end trainable network architecture that exploits a novel multi-scale local pooling based on the trainable aggregation layer NetVLAD (Arandjelovic et al in P...
Preprint
Full-text available
In this paper we propose a novel data augmentation approach for visual content domains that have scarce training datasets, compositing synthetic 3D objects within real scenes. We show the performance of the proposed system in the context of object detection in thermal videos, a domain where 1) training datasets are very limited compared to visible...
Preprint
Full-text available
In this paper we propose a method for improving pedestrian detection in the thermal domain using two stages: first, a generative data augmentation approach is used, then a domain adaptation method using generated data adapts an RGB pedestrian detector. Our model, based on the Least-Squares Generative Adversarial Network, is trained to synthesize re...
Book
This 8-volume set constitutes the refereed proceedings of the 25th International Conference on Pattern Recognition Workshops, ICPR 2020, held virtually in Milan, Italy and rescheduled to January 10 - 11, 2021 due to the COVID-19 pandemic. The 416 full papers presented in these 8 volumes were carefully reviewed and selected from about 700 submissions. The 46 works...
Chapter
The chapter will cover deep learning methodologies that can be employed to recover image and video quality. Most of the covered approaches will be based on conditional Generative Adversarial Networks (GANs), which have the benefit of producing images that look more natural. Looking at the inference phase we will show how to perform such operations wit...
Article
Full-text available
Lossy video stream compression is performed to reduce bandwidth and storage requirements. Image compression is likewise needed in many circumstances. It is often the case that older archives are stored at low resolution and with a compression rate suitable for the technology available at the time the video was created. Unfortunat...
Chapter
Full-text available
Pedestrian detection is a core problem in computer vision that sees broad application in video surveillance and, more recently, in advanced driving assistance systems. Despite its broad application and interest, it remains a challenging problem in part due to the vast range of conditions under which it must be robust. Pedestrian detection at nightt...
Article
Open-source software is a relevant topic in video game development. Taking a look at the most frequently employed game engines for developing Android games [1] we can see that seven out of ten ranked engines are OSS. Over the last decade, more and more game studios and individual developers switched to open-source software. Oliver Franzke from Doub...
Preprint
Full-text available
In this paper, we propose an automatic approach for localizing the inner eye canthus in thermal face images. We first coarsely detect 5 facial keypoints corresponding to the center of the eyes, the nose tip and the ears. Then we compute a sparse 2D-3D point correspondence using a 3D Morphable Face Model (3DMM). This correspondence is used to projec...
Article
Full-text available
Pedestrian detection is a canonical problem for safety and security applications, and it remains a challenging problem due to the highly variable lighting conditions in which pedestrians must be detected. This paper investigates several domain adaptation approaches to adapt RGB-trained detectors to the thermal domain. Building on our earlier work o...
Conference Paper
Full-text available
Pedestrian detection is a core problem in computer vision that sees broad application in video surveillance and, more recently, in advanced driving assistance systems. Despite its broad application and interest, it remains a challenging problem in part due to the vast range of conditions under which it must be robust. Pedestrian detection at night-...
Article
Writing source code for programs with lightweight text editors or fully featured integrated development environments is considered the main method of programming. Notebooks, however, are an extremely practical tool. In contrast to IDEs, projects are set up more easily and they allow for running programs in a read-eval-print loop (REPL) environment....
Preprint
Full-text available
In this paper, we address the problem of image retrieval by learning image representations based on the activations of a Convolutional Neural Network. We present an end-to-end trainable network architecture that exploits a novel multi-scale local pooling based on NetVLAD and a triplet mining procedure based on sample difficulty to obtain an effect...
Conference Paper
This paper describes an action classification pipeline for detecting and evaluating the correct execution of actions in video recorded by smartphone cameras; the use case is simplifying the monitoring of how physiotherapeutic exercises are performed by patients in the comfort of their own home, reducing the need for the physical presence of therapists....
Conference Paper
Full-text available
Video compression algorithms reduce image quality because of their lossy approach to reducing the required bandwidth. This affects commercial streaming services such as Netflix or Amazon Prime Video, but also video conferencing and video surveillance systems. In all these cases it is possible to improve the video quality,...
Chapter
Full-text available
Pedestrian detection is a core problem in computer vision, and is a problem that is gaining prominence due to its importance in assisted and autonomous driving applications. Many state-of-the-art approaches, especially those used for autonomous driving, combine thermal and visible spectrum imagery in order to robustly detect persons independent of...
Chapter
Full-text available
Video stream compression, using lossy algorithms, is performed to reduce the bandwidth required for transmission. To improve the video quality, either for human viewing or for automatic video analysis, videos are post-processed to eliminate the introduced compression artifacts. Generative Adversarial Networks have been shown to obtain extremely high qu...
Article
Having already discussed MatConvNet and Keras, let us continue with an open source framework for deep learning that takes a new and interesting approach. TensorFlow.js not only provides deep learning for JavaScript developers, but also makes deep learning applications available in WebGL-enabled web browsers, or more specifically,...
Article
Full-text available
Following the last column on MatConvNet, let us continue to look at open source frameworks for deep learning. In this column we are going to look at Keras, a Python API that allows the use of several different backends such as TensorFlow and CNTK. Actually, it also supports Theano, although the development of this framework has been halted by the original d...
Article
Full-text available
MatConvNet is an open source MATLAB toolbox implementing Convolutional Neural Networks (CNNs) for computer vision and multimedia applications, developed by the same authors of the famed VLFeat library. Both libraries have associated papers that have been presented within the Open Source Software Competition track of ACM Multimedia: "MatConvNet: Con...
Article
"Open source software is software that can be freely accessed, used, changed, and shared (in modified or unmodified form) by anyone" (cp. https://opensource.org/osd). So open source software (OSS) is actually something that one or more people can work on, improve, refine, change, adapt, and share or use. Why would anyone support such...
Article
Full-text available
In this article, we address the problem of creating a smart audio guide that adapts to the actions and interests of museum visitors. As an autonomous agent, our guide perceives the context and is able to interact with users in an appropriate fashion. To do so, it understands what the visitor is looking at, if the visitor is moving inside the museum...
Conference Paper
Full-text available
Given the huge quantity of hours of video available on video sharing platforms such as YouTube, Vimeo, etc., the development of automatic tools that help users find videos that fit their interests has attracted the attention of both scientific and industrial communities. So far the majority of the works have addressed semantic analysis, to identify obje...
Article
Full-text available
Compression artifacts arise in images whenever a lossy compression algorithm is applied. These artifacts eliminate details present in the original image, or add noise and small structures; because of these effects they make images less pleasant for the human eye, and may also lead to decreased performance of computer vision algorithms such as objec...
Article
Object detection is one of the most important tasks of computer vision. It is usually performed by evaluating a subset of the possible locations of an image that are more likely to contain the object of interest. Exhaustive approaches have now been superseded by object proposal methods. The interplay of detectors and proposal algorithms has not bee...
Book
This book constitutes the thoroughly refereed proceedings of the 12th Italian Research Conference on Digital Libraries, IRCDL 2016, held in Florence, Italy, in February 2016. The 15 papers presented were carefully selected from 23 submissions and cover topics such as formal methods, long-term preservation, metadata creation, management and curation,...
Conference Paper
Full-text available
The goal of this work is to implement a real-time computer vision system that can run on wearable devices to perform object classification and artwork recognition, to improve the experience of a museum visit through understanding the interests of users. Object classification helps to understand the context of the visit, e.g. differentiating when a...
Article
Where previous reviews on content-based image retrieval emphasize what can be seen in an image to bridge the semantic gap, this survey considers what people tag about an image. A comprehensive treatise of three closely linked problems, i.e., image tag assignment, refinement, and tag-based image retrieval is presented. While existing works vary i...
Conference Paper
Full-text available
Hundreds of hours of video are uploaded every minute on YouTube and other video sharing sites: some will be viewed by millions of people and others will go unnoticed by all but the uploader. In this paper we propose to use visual sentiment and content features to predict the popularity of web videos. The proposed approach outperforms current state-...
Conference Paper
In this paper we propose a method for video recommendation in Social Networks based on crowdsourced and automatic video annotations of salient frames. We show how two human factors, users' self-expression in user profiles and perception of visual saliency in videos, can be exploited in order to stimulate annotations and to obtain an efficient repre...
Article
In this paper we present an efficient method for visual descriptor retrieval based on compact hash codes computed using a multiple k-means assignment. The method has been applied to the problem of approximate nearest neighbor (ANN) search of local and global visual content descriptors, and it has been tested on different datasets: three large scal...
Article
Full-text available
This paper presents a novel method for efficient image retrieval, based on a simple and effective hashing of CNN features and the use of an indexing structure based on Bloom filters. These filters are used as gatekeepers for the database of image features, making it possible to skip a query when the query features are not stored in the database and...
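The gatekeeper idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sign-based `binarize` step, the filter sizes, and the names `BloomFilter` and `gated_lookup` are all assumptions made for illustration. It only shows how a Bloom filter can rule out a database query when a hashed feature code was never stored.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key: bytes):
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))


def binarize(features, threshold=0.0) -> bytes:
    """Hypothetical hashing step (an assumption): one bit per dimension."""
    return bytes(1 if value > threshold else 0 for value in features)


def gated_lookup(bloom, database, features):
    """Skip the database query when the filter says the code is absent."""
    code = binarize(features)
    if not bloom.might_contain(code):
        return None  # definitely not stored, so no query is issued
    return database.get(code)  # may still be None on a false positive


# Usage: index one feature vector, then look it up through the filter.
bloom = BloomFilter()
database = {}
stored = [0.5, -1.2, 3.0, -0.1]
bloom.add(binarize(stored))
database[binarize(stored)] = "image_001"
```

Because a Bloom filter never yields false negatives, a negative answer is safe to act on; a positive answer still requires the actual lookup.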
Conference Paper
Full-text available
This tutorial focuses on challenges and solutions for content-based image annotation and retrieval in the context of online image sharing and tagging. We present a unified review on three closely linked problems, i.e., tag assignment, tag refinement, and tag-based image retrieval. We introduce a taxonomy to structure the growing literature, under...