Alessandro Bozzon
Delft University of Technology | TU · Faculty of Electrical Engineering, Mathematics and Computer Sciences (EEMCS)
Assistant Professor
About
223 Publications
62,485 Reads
3,356 Citations
Additional affiliations
January 2009 - January 2013
Education
January 2006 - December 2008
October 2003 - October 2005
Politecnico di Milano
Publications (223)
Machine learning (ML) training data is often scattered across disparate collections of datasets, called data silos. This fragmentation poses a major challenge for data-intensive ML applications: integrating and transforming data residing in different sources demand a lot of manual work and computational resources. With data privacy constraints,...
Work on value alignment aims to ensure that human values are respected by AI systems. However, existing approaches tend to rely on universal framings of human values that obscure the question of which values the systems should capture and align with, given the variety of operational situations. This often results in AI systems that privilege only a...
In the shift towards human-centered manufacturing, our two-year longitudinal study investigates the real-world impact of deploying Cognitive Assistants (CAs) in factories. The CAs were designed to facilitate knowledge sharing among factory operators. Our investigation focused on smartphone-based voice assistants and LLM-powered chatbots, examining...
Digitally-supported participatory methods are often used in policy-making to develop inclusive policies by collecting and integrating citizens' opinions. However, these methods fail to capture the complexity and nuances in citizens' needs, i.e., citizens are generally unaware of others' needs, perspectives, and experiences. Consequently, policies d...
The configuration of public open spaces plays a crucial role in shaping how different people use them. Nevertheless, our understanding of how the physical features of public open spaces influence the activities conducted within them, and the extent to which this impact differs across various individuals and population groups, is currently limited....
Sustained adoption of automation is a problem for organizations, despite the promised benefits of automation and the propensity for organizations to expect it to transform their workplaces. To address this problem, previous work in HCI has mostly considered the perspectives and experiences of users interacting with automation technologies and has n...
Web search has evolved into a platform people rely on for opinion formation on debated topics. Yet, pursuing this search intent can carry serious consequences for individuals and society and involves a high risk of biases. We argue that web search can and should empower users to form opinions responsibly and that the information retrieval community...
The study of urban greenspaces typically relies on three types of data: people's subjective perceptions collected via questionnaires, vegetation indices derived from satellite imagery, such as the Normalized Difference Vegetation Index (NDVI), and Land Use or Land Cover maps, such as OpenStreetMap (OSM). Data on people's perceptions are essential w...
The proliferation of pre-trained ML models in public Web-based model zoos facilitates the engineering of ML pipelines to address complex inference queries over datasets and streams of unstructured content. Constructing an optimal plan for a query is hard, especially when constraints (e.g., accuracy or execution time) must be taken into consideration, a...
Machine learning (ML) researchers and practitioners are building repositories of pre-trained models, called model zoos. These model zoos contain metadata that detail various properties of the ML models and datasets, which are useful for reporting, auditing, reproducibility, and interpretability. Unfortunately, the existing metadata representations...
Traditionally, the popularity of classical music composers is approximated through commercial figures like album releases, record sales, or live performances. However, commercial factors only provide one piece of the overall picture. The success of community-driven platforms has profoundly changed how people consume and interact with music, and, co...
City streets that feel safe and attractive motivate active travel behaviour and promote people’s well-being. However, determining what makes a street safe and attractive is a challenging task because subjective qualities of the streetscape are difficult to quantify. Existing evidence typically focuses on how different street features influence perc...
Neighborhood safety and its perception are important determinants of citizens’ health and well-being. Contemporary urban design guidelines often advocate urban forms that encourage natural surveillance or “eyes on the street” to promote community safety. However, assessing a neighborhood’s level of natural surveillance is challenging due to its sub...
Recent evidence underscores the importance of greenspace exposure in promoting physical activity, and in having a positive impact on mental health and cognitive development. Accessibility has been identified to be the primary motivating factor when it comes to encouraging greenspace use and, correspondingly, exposure. Existing quantitative approach...
Machine learning (ML) practitioners and organizations are building model repositories of pre-trained models, referred to as model zoos. These model zoos contain metadata describing the properties of the ML models and datasets. The metadata serves crucial roles for reporting, auditing, ensuring reproducibility, and enhancing interpretability. Des...
Machine learning (ML) practitioners and organizations are building model zoos of pre-trained models, containing metadata describing properties of the ML models and datasets that are useful for reporting, auditing, reproducibility, and interpretability purposes. The metadata is currently not standardised; its expressivity is limited; and there is no...
Music content annotation campaigns are common on paid crowdsourcing platforms. Crowd workers are expected to annotate complex music artifacts, a task often demanding specialized skills and expertise, thus selecting the right participants is crucial for campaign success. However, there is a general lack of deeper understanding of the distribution of...
In an effort to regulate Machine Learning-driven (ML) systems, current auditing processes mostly focus on detecting harmful algorithmic biases. While these strategies have proven to be impactful, some values outlined in documents dealing with ethics in ML-driven systems are still underrepresented in auditing processes. Such unaddressed values mainl...
Artificial intelligence (AI) applications can profoundly affect society. Recently, there has been extensive interest in studying how scientists design AI systems for general tasks. However, it remains an open question as to whether the AI systems developed in this way can work as expected in different regional contexts while simultaneously empoweri...
Background: There is increasing evidence that a complex interplay of factors within the environments in which children grow up contributes to children's suboptimal mental health and cognitive development. The concept of the life-course exposome helps to study the impact of the physical and social environment, including social inequities, on cognitiv...
City events are becoming popular and are attracting large numbers of people. This increases the need for methods and tools that provide stakeholders with crowd size information for crowd management purposes. Previous works proposed a large number of methods to count the crowd using different data in various contexts, but no methods proposed using social...
The future of crowd work has been identified to depend on worker satisfaction, but we lack a thorough understanding of how worker satisfaction can be increased in microtask crowdsourcing. Prior work has shown that one solution is to build tasks that are engaging. To facilitate engagement, two methods that have received attention in recent HCI liter...
Many powerful Artificial Intelligence (AI) techniques have been engineered with the goals of high performance and accuracy. Recently, AI algorithms have been integrated into diverse and real-world applications. It has become an important topic to explore the impact of AI on society from a people-centered perspective. Previous works in citizen scien...
Music content annotation campaigns are common on paid crowdsourcing platforms. Crowd workers are expected to annotate complicated music artefacts, which can demand certain skills and expertise. Traditional methods of participant selection are not designed to capture these kinds of domain-specific skills and expertise, and often domain-specific quest...
The automatic detection of conflictual languages (harmful, aggressive, abusive, and offensive languages) is essential to provide a healthy conversation environment on the Web. To design and develop detection systems that are capable of achieving satisfactory performance, a thorough understanding of the nature and properties of the targeted type of...
As cities resume life in public space, they face the difficult task of retaining outdoor activity while decreasing exposure to airborne viruses, such as the novel coronavirus. Even though the transmission risk is higher in indoor spaces, recent evidence suggests that physical contact outdoors also contributes to an increased virus exposure. Given t...
Hybrid crowd-machine classifiers can achieve superior performance by combining the cost-effectiveness of automatic classification with the accuracy of human judgment. This paper shows how crowd and machines can support each other in tackling classification problems. Specifically, we propose an architecture that orchestrates active learning and crow...
In this paper, we explore how to efficiently combine crowdsourcing and machine intelligence for the problem of document screening, where we need to screen documents with a set of machine-learning filters. Specifically, we focus on building a set of machine learning classifiers that evaluate documents, and then screen them efficiently. It is a chall...
In online crowd mapping, crowd workers recruited through crowdsourcing marketplaces collect geographic data. Compared to traditional mapping methods, where workers physically explore the area, the benefit of using online crowd mapping is the potential to be cost-effective and time-efficient. Previous studies have focused on mapping urban objects us...
Human annotation is still an essential part of modern transcription workflows for digitizing music scores, either as a standalone approach where a single expert annotator transcribes a complete score, or for supporting an automated Optical Music Recognition (OMR) system. Research on human computation has shown the effectiveness of crowdsourcing for...
The way pages are ranked in search results influences whether the users of search engines are exposed to more homogeneous, or rather to more diverse viewpoints. However, this viewpoint diversity is not trivial to assess. In this paper we use existing and novel ranking fairness metrics to evaluate viewpoint diversity in search result rankings. We co...
Due to the coronavirus pandemic, remote work from home has rapidly become a necessity around the world, drastically changing the potential landscape for the future of work. Over the last couple of decades, microtask crowdsourcing has emerged as a viable means of carrying out remote online work to earn one's living, an alternative to traditional work...
Large-scale events are becoming more frequent in contemporary cities, increasing the need for novel methods and tools that can provide relevant stakeholders with quantitative and qualitative insights about attendees’ characteristics. In this work, we investigate how social media can be used to provide such insights. First, we screen a set of factor...
Conversational agents are playing an increasingly important role in providing users with natural communication environments, improving outcomes in a variety of domains in human-computer interaction. Crowdsourcing marketplaces are simultaneously flourishing, and it has never been easier to acquire large-scale human input from online workers. Recent...
Credit scoring is an important tool to assess the solidity of small and medium-sized enterprises (SMEs), and to unlock for them new options for credit and improvement of cash flow. Credit scoring is, in its most common form, used by (potential) creditors to predict the probability that SMEs will default in the future, as an inverse measure of creditwor...
Up-to-date listings of retail stores and related building functions are challenging and costly to maintain. We introduce a novel method for automatically detecting, geo-locating, and classifying retail stores and related commercial functions, on the basis of storefronts extracted from street-level imagery. Specifically, we present a deep learning a...
Crowdsourcing marketplaces have provided a large number of opportunities for online workers to earn a living. To improve satisfaction and engagement of such workers, who are vital for the sustainability of the marketplaces, recent works have used conversational interfaces to support the execution of a variety of crowdsourcing tasks. The rationale b...
The rise in popularity of conversational agents has enabled humans to interact with machines more naturally. Recent work has shown that crowd workers in microtask marketplaces can complete a variety of human intelligence tasks (HITs) using conversational interfaces with similar output quality compared to the traditional Web interfaces. In this pape...
This demo presents VirtualCrowd, a simulation platform for crowdsourcing campaigns. The platform allows the design, configuration, step-by-step execution, and analysis of customized tasks, worker profiles, and crowdsourcing strategies. The platform will be demonstrated through a crowd-mapping example in two cities, which will highlight the utility...
Despite the high interest for Machine Learning (ML) in academia and industry, many issues related to the application of ML to real-life problems are yet to be addressed. Here we put forward one limitation which arises from a lack of adaptation of ML models and datasets to specific applications. We formalise a new notion of unfairness as exclusion o...
Machine Learning (ML) is increasingly applied in real-life scenarios, raising concerns about bias in automatic decision making. We focus on bias as a notion of opinion exclusion, that stems from the direct application of traditional ML pipelines to infer subjective properties. We argue that such ML systems should be evaluated with subjectivity and...
Named Entity Recognition (NER) for rare long-tail entities, such as those often found in domain-specific scientific publications, is a challenging task, as the extensive training and test data needed for fine-tuning NER algorithms are typically lacking. Recent approaches presented promising solutions relying on training NER algorithms in an iterative weakly-...
City events are being organized more frequently, and with larger crowds, in urban areas. There is an increased need for novel methods and tools that can provide information on the sentiments of crowds as an input for crowd management. Previous work has explored sentiment analysis and a large number of methods have been proposed relating to various...
Conversational interfaces can facilitate human-computer interactions. Whether or not conversational interfaces can improve worker experience and work quality in crowdsourcing marketplaces has remained unanswered. We investigate the suitability of text-based conversational interfaces for microtask crowdsourcing. We designed a rigorous experimental c...
Knowledge about the organization of the main physical elements (e.g. streets) and objects (e.g. trees) that structure cities is important in the maintenance of city infrastructure and the planning of future urban interventions. In this paper, a novel approach to crowd-mapping urban objects is proposed. Our method capitalizes on strategies for gener...
Street-level imagery contains a variety of visual information about the facades of Points of Interest (POIs). In addition to general morphological features, signs on the facades of, primarily, business-related POIs could be a valuable source of information about the type and identity of a POI. Recent advancements in computer vision could leverage v...
This paper describes the system that team MYTOMORROWS-TU DELFT developed for the 2019 Social Media Mining for Health Applications (SMM4H) Shared Task 3, for the end-to-end normalization of ADR tweet mentions to their corresponding MEDDRA codes. For the first two steps, we reuse a state-of-the-art approach, focusing our contribution on the final ent...
Understanding and improving the energy consumption behavior of individuals is considered a powerful approach to improve energy conservation and stimulate energy efficiency. To motivate people to change their energy consumption behavior, we need to have a thorough understanding of which energy-consuming activities they perform and how these are perf...
Dialog agents like digital assistants and automated chat interfaces (e.g., chatbots) are becoming more and more popular as users adapt to conversing with their devices as they would with humans. In this article we present approaches and available tools for dialog management, a component of dialog agents that handles dialog context and decides the next action...
Knowledge graphs (KGs) have proven to be effective in improving recommendation. Existing methods mainly rely on hand-engineered features from KGs (e.g., meta paths), which require domain knowledge. This paper presents RKGE, a KG embedding approach that automatically learns semantic representations of both entities and paths between entities for char...
Named Entity Recognition and Typing (NER/NET) is a challenging task, especially with long-tail entities such as the ones found in scientific publications. These entities (e.g., “WebKB”, “StatSnowball”) are rare, often relevant only in specific knowledge domains, yet important for retrieval and exploration purposes. State-of-the-art NER approaches emp...