About
258 Publications
107,797 Reads
10,072 Citations
Introduction
Additional affiliations
June 2012 - present
July 2011 - present
July 2003 - June 2011
Publications (258)
Understanding the interpretation of machine learning (ML) models has been of paramount importance when making decisions with societal impacts, such as transport control, financial activities, and medical diagnosis. While local explanation techniques are popular methods to interpret ML models on a single instance, they do not scale to the understand...
There is a lack of data on the location, condition, and accessibility of sidewalks across the world, which not only impacts where and how people travel but also fundamentally limits interactive mapping tools and urban analytics. In this paper, we describe initial work in semi-automatically building a sidewalk network topology from satellite imagery...
Noise is one of the primary quality‐of‐life issues in urban environments. In addition to annoyance, noise negatively impacts public health and educational performance. While low‐cost sensors can be deployed to monitor ambient noise levels at high temporal resolutions, the amount of data they produce and the complexity of these data pose significant...
While designing sustainable and resilient urban built environments is increasingly promoted around the world, significant data gaps have made research on pressing sustainability issues challenging to carry out. Pavements are known to have strong economic and environmental impacts; however, most cities lack a spatial catalog of their surfaces due to...
Graffiti is an inseparable element of most large cities. It is of critical value to recognize whether it is an artistic product or a distortion sign. This study develops a larger graffiti dataset containing a variety of graffiti types and annotated bounding boxes. We use this data to obtain a robust graffiti detection model. Compared with existing...
Large-scale analysis of pedestrian infrastructures, particularly sidewalks, is critical to human-centric urban planning and design. Benefiting from the rich data set of planimetric features and high-resolution orthoimages provided through the New York City Open Data portal, we train a computer vision model to detect sidewalks, roads, and buildings...
Background: The trapezius muscle is often utilized as a muscle or nerve donor for repairing shoulder function in those with brachial plexus birth palsy (BPBP). To evaluate the native role of the trapezius in the affected limb, we demonstrate use of the Motion Browser, a novel visual analytics system to assess an adolescent with BPBP.
Method: An 18-ye...
Extracting and analyzing crime patterns in big cities is a challenging spatiotemporal problem. The problem's hardness is linked to the sparse nature of the crime activity and its spread in large spatial areas. Sparseness hampers most time series comparison methods from working properly, while handling large areas tends to render the computational c...
Urban art constitutes an important issue in urbanism. Previous studies on the spatial distribution of graffiti rarely consider visual categories and how the city topology can impact graffiti production. In this work, after assigning graffiti occurrences to three categories, we analyzed their spatial distribution while searching for possible biases....
Exploring large virtual environments, such as cities, is a central task in several domains, such as gaming and urban planning. VR systems can greatly help this task by providing an immersive experience; however, a common issue with viewing and navigating a city in the traditional sense is that users can either obtain a local or a global view, but n...
Many esports use a pick and ban process to define the parameters of a match before it starts. In Counter-Strike: Global Offensive (CSGO) matches, two teams first pick and ban maps, or virtual worlds, to play. Teams typically ban and pick maps based on a variety of factors, such as banning maps which they do not practice, or choosing maps based on t...
Interactive visualizations are at the core of the exploratory data analysis process, enabling users to directly manipulate and gain insights from data. In this article, we present three different ways in which interactive visualizations can be included in Jupyter Notebooks: 1) matplotlib callbacks; 2) visualization toolkits; and 3) embedding HTML v...
Esports, despite its expanding interest, lacks fundamental sports analytics resources such as accessible data or proven and reproducible analytical frameworks. Even Counter-Strike: Global Offensive (CSGO), the second most popular esport, suffers from these problems. Thus, quantitative evaluation of CSGO players, a task important to teams, media, be...
Multidimensional Projection is a fundamental tool for high-dimensional data analytics and visualization. With very few exceptions, projection techniques are designed to map data from a high-dimensional space to a visual space so as to preserve some dissimilarity (similarity) measure, such as the Euclidean distance for example. In fact, although ado...
Despite the great differences among cities, they face similar challenges regarding social inequality, politics and criminality. Urban art expresses these feelings from the citizen's point of view. In particular, the drawing and painting of public surfaces may carry rich information about the time and region in which it was made. Existing studies have explored t...
We present an atmospheric model tailored for the interactive visualization of planetary surfaces. As the exploration of the solar system is progressing with increasingly accurate missions and instruments, the faithful visualization of planetary environments is gaining increasing interest in space research, mission planning, and science communicatio...
In recent years, a wide variety of automated machine learning (AutoML) methods have been proposed to generate end-to-end ML pipelines. While these techniques facilitate the creation of models, given their black-box nature, the complexity of the underlying algorithms, and the large number of pipelines they derive, they are difficult for developers t...
Urban planning is increasingly data driven, yet the challenge of designing with data at a city scale and remaining sensitive to the impact at a human scale is as important today as it was for Jane Jacobs. We address this challenge with Urban Mosaic, a tool for exploring the urban fabric through a spatially and temporally dense data set of 7.7 millio...
With the increasing sophistication of machine learning models, there are growing trends of developing model explanation techniques that focus on only one instance (local explanation) to ensure faithfulness to the original model. While these techniques provide accurate model interpretability on various data primitives (e.g., tabular, image, or text),...
Understanding the interpretation of machine learning (ML) models has been of paramount importance when making decisions with societal impacts such as transport control, financial activities, and medical diagnosis. While current model interpretation methodologies focus on using locally linear functions to approximate the models or creating self-expl...
In data science, there is a long history of using synthetic data for method development, feature selection and feature engineering. Our current interest in synthetic data comes from recent work in explainability. Today's datasets are typically larger and more complex - requiring less interpretable models. In the setting of post hoc explain...
An understanding of person dynamics is indispensable for numerous urban applications, including the design of transportation networks and planning for business development. Pedestrian counting often requires utilizing manual or technical means to count individuals in each location of interest. However, such methods do not scale to the size of a cit...
Predicting commuting flows based on infrastructure and land-use information is critical for urban planning and public policy development. However, it is a challenging task given the complex patterns of commuting flows. Conventional models, such as the gravity model, are mainly derived from physics principles and limited by their predictive power in rea...
In recent years, a wide variety of automated machine learning (AutoML) methods have been proposed to search and generate end-to-end learning pipelines. While these techniques facilitate the creation of models for real-world applications, given their black-box nature, the complexity of the underlying algorithms, and the large number of pipelines the...
Baseball is often seen as a series of contests between individuals. The duel between the pitcher and the batter, for example, is considered the engine that drives the sport. Pitchers use a variety of strategies to gain a competitive advantage against the batter, who does his best to figure out the ball trajectory and react in...
Boundary detection has long been a fundamental tool for image processing and computer vision, supporting the analysis of static and time-varying data. In this work, we built upon the theory of Graph Signal Processing to propose a novel boundary detection filter in the context of graphs, having as main application scenario the visual analysis of spa...
The Marching Cubes algorithm is arguably the most popular isosurface extraction algorithm. Since its inception, two problems have lingered, namely, triangle quality and topology correctness. Although there is an extensive literature to solve them, topology correctness is achieved to the detriment of triangle quality and vice versa. In this pap...
São Paulo is the largest city in South America, with high criminality rates. The number and type of crimes vary considerably around the city, assuming different patterns depending on urban and social characteristics. In this scenario, enabling tools to explore particular locations of the city is very important for domain experts to understand how...
The brachial plexus is a complex network of peripheral nerves that enables sensing from and control of the movements of the arms and hand. Nowadays, the coordination between the muscles to generate simple movements is still not well understood, hindering the knowledge of how to best treat patients with this type of peripheral nerve injury. To acqui...
Dataflow visualization systems enable flexible visual data exploration by allowing the user to construct a dataflow diagram that composes query and visualization modules to specify system functionality. However, learning dataflow diagram usage presents overhead that often discourages the user. In this work we design FlowSense, a natural language int...
Human knowledge about the cosmos is rapidly increasing as instruments and simulations are generating new data supporting the formation of theory and understanding of the vastness and complexity of the universe. OpenSpace is a software system that takes on the mission of providing an integrated view of all these sources of data and supports interact...
Large scale shadows from buildings in a city play an important role in determining the environmental quality of public spaces. They can be both beneficial, such as for pedestrians during summer, and detrimental, by impacting vegetation and by blocking direct sunlight. Determining the effects of shadows requires the accumulation of shadows over time...
While the demand for machine learning (ML) applications is booming, there is a scarcity of data scientists capable of building such models. Automatic machine learning (AutoML) approaches have been proposed that help with this problem by synthesizing end-to-end ML data processing pipelines. However, these follow a best-effort approach and a user in...
We present a theoretical and empirical study of the gradient dynamics of overparameterized shallow ReLU networks with one-dimensional input, solving least-squares interpolation. We show that the gradient dynamics of such networks are determined by the gradient flow in a non-redundant parameterization of the network function. We examine the principa...
Automatic machine learning is an important problem at the forefront of machine learning. The strongest AutoML systems are based on neural networks, evolutionary algorithms, and Bayesian optimization. Recently, AlphaD3M reached state-of-the-art results with an order of magnitude speedup using reinforcement learning with self-play. In this work we ext...
The sports data tracking systems available today are based on specialized hardware (high-definition cameras, speed radars, RFID) to detect and track targets on the field. While effective, implementing and maintaining these systems poses a number of challenges, including high cost and the need for close human monitoring. On the other hand, the sports anal...
The ScanAllFish project is a large-scale effort to scan all the world's 33,100 known species of fishes. It has already generated thousands of volumetric CT scans of fish species which are available on open access platforms such as the Open Science Framework. To achieve a scanning rate required for a project of this magnitude, many specimens are gro...
Graffiti is a common phenomenon in urban scenarios. Unlike urban art, graffiti tagging is an act of vandalism, and many local governments are putting great effort into combating it. The graffiti map of a region can be a very useful resource because it may allow one to potentially combat vandalism in locations with a high level of graffiti and also to...
Noise pollution is one of the topmost quality of life issues for urban residents in the United States. Continued exposure to high levels of noise has proven effects on health, including acute effects such as sleep disruption, and long-term effects such as hypertension, heart disease, and hearing loss. To investigate and ultimately aid in the mitiga...
SONYC integrates sensors, machine listening, data analytics, and citizen science to address noise pollution in New York City.
The wavelet coefficients associated with each node of the graph encode information about the signal under analysis considering all nodes in its neighborhood. However, understanding and extracting insight out of this wealth of information can be a challenging task. In this chapter, we will briefly review how the wavelet coefficients can be interpret...
The reconstruction of a discrete surface from a point cloud is a fundamental geometry processing problem that has been studied for decades, with many methods developed. We propose the use of a deep neural network as a geometric prior for surface reconstruction. Specifically, we overfit a neural network representing a local chart parameterization to...
An understanding of pedestrian dynamics is indispensable for numerous urban applications, including the design of transportation networks and planning for business development. Pedestrian counting often requires utilizing manual or technical means to count individual pedestrians in each location of interest. However, such methods do not scale to the...
Geographical maps encoded with rainbow color scales are widely used for spatial data analysis in climate science, despite evidence from the visualization literature that they are not perceptually optimal. We present a controlled user study that compares the effect of color scales on performance accuracy for climate-modeling tasks using pairs of con...
Visual analytics systems can greatly help in the analysis of urban data allowing domain experts from academia and city governments to better understand cities, and thus enable better operations, informed planning and policies. Effectively designing these systems is challenging and requires bringing together methods from different domains. In this p...
We introduce AlphaD3M, an automatic machine learning (AutoML) system based on meta reinforcement learning using sequence models with self play. AlphaD3M is based on edit operations performed over machine learning pipeline primitives providing explainability. We compare AlphaD3M with state-of-the-art AutoML systems: Autosklearn, Autostacker, and TPO...
In sports, Play Diagrams are the standard way to represent and convey information. They are widely used by coaches, managers, journalists and fans in general. There are situations where diagrams may be hard to understand, for example, when several actions are packed in a certain region of the field or there are just too many actions to be transform...
Advances in technology coupled with the availability of low‐cost sensors have resulted in the continuous generation of large time series from several sources. In order to visually explore and compare these time series at different scales, analysts need to execute online analytical processing (OLAP) queries that include constraints and group‐by's at...
We present a novel method to generate quad meshes for non-rigid objects. Our method takes into account the geometry of a collection of key poses in one-to-one correspondence or even an entire animation sequence. From this input, on a common computational domain, an extremal metric is computed that captures the local worst case behavior in terms of...
Art historians have traditionally used physical light boxes to prepare exhibits or curate collections. On a light box, they can place slides or printed images, move the images around at will, group them as desired, and visually compare them. The transition to digital images has rendered this workflow obsolete. Now, art historians lack well-designed...