Abolfazl Asudeh

Abolfazl Asudeh
University of Illinois at Chicago | UIC · Department of Computer Science

PhD

About

81
Publications
4,889
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
901
Citations
Citations since 2017
72 Research Items
893 Citations
2017201820192020202120222023050100150200250
2017201820192020202120222023050100150200250
2017201820192020202120222023050100150200250
2017201820192020202120222023050100150200250
Introduction
Abolfazl Asudeh is an assistant professor of CS at the UIC and the director of InDeX Lab. His research spans to Big Data and Data Science, including Data Management, Information Retrieval, and Data Mining, for which he designs efficient, accurate, and scalable algorithmic solutions. Data Equity Systems, Algorithmic Fairness, and Data-centric AI are his major focus in research. His research interest also includes Ranking, Social Networks, Fact Checking, DM4ML, and applied Computational Geometry.

Publications

Publications (81)
Article
Signal reconstruction problem (SRP) is an important optimization problem where the objective is to identify a solution to an underdetermined system of linear equations that is closest to a given prior. It has a substantial number of applications in diverse areas, such as network traffic engineering, medical image reconstruction, acoustics, astronom...
Preprint
Full-text available
The detection of fake news has received increasing attention over the past few years, but there are more subtle ways of deceiving one's audience. In addition to the content of news stories, their presentation can also be made misleading or biased. In this work, we study the impact of the ordering of news stories on audience perception. We introduce...
Preprint
Full-text available
Deep neural networks are superior to shallow networks in learning complex representations. As such, there is a fast-growing interest in utilizing them in large-scale settings. The training process of neural networks is already known to be time-consuming, and having a deep architecture only aggravates the issue. This process consists mostly of matri...
Preprint
Full-text available
Entity matching (EM) is a challenging problem studied by different communities for over half a century. Algorithmic fairness has also become a timely topic to address machine bias and its societal impacts. Despite extensive research on these two topics, little attention has been paid to the fairness of entity matching. Towards addressing this gap,...
Article
Full-text available
Ensuring fairness in computational problems has emerged as a key topic during recent years, buoyed by considerations for equitable resource distributions and social justice. It is possible to incorporate fairness in computational problems from several perspectives, such as using optimization, game-theoretic or machine learning frameworks. In this p...
Article
Entity matching (EM) is a challenging problem studied by different communities for over half a century. Algorithmic fairness has also become a timely topic to address machine bias and its societal impacts. Despite extensive research on these two topics, little attention has been paid to the fairness of entity matching. Towards addressing this gap,...
Preprint
Full-text available
Web 2.0 recommendation systems, such as Yelp, connect users and businesses so that users can identify new businesses and simultaneously express their experiences in the form of reviews. Yelp recommendation software moderates user-provided content by categorizing them into recommended and not-recommended sections. Due to Yelp's substantial popularit...
Preprint
Full-text available
The maximum independent set problem is a classical NP-hard problem in theoretical computer science. In this work, we study a special case where the family of graphs considered is restricted to intersection graphs of sets of axis-aligned hyperrectangles and the input is provided in an online fashion. We prove bounds on the competitive ratio of an op...
Preprint
Full-text available
There is a large amount of work constructing hashmaps to minimize the number of collisions. However, to the best of our knowledge no known hashing technique guarantees group fairness among different groups of items. We are given a set $P$ of $n$ tuples in $\mathbb{R}^d$, for a constant dimension $d$ and a set of groups $\mathcal{G}=\{\mathbf{g}_1,\...
Preprint
Full-text available
Linear Regression is a seminal technique in statistics and machine learning, where the objective is to build linear predictive models between a response (i.e., dependent) variable and one or more predictor (i.e., independent) variables. In this paper, we revisit the classical technique of Quantile Regression (QR), which is statistically a more robu...
Preprint
Full-text available
Existing machine learning models have proven to fail when it comes to their performance for minority groups, mainly due to biases in data. In particular, datasets, especially social data, are often not representative of minorities. In this paper, we consider the problem of representation bias identification on image datasets without explicit attrib...
Preprint
Full-text available
Historical systematic exclusionary tactics based on race have forced people of certain demographic groups to congregate in specific urban areas. Aside from the ethical aspects of such segregation, these policies have implications for the allocation of urban resources including public transportation, healthcare, and education within the cities. The...
Preprint
Full-text available
Despite the potential benefits of machine learning (ML) in high-risk decision-making domains, the deployment of ML is not accessible to practitioners, and there is a risk of discrimination. To establish trust and acceptance of ML in such domains, democratizing ML tools and fairness consideration are crucial. In this paper, we introduce FairPilot, a...
Article
Full-text available
Data-driven algorithms are only as good as the data they work with, while data sets, especially social data, often fail to represent minorities adequately. Representation Bias in data can happen due to various reasons ranging from historical discrimination to selection and sampling biases in the data acquisition and preparation methods. Given that...
Article
Full-text available
Addressing the increasing demand for data exchange has led to the development of data markets that facilitate transactional interactions between data buyers and data sellers. Still, cost-effective and distribution-aware query answering is a substantial challenge in these environments. In this paper, while differentiating different types of data mar...
Article
Full-text available
Content spread inequity is a potential unfairness issue in online social networks, disparately impacting minority groups. In this paper, we view friendship suggestion, a common feature in social network platforms, as an opportunity to achieve an equitable spread of content. In particular, we propose to suggest a subset of potential edges (currently...
Preprint
Full-text available
Content spread inequity is a potential unfairness issue in online social networks, disparately impacting minority groups. In this paper, we view friendship suggestion, a common feature in social network platforms, as an opportunity to achieve an equitable spread of content. In particular, we propose to suggest a subset of potential edges (currently...
Conference Paper
Full-text available
We are being constantly judged by automated decision systems that have been widely criticised for being discriminatory and unfair. Since an algorithm is only as good as the data it works with, biases in the data can significantly amplify unfairness issues. In this paper, we take initial steps towards integrating fairness conditions into database qu...
Preprint
Full-text available
At the same time that AI and machine learning are becoming central to human life, their potential harms become more vivid. In the presence of such drawbacks, a critical question one needs to address before using these data-driven technologies to make a decision is whether to trust their outcomes. Aligned with recent efforts on data-centric AI, this...
Article
Machine learning (ML) is increasingly being used in high-stakes applications impacting society. Therefore, it is of critical importance that ML models do not propagate discrimination. Collecting accurate labeled data in societal applications is challenging and costly. Active learning is a promising approach to build an accurate classifier by intera...
Article
Full-text available
Selecting the best items in a dataset is a common task in data exploration. However, the concept of “best” lies in the eyes of the beholder: Different users may consider different attributes more important and, hence, arrive at different rankings. Nevertheless, one can remove “dominated” items and create a “representative” subset of the data, compr...
Preprint
Full-text available
The grand goal of data-driven decision-making is to help humans make decisions, not only easily and at scale but also wisely, accurately, and just. However, data-driven algorithms are only as good as the data they work with, while data sets, especially social data, often miss representing minorities. Representation Bias in data can happen due to va...
Preprint
Full-text available
It is of critical importance to be aware of the historical discrimination embedded in the data and to consider a fairness measure to reduce bias throughout the predictive modeling pipeline. Various notions of fairness have been defined, though choosing an appropriate metric is cumbersome. Trade-offs and impossibility theorems make such selection ev...
Article
Full-text available
Data scientists often develop data sets for analysis by drawing upon sources of data available to them. A major challenge is to ensure that the data set used for analysis has an appropriate representation of relevant (demographic) groups: it meets desired distribution requirements. Whether data is collected through some experiment or obtained from...
Preprint
Full-text available
Machine learning (ML) is increasingly being used to make decisions in our society. ML models, however, can be unfair to certain demographic groups (e.g., African Americans or females) according to various fairness metrics. Existing techniques for producing fair ML models either are limited to the type of fairness constraints they can handle (e.g.,...
Preprint
Given a data set, misleading conclusions can be drawn from it by cherry-picking selected samples. One important class of conclusions is a trend derived from a data set of values over time. Our goal is to evaluate whether the 'trends' described by the extracted samples are representative of the true situation represented in the data. We demonstrate...
Article
Reconstructing a high dimensional unknown signal, using lower dimensional observations is a challenging problem, known as signal reconstruction problem (SRP), with diverse applications including network traffic engineering, medical image reconstruction, and astronomy. Recently the database community has shown significant advancements in solving the...
Article
We frequently compute a score for each item in a data set, sometimes for its intrinsic value, but more often as a step towards classification, ranking, and so forth. The importance of computing this score fairly cannot be overstated. In this tutorial, we will develop a framework for how to think about this task, and then present techniques for resp...
Article
In today's data-driven world, it is critical that we use appropriate datasets for analysis and decision-making. Datasets could be biased because they reflect existing inequalities in the world, due to the data scientists' biased world view, or due to the data scientists' limited control over the data collection process. For these reasons, it is imp...
Preprint
Ensuring fairness in computational problems has emerged as a $key$ topic during recent years, buoyed by considerations for equitable resource distributions and social justice. It $is$ possible to incorporate fairness in computational problems from several perspectives, such as using optimization, game-theoretic or machine learning frameworks. In th...
Preprint
Machine learning (ML) is increasingly being used in high-stakes applications impacting society. Therefore, it is of critical importance that ML models do not propagate discrimination. Collecting accurate labeled data in societal applications is challenging and costly. Active learning is a promising approach to build an accurate classifier by intera...
Article
Full-text available
Signal reconstruction problem (SRP) is an important optimization problem where the objective is to identify a solution to an underdetermined system of linear equations that is closest to a given prior. It has a substantial number of applications in diverse areas including network traffic engineering, medical image reconstruction, acoustics, astrono...
Article
Full-text available
Poorly supported stories can be told based on data by cherry-picking the data points included. While such stories may be technically accurate, they are misleading. In this paper, we build a system for detecting cherry-picking, with a focus on trendlines extracted from temporal data. We define a support metric for detecting such trendlines. Given a...
Preprint
Full-text available
Bias in training data and proxy attributes are probably the main reasons for bias in machine learning. ML models are trained on historical data that are biased due to the inherent societal bias. This causes unfairness in model outcomes. On the other hand, collecting labeled data in societal applications is challenging and costly. Hence, other attri...
Preprint
Full-text available
Human decision-makers often receive assistance from data-driven algorithmic systems that provide a score for evaluating objects, including individuals. The scores are generated by a function (mechanism) that takes a set of features as input and generates a score.The scoring functions are either machine-learned or human-designed and can be used for...
Article
The signal reconstruction problem (SRP) is an important optimization problem where the objective is to identify a solution to an under-determined system of linear equations AX = b that is closest to a given prior. It has a substantial number of applications in diverse areas including network traffic engineering, medical image reconstruction, acoust...
Article
Given a database with numeric attributes, it is often of interest to rank the tuples according to linear scoring functions. For a scoring function and a subset of tuples, the regret of the subset is defined as the (relative) difference in scores between the top-1 tuple of the subset and the top-1 tuple of the entire database. Finding the regret-rat...
Article
Full-text available
Human decision makers often receive assistance from data-driven algorithmic systems that provide a score for evaluating the quality of items such as products, services, or individuals. These scores can be obtained by combining different features either through a process learned by ML models, or using a weight vector designed by human experts, with...
Conference Paper
Full-text available
Using inappropriate datasets for data science tasks can be harmful, especially for applications that impact humans. Targeting data ethics, we demonstrate MithraLabel, a system for generating task- specific information about a dataset, in the form of a set of visual widgets, as a flexible "nutritional label" that provides a user with information to...
Article
Machine learning (ML) has gained a pivotal role in answering complex predictive analytic queries. Model building for large scale datasets is one of the time consuming parts of the data science pipeline. Often data scientists are willing to sacrifice some accuracy in order to speed up this process during the exploratory phase. In this paper, we prop...
Conference Paper
Full-text available
Selecting the best items in a dataset is a common task in data exploration. However, the concept of "best'' lies in the eyes of the beholder: different users may consider different attributes more important, and hence arrive at different rankings. Nevertheless, one can remove "dominated'' items and create a "representative'' subset of the data, com...
Conference Paper
Full-text available
Items from a database are often ranked based on a combination of criteria. The weight given to each criterion in the combination can greatly affect the fairness of the produced ranking, for example, preferring men over women. A user may have the flexibility to choose combinations that weigh these criteria differently, within limits. In this paper,...
Conference Paper
Full-text available
Items from a database are often ranked based on a combination of criteria. The weight given to each criterion in the combination can greatly affect the ranking produced. Often, a user may have a general sense of the relative importance of the different criteria, but beyond this may have the flexibility, within limits, to choose combinations that we...
Chapter
Peer to peer marketplaces enable transactional exchange of services directly between people. In such platforms, those providing a service are faced with various choices. For example in travel peer to peer marketplaces, although some amenities (attributes) in a property are fixed, others are relatively flexible and can be provided without significan...
Article
Items from a database are often ranked based on a combination of multiple criteria. Often, a user may have flexibility to accept combinations that weight these criteria differently, within limits. On the other hand, this choice of weights can greatly affect the fairness of the ranking produced. In this paper, we develop a system that helps users ch...
Article
Full-text available
Selecting the best items in a dataset is a common task in data exploration. However, the concept of ``best'' lies in the eyes of the beholder: different users may consider different attributes more important, and hence arrive at different rankings. Nevertheless, one can remove ``dominated'' items and create a ``representative'' subset of the data s...
Article
We often have to rank items with multiple attributes in a dataset. A typical method to achieve this is to compute a goodness score for each item as a weighted sum of its attribute values, and then to rank by sorting on this score. Clearly, the ranking obtained depends on the weights used for this summation. Ideally, we would want the ranked order n...
Preprint
Full-text available
Data analysis impacts virtually every aspect of our society today. Often, this analysis is performed on an existing dataset, possibly collected through a process that the data scientists had limited control over. The existing data analyzed may not include the complete universe, but it is expected to cover the diversity of items in the universe. Lac...
Preprint
Full-text available
The ranked retrieval model has rapidly become the de-facto way for search query processing in web databases. Despite the extensive efforts on designing better ranking mechanisms, in practice, many such databases fail to address the diverse and sometimes contradicting preferences of users. In this paper, we present QR2, a third-party service that us...
Article
Machine learning has become an essential toolkit for complex analytic processing. Data is typically stored in large data warehouses with multiple dimension hierarchies. Often, data used for building an ML model are aligned on OLAP hierarchies such as location or time. In this paper, we investigate the feasibility of efficiently constructing approxi...
Article
Machine learning has become an essential toolkit for complex analytic processing. Data is typically stored in large data warehouses with multiple dimension hierarchies. Often, data used for building an ML model are aligned on OLAP hierarchies such as location or time. In this paper, we investigate the feasibility of efficiently constructing approxi...
Article
Signal reconstruction problem (SRP) is an important optimization problem where the objective is to identify a solution to an underdetermined system of linear equations that is closest to a given prior. It has a substantial number of applications in diverse areas including network traffic engineering, medical image reconstruction, acoustics, astrono...
Preprint
Full-text available
We often have to rank items with multiple attributes in a dataset. A typical method to achieve this is to compute a goodness score for each item as a weighted sum of its attribute values, and then to rank by sorting on this score. Clearly, the ranking obtained depends on the weights used for this summation. Ideally, we would want the ranked order n...
Article
Algorithmic decisions often result in scoring and ranking individuals to determine credit worthiness, qualifications for college admissions and employment, and compatibility as dating partners. While automatic and seemingly objective, ranking algorithms can discriminate against individuals and protected groups, and exhibit low diversity. Furthermor...
Conference Paper
Full-text available
Platforms such as AirBnB, Zillow, Yelp, and related sites have transformed the way we search for accommodation, restaurants, etc. The underlying datasets in such applications have numerous attributes that are mostly Boolean or Categorical. Discovering the skyline of such datasets over a subset of attributes would identify entries that stand out whi...
Conference Paper
Finding the maxima of a database based on a user preference, especially when the ranking function is a linear combination of the attributes, has been the subject of recent research. A critical observation is that the em convex hull is the subset of tuples that can be used to find the maxima of any linear function. However, in real world application...
Article
Full-text available
Peer to peer marketplaces such as AirBnB enable transactional exchange of services directly between people. In such platforms, those providing a service (hosts in AirBnB) are faced with various choices. For example in AirBnB, although some amenities in a property (attributes of the property) are fixed, others are relatively flexible and can be prov...
Article
Full-text available
Platforms such as AirBnB, TripAdvisor, Yelp and related sites have transformed the way we search for accommodation, restaurants, etc. The underlying datasets in such applications have numerous attributes that are mostly Boolean or Categorical. Discovering the skyline of such datasets over a subset of attributes would identify entries that stand out...
Article
Finding the maxima of a database based on a user preference, especially when the ranking function is a linear combination of the attributes, has been the subject of recent research. A critical observation is that the convex hull is the subset of tuples that can be used to find the maxima of any linear function. However, in real world applications t...
Article
Full-text available
Many web databases are "hidden" behind proprietary search interfaces that enforce the top-$k$ output constraint, i.e., each query returns at most $k$ of all matching tuples, preferentially selected and returned according to a proprietary ranking function. In this paper, we initiate research into the novel problem of skyline discovery over top-$k$ h...
Article
Full-text available
The ranked retrieval model has rapidly become the de facto way for search query processing in client-server databases, especially those on the web. Despite of the extensive efforts in the database community on designing better ranking functions/mechanisms, many such databases in practice still fail to address the diverse and sometimes contradicting...
Article
Users and developers are tapping into big, complex entity graphs for numerous applications. It is challenging to select entity graphs for a particular need, given abundant datasets from many sources and the oftentimes scarce information available for them. We propose methods to automatically produce preview tables for entity graphs, for compact pre...
Conference Paper
Full-text available
Users are tapping into massive, heterogeneous entity graphs for many applications. It is challenging to select entity graphs for a particular need, given abundant datasets from many sources and the oftentimes scarce information for them. We propose methods to produce preview tables for compact presentation of important entity types and relationship...
Article
Wireless Sensor Networks (WSNs) are being deployed for different applications, each having its own structure, goals and requirements. Medium access control (MAC) protocols play a significant role in WSNs and hence should be tuned to the applications. However, there is no for selecting MAC protocols for different situations. Therefore, it is hard to...