Preprint

Much Ado About Gender: Current Practices and Future Recommendations for Appropriate Gender-Aware Information Access

Abstract

Information access research (and development) sometimes makes use of gender, whether to report on the demographics of participants in a user study, to personalize results or recommendations, or to make systems gender-fair, among other purposes. This work makes a variety of assumptions about gender, however, that are not necessarily aligned with current understandings of what gender is, how it should be encoded, and how a gender variable should be ethically used. In this work, we present a systematic review of papers on information retrieval and recommender systems that mention gender in order to document how gender is currently being used in this field. We find that most papers mentioning gender do not use an explicit gender variable, and that most of those that do focus on contextualizing results of model performance, personalizing a system based on assumptions of user gender, or auditing a model's behavior for fairness or other privacy-related issues. Moreover, most of the papers we review rely on a binary notion of gender, even if they acknowledge that gender cannot be split into two categories. We connect these findings with scholarship on gender theory and recent work on gender in human-computer interaction and natural language processing. We conclude by making recommendations for the ethical and well-grounded use of gender in building and researching information access systems.


References
Article
Full-text available
Scholars are increasingly concerned about social biases in facial analysis systems, particularly with regard to the tangible consequences of misidentification of marginalized groups. However, few have examined how automated facial analysis technologies intersect with the historical genealogy of racialized gender—the gender binary and its classification as a highly racialized tool of colonial power and control. In this paper, we introduce the concept of auto-essentialization: the use of automated technologies to re-inscribe the essential notions of difference that were established under colonial rule. We consider how the face has emerged as a legitimate site of gender classification, despite being historically tied to projects of racial domination. We examine the history of gendering the face and body, from colonial projects aimed at disciplining bodies which do not fit within the European gender binary, to sexology's role in normalizing that binary, to physiognomic practices that ascribed notions of inferiority to non-European groups and women. We argue that the contemporary auto-essentialization of gender via the face is both racialized and trans-exclusive: it asserts a fixed gender binary and it elevates the white face as the ultimate model of gender difference. We demonstrate that imperialist ideologies are reflected in modern automated facial analysis tools in computer vision through two case studies: (1) commercial gender classification and (2) the security of both small-scale (women-only online platforms) and large-scale (national borders) spaces. Thus, we posit a rethinking of ethical attention to these systems: not as immature and novel, but as mature instantiations of much older technologies.
Conference Paper
Full-text available
As recommender systems play an important role in everyday life, there is increasing pressure for such systems to be fair. Besides serving diverse groups of users, recommenders need to represent and serve item providers fairly as well. In interviews with music artists, we identified that gender fairness is one of the artists' main concerns. They emphasized that female artists should be given more exposure in music recommendations. We analyze a widely-used collaborative filtering approach with two public datasets, enriched with gender information, to understand how this approach performs with respect to the artists' gender. To achieve gender balance, we propose a progressive re-ranking method that is based on the insights from the interviews. For the evaluation, we rely on a simulation of feedback loops and provide an in-depth analysis using state-of-the-art performance measures and metrics concerning gender fairness.
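The abstract does not spell out the re-ranking procedure; the general shape of a progressive re-ranker, which walks down a relevance-ordered list and promotes items from the under-exposed group whenever their share of the current prefix falls below a target, might be sketched as follows (all names are hypothetical, and the target share is an assumed parameter, not the authors' setting):

    # Minimal sketch of a progressive gender-aware re-ranker (hypothetical,
    # not the paper's exact method): promote the next protected-group item
    # whenever its share in the current prefix falls below a target.
    def progressive_rerank(items, is_protected, target_share=0.5):
        """items: list ordered by relevance; is_protected: item -> bool."""
        protected = [i for i in items if is_protected(i)]
        rest = [i for i in items if not is_protected(i)]
        reranked, p_count = [], 0
        while protected or rest:
            k = len(reranked) + 1  # position being filled
            need_protected = protected and (p_count / k) < target_share
            if need_protected or not rest:
                reranked.append(protected.pop(0))
                p_count += 1
            else:
                reranked.append(rest.pop(0))
        return reranked

    artist_gender = {"a1": "female", "a2": "male", "a3": "male", "a4": "female"}
    recommendations = ["a2", "a3", "a1", "a4"]  # relevance-ordered
    print(progressive_rerank(recommendations, lambda a: artist_gender[a] == "female"))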
Article
Full-text available
Collaborative filtering algorithms find useful patterns in rating and consumption data and exploit these patterns to guide users to good items. Many of these patterns reflect important real-world phenomena driving interactions between the various users and items; other patterns may be irrelevant or reflect undesired discrimination, such as discrimination in publishing or purchasing against authors who are women or ethnic minorities. In this work, we examine the response of collaborative filtering recommender algorithms to the distribution of their input data with respect to one dimension of social concern, namely content creator gender. Using publicly available book ratings data, we measure the distribution of the genders of the authors of books in user rating profiles and recommendation lists produced from this data. We find that common collaborative filtering algorithms tend to propagate at least some of each user’s tendency to rate or read male or female authors into their resulting recommendations, although they differ in both the strength of this propagation and the variance in the gender balance of the recommendation lists they produce. The data, experimental design, and statistical methods are designed to be reusable for studying potentially discriminatory social dimensions of recommendations in other domains and settings as well.
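A minimal sketch of the kind of measurement described, comparing the female-authored share of each user's rating profile to that of their recommendation list, could look like this (the data layout and the use of a simple linear fit to summarize propagation are assumptions, not the paper's exact statistical design):

    import numpy as np

    def female_share(author_genders):
        # Share of female authors among books with a known author gender.
        labeled = [g for g in author_genders if g in ("female", "male")]
        return np.mean([g == "female" for g in labeled]) if labeled else np.nan

    # Hypothetical data: author genders in each user's profile and top-N list.
    profile_genders = {"u1": ["female", "male", "female"], "u2": ["male", "male", "female"]}
    rec_genders = {"u1": ["female", "male", "male"], "u2": ["male", "male", "male"]}

    xs = np.array([female_share(profile_genders[u]) for u in profile_genders])
    ys = np.array([female_share(rec_genders[u]) for u in profile_genders])
    slope, intercept = np.polyfit(xs, ys, 1)  # how profile tendency propagates
    print(f"propagation slope={slope:.2f}, intercept={intercept:.2f}")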
Conference Paper
Full-text available
Manually extracting relevant aspects and opinions from large volumes of user-generated text is a time-consuming process. Summaries, on the other hand, help readers with limited time budgets to quickly consume the key ideas from the data. State-of-the-art approaches for multi-document summarization, however, do not consider user preferences while generating summaries. In this work, we argue for the need to generate personalized aspect-based opinion summaries from large collections of online tourist reviews, and we propose a solution. We let our readers decide and control several attributes of the summary, such as its length and the specific aspects of interest. Specifically, we take an unsupervised approach to extract coherent aspects from tourist reviews posted on TripAdvisor. We then propose an Integer Linear Programming (ILP) based extractive technique to select an informative subset of opinions around the identified aspects while respecting the user-specified values for various control parameters. Finally, we evaluate and compare our summaries using crowdsourcing and ROUGE-based metrics and obtain competitive results.
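The selection step lends itself to a compact formulation; a sketch with a generic ILP solver, using hypothetical opinions, importance scores, and control parameters (the paper's actual objective and constraints are richer than this):

    from pulp import LpProblem, LpMaximize, LpVariable, lpSum, PULP_CBC_CMD

    # Sketch of the extractive-selection idea (all data hypothetical):
    # maximize total opinion importance subject to a reader-chosen length
    # budget and per-aspect inclusion controls.
    opinions = {  # id: (aspect, importance score, length in words)
        "o1": ("rooms", 0.9, 12), "o2": ("rooms", 0.5, 8),
        "o3": ("food", 0.8, 10), "o4": ("location", 0.7, 9),
    }
    wanted_aspects = {"rooms", "food"}  # user-specified control parameter
    max_words = 25                      # user-specified length budget

    prob = LpProblem("summary", LpMaximize)
    x = {o: LpVariable(o, cat="Binary") for o in opinions}
    prob += lpSum(opinions[o][1] * x[o] for o in opinions)          # objective
    prob += lpSum(opinions[o][2] * x[o] for o in opinions) <= max_words
    for o, (aspect, _, _) in opinions.items():
        if aspect not in wanted_aspects:
            prob += x[o] == 0  # exclude aspects the reader did not ask for
    prob.solve(PULP_CBC_CMD(msg=False))
    print([o for o in opinions if x[o].value() == 1])  # e.g. ['o1', 'o3']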
Conference Paper
Full-text available
To hold a true conversation, an intelligent agent should be able to occasionally take initiative and recommend the next natural conversation topic. This is a challenging task. A topic suggested by the agent should be relevant to the person, appropriate for the conversation context, and the agent should have something interesting to say about it. Thus, a scripted, or one-size-fits-all, popularity-based topic suggestion is doomed to fail. Instead, we explore different methods for a personalized, contextual topic suggestion for open-domain conversations. We formalize the Conversational Topic Suggestion problem (CTS) to more clearly identify the assumptions and requirements. We also explore three possible approaches to solve this problem: (1) model-based sequential topic suggestion to capture the conversation context (CTS-Seq), (2) Collaborative Filtering-based suggestion to capture previous successful conversations from similar users (CTS-CF), and (3) a hybrid approach combining both conversation context and collaborative filtering. To evaluate the effectiveness of these methods, we use real conversations collected as part of the Amazon Alexa Prize 2018 Conversational AI challenge. The results are promising: the CTS-Seq model suggests topics with 23% higher accuracy than the baseline, and incorporating collaborative filtering signals into a hybrid CTS-Seq-CF model further improves recommendation accuracy by 12%. Together, our proposed models, experiments, and analysis significantly advance the study of open-domain conversational agents, and suggest promising directions for future improvements.
Preprint
Full-text available
The central idea of this paper is to gain a deeper understanding of song lyrics computationally. We focus on two aspects: the style and the biases of song lyrics. All prior work on these two aspects has been limited to manual analysis of small corpora of song lyrics. In contrast, we analyzed more than half a million songs spread over five decades. We characterize lyrics style in terms of vocabulary, length, repetitiveness, speed, and readability. We observed that the style of popular songs significantly differs from that of other songs. We used distributed representation methods and the WEAT test to measure various gender and racial biases in the song lyrics. We observed that biases in song lyrics correlate with prior results on human subjects. This correlation indicates that song lyrics reflect the biases that exist in society. The increasing consumption of music and the effect of lyrics on human emotions make this analysis important.
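The WEAT measurement referred to here has a standard closed form (Caliskan et al.'s effect size); a sketch over word vectors represented as numpy arrays, where X and Y are target word sets and A and B are attribute sets (e.g., gendered terms):

    import numpy as np

    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    def assoc(w, A, B):
        # s(w, A, B): mean cosine similarity to attribute set A minus set B.
        return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

    def weat_effect_size(X, Y, A, B):
        # Effect size: difference of mean associations of the two target
        # sets, normalized by the pooled standard deviation over X and Y.
        sx = [assoc(x, A, B) for x in X]
        sy = [assoc(y, A, B) for y in Y]
        return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)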
Conference Paper
Full-text available
We analyze the effect of a smile in persona pictures on persona perceptions, including credibility, likability, similarity, and willingness to use. We conduct an online experiment with 2,400 participants using a 16-item survey and multiple persona profile treatments, of which half have a smiling photo and half do not. We find that persona profiles with a smiling photo result in an increase in perceived similarity with, likability of, and willingness to use the personas. In contrast, a smile does not increase the credibility of the personas. Our research has implications for the design of persona profiles and adds to previous findings of persona research that the picture choice influences individuals' persona perceptions in profound ways.
Article
Full-text available
User response prediction is a crucial component for personalized information retrieval and filtering scenarios, such as recommender systems and web search. The data in user response prediction is mostly in a multi-field categorical format and transformed into sparse representations via one-hot encoding. Due to the sparsity problems in representation and optimization, most research focuses on feature engineering and shallow modeling. Recently, deep neural networks have attracted research attention on this problem for their high capacity and end-to-end training scheme. In this article, we study user response prediction in the scenario of click prediction. We first analyze a coupled gradient issue in latent vector-based models and propose a kernel product to learn field-aware feature interactions. Then, we discuss an insensitive gradient issue in DNN-based models and propose the Product-based Neural Network, which adopts a feature extractor to explore feature interactions. Generalizing the kernel product to a net-in-net architecture, we further propose Product-network in Network (PIN), which generalizes the previous models. Extensive experiments on four industrial datasets and one contest dataset demonstrate that our models consistently outperform eight baselines on both area under the curve and log loss. Besides, PIN achieves a large click-through rate improvement (34.67% relative) in an online A/B test.
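The core product-layer idea can be illustrated in a few lines; this sketch (dimensions and the plain numpy formulation are illustrative assumptions, not the paper's full PIN architecture) shows how pairwise inner products of field embeddings expose feature interactions explicitly before the MLP:

    import numpy as np

    # Sketch of an inner-product layer over field embeddings: interactions
    # between categorical fields become explicit features for the MLP.
    n_fields, dim = 4, 8
    emb = np.random.randn(n_fields, dim)  # one embedding vector per field
    pairs = [(i, j) for i in range(n_fields) for j in range(i + 1, n_fields)]
    products = np.array([emb[i] @ emb[j] for i, j in pairs])
    mlp_input = np.concatenate([emb.ravel(), products])  # linear + product parts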
Conference Paper
Full-text available
There is growing evidence that search engines produce results that are socially biased, reinforcing a view of the world that aligns with prevalent social stereotypes. One means to promote greater transparency of search algorithms - which are typically complex and proprietary - is to raise user awareness of biased result sets. However, to date, little is known concerning how users perceive bias in search results, and the degree to which their perceptions differ and/or might be predicted based on user attributes. One particular area of search that has recently gained attention, and forms the focus of this study, is image retrieval and gender bias. We conduct a controlled experiment via crowdsourcing using participants recruited from three countries to measure the extent to which workers perceive a given image results set to be subjective or objective. Demographic information about the workers, along with measures of sexism, are gathered and analysed to investigate whether (gender) biases in the image search results can be detected. Amongst other findings, the results confirm that sexist people are less likely to detect and report gender biases in image search results.
Conference Paper
Full-text available
Automatic Gender Recognition (AGR) refers to various computational methods that aim to identify an individual's gender by extracting and analyzing features from images, video, and/or audio. Applications of AGR are increasingly being explored in domains such as security, marketing, and social robotics. However, little is known about stakeholders' perceptions and attitudes towards AGR and how this technology might disproportionately affect vulnerable communities. To begin to address these gaps, we interviewed 13 transgender individuals, including three transgender technology designers, about their perceptions and attitudes towards AGR. We found that transgender individuals have overwhelmingly negative attitudes towards AGR and fundamentally question whether it can accurately recognize such a subjective aspect of their identity. They raised concerns about privacy and potential harms that can result from being incorrectly gendered, or misgendered, by technology. We present a series of recommendations on how to accommodate gender diversity when designing new digital systems.
Conference Paper
Full-text available
To more effectively convey relevant information to end users of persona profiles, we conducted a user study consisting of 29 participants engaging with three persona layout treatments. We were interested in the confusion the treatments engendered in participants, and conducted a within-subjects study in the actual work environment, using eye-tracking and talk-aloud data collection. We coded the verbal data into classes of informativeness and confusion and correlated it with fixations and durations on the Areas of Interest recorded by the eye-tracking device. We used various analysis techniques, including Mann-Whitney, regression, and Levenshtein distance, to investigate how confused users differed from non-confused users, what information in the personas caused confusion, and what the predictors of confusion among end users of personas were. We consolidate our various findings into a confusion ratio measure, which highlights in a succinct manner the most confusing elements of the personas. Findings show that inconsistencies among the informational elements of the persona generate the most confusion, especially for the elements of images and social media quotes. The research has implications for the design of personas and related information products, such as user profiling and customer segmentation.
Article
Ranking items or people is a fundamental operation at the basis of several processes and services, not all of them happening online. Ranking is required for different tasks, including search, personalization, recommendation, and filtering. While traditionally ranking has been aimed solely at maximizing some global utility function, recently awareness of potential discrimination against some of the elements being ranked has captured the attention of researchers, who have started devising ranking systems that are non-discriminatory or fair for the items being ranked. So far, researchers have mostly focused on group fairness, which is usually expressed in the form of constraints on the fraction of elements from some protected groups that should be included in the top-k positions, for any relevant k. These constraints are needed in order to correct implicit societal biases existing in the input data and reflected in the relevance or fitness score computed. In this article, we tackle the problem of selecting a subset of k individuals from a pool of n≫k candidates, maximizing global utility (i.e., selecting the "best" candidates) while respecting given group-fairness criteria. In particular, to tackle this Fair Top-k Ranking problem, we adopt a ranked group-fairness definition which extends the standard notion of group fairness based on protected groups, by ensuring that the proportion of protected candidates in every prefix of the top-k ranking remains statistically above, or indistinguishable from, a given minimum threshold. Our notion of utility requires, intuitively, that every individual included in the top-k should be more qualified than every candidate not included, and that for every pair of candidates in the top-k, the more qualified candidate should be ranked above. The main contribution of this paper is an algorithm for producing a fair top-k ranking that can be used when more than one protected group is present, which means that a statistical test based on a multinomial distribution needs to be used instead of one for a binomial distribution, as the original FA*IR algorithm does. This poses important technical challenges and increases both the space and time complexity of the re-ranking algorithm. Our experimental assessment on real-world datasets shows that our approach yields small distortions with respect to rankings that maximize utility without considering our fairness criteria.
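For a single protected group, the ranked group-fairness test described here reduces to a per-prefix binomial check; a sketch of that base case (omitting the paper's multinomial extension and its multiple-testing adjustment of the significance level):

    from scipy.stats import binom

    def min_protected(k, p, alpha=0.1):
        # Smallest protected count in a prefix of length k that is not
        # statistically significantly below proportion p at level alpha.
        return int(binom.ppf(alpha, k, p))

    def is_fair_prefixwise(ranking_is_protected, p, alpha=0.1):
        # ranking_is_protected: 1 if the candidate at that rank is protected.
        count = 0
        for k, prot in enumerate(ranking_is_protected, start=1):
            count += prot
            if count < min_protected(k, p, alpha):
                return False
        return True

    print(is_fair_prefixwise([0, 1, 0, 0, 1, 0, 0, 0, 0, 0], p=0.4))  # True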
Article
Algorithmically-mediated content is both a product and producer of dominant social narratives, and it has the potential to impact users' beliefs and behaviors. We present two studies on the content and impact of gender and racial representation in image search results for common occupations. In Study 1, we compare 2020 workforce gender and racial composition to that reflected in image search. We find evidence of underrepresentation on both dimensions: women are underrepresented in search at a rate of 42% women for a field with 50% women; people of color are underrepresented with 16% in search compared to an occupation with 22% people of color (the latter being proportional to the U.S. workforce). We also compare our gender representation data with that collected in 2015 by Kay et al., finding little improvement in the last half-decade. In Study 2, we study people's impressions of occupations and sense of belonging in a given field when shown search results with different proportions of women and people of color. We find that both axes of representation as well as people's own racial and gender identities impact their experience of image search results. We conclude by emphasizing the need for designers and auditors of algorithms to consider the disparate impacts of algorithmic content on users of marginalized identities.
Article
Race and gender have long sociopolitical histories of classification in technical infrastructures-from the passport to social media. Facial analysis technologies are particularly pertinent to understanding how identity is operationalized in new technical systems. What facial analysis technologies can do is determined by the data available to train and evaluate them with. In this study, we specifically focus on this data by examining how race and gender are defined and annotated in image databases used for facial analysis. We found that the majority of image databases rarely contain underlying source material for how those identities are defined. Further, when they are annotated with race and gender information, database authors rarely describe the process of annotation. Instead, classifications of race and gender are portrayed as insignificant, indisputable, and apolitical. We discuss the limitations of these approaches given the sociohistorical nature of race and gender. We posit that the lack of critical engagement with this nature renders databases opaque and less trustworthy. We conclude by encouraging database authors to address both the histories of classification inherently embedded into race and gender, as well as their positionality in embedding such classifications.
Conference Paper
Image analysis algorithms have been a boon to personalization in digital systems and are now widely available via easy-to-use APIs. However, it is important to ensure that they behave fairly in applications that involve processing images of people, such as dating apps. We conduct an experiment to shed light on the factors influencing the perception of "fairness." Participants are shown a photo along with two descriptions (human- and algorithm-generated). They are then asked to indicate which is "more fair" in the context of a dating site, and explain their reasoning. We vary a number of factors, including the gender, race and attractiveness of the person in the photo. While participants generally found human-generated tags to be more fair, API tags were judged as being more fair in one setting: where the image depicted an "attractive," white individual. In their explanations, participants often mention accuracy, as well as the objectivity/subjectivity of the tags in the description. We relate our work to the ongoing conversation about fairness in opaque tools like image tagging APIs, and their potential to result in harm.
Article
Investigations of facial analysis (FA) technologies, such as facial detection and facial recognition, have been central to discussions about Artificial Intelligence's (AI) impact on human beings. Research on automatic gender recognition, the classification of gender by FA technologies, has raised potential concerns around issues of racial and gender bias. In this study, we augment past work with empirical data by conducting a systematic analysis of how gender classification and gender labeling in computer vision services operate when faced with gender diversity. We sought to understand how gender is concretely conceptualized and encoded into commercial facial analysis and image labeling technologies available today. We then conducted a two-phase study: (1) a system analysis of ten commercial FA and image labeling services and (2) an evaluation of five services using a custom dataset of diverse genders using self-labeled Instagram images. Our analysis highlights how gender is codified into both classifiers and data standards. We found that FA services performed consistently worse on transgender individuals and were universally unable to classify non-binary genders. In contrast, image labeling often presented multiple gendered concepts. We also found that user perceptions about gender performance and identity contradict the way gender performance is encoded into the computer vision infrastructure. We discuss our findings from three perspectives of gender identity (self-identity, gender performativity, and demographic identity) and how these perspectives interact across three layers: the classification infrastructure, the third-party applications that make use of that infrastructure, and the individuals who interact with that software. We employ Bowker and Star's concepts of "torque" and "residuality" to further discuss the social implications of gender classification. We conclude by outlining opportunities for creating more inclusive classification infrastructures and datasets, as well as with implications for policy.
Conference Paper
Research on recommender systems evaluation remains critical to study the efficiency of developed algorithms. Even if different aspects have been addressed and some of its shortcomings, such as biases, robustness, or cold start, have been analyzed and solutions or guidelines have been proposed, there are still some gaps that need to be further investigated. At the same time, the increasing amount of data collected by most recommender systems makes it possible to gather valuable information from users and items which is neglected by classical offline evaluation metrics. In this work, we integrate such information into the evaluation process in two complementary ways: on the one hand, we aggregate any evaluation metric according to the groups defined by the user attributes, and, on the other hand, we exploit item attributes to consider some recommended items as surrogates of those interacted with by the user, with a proper penalization. Our results evidence that this novel evaluation methodology captures different nuances of the algorithms' performance, inherent biases in the data, and even the fairness of the recommendations.
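The first of the two proposals, aggregating a metric within user-attribute groups, is straightforward to express; a sketch with hypothetical per-user nDCG values:

    import pandas as pd

    # Attribute-aware aggregation (column names assumed): compute any
    # per-user metric, then average it within attribute-defined groups
    # instead of over all users at once.
    results = pd.DataFrame({
        "user": ["u1", "u2", "u3", "u4"],
        "gender": ["f", "m", "f", "m"],
        "ndcg": [0.62, 0.71, 0.58, 0.69],
    })
    by_group = results.groupby("gender")["ndcg"].mean()
    print(by_group)                         # per-group performance
    print(by_group.max() - by_group.min())  # a simple unfairness gap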
Conference Paper
Businesses, such as Amazon, department store chains, home furnishing store chains, Uber, and Lyft, frequently offer deals, product discounts and incentives to drive sales, increase new product acceptance and engage with users. In order to appeal to diverse user groups, these businesses typically design more than one promotion offer but market different ones to different users. For instance, Uber offers a percentage discount in the rides to some users and a low fixed price to others. In this paper, we propose solutions to optimally recommend promotions and items to maximize user conversion constrained by user eligibility and item or offer capacity (limited quantity of items or offers) simultaneously. We achieve this through an offer recommendation model based on Min-Cost Flow network optimization, which enables us to satisfy the constraints within the optimization itself and solve it in polynomial time. We present two approaches that can be used in various settings: single period solution and sequential time period offering. We evaluate these approaches against competing methods using counterfactual evaluation in offline mode. We also discuss three practical aspects that may affect the online performance of constrained optimization: capacity determination, traffic arrival pattern and clustering for large scale setting.
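The constrained assignment described maps naturally onto a flow network; a sketch with networkx and hypothetical conversion probabilities and capacities (the paper's formulation also handles eligibility constraints and sequential periods, which this omits):

    import networkx as nx

    # Users on one side, offers on the other. Edge weights encode negated
    # expected conversion (min-cost flow = max total conversion); offer
    # capacities limit how many users may receive each offer.
    users = {"u1": {"offer_a": 0.30, "offer_b": 0.10},
             "u2": {"offer_a": 0.25, "offer_b": 0.20}}
    capacity = {"offer_a": 1, "offer_b": 2}

    G = nx.DiGraph()
    G.add_node("src", demand=-len(users))   # each user receives one offer
    G.add_node("sink", demand=len(users))
    for u, scores in users.items():
        G.add_edge("src", u, capacity=1, weight=0)
        for o, p_convert in scores.items():
            # Integer weights: scale and negate the conversion probability.
            G.add_edge(u, o, capacity=1, weight=-int(100 * p_convert))
    for o, cap in capacity.items():
        G.add_edge(o, "sink", capacity=cap, weight=0)

    flow = nx.min_cost_flow(G)
    assignment = {u: o for u in users for o in users[u] if flow[u][o] == 1}
    print(assignment)  # {'u1': 'offer_a', 'u2': 'offer_b'}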
Conference Paper
The Precision Medicine (PM) track at the Text REtrieval Conference (TREC) focuses on providing useful precision medicine-related information to clinicians treating cancer patients. The PM track gives the unique opportunity to evaluate medical IR systems using the same set of topics on two different collections: scientific literature and clinical trials. In this paper, we take advantage of this opportunity and propose and evaluate state-of-the-art query expansion and reduction techniques to identify whether a particular approach can be helpful in both scientific literature and clinical trial retrieval. We present those approaches that are consistently effective in both TREC editions and compare the results obtained with the best performing runs submitted to TREC PM 2017 and 2018.
Article
In the field of sentiment analysis and emotion detection in social media, and in other tasks involving supervised learning such as text classification, researchers rely heavily on large and accurately labelled training datasets. However, obtaining large-scale labelled datasets is time-consuming, and high-quality labelled datasets are expensive and scarce. To deal with these problems, online crowdsourcing systems provide an efficient way to accelerate the process of collecting training data by distributing the enormous tasks to various annotators, helping create large amounts of labelled data at an affordable cost. Nowadays, these crowdsourcing platforms are heavily needed for dealing with social media text, since social network platforms (e.g., Twitter) generate huge amounts of textual data every day. However, people from different social and knowledge backgrounds have different views on various texts, which may lead to noisy labels. Existing noisy label aggregation/refinement algorithms mostly focus on aggregating labels from noisy annotations, which does not guarantee their effectiveness on the subsequent classification/ranking tasks. In this article, we propose a noise-aware classification framework that integrates the steps of noisy label aggregation and classification. The aggregated noisy crowd labels are fed into a classifier for training, while the predicted labels are employed as feedback for adjusting the parameters at the label aggregation stage. The classification framework is suitable for running directly on crowdsourcing datasets and applies to various kinds of classification algorithms. The feedback strategy makes it possible to find optimal parameters instead of using known data for parameter selection. Simulation experiments demonstrate that our method provides strong label aggregation performance for both binary and multi-class classification tasks under various noisy environments. Experiments on real-world data validate the feasibility of our framework on real noisy data and help us verify the reasonableness of the simulated experiment settings.
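As a reference point, the simplest aggregation step such a framework improves on is plain majority voting over noisy crowd labels; a minimal sketch (the paper's aggregator instead tunes its parameters using feedback from the downstream classifier):

    from collections import Counter

    # Baseline label aggregation: majority vote per item. Ties break
    # arbitrarily by insertion order.
    def majority_vote(annotations):
        """annotations: {item: [label, label, ...]} -> {item: label}"""
        return {item: Counter(labels).most_common(1)[0][0]
                for item, labels in annotations.items()}

    crowd = {"t1": ["pos", "pos", "neg"], "t2": ["neg", "neg", "pos"]}
    print(majority_vote(crowd))  # {'t1': 'pos', 't2': 'neg'}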
Article
Automatic Gender Recognition (AGR) is a subfield of facial recognition that aims to algorithmically identify the gender of individuals from photographs or videos. In wider society the technology has proposed applications in physical access control, data analytics and advertising. Within academia, it is already used in the field of Human-Computer Interaction (HCI) to analyse social media usage. Given the long-running critiques of HCI for failing to consider and include transgender (trans) perspectives in research, and the potential implications of AGR for trans people if deployed, I sought to understand how AGR and HCI understand the term "gender", and how HCI describes and deploys gender recognition technology. Using a content analysis of papers from both fields, I show that AGR consistently operationalises gender in a trans-exclusive way, and consequently carries disproportionate risk for trans people subject to it. In addition, I use the dearth of discussion of this in HCI papers that apply AGR to discuss how HCI operationalises gender, and the implications that this has for the field's research. I conclude with recommendations for alternatives to AGR, and some ideas for how HCI can work towards a more effective and trans-inclusive treatment of gender.
Conference Paper
Automated platforms which support users in finding a mutually beneficial match, such as online dating and job recruitment sites, are becoming increasingly popular. These platforms often include recommender systems that assist users in finding a suitable match. While recommender systems which provide explanations for their recommendations have shown many benefits, explanation methods have yet to be adapted and tested in recommending suitable matches. In this paper, we introduce and extensively evaluate the use of "reciprocal explanations" - explanations which provide reasoning as to why both parties are expected to benefit from the match. Through an extensive empirical evaluation, in both simulated and real-world dating platforms with 287 human participants, we find that when the acceptance of a recommendation involves a significant cost (e.g., monetary or emotional), reciprocal explanations outperform standard explanation methods, which consider the recommendation receiver alone. However, contrary to what one may expect, when the cost of accepting a recommendation is negligible, reciprocal explanations are shown to be less effective than the traditional explanation methods.
Conference Paper
Recommendation systems are ubiquitous and impact many domains; they have the potential to influence product consumption, individuals' perceptions of the world, and life-altering decisions. These systems are often evaluated or trained with data from users already exposed to algorithmic recommendations; this creates a pernicious feedback loop. Using simulations, we demonstrate how using data confounded in this way homogenizes user behavior without increasing utility.
Conference Paper
The trade-off between relevance and fairness in personalized recommendations has been explored in recent works, with the goal of minimizing learned discrimination towards certain demographics while still producing relevant results. We present a fairness-aware variation of the Maximal Marginal Relevance (MMR) re-ranking method which uses representations of demographic groups computed using a labeled dataset. This method is intended to incorporate fairness with respect to these demographic groups. We perform an experiment on a stock photo dataset and examine the trade-off between relevance and fairness against a well-known baseline, MMR, by using human judgment to examine the results of the re-ranking when using different fractions of a labeled dataset, and by performing a quantitative analysis on the ranked results of a set of query images. We show that our proposed method can incorporate fairness in the ranked results while obtaining higher precision than the baseline; our case study shows that even a limited amount of labeled data can be used to compute the representations needed to obtain fairness. This method can be used as a post-processing step for recommender systems and search.
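A sketch of an MMR-style greedy re-ranker with a fairness-oriented penalty term, in the spirit of the method described (the scoring details, the way group representations enter the penalty, and all names are assumptions rather than the authors' exact formulation):

    import numpy as np

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    def fairness_aware_mmr(query_vec, cand_vecs, group_vecs, k=10, lam=0.7):
        """Greedy selection: relevance to the query, penalized by
        similarity to demographic groups already well represented.
        cand_vecs: {item: vector}; group_vecs: {group: representation}."""
        selected, group_counts = [], {g: 0 for g in group_vecs}
        candidates = dict(cand_vecs)
        while candidates and len(selected) < k:
            def score(item):
                rel = cosine(query_vec, candidates[item])
                # Penalty grows with how often similar groups were picked.
                pen = max((group_counts[g] * cosine(candidates[item], gv)
                           for g, gv in group_vecs.items()), default=0.0)
                return lam * rel - (1 - lam) * pen
            best = max(candidates, key=score)
            # Attribute the pick to its closest group representation.
            g_best = max(group_vecs,
                         key=lambda g: cosine(cand_vecs[best], group_vecs[g]))
            group_counts[g_best] += 1
            selected.append(best)
            del candidates[best]
        return selected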
Conference Paper
Inferring socio-demographic attributes of users is an important and challenging task that could help with personalization, recommendation, advertising, etc. Sensor data collected from mobile devices can be utilized for inferring such attributes. Previous works have focused on combining different types of sensors, such as applications, accelerometer, GPS, battery, and many others, to achieve this task. In this study, we were able to infer attributes such as gender, age, marital status, and whether the user has children, using solely the GPS sensor. We suggest a novel inference technique, which learns an embedding representation of preprocessed spatial GPS trajectories using an adaptation of the Word2vec approach. Based on the embedding representation, we later train multiple classification models to achieve the inference goals. Our empirical results indicate that the suggested embedding approach outperforms a classification approach which does not take the embedding patterns into consideration. Experiments on real datasets collected from Android devices show that the proposed method achieves over 80% accuracy for various demographic prediction tasks.
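The embedding step can be sketched with gensim: discretize GPS points into cell tokens so that each trajectory reads like a sentence, train a skip-gram model over those sentences, and average cell vectors into a user representation that feeds a standard classifier (cell size, dimensions, and preprocessing here are assumptions):

    import numpy as np
    from gensim.models import Word2Vec

    # Discretize GPS points into grid-cell tokens (cell size assumed).
    def to_token(lat, lon, cell=0.01):
        return f"{round(lat / cell)}_{round(lon / cell)}"

    trajectories = [
        [to_token(32.07, 34.78), to_token(32.08, 34.79), to_token(32.06, 34.77)],
        [to_token(31.77, 35.21), to_token(31.78, 35.22)],
    ]
    # Each trajectory is a "sentence" of visited cells; sg=1 is skip-gram.
    model = Word2Vec(trajectories, vector_size=32, window=2, min_count=1, sg=1)

    # A user embedding: mean of the cell vectors in their trajectory; this
    # vector would then feed a classifier (e.g., for gender or age).
    user_vec = np.mean([model.wv[t] for t in trajectories[0]], axis=0)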
Conference Paper
This paper presents a comprehensive analysis of social media use and engagement by age and gender on Instagram. We define five user age groups (from 10s to 50s) and two user gender groups (males and females), and compare them based on three aspects: activity, image object, and tag. We deliberately excluded information that directly indicates a human subject (e.g., selfies, faces, bodies) from each aspect in order to examine whether users are still identifiable without that information. Our study results indicate that each user group exhibits unique characteristics, and that the features from each aspect can be used to develop user classification models (82% accuracy for gender and 43% for age classification) without relying on information that specifically indicates age and gender.
Conference Paper
The booming of social networks has given rise to a large volume of user-generated content (UGC), most of which is free and publicly available. Many of users' personal aspects can be extracted from this content to facilitate personalized applications, as validated by many previous studies. Despite its value, UGC can place users at high privacy risk, an issue that thus far remains largely unexplored. Privacy is defined as the individual's ability to control what information is disclosed, to whom, when, and under what circumstances. As people and information both play significant roles, privacy has been elaborated as a boundary regulation process, where individuals regulate interaction with others by altering the degree of their openness to others. In this paper, we aim to reduce users' privacy risks on social networks by answering the question of Who Can See What. Towards this goal, we present a novel scheme comprising descriptive, predictive, and prescriptive components. In particular, we first collect a set of posts and extract a group of privacy-oriented features to describe them. We then propose a novel taxonomy-guided multi-task learning model to predict which personal aspects are uncovered by the posts. Lastly, we construct standard guidelines through a user study with 400 users to regularize users' actions and prevent privacy leakage. Extensive experiments on a real-world dataset verified our scheme.
Conference Paper
Nowadays natural language user interfaces, such as chatbots and conversational agents, are very common. A desirable trait of such applications is a sense of humor. It is, therefore, important to be able to measure the quality of humorous responses. However, humor evaluation is hard since humor is highly subjective. To address this problem, we conducted an online evaluation of 30 dialog jokes from different sources by almost 300 participants, both volunteers and Mechanical Turk workers. We collected joke ratings along with participants' age, gender, and language proficiency. Results show that demographics and joke topics can partly explain variation in humor judgments. We expect that these insights will aid humor evaluation and interpretation. The findings can also be of interest for humor generation methods in conversational systems.
Article
This article examines the concept of sex as a biological “fact” in Western science from the eighteenth century to the present and, in particular, how the binary definition of sex has been maintained despite empirical flaws and contradictory data. Since the 1990s, feminist science studies scholars have produced detailed empirical accounts of the different historical periods of the sciences of sex, analyzing how different scientific disciplines have focused on various body parts as markers of biological sex, including anatomy, gonads, chromosomes, hormones, genes, and the brain. Despite all these works, I argue that the concrete mechanisms by which binary sex has operated have yet to be understood. This article deals with the history, changes, fractures, and alliances in sex research and traces how the uncritical commitment to the two-sex model has operated in the different areas of the biological sciences. Further, I show how the different variables of sex have intertwined with each other through their scientific history in ways that have been used to maintain a shaky yet persistent concept of binary sex.
Conference Paper
It is increasingly possible to use cameras and sensors to detect and analyze human appearance for the purposes of personalizing user experiences. Such systems are already deployed in some public places to personalize advertisements and recommend items. However, since these technologies are not yet widespread, we do not have a good sense of users' perceptions of the benefits and drawbacks of public display systems that use face detection as an input for personalized recommendations. We conducted a user study with a system that inferred participants' gender and age from a facial detection and analysis algorithm and used this to present recommendations in two scenarios (finding stores to visit in a mall and finding a pair of sunglasses to buy). This work provides an initial step towards understanding user reactions to a new and emerging form of implicit recommendation based on physical appearance.
Conference Paper
With the advancement of mobile computing technology and cloud-based streaming music service, user-centered music retrieval has become increasingly important. User-specific information has a fundamental impact on personal music preferences and interests. However, existing research pays little attention to the modeling and integration of user-specific information in music retrieval algorithms/models to facilitate music search. In this paper, we propose a novel model, named User-Information-Aware Music Interest Topic (UIA-MIT) model. The model is able to effectively capture the influence of user-specific information on music preferences, and further associate users' music preferences and search terms under the same latent space. Based on this model, a user information aware retrieval system is developed, which can search and re-rank the results based on age- and/or gender-specific music preferences. A comprehensive experimental study demonstrates that our methods can significantly improve the search accuracy over existing text-based music retrieval methods.