Summary of auditing and ranking results on the Bank Marketing dataset using the trustworthiness index given in (2) for α = 0.1, i.e., uncertainty is taken into account both in model selection and in ranking the synthetic data with the trustworthiness index. (a) and (b) show the trust dimension indices π_T (where T corresponds to Fidelity, Privacy, Utility, Fairness, or Robustness) and their "variance" ∆_T for TrustFormer (TF) and the baseline models, reported in the format π_T (∆_T) || name of the synthetic data model. (c) shows the ranking of the models across the different trustworthiness profiles ω given in Table II.

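To make the caption concrete, below is a minimal Python sketch of how an uncertainty-aware trustworthiness index of this kind can be computed and used to rank synthetic data generators. Equation (2) is not reproduced in this excerpt, so the aggregation used here (profile-weighted dimension scores π_T discounted by α·∆_T) and all numbers are illustrative assumptions, not the paper's exact formula or results.

```python
# Illustrative sketch only: equation (2) is not shown in this excerpt, so the
# aggregation below is an assumption (profile-weighted dimension scores,
# discounted by alpha times their split variability). All numbers are made up.

DIMENSIONS = ["fidelity", "privacy", "utility", "fairness", "robustness"]

def trust_index(pi, delta, omega, alpha=0.1):
    """Aggregate per-dimension scores pi_T and variabilities delta_T into one
    index under a trustworthiness profile omega (weights summing to 1)."""
    return sum(omega[d] * (pi[d] - alpha * delta[d]) for d in DIMENSIONS)

# Hypothetical audit results in the caption's pi_T (delta_T) format.
models = {
    "TF":       ({"fidelity": 0.91, "privacy": 0.84, "utility": 0.88,
                  "fairness": 0.90, "robustness": 0.86},
                 {"fidelity": 0.02, "privacy": 0.05, "utility": 0.03,
                  "fairness": 0.02, "robustness": 0.04}),
    "baseline": ({"fidelity": 0.93, "privacy": 0.70, "utility": 0.90,
                  "fairness": 0.78, "robustness": 0.80},
                 {"fidelity": 0.01, "privacy": 0.08, "utility": 0.02,
                  "fairness": 0.06, "robustness": 0.05}),
}
omega = {d: 1.0 / len(DIMENSIONS) for d in DIMENSIONS}  # equal-weight profile

ranking = sorted(models, key=lambda m: trust_index(*models[m], omega), reverse=True)
print(ranking)  # e.g. ['TF', 'baseline'] under this profile
```

Different profiles ω (as in Table II) simply reweight the dimensions, which is what produces the different rankings shown in panel (c).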

Source publication
Article
Full-text available
Real-world data often exhibits bias, imbalance, and privacy risks. Synthetic datasets have emerged to address these issues by enabling a paradigm that relies on generative AI models to generate unbiased, privacy-preserving data while maintaining fidelity to the original data. However, assessing the trustworthiness of synthetic datasets and models i...

Contexts in source publication

Context 1
... α = 0. To account for data split uncertainties, we've included audit results for α = 0.1, depicted in Figure 4 for the Law School dataset, and in the Supplementary Information for the other datasets (see Figures 11 and 12 in Supplementary Information W). ...
Context 2
... effect is achieved through our trustworthiness-index-based model selection, which offers control over trust trade-offs. On the other hand, for the Law School dataset, TF models fall short of the baselines when uncertainty is not considered (α = 0, Supplementary Figure 10), but lead to the top-performing synthetic data when uncertainty is considered (α = 0.1, Figure 4). This highlights the importance of assessing uncertainty when auditing synthetic data. ...
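As a hedged illustration of that last point (made-up single-dimension scores, not the paper's data), the snippet below shows how penalizing split uncertainty can flip a ranking between a stable model and a model with a higher but less reliable mean score:

```python
# Made-up single-dimension scores illustrating how a ranking can flip once
# split uncertainty is penalized (alpha = 0 vs alpha = 0.1); not the paper's data.
pi_tf, delta_tf = 0.85, 0.02   # slightly lower mean, very stable across splits
pi_bl, delta_bl = 0.87, 0.40   # higher mean, but highly split-dependent

for alpha in (0.0, 0.1):
    s_tf, s_bl = pi_tf - alpha * delta_tf, pi_bl - alpha * delta_bl
    winner = "TF" if s_tf > s_bl else "baseline"
    print(f"alpha={alpha}: TF={s_tf:.3f}, baseline={s_bl:.3f} -> {winner}")
```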

Citations

... Computer science research in algorithmic fairness often assumes access to sensitive attributes (or 'protected characteristics' in EU discrimination law terms), which may be reasonable when dealing with publicly available benchmark datasets, but less so in real-world auditing contexts [69,130]. Accessing and sharing such data is inherently constrained by the need to protect privacy [127], and poses a significant challenge to widening access to data while safeguarding individuals' privacy [18,120]. In the EU, the Digital Services Act states that audited platforms should anonymize or pseudonymize personal data, unless doing so would render the research purposes impossible [52]. ...
... This approach provides the auditor with a flexible alternative to accessing individual-level data, and is presented as a promising approach for private data sharing. As it attracts significant interest from practitioners [18,119,120], it is crucial to evaluate its trustworthiness and suitability for algorithm audits. Synthetic data generation could limit audit reliability by introducing artifacts and random noise that reduce the overall quality of the data, as well as by removing statistical outliers [12,120], which are more likely to represent minorities. ...
Preprint
Full-text available
Independent algorithm audits hold the promise of bringing accountability to automated decision-making. However, third-party audits are often hindered by access restrictions, forcing auditors to rely on limited, low-quality data. To study how these limitations impact research integrity, we conduct audit simulations on two realistic case studies for recidivism and healthcare coverage prediction. We examine the accuracy of estimating group parity metrics across three levels of access: (a) aggregated statistics, (b) individual-level data with model outputs, and (c) individual-level data without model outputs. Despite selecting one of the simplest tasks for algorithmic auditing, we find that data minimization and anonymization practices can strongly increase error rates on individual-level data, leading to unreliable assessments. We discuss implications for independent auditors, as well as potential avenues for HCI researchers and regulators to improve data access and enable both reliable and holistic evaluations.
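For illustration, here is a minimal sketch (toy data and placeholder column names, not the cited study's setup) of one group parity metric such audits estimate, and where a synthetic-data version of the same computation would slot in:

```python
import pandas as pd

def statistical_parity_difference(df, group_col, outcome_col):
    """P(favorable | group=1) - P(favorable | group=0): a common group parity metric."""
    rates = df.groupby(group_col)[outcome_col].mean()
    return float(rates.get(1, 0.0) - rates.get(0, 0.0))

# Toy individual-level audit data; column names are placeholders.
real = pd.DataFrame({"protected": [0, 0, 1, 1, 1, 0, 1, 0],
                     "favorable": [1, 1, 0, 1, 0, 1, 0, 1]})
print(statistical_parity_difference(real, "protected", "favorable"))
# An auditor working from synthetic data would recompute the same metric on the
# synthetic table and compare; artifacts or removed outliers can shift the estimate.
```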
... Although not focused specifically on TSMO, this study provides actionable frameworks for ensuring fairness and privacy compliance in transportation-related AI applications. Finally, Belgodere et al. [61] introduce a holistic framework for auditing synthetic datasets, focusing on fidelity, fairness, privacy, and robustness. Their work is particularly relevant for designing trustworthy Gen-AI systems, offering methods for balancing trust trade-offs and ensuring compliance with regulatory safeguards. ...
... These biased GAI predictions may influence the final decisions made by humans. Belgodere et al. (2023) conducted their study on an admission and bar exam dataset sourced from the Law School Admission Council (LSAC). This dataset records each student's gender, race, Law School Admission Test (LSAT) score, and undergraduate GPA. ...
Preprint
Full-text available
Purpose Generative Artificial Intelligence (GAI) models, such as ChatGPT, may inherit or amplify societal biases due to their training on extensive datasets. With the increasing usage of GAI by students, faculty, and staff in higher education institutions (HEIs), it is urgent to examine the ethical issues and potential biases associated with these technologies. Design/Approach/Methods This scoping review aims to elucidate how biases related to GAI in HEIs have been researched and discussed in recent academic publications. We categorized the potential societal biases that GAI might cause in the field of higher education. Our review includes articles written in English, Chinese, and Japanese across four main databases, focusing on GAI usage in higher education and bias. Findings Our findings reveal that while there is meaningful scholarly discussion around bias and discrimination concerning LLMs in the AI field, most articles addressing higher education approach the issue superficially. Few articles identify specific types of bias under different circumstances, and there is a notable lack of empirical research. Most papers in our review focus primarily on educational and research fields related to medicine and engineering, with some addressing English education. However, there is almost no discussion regarding the humanities and social sciences. Additionally, a significant portion of the current discourse is in English and primarily addresses English-speaking contexts. Originality/Value To the best of our knowledge, our study is the first to summarize the potential societal biases in higher education. This review highlights the need for more in-depth studies and empirical work to understand the specific biases that GAI might introduce or amplify in educational settings, guiding the development of more ethical AI applications in higher education.
... A sample-level metric framework evaluates generative models through fidelity and utility lenses, facilitating the identification of discrepancies and similarities between real and synthetic datasets [1]. Another methodology emphasizes auditing and generating synthetic data with controllable trust trade-offs, allowing customization based on specific requirements [5]. Further exploration of synthetic data generation discusses its benefits and limitations across various contexts, particularly in creating a practical approach for deployment [22]. ...
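As a rough illustration of the fidelity lens mentioned above (not the metric framework of [1]), one simple per-column fidelity check compares real and synthetic marginals, for example with a two-sample Kolmogorov-Smirnov test; the data below are synthetic stand-ins:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
real_col = rng.normal(40, 10, size=1_000)       # stand-in for a real numeric column
synthetic_col = rng.normal(41, 12, size=1_000)  # stand-in for its synthetic counterpart

stat, p_value = ks_2samp(real_col, synthetic_col)
print(f"KS statistic={stat:.3f}, p={p_value:.3g}")  # smaller statistic -> closer marginals
```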
Preprint
Full-text available
The evaluation of synthetic data generation is crucial, especially in the retail sector where data accuracy is paramount. This paper introduces a comprehensive framework for assessing synthetic retail data, focusing on fidelity, utility, and privacy. Our approach differentiates between continuous and discrete data attributes, providing precise evaluation criteria. Fidelity is measured through stability and generalizability. Stability ensures synthetic data accurately replicates known data distributions, while generalizability confirms its robustness in novel scenarios. Utility is demonstrated through the synthetic data's effectiveness in critical retail tasks such as demand forecasting and dynamic pricing, proving its value in predictive analytics and strategic planning. Privacy is safeguarded using Differential Privacy, ensuring synthetic data maintains a perfect balance between resembling training and holdout datasets without compromising security. Our findings validate that this framework provides reliable and scalable evaluation for synthetic retail data. It ensures high fidelity, utility, and privacy, making it an essential tool for advancing retail data science. This framework meets the evolving needs of the retail industry with precision and confidence, paving the way for future advancements in synthetic data methodologies.
... Several works propose the use of synthetic datasets for mitigating dataset bias, such as with GANs [50] or diffusion models [21]. However, synthetic or generated data may not necessarily represent underlying distributions of marginalised groups within populations and thus still unfairly disadvantage certain groups [2,4,6,36]. To combat these risks, fairness in generative models is an area gaining popularity: StyleGan [31] has been used to edit images on a spectrum, rather than using binary categories [26]; [21] use human feedback to guide diffusion models to generate diverse human images; and [32] learn to transfer age, race and gender across images. ...
Preprint
Full-text available
Vision-language models are growing in popularity and public visibility to generate, edit, and caption images at scale; but their outputs can perpetuate and amplify societal biases learned during pre-training on uncurated image-text pairs from the internet. Although debiasing methods have been proposed, we argue that these measurements of model bias lack validity due to dataset bias. We demonstrate there are spurious correlations in COCO Captions, the most commonly used dataset for evaluating bias, between background context and the gender of people in-situ. This is problematic because commonly-used bias metrics (such as Bias@K) rely on per-gender base rates. To address this issue, we propose a novel dataset debiasing pipeline to augment the COCO dataset with synthetic, gender-balanced contrast sets, where only the gender of the subject is edited and the background is fixed. However, existing image editing methods have limitations and sometimes produce low-quality images; so, we introduce a method to automatically filter the generated images based on their similarity to real images. Using our balanced synthetic contrast sets, we benchmark bias in multiple CLIP-based models, demonstrating how metrics are skewed by imbalance in the original COCO images. Our results indicate that the proposed approach improves the validity of the evaluation, ultimately contributing to more realistic understanding of bias in vision-language models.
Article
Full-text available
The advancement of artificial intelligence (AI) technologies, including generative pre-trained transformers (GPTs) and generative models for text, image, audio, and video creation, has revolutionized content generation, creating unprecedented opportunities and critical challenges. This paper systematically examines the characteristics, methodologies, and challenges associated with detecting synthetic content across multiple modalities, to safeguard digital authenticity and integrity. Key detection approaches reviewed include stylometric analysis, watermarking, pixel prediction techniques, dual-stream networks, machine learning models, blockchain, and hybrid approaches, highlighting their strengths and limitations, as well as their detection accuracy: around 80% for stylometric analysis alone and up to 92% for hybrid approaches combining multiple modalities. The effectiveness of these techniques is explored in diverse contexts, from identifying deepfakes and synthetic media to detecting AI-generated scientific texts. Ethical concerns, such as privacy violations, algorithmic bias, false positives, and overreliance on automated systems, are also critically discussed. Furthermore, the paper addresses legal and regulatory frameworks, including intellectual property challenges and emerging legislation, emphasizing the need for robust governance to mitigate misuse. Real-world examples of detection systems are analyzed to provide practical insights into implementation challenges. Future directions include developing generalizable and adaptive detection models, hybrid approaches, fostering collaboration between stakeholders, and integrating ethical safeguards. By presenting a comprehensive overview of AIGC detection, this paper aims to inform stakeholders, researchers, policymakers, and practitioners on addressing the dual-edged implications of AI-driven content creation.