December 2024 · 7 Reads
November 2024 · 10 Reads · 1 Citation
Nature Reviews Methods Primers
November 2024 · 47 Reads
Spurious features associated with class labels can lead image classifiers to rely on shortcuts that don't generalize well to new domains. This is especially problematic in medical settings, where biased models fail when applied to different hospitals or systems. In such cases, data-driven methods to reduce spurious correlations are preferred, as clinicians can directly validate the modified images. While Denoising Diffusion Probabilistic Models (Diffusion Models) show promise for natural images, they are impractical for medical use due to the difficulty of describing spurious medical features. To address this, we propose Masked Medical Image Inpainting (MaskMedPaint), which uses text-to-image diffusion models to augment training images by inpainting areas outside key classification regions to match the target domain. We demonstrate that MaskMedPaint enhances generalization to target domains across both natural (Waterbirds, iWildCam) and medical (ISIC 2018, Chest X-ray) datasets, given limited unlabeled target images.
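The inpainting step described above can be sketched with an off-the-shelf text-to-image inpainting pipeline. The checkpoint, file names, and prompt below are illustrative assumptions, not the authors' released code; the key move is inverting a region mask so that only the background is regenerated to match the target domain.

```python
# Sketch of the MaskMedPaint idea using Hugging Face diffusers.
# Assumptions: the inpainting checkpoint below, a binary mask marking the
# classification-relevant region, and a free-text prompt describing the
# target domain. Illustrative only, not the paper's released code.
import torch
from PIL import Image, ImageOps
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

image = Image.open("train_image.png").convert("RGB").resize((512, 512))
# region_mask: white where the class-relevant region is (e.g., a lesion).
region_mask = Image.open("region_mask.png").convert("L").resize((512, 512))
# Invert it so the *background* is inpainted and the region is preserved.
background_mask = ImageOps.invert(region_mask)

augmented = pipe(
    prompt="a dermoscopy image from the target hospital's imaging system",
    image=image,
    mask_image=background_mask,  # white pixels are regenerated
).images[0]
augmented.save("augmented_train_image.png")
```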
November 2024 · 10 Reads
Vision-language model (VLM) embeddings have been shown to encode biases present in their training data, such as societal biases that ascribe negative characteristics to members of various racial and gender identities. VLMs are being quickly adopted for a variety of tasks ranging from few-shot classification to text-guided image generation, making debiasing VLM embeddings crucial. Debiasing approaches that fine-tune the VLM often suffer from catastrophic forgetting. On the other hand, fine-tuning-free methods typically utilize a "one-size-fits-all" approach that assumes that correlation with the spurious attribute can be explained using a single linear direction across all possible inputs. In this work, we propose Bend-VLM, a nonlinear, fine-tuning-free approach for VLM embedding debiasing that tailors the debiasing operation to each unique input. This allows for a more flexible debiasing approach. Additionally, we do not require knowledge of the set of inputs ahead of inference time, making our method more appropriate for online, open-set tasks such as retrieval and text-guided image generation.
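As a rough illustration of what "tailoring the debiasing operation to each unique input" can mean, in contrast to a single global direction, the sketch below estimates a bias direction per query from attribute prompt embeddings. This is a hedged simplification, not the actual Bend-VLM algorithm; the attribute prompt sets are assumptions.

```python
# Input-adaptive linear debiasing sketch: the bias direction is the
# difference between similarity-weighted means of two attribute poles,
# so it adapts to each query. Illustrative only, not Bend-VLM itself.
import numpy as np

def debias(query: np.ndarray, attrs_a: np.ndarray, attrs_b: np.ndarray) -> np.ndarray:
    """query: (d,) unit-norm VLM embedding.
    attrs_a / attrs_b: (k, d) unit-norm embeddings of prompts for the two
    poles of a protected attribute (hypothetical, e.g. variants of
    'a photo of a man' vs 'a photo of a woman')."""
    def local_mean(protos: np.ndarray) -> np.ndarray:
        # Weight each prototype by its similarity to this specific query.
        w = np.exp(protos @ query)
        return (w[:, None] * protos).sum(0) / w.sum()

    direction = local_mean(attrs_a) - local_mean(attrs_b)
    direction /= np.linalg.norm(direction)
    # Remove the query's component along the per-input bias direction.
    debiased = query - (query @ direction) * direction
    return debiased / np.linalg.norm(debiased)
```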
November 2024 · 8 Reads
Vision-language models, like CLIP (Contrastive Language Image Pretraining), are becoming increasingly popular for a wide range of multimodal retrieval tasks. However, prior work has shown that large language and deep vision models can learn historical biases contained in their training sets, leading to perpetuation of stereotypes and potential downstream harm. In this work, we conduct a systematic analysis of the social biases that are present in CLIP, with a focus on the interaction between image and text modalities. We first propose a taxonomy of social biases called So-B-IT, which contains 374 words categorized across ten types of bias. Each type can lead to societal harm if associated with a particular demographic group. Using this taxonomy, we examine images retrieved by CLIP from a facial image dataset using each word as part of a prompt. We find that CLIP frequently displays undesirable associations between harmful words and specific demographic groups, such as retrieving mostly pictures of Middle Eastern men when asked to retrieve images of a "terrorist". Finally, we analyze the source of such biases by showing that, for examples of the biases we find, the same harmful stereotypes are also present in a large image-text dataset used to train CLIP models. Our findings highlight the importance of evaluating and addressing bias in vision-language models, and suggest the need for transparency and fairness-aware curation of large pre-training datasets.
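The retrieval audit can be reproduced in outline with an open CLIP checkpoint: embed a taxonomy word as a prompt, rank face images by similarity, and tally the demographic groups of the top results. File paths and group labels below are placeholders; So-B-IT itself is the word taxonomy, not this code.

```python
# Sketch of the retrieval-based bias audit with an open CLIP model.
# Dataset paths and demographic annotations are hypothetical placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical audit set: image paths plus an annotated demographic group.
paths, groups = ["face_0001.jpg", "face_0002.jpg"], ["group_a", "group_b"]
images = [Image.open(p).convert("RGB") for p in paths]

inputs = processor(text=["a photo of a terrorist"], images=images,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    sims = model(**inputs).logits_per_text[0]      # (n_images,)

top = sims.topk(k=min(2, len(paths))).indices.tolist()
for i in top:
    print(paths[i], groups[i])  # tally these counts per demographic group
```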
October 2024 · 9 Reads · 6 Citations
March 2024 · 4 Reads
Proceedings of the AAAI Conference on Artificial Intelligence
Multi-task learning (MTL) is essential for real-world applications that handle multiple tasks simultaneously, such as self-driving cars. MTL methods improve the performance of all tasks by utilizing information across tasks to learn a robust shared representation. However, acquiring sufficient labeled data tends to be extremely expensive, especially when many tasks must be supported. Recently, Knowledge Amalgamation (KA) has emerged as an effective strategy for addressing the lack of labels by instead learning directly from pretrained models (teachers). KA learns one unified multi-task student that masters all tasks across all teachers. Existing KA works for MTL are limited to teachers with identical architectures, and thus propose layer-to-layer based approaches. Unfortunately, in practice, teachers may have heterogeneous architectures; their layers may not be aligned, and their dimensionalities or scales may be incompatible. Amalgamating multi-task teachers with heterogeneous architectures remains an open problem. To address it, we design the Versatile Common Feature Consolidator (VENUS), the first solution to this problem. VENUS fuses knowledge from the shared representations of each teacher into one unified, generalized representation for all tasks. Specifically, we design the Feature Consolidator network, which leverages an array of teacher-specific trainable adaptors. These adaptors enable the student to learn from multiple teachers even if they have incompatible learned representations. We demonstrate that VENUS outperforms five alternative methods on numerous benchmark datasets across a broad spectrum of experiments.
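One plausible reading of the teacher-specific adaptor idea, sketched in PyTorch: the student's shared feature is projected by a per-teacher linear adaptor into that teacher's feature space, so a feature-matching loss works even when teacher dimensionalities differ. This is an illustrative reconstruction, not the paper's exact VENUS architecture.

```python
# Sketch of amalgamating heterogeneous teachers via trainable adaptors.
# The student learns one shared feature; a per-teacher linear adaptor
# projects it into that teacher's (differently sized) feature space.
import torch
import torch.nn as nn

class AdaptedStudent(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, teacher_dims: list[int]):
        super().__init__()
        self.backbone = backbone  # maps input -> (batch, feat_dim)
        self.adaptors = nn.ModuleList(
            nn.Linear(feat_dim, d) for d in teacher_dims  # one per teacher
        )

    def forward(self, x):
        shared = self.backbone(x)
        return shared, [a(shared) for a in self.adaptors]

def amalgamation_loss(adapted_feats, teacher_feats):
    # Match each adapted student feature to the frozen teacher's feature,
    # regardless of the teachers' architectures or dimensionalities.
    return sum(nn.functional.mse_loss(s, t.detach())
               for s, t in zip(adapted_feats, teacher_feats))
```

Since each adaptor absorbs the dimensionality mismatch, the shared backbone feature itself is never tied to any single teacher's layout.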
March 2024 · 10 Reads · 1 Citation
Proceedings of the AAAI Conference on Artificial Intelligence
There are increasingly many large language models (LLMs) available to the public. While these LLMs have exhibited impressive abilities on a variety of tasks, any individual LLM may do well on some tasks and poorly on others. Additionally, the performance of these models is heavily dependent on the choice of prompt template used. For instance, they exhibit sensitivity to the few-shot examples chosen or brittleness to the wording of instructions. Moreover, a prompt template that makes a model perform well for one input may not be the optimal template for another input. This necessitates an approach for adaptively selecting LLM and prompt template pairs for each input. Recent work has shown that the accuracy of an LLM's responses is correlated with the LLM's confidence in those responses. Thus, a natural choice for selecting which model and prompt template to use is to select the pair that is most confident in its response. However, existing confidence metrics are expensive to calculate, necessitating multiple calls to each LLM and prompt pair. We thus propose an approach to predict the confidence of each pair using an auxiliary regression model that is inexpensive to run. Using this auxiliary model, we select the LLM and prompt template with the highest predicted confidence for a given input. Results on a range of benchmark datasets show that our confidence-based instance-level prompt search method consistently improves the performance of LLMs.
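The selection step can be sketched as follows: train one cheap regressor per (LLM, prompt template) pair on previously observed confidences, then route each new input to the pair with the highest predicted confidence. The featurization, regressor choice, and placeholder data below are assumptions, not the paper's exact setup.

```python
# Sketch of confidence-based (LLM, prompt template) routing: a cheap
# auxiliary regressor per pair predicts confidence on a new input, and
# we route to the argmax pair. Training data is assumed to exist offline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge

pairs = [("llm_a", "template_1"), ("llm_a", "template_2"), ("llm_b", "template_1")]
train_inputs = ["example question 1", "example question 2"]   # placeholder
observed_conf = {p: [0.8, 0.4] for p in pairs}                # placeholder

vec = TfidfVectorizer().fit(train_inputs)
X = vec.transform(train_inputs)
# One inexpensive regressor per (LLM, template) pair.
regressors = {p: Ridge().fit(X, observed_conf[p]) for p in pairs}

def select_pair(new_input: str):
    x = vec.transform([new_input])
    # Choose the pair with the highest predicted confidence; only this
    # winning pair's LLM is actually called on the input.
    return max(pairs, key=lambda p: regressors[p].predict(x)[0])

print(select_pair("a fresh test question"))
```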
January 2024 · 4 Citations
December 2023 · 12 Reads · 3 Citations
... [72] While some techniques are available and effective, using balanced data or incorporating representative variables is often the most effective approach for mitigating bias while simultaneously maintaining model performance, and as such has been the focus of our review. [73][74][75] Despite the availability of these mitigation strategies, the primary challenge lies in raising awareness and ensuring their widespread adoption among researchers. To address this issue, the implementation of standardised guidelines is crucial. ...
November 2024
Nature Reviews Methods Primers
... More recently, efforts have expanded to multimodal models and datasets, addressing biases in various language-vision tasks. These investigations have explored biases in embeddings [25], text-to-image (TTI) generation [5,11,18,23,52,62,64], image retrieval [61], image captioning [27,65], and visual question-answering models [1,28,44]. Despite these advances, research on intersectional biases in TTI models remains limited. ...
October 2024
... Other work argues that this is a sufficient criterion for LLMs having their own beliefs (Hofweber et al., 2024). Importantly, this optimization pressure seems to shape model outputs to be more human-like in the sense that they comprise a somewhat coherent worldview, though LLM outputs are still much less coherent than humans' (Powell et al., 2024). Thus, an RLHF-trained LLM could possess beliefs of its own, making it an appropriate candidate for belief revision, but it is not known how much current truth-oriented fine-tuning processes shape LLMs to have their own beliefs rather than simulate the beliefs of the authors of their pretraining data. ...
January 2024
... It pushes beyond the boundary of computational creativity, showcasing increasingly impressive examples each year [23]. While GANs have shown remarkable power and attracted great interest, they are also limited by various challenges: difficulty in achieving stable training [9,27], vanishing gradients, and mode collapse [1]. Efforts from both the academic community and the commercial sector have been made to address these problems [8,16,25,34,36]. ...
December 2023
... A study was conducted on capturing the interdependence of labels in multi-label classification, where an example can be assigned multiple labels simultaneously. This study also demonstrated that effectively managing the complexities associated with labels necessitates the use of advanced techniques, particularly in cases where certain labels are limited or require additional contextual information for accurate classification [35]. Nevertheless, employing techniques like synthetic data generation has been demonstrated to improve model performance when dealing with imbalanced and diverse labels. ...
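A common concrete instance of the synthetic data generation mentioned here is SMOTE-style oversampling, sketched below with imbalanced-learn on toy data; this illustrates the general technique rather than the cited study's method.

```python
# SMOTE interpolates new minority-class examples so the classifier sees
# a balanced label distribution. Data here is synthetic for illustration.
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)
print("before:", Counter(y))          # heavily skewed toward class 0
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("after: ", Counter(y_res))      # balanced classes
```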
June 2023
Proceedings of the AAAI Conference on Artificial Intelligence
... The distribution-embedded deep neural network (DDNN) [76] is a state-of-the-art network featuring learning approaches for activity recognition. • Triple-DARE [77] is a neural network method that combines three unique loss functions to enhance intra-class compactness and inter-class separation within the embedding space of multi-labeled datasets. In this experiment, we slightly modify the model from lab-to-field transfer to cross-individual transfer so that the settings match those of the proposed model. ...
March 2023
... Three studies leveraged advanced GAI in data development: two preprints described the use of ChatGPT to generate new data instances or multi-turn conversation datasets, which help provide more varied and realistic practice material for developing optimal applications [87,95]; another study used real conversation examples that were labeled to show certain features, such as signs of depression. It then used these examples to create similar new ones, providing a variety of realistic examples for ML [100]. These data augmentation approaches are important for mental health care applications since they produce diverse and close-to-realistic datasets, addressing data issues such as small volume, sparsity, or imbalance. ...
December 2022
... [131] use tabular GANs to generate synthetic samples for the minority class in imbalanced survival datasets. [132] proposed HAR-CTGAN to generate synthetic data to handle class imbalance in human activity recognition data. It focuses on synthesizing continuous features, such as real-valued data recorded from various sensors. [133] used GANs to augment and synthesize data for balancing the cardiovascular disease prediction dataset. ...
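A minimal sketch of the tabular-GAN recipe described in [131,132], using the open-source ctgan package: fit the generator on minority-class rows only, then sample synthetic rows to rebalance. The dataset and column names below are hypothetical.

```python
# Tabular-GAN oversampling sketch: train CTGAN on the minority class and
# sample enough synthetic rows to balance the dataset. Column names and
# the CSV file are hypothetical placeholders.
import pandas as pd
from ctgan import CTGAN

df = pd.read_csv("survival_data.csv")              # placeholder dataset
minority = df[df["event"] == 1]                    # under-represented class

model = CTGAN(epochs=50)
model.fit(minority, discrete_columns=["event", "sex"])

need = (df["event"] == 0).sum() - len(minority)    # rows needed to balance
balanced = pd.concat([df, model.sample(need)], ignore_index=True)
```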
December 2022
... The architectural design enables the generation of a range of visual reports that facilitate a deeper comprehension of learner behavior and performance across diverse digital platforms. In the context of digital phenotyping, a study employed a range of sophisticated visualization techniques to contextualize and interpret low-level sensor data collected from smartphones [14]. The visualizations permit analysts to explore and interpret intricate, context-rich datasets, thereby facilitating the discovery of behavioral patterns, or "phone-o-types", that can be predictive of health outcomes. ...
January 2023
Visual Informatics
... Early classification of time series [1,2,3,4] is a pivotal task, especially when the sampling cost is high, e.g., in medical early diagnosis [5], autonomous driving [6], and action recognition [7]. In these applications, the early classifier seeks to optimize both speed and accuracy at the same time. ...
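A minimal sketch of that speed/accuracy trade-off: fit one classifier per prefix length and stop as soon as predicted confidence clears a threshold. The data, prefix grid, and threshold below are illustrative assumptions, not any cited method.

```python
# Early time-series classification sketch: one classifier per prefix
# length; prediction stops once confidence on the observed prefix clears
# a threshold, trading earliness against accuracy. Toy data throughout.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
T = 50
X_train = rng.normal(size=(200, T))
y_train = (X_train[:, :25].mean(axis=1) > 0).astype(int)

prefix_lengths = [10, 20, 30, 40, 50]
clfs = {L: LogisticRegression().fit(X_train[:, :L], y_train)
        for L in prefix_lengths}

def classify_early(series: np.ndarray, threshold: float = 0.9):
    for L in prefix_lengths:
        proba = clfs[L].predict_proba(series[:L].reshape(1, -1))[0]
        if proba.max() >= threshold or L == prefix_lengths[-1]:
            # Stop early once confident, or fall back to the full series.
            return int(proba.argmax()), L

label, t_used = classify_early(rng.normal(size=T))
print(f"predicted class {label} after observing {t_used} of {T} steps")
```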
October 2022