February 2025
Motivation
Advancing data sharing in biomedical research, particularly for sensitive genomic and clinical datasets, is crucial for improving model performance across diverse patient populations. However, stringent privacy concerns hinder collaboration and limit the insights that can be drawn from multi-institutional datasets. Current approaches to privacy-preserving data sharing fail to address the gaps between the data distributions of different institutions.

Results
We introduce NoisyFlow, a differentially private, neural network-based optimal transport framework designed to enable secure and unbiased biomedical data sharing. By integrating optimal transport theory with neural networks and differential privacy mechanisms, our framework aligns data distributions across institutions while preserving individual privacy. NoisyFlow eliminates the need to share raw data directly and reduces distribution shifts caused by covariate and batch effects. Empirical evaluations on high-dimensional single-cell genomic data and histopathology images demonstrate the framework’s effectiveness, achieving superior privacy guarantees while maintaining high utility in downstream tasks such as disease classification.

Availability and implementation
The implementation of NoisyFlow is available at https://github.com/liyy2/NoisyFlow.

Contact
mark@gersteinlab.org

Supplementary information
Supplementary data are available online.
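The Results paragraph describes aligning institutional data distributions with optimal transport under differential privacy. The sketch below is not NoisyFlow's implementation, which per the abstract uses neural network-based optimal transport; it swaps in two generic, well-known building blocks, assuming only NumPy. The first is entropic optimal transport solved with Sinkhorn iterations, followed by a barycentric projection that maps a shifted "source" batch onto a "target" batch. The second is DP-SGD-style per-example gradient clipping with Gaussian noise, the mechanism commonly used to train neural networks with differential privacy. All function names, parameters (`eps`, `clip_norm`, `noise_multiplier`), and the simulated data are hypothetical, and no privacy accounting is performed.

```python
import numpy as np

# Hypothetical illustration only: generic OT and DP building blocks, not NoisyFlow's code.


def sinkhorn_plan(x, y, eps=0.05, n_iters=500):
    """Entropic OT plan between uniform empirical measures on x (n, d) and y (m, d)."""
    n, m = len(x), len(y)
    a = np.full(n, 1.0 / n)  # uniform weights on source samples
    b = np.full(m, 1.0 / m)  # uniform weights on target samples
    # Squared Euclidean cost, rescaled to [0, 1] so every kernel entry is >= exp(-1 / eps)
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / max(cost.max(), 1e-12)
    kernel = np.exp(-cost / eps)  # smaller eps: sharper plan, slower convergence
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        # Alternately rescale rows and columns toward the marginals a and b
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]


def barycentric_map(plan, y):
    """Map each source point to the plan-weighted average of the target points."""
    return (plan @ y) / plan.sum(axis=1, keepdims=True)


def dp_noisy_mean_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """DP-SGD-style aggregation: clip each example's gradient, sum, add Gaussian noise, average."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale  # each row now has L2 norm <= clip_norm
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=per_example_grads.shape[1])
    return (clipped.sum(axis=0) + noise) / len(per_example_grads)


# Usage: align a simulated "batch-shifted" institution onto a reference institution.
rng = np.random.default_rng(0)
target = rng.normal(size=(200, 2))
source = rng.normal(size=(200, 2)) + np.array([3.0, 0.0])  # simulated batch effect
plan = sinkhorn_plan(source, target)
aligned = barycentric_map(plan, target)
print("source mean:", source.mean(axis=0))
print("aligned mean:", aligned.mean(axis=0))
print("target mean:", target.mean(axis=0))

# One noisy, clipped gradient aggregate over a toy batch of per-example gradients.
grads = rng.normal(size=(32, 4))
print("noisy mean gradient:", dp_noisy_mean_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng))
```

A neural OT approach replaces the explicit n-by-m plan with a learned map that can be applied to new samples; training such a map with a clipped, noised gradient step like the one above is one plausible route to differential privacy. Calibrating `noise_multiplier` to a target (ε, δ) budget requires a separate privacy accountant, which this sketch omits.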