Available via license: CC BY-NC 4.0

Learning Single-Cell Perturbation Responses using Neural Optimal Transport

Charlotte Bunne,1,2,∗ Stefan G. Stark,1,2,3,4,∗ Gabriele Gut,5,∗ Jacobo Sarabia del Castillo,5 Kjong-Van Lehmann,1,2,3,4,† Lucas Pelkmans,5,† Andreas Krause,1,2,† Gunnar Rätsch1,2,3,4,6,†

1Department of Computer Science, ETH Zurich, Switzerland;

2AI Center, ETH Zurich, Switzerland;

3Medical Informatics Unit, University Hospital Zurich, Switzerland;

4Swiss Institute of Bioinformatics, Switzerland;

5Department of Molecular Life Sciences, University of Zurich, Switzerland;

6Department of Biology, ETH Zurich, Switzerland.

December 15, 2021

Abstract

Understanding and predicting molecular responses to external perturbations is a core problem in molecular biology. Recent technological advances have enabled the generation of high-resolution single-cell data, making it possible to profile individual cells under different experimentally controlled perturbations. However, cells are typically destroyed during measurement, resulting in unpaired distributions over either perturbed or non-perturbed cells. Leveraging the theory of optimal transport and the recent advent of convex neural architectures, we learn a coupling describing the response of cell populations upon perturbation, enabling us to predict state trajectories on a single-cell level. We apply our approach, CellOT, to predict treatment responses of 21,650 cells subject to four different drug perturbations. CellOT outperforms current state-of-the-art methods both qualitatively and quantitatively, accurately capturing cellular behavior shifts across all drugs.

1 Introduction

Characterizing and modeling perturbation responses at the single-cell level from non-time-resolved data remains one of the grand challenges of biology. It finds applications in predicting cellular reactions to environmental stress or a patient's response to drug treatments. Accurate inference of perturbation responses at the single-cell level allows us, for instance, to understand how and why individual tumor cells evade cancer therapies (Frangieh et al., 2021). More generally, it deepens the mechanistic understanding of the molecular machinery determining the respective responses to perturbations.

∗ These authors contributed equally.

† To whom correspondence should be addressed: kjlehmann@ukaachen.de, lucas.pelkmans@mls.uzh.ch, krausea@inf.ethz.ch, raetsch@inf.ethz.ch.

bioRxiv preprint doi: https://doi.org/10.1101/2021.12.15.472775; this version posted December 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.

Cell responses to perturbations such as drugs are highly heterogeneous in nature (Liberali et al., 2014), determined by many factors, including the preexisting variability in the abundance and localization of molecular entities, such as RNA or proteins (Shaffer et al., 2017), cellular states (Kramer and Pelkmans, 2019), or the cellular microenvironment (Snijder et al., 2009). To effectively predict the drug response of a patient during treatment, it is thus crucial to incorporate the molecular subpopulation structure of the cell populations into the analysis.

A key difficulty in learning perturbation responses is that a cell (usually) must be destroyed to measure its state, meaning that it is only possible to measure a cell state either before or after a perturbation is applied. The typical experimental setup divides a set of cells into subsets to which individual perturbations are applied. One subset of cells remains unperturbed, allowing us to measure the base state of the population. So while we do not have access to a set of paired control/perturbed single-cell observations, we do have access to samples from the distributions of control and perturbed cell states.

Previous methods to approximate single-cell perturbation responses fall short of solving this highly complex pairing problem while, at the same time, accounting for cellular heterogeneity and the strong subpopulation structure of cell samples. Despite incorporating cell heterogeneity, mechanistic models do not recover cellular response trajectories, instead predicting factors such as cell viability or response variables in the data in order to predict drug efficacy (Snijder et al., 2012; Berchtold et al., 2018; Green and Pelkmans, 2016). Linear models (Dixit et al., 2016), on the other hand, are unable to capture complex and inhomogeneous population responses upon perturbation. Current state-of-the-art methods (Lopez et al., 2018; Lotfollahi et al., 2019; Yang et al., 2020) predict perturbation responses via linear shifts in a learned low-dimensional latent space. While these methods capture nonlinear cell-type-specific responses, their use of linear interpolations forces them to resolve the alignment problem through the challenging task of learning representations that are invariant to perturbation status. A similar matching problem was considered in Stark et al. (2020) for matching cell populations that are profiled with different profiling technologies.

This work proposes CellOT, a novel approach to predict single-cell perturbation responses by uncovering couplings between control and perturbed cell states while accounting for heterogeneous subpopulation structures of molecular environments. We achieve this by utilizing the theory of optimal transport, which provides a natural geometry and mathematical tools to manipulate probability distributions. To this end, we learn a robust optimal transport map describing how the distribution of control cells connects to the distribution of perturbed cells. Utilizing recent developments in neural optimal transport (Makkuva et al., 2020), we learn a general optimal transport coupling for each perturbation, allowing us to predict behavioral changes of incoming single-cell samples, e.g., of another patient, using parameterizations learned for the previous cohort. We demonstrate CellOT's effectiveness by deploying it to learn cellular responses to different cancer drugs and tumor combination therapies. An overview of our approach is illustrated in Figure 1.

[Figure 1 schematic: control cells, cells after perturbation i, and cells after perturbation j shown in the cell data space. Legend: cells in different states; new cell state after perturbation; apoptotic cell; optimal transport plan of perturbation k.]

Figure 1: Learning single-cell perturbation responses. We aim to recover a mapping from the control cell distribution $\rho_c$ to some perturbed cell distribution $\rho_i$ or $\rho_j$ by learning the corresponding neural optimal transport map $\gamma(\theta_k)$, parameterized by $\theta_k$, from the observed distribution of untreated cells and the set of cells observed after the perturbation is applied.

Optimal transport has previously been applied in the domain of single-cell biology to uncover trajectories of single-cell reprogramming and to link rich, non-spatially-resolved measurements with sparse, spatially resolved measurements (Schiebinger et al., 2019; Cang and Nie, 2020; Demetci et al., 2020; Huizing et al., 2021; Lavenant et al., 2021; Zhang et al., 2021). Here we apply optimal transport to a new data modality, consisting of cell morphological measurements and multiplexed protein state measurements obtained by 4i (Gut et al., 2018) from large populations of cancer cells exposed in vitro to different drugs used in the clinic.

2 Background

2.1 Optimal Transport

Optimal transport plays a dual role: it induces a mathematically well-characterized distance between distributions, and it provides a geometry-based approach to constructing couplings between two probability distributions. Let $\mu = \sum_{i=1}^{n} a_i \delta_{x_i}$ and $\nu = \sum_{j=1}^{m} b_j \delta_{y_j}$ be two discrete probability measures in $\mathbb{R}^d$. The optimal transport (OT) problem (Kantorovich, 1942) reads

$$W_2^2(\mu, \nu) = \inf_{\gamma \in \Gamma(\mu, \nu)} \int \|x - y\|^2 \, d\gamma(x, y), \tag{1}$$

where $\Gamma(\mu, \nu)$, given by the polytope $\Gamma(a, b) := \{\gamma \in \mathbb{R}_{+}^{n \times m} : \gamma \mathbf{1}_m = a, \ \gamma^\top \mathbf{1}_n = b\}$, describes the set of all couplings $\gamma$ between $\mu$ and $\nu$. The optimal transport plan $\gamma$ thus corresponds to the coupling between two probability distributions minimizing the overall transportation cost. Computing optimal transport distances in (1) involves solving a linear program, and thus their computational cost is prohibitive for large-scale machine learning problems. Regularizing objective (1) with an entropy term results in significantly more efficient optimization (Cuturi, 2013),

$$W_{2,\varepsilon}^2(\mu, \nu) = \inf_{\gamma \in \Gamma(\mu, \nu)} \int \|x - y\|^2 \, d\gamma(x, y) - \varepsilon H(\gamma), \tag{2}$$

with entropy $H(\gamma) = -\sum_{ij} \gamma_{ij} (\log \gamma_{ij} - 1)$ and parameter $\varepsilon$ controlling the strength of the regularization. $W_{2,\varepsilon}^2$ is further differentiable w.r.t. its inputs and can thus serve as a loss function in machine learning applications.
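To make the entropy-regularized problem (2) concrete, the following is a minimal sketch of Sinkhorn iterations, the alternating scaling scheme proposed by Cuturi (2013), for two small discrete measures. The function name `sinkhorn`, the toy point clouds, and the values of `eps` and `n_iters` are illustrative assumptions, not settings taken from this work.

```python
import numpy as np

def sinkhorn(a, b, cost, eps=1.0, n_iters=1000):
    """Entropy-regularized OT between histograms a (n,) and b (m,).

    Returns a coupling gamma of shape (n, m) whose row and column sums
    approximate a and b, respectively.
    """
    K = np.exp(-cost / eps)           # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)               # rescale rows towards marginal a
        v = b / (K.T @ u)             # rescale columns towards marginal b
    return u[:, None] * K * v[None, :]

# Toy example (assumed data): two uniform point clouds in R^2.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))             # support of mu
y = rng.normal(loc=1.0, size=(7, 2))    # support of nu
a = np.full(5, 1 / 5)
b = np.full(7, 1 / 7)
cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared Euclidean cost
gamma = sinkhorn(a, b, cost)
transport_cost = (gamma * cost).sum()   # transport term of (2) at the returned coupling
```

Each iteration only needs matrix-vector products with the kernel, which is what makes the regularized problem cheaper than solving the linear program in (1) directly.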

Problem (1) denotes the primal formulation for the Wasserstein-2 distance. The corresponding dual introduced by Kantorovich in 1942 is a constrained concave maximization problem defined as

$$W_2^2(\mu, \nu) = \sup_{(f, g) \in \Phi_c} \mathbb{E}_\mu[f(x)] + \mathbb{E}_\nu[g(y)], \tag{3}$$

where the set of admissible potentials is $\Phi_c := \{(f, g) \in L^1(\mu) \times L^1(\nu) : f(x) + g(y) \le \tfrac{1}{2}\|x - y\|_2^2, \ \forall (x, y) \ d\mu \otimes d\nu \text{ a.e.}\}$ (Villani, 2003, Theorem 1.3). Villani (2003, Theorem 2.9) further simplifies the dual problem (3) over the pair of functions $(f, g)$ to

$$W_2^2(\mu, \nu) = \underbrace{\tfrac{1}{2} \mathbb{E}\left[\|x\|_2^2 + \|y\|_2^2\right]}_{C_{\mu,\nu}} - \inf_{f \in \tilde{\Phi}} \left\{ \mathbb{E}_\mu[f(X)] + \mathbb{E}_\nu[f^*(Y)] \right\}, \tag{4}$$

where $\tilde{\Phi}$ is the set of all convex functions in $L^1(d\mu) \times L^1(d\nu)$, $L^1(\mu) := \{f \text{ is measurable and } \int f \, d\mu < \infty\}$, and $f^*(y) = \sup_x \langle x, y \rangle - f(x)$ is the convex conjugate of $f$. Villani (2003, Theorem 2.9) then proves the existence of an optimal pair $(f, f^*)$ of lower semi-continuous proper conjugate convex functions on $\mathbb{R}^d$ attaining the infimum in (4).
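The step from the dual (3) to the convex formulation (4) can be sketched as follows. This is a standard argument written out for orientation, not a reproduction of the proof in Villani (2003); the auxiliary symbols $\varphi$ and $\psi$ are introduced here for illustration only.

```latex
% Expand the quadratic cost:
\tfrac{1}{2}\|x - y\|_2^2
  = \tfrac{1}{2}\|x\|_2^2 + \tfrac{1}{2}\|y\|_2^2 - \langle x, y \rangle .

% Reparameterize the potentials of (3):
\varphi(x) := \tfrac{1}{2}\|x\|_2^2 - f(x), \qquad
\psi(y)    := \tfrac{1}{2}\|y\|_2^2 - g(y).

% The constraint defining \Phi_c becomes
f(x) + g(y) \le \tfrac{1}{2}\|x - y\|_2^2
  \iff \varphi(x) + \psi(y) \ge \langle x, y \rangle ,

% and the dual objective becomes
\mathbb{E}_\mu[f] + \mathbb{E}_\nu[g]
  = C_{\mu,\nu} - \big( \mathbb{E}_\mu[\varphi] + \mathbb{E}_\nu[\psi] \big).

% For a fixed \varphi, the smallest admissible \psi is its convex conjugate:
\psi(y) = \sup_x \, \langle x, y \rangle - \varphi(x) = \varphi^*(y).
```

Maximizing over $(f, g)$ therefore amounts to minimizing $\mathbb{E}_\mu[\varphi] + \mathbb{E}_\nu[\varphi^*]$, and the minimization can be restricted to convex $\varphi$, which, after renaming $\varphi$ to $f$, is the form stated in (4).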

2.2 Convex Neural Networks

In order to parameterize classes of convex functions such as $\tilde{\Phi}$ in (4), we need neural networks that are convex w.r.t. their inputs. One example is the input convex neural network (ICNN) introduced by Amos et al. (2017). ICNNs are based on fully-connected feed-forward networks that ensure convexity by placing constraints on their parameters. An ICNN with parameters $\theta = \{b_i, W_i^z, W_i^x\}$ represents a convex function $f(x; \theta)$ and, for layers $i = 0, \dots, L - 1$, is defined as

$$h_{i+1} = \sigma_i(W_i^x x + W_i^z h_i + b_i) \quad \text{and} \quad f(x; \theta) = h_L, \tag{5}$$

where the activation functions $\sigma_i$ are convex and non-decreasing, and the elements of all $W_i^z$ are constrained to be nonnegative. Despite their constraints, ICNNs are able to parameterize a rich class of convex functions. In particular, Chen et al. (2019) show that any convex function over a convex domain can be approximated in sup norm by an ICNN. Huang et al. (2021) further extend ICNNs from fully-connected feed-forward neural networks to convolutional neural architectures.
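The recursion in (5) can be sketched in a few lines of NumPy. This is an illustrative forward pass only, with randomly initialized (untrained) weights; the helper names `init_icnn` and `icnn_forward`, the layer widths, and the choice of softplus as the convex, non-decreasing activation are assumptions made for this example.

```python
import numpy as np

def softplus(t):
    # Convex, non-decreasing activation log(1 + exp(t)), computed stably.
    return np.logaddexp(0.0, t)

def init_icnn(rng, dim, hidden=(16, 16)):
    """Random parameters theta = {b_i, W^z_i, W^x_i}; the last layer is scalar."""
    params, prev = [], 0
    for width in list(hidden) + [1]:
        Wx = rng.normal(scale=0.5, size=(width, dim))
        # Nonnegative W^z keeps each layer convex in x.
        Wz = np.abs(rng.normal(scale=0.5, size=(width, prev)))
        b = rng.normal(scale=0.1, size=width)
        params.append((Wx, Wz, b))
        prev = width
    return params

def icnn_forward(params, x):
    """Evaluate f(x; theta) for a batch x of shape (n, dim); returns shape (n,)."""
    h = np.zeros((x.shape[0], 0))     # empty h_0: layer 0 depends on x only
    for Wx, Wz, b in params:
        h = softplus(x @ Wx.T + h @ Wz.T + b)
    return h[:, 0]

# Numerical midpoint-convexity check on random inputs.
rng = np.random.default_rng(0)
params = init_icnn(rng, dim=3)
x = rng.normal(size=(100, 3))
y = rng.normal(size=(100, 3))
mid = icnn_forward(params, 0.5 * (x + y))
avg = 0.5 * (icnn_forward(params, x) + icnn_forward(params, y))
```

Convexity follows from composition: each layer applies a convex, non-decreasing function to an affine function of `x` plus a nonnegative combination of convex functions, so `mid` never exceeds `avg`.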

2.3 Neural Optimal Transport

Beyond existing numerical approximations of the optimal transport distance and the corresponding optimal coupling based on the regularized problem (2) (Cuturi, 2013; Aude et al., 2016), recent efforts have investigated neural network-based approaches as fast and scalable approximations to (1). Taghvaei and Jalali (2019) consider solving (4) by parameterizing $f$ with an ICNN and solving for $f^*$ at each step, which has a high computational cost. Makkuva et al. (2020) extend this work by approximating $f^*$ with another ICNN $g$, which scales well but transforms the problem into a min-max optimization task. Huang et al. (2021) introduce a novel, OT-inspired parameterization of normalizing flows utilizing ICNNs. Korotin et al. (2021) provide a detailed comparison of the current state of neural optimal transport solvers. Furthermore, convex neural architectures have been utilized to parameterize Wasserstein gradient flows (Bunne et al., 2021; Alvarez-Melis et al., 2021; Mokrov et al., 2021).

3 Model

Recent high-throughput methods provide great insights into how cell populations respond to various perturbations at the level of individual cells. The resulting data, however, are non-time-resolved and unaligned. Hence, snapshots taken of biological samples before and after perturbations do not provide information on single-cell trajectories. Perturbations might include the application of drugs affecting molecular functions in cells, or changes in the cellular environment causing shifts in biological signaling, thus impacting cells and their states in various ways. In the following, we describe our approach, which uncovers single-cell perturbation responses by predicting couplings between control and perturbed cell states. Let $\mathcal{X}$ denote the biological data space spanned by cell morphology and gene expression features. We then treat a cell's response to perturbation $k$ as an evolution in a high-dimensional space of cell states $\mathbb{R}^d = \mathcal{X}$.

3.1 Recovering Perturbation Effects via Optimal Transport

Given a dataset of $n$ observations $\{x_1^c, \dots, x_n^c\}$, $x_i^c \in \mathcal{X}$, drawn from $\rho_c \in \mathcal{P}(\mathcal{X})$, the distribution of cells before applying a perturbation, we aim to learn the distribution of cells $\rho_k \in \mathcal{P}(\mathcal{X})$ upon some perturbation $k$, given a set of separate samples $\{x_1^k, \dots, x_m^k\}$, $x_i^k \in \mathcal{X}$.

Perturbation responses of cells are dynamic: after applying perturbation $k$, cell states evolve over time and thus can be modeled as a stochastic process on the cell data space. Despite this time-resolved nature of single-cell responses, we only have access to the distributions of cell states before, $\rho_c$, and after, $\rho_k$, applying perturbation $k$. We thus aim at understanding the underlying stochastic process without access to time-resolved perturbation responses by uncovering the coupling $\gamma$ between $\rho_c$ and $\rho_k$. Given prior biological knowledge, we can assume that cells do not drastically alter their phenotype w.r.t. morphology and gene expression pattern. We thus posit that the evolution of probability distributions of single cells upon perturbation can be modeled via the mathematical theory of optimal transport. The coupling $\gamma$ then corresponds to an optimal transport plan (1) between $\rho_c$ and $\rho_k$.

Following Makkuva et al. (2020), we infer the optimal coupling $\gamma$ (1) between $\rho_c$ and $\rho_k$. Thus, instead of computing a coupling individually for each pair of cell samples using existing solvers (Cuturi, 2013), we learn a parameterized optimal transport map using neural networks. The parameterized OT coupling then serves as a robust predictor for cellular distribution shifts upon perturbations on unseen samples $\{x_i^c\}_{i=1}^{n'} \sim \rho_c$, e.g., of another patient.

3.2 Parametrization of the Optimal Transport Coupling

Directly learning the optimal transport map in the primal (1) and dual (3) is notoriously difficult. Instead, Makkuva et al. build upon celebrated results by Knott and Smith (1984) and Brenier (1991), which relate the optimal solutions for the dual form (3) and the primal form (1), to derive a min-max formulation replacing the convex conjugate in (4) (Makkuva et al., 2020, Theorem 3.3):

$$W_2^2(\rho_c, \rho_k) = \sup_{\substack{f \in \tilde{\Phi} \\ f^* \in L^1(\rho_k)}} \inf_{g \in \tilde{\Phi}} \ C_{\rho_c, \rho_k} - \underbrace{\left( \mathbb{E}_{\rho_c}[f(x)] + \mathbb{E}_{\rho_k}\left[\langle y, \nabla g(y) \rangle - f(\nabla g(y))\right] \right)}_{\mathcal{V}_{\rho_c, \rho_k}(f, g)}. \tag{6}$$

We can further relax the constraint $g \in \tilde{\Phi}$ to $g \in L^1(\rho_k)$, as a function $g \in L^1(\rho_k)$ minimizing (6) is convex and equal to $f^*$ for any convex function $f$. In order to learn the resulting optimal transport, i.e., the solution of the inner minimization problem in (6), Makkuva et al. (2020) parametrize both dual variables $f$ and $g$ using input convex neural networks (§ 2.2) (Amos et al., 2017). The resulting approximate Wasserstein distance is thus defined as

$$\hat{W}_2^2(\rho_c, \rho_k) = \sup_{\phi} \inf_{\theta} \ C_{\rho_c, \rho_k} - \mathcal{V}_{\rho_c, \rho_k}(f_\phi, g_\theta), \tag{7}$$

where $\theta$ and $\phi$ are the parameters of each ICNN. The resulting $g_\theta^*$ produces an approximate optimal transport plan $\gamma \approx (\nabla g_\theta^* \times \mathrm{Id})_\# \rho_c$.

3.3 Predicting Perturbation Effects via CellOT

The framework described above allows us to recover couplings between control cells $\{x_1^c, \dots, x_n^c\}$ and perturbed cells $\{x_1^k, \dots, x_m^k\}$, giving insights into cellular response trajectories upon application of a perturbation $k$. Given a set of perturbations $K$, and sample access to the control distribution $\rho_c$ as well as distributions $\rho_k$ for each perturbation $k \in K$, CellOT learns the optimal pair of dual potentials $(f_{\phi_k^*}, g_{\theta_k^*})$ for each perturbation $k$. Given parametrizations of the convex potentials for each $k$, CellOT then predicts the transformation of a control cell $x_i^c$ upon perturbation $k$ via $\hat{x}_i^k = \nabla g_{\theta_k^*}(x_i^c)$, i.e., samples following the predicted perturbed distribution $\hat{\rho}_k = (\nabla g_{\theta_k^*})_\# \rho_c$. CellOT thus provides a general approach to predict state trajectories on a single-cell level, as well as to understand how heterogeneous subpopulation structures evolve under the impact of external factors.
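The prediction step, pushing control cells through the gradient of a convex potential, can be illustrated without a trained network. The sketch below swaps the learned ICNN for the closed-form optimal map between two Gaussians, which is the gradient of a convex quadratic potential. The function names, means, and covariances are made-up illustrative values, not quantities from the data used here.

```python
import numpy as np

def sqrtm_psd(M):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def gaussian_ot_map(m_c, S_c, m_k, S_k):
    """Map T(x) = m_k + A (x - m_c) that transports N(m_c, S_c) to N(m_k, S_k).

    T is the gradient of the convex potential
    g(x) = 0.5 (x - m_c)^T A (x - m_c) + m_k^T x, with A symmetric positive definite.
    """
    R = sqrtm_psd(S_c)
    R_inv = np.linalg.inv(R)
    A = R_inv @ sqrtm_psd(R @ S_k @ R) @ R_inv
    return (lambda x: m_k + (x - m_c) @ A.T), A

# Assumed toy parameters for the control and perturbed populations.
m_c, S_c = np.zeros(2), np.array([[1.0, 0.3], [0.3, 0.5]])
m_k, S_k = np.array([2.0, -1.0]), np.array([[0.8, -0.2], [-0.2, 1.5]])
T, A = gaussian_ot_map(m_c, S_c, m_k, S_k)

rng = np.random.default_rng(0)
x_control = rng.multivariate_normal(m_c, S_c, size=1000)  # stand-in for held-out control cells
x_pred = T(x_control)                                     # predicted perturbed states
```

Each row of `x_pred` is the predicted state of the corresponding row of `x_control`, so the output is a per-cell prediction rather than only a population-level summary. The same is true when the map is the gradient of a learned potential.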

4 Evaluation

We evaluate CellOT on the task of predicting single-cell drug responses for drugs with different molecular effects, using melanoma cell lines profiled by the 4i technology (Gut et al., 2018).

4.1 Datasets

4i is an imaging technology that detects protein abundance by attaching a fluorescent tag designed to bind to a target protein and then measuring the fluorescence intensity of this tag. An iterative staining and washing procedure allows for the capture of multiple tags. Additionally, an image processing pipeline detects the cell nucleus and extracts morphological features, such as cell perimeter and area. We considered four common cancer therapies for this work since they target different biological processes. Erlotinib is an inhibitor of the epidermal growth factor receptor (EGFR) tyrosine kinase, Imatinib inhibits the Bcr-Abl tyrosine kinase, and Trametinib is an inhibitor of mitogen-activated extracellular signal-regulated kinase 1 (MEK1) and MEK2.

We utilized a mixture of two melanoma tumor cell lines (ratio 1:1) and imaged a total of 21,650 cells, of which 11,526 are in the (untreated) control state, 2,364 are treated with Erlotinib, 2,650 with Imatinib, 2,683 with Trametinib, and 2,417 with a combination of Trametinib and Erlotinib. For each cell, 48 features are extracted: 22 are morphological, and the remaining 26 are mean intensities of 13 protein markers measured both inside the cell nucleus and in the cell as a whole. Finally, we perform an 80/20 train/test split for each condition and evaluate each model on its ability to make predictions on the unseen set of control cells. More details regarding dataset preparation can be found in Appendix B.1.

4.2 Baselines

We compare CellOT to two baselines, both of which attempt to add perturbation effects through the manipulation of a learned latent representation. scGen (Lotfollahi et al., 2019) computes linear shifts using latent space arithmetic to remove the source condition and add the target condition. The conditional autoencoder, cAE, has an architecture based on a batch-correction technique popular in the single-cell community, first introduced by Lopez et al. (2018). Here, one-hot encodings of batch labels (treatment conditions) are concatenated to the encoder and decoder inputs, which attempt to remove and then add condition-specific effects. More details can be found in Appendix A.

4.3 Evaluation Metrics

Since we lack access to the ground truth set of control and treatment observations

on the single-cell level, we first analyze the effectiveness of CellOT using

evaluations that operate on the level of the distribution of real and predicted

perturbation states. Drug signatures are computed as the difference in means

between the distribution of perturbed states and control states. We then report

the $\ell_2$-distance between the drug signatures (DS) computed on the true and

predicted distributions ($\ell_2(\mathrm{DS})$). We additionally consider two distributional

distances: kernel maximum mean discrepancy (MMD) (Gretton et al., 2012)

and entropy-regularized Wasserstein distance $W_{2,\varepsilon}^2$ (2) (Cuturi, 2013). MMD is

computed using the RBF kernel and averaging over the length scales 0.5, 0.1, 0.01,

and 0.005; $W_{2,\varepsilon}^2$ is computed with $\varepsilon = 0.5$.
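
As a rough illustration of the first two metrics, the sketch below computes the drug-signature distance and an RBF-kernel MMD with NumPy. Several choices are assumptions rather than details from the paper: the listed length scales are used as the coefficient `gamma` in `exp(-gamma * ||x - y||^2)`, the MMD is the biased (V-statistic) estimate of the squared discrepancy, and the data are synthetic.

```python
import numpy as np

def drug_signature(treated, control):
    """Difference in feature means between treated and control cells."""
    return treated.mean(axis=0) - control.mean(axis=0)

def l2_ds(true_treated, pred_treated, control):
    """l2 distance between the true and the predicted drug signature."""
    diff = drug_signature(true_treated, control) - drug_signature(pred_treated, control)
    return float(np.linalg.norm(diff))

def rbf_kernel(a, b, gamma):
    """Pairwise RBF kernel matrix exp(-gamma * ||a_i - b_j||^2)."""
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd(x, y, gammas=(0.5, 0.1, 0.01, 0.005)):
    """Biased squared-MMD estimate, averaged over several kernel scales."""
    vals = [
        rbf_kernel(x, x, g).mean() + rbf_kernel(y, y, g).mean() - 2.0 * rbf_kernel(x, y, g).mean()
        for g in gammas
    ]
    return float(np.mean(vals))

# Synthetic example (hypothetical data): treatment shifts every feature by 1.
rng = np.random.default_rng(0)
control = rng.normal(size=(200, 5))
treated = rng.normal(loc=1.0, size=(200, 5))
pred_good = rng.normal(loc=1.0, size=(200, 5))  # matches the treated distribution
pred_bad = control.copy()                       # predicts "no effect"

mmd_good, mmd_bad = mmd(treated, pred_good), mmd(treated, pred_bad)
ds_bad = l2_ds(treated, pred_bad, control)
```

A prediction that leaves the control cells unchanged yields a larger value on both metrics than one drawn from the treated distribution.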

4.4 Results

For each drug perturbation, all models predict the perturbed cell states from

the set of held-out control cells. Differences between the distribution of

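perturbed cells and predicted cells are shown in Table 1. CellOT attains the

lowest MMD for every drug and the best value on all three metrics for Imatinib,

Trametinib, and the combination therapy; for Erlotinib, cAE reaches a lower

$\ell_2(\mathrm{DS})$ and both baselines a lower Wasserstein distance. Qualitative assessment of the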

marginal distributions of control, treated, and predicted cell states provides

further evidence for superior performance of CellOT over other approaches

(see Figure 2 for three selected features of the Imatinib condition).

Table 1: Performance assessment of CellOT compared to different baselines w.r.t.

Wasserstein ($W_{2,\varepsilon}^2$, (2)) and MMD distances between the observed perturbed cells and predicted

responses from control cells, as well as the predictive quality of drug signatures (see § 4.3).

Drug                       Metric                   scGen    cAE      CellOT
Erlotinib                  $\ell_2(\mathrm{DS})$    0.41     0.05     0.22
                           MMD                      0.0241   0.0074   0.0013
                           $W_{2,\varepsilon}^2$    3.542    3.330    3.619
Imatinib                   $\ell_2(\mathrm{DS})$    0.52     0.16     0.12
                           MMD                      0.0361   0.0200   0.0010
                           $W_{2,\varepsilon}^2$    5.811    4.512    3.851
Trametinib                 $\ell_2(\mathrm{DS})$    0.56     0.37     0.14
                           MMD                      0.0180   0.0087   0.0011
                           $W_{2,\varepsilon}^2$    3.587    3.343    2.846
Trametinib and Erlotinib   $\ell_2(\mathrm{DS})$    0.60     0.44     0.18
                           MMD                      0.0163   0.0122   0.0014
                           $W_{2,\varepsilon}^2$    3.594    3.215    2.796

Figure 2: Marginal distributions of observed and predicted cell states for three selected features,

i.e., a measure of the eccentricity of the nucleus, as well as Sox9 intensity level inside the

nucleus and pERK intensity within the cell but outside the nucleus. The marginals of control

and Imatinib distributions correspond to the observed set of untreated cells and cells treated

with the Imatinib drug. The remaining distributions are calculated using the predictions of

each model on the unseen set of control cells.

Next, we compared UMAP projections (McInnes et al., 2018) of the perturbed

and predicted cells (see Figure 3). Predicted cells are colored by the fraction

of other predicted cells among their $k = 100$ nearest neighbors. If the distribution

of predicted cells matches the true distribution of perturbed cells, then we

would expect the nearest neighbors of each cell to be well mixed (i.e., 0.5) across

conditions. Thus, cells with values closer to 1 indicate regions where the predicted

distribution does not integrate with the true perturbed distribution. We conclude

that predictions made by CellOT integrate well with measurements of real

treated cells.
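
The neighborhood-mixing score described above can be sketched with a brute-force nearest-neighbor search. The function name, the synthetic data, and the smaller `k` are illustrative assumptions. The well-mixed value of 0.5 also presumes equally sized predicted and observed sets.

```python
import numpy as np

def predicted_neighbor_fraction(pred, observed, k=100):
    """For each predicted cell, the fraction of its k nearest neighbors
    (in the joint set, excluding itself) that are also predicted cells."""
    joint = np.vstack([pred, observed])
    is_pred = np.arange(len(joint)) < len(pred)
    # Squared Euclidean distances from every predicted cell to every joint cell.
    sq = ((pred ** 2).sum(axis=1)[:, None]
          + (joint ** 2).sum(axis=1)[None, :]
          - 2.0 * pred @ joint.T)
    rows = np.arange(len(pred))
    sq[rows, rows] = np.inf                      # a cell is not its own neighbor
    nearest = np.argpartition(sq, k - 1, axis=1)[:, :k]
    return is_pred[nearest].mean(axis=1)

# Synthetic example (hypothetical data).
rng = np.random.default_rng(0)
observed = rng.normal(size=(300, 5))             # stands in for treated cells
pred_good = rng.normal(size=(300, 5))            # same distribution as observed
pred_bad = rng.normal(size=(300, 5)) + 10.0      # far from the observed cells

frac_good = predicted_neighbor_fraction(pred_good, observed, k=20)
frac_bad = predicted_neighbor_fraction(pred_bad, observed, k=20)
```

Here `frac_good` averages close to 0.5, whereas every entry of `frac_bad` equals 1 because the shifted predictions only neighbor each other.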

Finally, the previous results argue that the distribution of cells predicted

by CellOT closely matches the true distribution; however, they could have also

been replicated by a map that assigns control cells to random treated cells.

Thus, we evaluate the quality of single-cell level pairs induced from the CellOT

mapping by computing the Spearman correlation of features between the control

state and predicted drug state. The distribution of the correlation coefficient

Figure 3: UMAP projections computed on the joint set of cells perturbed by Imatinib (grey)

and predictions of each model. Model predictions are colored by the fraction of other predicted

cells among their $k = 100$ nearest neighbors in data space. Predicted cells that share few

neighbors with the true set of perturbed cells, and are instead surrounded by other predicted

cells, take values of ≈ 1.0 (blue).

Predicted cells that integrate well with true perturbed cells take values of ≈ 0.5 (white).

between the control state and the predicted state across all features of all learned

maps is shown in Figure 4. The low distributional distances between

CellOT-predicted cells and the true distribution of perturbed cells, in conjunction with

a high correlation of features between the predicted states and their paired control

states, demonstrate that CellOT makes sound predictions on the single-cell

level, outperforming current state-of-the-art methods both qualitatively and

quantitatively.
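
The pairing check described above can be illustrated with a per-feature Spearman correlation between control cells and their predicted states. This sketch computes ranks with NumPy and does not average ties, which is an assumption that suits continuous features. The two toy "maps" are hypothetical.

```python
import numpy as np

def ranks(v):
    """Ranks 0..n-1 of a 1-D array (ties broken by order of appearance)."""
    order = np.argsort(v, kind="stable")
    r = np.empty(len(v))
    r[order] = np.arange(len(v))
    return r

def spearman_per_feature(control, predicted):
    """Spearman correlation across paired cells, computed for each feature."""
    return np.array([
        np.corrcoef(ranks(control[:, j]), ranks(predicted[:, j]))[0, 1]
        for j in range(control.shape[1])
    ])

# Synthetic example (hypothetical data).
rng = np.random.default_rng(0)
control = rng.normal(size=(500, 4))
pred_monotone = 2.0 * control + 1.0                   # order-preserving map
pred_shuffled = pred_monotone[rng.permutation(500)]   # same marginals, random pairing

rho_monotone = spearman_per_feature(control, pred_monotone)
rho_shuffled = spearman_per_feature(control, pred_shuffled)
```

Both maps produce identical predicted distributions, yet only the order-preserving one keeps a high correlation with the control state, which is the distinction this evaluation is meant to expose.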

Figure 4: Distribution of Spearman correlation coefficients between the features of control cells

and the features of their corresponding predicted states upon treatment for each considered drug:

Erlotinib, Imatinib, Trametinib, and the combination therapy of Trametinib and Erlotinib. Low

correlations imply unexpected significant differences in the feature states between prediction

and control, and thus reduced predictive accuracy.

5 Conclusion

In this paper, we present a new framework to learn single-cell perturbation

responses. We approach the problem by learning an optimal transport map that

is parameterized by an ICNN to push forward the distribution of control cells onto

the distribution of perturbed cells. We validate CellOT’s effectiveness through

experiments on melanoma cell lines with four different drug perturbations. In

the absence of ground truth, we provide various evaluation metrics to compare

our method to existing approaches. While operating in the original data space,

instead of relying on meaningful low-dimensional representations, CellOT

performs consistently well across all perturbations, outperforming current state-

of-the-art methods. The use of neural optimal transport to learn single-cell drug

responses makes for an exciting avenue of future work, including its use to improve

our mechanistic understanding of cell therapies, to study drug responses from

patient samples, and to better account for cell-to-cell variability in large-scale

drug discovery efforts.

Acknowledgments

We are grateful to Hugo Yèche and Ximena Bonilla for their fruitful comments,

corrections, and discussions. C.B. and A.K. received funding from the Swiss

National Science Foundation under the National Center of Competence in

Research (NCCR) Catalysis under grant agreement 51NF40 180544. L.P. is

supported by the European Research Council (ERC-2019-AdG-885579), the

Swiss National Science Foundation (SNSF grant 310030_192622), the Chan

Zuckerberg Initiative, and the University of Zurich. G.G. received funding from

the Swiss National Science Foundation and InnoSuisse as part of the BRIDGE

program as well as from the University of Zurich through the BioEntrepreneur

Fellowship. K.L. and S.G.S. were partially funded by ETH Zürich core funding

(to G.R.) and by the Tumor Profiler Initiative (to G.R.).

Declaration of Interests

G.G. and L.P. have filed a patent on the 4i technology (patent WO2019207004A1).

References

D. Alvarez-Melis, Y. Schiff, and Y. Mroueh. Optimizing Functionals on the

Space of Probabilities with Input Convex Neural Networks. arXiv Preprint,

2021.

B. Amos, L. Xu, and J. Z. Kolter. Input Convex Neural Networks. In International

Conference on Machine Learning (ICML), volume 34, 2017.

G. Aude, M. Cuturi, G. Peyré, and F. Bach. Stochastic Optimization for Large-

Scale Optimal Transport. In Advances in Neural Information Processing

Systems (NeurIPS), 2016.

D. Berchtold, N. Battich, and L. Pelkmans. A systems-level study reveals

regulators of membrane-less organelles in human cells. Molecular cell, 72(6):

1035–1049, 2018.

Y. Brenier. Polar Factorization and Monotone Rearrangement of Vector-Valued

Functions. Communications on pure and applied mathematics, 44(4):375–417,

1991.

C. Bunne, L. Meng-Papaxanthos, A. Krause, and M. Cuturi. JKOnet: Proximal

Optimal Transport Modeling of Population Dynamics. arXiv Preprint, 2021.

Z. Cang and Q. Nie. Inferring spatial and signaling relationships between cells

from single cell transcriptomic data. Nature Communications, 11(1), 2020.

A. E. Carpenter, T. R. Jones, M. R. Lamprecht, C. Clarke, I. H. Kang, O. Friman,

D. A. Guertin, J. H. Chang, R. A. Lindquist, J. Moffat, et al. CellProfiler: image

analysis software for identifying and quantifying cell phenotypes. Genome

biology, 7(10):1–11, 2006.

Y. Chen, Y. Shi, and B. Zhang. Optimal Control Via Neural Networks: A

Convex Approach. In International Conference on Learning Representations

(ICLR), 2019.

M. Cuturi. Sinkhorn Distances: Lightspeed Computation of Optimal Transport.

In Advances in Neural Information Processing Systems (NeurIPS), volume 26,

2013.

P. Demetci, R. Santorella, B. Sandstede, W. S. Noble, and R. Singh.

Gromov–Wasserstein Optimal Transport to Align Single-Cell Multi-Omics Data.

BioRxiv, 2020.

A. Dixit, O. Parnas, B. Li, J. Chen, C. P. Fulco, L. Jerby-Arnon, N. D. Marjanovic,

D. Dionne, T. Burks, R. Raychowdhury, et al. Perturb-Seq: Dissecting

Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic

Screens. Cell, 167(7):1853–1866, 2016.

C. J. Frangieh, J. C. Melms, P. I. Thakore, K. R. Geiger-Schuller, P. Ho, A. M.

Luoma, B. Cleary, L. Jerby-Arnon, S. Malu, M. S. Cuoco, et al. Multimodal

pooled Perturb-CITE-seq screens in patient models define mechanisms of cancer

immune evasion. Nature genetics, 53(3):332–341, 2021.

V. A. Green and L. Pelkmans. A systems survey of progressive host-cell

reorganization during rotavirus infection. Cell host & microbe, 20(1):107–120,

2016.

A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola. A

kernel two-sample test. The Journal of Machine Learning Research, 13(1),

2012.

M. Guizar-Sicairos, S. T. Thurman, and J. R. Fienup. Efficient subpixel image

registration algorithms. Optics letters, 33(2):156–158, 2008.

G. Gut, M. D. Herrmann, and L. Pelkmans. Multiplexed protein maps link

subcellular organization to cellular states. Science, 361(6401), 2018.

C.-W. Huang, R. T. Q. Chen, C. Tsirigotis, and A. Courville. Convex Potential

Flows: Universal Probability Distributions with Optimal Transport and

Convex Optimization. In International Conference on Learning Representations

(ICLR), 2021.

G.-J. Huizing, G. Peyré, and L. Cantini. Optimal transport improves cell-cell

similarity inference in single-cell omics data. bioRxiv, 2021.

L. Kantorovich. On the transfer of masses (in Russian). In Doklady Akademii

Nauk, volume 37, 1942.

D. P. Kingma and J. Ba. Adam: A Method for Stochastic Optimization. In

International Conference on Learning Representations (ICLR), 2014.

M. Knott and C. S. Smith. On the optimal mapping of distributions. Journal of

Optimization Theory and Applications, 43(1), 1984.

A. Korotin, L. Li, A. Genevay, J. Solomon, A. Filippov, and E. Burnaev. Do

Neural Optimal Transport Solvers Work? A Continuous Wasserstein-2 Benchmark.

arXiv Preprint, 2021.

B. A. Kramer and L. Pelkmans. Cellular state determines the multimodal

signaling response of single cells. bioRxiv, 2019.

H. Lavenant, S. Zhang, Y.-H. Kim, and G. Schiebinger. Towards a mathematical

theory of trajectory inference. arXiv preprint arXiv:2102.09204, 2021.

P. Liberali, B. Snijder, and L. Pelkmans. A hierarchical map of regulatory genetic

interactions in membrane trafficking. Cell, 157(6):1473–1487, 2014.

R. Lopez, J. Regier, M. B. Cole, M. I. Jordan, and N. Yosef. Deep generative

modeling for single-cell transcriptomics. Nature methods, 15(12):1053–1058,

2018.

M. Lotfollahi, F. A. Wolf, and F. J. Theis. scGen predicts single-cell perturbation

responses. Nature Methods, 16(8), 2019.

A. Makkuva, A. Taghvaei, S. Oh, and J. Lee. Optimal transport mapping

via input convex neural networks. In International Conference on Machine

Learning (ICML), volume 37, 2020.

L. McInnes, J. Healy, and J. Melville. UMAP: Uniform Manifold Approximation

and Projection for Dimension Reduction. arXiv Preprint, 2018.

P. Mokrov, A. Korotin, L. Li, A. Genevay, J. Solomon, and E. Burnaev. Large-

Scale Wasserstein Gradient Flows. arXiv Preprint, 2021.

G. Schiebinger, J. Shu, M. Tabaka, B. Cleary, V. Subramanian, A. Solomon,

J. Gould, S. Liu, S. Lin, P. Berube, et al. Optimal-Transport Analysis of Single-

Cell Gene Expression Identifies Developmental Trajectories in Reprogramming.

Cell, 176(4), 2019.

S. M. Shaffer, M. C. Dunagin, S. R. Torborg, E. A. Torre, B. Emert, C. Krepler,

M. Beqiri, K. Sproesser, P. A. Brafford, M. Xiao, et al. Rare cell variability

and drug-induced reprogramming as a mode of cancer drug resistance. Nature,

546(7658):431–435, 2017.

B. Snijder, R. Sacher, P. Rämö, E.-M. Damm, P. Liberali, and L. Pelkmans.

Population context determines cell-to-cell variability in endocytosis and virus

infection. Nature, 461(7263):520–523, 2009.

B. Snijder, R. Sacher, P. Rämö, P. Liberali, K. Mench, N. Wolfrum, L. Burleigh,

C. C. Scott, M. H. Verheije, J. Mercer, et al. Single-cell analysis of population

context advances RNAi screening at multiple levels. Molecular Systems Biology,

8(1):579, 2012.

S. G. Stark, J. Ficek, F. Locatello, X. Bonilla, S. Chevrier, F. Singer, G. Rätsch,

and K.-V. Lehmann. SCIM: universal single-cell matching with unpaired

feature sets. Bioinformatics, 36, 2020.

T. Stoeger, N. Battich, M. D. Herrmann, Y. Yakimovich, and L. Pelkmans.

Computer vision for image-based transcriptomics. Methods, 85:44–53, 2015.

A. Taghvaei and A. Jalali. 2-Wasserstein Approximation via Restricted Convex

Potentials with Application to Improved Training for GANs. arXiv Preprint,

2019.

S. Van der Walt, J. L. Schönberger, J. Nunez-Iglesias, F. Boulogne, J. D. Warner,

N. Yager, E. Gouillart, and T. Yu. scikit-image: image processing in python.

PeerJ, 2:e453, 2014.

C. Villani. Topics in Optimal Transportation, volume 58. American Mathematical

Soc., 2003.

K. D. Yang, K. Damodaran, S. Venkatachalapathy, A. C. Soylemezoglu, G. Shiv-

ashankar, and C. Uhler. Predicting cell lineages using autoencoders and

optimal transport. PLoS Computational Biology, 16(4), 2020.

S. Zhang, A. Afanassiev, L. Greenstreet, T. Matsumoto, and G. Schiebinger.

Optimal transport analysis reveals trajectories in steady-state systems. bioRxiv,

2021.

Appendix

A Related Work

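Consider a single-cell dataset of a binary perturbation. Let $\{x_1, \dots, x_n\}$, $x_i \in \mathcal{X}$,

be drawn from $\rho_c \cup \rho_k$ and let $c(i) \in \{0, 1\}$ indicate the perturbation status of a

single cell,

$$c(i) = \begin{cases} 0, & \text{if } x_i \sim \rho_c, \\ 1, & \text{if } x_i \sim \rho_k. \end{cases}$$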

A.1 scGen

Given representations $\{z_1, \dots, z_n\}$ of $\{x_1, \dots, x_n\}$, learned by an autoencoder with

encoder $\phi$ and decoder $\psi$, scGen (Lotfollahi et al., 2019) predicts a perturbation

response using latent space arithmetic. Let $\bar{z}^{(l)}$ be the mean of representations

in condition $l$,

$$\bar{z}^{(l)} = \frac{1}{|\{i : c(i) = l\}|} \sum_{i} z_i \, \delta_{c(i) l}.$$

The perturbed state of $x_0 \sim \rho_c$ is predicted as

$$\psi\left(\phi(x_0) - \bar{z}^{(0)} + \bar{z}^{(1)}\right).$$
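
The latent arithmetic above can be sketched in a few lines. The encoder and decoder below are hypothetical stand-ins (an orthogonal linear map and its inverse) chosen so the example runs without a trained autoencoder; they are not scGen's networks.

```python
import numpy as np

def scgen_predict(x0, z, c, encode, decode):
    """Shift encoded control cells by the difference of latent condition means."""
    z_bar0 = z[c == 0].mean(axis=0)   # mean latent code of control cells
    z_bar1 = z[c == 1].mean(axis=0)   # mean latent code of perturbed cells
    return decode(encode(x0) - z_bar0 + z_bar1)

# Hypothetical stand-in autoencoder: an orthogonal map Q and its inverse Q^T.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
encode = lambda x: x @ Q
decode = lambda z: z @ Q.T

# Synthetic data: 100 control cells and 100 perturbed cells.
x_ctrl = rng.normal(size=(100, 4))
x_pert = rng.normal(size=(100, 4)) + 2.0
x = np.vstack([x_ctrl, x_pert])
c = np.r_[np.zeros(100, dtype=int), np.ones(100, dtype=int)]

z = encode(x)
x_hat = scgen_predict(x_ctrl, z, c, encode, decode)
shift = x_hat - x_ctrl
```

With this linear stand-in, every control cell receives the same shift, so the predicted mean matches the perturbed mean while the shape of the control distribution is left unchanged.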

A.2 cAE

The conditional autoencoder is based on a batch correction technique popular

within the single-cell community, first introduced by Lopez et al. (2018). It

introduces condition-specific parameters into the encoder and decoder, which

attempt to remove and replace information in the data specific to their conditions.

They operate by concatenating one-hot encodings of condition labels (here,

perturbation status) to the inputs of the encoder and decoder. These encodings,

in effect, make the bias term in the first layer of the encoder and decoder a

learnable parameter specific to each condition and are thus also considered

to learn a linear shift in latent space. Given an encoder $\phi$ and decoder $\psi$, the

network is trained to reconstruct each cell conditioned on its true label,

$$z_i = \phi(x_i \mid c(i)), \quad \hat{x}_i = \psi(z_i \mid c(i)).$$

Once trained, the perturbed state of $x_0 \sim \rho_c$ is predicted as

$$z_0 = \phi(x_0 \mid 0), \quad \hat{x}_0 = \psi(z_0 \mid 1).$$
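
The remark that one-hot conditioning amounts to a condition-specific bias can be checked numerically. The layer sizes and random weights below are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, n_cond = 5, 3, 2                   # feature dim, hidden dim, number of conditions
W = rng.normal(size=(h, d + n_cond))     # first-layer weights acting on [x, one_hot]
b = rng.normal(size=h)                   # shared bias
x = rng.normal(size=d)

def first_layer(x, cond):
    """Pre-activation of the first layer with the one-hot label concatenated."""
    one_hot = np.eye(n_cond)[cond]
    return W @ np.concatenate([x, one_hot]) + b

# Equivalent view: split W into a data part and a per-condition bias part.
W_x, W_c = W[:, :d], W[:, d:]

def first_layer_with_condition_bias(x, cond):
    """Same pre-activation, written with a bias that depends on the condition."""
    return W_x @ x + (b + W_c[:, cond])

# Switching the label changes the pre-activation by a constant, input-independent offset.
offset = first_layer(x, 1) - first_layer(x, 0)
```

The offset equals `W_c[:, 1] - W_c[:, 0]` for every input, which is why the conditional autoencoder is described as learning a linear shift.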

B Dataset

B.1 Single-Cell Multiplex Data

Biologists have various powerful technologies at their disposal, capable of

capturing multivariate single-cell measurements. High-content imaging, particularly

when augmented by multiplexing, such as in Iterative Indirect Immunofluorescence

Imaging (4i) (Gut et al., 2018), is ideally suited to study heterogeneous

cell responses. With 4i, fluorescently labeled antibodies are iteratively hybridized,

imaged, and removed from a sample to measure the abundance and localization

of proteins and their modifications. Thus, 4i quickly generates large, spatially

resolved phenotypic datasets rich in molecular information from thousands of

treated and untreated (control) cells. In addition to the multiplexed information

4i generates, information about cellular and nuclear morphology is routinely

extracted from microscopy images (without the need for 4i) by image analysis

algorithms (Carpenter et al., 2006).

Through multiplexing, 4i datasets are able to capture meaningful features

related to both the treatment response heterogeneity (e.g., the phosphorylation

or dephosphorylation of a kinase in a signaling pathway) and the pre-existing

cell-to-cell variability (e.g., protein levels related to different cellular states or cell

cycle phases), which may determine treatment response. Traditional high-content

imaging datasets often need to compromise between features describing either

the former or the latter and may thus struggle to provide sufficient information

to pair treated and control cells accurately.

The cells were seeded in a 384-well plate and allowed to settle and adhere

overnight. Drugs and dimethyl sulfoxide (the vehicle control) were added to

the cells the next morning and incubated for 8 hours, after which the cells were

fixed with paraformaldehyde. Subsequently, 6 cycles of 4i were performed, for

which the images were acquired with an automated high-content microscope.

All image analysis steps were performed by our in-house platform

TissueMAPS (https://github.com/TissueMAPS). The steps included illumination

correction (Snijder et al., 2012), alignment of images from different acquisition

cycles using the fast Fourier transform (Guizar-Sicairos et al., 2008), segmentation

of nuclei and cell outlines (Stoeger et al., 2015), as well as cellular and nuclear

measurements of intensity and morphology features using the scikit-image library

(Van der Walt et al., 2014).

C Experimental Details

To train all networks, we use the Adam optimizer (Kingma and Ba, 2014).

C.1 Baselines

To tune baseline models, we use a batch size of 128 and do a grid search over the

width [16, 32] and depth [2, 3] of the encoder and decoder hidden layers, latent

dimension [4, 8], dropout rate [0.0, 0.05, 0.1, 0.2] and learning rate [0.00001,

0.0001, 0.001].

For both scGen and cAE we selected a width=32, depth=2, latent dim=8,

dropout=0.05. scGen uses a learning rate of 0.001, and cAE uses a learning

rate of 0.0001. Both models are trained for 1024 epochs.
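
For reference, the search space above can be enumerated as follows. The dictionary keys are illustrative names, not identifiers from the authors' code.

```python
from itertools import product

# Hyperparameter grid as listed in the text (key names are hypothetical).
grid = {
    "width": [16, 32],
    "depth": [2, 3],
    "latent_dim": [4, 8],
    "dropout": [0.0, 0.05, 0.1, 0.2],
    "lr": [0.00001, 0.0001, 0.001],
}

# One dictionary per configuration in the Cartesian product of all options.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]

# The configuration reported as selected for scGen.
scgen_config = {"width": 32, "depth": 2, "latent_dim": 8, "dropout": 0.05, "lr": 0.001}
```

The grid contains 2 × 2 × 2 × 4 × 3 = 96 configurations per model.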

Figure 5: Full set of Imatinib marginals.

C.2 Network Architectures

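As suggested by Makkuva et al. (2020), we relax the convexity constraint on

$g_\theta$ and instead penalize its negative weights $W_l^h$,

$$R(\theta) = \lambda \sum_{W_l^h \in \theta} \left\| \max\left(-W_l^h, 0\right) \right\|_F^2. \tag{8}$$

The convexity constraint on $f_\phi$ is enforced after each update by setting negative

weights of all $W_l^h \in \phi$ to zero. The full objective then reads

$$\max_{\phi :\, W_l^h \geq 0,\ \forall l} \ \min_{\theta} \ f_\phi(\nabla g_\theta(y)) - \langle y, \nabla g_\theta(y) \rangle - f_\phi(x) + \lambda R(\theta).$$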
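
The two ways of handling the non-negativity constraint can be sketched as below: a soft penalty in the spirit of Eq. (8) for the weights of $g_\theta$, and a hard projection for the weights of $f_\phi$. The helper names and toy matrices are assumptions for illustration, and the ICNNs themselves are omitted.

```python
import numpy as np

def negative_weight_penalty(weights, lam=1.0):
    """lam * sum over matrices of the squared Frobenius norm of max(-W, 0)."""
    return lam * sum(float(np.sum(np.maximum(-W, 0.0) ** 2)) for W in weights)

def clip_to_nonnegative(weights):
    """Projection applied after an update: set negative entries to zero."""
    return [np.maximum(W, 0.0) for W in weights]

# Toy hidden-to-hidden weight matrices (hypothetical values).
W_toy = [np.array([[1.0, -2.0], [0.5, -0.5]]), np.array([[3.0, -1.0]])]

penalty = negative_weight_penalty(W_toy)     # (-2)^2 + (-0.5)^2 + (-1)^2 = 5.25
W_clipped = clip_to_nonnegative(W_toy)
```

After clipping, the penalty is zero, so the projection satisfies the constraint exactly while the penalty only discourages violations.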

C.3 Hyperparameters

To learn the optimal transport maps, we use a batch size of 256, an ICNN

architecture of 4 hidden layers of width 64, a learning rate of 0.0001 ($\beta_1 = 0.5$,

$\beta_2 = 0.9$) and $\lambda = 1$. The inner loop minimizing $g$ runs for 10 updates to every

update of $f$.

Figure 6: Full set of Erlotinib marginals.
