A reconstruction of the mammalian secretory pathway identifies mechanisms regulating antibody production
Helen Masson1,*, Jasmine Tat1,*, Pablo Di Giusto2,*, Athanasios Antonakoudis3, Isaac Shamie4, Hratch Baghdassarian4, Mojtaba Samoudi2, Caressa M. Robinson1,2, Chih-Chung Kuo1, Natalia Koga1,2, Sonia Singh1,2, Angel Gezalyan1,2, Zerong Li2, Alexia Movsessian1,2, Anne Richelle5, Nathan E. Lewis1,2,6,¶

1 Department of Bioengineering, University of California, San Diego
2 Department of Pediatrics, University of California, San Diego
3 Sartorius Corporate Research, Royston, UK
4 Bioinformatics and Systems Biology Program, University of California, San Diego
5 Sartorius Corporate Research, Brussels, Belgium
6 Center for Molecular Medicine, Complex Carbohydrate Research Center, and Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA

* These authors contributed equally
¶ Corresponding author: Nathan E. Lewis
Email: nlewisres@ucsd.edu
bioRxiv preprint doi: https://doi.org/10.1101/2024.11.14.623668; this version posted November 15, 2024. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Graphical Abstract
Abstract
The secretory pathway processes >30% of mammalian proteins, orchestrating their synthesis, modification, trafficking, and quality control. However, its complexity, spanning multiple organelles and dependent on coordinated protein interactions, limits our ability to decipher how protein secretion is controlled in biomedical and biotechnological applications. To advance such research, we present secRecon, a comprehensive reconstruction of the mammalian secretory pathway, comprising 1,127 manually curated genes organized within an ontology of 77 secretory process terms, annotated with functional roles, subcellular localization, protein interactions, and complex composition. Using secRecon to integrate multi-omics data, we identified distinct secretory topologies in antibody-producing plasma cells compared to CHO cells. Genes within proteostasis, translocation, and N-glycosylation are deficient in CHO cells, highlighting them as potential engineering targets to boost secretion capacity. Applying secRecon to single-cell transcriptomics and SEC-seq data, we uncovered secretory pathway signatures underlying secretion diversity among IgG-secreting plasma cells. Different transcriptomic clusters had unique secretory phenotypes characterized by variations in the unfolded protein response (UPR), endoplasmic reticulum-associated degradation (ERAD), and vesicle trafficking pathways. Additionally, we discovered specific secretory machinery genes as new markers for plasma cell differentiation. These findings demonstrate that secRecon can identify mechanisms regulating protein secretion and guide diverse studies in biomedical research and biotechnology.
Keywords
secRecon; Secretory pathway; Plasma cells; CHO cells; multi-omics; SEC-seq
Introduction
Protein secretion is a fundamental biological process in all living organisms, carried out by the secretory pathway, a highly coordinated network of processes responsible for synthesizing, processing, and delivering the products of approximately one-third of all protein-coding genes in mammals1,2, including both secreted and membrane-bound proteins. These proteins are critical for maintaining cellular homeostasis, cell signaling, and intercellular communication3,4. The secretory pathway in mammalian cells is hosted in a series of membrane-bound organelles and transport vesicles that facilitate the trafficking of proteins and lipids between these organelles5. The process starts with the synthesis of proteins in the endoplasmic reticulum (ER), where they undergo folding, assembly, and post-translational modifications. Proteins are then transported to the Golgi apparatus, where they are further modified and sorted6. Finally, the proteins are delivered to their designated locations, e.g., the plasma membrane, lysosomes, or the extracellular space, through the trans-Golgi network6.
Understanding the secretory network is critical to the study of diverse diseases and efforts to produce life-altering biotherapeutics. Specifically, in the biotechnology industry, the secretory pathway is harnessed for manufacturing diverse recombinant proteins, from therapeutic monoclonal antibodies to industrial enzymes7. Thus, optimizing protein secretion in host cells, including mammalian cells, is crucial for enhancing protein production yield and quality8,9. Furthermore, understanding protein secretion has important implications in immunology, as the immune response is mediated by the release of cytokines, chemokines, and other signaling molecules10. Disruptions in protein secretion pathways can lead to various pathological conditions, such as neurodegenerative diseases, diabetes, and cancer11–14. Therefore, research into the mechanisms of protein secretion has important biotechnological applications and provides essential insights into human health and disease6.
To fully comprehend the intricacies of protein secretion, it helps to account for the function of each molecular component within the pathway15. This includes considering complex composition, localization, and interaction partners, which can then be organized as a map that captures the spatial and functional organization of the secretory landscape1. This further allows researchers to analyze data in the context of the cell, thus more effectively diagnosing sources of variation in protein secretion using systems biology tools16.
To enable systems-level analyses of the secretory pathway, previous studies aimed to enumerate the proteins involved. For example, previous work presented secretory pathway reconstructions in yeast, mouse, Chinese Hamster Ovary (CHO) cells, and human cells17–19. Our lab further reconstructed a mechanistic model of the mammalian secretory pathway, consisting of 261 proteins in CHO cells and 271 proteins in human and mouse, distributed across 12 subsystems1. We further formulated it for constraint-based modeling, which allowed us to simulate biosynthetic fluxes and quantify metabolic resource demands for protein secretion. While these reconstructions provide a solid foundation of the core components of the secretory pathway, they accounted for less than one quarter of the pathway; thus, we aimed to expand these models to include more components and more detailed annotations of the secretory pathway.
Here we introduce secRecon, a comprehensive reconstruction of the mammalian secretory pathway that considerably enhances previous models by expanding both coverage and depth. secRecon integrates extensive information from literature and public databases, encompassing 1,127 manually curated genes organized within a functional ontology of 77 secretory pathway processes. A key finding from our analysis using secRecon is that the topology of the secretory pathway is predominantly organized by functional associations rather than subcellular localization, highlighting the central role of biological processes in structuring the pathway. This knowledgebase facilitates exploration of disease mechanisms involving altered secretion and supports the targeted engineering of cell factories for biotherapeutic production. To demonstrate its utility, we applied secRecon to multi-omics datasets of CHO and human plasma cells, uncovering conserved and distinct secretory pathway features that influence protein secretion. Additionally, analysis of single-cell SEC-seq data using secRecon revealed key secretory processes driving plasma cell differentiation and IgG secretion heterogeneity. Consequently, secRecon emerges as a valuable resource for dissecting complex cellular processes at both bulk and single-cell levels, addressing a wide range of biological and biotechnological questions.
Results
secRecon contains a comprehensive annotation of 1,127 secretory machinery genes in the mammalian secretory pathway
Previous reconstructions of the mammalian secretory pathway identified the genes involved in human1,18, mouse1,17, and Chinese hamster1,17. In this work, we compiled and curated these gene sets to generate a consensus reconstruction with expanded scope and depth through extensive literature review and database curation. We meticulously assessed each gene for its role in protein secretion (Supplementary Figure 1). During this process, we excluded certain gene subsets (Supplementary Table S1), such as transcription factors and genes involved in lipid and cholesterol metabolism or cell cycle processes, because they had previously been included solely based on protein-protein interactions identified through STRING17, without direct evidence of involvement in the secretory pathway. Other genes were removed upon confirming limited evidence in the literature supporting their role in secretion. Recognizing the crucial role of glycosylation in protein structure and function, we included an additional 725 genes from the GlycoGene DataBase20. We further enriched our dataset by incorporating information from databases such as CORUM21, UniProt22, STRINGdb23, and the OrganellesDB from the Chan Zuckerberg Biohub24. These resources provided detailed annotations of gene aliases, orthologs, protein complex interactions, subcellular localization, tissue specificity, and transmembrane domains (Supplementary Figure 1C). Our resulting knowledgebase, named secRecon, comprises 1,127 genes, all validated through scientific literature and comprehensively annotated using well-established databases (Supplementary Figure 1, Supplementary Table S1).
During the curation of this knowledgebase, we classified each gene into one or more categories within a newly developed ontology specific to the secretory pathway and assigned a relevance score to each assignment (Figure 1A, Supplementary Table S2). This ontology comprises 77 distinct terms organized hierarchically, where subprocesses are nested within processes, which are further nested within subsystems, all encompassed by five primary systems: translocation, protein conformation, post-translational modifications, proteostasis, and vesicle trafficking. Among these primary systems, the most gene-rich categories were Vesicle Trafficking (385 genes), Post-translational Modifications (273 genes), and Proteostasis (228 genes). In contrast, Protein Conformation and Translocation encompassed fewer genes, with 50 and 17 genes, respectively. These numbers represent the count of genes assigned exclusively to a single system. Additionally, approximately 15% of the genes in secRecon (174 of 1,127) are associated with multiple systems (Figure 1B), underscoring the inherent interconnectivity of the secretory pathway.
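The distinction between single-system and multi-system genes can be illustrated with a short sketch. The gene names and assignments below are hypothetical toy data, not taken from secRecon; the function simply counts genes per exact combination of systems, which is the quantity each column of an UpSet plot reports. The last lines check that the counts reported above are mutually consistent.

```python
from collections import Counter

# Hypothetical toy annotation: gene -> set of primary systems.
# These assignments are illustrative only and are NOT secRecon data.
gene_systems = {
    "GENE_A": {"Vesicle trafficking"},
    "GENE_B": {"Vesicle trafficking"},
    "GENE_C": {"Proteostasis"},
    "GENE_D": {"Proteostasis", "Protein conformation"},
    "GENE_E": {"Post-translational modifications"},
    "GENE_F": {"Translocation", "Proteostasis", "Vesicle trafficking"},
}


def upset_counts(annotation):
    """Count genes per exact combination of systems (one UpSet column each)."""
    return Counter(frozenset(systems) for systems in annotation.values())


counts = upset_counts(gene_systems)

# Genes assigned exclusively to one system, keyed by that system.
exclusive = {next(iter(combo)): n for combo, n in counts.items() if len(combo) == 1}

# Genes assigned to two or more systems.
multi_system = sum(n for combo, n in counts.items() if len(combo) > 1)

# Consistency check on the counts reported in the text:
# exclusive counts per system plus multi-system genes should give all genes.
reported_exclusive = [385, 273, 228, 50, 17]
reported_multi = 174
total = sum(reported_exclusive) + reported_multi
```

Here `total` evaluates to 1127, matching the size of the reconstruction.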
Figure 1. Ontological Classification and Gene Involvement in the Secretory Pathway. (A) Sunburst plot showing the hierarchical structure of our secretory pathway ontology. This ontology consists of 77 distinct terms categorized under five primary systems: translocation, protein conformation, post-translational modifications, proteostasis, and vesicle trafficking. Each system is further subdivided into subsystems, processes, and subprocesses, reflecting the nested organization. (B) The UpSet plot demonstrates the overlap and interconnectivity of genes involved in various secretory pathway systems. Each column represents a specific combination of systems, and the height of the bar indicates the number of genes shared among them. The first five columns represent the number of genes assigned to a single system, while the remaining columns represent genes shared by more than one system. The majority of the genes belong to vesicle trafficking, proteostasis, and post-translational modifications. The plot also reveals substantial gene overlap, underscoring the multifunctional roles of many genes within the secretory pathway.
The variations in gene counts among the primary systems reflect the complexity and breadth of the subsystems, processes, and subprocesses they encompass (see Supplementary Table S2). For instance, Vesicle Trafficking, the most gene-rich system, includes numerous subsystems and processes such as pre-Golgi and post-Golgi trafficking, vesicle budding, membrane fusion, and cytoskeletal remodeling. Similarly, Post-Translational Modifications covers a wide array of modifications including glycosylation (with multiple types and pathways), lipidation, phosphorylation, and disulfide bond formation. Proteostasis involves diverse processes including autophagy, UPR signaling pathways, ER-associated degradation (ERAD), and calcium homeostasis. In contrast, Protein Conformation and Translocation involve more specialized and narrowly focused processes.
Topological analysis of secRecon shows the secretory pathway is first organized by function, followed by localization
One might assume that the primary determinant of the secretory pathway topology is subcellular localization, given that proteins localize to specific membrane-bound compartments (e.g., ER, Golgi apparatus, lysosomes, and vesicles). However, we hypothesized that functional associations might play a more significant role in organizing the topology of the secretory pathway. To explore this, we utilized the extensive annotation in secRecon to integrate functional annotations, subcellular localization, protein-protein interactions (PPIs), and protein complex information (Figure 2, Supplementary Table S1, Supplementary Table S3).
Using the Fruchterman-Reingold force-directed algorithm25, which positions nodes (genes) based on their connections, we constructed networks where edges represent shared processes and protein complexes (Figure 2A). As expected, the algorithm naturally clustered the genes into well-defined groups according to specific secRecon systems, such as vesicle trafficking, proteostasis, and others (Figure 2A, left panel). Within these major clusters defined by secRecon systems, genes further exhibited organization based on their subcellular localization, as visible in the plot. This pattern is particularly evident for genes associated with the Golgi apparatus, proteasome, and ER-Golgi intermediate compartment (ERGIC), among others (Figure 2A, right panel). The co-localization of functionally related genes within the same subcellular compartment allows for spatial organization and optimization of metabolic and signaling pathways26,27.
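As a rough illustration of how a network with edges for shared processes can be assembled, the sketch below links two genes whenever their annotation sets intersect and then lists the connected components. The gene names and term assignments are hypothetical, and weighting edges by the number of shared terms is an assumption made here for illustration. The resulting edge list could then be passed to a force-directed layout routine (for example, `spring_layout` in NetworkX, which is documented as a Fruchterman-Reingold implementation) to obtain node positions.

```python
from itertools import combinations

# Hypothetical toy annotation: gene -> ontology terms (not secRecon data).
gene_processes = {
    "GENE_A": {"Vesicle budding", "Membrane fusion"},
    "GENE_B": {"Vesicle budding"},
    "GENE_C": {"ERAD"},
    "GENE_D": {"ERAD", "UPR"},
    "GENE_E": {"UPR"},
    "GENE_F": {"N-glycosylation"},
}


def shared_process_edges(annotation):
    """Return {(gene1, gene2): n_shared_terms} for gene pairs sharing a term."""
    edges = {}
    for g1, g2 in combinations(sorted(annotation), 2):
        shared = annotation[g1] & annotation[g2]
        if shared:
            edges[(g1, g2)] = len(shared)
    return edges


def connected_components(nodes, edges):
    """Group nodes into connected components with a depth-first search."""
    adjacency = {n: set() for n in nodes}
    for g1, g2 in edges:
        adjacency[g1].add(g2)
        adjacency[g2].add(g1)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


edges = shared_process_edges(gene_processes)
components = connected_components(gene_processes, edges)
```

In this toy example the vesicle genes, the ERAD/UPR genes, and the lone glycosylation gene fall into three separate components, which is the kind of grouping a force-directed layout tends to make visible.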
We next visualized the protein-protein interaction (PPI) topology of the network obtained from STRINGdb (Figure 2B), which captures physical interactions underlying virtually every cellular process, from signal transduction to metabolic and secretory pathways28,29. The PPI network for secRecon shows that genes cluster by shared functions in specific biological processes. Indeed, the Fruchterman-Reingold force-directed algorithm, applied to the PPI data, clusters genes into the major secRecon systems (Figure 2B, left panel), suggesting protein interactions are tightly correlated with secretory function. Similarly, when the network is annotated by subcellular localization, clusters of genes that co-localize within the same cellular compartments are observed (Figure 2B, right panel).

Figure 2. Network-Based Representation of secRecon Highlights Functional and Interaction Topologies in the Mammalian Secretory Pathway. Each of the four panels contains an undirected node-link graph representation of secRecon with edges depicting shared processes (A) or protein-protein interactions (B). Left panels display nodes colored by the systems in secRecon and right panels display nodes colored by subcellular localization. The normalized mutual information (NMI) score displayed at the bottom left of each network indicates the degree of alignment between the network's community structure and either the secRecon systems (left panels) or subcellular localization categories (right panels). A higher NMI score signifies a stronger correlation between the network clustering and the respective annotation.
To assess how much of the clustering can be attributed to system-level organization versus subcellular localization, we performed Louvain community detection across a range of resolution parameters. We evaluated the resulting community structures using three key metrics: modularity, normalized mutual information (NMI) with subcellular localization, and NMI with system categories. By analyzing these metrics through elbow plots, we identified optimal resolutions that revealed a clear trade-off between modularity and the NMI scores for systems and subcellular localization (Supplementary Figure 2).
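To make the modularity metric concrete, the sketch below evaluates Newman modularity with a resolution parameter for a given partition of an unweighted, undirected graph. It does not implement Louvain itself, which searches for partitions that maximize this quantity; the toy graph, the partitions, and the function name are assumptions for illustration only. The formula used is Q = Σ_c [ L_c / m − γ (d_c / 2m)² ], where L_c is the number of edges inside community c, d_c the summed degree of its nodes, m the total edge count, and γ the resolution.

```python
def modularity(edges, community_of, resolution=1.0):
    """Newman modularity of a partition of an unweighted, undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community label.
    """
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    # Summed degree of the nodes in each community.
    total_degree = {}
    for node, deg in degree.items():
        c = community_of[node]
        total_degree[c] = total_degree.get(c, 0) + deg

    # Number of edges with both endpoints in the same community.
    internal_edges = {}
    for u, v in edges:
        if community_of[u] == community_of[v]:
            c = community_of[u]
            internal_edges[c] = internal_edges.get(c, 0) + 1

    return sum(
        internal_edges.get(c, 0) / m - resolution * (d / (2 * m)) ** 2
        for c, d in total_degree.items()
    )


# Toy graph: two triangles joined by a single bridging edge.
toy_edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_modules = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_module = {n: "a" for n in range(6)}

q_two = modularity(toy_edges, two_modules)
q_one = modularity(toy_edges, one_module)
q_two_high_res = modularity(toy_edges, two_modules, resolution=2.0)
```

Splitting the toy graph into its two triangles scores higher than lumping all nodes together, and raising the resolution penalizes large communities more strongly, which is why scanning resolutions yields partitions of different granularity.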
In the functional topology network, the NMI score with system categories reached 0.76, while the NMI with subcellular localization remained relatively low at 0.21 (Figure 2A, Supplementary Figure 2A). This outcome aligns with our expectations since the network was constructed primarily based on functional annotations from secRecon. As for the PPI topology network, the NMI score with system categories was 0.43, slightly higher than the NMI with subcellular localization at 0.35 (Figure 2B, Supplementary Figure 2B). These results further support the notion that system-level organization is a stronger determinant of network structure than subcellular localization.
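For readers unfamiliar with NMI, the following minimal sketch computes it between a community assignment and an annotation. The normalization by the arithmetic mean of the two entropies is an assumption (it matches the default of scikit-learn's `normalized_mutual_info_score`, but the text does not state which variant was used), and the labels are hypothetical toy data.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two label assignments of the same items."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    return sum(
        (n_ij / n) * log(n_ij * n / (count_a[i] * count_b[j]))
        for (i, j), n_ij in joint.items()
    )


def nmi(a, b):
    """NMI normalized by the arithmetic mean of the entropies (assumed variant)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 and h_b == 0:
        return 1.0  # both assignments are a single group
    return mutual_information(a, b) / ((h_a + h_b) / 2)


# Hypothetical toy labels for six genes.
communities = [0, 0, 0, 1, 1, 1]
systems = ["VT", "VT", "VT", "PS", "PS", "PS"]             # aligned with communities
localization = ["ER", "Golgi", "ER", "Golgi", "ER", "Golgi"]  # mostly unrelated

nmi_systems = nmi(communities, systems)
nmi_localization = nmi(communities, localization)
```

An annotation that matches the communities exactly gives an NMI of 1, while an annotation that cuts across them gives a value near 0, which is how the scores above should be read.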
These findings indicate that the community structures of both networks are more closely aligned with system-level organization than with subcellular localization, reinforcing our hypothesis that functional associations play a more significant role in organizing the secretory pathway. The stronger correlation between community structure and system categories suggests that the functional roles of proteins, such as their involvement in vesicle trafficking or proteostasis, are more predictive of the network's organization than their spatial localization. This supports the idea that biological processes, rather than compartmental localization alone, drive the structural and functional coherence of the secretory pathway. Both network plots representing functional and PPI topologies highlight how secRecon encapsulates our current understanding of the mammalian secretory pathway in a biologically relevant format, offering a valuable resource for systemic analysis of mammalian protein secretion.
Analysis of multi-omic data with secRecon identifies secretory pathway signatures linked to antibody secretion in CHO and plasma cells
The biotherapeutics industry is rapidly expanding, with drugs spanning a wide array of therapeutic modalities30. However, increasing demand for complex biologics (e.g., multispecific antibodies and fusion proteins) presents a considerable challenge for efficient and scalable production. CHO cells remain the gold standard mammalian host system for manufacturing most therapeutic proteins, since they can secrete large molecules with human-compatible post-translational modifications31. However, their epithelial-like origin does not inherently equip them for high secretion. By contrast, plasma cells are specialized immune cells that secrete vast quantities of antibodies32, offering a natural model to advance cell line engineering to maximize protein secretion in CHO cells.
To identify important features of plasma cells missing in CHO, industry-standard monoclonal antibody-producing CHO cell lines (CHO-DG44-mAb1 and CHO-K1-mAb2) were compared against four plasma cell-derived (PCD) lines of murine origin (MPC-11, P3X63Ag8) and human origin (JK-6L, Karpas-25) in a multi-omics analysis33. Given the central role of the secretory pathway in these comparisons, we explored whether secRecon could offer insight into the global secretory topology and facilitate process-level network analyses of differentially expressed secretory machinery, thereby revealing avenues for future CHO cell line optimization.
The expression of the secretory pathway topologically deviates between human, murine, and CHO cells.
We first asked whether the topology of the secretory transcriptome and proteome differed between plasma cells and CHO cells. Pairwise correlations of secRecon gene expression between the cell lines revealed high intra-species correlation (r > 0.9) and moderate correlation (r ~ 0.7) between CHO and plasma cell lines (Figure 3A). At the proteome scale, a similar trend was observed, though with lower correlations (intra-species r > 0.7; inter-species r ~ 0.5) (Figure 3C). When comparing hierarchical clustering of secRecon gene expression, we found that CHO cell gene-level dendrograms moderately correlated with those of murine and human plasma cells, while the murine and human plasma cell topologies surprisingly exhibited lower similarity (Figure 3B). Protein-level dendrograms between CHO and murine plasma cells, however, showed considerable divergence (Figure 3D). These observations suggest that while the expression ranges of secretory machinery are largely conserved within and across species, the wiring and regulation of specific components may have evolved differently between species.
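The pairwise correlations reported here are rank-based (the Figure 3 caption specifies Spearman rank correlation). A minimal sketch of that statistic follows: values are converted to average ranks, with ties sharing the mean of their positions, and the Pearson correlation of the ranks is returned. The expression vectors are hypothetical toy values, not data from the study.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression of five genes in three cell lines.
line_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
line_2 = [2.0, 4.0, 6.0, 8.0, 20.0]   # same ordering as line_1
line_3 = [5.0, 4.0, 3.0, 2.0, 1.0]    # reversed ordering

rho_same = spearman(line_1, line_2)
rho_reversed = spearman(line_1, line_3)
```

Because only the ordering of genes matters, the outlying value in `line_2` does not lower the correlation, which makes the statistic suitable for comparing expression across platforms and species with different dynamic ranges.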
The expression of the secretory machinery is more similar at the pathway level across species.
To contextualize how differential wiring of individual machinery genes may impact overall secretory pathway activity and topology, we applied the secRecon ontology to score secretory gene set activity using gene set variation analysis (GSVA)35. As expected, correlations at the pathway level using transcriptomic and proteomic GSVA scores were overall stronger than individual gene- or protein-level correlations, with intra-species comparisons exhibiting the strongest correlations (Figure 3E, 3G). Pathway-level topology derived from GSVA score clustering also revealed greater interspecies similarity compared to individual gene- or protein-level clustering (Figure 3F, 3H). Notably, the proteome pathway-level topologies of CHO and murine plasma cells were highly similar, despite their greater dissimilarity at the protein level (Figure 3D, 3H).
Figure 3. Pairwise correlation and clustering topology of secRecon transcriptome and proteome for antibody-secreting CHO and plasma cell lines. Spearman rank correlation of individual secRecon (A) gene expression and (C) protein abundance was performed between each cell line, colored by CHO, murine, or human cell line origin. Dendrograms derived from hierarchical clustering of secRecon (B) gene expression and (D) protein abundance were correlated using Baker's gamma index34. Pairwise Spearman rank correlation (E, G) and dendrogram correlation (F, H) were similarly conducted at the pathway level using GSVA35 scoring of secRecon gene sets.
Altogether, these topological analyses suggest that despite variations in individual secretory machinery components across mammalian species, their activities and regulatory mechanisms tend to converge at the pathway level. This highlights the translational potential of functional and pathway-level knowledgebases, such as secRecon, to study cross-species biological systems. However, when targeting specific genes or resolving subtle variations in secretory regulation, it may be crucial to consider species- and host-specific differences in the wiring of secretory machinery, as indicated by diverging gene- and protein-level topologies (Figure 3B, 3D).
Secretory pathway signatures primarily localized to ER and Golgi compartments are upregulated in plasma cells.
To further study the conservation and divergence of secretory mechanisms in
306
plasma and CHO cells, we overlaid the differential secretory transcriptome and proteome
307
onto secRecon's Functional and PPI networks (Figure 4, Supplementary Figure 3). We
308
found that plasma cells exhibit significant upregulation of the global secretory
309
transcriptome relative to CHO cells, in particular for machinery localized to the ER and
310
Golgi apparatus (Figure 4A). By overlaying the data on PPI networks, we found that
311
upregulated genes clustered around "Proteostasis" and "Protein Conformation" with
312
many genes localized to the ER (Figure 4C). Similarly, the secretory proteome was
313
substantially upregulated in plasma cells compared to CHO cells, with dominant
314
upregulated functional clusters mirroring those observed in the transcriptome
315
(Supplementary Figure 3A). However, in the PPI network analysis, a pronounced
316
central cluster of interactions showed increased enrichment of "Protein Conformation"
317
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for this preprintthis version posted November 15, 2024. ; https://doi.org/10.1101/2024.11.14.623668doi: bioRxiv preprint
17
318
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for this preprintthis version posted November 15, 2024. ; https://doi.org/10.1101/2024.11.14.623668doi: bioRxiv preprint
18
Figure 4. Network-based Transcriptomic Analysis of Plasma Cells vs CHO Cells:
319
Log2 fold change in gene expression between plasma and CHO cells were overlaid on
320
secRecon. Differentially expressed genes were visualized in separate network plots for
321
upregulated (A and C) and downregulated (B and D) genes in plasma cells. Panels A
322
and B display these genes in a network layout based on the secRecon system ontology,
323
where nodes are colored according to the major secretory pathway systems (left), or their
324
subcellular localization (right). Panels C and D show the same sets of genes plotted in a
325
network representation based on protein-protein interactions, with node colors according
326
to the major secretory pathway systems (left), or their subcellular localization (right). The
327
size of each node corresponds to the magnitude of fold change in expression.
328
329
330
331
proteins and decreased enrichment of "Proteostasis" proteins compared to the
332
transcriptome (Supplementary Figure 3C). Additionally, the proteins associated with this
333
central cluster were predominantly localized in the ER, with a smaller connected
334
subcluster involving genes associated with the ERGIC.
Global network analysis, however, may mask the specific secretory pathway signatures of these cells. To provide a more detailed view of the secretory pathway architecture unique to each cell type, we used secRecon GSVA scores to identify differentially enriched secretory gene sets between these cell types. This analysis revealed that processes within the "Proteostasis" system (e.g., the "PERK pathway" and "ER stress-induced pre-emptive quality control" (ERpQC)) and "N-glycosylation" within the "Post-translational modification" system were significantly upregulated in plasma cells (Figure 5A). These processes were similarly upregulated in plasma cells at the protein level, with notable increases in enrichment for "Vesicle budding" within the "Vesicle Trafficking" system and "Co-translational translocation" within the "Translocation" system (Figure 5B).
Figure 5. Differentially enriched secRecon processes between plasma and CHO cells: Differential testing of secRecon GSVA scores identified differentially enriched secretory processes and subprocesses between plasma and CHO cells at both the transcript (A) and protein (B) levels. In these volcano plots, the x-axis represents the difference in GSVA enrichment scores between plasma cells and CHO cells (positive values indicate higher activity of the geneset in plasma cells), while the y-axis represents the statistical significance of the enrichment (-log10 adjusted p-value). Processes within systems differentially enriched in both transcript and protein analyses are labeled. Log fold-changes of secretory machinery annotated under processes significantly upregulated in both the transcript and protein enrichments were overlaid on PPI networks for (C) transcripts or (D) proteins. The size of each node corresponds to the magnitude of fold change in expression in plasma vs CHO cells (bigger nodes indicate higher expression in plasma cells), while the color of each node indicates the process to which it belongs. Nodes associated with processes not identified in the GSVA scoring analysis are colored in gray.
For processes upregulated in both the transcriptome and proteome, we performed PPI-based topological network analyses to study the relationships between the components involved in these processes. Analyzing PPIs can reveal how proteins physically and functionally interact within the cell and can identify key clusters within the network. The PPI network revealed a tight cluster of components annotated under the "PERK pathway" and "Co-translational translocation", with additional nodes assigned to "ERpQC", at both the transcript and protein levels (Figure 5C, 5D). This suggests that the coordinated activity of these proteins is a key feature distinguishing the secretory machinery of plasma cells from that of CHO cells. Interestingly, both networks also revealed several components related to "N-glycosylation", with more of these nodes present in the transcriptome (Figure 5C). However, this cluster is not as tight as the previous one, indicating much less interaction between these proteins. This suggests that the proteins involved in N-glycosylation may function more independently or sequentially, rather than forming tightly interacting complexes. N-glycosylation involves a series of enzymatic steps wherein different glycosyltransferases act on substrates without necessarily forming stable complexes with one another [36,37]. This contrasts with the "ER stress response", "UPR", and "translocation" processes (Figure 5, Supplementary Figure 4), where proteins often assemble into larger complexes to facilitate coordinated actions [38,39]. These results demonstrate that secRecon can not only identify unique secretory pathway signatures but also effectively contextualize the processes underlying different cellular conditions. Within these enriched processes, we further identified genes differentially expressed at the transcript and protein level (Supplementary Figure 5), which could be further studied and engineered to maximize the secretory capacity of industrial CHO cell lines.
Analysis of single cell omics with secRecon identifies pathways driving plasma cell differentiation and IgG secretion
Activated B cells undergo transcriptional, epigenetic, and morphological changes to differentiate into antibody-secreting plasma cells, resulting in a highly heterogeneous cell population [40–44]. Single-cell omics studies and CRISPR screens are identifying regulators and hallmarks of this process, elucidating how the secretory pathway impacts specific antibody-secreting fates, such as the production of distinct heavy and light chain immunoglobulin subtypes [43–46]. We therefore asked whether secRecon could provide more targeted insights into the secretory machinery's involvement in plasma cell differentiation and antibody secretion.
IgG secreting plasma cells exhibit distinct clusters characterized by unique secretory phenotypes.
To answer this, we used SEC-seq data that quantified the single-cell IgG secretion and mRNA of activated B cells as they differentiated into antibody-secreting plasma cells [47]. Interestingly, variation in immunoglobulin heavy and light chain subtype expression, a major source of transcriptomic heterogeneity, was not strongly coupled with the relative abundance of secreted IgG (Figure 6A). To identify markers and pathways predictive of high antibody secretion, high and low IgG-secreting plasma cells were aggregated across clusters and pseudotime for differential expression analysis. Unfolded protein response, glycosylation, and mitochondria-associated metabolic processes were among the top pathways enriched in high IgG secretors [47]. Here, we used our secRecon ontology to calculate average expression [48] and relative PPI activity [49] per geneset for each cell. The relative contribution of each secRecon geneset in predicting single-cell IgG secretion was subsequently quantified using Dominance Analysis [50,51]. We found that 10.9% of the variation in single-cell IgG secretion was explained by secretory processes (Figure 6B). The secretory subsystems contributing most to this variation were the "UPR", "ERAD", "vesicle trafficking", and "protein folding" systems (Figure 6B, Supplementary Figure 6D). This could be resolved into more granular processes, specifically the "PERK" and "IRE1" pathways, "ubiquitination", and "autophagy", which contributed the most towards explaining IgG secretion variability. Because metabolic processes were previously enriched in high versus low IgG-secreting cells in this dataset [47], we also considered pathways beyond secRecon: including genesets such as the oxidative phosphorylation GO-BP [52,53] geneset explained 1.9-fold more variation than secRecon terms alone (Supplementary Figure 6B, 6C), suggesting that maintaining ATP levels is essential to support energy-intensive processes such as antibody secretion.
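As an illustration of the Dominance Analysis step described above, the following minimal sketch computes general dominance for a few geneset scores by averaging each predictor's incremental R^2 over all subsets of the other predictors. It is not the authors' implementation: the ordinary least squares model, the function names (`solve_linear_system`, `r_squared`, `general_dominance`), and the toy per-cell values are assumptions chosen for illustration only.

```python
from itertools import combinations


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(columns, y):
    """R^2 of an ordinary least squares fit of y on the columns plus an intercept."""
    if not columns:
        return 0.0  # an intercept-only model explains no variance
    n = len(y)
    design = [[1.0] * n] + [list(col) for col in columns]
    gram = [[sum(u * v for u, v in zip(ci, cj)) for cj in design] for ci in design]
    rhs = [sum(u * v for u, v in zip(ci, y)) for ci in design]
    beta = solve_linear_system(gram, rhs)
    fitted = [sum(b * col[t] for b, col in zip(beta, design)) for t in range(n)]
    y_mean = sum(y) / n
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))
    ss_tot = sum((obs - y_mean) ** 2 for obs in y)
    return 1.0 - ss_res / ss_tot


def general_dominance(predictors, y):
    """Average incremental R^2 of each predictor over all subsets of the others."""
    names = list(predictors)
    result = {}
    for name in names:
        others = [other for other in names if other != name]
        per_size = []
        for size in range(len(names)):
            gains = []
            for subset in combinations(others, size):
                base = [predictors[s] for s in subset]
                with_name = r_squared(base + [predictors[name]], y)
                gains.append(with_name - r_squared(base, y))
            per_size.append(sum(gains) / len(gains))
        result[name] = sum(per_size) / len(per_size)
    return result


# Toy per-cell geneset scores and IgG secretion values (made up for illustration).
geneset_scores = {
    "UPR": [0.1, 0.4, 0.35, 0.8, 0.9, 0.5, 0.7, 0.2],
    "ERAD": [0.3, 0.2, 0.6, 0.5, 0.9, 0.4, 0.1, 0.7],
    "vesicle trafficking": [0.5, 0.1, 0.9, 0.3, 0.2, 0.8, 0.6, 0.4],
}
igg_secretion = [1.0, 1.9, 2.2, 3.1, 4.0, 2.4, 2.6, 1.8]

dominance = general_dominance(geneset_scores, igg_secretion)
full_r2 = r_squared(list(geneset_scores.values()), igg_secretion)
```

In this sketch the general dominance values sum to the R^2 of the full model, so each value can be read as that geneset's share of the explained variation.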
The IgG-secreting plasma cell population is composed of five distinct Leiden clusters [47] (Figure 6C). We speculated that analyzing all IgG-secreting cells in aggregate may mask further intra-population secretory phenotypes. To explore this, we applied Dominance Analysis to predict IgG secretion for each cluster individually. This resolved "secretory fingerprints" per cluster (Figure 6D), which represent the distinct and shared secretory machinery that contributes most to the variation in IgG production in each cluster. For example, IgG secretion for cluster 10 was driven more by "vesicle trafficking", "post-translational modifications", and differentiation state (as indicated by pseudotime). In contrast, processes under "UPR", "ERAD", and "protein conformation" contribute more towards the other clusters' secretory phenotypes. Further characterization revealed that while overall canonical expression of secRecon genes is similar across clusters (aligned with our findings from bulk RNA-Seq analysis of various plasma and CHO cells; Figure 3A, 3C), 30-50% of these secRecon genes are differentially expressed between these five IgG-secreting Leiden clusters, with notable differences in co-expression topologies (Figure 6E, 6F), suggesting that fine-tuning of secRecon expression and topological distinctions likely contribute to the respective unique secretory phenotypes.
Figure 6. Using secRecon ontology to identify secRecon processes and machinery correlating to IgG secretion. (A) SEC-seq data [47] were used to link single-cell IgG secretion to the transcriptome for a diverse population of B cells. The UMAPs are colored by major sources of single-cell heterogeneity (left to right: Leiden transcriptomic clusters, expression of IGH chain subtype, secreted IgG concentration as quantified by SEC-seq, expression of IGL chain subtype). (B) Dominance analysis quantifies the relative importance of secRecon genesets in explaining the variation in single-cell IgG secretion for the IgG-secreting population. (C) The IgG-secreting population is largely composed of 5 distinct subpopulations as annotated by Leiden clustering. (D) Dominance analysis is applied to each Leiden cluster to reveal distinct secretory fingerprints associated with IgG secretion heterogeneity. (E) Pairwise secRecon gene expression and (F) gene-level topologies were correlated using Spearman's rho and Baker's Gamma index, respectively, for pseudobulk populations within each Leiden cluster.
Expression of secretory pathway subprocesses is strongly associated with plasma cell differentiation
Rewiring of secretory machinery is crucial for preparing activated B cells to transition to the stressful antibody-secreting state, although the associated molecular mechanisms remain poorly understood [41,42,44]. Thus, to dissect the secretory machinery underlying the differentiation trajectory, we applied Dominance and correlation analysis to identify secRecon genesets and individual genes associated with pseudotime (Figure 7). Specifically, early pseudotime, corresponding to the activated B cell state, was associated with elevated activity in "cytoskeleton remodeling", "post-Golgi vesicle trafficking", and "ERAD" processes. In contrast, late pseudotime, corresponding to the mature plasma cell state, was associated with elevated activity of "glycosaminoglycan" post-translational modifications, "ER pre-quality control", and "Golgi organization" processes (Figure 7C). Previous studies identified UPR and autophagy activation as hallmarks of early antibody-secreting cell (ASC) differentiation, facilitating extensive secretory organelle remodeling such as ER expansion, preventing stress-induced apoptosis, and priming for increased cargo load, whereas disulfide bond formation and protein folding become more prominent in the mature antibody-secreting state [40,41,42,44]. Interestingly, while transcription of antibody heavy chain genes was moderately correlated with pseudotime (r = 0.28, BH-corrected P < 0.001), the secreted IgG concentration was only weakly correlated with pseudotime (r = -0.04, BH-corrected P < 0.001). This suggests that post-transcriptional processes, such as the secretory processes identified in our Dominance Analysis, may act independently of well-known differentiation hallmarks to regulate IgG secretion.
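The correlations above are reported with Benjamini-Hochberg (BH) corrected P values. As a minimal sketch of that adjustment (the function name and the example p-values are illustrative assumptions, not taken from the study), each p-value is scaled by the number of tests divided by its rank, and monotonicity is enforced from the largest p-value downward:

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value (rank m) down to the smallest (rank 1).
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Made-up raw p-values for four hypothetical tests.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
```

An adjusted value below the chosen false discovery rate (for example 0.05) would then be called significant.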
To further investigate potential markers of differentiation, we performed a secRecon gene-level correlation analysis against pseudotime progression (Figure 7D). Among the top positively correlated genes, we found multiple plasma cell differentiation markers (XBP1 [46,54], FKBP11 [55], DERL3 [42], PRDX4 [54]). Additionally, many of the top correlated genes, such as SSR4, SSR3, SPCS3, KDELR, SPCS2, and SAR1B, are annotated in the Human Protein Atlas as plasma cell markers but remain functionally uncharacterized (Figure 7D). Notably, most secRecon genes that significantly correlated with pseudotime are novel or uncharacterized in the context of plasma cell differentiation. These novel targets present opportunities for future CRISPR screens and functional studies to further elucidate the molecular mechanisms underlying plasma cell differentiation.
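A gene-level correlation against pseudotime can be sketched as follows. This section does not state which correlation coefficient was used for this step, so Spearman's rank correlation is assumed here; the expression values are made up for illustration, `GENE_X` is a hypothetical gene, and the helper names are not from the source.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: pseudotime per cell and expression of two genes in the same cells.
pseudotime = [0.0, 0.1, 0.3, 0.5, 0.7, 1.0]
expression = {
    "XBP1": [0.2, 0.5, 0.9, 1.4, 2.0, 3.1],    # made-up values, rising
    "GENE_X": [3.0, 2.2, 2.1, 1.0, 0.6, 0.1],  # hypothetical gene, falling
}
correlations = {gene: spearman(values, pseudotime) for gene, values in expression.items()}
ranked = sorted(correlations, key=correlations.get, reverse=True)
```

Sorting genes by their coefficient, as in `ranked`, gives the kind of ordered list from which top positive and negative correlates could be labeled.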
Altogether, we demonstrate that secRecon's ontology and annotations enable a comprehensive analysis from the systems level down to the gene level, contextualizing specific secretory processes and machinery underlying single-cell phenotypic heterogeneity. Furthermore, our findings reveal novel markers and pathways that could be of significant interest to the plasma cell research community, potentially unlocking opportunities to further understand plasma cell differentiation and antibody secretion.
Figure 7. Identifying secRecon processes and machinery underlying plasma cell differentiation trajectory: (A) Pseudotime trajectory analysis of SEC-seq data provides a continuous scale of differentiation from activated B cells to antibody-secreting plasma cells (PC) [47]. (B) Dominance analysis identifies predictive secRecon genesets collectively explaining variation along the plasma cell differentiation trajectory, as indicated by pseudotime. secRecon (C) subsystem activity and (D) gene expression were correlated to pseudotime, with the highest correlates labeled. Underlined subsystems and genes were identified in existing studies to be associated with plasma cell differentiation. Genes with subscript "H" indicate scRNA plasma cell markers annotated in the Human Protein Atlas but not functionally characterized.
Discussion
The secretory pathway plays an essential role in shaping the overall function of the cell. Disruptions in this pathway are associated with various pathological conditions [11–14]. Furthermore, in biotechnology, the secretory pathway is harnessed to manufacture diverse recombinant proteins [7]. Optimizing protein secretion in host cells, particularly mammalian cells, is crucial for enhancing production yield and quality [8,9]. Thus, a better understanding of the processes and components within the secretory pathway is critical for advancing both therapeutic protein production and the development of personalized therapies.
Reconstructions of the secretory pathway provide comprehensive representations of secretory profiles under specific conditions. Building these models efficiently can support new biomedical and biotechnological applications. However, the accuracy of these reconstructions relies on the depth of annotated data and the quality of the network curation. Existing secretory pathway reconstructions [1,17,18] provide insightful maps; however, they vary in depth (such as the level of annotations and hierarchical organization) and coverage (including the number of secretory pathway components). Some integrate protein-protein interaction information, while others provide details on protein complexes. However, none currently provide a comprehensive annotation encompassing all these features in a single framework.
Addressing the need for a more detailed, unified, and hierarchically organized model, we introduce secRecon, a comprehensive map of the mammalian secretory pathway comprising 1,127 functionally annotated genes. The reconstruction captures the functional roles of individual genes within specific secretory processes and their spatial localization across cellular compartments, providing a detailed map of where secretory machinery operates within the cell. This resource offers mechanistic insights by annotating PPI networks, which can reveal how the pathway machinery collaborates to carry out the complex, multi-step processes required for protein synthesis and secretion. We demonstrated here how it can help uncover both direct and indirect relationships between proteins that are regulated both spatially and temporally, thus enabling more accurate predictions of how disruptions in these networks may lead to altered secretory phenotypes, disrupted signaling, and disease states. Moreover, both the functional (NDEx UUID: 4e7b9729-722a-11ef-ad6c-005056ae3c32) and PPI topology networks (NDEx UUID: efb15b63-722c-11ef-ad6c-005056ae3c32) are publicly available in the NDEx repository, allowing broader access for researchers to explore and utilize these networks.
Applying secRecon to a multi-omics study of diverse antibody-secreting cell lines, including CHO cells and murine and human plasma cells, we explored the topology of the secretory pathway using RNA-Seq and proteomic data. Despite species-specific differences at the individual gene expression and protein abundance levels, we found that the individual regulation and activity of secretory machinery ultimately converge at the pathway level. By identifying and analyzing upregulated secretory subprocesses in plasma cells within the context of protein-protein interactions, we revealed tightly clustered secretory machinery genes central to the global secretory interaction topology, suggesting their potential roles in regulating plasma cell protein secretion efficiency. Additionally, the enriched subprocesses in plasma cells allowed us to narrow down a list of 23 genes with consistent fold-change magnitudes at both the transcript and protein levels. Further investigation of these specific processes and genes could guide efforts to improve the scalability and efficiency of producing complex biologics to meet the growing demands of the biotherapeutics industry [30,56]. Moreover, the framework could also support future studies into how secretory pathway dynamics impact diseases related to protein secretion and cell-cell communication.
We then applied secRecon to probe antibody-secreting plasma cells at single-cell resolution. By linking single-cell IgG secretion with transcriptomic profiles for human plasma cells, we leveraged secRecon to shed light on the secretory processes and machinery associated with IgG secretion heterogeneity and plasma cell differentiation. This analysis identified specific subprocesses and genes within UPR, ERAD, vesicle trafficking, and protein folding associated with the heterogeneity in IgG secretion, while further resolving cluster-specific secretory phenotypes within the bulk population. Additionally, we uncovered novel insights into the secretory machinery involved in plasma cell differentiation. Although existing studies have identified broad secretory hallmarks associated with early and mature plasma cell differentiation [40–42,57,58], a system- to gene-level mechanistic understanding of the complex secretory process remains largely unresolved. Notably, a substantial portion of variation along the single-cell differentiation trajectory could be explained by secRecon subprocess activity, including cytoskeletal remodeling, ER preemptive quality control, post-Golgi vesicle trafficking, and glycosaminoglycan post-translational modifications. These processes align with findings that the ER and Golgi systems are subjected to extensive remodeling to prime B cells for a highly stressful, antibody-secreting state [32,41,42,57,59], while the secRecon ontology combined with Dominance Analysis can quantify the relative contribution of specific, granular processes underlying this trajectory of dynamic cellular remodeling. secRecon gene-level correlations further revealed targets previously identified by the community alongside a substantial number of novel candidates strongly associated with plasma cell differentiation. As we were able to identify groups of coordinated secretory machinery signatures among a highly heterogeneous population of plasma cells, it would be valuable to further investigate which cell-cell interactions might drive this continuum of states and phenotypes, and how, and to examine whether these correlated secretory machinery patterns are regulated by such signaling. Altogether, we illustrate secRecon's ability to contextualize multi-omics data at multiple resolutions, from secretory systems to individual genes. These analyses underscore the value of secRecon in revealing novel insights into the distinct secretory wiring of different mammalian cell types and its potential to unravel the complexity of secretory pathways across diverse biological contexts.
While this work established a robust framework to map the complex secretory network, there remain numerous opportunities for further exploration and enhancement of secRecon. One notable opportunity is the integration of transcription factors (TFs), which function as regulators of processes within the secretory pathway. To lay the groundwork for future incorporation, we conducted a transcription factor enrichment analysis using our list of curated secRecon machinery (Supplementary Tables S4-S6). Moreover, secRecon is positioned to evolve into a comprehensive resource for reconstructing secretory pathway reactions. By expanding on the Genome-Scale Model (GEM) framework proposed previously [1] and integrating directional topology from established pathway databases such as KEGG, secRecon can facilitate the development of more sophisticated mechanistic models. Our comparative analysis of secretory topologies in plasma and CHO cells revealed that, although secretory subprocess activities are largely conserved, the regulation and wiring of individual genes and proteins may vary across species and down to the level of single-cell resolved clusters. Such considerations can be important when pathway databases (e.g., KEGG, IPA) are used to contextualize omics data and output actionable hits, as these resources are often based on model organisms and may miss species- or cell line-specific nuances in secretory regulation. This highlights a potential need for hybrid models that overlay multi-omics data onto protein-protein interaction (PPI) networks to gain deeper, species-specific insights into secretory regulation [60–62].
Additionally, we recognize potential inherent biases in ontologies and knowledgebases that rely on manual curation and public literature. Indeed, genes with higher global expression levels or localized to specific organelles are more frequently studied and better characterized. This can confound community-wide functional understanding and experimental design [63]. This systematic bias is a concern across many public network constructions and should be considered when using and interpreting ontologies like secRecon that rely on published studies. While we explored this and did not find any strong associations of secRecon curated metrics with the transcriptomic profiles from the Genotype-Tissue Expression (GTEx) project [64,65] (Supplementary Figure 7), this does not rule out the presence of study and community bias. AI tools and LLMs could help mitigate these biases by systematically analyzing large amounts of data to identify under-studied genes of phenotypic importance; this can help provide more balanced functional annotations. Future work to enhance the quality of such knowledgebases should further investigate and address such biases, e.g., by experimentally validating groups of secretory machinery functions or interactions using well-defined experiments that span diverse cell types.
Materials and Methods
Manual curation of mammalian secretory pathway genes
A preliminary list of secretory genes was drafted based on previous reconstructions from Feizi et al. [18], Lund et al. [17], and Gutierrez et al. [1]. Genes involved in glycosylation were added to the list based on their annotation in the GlycoGene DataBase (GGDB) [20]. Each gene was manually removed or linked to one or more processes in our secRecon ontology based on literature surveys. Literature-based relevance scores ranging from poor to strong association with secRecon terms were assigned as follows: (1) loose association with the secretory pathway, (2) localized to the secretory pathway with probable evidence but missing functional information, (3) evidence supporting association/interaction with core secretory machinery, (4) concrete evidence of a functional role in the secretory pathway, well supported and characterized in the literature. Based on the identified literature evidence and curated secRecon ontology term annotations, a comprehensive description of the secretory functional role was manually generated for each gene.
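The four-level relevance scale lends itself to a small enumerated type. The sketch below is one hypothetical way to encode it; the class names, field names, and example annotations are assumptions for illustration and do not reflect the actual secRecon table schema.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class RelevanceScore(IntEnum):
    """Literature-based relevance levels, following the scale in the text."""
    LOOSE_ASSOCIATION = 1        # loose association with the secretory pathway
    LOCALIZED_NO_FUNCTION = 2    # localized, probable evidence, function missing
    CORE_MACHINERY_LINK = 3      # association/interaction with core machinery
    CHARACTERIZED_FUNCTION = 4   # well-supported, characterized functional role


@dataclass
class GeneAnnotation:
    symbol: str
    relevance: RelevanceScore
    ontology_terms: list = field(default_factory=list)


def filter_by_relevance(annotations, minimum):
    """Keep annotations whose relevance score is at least `minimum`."""
    return [a for a in annotations if a.relevance >= minimum]


# Placeholder genes for illustration only.
annotations = [
    GeneAnnotation("GENE_A", RelevanceScore.CHARACTERIZED_FUNCTION, ["Translocation"]),
    GeneAnnotation("GENE_B", RelevanceScore.LOOSE_ASSOCIATION, ["Vesicle trafficking"]),
]
well_supported = filter_by_relevance(annotations, RelevanceScore.CORE_MACHINERY_LINK)
```

Using an integer-backed enum keeps the scores comparable, so a downstream analysis could restrict itself to genes at or above a chosen evidence level.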
The secRecon ontology was manually defined and organized based on extensive literature research of the secretory pathway, resulting in 77 distinct terms (Supplementary Table S2, Supplementary Material S7). Each term was categorized under one of five primary secretory systems: translocation, protein conformation, post-translational modifications, proteostasis, and vesicle trafficking. Each system is further subdivided into subsystems, processes, and subprocesses, reflecting the nested and interconnected organization of the secretory pathway.
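The nested organization can be pictured as a tree in which each term points to its parent. The sketch below uses the five system names from the text, but the specific parent-child placements (for example, placing "UPR" under "Proteostasis") and the helper functions are assumptions for illustration; the curated hierarchy is the one given in Supplementary Table S2.

```python
# Child term -> parent term. Top-level systems have no parent (None).
# The placements below are illustrative, not the curated secRecon hierarchy.
PARENT = {
    "Translocation": None,
    "Protein conformation": None,
    "Post-translational modifications": None,
    "Proteostasis": None,
    "Vesicle trafficking": None,
    "UPR": "Proteostasis",
    "PERK pathway": "UPR",
    "N-glycosylation": "Post-translational modifications",
}


def with_ancestors(term, parent=PARENT):
    """Return the term followed by all of its ancestors up to the system level."""
    lineage = []
    while term is not None:
        lineage.append(term)
        term = parent[term]
    return lineage


def expand_terms(terms, parent=PARENT):
    """Expand a gene's annotated terms so that all parent terms are included."""
    expanded = set()
    for term in terms:
        expanded.update(with_ancestors(term, parent))
    return expanded


lineage = with_ancestors("PERK pathway")
expanded = expand_terms(["PERK pathway", "N-glycosylation"])
```

Expanding annotations to include parent terms in this way is what allows two genes annotated to different subprocesses to still share a system-level term.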
Ortholog mapping and integration of database annotations
Various databases were used to further annotate the secretory pathway genes. A custom Python script (see Supplementary Material; Feature_Extraction.ipynb) was used to map the default human gene symbol (HGNC) to: i) aliases, ii) Ensembl IDs, iii) Entrez IDs, iv) gene names, and v) UniProt IDs. CHO and mouse orthologs were mapped using the Entrez API [66] with the human Entrez IDs as input. Subcellular localization was mapped using a two-step process: localization was primarily assigned using organelle immunoprecipitation data generated in Hein et al. [24]; genes absent from this dataset were then mapped according to subcellular localizations annotated in UniProt, using the UniProt IDs as input. Protein complex information was acquired from the CORUM [21] database and filtered for complexes that contain other secRecon partners or have a secretory-relevant functional description. Interaction partners for each gene were retrieved from the STRING database (see Supplementary Material; PPI_Ontology_Network.ipynb).
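The two-step localization assignment amounts to a lookup with a fallback. In the minimal sketch below, the two dictionaries stand in for the organelle immunoprecipitation dataset and the UniProt annotations; their contents, the placeholder identifiers, and the function name are hypothetical and are not taken from Feature_Extraction.ipynb.

```python
def assign_localization(gene, uniprot_id, organelle_ip, uniprot_localization):
    """Prefer the organelle-IP assignment; fall back to the UniProt annotation."""
    if gene in organelle_ip:
        return organelle_ip[gene], "organelle IP"
    if uniprot_id in uniprot_localization:
        return uniprot_localization[uniprot_id], "UniProt"
    return None, "unassigned"


# Placeholder data for illustration only.
organelle_ip = {"GENE_A": "ER"}
uniprot_localization = {"ID_A": "Cytosol", "ID_B": "Golgi apparatus"}

# Gene symbol -> UniProt ID (both hypothetical).
genes = {"GENE_A": "ID_A", "GENE_B": "ID_B", "GENE_C": "ID_C"}
localization = {
    gene: assign_localization(gene, uid, organelle_ip, uniprot_localization)
    for gene, uid in genes.items()
}
```

Returning the source alongside the compartment records which of the two steps produced each assignment, which can be useful when the two resources disagree.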
Graph-based representation of secRecon
To construct network representations of both the functional topology and the protein-protein interaction (PPI) network of secRecon, the NetworkX67 Python library was used. For the functional topology, each gene is represented as a node in a graph "G". The module iterates over each pair of genes, calculating the number of shared secRecon ontology terms (including the respective parent systems, subsystems, and processes) and counting any protein complexes they share. If a gene lacks associated complexes, the shared complex count is set to zero. Similarly, if no shared terms exist, the shared process count is set to zero. An edge is added between each pair of genes, with the edge weight determined by the sum of shared processes and complexes. For visualization, only edges with a count greater than 1 are displayed in the plot. The spatial arrangement of nodes is determined using the Fruchterman-Reingold force-directed algorithm25, or 'spring layout', which positions genes with similar processes and complexes closer together.
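The pairwise edge weighting described for the functional topology can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' module: the function names, the toy genes, and the placeholder term and complex labels are assumptions.

```python
from itertools import combinations


def functional_edges(gene_terms, gene_complexes):
    """Return (gene_a, gene_b, weight) for every pair of genes.

    weight = number of shared ontology terms + number of shared complexes.
    Genes without complex annotations contribute zero shared complexes.
    """
    edges = []
    for a, b in combinations(sorted(gene_terms), 2):
        shared_terms = len(gene_terms[a] & gene_terms[b])
        shared_complexes = len(
            gene_complexes.get(a, set()) & gene_complexes.get(b, set())
        )
        edges.append((a, b, shared_terms + shared_complexes))
    return edges


def edges_to_display(edges, min_weight=1):
    """Keep only edges whose weight is strictly greater than min_weight."""
    return [edge for edge in edges if edge[2] > min_weight]


# Toy annotations (hypothetical genes, terms, and complexes).
gene_terms = {
    "GENE_A": {"Translocation", "term_x"},
    "GENE_B": {"Translocation", "term_x"},
    "GENE_C": {"Vesicle trafficking"},
}
gene_complexes = {"GENE_A": {"complex_1"}, "GENE_B": {"complex_1"}}

edges = functional_edges(gene_terms, gene_complexes)
visible = edges_to_display(edges)
```

With NetworkX installed, a weighted edge list in this form could be loaded with `Graph.add_weighted_edges_from(edges)` and positioned with `networkx.spring_layout`.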
The PPI network was obtained from the STRING database23. Gene pairs were filtered based on interaction confidence scores, retaining only high-confidence interactions in the final network. A graph "G" was again generated using NetworkX, where each node represents a gene and edges correspond to known PPIs between the proteins encoded by these genes. STRING interaction scores were used as edge weights. For visualization, the spatial arrangement of nodes was computed using the 'spring layout' algorithm to cluster proteins based on interaction strength.
In both networks, nodes are colored consistently according to either their involvement in secretory subsystems (e.g., vesicle trafficking, post-translational modifications) or their subcellular localization (e.g., ER, Golgi apparatus). Each node is displayed as a pie chart, with segments color-coded to represent the gene's subcellular localization or secRecon systems.
Community detection analysis to investigate secRecon network topology
To investigate the organizational principles of secRecon, Louvain community detection68 was performed on the two network topologies described above. For each network, Louvain community detection was carried out across a range of resolution parameters, from 0.5 to 2.5 in increments of approximately 0.1, to identify optimal community partitions. Three metrics were used to assess the quality and biological relevance of the detected communities: (1) Modularity: a measure of the quality of the partitioning, indicating how well separated the communities are within the network. (2) Normalized Mutual Information (NMI) for subcellular localization: quantifies the agreement between the detected communities and the known subcellular localization annotations. (3) NMI for system categories: quantifies the alignment of the detected communities with system-level biological functions, as annotated in secRecon.
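For reference, NMI between a community partition and an annotation can be computed as in the sketch below. The normalization by the arithmetic mean of the two entropies is an assumption (the text does not state which NMI variant was used), as is the simplification that each gene carries exactly one label; the function names and toy labels are illustrative.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(x, y):
    """Mutual information (in nats) between two label assignments."""
    n = len(x)
    joint = Counter(zip(x, y))
    count_x, count_y = Counter(x), Counter(y)
    return sum(
        (c / n) * log((c * n) / (count_x[a] * count_y[b]))
        for (a, b), c in joint.items()
    )


def nmi(x, y):
    """NMI normalized by the arithmetic mean of the two entropies (assumed variant)."""
    hx, hy = entropy(x), entropy(y)
    if hx == 0 and hy == 0:
        return 1.0  # both assignments are a single group
    return mutual_information(x, y) / ((hx + hy) / 2)


# Toy example: community labels versus hypothetical localization labels.
communities = [0, 0, 1, 1]
matching_labels = ["ER", "ER", "Golgi", "Golgi"]
unrelated_labels = ["ER", "Golgi", "ER", "Golgi"]

nmi_matching = nmi(communities, matching_labels)
nmi_unrelated = nmi(communities, unrelated_labels)
```

A score near 1 means the communities reproduce the annotation up to relabeling, while a score near 0 means the two groupings are unrelated.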
To determine the optimal resolution that balances high modularity with biologically meaningful community structure, a trade-off analysis was conducted using the following approach: (1) Normalization of Metrics: The modularity scores and NMI scores for both subcellular localization and system categories were normalized to a common scale using min-max normalization.
\[ \text{Normalized Score}_i = \frac{\text{Score}_i - \min(\text{Scores})}{\max(\text{Scores}) - \min(\text{Scores})} \]
(2) Combined Score Calculation: For each resolution parameter, a combined score was calculated by summing the normalized modularity and NMI scores.
\[ \text{Combined Score} = \text{Normalized Modularity} + \text{Normalized NMI (Localization)} + \text{Normalized NMI (Systems)} \]
(3) Identification of Optimal Resolution: The resolution parameter corresponding to the maximum combined score was identified as the optimal resolution. This resolution represents the best trade-off point where the network exhibits high modularity while maintaining strong alignment with biological annotations.
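The three-step trade-off analysis can be sketched as follows. The metric values are invented purely for illustration, and the function names are hypothetical; only the min-max normalization, the unweighted sum, and the selection of the maximum follow the description above.

```python
def min_max(values):
    """Min-max normalize a list of scores to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # no spread: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]


def optimal_resolution(resolutions, modularity, nmi_localization, nmi_systems):
    """Return (best resolution, combined scores) from three metric curves."""
    combined = [
        m + loc + sys_
        for m, loc, sys_ in zip(
            min_max(modularity), min_max(nmi_localization), min_max(nmi_systems)
        )
    ]
    best_index = max(range(len(resolutions)), key=combined.__getitem__)
    return resolutions[best_index], combined


# Invented example values, one entry per resolution parameter.
resolutions = [0.5, 1.0, 1.5]
modularity = [0.2, 0.6, 0.4]
nmi_localization = [0.1, 0.3, 0.2]
nmi_systems = [0.3, 0.5, 0.4]

best, combined = optimal_resolution(
    resolutions, modularity, nmi_localization, nmi_systems
)
```

Because the three normalized metrics are summed without weights, each contributes equally to the choice of resolution.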
Visualization of Secretory Pathway Ontology
A sunburst plot was generated using plotly to visualize the hierarchical structure of the secretory pathway ontology described above. The dataset containing the ontology of the secretory pathway was organized by System, Subsystem, Process, and Subprocess. Each level in the ontology was used as a path to structure the sunburst plot, with colors representing the different systems. To visualize the overlap of genes involved in multiple secretory pathway systems, an UpSet plot69 was generated using the upsetplot Python library. A dictionary containing gene annotations across the secretory systems (e.g., "Protein conformation", "Translocation", "Post-translational modifications", "Vesicle trafficking", "Proteostasis") was used to create a binary matrix indicating the presence of genes in each system. The occurrence of different combinations of systems was counted and visualized as an UpSet plot.
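The binary membership matrix and the combination counts behind the UpSet plot can be sketched as below. The gene names are hypothetical and the helper functions are illustrative, not taken from the secRecon repository; only the five system names come from the text.

```python
from collections import Counter

SYSTEMS = [
    "Protein conformation",
    "Translocation",
    "Post-translational modifications",
    "Vesicle trafficking",
    "Proteostasis",
]


def membership_matrix(gene_systems, systems=SYSTEMS):
    """Map each gene to a 0/1 row indicating its presence in each system."""
    return {
        gene: [int(system in annotated) for system in systems]
        for gene, annotated in gene_systems.items()
    }


def combination_counts(matrix, systems=SYSTEMS):
    """Count how many genes fall into each exact combination of systems."""
    counts = Counter()
    for row in matrix.values():
        combo = tuple(s for s, present in zip(systems, row) if present)
        counts[combo] += 1
    return counts


# Toy annotation dictionary (hypothetical genes).
gene_systems = {
    "GENE_A": {"Translocation"},
    "GENE_B": {"Translocation", "Proteostasis"},
    "GENE_C": {"Proteostasis", "Translocation"},
}

matrix = membership_matrix(gene_systems)
counts = combination_counts(matrix)
```

Each key of `counts` corresponds to one intersection bar in an UpSet plot; the upsetplot library offers `from_memberships` for building its input from per-gene membership lists.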
Comparison of secRecon coverage and annotations against currently available reconstructions
To compare secRecon with other available reconstructions, secretory pathway genes from existing reconstructions, including Feizi et al.18, Lund et al.17, and Gutierrez et al.1, were mapped to the genes in secRecon. Using the secRecon gene list, mappings between human Entrez IDs and the other gene identifiers used in these reconstructions (such as Ensembl IDs and CHO Entrez IDs) were created to ensure accurate comparisons. A Venn diagram was used to visualize the overlap between the gene sets from these reconstructions, allowing quantification of the shared and unique genes across the different reconstructions.
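A minimal sketch of the comparison logic: identifiers from another reconstruction are first translated into a common namespace, and genes are then grouped by the exact combination of gene sets that contain them. All identifiers, the mapping table, and the function names here are hypothetical.

```python
def to_reference_ids(ids, id_map):
    """Translate identifiers into the reference namespace.

    Returns (mapped reference IDs, identifiers without a mapping).
    """
    mapped = {id_map[i] for i in ids if i in id_map}
    unmapped = {i for i in ids if i not in id_map}
    return mapped, unmapped


def venn_regions(named_sets):
    """Group items by the exact combination of sets that contain them."""
    regions = {}
    for item in set().union(*named_sets.values()):
        key = frozenset(
            name for name, members in named_sets.items() if item in members
        )
        regions.setdefault(key, set()).add(item)
    return regions


# Hypothetical identifiers: an external list mapped onto reference IDs.
external_to_reference = {"EXT_1": "101", "EXT_2": "102"}
other_genes, unmapped = to_reference_ids(
    ["EXT_1", "EXT_2", "EXT_3"], external_to_reference
)

regions = venn_regions({"secRecon": {"101", "103"}, "other": other_genes})
```

Tracking the unmapped identifiers separately makes it clear when an apparent lack of overlap comes from a missing ID mapping rather than from a true difference in gene content.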
Multi-omic analysis of CHO and plasma cells using secRecon
Normalized transcriptomic and proteomic data33 were subset for secRecon genes and proteins. Spearman correlation was quantified between all samples for secRecon genes and proteins. Gene Set Variation Analysis35 (GSVA) quantifying secRecon subsystem activity was conducted using the complete transcriptomic and proteomic data. Cosine similarity of gene expression, protein abundance, and the respective GSVA scores was calculated for all 6 cell lines per species (CHO, murine, and human for transcriptomic data; CHO and murine for proteomic data). Hierarchical clustering using average linkage was performed on the cosine similarity matrices to create the respective species dendrograms. Baker's gamma index34, which compares subtree clustering topology, was calculated using the R dendextend package. Correlation plots were generated in R using corrplot.
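The cosine similarity matrices that feed the hierarchical clustering can be illustrated with a short sketch. The sample names and expression vectors are invented, and the function names are hypothetical; the clustering and Baker's gamma steps are not reproduced here.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def similarity_matrix(profiles):
    """Pairwise cosine similarity between named expression profiles."""
    names = list(profiles)
    return {
        a: {b: cosine_similarity(profiles[a], profiles[b]) for b in names}
        for a in names
    }


# Invented profiles: one vector of gene-level values per cell line.
profiles = {
    "line_1": [1.0, 2.0, 0.0],
    "line_2": [2.0, 4.0, 0.0],
    "line_3": [0.0, 0.0, 3.0],
}

similarities = similarity_matrix(profiles)
```

Cosine similarity compares the direction of two profiles and ignores their overall magnitude, so `line_1` and `line_2` count as identical here even though one has twice the signal.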
For network analysis, both the functional and PPI topology networks were filtered to include only genes present in the transcriptomic and proteomic datasets. The networks were further annotated by overlaying gene expression levels onto the network nodes. Node sizes were scaled according to log2 fold-change in gene expression, enabling a visual representation of expression differences between CHO and plasma cells.
Single-cell SEC-seq analysis of human ASCs
Single-cell gene expression and IgG secretion data were preprocessed, normalized, and clustered using scanpy as previously published47. Gene set lists were defined according to the secRecon ontology or GO-BP53 gene sets for additional biological processes. A secRecon subgraph of the protein-protein interaction (PPI) network PCNet70 was created by filtering for edges in the total PCNet graph (NDEx UUID: 4de852d9-9908-11e9-bcaf-0ac135e8bacf) that connect two gene nodes within secRecon. Single-cell secRecon gene set average expression was calculated using scanpy FeatureExpression, and PPI activity was quantified using ORIGINS249 per gene set edge list. Dominance Analysis50,51 was performed using the dominance-analysis Python package with default multiple linear regression parameters. Average expression scores or PPI activity scores per secRecon gene set were used as input features, and normalized secreted IgG counts or pseudotime were used as response variables in Dominance Analysis. secRecon subsystems and genes correlating with secreted IgG or pseudotime were identified using Spearman's rho correlation (|r| > 0.1) and Benjamini-Hochberg multiple test correction (BH p-value < 0.05).
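The final filtering step (|r| > 0.1 and BH-adjusted p-value < 0.05) can be sketched with a plain-Python Benjamini-Hochberg adjustment. The feature names, correlations, and p-values below are invented, and the helper names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def select_hits(names, rhos, pvalues, min_abs_rho=0.1, alpha=0.05):
    """Keep features with |rho| > min_abs_rho and BH-adjusted p < alpha."""
    adjusted = benjamini_hochberg(pvalues)
    return [
        name
        for name, rho, q in zip(names, rhos, adjusted)
        if abs(rho) > min_abs_rho and q < alpha
    ]


# Invented correlation results for four hypothetical gene sets.
names = ["set_1", "set_2", "set_3", "set_4"]
rhos = [0.30, 0.20, -0.05, 0.40]
pvalues = [0.01, 0.04, 0.03, 0.50]

adjusted = benjamini_hochberg(pvalues)
hits = select_hits(names, rhos, pvalues)
```

Applying the effect-size cutoff together with the adjusted p-value avoids reporting correlations that are statistically detectable in thousands of cells but negligible in magnitude.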
To perform topology-based analysis of distinct Leiden clusters, the normalized expression data matrix was subset for secRecon genes differentially expressed between the IgG-secreting population clusters. Pseudobulk groups per cluster were then generated by splitting each cluster into random groups of cells and averaging the expression of each gene per pseudobulk group. As in the previous topological analyses of the bulk transcriptome and proteome dataset for CHO and plasma cells, pairwise correlation of gene expression and Baker's gamma index of gene expression dendrograms were calculated.
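Pseudobulk generation for a single cluster can be sketched as below. The number of groups and the random seed are assumptions (the text does not specify them), the expression values are invented, and the function name is hypothetical.

```python
import random


def pseudobulk(expression, n_groups, seed=0):
    """Split cells into random groups and average each gene per group.

    expression: list of per-cell expression vectors (one value per gene).
    Returns one averaged vector per non-empty group.
    """
    rng = random.Random(seed)
    indices = list(range(len(expression)))
    rng.shuffle(indices)
    # Deal the shuffled cells into n_groups groups of near-equal size.
    groups = [indices[k::n_groups] for k in range(n_groups)]
    n_genes = len(expression[0])
    return [
        [sum(expression[i][g] for i in group) / len(group) for g in range(n_genes)]
        for group in groups
        if group
    ]


# Invented cluster of four cells measured on two genes.
cluster_cells = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
profiles = pseudobulk(cluster_cells, n_groups=2)
```

Averaging over random subsets of cells reduces single-cell dropout noise while still providing several replicate-like profiles per cluster for the correlation and dendrogram analyses.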
Transcription factor enrichment for secRecon genes
Transcription factors regulating secRecon genes were identified by enrichment analysis using Ingenuity Pathway Analysis71 (IPA), ChEA372, and the Lund et al. secretory reconstruction17. Human gene symbols for all 1,127 secRecon genes were input to IPA and ChEA3 using default parameters. Enriched transcription factors from IPA Upstream Regulator analysis were identified by filtering for the "transcriptional regulator" molecule type and an adjusted p-value < 0.05. Enriched transcription factors from ChEA3 analysis were identified by selecting the top 100 ranked factors. As Lund et al. included DNA and protein interaction neighbors in their secretory reconstruction, we intersected their reconstruction with the GO:0003700 "DNA-binding transcription factor activity" gene set53 to filter for transcription factors associated with secretory machinery. The three lists are provided as tables with annotations from the respective databases (Supplementary Tables S4-S6).
Quantifying potential biases in secRecon scoring
RNA-seq TPM expression data for 36 different tissues originating from the GTEx Consortium64,65 were obtained from the Human Protein Atlas (HPA) database. Pairwise Spearman correlations were computed between the expression of individual secRecon genes, averaged for each tissue type, and various metrics from the secRecon curation (Supplementary Table S1), including the number of annotated secRecon ontology terms (counting all associated parent terms), the maximum annotated relevance score, and the mean relevance score.
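As an illustration of the statistic used for this bias check, Spearman's correlation can be computed as the Pearson correlation of average ranks. The expression and annotation values below are invented, and the function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented values: per-gene expression in one tissue vs. annotated term counts.
expression = [5.0, 20.0, 1.0, 40.0]
term_counts = [2, 3, 1, 6]

rho = spearman(expression, term_counts)
```

A rho near zero across tissues would suggest that a curation metric is not simply tracking how highly a gene is expressed.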
Acknowledgements
This work was supported by generous funding from NIGMS (R35 GM119850), NSF (CBET-2030039), Novo Nordisk Foundation (NNF20SA0066621), and Sartorius Stedim.
Author Contributions
H.O.M., J.T., P.D.G., and N.E.L. led the network reconstruction, analyzed data, and wrote the manuscript. J.T. and P.D.G. created the figures and visualizations for the manuscript. I.S., H.B., M.S., C.M.R., C.C.K., N.K., S.S., A.G., Z.L., and A.M. contributed to the network ontology and curation. A.A. and A.R. participated in the network annotation. A.R. and N.E.L. reviewed and edited the manuscript. N.E.L. supervised and secured funding for the project. All authors read and approved the manuscript.
Declaration of generative AI and AI-assisted technologies in the writing process
During the preparation of this work the authors used ChatGPT-4 in a limited manner to improve the readability and language of the manuscript. After using this tool, the authors reviewed and edited the content. The authors take full responsibility for the content of the published article.
Declaration of Interests
H.M. is an employee of Eli Lilly and Company. J.T. is an employee of Amgen, Inc. A.R. and A.A. are employees of Sartorius. N.E.L. is a co-founder of Augment Biologics, Inc. and NeuImmune, Inc. and a board member for CHO Plus, Inc. The remaining authors declare no competing interests.
Data and Code Availability
All data and code supporting the findings of this study are publicly available. The secRecon knowledgebase, annotations, and scripts necessary to reproduce the analyses and figures presented in this manuscript are available in the GitHub repository at https://github.com/LewisLabUCSD/secRecon-Secretory-Pathway-Reconstruction. Additionally, the supplementary data accompanying the code are hosted on Synapse.org and can be accessed via the DOI: https://doi.org/10.7303/syn64026567.
References
1. Gutierrez, J. M. et al. Genome-scale reconstructions of the mammalian secretory pathway predict metabolic costs and limitations of protein secretion. Nat. Commun. 11, 68 (2020).
2. Uhlén, M. et al. Proteomics. Tissue-based map of the human proteome. Science 347, 1260419 (2015).
3. Kondylis, V., Pizette, S. & Rabouille, C. The early secretory pathway in development: a tale of proteins and mRNAs. Semin. Cell Dev. Biol. 20, 817-827 (2009).
4. Vázquez-Martínez, R. et al. Revisiting the regulated secretory pathway: from frogs to human. Gen. Comp. Endocrinol. 175, 1-9 (2012).
5. Stefan, C. J. et al. Membrane dynamics and organelle biogenesis-lipid pipelines and vesicular carriers. BMC Biol. 15, 102 (2017).
6. Barlowe, C. K. & Miller, E. A. Secretory protein biogenesis and traffic in the early secretory pathway. Genetics 193, 383-410 (2013).
7. Hou, J., Tyo, K., Liu, Z., Petranovic, D. & Nielsen, J. Engineering of vesicle trafficking improves heterologous protein secretion in Saccharomyces cerevisiae. Metab. Eng. 14, 120-127 (2012).
8. Schwanhäusser, B. et al. Global quantification of mammalian gene expression control. Nature 473, 337-342 (2011).
9. Nielsen, J. Production of biopharmaceutical proteins by yeast: advances through metabolic engineering. Bioengineered 4, 207-211 (2013).
10. Keller, M., Rüegg, A., Werner, S. & Beer, H.-D. Active caspase-1 is a regulator of unconventional protein secretion. Cell 132, 818-831 (2008).
11. Hansen, U. et al. A secreted variant of cartilage oligomeric matrix protein carrying a chondrodysplasia-causing mutation (p.H587R) disrupts collagen fibrillogenesis. Arthritis Rheum. 63, 159-167 (2011).
12. Huber, R. J. Altered protein secretion in Batten disease. Dis. Model. Mech. 14, (2021).
13. Lu, P. J. et al. Mutations alter secretion of fukutin-related protein. Biochim. Biophys. Acta 1802, 253-258 (2010).
14. Braakman, I. & Bulleid, N. J. Protein folding and modification in the mammalian endoplasmic reticulum. Annu. Rev. Biochem. 80, 71-99 (2011).
15. Guerriero, C. J. & Brodsky, J. L. The delicate balance between secreted protein folding and endoplasmic reticulum-associated degradation in human physiology. Physiol. Rev. 92, 537-576 (2012).
16. Li, F. et al. Improving recombinant protein production by yeast through genome-scale modeling using proteome constraints. Nat. Commun. 13, 2969 (2022).
17. Lund, A. M. et al. Network reconstruction of the mouse secretory pathway applied on CHO cell transcriptome data. BMC Syst. Biol. 11, 37 (2017).
18. Feizi, A., Gatto, F., Uhlen, M. & Nielsen, J. Human protein secretory pathway genes are expressed in a tissue-specific pattern to match processing demands of the secretome. NPJ Syst Biol Appl 3, 22 (2017).
19. Feizi, A., Österlund, T., Petranovic, D., Bordel, S. & Nielsen, J. Genome-scale modeling of the protein secretory machinery in yeast. PLoS One 8, e63284 (2013).
20. Narimatsu, H. Construction of a human glycogene library and comprehensive functional analysis. Glycoconj. J. 21, 17-24 (2004).
21. Giurgiu, M. et al. CORUM: the comprehensive resource of mammalian protein complexes 2019. Nucleic Acids Res 47, D559-D563 (2018).
22. The UniProt Consortium et al. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res 51, D523-D531 (2022).
23. Szklarczyk, D. et al. The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res 51, D638-D646 (2023).
24. Hein, M. Y. et al. Global organelle profiling reveals subcellular localization and remodeling at proteome scale. bioRxiv 2023.12.18.572249 (2023) doi:10.1101/2023.12.18.572249.
25. Fruchterman, T. M. J. & Reingold, E. M. Graph drawing by force-directed placement. Softw. Pract. Exp. 21, 1129-1164 (1991).
26. Morre, D. J. & Mollenhauer, H. H. The Golgi Apparatus: The First 100 Years. (Springer, New York, NY, 2009). doi:10.1007/978-0-387-74347-9.
27. Finley, D. Recognition and processing of ubiquitin-protein conjugates by the proteasome. Annu. Rev. Biochem. 78, 477-513 (2009).
28. Berggård, T., Linse, S. & James, P. Methods for the detection and analysis of protein-protein interactions. Proteomics 7, 2833-2842 (2007).
29. Bensimon, A., Heck, A. J. R. & Aebersold, R. Mass spectrometry-based proteomics and network biology. Annu. Rev. Biochem. 81, 379-405 (2012).
30. Walsh, G. Biopharmaceutical benchmarks 2018. Nat. Biotechnol. 36, 1136-1145 (2018).
31. Park, S.-Y. et al. Driving towards digital biomanufacturing by CHO genome-scale models. Trends Biotechnol. 0, (2024).
32. Cenci, S. & Sitia, R. Managing and exploiting stress in the antibody factory. FEBS Lett. 581, 3652-3657 (2007).
33. Raab, N. et al. Nature as blueprint: Global phenotype engineering of CHO production cells based on a multi-omics comparison with plasma cells. Metab. Eng. 83, 110-122 (2024).
34. Baker, F. B. Stability of two hierarchical grouping techniques case I: Sensitivity to data errors. J. Am. Stat. Assoc. 69, 440-445 (1974).
35. Hänzelmann, S., Castelo, R. & Guinney, J. GSVA: gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics 14, 115 (2013).
36. Moremen, K. W., Tiemeyer, M. & Nairn, A. V. Vertebrate protein glycosylation: diversity, synthesis and function. Nat. Rev. Mol. Cell Biol. 13, 448-462 (2012).
37. Breton, C., Šnajdrová, L., Jeanneau, C., Koča, J. & Imberty, A. Structures and mechanisms of glycosyltransferases. Glycobiology 16, 29R-37R (2005).
38. Hetz, C., Zhang, K. & Kaufman, R. J. Mechanisms, regulation and functions of the unfolded protein response. Nat. Rev. Mol. Cell Biol. 21, 421-438 (2020).
39. Rapoport, T. A., Li, L. & Park, E. Structural and Mechanistic Insights into Protein Translocation. Annu. Rev. Cell Dev. Biol. 33, (2017).
40. Shapiro-Shelef, M. & Calame, K. Regulation of plasma-cell development. Nat. Rev. Immunol. 5, 230-242 (2005).
41. Shi, W. et al. Transcriptional profiling of mouse B cell terminal differentiation defines a signature for antibody-secreting plasma cells. Nat. Immunol. 16, 663-673 (2015).
42. Gaudette, B. T., Jones, D. D., Bortnick, A., Argon, Y. & Allman, D. mTORC1 coordinates an immediate unfolded protein response-related transcriptome in activated B cells preceding antibody secretion. Nat. Commun. 11, 723 (2020).
43. King, H. W. et al. Single-cell analysis of human B cell maturation predicts how antibody class switching shapes selection dynamics. Sci Immunol 6, (2021).
44. Duan, M. et al. Understanding heterogeneity of human bone marrow plasma cell maturation and survival pathways by single-cell analyses. Cell Rep. 42, 112682 (2023).
45. Trezise, S. et al. An arrayed CRISPR screen of primary B cells reveals the essential elements of the antibody secretion pathway. Front. Immunol. 14, 1089243 (2023).
46. Xiong, E. et al. A CRISPR/Cas9-mediated screen identifies determinants of early plasma cell differentiation. Front. Immunol. 13, 1083119 (2022).
47. Cheng, R. Y.-H. et al. SEC-seq: association of molecular signatures with antibody secretion in thousands of single human plasma cells. Nat. Commun. 14, 3567 (2023).
48. Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573-3587.e29 (2021).
49. Senra, D., Guisoni, N. & Diambra, L. Cell annotation using scRNA-seq data: A protein-protein interaction network approach. MethodsX 10, 102179 (2023).
50. Luo, W. & Azen, R. Determining Predictor Importance in Hierarchical Linear Models Using Dominance Analysis. J. Educ. Behav. Stat. 38, 3-31 (2013).
51. Budescu, D. V. Dominance analysis: A new approach to the problem of relative importance of predictors in multiple regression. Psychol. Bull. 113, 542-551 (1993).
52. Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25-29 (2000).
53. Gene Ontology Consortium et al. The Gene Ontology knowledgebase in 2023. Genetics 224, (2023).
54. Bertolotti, M. et al. B- to plasma-cell terminal differentiation entails oxidative stress and profound reshaping of the antioxidant responses. Antioxid. Redox Signal. 13, 1133-1144 (2010).
55. Preisendörfer, S. et al. FK506-Binding Protein 11 Is a Novel Plasma Cell-Specific Antibody Folding Catalyst with Increased Expression in Idiopathic Pulmonary Fibrosis. Cells 11, (2022).
56. Kunert, R. & Reinhart, D. Advances in recombinant antibody manufacturing. Appl. Microbiol. Biotechnol. 100, 3451-3461 (2016).
57. Reimold, A. M. et al. Plasma cell differentiation requires the transcription factor XBP-1. Nature 412, 300-307 (2001).
58. Nutt, S. L., Hodgkin, P. D., Tarlinton, D. M. & Corcoran, L. M. The generation of antibody-secreting plasma cells. Nature Reviews Immunology 15, 160-171 (2015).
59. van Anken, E. & Braakman, I. Endoplasmic reticulum stress and the making of a professional secretory cell. Crit. Rev. Biochem. Mol. Biol. 40, 269-283 (2005).
60. Schinn, S.-M., Morrison, C., Wei, W., Zhang, L. & Lewis, N. E. A genome-scale metabolic network model and machine learning predict amino acid concentrations in Chinese Hamster Ovary cell cultures. Biotechnol. Bioeng. 118, 2118-2123 (2021).
61. Systematic evaluation of parameters for genome-scale metabolic models of cultured mammalian cells. Metab. Eng. 66, 21-30 (2021).
62. Gopalakrishnan, S. et al. COSMIC-dFBA: A novel multi-scale hybrid framework for bioprocess modeling. Metab. Eng. 82, 183-192 (2024).
63. Luck, K. et al. A reference map of the human binary protein interactome. Nature 580, 402-408 (2020).
64. GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318-1330 (2020).
65. GTEx Consortium et al. Genetic effects on gene expression across human tissues. Nature 550, 204-213 (2017).
66. Sayers, E. A General Introduction to the E-utilities. in Entrez Programming Utilities Help [Internet] (National Center for Biotechnology Information (US), 2022).
67. Exploring Network Structure, Dynamics, and Function Using Networkx. (2008).
68. Traag, V. A., Waltman, L. & van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci Rep 9, 5233 (2019).
69. Lex, A., Gehlenborg, N., Strobelt, H., Vuillemot, R. & Pfister, H. UpSet: Visualization of intersecting sets. IEEE Trans. Vis. Comput. Graph. 20, 1983-1992 (2014).
70. Huang, J. K. et al. Systematic Evaluation of Molecular Networks for Discovery of Disease Genes. Cell Syst 6, 484-495.e5 (2018).
71. Krämer, A., Green, J., Pollard, J., Jr & Tugendreich, S. Causal analysis approaches in Ingenuity Pathway Analysis. Bioinformatics 30, 523-530 (2014).
72. Keenan, A. B. et al. ChEA3: transcription factor enrichment analysis by orthogonal omics integration. Nucleic Acids Res. 47, W212-W224 (2019).
Supplementary Figures
Supplementary Figure 1. secRecon is a manually curated knowledgebase of the mammalian secretory pathway. (A) Previous secretory pathway reconstructions (Feizi18 N=575, Lund17 N=793, and Gutierrez1 N=271) were integrated and more extensively annotated to build secRecon. In addition, genes involved in glycosylation were added from GGDB20. (B) The draft was subsequently refined and expanded via manual curation. Each individual gene was assigned to one or more specific processes within our secretory pathway ontology, along with its relevance score and a functional description backed by literature survey. (C) Additional annotations were integrated from various databases, including mapping gene symbols to aliases, Ensembl IDs, Entrez IDs, gene names, and UniProt IDs, and identifying orthologs, subcellular localizations, protein complexes, and interaction partners (see Materials and Methods).
Supplementary Figure 2. Elbow Plot Analysis of Community Detection in Functional and PPI Network Topologies: Elbow plots illustrating the results of Louvain community detection across a range of resolution parameters for (A) the functional topology network and (B) the protein-protein interaction (PPI) topology network. For each network, three metrics are plotted against the resolution parameter: modularity (blue line), normalized mutual information (NMI) with subcellular localization categories (red line), and NMI with secRecon system categories (green line). Modularity measures the strength of the community structure in the network, with higher values indicating more defined communities. The NMI scores quantify how well the detected community structures align with known biological attributes, either subcellular localization or functional system categories.
Supplementary Figure 3. Network-based Proteomics Analysis of Plasma Cells vs CHO Cells: Log2 fold changes in protein levels between plasma and CHO cells were calculated, and proteins from the proteomics dataset were overlaid with secRecon genes. Differentially expressed proteins were visualized in separate network plots for upregulated (A and C) and downregulated (B and D) proteins. Panels A and B display protein levels in a network layout based on the secRecon system ontology, where nodes are colored according to the major secretory pathway systems (left) or subcellular localization (right) they belong to. Panels C and D show the same sets of proteins plotted in a network representation based on protein-protein interactions, with nodes colored according to the major secretory pathway systems (left) or subcellular localization (right) they belong to. The size of each node corresponds to the magnitude of fold change in expression.
Supplementary Figure 4. Differentially enriched secRecon processes between plasma and
CHO cells: Log fold changes of secretory machinery annotated under select processes
significantly upregulated in both the transcript and protein enrichments were overlaid
on PPI networks for (C) transcripts or (D) proteins. The size of each node corresponds
to the magnitude of fold change in expression in plasma vs CHO cells (bigger nodes
indicate higher expression in plasma cells), while the color of each node indicates the
process to which it belongs. Nodes associated with other secRecon processes are colored
in gray.

Supplementary Figure 5. Identifying differentially expressed secRecon genes in plasma
cells for future engineering and characterization in CHO cells: Differentially expressed
genes and differentially abundant proteins [33] annotated within secRecon (N=112) were
plotted according to their respective log2 fold changes in plasma cells relative to CHO
cells. Genes are colored by the differentially enriched secRecon processes identified in
Figure 6, or gray otherwise. Boxed text labels indicate genes within these enriched
processes that are differentially expressed in the same direction at both the transcript
and protein levels (e.g., positive fold-change above 1.5 or negative fold-change below
-1.5 in both datasets) in plasma cells.

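The labeling rule in this caption can be sketched as a simple concordance filter. The code below is an illustration under stated assumptions: the gene names and values are hypothetical, the function name is invented for this example, and the 1.5 cutoff is assumed to apply to the plotted log2 fold-change values.

```python
def concordant_genes(transcript_lfc, protein_lfc, threshold=1.5):
    """Return genes whose fold changes pass the cutoff with the same sign in both datasets.

    transcript_lfc, protein_lfc: dicts mapping gene -> log2 fold change
    (plasma cells relative to CHO cells). Genes missing from either dataset are skipped.
    """
    hits = {}
    for gene in transcript_lfc.keys() & protein_lfc.keys():
        t, p = transcript_lfc[gene], protein_lfc[gene]
        if t > threshold and p > threshold:
            hits[gene] = "up"
        elif t < -threshold and p < -threshold:
            hits[gene] = "down"
    return hits


# Hypothetical values for illustration only (not taken from the study).
transcript_lfc = {"GENE_A": 2.3, "GENE_B": 1.9, "GENE_C": -2.0, "GENE_D": 0.4}
protein_lfc = {"GENE_A": 1.8, "GENE_B": 0.7, "GENE_C": -1.6, "GENE_E": 3.0}

labels = concordant_genes(transcript_lfc, protein_lfc)
for gene in sorted(labels):
    print(gene, labels[gene])
```

Here GENE_B is excluded because it passes the cutoff only at the transcript level, and GENE_D and GENE_E are excluded because each is missing from one dataset or falls below the cutoff.
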
Supplementary Figure 6. Identifying secretory pathway signatures underlying single-cell
IgG secretion in plasma cells: (A) SEC-seq links single-cell IgG secretion to the
transcriptome for a diverse population of B cells [47]. (B) Dominance analysis identified
secRecon genesets and biological processes that explain the variation in single-cell IgG
secretion for the IgG-secreting population. (Top) secRecon genesets explain 10.9% of the
variation in secreted IgG concentration, while (bottom) additional cellular processes
such as IgG gene transcription, cytoplasmic translation (GO-BP), and oxidative
phosphorylation (GO-BP) increase the amount of variation explained for secreted IgG.
(C) Fold change in the relative importance of these additional processes relative to
secRecon geneset activity in predicting IgG concentration. (D) secRecon subsystem
activity and (E) gene expression were correlated with secreted IgG, with the highest
correlates labeled.

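To illustrate how dominance analysis apportions explained variance, the sketch below computes general dominance: each predictor's incremental R² averaged over all subsets of the other predictors. It is a small pure-Python illustration, not the implementation used for the figure. The predictor names and values are hypothetical, and the ordinary least squares fit assumes the predictors are not collinear.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(columns, y):
    """R^2 of an ordinary least squares fit with intercept (0.0 for no predictors)."""
    if not columns:
        return 0.0
    n = len(y)
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    xc = []
    for col in columns:
        mean = sum(col) / n
        xc.append([v - mean for v in col])
    gram = [[sum(p * q for p, q in zip(ci, cj)) for cj in xc] for ci in xc]
    xty = [sum(p * q for p, q in zip(ci, yc)) for ci in xc]
    beta = solve(gram, xty)
    return sum(b * v for b, v in zip(beta, xty)) / sum(v * v for v in yc)


def general_dominance(predictors, y):
    """Average incremental R^2 of each predictor across all subset sizes."""
    names = list(predictors)
    k = len(names)
    result = {}
    for name in names:
        others = [n for n in names if n != name]
        per_size = []
        for size in range(k):
            gains = []
            for subset in combinations(others, size):
                base = r_squared([predictors[s] for s in subset], y)
                full = r_squared([predictors[s] for s in subset + (name,)], y)
                gains.append(full - base)
            per_size.append(sum(gains) / len(gains))
        result[name] = sum(per_size) / k
    return result


# Hypothetical per-cell scores and secretion values, for illustration only.
predictors = {
    "secretory_score": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "igg_transcription": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "oxphos": [1.0, 0.0, 0.0, 1.0, 1.0, 0.0],
}
y = [1.2, 1.9, 3.5, 3.9, 5.8, 5.7]

importance = general_dominance(predictors, y)
total = sum(importance.values())
for name, value in importance.items():
    print(f"{name}: {value:.3f} ({100 * value / total:.1f}% of explained variance)")
```

A useful property of general dominance is that the per-predictor values sum to the R² of the full model, which is what allows each predictor's contribution to be reported as a share of the explained variation.
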
Supplementary Figure 7. Identifying potential biases in secRecon curation and process
scoring with physiological gene expression: secRecon gene expression was averaged for
each tissue type in the Genotype-Tissue Expression (GTEx) Project and correlated against
secRecon features: max secRecon process confidence score, mean secRecon process
confidence score, and number of annotated secRecon processes. Spearman correlation
coefficients of all pairwise tests are compiled in a heatmap.

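The pairwise tests described above can be sketched as a tissue-by-feature matrix of Spearman coefficients. The example below is a pure-Python illustration with hypothetical tissue names, expression values, and feature values. It computes Spearman's rho as the Pearson correlation of average ranks and assumes no input vector is constant.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient (inputs must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed values."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: one value per gene, with the same gene order in every list.
tissue_expression = {
    "tissue_1": [5.0, 3.0, 8.0, 1.0, 6.0],
    "tissue_2": [2.0, 7.0, 4.0, 9.0, 1.0],
}
features = {
    "max_confidence": [3, 2, 3, 1, 2],
    "mean_confidence": [2.5, 1.5, 3.0, 1.0, 2.0],
    "n_processes": [3, 2, 5, 1, 4],
}

# Rows are tissues and columns are secRecon features, as in a heatmap.
heatmap = {
    tissue: {name: spearman(expr, vals) for name, vals in features.items()}
    for tissue, expr in tissue_expression.items()
}
for tissue, row in heatmap.items():
    print(tissue, {name: round(rho, 2) for name, rho in row.items()})
```

Coefficients near zero across tissues would suggest that curation depth and confidence scoring are not strongly tied to expression level, whereas consistently strong correlations would point to a potential bias.
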