Using the Human Plasma PeptideAtlas to study human plasma proteins.
ABSTRACT PeptideAtlas is a web-accessible database of LC-MS/MS shotgun proteomics results from hundreds of experiments conducted in diverse laboratories, with all data processed via a uniform analysis pipeline. A total of 91 experiments on human plasma and serum are included. Using the PeptideAtlas web interface, users can browse and search the Human Plasma PeptideAtlas for identified peptides and identified proteins, view spectra, and select proteotypic peptides. Users can easily view supporting information such as chromosomal mapping, estimated abundances, and sequence alignments. Herein, the reader is instructed in the use of the Human Plasma PeptideAtlas through an illustrated exploration of cytokine receptors in plasma.
Running head: Using the Human Plasma PeptideAtlas
Methods in Molecular Biology
Terry Farrah1*, Eric Deutsch1 & Ruedi Aebersold2
1 Institute for Systems Biology, 1441 N 34th St., Seattle, WA 98103, USA
2 Institute of Molecular Systems Biology, ETH Zurich, Wolfgang-Pauli-Str. 16, 8093,
* To whom correspondence should be addressed. firstname.lastname@example.org.
PeptideAtlas is a web-accessible database of LC-MS/MS shotgun proteomics results from
hundreds of experiments conducted in diverse laboratories. Ninety-one experiments on
human plasma and serum are included in the subsection, or “build”, named the Human
Plasma PeptideAtlas. Using the PeptideAtlas web interface, users can browse and search
identified peptides and identified proteins, view spectra, and select proteotypic peptides.
Users can easily view auxiliary information such as chromosomal mapping, sequence
alignments, and much more. Herein, the reader is instructed in the use of the Human
Plasma PeptideAtlas through an illustrated example.
proteomics, plasma, blood, serum, peptide, database, web server, computer application
Shotgun proteomics using LC-MS/MS (liquid chromatography-tandem mass
spectrometry) is currently the most powerful tool available for discovering proteins
present in human plasma (1). As the technique develops, more and more proteins can be
identified in a single experiment. To compile the most comprehensive list of human
plasma proteins, one approach is to collect all proteins identified in various individual
proteomics experiments, as was done by HUPO (the Human Proteome Organization (2))
via the Human Plasma Proteome Project (HPPP) in 2003-2005 (3). In HPPP Phase I,
protein identifications from 18 laboratories were combined, and all proteins identified by
at least two laboratories were used to generate a list of 3020 proteins (4).
One deficiency of this approach is that each laboratory interprets its data in its own
manner, resulting in protein lists that are not comparable. HPPP Phase II (5) aims to
address this by collecting raw data, rather than protein identifications. The PeptideAtlas
project (6), a key participant in this effort, collects raw data from experiments conducted
in many different laboratories (including HPPP data) and processes them using a
common computational pipeline. To date, PeptideAtlas has collected and interpreted 91
experiments from human plasma, yielding over 3 million identified spectra and 20,709
distinct peptides, which provide evidence for at least 2170 different proteins
(7). While HPPP Phase I resulted in more protein identifications, the PeptideAtlas protein
identifications have a higher confidence, with a false discovery rate (FDR) of 1%.
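To make the 1% FDR concrete: when each identification carries a probability of being correct, as PeptideProphet and ProteinProphet assign (see Section 2), one common model-based estimate of the FDR among accepted identifications is the sum of (1 - p) over those identifications divided by their number. The Python sketch below illustrates that estimate and how a probability cutoff might be chosen to meet a target FDR. It assumes this probability-based estimator; the function names and example probabilities are hypothetical, and this is not the code PeptideAtlas uses to compute its FDRs.

```python
from __future__ import annotations


def estimated_fdr(probabilities: list[float], cutoff: float) -> float:
    """Estimated FDR among identifications whose probability is >= cutoff.

    Each accepted identification with probability p contributes (1 - p)
    expected false positives; dividing by the number accepted gives the FDR.
    """
    accepted = [p for p in probabilities if p >= cutoff]
    if not accepted:
        return 0.0
    return sum(1.0 - p for p in accepted) / len(accepted)


def cutoff_for_fdr(probabilities: list[float], max_fdr: float) -> float | None:
    """Lowest observed probability cutoff whose estimated FDR stays <= max_fdr."""
    best = None
    # Walk from the strictest cutoff to the loosest; lowering the cutoff only
    # adds lower-probability identifications, so the estimate never decreases.
    for cutoff in sorted(set(probabilities), reverse=True):
        if estimated_fdr(probabilities, cutoff) > max_fdr:
            break
        best = cutoff
    return best


# Hypothetical probabilities for five identifications.
probs = [0.99, 0.98, 0.95, 0.60, 0.20]
print(cutoff_for_fdr(probs, 0.05))  # 0.95
```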
PeptideAtlas is accessible to the public via a web interface. For human plasma (as well as
the proteomes and subproteomes of a number of other species), it provides a large database of
proteins, peptides and spectra, with supporting data such as probabilities, FDRs, genome
mappings, sequence alignments, uniqueness of peptide-protein
mappings, observability of peptides, predicted observable peptides, estimated protein
abundances, and cross-references to other databases. The PeptideAtlas also provides
many useful methods for accessing the data; the user may search by protein or peptide, or
may construct a query to retrieve proteins or peptides with certain characteristics.
This chapter provides an introduction to using the Human Plasma PeptideAtlas by
guiding the reader through an illustrated example.
2. PeptideAtlas Construction
First, we provide a description of how data are added to PeptideAtlas, illustrated in Figure
1. PeptideAtlas is organized into various builds, each encompassing data from a single
proteome or subproteome. Each build begins with raw LC-MS/MS spectra contributed by
the community. Data can be deposited into one of several repositories or sent directly to
the PeptideAtlas project. We then search these spectra against a sequence database (8), a
spectral library (9), or both. Each search assigns a peptide identification and score to each
spectrum. Search results are mapped to a comprehensive reference protein database (for
human builds, this is a combination of Swiss-Prot (10), Ensembl (11) and IPI (12)), and
post-processed using the Trans-Proteomic Pipeline (13), a suite of software tools
developed at the Institute for Systems Biology that assigns a probability of being correct
to each peptide-spectrum match (PSM), each distinct peptide identification, and each protein
identification. The Trans-Proteomic Pipeline includes the tools PeptideProphet (14),
InterProphet (15), and ProteinProphet (16). Finally, other software processing tools are
applied to store PSM, peptide and protein identifications in PeptideAtlas, and to generate
and store supporting data such as genome mappings and predicted proteotypic peptides.
The atlas build is then made available to the community.
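The way peptide-level evidence can be turned into a protein-level probability is illustrated here with a simplified model: take the best probability for each distinct peptide, then compute the probability that at least one of them is correct, 1 - ∏(1 - p_i), treating peptides as independent. This is a minimal sketch of that basic idea only, not ProteinProphet itself; the actual tool additionally adjusts peptide probabilities (for example, for the number of sibling peptides) and apportions peptides shared among several proteins. The function names and the example peptides and probabilities are hypothetical.

```python
from __future__ import annotations

from math import prod


def best_peptide_probabilities(psms: list[tuple[str, float]]) -> dict[str, float]:
    """Keep the highest PSM probability for each distinct peptide sequence."""
    best: dict[str, float] = {}
    for peptide, probability in psms:
        best[peptide] = max(probability, best.get(peptide, 0.0))
    return best


def protein_probability(peptide_probs: dict[str, float]) -> float:
    """Probability that at least one of the protein's peptides is correct,
    treating peptides as independent: 1 - prod(1 - p_i)."""
    return 1.0 - prod(1.0 - p for p in peptide_probs.values())


# Hypothetical PSMs (peptide sequence, probability) for one protein.
psms = [("PEPTIDEAK", 0.90), ("PEPTIDEAK", 0.70), ("SAMPLEPEPR", 0.50)]
peptides = best_peptide_probabilities(psms)
print(round(protein_probability(peptides), 3))  # 0.95
```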
Figure 1: PeptideAtlas is built from data provided by the community, and is itself a
community resource. Data can be submitted to one of several repositories, or sent
directly to the PeptideAtlas project. The data are then processed via a uniform
pipeline that includes searching, validation by the Trans-Proteomic Pipeline and
post-processing. The results, as well as the raw data, are stored in a database and
made accessible via a web interface (this figure first appeared in (17)).
PeptideAtlas and its web interface are continually under development, and data are
constantly being added. Thus, what the reader sees when using PeptideAtlas may not
always exactly match what is described in this tutorial.
3. Using PeptideAtlas
3.1 PeptideAtlas web interface
Go to www.peptideatlas.org. The front page provides basic search functionality and
displays PeptideAtlas news. To access the full functionality of PeptideAtlas, click GO,
without typing anything in the search box. You will see the page shown in Figure 2.
Figure 2: PeptideAtlas web interface. Near the top of the display are four tabs
representing four categories of functionality. Search functionality is shown. Users
may enter a protein accession, gene name, protein description, peptide accession, or
peptide string to quickly access all matching peptides in all of the latest PeptideAtlas
builds or in a particular build.
Near the top are four gray tabs representing four categories of functionality:
Search: Keyword search within a single build or across all the most current builds.
All Builds: Allows the user to view all available builds, to select a particular build for
functions under the Current Build tab, and to see a peptide’s presence across all builds.
Current Build: Allows the user to obtain information on a specific peptide or protein in
the currently selected build.
Queries: Allows the user to retrieve information on a set of peptides, proteins, or
transitions that satisfy user-specified criteria.
To see the functions that are available for each tab, place the cursor over the tab. Many of
these options will be reviewed in detail later in the tutorial.
In the left sidebar are links that will take you to auxiliary pages within PeptideAtlas. Here
are descriptions of some key pages:
Overview: Description of how PeptideAtlas is constructed.
Publications: What to cite if you use PeptideAtlas in your research.
Data Repository: Links for downloading much of the raw data used to construct PeptideAtlas.
HPPP Data Central: Background and links for the HUPO Human Plasma Proteome Project.
PeptideAtlas Builds: Lists and download links for all available PeptideAtlas builds.
Search Database: A conveniently accessible search bar with the same functionality as the
Search tab (described below).
Contribute Data: How to contribute your own data to the PeptideAtlas project.
Libraries + Info: Access to spectral libraries created from data in PeptideAtlas builds as
well as libraries from the National Institute of Standards and Technology (NIST) (18) and
links to other spectral library search resources. You can download the libraries available
here and use them to perform spectral library searches of your own data.
SpectraST Search: A web server allowing you to perform a spectral library search using
the SpectraST software (9).
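Spectral library searching, available both through the downloadable libraries and the SpectraST Search page, compares each query spectrum with previously identified consensus spectra. A central ingredient of SpectraST scoring is a normalized dot product between binned spectra. The sketch below shows only that ingredient, under simplifying assumptions (1-m/z-unit bins, square-root intensity scaling, no precursor m/z filtering, no additional score components); it is not SpectraST's implementation, and the function names and spectra are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def binned_vector(peaks: List[Peak], bin_width: float = 1.0) -> Dict[int, float]:
    """Sum square-root-scaled intensities into fixed-width m/z bins."""
    vector: Dict[int, float] = {}
    for mz, intensity in peaks:
        index = int(mz // bin_width)
        vector[index] = vector.get(index, 0.0) + math.sqrt(intensity)
    return vector


def dot_product(query: List[Peak], reference: List[Peak], bin_width: float = 1.0) -> float:
    """Normalized dot product (cosine similarity) of two binned spectra, 0..1."""
    a = binned_vector(query, bin_width)
    b = binned_vector(reference, bin_width)
    numerator = sum(value * b.get(index, 0.0) for index, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return numerator / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_library_match(query: List[Peak], library: Dict[str, List[Peak]]) -> Tuple[str, float]:
    """Return the library peptide whose spectrum scores highest against the query."""
    scores = {peptide: dot_product(query, spectrum) for peptide, spectrum in library.items()}
    peptide = max(scores, key=scores.get)
    return peptide, scores[peptide]


# Hypothetical library of two consensus spectra and one query spectrum.
library = {
    "PEPTIDEAK": [(100.4, 4.0), (200.1, 9.0)],
    "SAMPLEPEPR": [(150.0, 16.0)],
}
query = [(100.2, 4.0), (200.7, 9.0)]
print(best_library_match(query, library))
```

Here the query falls into the same two bins as the PEPTIDEAK entry, so that entry is returned with a score of (essentially) 1.0, while the SAMPLEPEPR entry shares no bins and scores 0.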
After exploring these links, click Search Database in the left navigation bar to return to
the Search functionality.
3.2 Search for a protein by keyword
Using the Search functionality, you can enter a protein accession, a peptide sequence, a
gene name, or a keyword or phrase, and retrieve links to all matching proteins. You can
search one build or all current builds.
To illustrate the use of PeptideAtlas, we will investigate the presence of cytokine
receptors in plasma. Cytokines are signaling molecules that are secreted by specific cells
of the immune system and interact with receptors on the surfaces of other cells, thereby
initiating various responses. Cytokine receptors perform their primary functions while
embedded in the surfaces of cells. However, under some conditions, the extracellular
portions of cytokine receptors are shed and may then perform a variety of secondary functions.
More generally, it has been hypothesized that nearly every human protein will appear in
plasma under some conditions.
Click the Build type dropdown menu. You will see a list of all current builds, including
two human plasma builds. The more encompassing of these is named “Human Plasma”.
Select Human Plasma, click Tabular Results, click the + (plus) sign next to Advanced
Search, and check Protein/Gene Name. Finally, to search for cytokine receptor names
that contain the word “interleukin”, type interleukin in the search box and click GO.
Note that the results include all proteins in the reference database that match your search
criteria, not only those actually observed in this atlas.
The rightmost column, N Peptide obs, lists the number of observations of each protein.
To focus on observed proteins, perform a descending sort on the last column by clicking
the downward-pointing gray triangle.
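The web page performs this sort for you. If you instead save the tabular results to a file for offline work, the same filtering can be scripted. A minimal sketch, assuming the table has been saved as tab-separated text whose header includes the column N Peptide obs and a Protein column; the file format, the Protein column name, and the placeholder identifiers are assumptions, not a documented PeptideAtlas export.

```python
import csv
import io
from typing import Dict, List


def observed_proteins(tsv_text: str, count_column: str = "N Peptide obs") -> List[Dict[str, str]]:
    """Drop rows with no observations and sort the rest by count, descending."""
    rows = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        raw = (row.get(count_column) or "").replace(",", "").strip()
        count = int(raw) if raw.isdigit() else 0  # blank or non-numeric -> unobserved
        if count > 0:
            rows.append((count, row))
    rows.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in rows]


# Hypothetical export with placeholder protein identifiers.
export = "Protein\tN Peptide obs\nPROT_A\t3\nPROT_B\t\nPROT_C\t10\n"
print([row["Protein"] for row in observed_proteins(export)])  # ['PROT_C', 'PROT_A']
```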
3.3 Protein View
The third most frequently observed interleukin receptor protein in the Human Plasma
PeptideAtlas is Interleukin-6 receptor subunit beta (IL6ST). Click the identifier link
(P40189-2) for that result. You will be taken to the PeptideAtlas Protein View (Figure 3).
Note that a different primary tab, Current Build, is now highlighted, and that the
secondary tab, Protein, is selected.
For each protein in the reference proteome for a given build, a dynamic Protein View
page summarizes the information available for that protein. The page is segmented into
several collapsible sections that can be easily minimized by clicking the small icon in the
orange section header.
Figure 3: Protein View, top section. For any protein included in a particular
PeptideAtlas build, users may retrieve a page such as this displaying alternative
names (most of which are clickable hyperlinks to external databases), the full name
of the protein, and the number of distinct peptides and spectra (observations) that
map to this protein within this build. The Protein View provides additional
information further down on the page, as shown in Figures 4 and 5.
The top section provides basic information about the protein, including alternative names
(most of which are hyperlinked to external databases), as well as the total number of
spectra (observations) and distinct peptides that map to the protein. To learn more about
interleukin-6 receptor subunit beta, click the second UniProt link (P40189-2). Here, we
see that this protein is also known by several other names, including the commonly used
name gp130. For the purpose of this tutorial, we will refer to it as IL-6R-beta. The
UniProt page also describes the structure and function of this molecule and provides
literature references. We see that Isoform 1 is a membrane protein, whereas the smaller
Isoform 2 (P40189-2) is secreted. Because the protein we are examining lists P40189-2 as
its synonym, we know it is the smaller, secreted isoform. According to this page, IL-6R-
beta is a signal-transducing molecule participating in the receptor systems for IL6 and
several other cytokines.
Going back to the top section of the Protein View page (Figure 3), we see that two
distinct peptides mapping to IL-6R-beta have been identified from a total of 10 observations
(spectra). Just below is an External Links section that takes you to other peptide and
protein atlases: the Human Protein Atlas, which lists antibodies available for the protein,
and the Global Proteome Machine, which is another database that collects and displays
MS/MS search results to the community.
Figure 4: More details from the Protein View page. The Sequence Motifs section
displays a linear representation of the protein, with differently shaded bars
depicting observed peptides and peptides unlikely to ever be observed due to small
The following two sections, Sequence Motifs and Sequence, summarize the peptide
coverage of the protein (Figure 4). A graphical diagram, similar to a genome browser
view, summarizes all the peptides that map either uniquely or redundantly to the protein,
plus information on segments unlikely to be observed with mass spectrometers, as well as
signal peptides and transmembrane domains, where available. The observed peptides are
highlighted in red in the actual protein sequence. We see that for IL-6R-beta, the two
observed peptides are in the C-terminal half of the chain, covering only 12.7% of the
likely observable sequence. It is not surprising that the protein coverage is so low, given
that we expect cytokine receptor proteins to be present at very low abundance in plasma,
and only under certain conditions.
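The coverage figure quoted above is the fraction of residues that fall within at least one observed peptide. A minimal sketch of that calculation follows. Because we do not know exactly how PeptideAtlas defines the likely observable portion of a protein, the sketch accepts an optional per-residue mask (a hypothetical input) that restricts the denominator; the sequence and peptides in the example are made up.

```python
from typing import List, Optional, Sequence


def sequence_coverage(protein: str, peptides: List[str],
                      observable: Optional[Sequence[bool]] = None) -> float:
    """Percentage of (optionally, only observable) residues covered by peptides."""
    covered = [False] * len(protein)
    for peptide in peptides:
        start = protein.find(peptide)
        while start != -1:  # mark every occurrence, including overlapping ones
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    # Denominator: all residues, or only those the mask marks as observable.
    positions = [i for i in range(len(protein)) if observable is None or observable[i]]
    if not positions:
        return 0.0
    return 100.0 * sum(covered[i] for i in positions) / len(positions)


# Made-up protein: 6 of 10 residues covered.
print(sequence_coverage("MKTAYIAKQR", ["TAYIAK"]))  # 60.0
```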
Figure 5: The last four sections in the Protein View page. First, observed peptides
are listed, with links to further details for each and to a Cytoscape representation of
their protein mappings. Second, predicted highly observable peptides are listed,
which are of use in the design of targeted proteomics experiments. Third, a
graphical map shows which peptides are observed in which samples, with darker
shading denoting more observations. Finally, links to the samples are provided.
Next is the Distinct Observed Peptides section, which lists all the observed peptides and
maps them to the protein (Figure 5). Each peptide has a PeptideAtlas accession of the
form PApxxxxxxxx; a peptide with a given sequence has the same accession in any
PeptideAtlas build. The table displays several attributes of the peptides, including the
number of times they were observed in the selected build (N Obs), highest
PeptideProphet probability among all observations (Best Prob), theoretically calculated
hydrophobicity (RHS), and the samples in which the peptides were observed. The
Empirical Observability Score (EOS) and Suitability Score (ESS) metrics are listed as
well. The EOS reflects the likelihood that if the protein is detectable in the sample, it will
be detected via that peptide. The ESS represents a ranking of how suitable the peptide is
as a reference or proteotypic peptide. The score includes information about the total
number of observations, the EOS, and the best probability of identification. It also includes
penalties if the peptides are not fully tryptic or contain missed cleavages or undesirable
residues that might impact the suitability of the peptide for targeting (such as methionine,
which is variably oxidized).
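The ESS formula itself is not given here, but the sequence features it penalizes can be checked directly. The following sketch flags whether a peptide is fully tryptic, counts missed cleavages, and reports methionine content, using the simplified trypsin rule (cleave after K or R, but not before P). It does not compute the ESS or any weighting; the function names and the flanking residues in the example are hypothetical.

```python
from typing import Dict, Union


def is_tryptic_site(left: str, right: str) -> bool:
    """Simplified trypsin rule: cleave after K or R unless the next residue is P."""
    return left in ("K", "R") and right != "P"


def suitability_flags(peptide: str, prev_aa: str, next_aa: str) -> Dict[str, Union[bool, int]]:
    """Flag features that can lower a peptide's suitability as a proteotypic peptide.

    prev_aa / next_aa are the residues flanking the peptide in the protein;
    "-" marks a protein terminus.
    """
    n_term_ok = prev_aa == "-" or is_tryptic_site(prev_aa, peptide[0])
    c_term_ok = next_aa == "-" or is_tryptic_site(peptide[-1], next_aa)
    # An internal K/R not followed by P is a cleavage site trypsin skipped.
    missed = sum(is_tryptic_site(a, b) for a, b in zip(peptide, peptide[1:]))
    return {
        "fully_tryptic": n_term_ok and c_term_ok,
        "missed_cleavages": missed,
        "contains_methionine": "M" in peptide,
    }


# Flanking residues here are hypothetical, chosen only to illustrate the rules.
print(suitability_flags("SSFTVQDLKPFTEYVFR", prev_aa="R", next_aa="A"))
# {'fully_tryptic': True, 'missed_cleavages': 0, 'contains_methionine': False}
```

The internal KP in this peptide is not counted as a missed cleavage, because trypsin does not cleave before proline.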
Immediately below is a Cytoscape (19) link, allowing users to see which observed
peptides are also observed in related proteins. For IL-6R-beta, this network (Figure 6)
shows the two peptides listed in the Protein View page, PAp01154397 and
PAp00558899, connected to the protein we are examining (ENSP00000314481) and ten
additional proteins. The network also includes two peptides not seen in the Protein View
page, PAp00415855 and PAp01420893, each of which also maps to some of these ten proteins.
Figure 6: Cytoscape (19) depiction of peptide-protein mapping. Squares near the
center denote peptides, and circles near the periphery denote proteins. Here, one
peptide maps to all twelve proteins shown, while the other peptides map to fewer
proteins. The user can grab and move the nodes. The Cytoscape graphic is
hyperlinked from the Protein View page.
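The network in Figure 6 can be thought of as a neighborhood in a bipartite peptide-protein graph: the peptides observed for the protein of interest, every protein those peptides map to, and any further peptides mapping to those proteins (as PAp00415855 and PAp01420893 do). The sketch below computes such a neighborhood from a peptide-to-protein mapping. It illustrates the idea only and is not PeptideAtlas's code; the function name and all identifiers in the example are placeholders.

```python
from typing import Dict, Set, Tuple


def mapping_neighborhood(peptide_to_proteins: Dict[str, Set[str]],
                         protein: str) -> Tuple[Set[str], Set[str]]:
    """Return (peptides, proteins) in a protein's peptide-sharing neighborhood.

    Proteins: the query protein plus every protein that shares one of its
    peptides. Peptides: every peptide mapping to any of those proteins.
    """
    own_peptides = {pep for pep, prots in peptide_to_proteins.items() if protein in prots}
    proteins = {protein}.union(*(peptide_to_proteins[pep] for pep in own_peptides))
    peptides = {pep for pep, prots in peptide_to_proteins.items() if prots & proteins}
    return peptides, proteins


# Hypothetical mapping with placeholder identifiers.
mapping = {
    "PEP_1": {"PROT_X", "PROT_A", "PROT_B"},  # shared peptide
    "PEP_2": {"PROT_X"},                      # unique to PROT_X
    "PEP_3": {"PROT_A"},                      # reached only via a related protein
    "PEP_4": {"PROT_C"},                      # unrelated
}
peptides, proteins = mapping_neighborhood(mapping, "PROT_X")
print(sorted(peptides), sorted(proteins))
# ['PEP_1', 'PEP_2', 'PEP_3'] ['PROT_A', 'PROT_B', 'PROT_X']
```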
Below the Cytoscape link is the Predicted Highly Observable Peptides section, listing
theoretical peptides for the protein. Each protein is digested in silico and both the
PeptideSieve (20) and DetectabilityPredictor (21) software tools are used to predict
which peptides might be easily detectable on an electrospray mass spectrometry platform.
These theoretical predictions are especially useful for low-abundance or otherwise
hard-to-detect proteins, for which few or no peptides may have been observed. For
IL-6R-beta, we see in the left column that three theoretical peptides have
peptide accessions, indicating that they are actually observed somewhere in PeptideAtlas,
although not necessarily in this build.
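The in silico digestion step that precedes peptide prediction can be sketched as follows. This assumes a simplified trypsin rule (cleave after K or R unless followed by P) and hypothetical length limits; it only enumerates candidate peptides and does not reproduce the scoring performed by PeptideSieve or DetectabilityPredictor. The function name and the example sequence are made up.

```python
from typing import List


def tryptic_digest(sequence: str, min_len: int = 7, max_len: int = 25,
                   missed_cleavages: int = 0) -> List[str]:
    """In silico trypsin digest: cut after K/R unless followed by P."""
    sites = [0]
    sites += [i + 1 for i in range(len(sequence) - 1)
              if sequence[i] in ("K", "R") and sequence[i + 1] != "P"]
    sites.append(len(sequence))
    peptides = []
    for i in range(len(sites) - 1):
        # j - i - 1 is the number of internal cleavage sites the peptide spans.
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(sites))):
            peptide = sequence[sites[i]:sites[j]]
            if min_len <= len(peptide) <= max_len:
                peptides.append(peptide)
    return peptides


# Made-up sequence; note that the K before P is not a cleavage site.
print(tryptic_digest("MKTAYIAKQRGSLEKPAAR", min_len=1))
# ['MK', 'TAYIAK', 'QR', 'GSLEKPAAR']
```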
Next, under the heading Sample peptide map, is a graphical depiction of the number of
observations of each peptide in each sample. Darker shades denote more observations.
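The Sample peptide map is essentially a peptide-by-sample table of observation counts rendered as shades. A minimal sketch of building such a table from a list of PSMs, each reduced to a (peptide, sample) pair, and scaling counts to 0..1 for shading; the linear scaling, the function names, and all identifiers are assumptions for illustration, not how PeptideAtlas draws the map.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def observation_matrix(psms: Iterable[Tuple[str, str]]) -> Tuple[List[str], List[str], List[List[int]]]:
    """Count observations per (peptide, sample) pair; one input pair per PSM."""
    counts = Counter(psms)
    peptides = sorted({peptide for peptide, _ in counts})
    samples = sorted({sample for _, sample in counts})
    matrix = [[counts[(p, s)] for s in samples] for p in peptides]
    return peptides, samples, matrix


def shading(matrix: List[List[int]]) -> List[List[float]]:
    """Scale counts to 0..1 so that larger values can be drawn darker."""
    top = max((c for row in matrix for c in row), default=0)
    return [[c / top if top else 0.0 for c in row] for row in matrix]


psms = [("PEP_1", "SAMPLE_A"), ("PEP_1", "SAMPLE_A"),
        ("PEP_1", "SAMPLE_B"), ("PEP_2", "SAMPLE_B")]
peptides, samples, matrix = observation_matrix(psms)
print(matrix)           # [[2, 1], [0, 1]]
print(shading(matrix))  # [[1.0, 0.5], [0.0, 0.5]]
```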
Finally, the Observed in Samples section provides links to the samples in which the
protein was observed. Note that IL-6R-beta was observed only in samples from trauma
patients. IL-6R-beta is known to regulate cell growth and differentiation and to play an
important role in immune response. Observation of IL-6R-beta in plasma after trauma
may lead an expert to hypothesize further about IL-6R-beta’s role in the body’s response
to trauma. To learn more about these samples, click the links.
3.4 Peptide View
To learn more about the peptide PAp01154397 (SSFTVQDLKPFTEYVFR), click the
link on the Protein View page in the Distinct Observed Peptides section, which will take
you to the Peptide View page (Figure 7).