Virus Research 158 (2011) 251–256
Picobirnaviruses encode a protein with repeats of the ExxRxNxxxE motif
Bruno Da Costa (a), Stéphane Duquerroy (b,c), Bogdan Tarus (a), Bernard Delmas (a,∗)
(a) Unité de Virologie et Immunologie moléculaires, UR892 INRA, F-78350 Jouy-en-Josas, France
(b) Institut Pasteur, Unité de Virologie Structurale, Virology Department and CNRS URA 3015, Paris, France
(c) Université Paris-Sud, Faculté d’Orsay, Orsay, France
Article info
Received 18 November 2010
Received in revised form 18 February 2011
Accepted 24 February 2011
Available online 2 March 2011
Keywords: Amino acid sequence motif; Short linear motif
Abstract
Picobirnaviruses possess a bisegmented double-stranded RNA genome. While segment 2 encodes
the RNA-dependent RNA polymerase, segment 1 contains two open reading frames (ORFs). ORF2
was recently shown to encode the capsid precursor, whereas the ORF1 product has not been
characterized. In this study, we show that the three ORF1 sequences available in databases,
representing three phylogenetically distant picobirnaviruses (two from human and one from
rabbit hosts), encode proteins of … predicted to be structurally different from the upstream
domains containing the motif repetitions. The ExxRxNxxxE sequence had not previously been
identified as a short linear motif in eukaryotic or prokaryotic proteins. Its function
remains elusive.
© 2011 Elsevier B.V. All rights reserved.
Picobirnaviruses (PBVs) form the sole genus of the new virus family
Picobirnaviridae (Delmas, in press). Virions are isometric,
non-enveloped, 35 nm in diameter, and possess a bisegmented
double-stranded RNA (dsRNA) genome (Pereira et al., 1988b; Nates
et al., in press). The small genome segment (segment 2) is
1.2–1.9 kbp long and encodes the viral RNA-dependent RNA
polymerase (RdRp) (Rosen et al., 2000). Only two complete (from a
human and a rabbit PBV) and several partial human PBV sequences
of the large genome segment (segment 1) are available in sequence
… ORFs (ORF1 and ORF2). ORF1 could encode a 10–20 kDa protein of
unknown function. More is known about ORF2, which encodes the
precursor of the capsid protein (CP). For rabbit PBV, expression of
recombinant ORF2 in the baculovirus/Sf9 cell system led to the
synthesis of a precursor that is autocatalytically cleaved to
generate a large, post-translationally modified N-terminal peptide
and the mature CP, and to the formation of virus-like particles
(VLPs) (Duquerroy et al., 2009). The 3.4 Å X-ray structure of the
VLPs shows a simple capsid with an unusual icosahedral
arrangement, displaying 60 twofold-symmetric dimers of the CP
(Duquerroy et al., 2009).
∗Corresponding author. Tel.: +33 1 3465 2627.
E-mail address: email@example.com (B. Delmas).
PBVs are widely distributed geographically among humans and
mammals in general, and have also been reported in birds and
reptiles (Gallimore et al., 1995; Fregolente et al., 2009 and refer-
ences therein). PBV has also been detected in raw sewage samples
(Symonds et al., 2009). Laboratory diagnosis of PBV infections has
mainly relied on the detection of the two dsRNA genome segments by
polyacrylamide gel electrophoresis (Pereira et al., 1988a). The use
of large-scale RNA sequencing techniques recently revealed the
abundance of picobirnaviruses in particular clinical or
environmental samples (Zhang et al., 2006; Finkbeiner et al., 2010).
Analyses of genome sequences of PBVs originating from different
species, or of isolates from the same individual, revealed high
sequence heterogeneity (Banyai et al., 2003; Zhang et al., 2006)
and relatedness between human and animal strains (Banyai et al.,
2008). Altogether, the data do not provide evidence of virus clus-
ters specific to a host species (van Leeuwen et al., 2010; Giordano
et al., 2010).
The pathogenicity of PBV has not been established. Studies carried
out with immunocompromised persons suggested that PBVs are
opportunistic pathogens that may cause diarrhoea. PBVs have been
detected in stool samples from children with diarrhoea and from
immunocompromised patients, but also in individuals without
symptoms of gastroenteritis.
Short linear motifs, SLiMs, LMs or minimotifs in proteins are
amino acid stretches that are responsible for mediating a func-
0168-1702/$ – see front matter © 2011 Elsevier B.V. All rights reserved.
B. Da Costa et al. / Virus Research 158 (2011) 251–256
Fig. 1. Genetic organization of picobirnavirus segment 1. Two complete and one partial sequences of picobirnavirus segment 1 encompassing ORF1 are available in
sequence databases: those of two human picobirnaviruses, NC_007026/human1 PBV (Wakuda et al., 2005) and GU968923/human2 PBV (van Leeuwen et al., 2010), and of a
rabbit picobirnavirus, AJ244022/rabbit PBV (Green et al., 1999). The three frames of each virus are shown with stop codons as full vertical bars and initiation
codons as short vertical bars. The ORFs are indicated by colors. Asterisks at the 5′ end of two segments mark short putative ORFs that are assumed to be non-functional.
(For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
They are much shorter than protein domains, usually less than 15
amino acids in length. The residues responsible for the function
are contiguous in the primary structure rather than contributed by
distant parts of the sequence. Some proteins contain repeats of
SLiMs; for example, the Eps15-related protein (NP_067058) …
Motif database (ELM) (Gould et al., 2010), about a third was found …
the regulatory processes of the infected cell (Davey et al., in press).
These motifs are typically found in intrinsically disordered regions
(IDRs) of proteins, which allows them to function independently of
their structural context. IDRs lack a stable secondary and tertiary
structure under physiological conditions in the absence of a
binding partner (for a recent review on IDRs, see Uversky and
Dunker, 2010 and references therein).
As a first step towards the characterization of the proteins
encoded by segment 1 ORF1 of phylogenetically distant PBVs, we
describe here a thorough computational analysis of their primary
structures. We carried out a matrix-based sequence comparison to
detect putative motif repetitions in the sequences. A repeated
10-amino-acid motif was identified and characterized by careful
sequence examination. Next, because SLiMs are often found in
disordered domains, algorithms that predict intrinsically
disordered regions with very good confidence (Li et al., 1999;
Romero et al., 2001; Lieutaud et al., 2008) were used to determine
whether the motifs could be assigned to structured or unstructured
regions. Hydrophobic cluster analysis (HCA) (Callebaut et al., 1997)
allowed identification of the boundaries between the regions
containing motif repetitions and the
Fig. 1 summarizes our current knowledge of the genome organisation
of PBV segment 1. A full human PBV and a full rabbit PBV sequence,
completed with a partial sequence of a human isolate overlapping
ORF1, the only three ORF1 sequences available in … is made of
224 (designated human1) and 213 (human2) codons for the two human
isolates and of 106 codons for the rabbit virus. In the two human
PBV segments, ORF1 and ORF2 overlap by a few nucleotides, whereas
they do not overlap in the rabbit PBV segment.
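Fig. 1 marks, for each segment, the stop and initiation codons in the three forward frames, from which ORF1 and ORF2 are delineated. A minimal sketch of such a three-frame scan is given below, assuming the segment is supplied as a plain nucleotide string; the function `find_orfs` and its `min_codons` cutoff are hypothetical illustrations, not the software used to draw Fig. 1.

```python
# Three-frame ORF scan (hypothetical helper, not the software behind Fig. 1).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 50) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for ATG-initiated ORFs in the three
    forward frames. Coordinates are 0-based; `end` is exclusive and
    includes the stop codon. `min_codons` counts sense codons only."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG since the last in-frame stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs  # ORFs lacking an in-frame stop codon are not reported


print(find_orfs("CATGAAATAG", min_codons=1))  # [(1, 1, 10)]
```

Each ORF is reported from its first ATG after the preceding in-frame stop, and an ORF that runs off the end of the segment without a stop codon is ignored; both are simplifying assumptions of this sketch.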
The matrix-based sequence comparisons show that the three
ORF1-encoded proteins possess sequence repeats, which cover about
three quarters of the human2 and rabbit PBV protein sequences.
Examination of the three sequences allowed the identification of a
common amino acid sequence signature made of 4 conserved residues
and defined by the consensus motif ExxRxNxxxE, repeated 4, 7 or
10 times in the rabbit, human1 … between the N and the last E is
frequently A (Fig. 3). The corresponding ETxRxNxAxE motif signature
was identified 3 times in the rabbit virus, and 2 and 5 times in its
human homologs. In one case, the last E was replaced by an L (in the
human2 PBV sequence). The spacing between two consecutive motifs
is variable. In the human PBV proteins, a single residue generally
separates two motifs, but the distance can increase to up to 19
residues. In the rabbit PBV protein, the four motifs are invariably
separated by eight residues.
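The repeat counts and spacings given above can be tabulated from a protein sequence with a simple pattern scan. The sketch below assumes a one-letter protein string and translates the consensus and its stricter variant into regular expressions; the helper names (`motif_starts`, `spacers`) and the toy sequence are hypothetical, and this scan is not the procedure the authors describe (they identified the motif by sequence examination).

```python
import re

# Consensus ExxRxNxxxE and the stricter ETxRxNxAxE variant as regular
# expressions ("." matches any residue). The zero-width lookahead lets
# finditer() also report occurrences that overlap each other.
LOOSE = re.compile(r"(?=E..R.N...E)")
STRICT = re.compile(r"(?=ET.R.N.A.E)")
MOTIF_LEN = 10


def motif_starts(protein: str, pattern: re.Pattern = LOOSE) -> list[int]:
    """0-based start position of every motif occurrence in a protein."""
    return [m.start() for m in pattern.finditer(protein.upper())]


def spacers(starts: list[int], motif_len: int = MOTIF_LEN) -> list[int]:
    """Residues separating consecutive occurrences (negative = overlap)."""
    return [b - (a + motif_len) for a, b in zip(starts, starts[1:])]


# Toy sequence (made up, not a PBV protein): two motifs, eight residues apart.
toy = "ETARANQAQE" + "GSGSGSGS" + "EAKRLNEALE"
print(motif_starts(toy))           # [0, 18]
print(motif_starts(toy, STRICT))   # [0]
print(spacers(motif_starts(toy)))  # [8]
```

A regex scan only finds exact consensus matches, so a variant such as the E-to-L substitution noted in human2 PBV would be missed unless the pattern is relaxed.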
To determine whether the repetitions of the ExxRxNxxxE motif
reflect a structured or a disordered domain, PONDR-FIT, a
meta-predictor of intrinsic disorder, was first … the regions
containing the repeats were all predicted to be extensively
disordered. In contrast, the C-terminal domains were not predicted
to be disordered over a stretch of 30–50 amino acids. To confirm
this analysis, the amino acid sequences were analyzed using the
MeDor metaserver for the prediction of disorder (Lieutaud et al.,
2008). In the graphical output generated by MeDor, disordered
regions identified by the various predictors are shown along with
the HCA plot (Callebaut et al., 1997) of the query sequence. Several
main features emerged from these analyses (Fig. 5) and corroborated
the PONDR data. The domains containing the motif repetitions are
Fig. 2. Sequence analyses of the proteins encoded by the picobirnavirus ORF1. (Left) A dot-matrix sequence alignment method available in DNA Strider (Douglas, 1995) was
used for the analyses. The dot plots show regional self-similarities: the main diagonal represents the alignment of each sequence with itself, and lines off the diagonal
represent repetitive patterns within the sequence. (Right) Protein sequences. The amino acids constituting the motif are colored in red. Sequences are laid out so that
the motifs are aligned vertically. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
Fig. 3. Logo for the ExxRxNxxxE motif. The 21 identified 10-amino-acid sequences defining the motif were aligned and displayed as a sequence logo (Schneider
and Stephens, 1990; http://weblogo.berkeley.edu/). At each position of the alignment, the characters representing the amino acids are stacked on top of each other.
The height of each letter is proportional to its frequency, and the letters are sorted so that the most common one is on top. The height of the entire stack is then
adjusted to reflect the information content of the sequences at that position.
consistently predicted to be disordered and are depleted in hydrophobic
clusters. However, short secondary structure elements (α-helices)
are consistently predicted in these domains. For human2 and rabbit
PBV, the domains that do not contain the motif repetitions are
short (20–45 residues) and located at the C-terminus. They are
not predicted to be unfolded, and may form a short hydrophobic
cluster (rabbit PBV). The C-terminal moiety of human1 PBV
displays a more complex organization, with regions containing
hydrophobic and glycine residues alternating with regions made of
charged residues. As in the rabbit PBV protein, the extreme
C-terminus of human1 PBV is predicted to form a hydrophobic
cluster associated with glycines. To test the ability of these
C-terminal domains to form α-helices, we used the AGADIR
algorithm (Lacroix et al., 1998; http://agadir.crg.es/). None of the
Fig. 4. Disorder and helical content predictions for the proteins encoded by the picobirnavirus ORF1 using the PONDR and AGADIR algorithms, respectively. Red lines
represent the disorder prediction for the whole protein. Residues scoring above the threshold of 0.5 are predicted to be disordered. The average helical content,
calculated with AGADIR, is shown as blue lines. The repeated motifs (ExxRxNxxxE) are marked by blue filled rectangles. (For interpretation of the references to color
in this figure legend, the reader is referred to the web version of the article.)
three C-terminal domains was predicted to form a stable helical
structure (Fig. 4).
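Fig. 4 applies a 0.5 threshold to the per-residue PONDR-FIT scores. A minimal sketch of turning such scores into residue ranges is shown below; it assumes the scores are already available as a list of floats, one per residue (how they are exported from PONDR-FIT is not covered here), and the name `disordered_segments` and its `min_len` filter are hypothetical.

```python
def disordered_segments(scores: list[float], threshold: float = 0.5,
                        min_len: int = 1) -> list[tuple[int, int]]:
    """Runs of consecutive residues scoring above `threshold`.

    Returns 1-based, inclusive (start, end) residue numbers; runs shorter
    than `min_len` residues are dropped.
    """
    segments = []
    start = None  # 1-based start of the run currently open, if any
    for pos, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = pos
        elif start is not None:
            if pos - start >= min_len:  # the run covers start..pos-1
                segments.append((start, pos - 1))
            start = None
    if start is not None and len(scores) - start + 1 >= min_len:
        segments.append((start, len(scores)))  # run reaching the C-terminus
    return segments


# Made-up scores for a 7-residue toy protein.
print(disordered_segments([0.8, 0.9, 0.3, 0.6, 0.7, 0.7, 0.2]))  # [(1, 2), (4, 6)]
```

The `min_len` filter mirrors the kind of stretch length discussed above (for example, 30–50 residues) but its value is left to the user.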
To further analyze the ExxRxNxxxE motif and its associated
secondary structures, we searched the PDB for proteins containing
the motif (see Table 1 in supplementary data for selected examples).
The motif was mainly involved in α-helices in the ExxR stretches,
with a hydrogen bond … 2P25 and 3CSK). Thus, the residues denoted
x in the motif, or those surrounding it, are expected to play a role
in its folding.
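How conserved each motif position, including the x positions, is among the 21 aligned instances is what the stack heights of the Fig. 3 logo encode. A sketch of that per-position information content is given below, assuming equal-length aligned peptides as input; it computes log2(20) minus the Shannon entropy and deliberately omits the small-sample correction of Schneider and Stephens (1990), so values may come out somewhat higher than in a corrected logo. The function name is hypothetical.

```python
import math
from collections import Counter


def information_content(aligned: list[str]) -> list[float]:
    """Bits per position for equal-length aligned peptides: log2(20) - H,
    with H the Shannon entropy of residue frequencies at that position.
    No small-sample correction is applied (a simplifying assumption)."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    n = len(aligned)
    bits = []
    for pos in range(length):
        counts = Counter(s[pos] for s in aligned)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        bits.append(math.log2(20) - entropy)
    return bits


# Two made-up motif instances: invariant columns reach log2(20), about
# 4.32 bits; columns holding two different residues drop by 1 bit.
toy = ["ETARANQAQE", "EAKRLNEALE"]
print([round(b, 2) for b in information_content(toy)])
# [4.32, 3.32, 3.32, 4.32, 3.32, 4.32, 3.32, 4.32, 3.32, 4.32]
```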
We concluded from this study that the gene located upstream of
the CP gene encodes a protein that is homologous among
picobirnaviruses, a feature that was not recognized earlier because
of low sequence similarity and size variability between the predicted
Fig. 5. Modular organization and disorder prediction of the proteins encoded by the picobirnavirus ORF1. MeDor outputs and proposed modular organization of the three
proteins. The sequences are represented as single, continuous horizontal lines below the predicted secondary structure elements. Below the sequences are shown the HCA
plots and the predicted regions of disorder, represented by bidirectional arrows. Five disorder predictors, namely IUPred (Dosztányi et al., 2005), RONN (Yang et al.,
2005), FoldIndex (Prilusky et al., 2005), GlobPlot2 (Linding et al., 2003b) and DisEMBL (Linding et al., 2003a), were used. Secondary structure predictions were carried
out with the pred2ary algorithm using default parameters (Chandonia and Karplus, 1999). The ExxRxNxxxE motifs are indicated by blue rectangles. In the HCA plots,
special symbols replace the one-letter code for three amino acids: glycine, serine and threonine. Hydrophobic amino acids are not randomly distributed: they are
relatively scarce in the domains containing the repeats but tend to form horizontal clusters in the C-terminal domains. Regions containing hydrophobic and glycine
residues are outlined. Note the absence of proline and cysteine residues throughout the sequences. (For interpretation of the references to color in this figure
legend, the reader is referred to the web version of the article.)
… times along their sequence was identified. To our knowledge, this
is the first time that a viral protein exhibiting such a large number
of amino acid motif repetitions has been identified.
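The motif repetitions were first spotted as lines parallel to the main diagonal in the self-comparison dot plots of Fig. 2. A generic windowed-identity dot plot is sketched below as an illustration; the 10-residue window and the 6-match cutoff are arbitrary assumptions, the helper names are hypothetical, and this is not a reimplementation of the DNA Strider method used for Fig. 2.

```python
def dot_matrix(seq: str, window: int = 10,
               min_matches: int = 6) -> set[tuple[int, int]]:
    """Windowed self-comparison: (i, j) is marked when the windows of
    `window` residues starting at i and j have at least `min_matches`
    identical residues in register."""
    n = len(seq) - window + 1
    dots = set()
    for i in range(n):
        for j in range(n):
            pairs = zip(seq[i:i + window], seq[j:j + window])
            if sum(a == b for a, b in pairs) >= min_matches:
                dots.add((i, j))
    return dots


def repeat_offsets(dots: set[tuple[int, int]]) -> list[int]:
    """Distinct j - i offsets of dots above the main diagonal; each
    offset is a candidate distance between repeat starts."""
    return sorted({j - i for i, j in dots if j > i})


toy = "ACDEFGHIKL" * 3  # made-up sequence with a 10-residue period
print(repeat_offsets(dot_matrix(toy)))  # [10, 20]
```

With irregular spacing between motifs, as in the human PBV proteins, the off-diagonal lines break into shorter segments rather than forming a single periodic pattern.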
What could be the function of the ExxRxNxxxE motif?
A search for this motif in the sequence and 3D-structure databases
UniProtKB/Swiss-Prot and PDB using the ProSite tool (15 February
2011) identified 1008 and 383 hits, respectively. Two repetitions of
the motif were found in a conserved hypothetical protein of
Campylobacter gracilis (GenBank accession number: EEV18412), in
the extracellular domain of an isoform of EGF-like module
receptor 4 (Q86SQ3-2) and in envelope glycoproteins B of two
herpesviruses (P17471 and P18538). With the more restrictive motif
definition ETxRxNxAxE, only nine hits were found in
UniProtKB/Swiss-Prot and none in PDB. These identifications did not
provide any evidence for the function of the motif. Using the
Eukaryotic Linear Motif (ELM) resource for the identification of
functional sites in proteins (http://elm.eu.org/, Gould et al., 2010),
we were not able to recognize the ExxRxNxxxE motif as the signature
of a particular cellular function. Davey et al. (in press) listed
validated examples of SLiMs present in viral proteomes and associated
with functions such as protein degradation, cell signalling, viral
egress, immune response, transport, cell cycle, transcriptional
regulation or translation. None of them exhibited similarities with
the ExxRxNxxxE motif identified here. Thus, the function of this
motif remains elusive. The predicted disordered domains containing …
with cellular components. Definitive answers on the function of the
ExxRxNxxxE motif await the identification of these binding partner(s).
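The ProSite search described above takes a pattern written in PROSITE syntax, in which the consensus can be transcribed as E-x(2)-R-x-N-x(3)-E (our transcription; the exact query string used is not given in the text). The sketch below translates a subset of that syntax (x, x(n), x(n,m), [..], {..} and the < and > terminal anchors) into a Python regular expression so that the same pattern can be scanned locally; `prosite_to_regex` is a hypothetical helper, not part of any ProSite software.

```python
import re

# One PROSITE pattern element: optional "<", a residue / "x" / [..] / {..},
# an optional repeat count "(n)" or "(n,m)", and an optional ">".
_ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a subset of PROSITE pattern syntax into a Python regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        n_term, residue, low, high, c_term = m.groups()
        if residue == "x":
            core = "."                         # any residue
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"  # any residue except these
        else:
            core = residue                     # single residue or [..] class
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if n_term else "") + core + ("$" if c_term else ""))
    return "".join(parts)


regex = prosite_to_regex("E-x(2)-R-x-N-x(3)-E")
print(regex)                                     # E.{2}R.N.{3}E
print(bool(re.search(regex, "GGETARANQAQEGG")))  # True
```

The stricter definition would be written E-T-x-R-x-N-x-A-x-E, which the same function turns into ET.R.N.A.E.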
This work was supported by INRA and an ANR grant from
the Programme blanc 2006 “VirusEntry”. We thank Jean-François
Eléouët for critical reading of the manuscript.
Appendix A. Supplementary data
Supplementary data associated with this article can be found, in
the online version, at doi:10.1016/j.virusres.2011.02.018.
Banyai, K., Jakab, F., Reuter, G., Bene, J., Uj, M., Melegh, B., Szucs, G., 2003. Sequence
heterogeneity among human picobirnaviruses detected in a gastroenteritis out-
break. Arch. Virol. 148, 2281–2291.
Banyai, K., Martella, V., Bogdan, A., Forgach, P., Jakab, F., Meleg, E., Biro, H., Melegh,
B., Szucs, G., 2008. Genogroup I picobirnaviruses in pigs: evidence for genetic
diversity and relatedness to human strains. J. Gen. Virol. 89, 534–539.
Callebaut, I., Labesse, G., Durand, P., Poupon, A., Canard, L., Chomilier, J.,
Henrissat, B., Mornon, J.P., 1997. Deciphering protein sequence information through
hydrophobic cluster analysis (HCA): current status and perspectives. Cell. Mol.
Life Sci. 53, 621–645.
Chandonia, J.M., Karplus, M., 1999. New methods for accurate prediction of protein
secondary structure. Proteins 35, 293–306.
Sci., in press (available online 9 December 2010).
Delmas, B. Picobirnaviridae. In: Virus Taxonomy, IX ICTV Report. Academic Press,
London, in press.
Dosztányi, Z., Csizmok, V., Tompa, P., Simon, I., 2005. IUPred: web server for the
prediction of intrinsically unstructured regions of proteins based on estimated
energy content. Bioinformatics 21, 3433–3434.
Douglas, S.E., 1995. DNA Strider. An inexpensive sequence analysis package for the
Macintosh. Mol. Biotechnol. 3, 37–45.
Rey, F.A., 2009. The picobirnavirus crystal structure provides functional insights
into virion assembly and cell entry. EMBO J. 28, 1655–1665.
Finkbeiner, S.R., Allred, A.F., Tarr, P.I., Klein, E.J., Kirkwood, C.D., Wang, D., 2008.
Metagenomic analysis of human diarrhea: viral detection and discovery. PLoS
Pathog. 4 (2), e1000011.
Fregolente, M.C.D., de Castro-Dias, E., Martins, S.S., Spilki, F.R., Allegretti, S.M., Gatti,
Res. 143, 134–136.
Gallimore, C.I., Appleton, H., Lewis, D., Green, J., Brown, D.W.G., 1995. Detection
and characterisation of bisegmented double-stranded RNA viruses (picobirnaviruses)
in human faecal specimens. J. Med. Virol. 45, 135–140.
Giordano, M.O., Martinez, L.C., Masachessi, G., Barril, P.A., Ferreyra, L.J., Isa, M.B.,
Valle, M.C., Massari, P.U., Nates, S.V., 2010. Evidence of closely related picobir-
navirus strains circulating in humans and pigs in Argentina. J Infect. (October).
Gould, C.M., Diella, F., Via, A., Puntervoll, P., Gemünd, C., Chabanis-Davidson, S.,
Michael, S., Sayadi, A., Bryne, J.C., Chica, C., Seiler, M., Davey, N.E., Haslam, N.,
Weatheritt, R.J., Budd, A., Hughes, T., Pas, J., Rychlewski, L., Travé, G., Aasland,
R., Helmer-Citterich, M., Linding, R., Gibson, T.J., 2010 January. ELM: the status
of the 2010 eukaryotic linear motif resource. Nucleic Acids Res. (38 (Database
Green, J., Gallimore, C.I., Clewley, J.P., Brown, D.W., 1999. Genomic characterisation
picobirnavirus of Cryptosporidium parvum. Arch. Virol. 144, 2457–2465.
Lacroix, E., Viguera, A.R., Serrano, L., 1998. Elucidating the folding problem of
α-helices: local motifs, long-range electrostatics, ionic strength dependence and
prediction of NMR parameters. J. Mol. Biol. 284, 173–191.
Li, X., Romero, P., Rani, M., Dunker, A.K., Obradovic, Z., 1999. Predicting protein dis-
order for N-, C-, and internal regions. Genome Inform. Ser. Workshop Genome
Inform. 10, 30–40.
Lieutaud, P., Canard, B., Longhi, S., 2008. MeDor: a metaserver for predicting protein
disorder. BMC Genomics 9 (Suppl. 2), S25.
Linding, R., Jensen, L.J., Diella, F., Bork, P., Gibson, T.J., Russell, R.B., 2003a. Pro-
tein disorder prediction: implications for structural proteomics. Structure 11,
Linding, R., Russell, R.B., Neduva, V., Gibson, T.J., 2003b. GlobPlot: exploring protein
sequences for globularity and disorder. Nucleic Acids Res. 31, 3701–3708.
Nates, S.V., Gatti, M.S.V., Ludert, J.E. The picobirnavirus: an integrated view on its
biology, epidemiology and pathogenic potential. Future Virol., in press.
Pereira, H.G., Flewett, T.H., Candelas, J.A., Barth, O.M., 1988a. A virus with a biseg-
mented double-stranded RNA genome in rat (Oryzomys nigripes) intestines. J.
Gen. Virol. 69, 2749–2754.
Pereira, H.G., Fialho, A.M., Flewett, T.H., Teixeira, J.M.S., Andrade, Z.P., 1988b. Novel
viruses in human faeces. Lancet 2 (8602), 103–104.
Prilusky, J., Felder, C.E., Zeev-Ben-Mordehai, T., Rydberg, E.H., Man, O., Beck-
mann, J.S., Silman, I., Sussman, J.L., 2005. FoldIndex: a simple tool to predict
whether a given protein sequence is intrinsically unfolded. Bioinformatics 21,
complexity of disordered proteins. Proteins 42, 38–48.
Schmid, E.M., Ford, M.G.J., Burtey, A., Praefcke, G.J.K., Peak-Chew, S.Y., Mills, I.G.,
Benmerah, A., McMahon, H.T., 2006. Role of the AP2 beta-appendage hub in
recruiting partners for clathrin-coated vesicle assembly. PLoS Biol. Sep. 4 (9),
Schneider, T.D., Stephens, R.M., 1990. Sequence logos: a new way to display
consensus sequences. Nucleic Acids Res. 18, 6097–6100.
Symonds, E.M., Griffin, D.W., Breitbart, M., 2009. Eukaryotic viruses in wastewater
samples from the United States. Appl. Environ. Microbiol. 75, 1402–1409.
Uversky, V.N., Dunker, A.K., 2010. Understanding protein non-folding. Biochim. Bio-
phys. Acta 1804, 1231–1264.
van Leeuwen, M., Williams, M.M.W., Koraka, P., Simon, J.H., Smits, S.L., Osterhaus,
A.D.M.E., 2010. Human picobirnaviruses identified by molecular screening of
diarrhea samples. J. Clin. Microbiol. 48, 1787–1794.
Victoria, J.G., Kapoor, A., Dupuis, K., Schnurr, D.P., Delwart, E.L., 2008. Rapid
identification of known and new RNA viruses from animal tissues. PLoS Pathog. 4,
… K., 2005. Complete nucleotide sequences of two segments of human picobirnavirus.
J. Virol. Methods 126,
Yang, Z.R., Thomson, R., McNeil, P., Esnouf, R.M., 2005. RONN: the bio-basis function
in proteins. Bioinformatics 21, 3369–3376.
Zhang, T., Breitbart, M., Lee, W.H., Run, J.-Q., Wei, C.L., Soh, S.W.L., Hibberd, M.L., Liu,
of plant pathogenic viruses. PLoS Biol. 4 (1), 3.