Bioinformatics, Vol. 23 no. 1 2007, pages 5–13
Oligonucleotide fingerprint identification for microarray-based
pathogen diagnostic assays
Waibhav Tembe, Nela Zavaljevski, Elizabeth Bode1, Catherine Chase1, Jeanne Geyer1,
Leonard Wasieloski1, Gary Benson2 and Jaques Reifman*
Biotechnology HPC Software Applications Institute, Telemedicine and Advanced Technology Research Center, US
Army Medical Research and Materiel Command, Ft. Detrick, MD, 1Diagnostic Systems Division, US Army Medical
Research Institute of Infectious Diseases, Ft. Detrick, MD and 2Departments of Biology and Computer Science,
Boston University, Boston, MA, USA
Received on June 15, 2006; revised on October 18, 2006; accepted on October 21, 2006
Advance Access publication October 26, 2006
Associate Editor: John Quackenbush
Motivation: Advances in DNA microarray technology and computa-
tional methods have unlocked new opportunities to identify ‘DNA fin-
gerprints’, i.e. oligonucleotide sequences that uniquely identify a
specific genome. We present an integrated approach for the computa-
tional identification of DNA fingerprints for design of microarray-based
pathogen diagnostic assays. We provide a quantifiable definition of a
DNA fingerprint stated both from a computational as well as an
experimental point of view, and the analytical proof that all in silico
fingerprints satisfying the stated definition are found using our approach.
Results: The presented computational approach is implemented in
an integrated high-performance computing (HPC) software tool for
oligonucleotide fingerprint identification termed TOFI. We employed
TOFI to identify in silico DNA fingerprints for several bacteria and plasmid
sequences, which were then experimentally evaluated as potential
probes for microarray-based diagnostic assays. Results and analysis
of approximately 150 in silico DNA fingerprints for Yersinia pestis
and 250 fingerprints for Francisella tularensis are presented.
Availability: The implemented algorithm is available upon request.
The recent advances in genomic sequencing and the availability of
large-scale sequence databases have unlocked several opportunities
to identify ‘genomic signatures’ or ‘DNA fingerprints’, i.e. short
DNA sequences that uniquely ascertain the presence or absence of
causative biological agents, such as viruses, bacteria or virulent
genes. For example, a vast number of DNA-based detection and
diagnostic technologies are being developed to quickly identify
biological threat agents (Ivnitski et al., 2003; Slezak et al., 2003;
Draghici et al., 2005; Kaderali and Schliep, 2002), such as the
anthrax-causing bacterium, Bacillus anthracis, and the plague-
causing bacterium, Yersinia pestis. DNA signatures could also be
used to detect the presence of one or more virulent genes, such as
Bacillus genes, which encode important virulence factors,
enterotoxins and exotoxins (Sergeev et al., 2006), and to provide
high-resolution differentiation between closely related microorgan-
isms in microbial forensics (Willse et al., 2004). New viruses and
strains have been identified using a special microarray technology
consisting of approximately 11000 70mer oligonucleotides (Wang
et al., 2002). DNA fingerprints have also been used to develop
diagnostic assays for a wide range of important applications in
medicine, environmental monitoring and quality control of food
products (Hardiman, 2003; Joos and Fortina, 2005; Wang et al.,
2002; Abbe et al., 2004).
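As a minimal illustration of the fingerprint idea described above, the following Python sketch lists the fixed-length substrings of a target sequence that do not occur, on either strand, in any background sequence. It is a toy under stated assumptions: exact string matching stands in for the hybridization-based criteria a real assay would need, and the function names, the probe length k and the example sequences are hypothetical and are not taken from TOFI.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_fingerprints(target, background, k):
    """Return the sorted k-mers of target that are absent from every
    background sequence on both strands (exact matching only)."""
    excluded = set()
    for seq in background:
        excluded |= kmers(seq, k)
        excluded |= kmers(reverse_complement(seq), k)
    return sorted(kmers(target, k) - excluded)


# Hypothetical toy sequences; real genomes are millions of bases long.
target = "ACGTTGCA"
background = ["ACGTT", "GGGCA"]
print(candidate_fingerprints(target, background, 4))
# prints ['GTTG', 'TGCA', 'TTGC']
```

Holding every background k-mer in a set is only workable for small inputs; at genome scale an index structure such as the suffix trees cited in the reference list would be a more plausible choice.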
The specific algorithm implemented in a DNA fingerprint identi-
fication method is selected based on (1) whether the DNA finger-
prints are being sought for a specific pathogen strain (e.g. Y.pestis
CO92), a group of pathogens from the same species (e.g. all Y.pestis
strains) or genus (e.g. all Yersinia species), or a set of organisms that
may or may not have any phylogenetic relationship (e.g. to detect
a viral or a bacterial family) and (2) the experimental conditions
specified by the end application technology, such as PCR (Slezak
et al., 2003; Viljoen et al., 2005; Haas et al., 2003; Gordon and
Sensen, 2004) or DNA microarrays (Kaderali and Schliep, 2002;
The use of real-time PCR-based detection technology requires the
identification of three informative sequences: two amplification
primer sequences and an additional probe sequence (the finger-
print). The assay requires that primer hybridization takes place
near the fingerprint and, therefore, imposes constraints on the posi-
tion of the primer and PCR-based fingerprints. Moreover, PCR-
based assays are quite limited in their multiplexing capabilities,
as different assays are required to detect different pathogenic
sequences. In contrast, microarrays do not impose any position
specific constraints on the DNA fingerprints, and several finger-
prints can be simultaneously placed on a microarray to provide
detection redundancy and allow for the diagnosis of multiple patho-
gens on a single assay. Despite these advantages, microarray-based
assays are relatively insensitive and slow compared to the exquisite
sensitivity and speed of PCR-based assays. Microarray sensitivity
can be greatly enhanced by incorporating sample amplification prior
to hybridization but, unfortunately, this results in a net increase in
assay time for already slow assays.
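The positional constraint that PCR-based assays place on a fingerprint, and that microarrays do not, can be sketched in a few lines of Python. The sketch only checks whether there is room for a primer site on each side of a probe; it ignores primer sequence, melting temperature and every other design criterion. The function names and the default primer length and maximum primer-to-probe gap are illustrative assumptions, not values from the paper.

```python
def primer_windows(target_len, probe_start, probe_len, primer_len, max_gap):
    """Return (forward, reverse) ranges of allowed primer-site start positions.

    Positions are 0-based. A forward primer site must end at or before the
    probe start, at most max_gap bases upstream of it. A reverse primer site
    must start at or after the probe end, at most max_gap bases downstream,
    and must fit inside the target. An empty range means no valid site.
    """
    probe_end = probe_start + probe_len  # exclusive end of the probe
    fwd_lo = max(0, probe_start - max_gap - primer_len)
    fwd_hi = probe_start - primer_len    # last start that still ends before the probe
    rev_lo = probe_end
    rev_hi = min(probe_end + max_gap, target_len - primer_len)
    return range(fwd_lo, fwd_hi + 1), range(rev_lo, rev_hi + 1)


def pcr_compatible(target_len, probe_start, probe_len, primer_len=20, max_gap=50):
    """True if both a forward and a reverse primer site can flank the probe.

    The defaults for primer_len and max_gap are hypothetical.
    """
    forward, reverse = primer_windows(
        target_len, probe_start, probe_len, primer_len, max_gap
    )
    return len(forward) > 0 and len(reverse) > 0


# A probe too close to the start of the target leaves no room upstream.
print(pcr_compatible(1000, 10, 25))   # prints False
# A probe in the interior has room on both sides.
print(pcr_compatible(1000, 500, 25))  # prints True
```

A microarray probe has no analogous check: under the description above, any position in the target is admissible.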
This paper is concerned with the identification of DNA finger-
prints for specific, single pathogenic sequences, referred to as the
*To whom correspondence should be addressed.
© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: email@example.com
Joos,T. and Fortina,P. (2005) Microarrays in clinical diagnosis. Humana Press,
Totowa, NJ, USA.
Kaderali,L. and Schliep,A. (2002) Selecting signature oligonucleotides to identify
organisms using DNA arrays. Bioinformatics, 18, 1340–1349.
Kurtz,S. (2002) Construction and application of virtual suffix trees. PhD dissertation,
Technische Fakultät, Universität Bielefeld, Bielefeld, Germany.
Kurtz,S. et al. (2004) Versatile and open software for comparing large genomes. BMC
Genome Biol., 5, R12.
Leber,M. et al. (2005) A fractional programming approach to efficient DNA melting
temperature calculation. Bioinformatics, 21, 2375–2382.
Lin,H. et al. (2005) Efficient data access for parallel BLAST. IEEE International
Parallel and Distributed Processing Symposium, Denver, CO.
Nordberg,E. (2005) YODA: selecting signature oligonucleotides. Bioinformatics, 21,
Panjkovich,A. and Melo,F. (2005) Comparison of different melting temperature
calculation methods for short DNA sequences. Bioinformatics, 21, 711–722.
Pruitt,K.D. et al. (2005) NCBI Reference Sequence (RefSeq): a curated non-redundant
sequence database of genomes, transcripts and proteins. Nucleic Acids Res., 33,
Rahmann,S. (2003) Fast large scale oligonucleotide selection using the longest com-
mon factor approach. J. Bioinfo. Compu. Biol., 1, 343–361.
SantaLucia,J.,Jr and Hicks,D. (2004) The thermodynamics of DNA structural motifs.
Annu. Rev. Biophys. Biomol. Struct., 33, 415–440.
Schliep,A. et al. (2003) Group testing with DNA chips: generating designs and decod-
ing experiments. In Proceedings of the Computational Systems Bioinformatics,
August 11-14, Stanford, CA, pp. 84–91.
Sergeev,N. et al. (2006) Microarray analysis of Bacillus cereus group virulence factors.
J. Microbiol. Meth., 65, 488–502.
Slezak,T. et al. (2003) Comparative genomics tools applied to bioterrorism defense.
Brief. Bioinform., 4, 133–149.
Urisman,A. et al. (2005) E-Predict: a computational strategy for species identification
based on observed DNA microarray hybridization patterns. BMC Genome Biol.,
Viljoen,G.J. et al. (eds) (2005) Molecular Diagnostics PCR Handbook. Springer
Publishers, Berlin, Germany.
Wang,D. et al. (2002) Microarray-based detection and genotyping of viral pathogens.
Proc. Natl Acad. Sci. USA, 99, 15687–15692.
Weiner,P. (1973) Linear pattern matching algorithms. In Proceedings of 14th IEEE
Annual Symposium on Switching and Automata Theory, Washington, DC, IEEE
Computer Soc., pp. 1–11.
Willse,A. et al. (2004) Quantitative oligonucleotide microarray fingerprinting of
Salmonella enterica isolates. Nucleic Acids Res., 32, 1848–1856.