Oligonucleotide fingerprint identification for microarray-based pathogen diagnostic assays.

Biotechnology HPC Software Applications Institute, Telemedicine and Advanced Technology Research Center, US Army Medical Research and Materiel Command, Ft. Detrick, MD Boston, MA, USA.
Bioinformatics (Impact Factor: 4.62). 02/2007; 23(1):5-13. DOI: 10.1093/bioinformatics/btl549
Source: PubMed

ABSTRACT Advances in DNA microarray technology and computational methods have unlocked new opportunities to identify 'DNA fingerprints', i.e. oligonucleotide sequences that uniquely identify a specific genome. We present an integrated approach for the computational identification of DNA fingerprints for the design of microarray-based pathogen diagnostic assays. We provide a quantifiable definition of a DNA fingerprint, stated from both a computational and an experimental point of view, and an analytical proof that all in silico fingerprints satisfying the stated definition are found using our approach.
The presented computational approach is implemented in an integrated high-performance computing (HPC) software tool for oligonucleotide fingerprint identification termed TOFI. We employed TOFI to identify in silico DNA fingerprints for several bacteria and plasmid sequences, which were then experimentally evaluated as potential probes for microarray-based diagnostic assays. Results and analysis of approximately 150 in silico DNA fingerprints for Yersinia pestis and 250 fingerprints for Francisella tularensis are presented.
The implemented algorithm is available upon request.
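
As a rough illustration of what "oligonucleotide sequences that uniquely identify a specific genome" can mean computationally, the sketch below keeps every length-k substring of a target sequence that has no exact match, on either strand, in a set of non-target sequences. This is not TOFI's algorithm, and it does not reproduce the paper's fingerprint definition, which the abstract does not spell out. The function names, the exact-match criterion, and the toy sequences are all assumptions made for illustration; a real pipeline would also have to consider near matches and hybridization behavior.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def candidate_fingerprints(target, non_targets, k):
    """Hypothetical helper: k-mers of `target` with no exact match,
    on either strand, in any sequence of `non_targets`.

    Assumption: uniqueness is judged by exact string identity only.
    """
    candidates = kmers(target, k)
    for genome in non_targets:
        background = kmers(genome, k)
        candidates -= background
        candidates -= {reverse_complement(s) for s in background}
    return candidates


# Toy sequences, invented for this example.
result = candidate_fingerprints("ACGTTGCA", ["ACGTAAAA"], 4)
print(sorted(result))
```

In the toy run, the 4-mer ACGT is discarded because it also occurs in the non-target sequence, while the remaining 4-mers of the target survive as candidates. Holding every k-mer in memory does not scale to full genomes, which is one reason tools in this area rely on more specialized data structures and HPC resources.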

  • ABSTRACT: Motivation: Array comparative genomic hybridization is a quick and cheap method for detecting and genotyping unknown microbial isolates. However, there is a fixed number of probes per array, and therefore the number of loci that can be targeted by a single array is limited. For accurate strain genotyping, an array must query a fully representative set of genes from the species' pan-genome. Prior genotyping arrays have only targeted a single strain or the conserved sequences of gene families. Results: This paper presents a new probe selection algorithm (PanArray) that can target multiple whole genomes in a minimal number of probes. Unlike arrays built on clustered gene families, PanArray guarantees that every subsequence of the genomes is independently targeted by a full complement of probes, increasing the flexibility and accuracy of the associated comparative analysis and genotyping. The viability of the algorithm is demonstrated by the design of a 385,000 probe array that fully tiles the genomes of 20 different Listeria monocytogenes strains at greater than two-fold coverage. Availability and Implementation: The PanArray design software is implemented in C++, and the PanArray source code and the L. monocytogenes array design are freely available upon request.
  • ABSTRACT: Signatures are short sequences that are unique and not similar to any other sequence in a database, and that can be used as the basis for identifying different species. Even though several signature discovery algorithms have been proposed in the past, these algorithms require the entire database to be loaded in memory, which restricts the amount of data they can process and leaves them unable to handle large databases. These algorithms also use sequential models and have slower discovery speeds, meaning that their efficiency can be improved.
    BMC Bioinformatics 10/2014; 15(1):339. · 2.67 Impact Factor
  • ABSTRACT: Pathogens present in the environment pose a serious threat to human, plant and animal health as evidenced by recent outbreaks. As many pathogens can survive and proliferate in the environment, it is important to understand their population dynamics and pathogenic potential in the environment. To assess pathogenic potential in diverse habitats, we developed a functional gene array, the PathoChip, constructed with key virulence genes related to major virulence factors, such as adherence, colonization, motility, invasion, toxin, immune evasion and iron uptake. A total of 3715 best probes were selected from 13 virulence factors, covering 7417 coding sequences from 1397 microbial species (2336 strains). The specificity of the PathoChip was computationally verified, and approximately 98% of the probes provided specificity at or below the species level, proving its excellent capability for the detection of target sequences with high discrimination power. We applied this array to community samples from soil, seawater and human saliva to assess the occurrence of virulence genes in natural environments. Both the abundance and diversity of virulence genes increased in stressed conditions compared with their corresponding controls, indicating a possible increase in abundance of pathogenic bacteria under environmental perturbations such as warming or oil spills. Statistical analyses showed that microbial communities harboring virulence genes were responsive to environmental perturbations, which drove changes in abundance and distribution of virulence genes. The PathoChip provides a useful tool to identify virulence genes in microbial populations, examine the dynamics of virulence genes in response to environmental perturbations and determine the pathogenic potential of microbial communities. The ISME Journal advance online publication, 13 June 2013; doi:10.1038/ismej.2013.88.
    The ISME Journal 06/2013; · 8.95 Impact Factor

