Article

A Java-based tool for the design of classification microarrays.

School of Electrical Engineering and Computer Science, Washington State University, Pullman, USA.
BMC Bioinformatics (impact factor: 2.75), 02/2008; 9:328. DOI: 10.1186/1471-2105-9-328
Source: PubMed

ABSTRACT Classification microarrays are used for purposes such as identifying bacterial strains and determining genetic relationships to understand the epidemiology of an infectious disease. For these applications, mixed microarrays, which are composed of DNA from more than one organism, are more effective than conventional microarrays composed of DNA from a single organism. Probe selection is a key factor in designing successful mixed microarrays: redundant sequences are inefficient, and limited representation of diversity can restrict the application of the microarray. We have developed a Java-based software tool, called PLASMID, for selecting the minimum set of probe sequences needed to classify different groups of plasmids or bacteria.
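Why redundant probes add little can be illustrated with a small sketch. The abstract does not describe how PLASMID itself removes redundancy, so the Java code below swaps in a common generic technique instead: a greedy filter that walks through the probes (assumed to be already sorted from most to least informative) and drops any probe whose hybridization profile across strains has an absolute Pearson correlation at or above a threshold with a probe already kept. The class name `ProbeRedundancyFilter`, the 0.9 threshold, and the toy profiles are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical greedy redundancy filter (generic technique, not PLASMID's method).
// Probes are assumed to be ordered best-first by some ranking score.
final class ProbeRedundancyFilter {

    private ProbeRedundancyFilter() {}

    // Pearson correlation of two equal-length vectors; returns 0 if either is constant.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n;
        my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        if (sxx == 0 || syy == 0) return 0.0;
        return sxy / Math.sqrt(sxx * syy);
    }

    // profiles[p][s] = signal of probe p on sample (strain) s.
    // A probe is dropped if |r| >= threshold against any probe already kept.
    // Returns the indices of the retained probes, in input order.
    static List<Integer> filter(double[][] profiles, double threshold) {
        List<Integer> kept = new ArrayList<>();
        for (int p = 0; p < profiles.length; p++) {
            boolean redundant = false;
            for (int k : kept) {
                if (Math.abs(pearson(profiles[p], profiles[k])) >= threshold) {
                    redundant = true;
                    break;
                }
            }
            if (!redundant) kept.add(p);
        }
        return kept;
    }

    public static void main(String[] args) {
        double[][] profiles = {
            {1, 0, 1, 0, 1},  // probe 0
            {1, 0, 1, 0, 1},  // probe 1: identical to probe 0
            {0, 1, 1, 0, 0},  // probe 2: a different presence/absence pattern
            {0, 1, 0, 1, 0},  // probe 3: exact complement of probe 0 (r = -1)
        };
        System.out.println(filter(profiles, 0.9));  // prints [0, 2]
    }
}
```

Using the absolute correlation treats a probe whose pattern is the exact complement of a kept probe as redundant, because on presence/absence data it splits the strains the same way; whether that is desirable depends on the design goal.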
The software program was successfully applied to several different sets of data. The utility of PLASMID was illustrated using existing mixed-plasmid microarray data as well as data from a virtual mixed-genome microarray constructed from different strains of Streptococcus. Moreover, use of data from expression microarray experiments demonstrated the generality of PLASMID.
In this paper we describe a new software tool for selecting a set of probes for a classification microarray. Although the tool was developed for the design of mixed microarrays (and mixed-plasmid microarrays in particular), it can also be used to design expression arrays. The user can choose from several clustering methods (including hierarchical, non-hierarchical, and a model-based genetic algorithm), several probe ranking methods, and several display methods. A novel approach is used for probe redundancy reduction, and probe selection is accomplished via stepwise discriminant analysis. Data can be entered in different formats (including Excel and comma-delimited text), and dendrogram, heat map, and scatter plot images can be saved in several formats (including JPEG and TIFF). Weights generated by stepwise discriminant analysis can be stored for analysis of subsequent experimental data. Additionally, PLASMID can be used to construct virtual microarrays from genomes in public databases, which can then be used to identify an optimal set of probes.
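Stepwise discriminant analysis is commonly run as forward selection. The Java code below is a minimal sketch under stated assumptions, not PLASMID's source: at each step it adds the probe that most reduces Wilks' lambda, Λ = det(W)/det(T), where W and T are the pooled within-group and total sums-of-squares-and-cross-products matrices of the selected probes. It stops when the partial F-to-enter, F = ((n - g - p)/(g - 1)) · (1 - Λ_{p+1}/Λ_p)/(Λ_{p+1}/Λ_p), no longer exceeds a threshold (n samples, g groups, p probes already entered). The class `StepwiseProbeSelector`, its methods, the threshold of 4.0, and the toy data are hypothetical, and the sketch does not compute the discriminant-function weights that PLASMID stores.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of forward stepwise discriminant analysis (Wilks' lambda,
// partial F-to-enter). Not PLASMID's code. Assumes g >= 2 and no empty group.
final class StepwiseProbeSelector {

    // Determinant by Gaussian elimination with partial pivoting (input is copied).
    static double det(double[][] a) {
        int n = a.length;
        double[][] m = new double[n][];
        for (int i = 0; i < n; i++) m[i] = a[i].clone();
        double d = 1.0;
        for (int c = 0; c < n; c++) {
            int piv = c;
            for (int r = c + 1; r < n; r++) if (Math.abs(m[r][c]) > Math.abs(m[piv][c])) piv = r;
            if (m[piv][c] == 0.0) return 0.0;
            if (piv != c) { double[] tmp = m[piv]; m[piv] = m[c]; m[c] = tmp; d = -d; }
            d *= m[c][c];
            for (int r = c + 1; r < n; r++) {
                double f = m[r][c] / m[c][c];
                for (int k = c; k < n; k++) m[r][k] -= f * m[c][k];
            }
        }
        return d;
    }

    // Wilks' lambda det(W)/det(T) for the probes in vars.
    // x[s][p] = signal of probe p in sample s; group[s] is a label in 0..g-1.
    static double wilksLambda(double[][] x, int[] group, int g, List<Integer> vars) {
        int n = x.length, q = vars.size();
        double[] grand = new double[q];
        double[][] mean = new double[g][q];
        int[] count = new int[g];
        for (int s = 0; s < n; s++) {
            count[group[s]]++;
            for (int j = 0; j < q; j++) {
                grand[j] += x[s][vars.get(j)];
                mean[group[s]][j] += x[s][vars.get(j)];
            }
        }
        for (int j = 0; j < q; j++) grand[j] /= n;
        for (int k = 0; k < g; k++) for (int j = 0; j < q; j++) mean[k][j] /= count[k];
        double[][] w = new double[q][q], t = new double[q][q];
        for (int s = 0; s < n; s++) {
            for (int i = 0; i < q; i++) {
                for (int j = 0; j < q; j++) {
                    double xi = x[s][vars.get(i)], xj = x[s][vars.get(j)];
                    w[i][j] += (xi - mean[group[s]][i]) * (xj - mean[group[s]][j]);
                    t[i][j] += (xi - grand[i]) * (xj - grand[j]);
                }
            }
        }
        return det(w) / det(t);
    }

    // Forward selection: add the probe giving the smallest Wilks' lambda while its
    // partial F-to-enter exceeds fToEnter, up to maxProbes probes.
    static List<Integer> select(double[][] x, int[] group, int g, double fToEnter, int maxProbes) {
        int n = x.length, nProbes = x[0].length;
        List<Integer> chosen = new ArrayList<>();
        double current = 1.0;  // Wilks' lambda of the empty model
        while (chosen.size() < maxProbes && n - g - chosen.size() > 0) {
            int p = chosen.size(), best = -1;
            double bestLambda = Double.POSITIVE_INFINITY;
            for (int c = 0; c < nProbes; c++) {
                if (chosen.contains(c)) continue;
                List<Integer> trial = new ArrayList<>(chosen);
                trial.add(c);
                double lam = wilksLambda(x, group, g, trial);
                // Skip NaN/infinite values (e.g. a constant probe makes det(T) = 0).
                if (Double.isFinite(lam) && lam >= 0 && lam < bestLambda) { bestLambda = lam; best = c; }
            }
            if (best < 0) break;
            double ratio = bestLambda / current;  // partial lambda of the best candidate
            double f = (n - g - p) / (double) (g - 1) * (1 - ratio) / ratio;
            if (!(f > fToEnter)) break;  // also stops when f is NaN
            chosen.add(best);
            current = bestLambda;
        }
        return chosen;
    }

    public static void main(String[] args) {
        // 6 samples x 2 probes: probe 0 tracks the groups, probe 1 does not.
        double[][] x = {{1.0, 0.4}, {1.2, 0.6}, {5.0, 0.5}, {5.3, 0.3}, {9.1, 0.6}, {8.8, 0.4}};
        int[] group = {0, 0, 1, 1, 2, 2};
        System.out.println(select(x, group, 3, 4.0, 2));
    }
}
```

Greedy forward selection keeps the cost to one pass over the remaining probes per step, which suits arrays with many candidate probes; the trade-off is that it can miss subsets that are only jointly informative.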


Keywords

classification microarrays; mixed microarrays; mixed-plasmid microarrays; conventional microarrays; expression arrays; virtual microarrays; virtual mixed-genome microarray; probe selection; stepwise discriminant analysis; model-based genetic algorithm; genetic relationships; heat map; comma-delimited text; Java-based software tool