Article

PET-Tool: a software suite for comprehensive processing and managing of Paired-End diTag (PET) sequence data

Information and Mathematical Sciences Group, Genome Institute of Singapore, 60 Biopolis Street, Genome #02-01, 138672, Singapore.
BMC Bioinformatics (Impact Factor: 2.67). 02/2006; 7(1):390. DOI: 10.1186/1471-2105-7-390

ABSTRACT: We recently developed the Paired-End diTag (PET) strategy for efficient characterization of mammalian transcriptomes and genomes. The paired-end nature of short PET sequences derived from long DNA fragments raised a new set of bioinformatics challenges, including how to extract PETs from raw sequence reads and how to map PETs to reference genome sequences correctly yet efficiently. To accommodate and streamline the analysis of the large volume of PET sequences generated by each PET experiment, an automated PET data processing pipeline is desirable.
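To make the extraction step concrete, here is a minimal Python sketch of pulling ditags out of a concatemer read. It assumes, purely for illustration, that ditags are separated by a fixed spacer sequence (the hypothetical SPACER below) and that each ditag consists of a 5' tag and a 3' tag of equal, known length (the hypothetical TAG_LEN). Neither value is taken from PET-Tool, and the real Extractor module may apply different rules.

```python
# Hypothetical parameters for illustration only; not taken from PET-Tool.
SPACER = "GGATCC"  # assumed delimiter between concatenated ditags
TAG_LEN = 18       # assumed length of each 5' and 3' tag, in bp


def extract_pets(read, spacer=SPACER, tag_len=TAG_LEN):
    """Split a concatemer read at the spacer and return (5' tag, 3' tag) pairs.

    Fragments whose length is not exactly 2 * tag_len, or that contain an
    ambiguous base (N), are treated as artifacts and dropped.
    """
    pets = []
    for fragment in read.upper().split(spacer.upper()):
        if len(fragment) != 2 * tag_len or "N" in fragment:
            continue  # e.g. truncated read ends or malformed ditags
        pets.append((fragment[:tag_len], fragment[tag_len:]))
    return pets
```

Splitting on an exact spacer is the simplest possible rule: a tag that happens to contain the spacer sequence would be mis-split, which is one reason a production extractor needs more careful artifact handling.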
We designed an integrated computational program package, PET-Tool, to automatically process PET sequences and map them to genome sequences. The tool was implemented as a web-based application composed of four modules: the Extractor module for PET extraction; the Examiner module for analytic evaluation of PET sequence quality; the Mapper module for locating PET sequences in genome sequences; and the Project Manager module for data organization. The performance of PET-Tool was evaluated through the analysis of 2.7 million PET sequences, which demonstrated that PET-Tool accurately and efficiently extracts PET sequences and removes artifacts from large-volume datasets. Using optimized mapping criteria, over 70% of high-quality PET sequences were mapped specifically to the genome sequences. On a 2.4 GHz Linux machine, processing one million PETs from extraction to mapping takes approximately six hours.
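The Mapper's task of turning two independently aligned tags into one genomic placement can be sketched as follows. The criteria here (same chromosome, same strand, 5' tag upstream of the 3' tag, span below a hypothetical MAX_SPAN, and exactly one consistent placement for a PET to count as specifically mapped) are assumptions chosen for illustration, not PET-Tool's published optimized mapping criteria; TagHit, pair_tags and map_pet are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

TAG_LEN = 18          # assumed tag length, in bp (hypothetical)
MAX_SPAN = 1_000_000  # assumed maximum genomic span of a PET, in bp (hypothetical)

Placement = Tuple[str, int, int, str]  # (chrom, start, end, strand)


@dataclass(frozen=True)
class TagHit:
    """One alignment of a single tag: chromosome, 0-based leftmost position, strand."""
    chrom: str
    pos: int
    strand: str  # "+" or "-"


def pair_tags(hit5: TagHit, hit3: TagHit, tag_len: int = TAG_LEN,
              max_span: int = MAX_SPAN) -> Optional[Placement]:
    """Return the PET placement if the two tag hits are consistent, else None."""
    if hit5.chrom != hit3.chrom or hit5.strand != hit3.strand:
        return None
    # On "+" the 5' tag must lie left of the 3' tag; on "-" the order flips.
    if hit5.strand == "+" and hit5.pos > hit3.pos:
        return None
    if hit5.strand == "-" and hit5.pos < hit3.pos:
        return None
    start = min(hit5.pos, hit3.pos)
    end = max(hit5.pos, hit3.pos) + tag_len
    if end - start > max_span:
        return None
    return (hit5.chrom, start, end, hit5.strand)


def map_pet(hits5: Iterable[TagHit], hits3: Iterable[TagHit]) -> Optional[Placement]:
    """Return the placement only if exactly one consistent pairing exists."""
    hits3 = list(hits3)
    placements = set()
    for h5 in hits5:
        for h3 in hits3:
            placement = pair_tags(h5, h3)
            if placement is not None:
                placements.add(placement)
    return placements.pop() if len(placements) == 1 else None
```

Requiring a single consistent pairing is one simple way to separate "specific" from ambiguous placements; a PET whose tags could be paired at two genomic loci is rejected rather than assigned arbitrarily.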
The speed, accuracy, and comprehensiveness of PET-Tool show that it is an important and useful component of PET experiments, and it can be extended to accommodate other related analyses of paired-end sequences. The tool also provides user-friendly functions for data quality checking and a system for multi-layer data management.

Available from: Kuo Ping Chiu, Oct 09, 2014
    ABSTRACT: Advances in high-throughput technologies, such as ChIP-chip and ChIP-PET (Chromatin Immuno-Precipitation Paired-End diTag), and the availability of human and mouse genome sequences now allow us to identify transcription factor binding sites (TFBSs) and analyze mechanisms of gene regulation at the level of the entire genome. Here, we have developed a computational approach that uses ChIP-PET data and statistical modeling to assess experimental noise and identify reliable TFBSs for the c-Myc, STAT1 and p53 transcription factors in the human genome. We propose a mixture probabilistic model and develop computational programs for Monte Carlo simulation of ChIP-PET data to define the background noise of sequence clustering and to identify the probability function of specific DNA-protein binding in the eukaryotic genome. Our approach is highly reproducible; it not only distinguishes bona fide TFBSs from non-specific ones with high specificity but also provides an algorithmic and computational basis for further optimization of the experimental parameters of the ChIP-PET method.
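A much simpler stand-in for the background-noise step can illustrate the idea: scatter the same number of PET fragments uniformly at random over a genome, merge overlapping fragments into clusters, and record how often clusters of each size arise by chance. This sketch assumes uniform placement on a single contiguous sequence and a fixed fragment length; it is not the authors' mixture model, and cluster_sizes and simulate_background are hypothetical names.

```python
import random
from collections import Counter


def cluster_sizes(intervals):
    """Merge overlapping half-open (start, end) intervals; return the PET count per cluster."""
    sizes = []
    current_end = None
    for start, end in sorted(intervals):
        if current_end is not None and start < current_end:
            sizes[-1] += 1  # overlaps the current cluster
            current_end = max(current_end, end)
        else:
            sizes.append(1)  # starts a new cluster
            current_end = end
    return sizes


def simulate_background(n_pets, frag_len, genome_size, n_trials=100, seed=0):
    """Mean number of clusters of each size when PETs are placed uniformly at random.

    Assumes a single contiguous genome of genome_size bp and a fixed fragment
    length; both are simplifications for illustration.
    """
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_trials):
        starts = [rng.randrange(genome_size - frag_len + 1) for _ in range(n_pets)]
        totals.update(cluster_sizes([(s, s + frag_len) for s in starts]))
    return {size: count / n_trials for size, count in sorted(totals.items())}
```

Comparing the observed number of clusters of a given size with the simulated average gives a rough estimate of how many of those clusters could be explained by random noise alone.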
    Genome informatics. International Conference on Genome Informatics 02/2007; 19:83-94. DOI:10.1142/9781860949852_0008
    ABSTRACT: Using a chromatin immunoprecipitation-paired end diTag cloning and sequencing strategy, we mapped estrogen receptor α (ERα) binding sites in MCF-7 breast cancer cells. We identified 1,234 high-confidence binding clusters, of which 94% are projected to be bona fide ERα binding regions. Only 5% of the mapped estrogen receptor binding sites are located within 5 kb upstream of the transcriptional start sites of adjacent genes, the regions containing the proximal promoters, whereas the vast majority of the sites map to intronic or distal locations (>5 kb from the 5′ and 3′ ends of the adjacent transcript), suggesting transcriptional regulatory mechanisms that act over significant physical distances. Of all the identified sites, 71% harbored putative full estrogen response elements (EREs), 25% bore ERE half sites, and only 4% had no recognizable ERE sequences. Genes in the vicinity of ERα binding sites were enriched for regulation by estradiol in MCF-7 cells, and their expression profiles in patient samples segregate ERα-positive from ERα-negative breast tumors. The expression dynamics of the genes adjacent to ERα binding sites suggest direct induction of gene expression through binding to ERE-like sequences, whereas transcriptional repression by ERα appears to occur through indirect mechanisms. Our analysis also reveals a number of candidate transcription factor binding sites adjacent to occupied EREs at frequencies much greater than expected by chance, including the previously reported FOXA1 sites, and demonstrates the potential involvement of one such putative adjacent factor, Sp1, in the global regulation of ERα target genes. Unexpectedly, we found that only 22%–24% of the bona fide human ERα binding sites overlapped conserved regions in whole-genome vertebrate alignments, which suggests limited conservation of functional binding sites. Taken together, this genome-scale analysis suggests complex but definable rules governing ERα binding and gene regulation.
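The full-site versus half-site classification can be sketched with the commonly cited consensus ERE, the inverted repeat GGTCAnnnTGACC. This minimal version uses exact matching only, which is an assumption and almost certainly stricter than the criteria behind the percentages above (those count putative and ERE-like sites); classify_ere and ere_summary are hypothetical names.

```python
import re
from collections import Counter

# Commonly cited consensus ERE: the inverted repeat GGTCAnnnTGACC. It is its own
# reverse complement, so one pattern covers both strands. Exact matching only
# (no mismatches) is an illustrative assumption.
FULL_ERE = re.compile(r"GGTCA[ACGT]{3}TGACC")
HALF_ERE = re.compile(r"GGTCA|TGACC")  # half site or its reverse complement


def classify_ere(seq: str) -> str:
    """Label a binding-region sequence as 'full', 'half', or 'none'."""
    s = seq.upper()
    if FULL_ERE.search(s):
        return "full"
    if HALF_ERE.search(s):
        return "half"
    return "none"


def ere_summary(regions):
    """Fraction of regions in each ERE class."""
    if not regions:
        return {"full": 0.0, "half": 0.0, "none": 0.0}
    counts = Counter(classify_ere(r) for r in regions)
    return {label: counts[label] / len(regions) for label in ("full", "half", "none")}
```

Checking for a full site before a half site matters: every full ERE also contains both half sites, so the order of the tests determines which label a region receives.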
    PLoS Genetics 01/2005; DOI:10.1371/journal.pgen.0030087.eor · 8.52 Impact Factor
    ABSTRACT: Melanoma is the major cause of skin cancer deaths, and melanoma incidence doubles every 10 to 20 years. However, little is known about melanoma pathway aberrations. Here we applied the robust Gene Identification Signature Paired-End diTag (GIS-PET) approach to investigate the melanoma transcriptome and characterize global pathway aberrations. GIS-PET technology directly links 5' mRNA signatures with their corresponding 3' signatures to generate, and then concatenate, PETs for efficient sequencing. We annotated PETs to pathways in the KEGG database and compared the murine B16F1 melanoma transcriptome with three non-melanoma murine transcriptomes (Melan-a2 melanocytes, E14 embryonic stem cells, and E17.5 embryo). Gene expression levels, as represented by PET counts, were compared between the melanoma and melanocyte libraries to identify the most significantly altered pathways and to investigate the expression levels of crucial cancer genes. Melanin biosynthesis genes were expressed solely in cells of melanocytic origin, indicating the feasibility of using the PET approach for transcriptome comparison. The most significantly altered pathways were metabolic pathways, including the upregulated pathways purine metabolism, aminophosphonate metabolism, tyrosine metabolism, selenoamino acid metabolism, galactose utilization, nitrobenzene degradation, and bisphenol A degradation, and the downregulated pathways oxidative phosphorylation, ATPase synthesis, TCA cycle, pyruvate metabolism, and glutathione metabolism. Taken together, the downregulated pathways indicated a slowdown of mitochondrial activities. Mitochondrial permeability was also significantly altered, as indicated by transcriptional activation of ATP/ADP, citrate/malate, Mg++, fatty acid and amino acid transporters, and transcriptional repression of zinc and metal ion transporters. Upregulation of the cell cycle progression, MAPK, and PI3K/Akt pathways was more limited to certain regions of these pathways.
    Expression levels of c-Myc and Trp53 were also higher in melanoma. Moreover, transcriptional variants resulting from alternative transcription start sites or alternative polyadenylation sites were found in Ras and in genes encoding adhesion or cytoskeleton proteins such as integrin, beta-catenin, alpha-catenin, and actin. The highly correlated results unmistakably point to a systematic downregulation of mitochondrial activities, which we hypothesize serves to dampen mitochondria-mediated apoptosis and the dependency of cancer cells on angiogenesis. Our results also demonstrate the advantage of using the PET approach in conjunction with the KEGG database for systematic pathway analysis.
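Comparing PET counts between libraries of different sizes requires normalizing for library size first. A minimal sketch, assuming counts-per-million normalization, a log2 fold change with a pseudocount, and a simple mean-over-genes pathway score; none of these is stated as the authors' statistic, and cpm, log2_fold_changes and pathway_score are hypothetical names.

```python
import math


def cpm(counts):
    """Scale raw PET counts per gene to counts per million PETs in the library."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no PETs")
    return {gene: 1e6 * c / total for gene, c in counts.items()}


def log2_fold_changes(lib_a, lib_b, pseudocount=1.0):
    """log2((CPM_a + p) / (CPM_b + p)) for every gene seen in either library."""
    a, b = cpm(lib_a), cpm(lib_b)
    return {
        gene: math.log2((a.get(gene, 0.0) + pseudocount) / (b.get(gene, 0.0) + pseudocount))
        for gene in set(a) | set(b)
    }


def pathway_score(lfc, pathway_genes):
    """Hypothetical pathway score: mean log2 fold change of the pathway genes present."""
    values = [lfc[g] for g in pathway_genes if g in lfc]
    return sum(values) / len(values) if values else 0.0
```

With toy counts such as {"Tyr": 30, "Actb": 70} against {"Tyr": 0, "Actb": 100}, Tyr receives a large positive fold change and Actb a negative one; the pseudocount keeps genes that are absent from one library finite instead of producing a division by zero.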
    BMC Cancer 02/2007; 7:109. DOI:10.1186/1471-2407-7-109 · 3.32 Impact Factor