ABSTRACT: BACKGROUND: Structural variations (SVs) in genomes are commonly observed even in healthy individuals and play key roles in biological functions. To understand their functional impact or to infer the molecular mechanisms that generate them, SVs must be characterized at the highest possible resolution. However, high-resolution analysis is difficult because it requires investigating complex structures across an enormous number of alignments between next-generation sequencing (NGS) reads and genome sequences, both of which contain errors. RESULTS: We propose a new method called ChopSticks that improves the resolution of SV detection for homozygous deletions even when the depth of coverage is low. Conventional read-pair methods use only discordant pairs to localize deletions, where a discordant pair is a read pair whose alignment has an aberrant strand or distance. In contrast, our method also exploits concordant reads. We proved theoretically that as the depth of coverage approaches zero or infinity, the expected resolution of our method becomes asymptotically equal to that of methods based only on discordant pairs at double the coverage. To confirm the effectiveness of ChopSticks, we conducted computational experiments on both simulated and real NGS reads. ChopSticks significantly improved the resolution of deletion calls made by other methods, demonstrating its usefulness. CONCLUSIONS: ChopSticks can generate high-resolution calls for homozygous deletions using information independent of that used by other methods, and it is therefore useful for examining the functional impact of SVs and for inferring SV generation mechanisms.
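The idea that concordant reads carry positional information can be illustrated with a minimal sketch. It assumes a homozygous deletion of the half-open reference interval [s, e), candidate windows for s and e already derived from discordant pairs (and not overlapping each other), and a list of aligned intervals of reads from concordant pairs. Because a homozygous deletion leaves no copy of the deleted bases, no such read may overlap [s, e), so each read near a breakpoint excludes candidate positions. The function `refine_breakpoints` and its input layout are hypothetical; this is not the published ChopSticks algorithm.

```python
def refine_breakpoints(left_window, right_window, reads):
    """Narrow the breakpoint windows of a homozygous deletion [s, e).

    left_window  = (lo, hi): s is assumed to lie in [lo, hi].
    right_window = (lo, hi): e is assumed to lie in [lo, hi].
    reads: half-open (start, end) reference intervals of reads taken from
           concordant pairs. The two windows are assumed not to overlap.
    """
    left_lo, left_hi = left_window
    right_lo, right_hi = right_window
    for start, end in reads:
        if end <= left_hi:
            # The read starts before e, so to avoid overlapping the
            # deletion it must end at or before s: s >= end.
            left_lo = max(left_lo, end)
        elif start >= right_lo:
            # The read ends after s, so it must start at or after e:
            # e <= start.
            right_hi = min(right_hi, start)
        # Reads covering [left_hi, right_lo) contradict a homozygous
        # deletion (e.g. mismapped reads) and are ignored in this sketch.
    return (left_lo, left_hi), (right_lo, right_hi)


left, right = refine_breakpoints((100, 200), (500, 600),
                                 [(90, 150), (120, 180), (520, 560)])
print(left, right)  # (180, 200) (500, 520)
```

In this toy example each breakpoint window shrinks from 100 candidate positions to 20 using only reads that a discordant-pair method would discard.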
ABSTRACT: Structural variations (SVs) change the structure of the genome and can therefore cause various diseases. Next-generation sequencing allows us to obtain a multitude of sequence data, some of which can be used to infer the positions of SVs.
We developed a new method and implementation named ClipCrop for detecting SVs with single-base resolution using soft-clipping information. A soft-clipped sequence is an unmatched fragment in a partially mapped read. To compare the performance of ClipCrop with that of other SV-detecting tools, we generated simulation data under various settings (SV length, read length, and depth of coverage of short reads) containing insertions, deletions, tandem duplications, inversions and single nucleotide alterations in a human chromosome. For comparison, we selected BreakDancer, CNVnator and Pindel, which adopt different approaches to SV detection: the discordant-pair, depth-of-coverage and split-read approaches, respectively.
Our method outperformed BreakDancer and CNVnator in both discovery rate and call accuracy for every type of SV. Pindel performed similarly to our method, but our method clearly outperformed it in detecting small duplications. Our experiments show that ClipCrop infers reliable SVs for data sets with read lengths over 50 bases and a depth of coverage of 20x or more, both of which are reasonable values for current NGS data sets.
ClipCrop detected SVs with a higher discovery rate and call accuracy than the other tools tested on our simulation data set.
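Soft-clipping information is recorded in the CIGAR string of each alignment in SAM/BAM output: an `S` operation marks read bases that are present in SEQ but not aligned. The minimal sketch below, which is not ClipCrop's implementation, extracts the clipped fragments of a single read together with the reference position where each clip begins. It assumes a 1-based SAM POS and standard CIGAR semantics (M, D, N, = and X consume reference; S and H do not; H bases are absent from SEQ). The function `soft_clips` is hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")


def soft_clips(pos, cigar, seq):
    """Return (breakpoint, side, clipped_seq) tuples for one aligned read.

    pos:   1-based leftmost aligned reference position (SAM POS).
    cigar: SAM CIGAR string.
    seq:   read sequence as stored in SEQ.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard-clipped bases are not stored in SEQ, so drop H before
    # looking for soft clips at either end.
    core = [(n, op) for n, op in ops if op != "H"]
    clips = []
    if core and core[0][1] == "S":
        # Left clip: the unmatched fragment abuts reference position pos.
        clips.append((pos, "left", seq[:core[0][0]]))
    if len(core) > 1 and core[-1][1] == "S":
        # Right clip: it starts just after the last aligned reference base.
        ref_len = sum(n for n, op in core if op in REF_CONSUMING)
        clips.append((pos + ref_len, "right", seq[-core[-1][0]:]))
    return clips


print(soft_clips(100, "10M5S", "G" * 10 + "ACGTA"))
# [(110, 'right', 'ACGTA')]
```

Clustering such positions across many reads, and aligning the clipped fragments elsewhere in the reference, is what lets clip-based methods report breakpoints at single-base resolution.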
ABSTRACT: This study aims at the automatic construction of a cell lineage from 4D (multi-focal, time-lapse) images taken with a Nomarski DIC (differential-interference contrast) microscope. A system with such abilities would be a powerful tool for studying embryogenesis and gene function based on mutants, whose cell lineage may differ from that of wild types. We have designed and implemented a system for this purpose and examined its ability through computational experiments. The procedure of our system consists of two parts: (1) image processing, which detects the positions of the nuclei in each 2D microscope image, and (2) construction of a hypothetical cell lineage based on the information obtained in (1). We have also developed a tool that allows a human expert to easily filter out erroneous nucleus candidates generated in (1). We present computational results and discuss other ideas that may improve the performance of our system.
Genome informatics. International Conference on Genome Informatics 02/1999; 10:144-154.
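Part (2) of the procedure, turning per-frame nucleus positions into a lineage, can be illustrated with a simple nearest-neighbour linking heuristic: each nucleus at time t+1 is attached to the closest nucleus at time t within a distance threshold, and a nucleus that acquires two children is treated as a division. This is an assumed, simplified rule with hypothetical names (`build_lineage`, `divisions`, `max_dist`), not the lineage-construction procedure of the system described above.

```python
import math


def build_lineage(frames, max_dist):
    """Link each nucleus to its nearest predecessor in the previous frame.

    frames: list of time points, each a list of (x, y, z) nucleus positions.
    Returns {(t, i): parent}, where parent is (t - 1, j) or None when no
    earlier nucleus lies within max_dist (a root, or a detection error).
    """
    parent = {(0, i): None for i in range(len(frames[0]))}
    for t in range(1, len(frames)):
        for i, p in enumerate(frames[t]):
            best, best_d = None, max_dist
            for j, q in enumerate(frames[t - 1]):
                d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
                if d <= best_d:
                    best, best_d = (t - 1, j), d
            parent[(t, i)] = best
    return parent


def divisions(parent):
    """Return nuclei with two or more children, i.e. candidate divisions."""
    children = {}
    for child, par in parent.items():
        if par is not None:
            children.setdefault(par, []).append(child)
    return {par: kids for par, kids in children.items() if len(kids) >= 2}


frames = [
    [(0.0, 0.0, 0.0)],
    [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
tree = build_lineage(frames, max_dist=2.0)
print(divisions(tree))  # {(0, 0): [(1, 0), (1, 1)]}
```

A purely greedy rule like this is sensitive to missed or spurious nuclei, which is one reason a human-in-the-loop filter over nucleus candidates, as described in the abstract, is useful.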
ABSTRACT: BONSAI is a machine learning system for knowledge acquisition from positive and negative examples of strings. The system has been reported to discover knowledge that distinguishes amino acid sequences of transmembrane domains from randomly chosen amino acid sequences located in other parts of the PIR database with over 90% accuracy. A hypothesis generated by the system is a pair consisting of a classification of symbols, called an alphabet indexing, and a decision tree over regular patterns, which classifies given examples with high accuracy. The whole algorithm of the system consists of two parts: a learning algorithm for constructing a decision tree over regular patterns, and a search algorithm for finding an alphabet indexing that produces a better decision tree. Through providing the BONSAI system as a service, available at our web site http://bonsai.ims.u-tokyo.ac.jp/bonsai/, we have found problems with the system. One of the problems is that for the