-
ABSTRACT: Meiotic recombination causes a shuffling of homologous chromosomes as they are passed from parents to children. Finding the genomic locations where these crossovers occur is important for genetic association studies, understanding population genetic variation, and predicting disease-causing structural rearrangements. There have been several reports that recombination hotspot usage differs between human populations. But while fine-scale genetic maps exist for European and African populations, none have been constructed for Asians.
Here we present the first Asian genetic map with resolution high enough to reveal hotspot usage. We constructed this map by applying a hidden Markov model to genotype data for over 500,000 single nucleotide polymorphism markers from Korean and Mongolian pedigrees which include 980 meioses. We identified 32,922 crossovers with a precision rate of 99%, 97% sensitivity, and a median resolution of 105,949 bp. For direct comparison of genetic maps between ethnic groups, we also constructed a map for CEPH families using identical methods. We found high levels of concordance with known hotspots, with approximately 72% of recombination occurring in these regions. We investigated the hypothesized contribution of recombination problems to age-related aneuploidy. Our large sample size allowed us to detect a weak but significant negative effect of maternal age on recombination rate.
We have constructed the first fine-scale Asian genetic map. This fills an important gap in the understanding of recombination pattern variation and will be a valuable resource for future research in population genetics. Our map will improve the accuracy of linkage studies and inform the design of genome-wide association studies in the Asian population.
BMC Genetics 01/2013; 14:19. · 2.47 Impact Factor
-
ABSTRACT: The Total Integrated Archive of short-Read and Array (TIARA; http://tiara.gmi.ac.kr) database stores and integrates human genome data generated from multiple technologies including next-generation sequencing and high-resolution comparative genomic hybridization array. The TIARA genome browser is a powerful tool for the analysis of personal genomic information by exploring genomic variants such as SNPs, indels and structural variants simultaneously. As of September 2012, the TIARA database provides raw data and variant information for 13 sequenced whole genomes, 16 sequenced transcriptomes and 33 high resolution array assays. Sequencing reads are available at a depth of ∼30× for whole genomes and 50× for transcriptomes. Information on genomic variants includes a total of ∼9.56 million SNPs, 23 025 of which are non-synonymous SNPs, and ∼1.19 million indels. In this update, by adding high coverage sequencing of additional human individuals, the TIARA genome database now provides an extensive record of rare variants in humans. Following TIARA's fundamentally integrative approach, new transcriptome sequencing data are matched with whole-genome sequencing data in the genome browser. Users can here observe, for example, the expression levels of human genes with allele-specific quantification. Improvements to the TIARA genome browser include the intuitive display of new complex and large-scale data sets.
Database The Journal of Biological Databases and Curation 01/2013; 2013:bat003. · 2.07 Impact Factor
-
ABSTRACT: The estimated glomerular filtration rate is a well-known measure of renal function and is widely used to follow the course of disease. Although there have been several investigations establishing the genetic background contributing to renal function, Asian populations have rarely been used in these genome-wide studies. Here, we aimed to find candidate genetic determinants of renal function in 1007 individuals from 73 extended families of Mongolian origin. Linkage analysis found two suggestive regions near 9q21 (logarithm of odds (LOD) 2.82) and 15q15 (LOD 2.70). The subsequent family-based association study found 2 and 10 significant single-nucleotide polymorphisms (SNPs) in the two regions, respectively. The strongest SNPs on chromosomes 9 and 15 were rs17400257 and rs1153831, with P-values of 7.21 × 10^-9 and 2.47 × 10^-11, respectively. Genes located near these SNPs are considered candidates for determining renal function and include FRMD3, GATM, and SPATA5L1. Thus, we identified possible loci that determine renal function in an isolated Asian population. Consistent with previous reports, our study found genes linked and associated with renal function in other populations. Kidney International advance online publication, 19 December 2012; doi:10.1038/ki.2012.389.
Kidney International 12/2012; · 6.61 Impact Factor
-
Hansoo Park,
Seungbok Lee,
Hyun-Jin Kim,
Young Seok Ju,
Jong-Yeon Shin,
Dongwan Hong,
Marcin von Grotthuss,
Dong-Sung Lee,
Changho Park,
Jennifer Hayeon Kim,
Boram Kim,
Yun Joo Yoo,
Sung-Il Cho,
Joohon Sung,
Charles Lee,
Jong-Il Kim,
Jeong-Sun Seo
ABSTRACT: BACKGROUND: Musical abilities such as recognising music and singing performance serve as means for communication and are instruments in sexual selection. Specific regions of the brain have been found to be activated by musical stimuli, but these have rarely been extended to the discovery of genes and molecules associated with musical ability. METHODS: A total of 1008 individuals from 73 families were enrolled and a pitch-production accuracy test was applied to determine musical ability. To identify genetic loci and variants that contribute to musical ability, we conducted family-based linkage and association analyses, and incorporated the results with data from exome sequencing and array comparative genomic hybridisation analyses. RESULTS: We found significant evidence of linkage at 4q23 with the nearest marker D4S2986 (LOD=3.1), whose supporting interval overlaps a previous study in Finnish families, and identified an intergenic single nucleotide polymorphism (SNP) (rs1251078, p=8.4×10^-17) near UGT8, a gene highly expressed in the central nervous system and known to act in brain organisation. In addition, a non-synonymous SNP in UGT8 was revealed to be highly associated with musical ability (rs4148254, p=8.0×10^-17), and a 6.2 kb copy number loss near UGT8 showed a plausible association with musical ability (p=2.9×10^-6). CONCLUSIONS: This study provides new insight into the genetics of musical ability, exemplifying a methodology to assign functional significance to synonymous and non-coding alleles by integrating multiple experimental methods.
Journal of Medical Genetics 11/2012; · 6.36 Impact Factor
-
Jeong-Sun Seo,
Young Seok Ju,
Won-Chul Lee,
Jong-Yeon Shin,
June Koo Lee,
Thomas Bleazard,
Junho Lee,
Yoo Jin Jung,
Jung-Oh Kim,
Jung-Young Shin,
Saet-Byeol Yu,
Jihye Kim,
Eung-Ryoung Lee,
Chang-Hyun Kang,
In-Kyu Park,
Hwanseok Rhee,
Se-Hoon Lee,
Jong-Il Kim,
Jin-Hyoung Kang,
Young Tae Kim
ABSTRACT: All cancers harbor molecular alterations in their genomes. The transcriptional consequences of these somatic mutations have not yet been comprehensively explored in lung cancer. Here we present the first large scale RNA sequencing study of lung adenocarcinoma, demonstrating its power to identify somatic point mutations as well as transcriptional variants such as gene fusions, alternative splicing events, and expression outliers. Our results reveal the genetic basis of 200 lung adenocarcinomas in Koreans including deep characterization of 87 surgical specimens by transcriptome sequencing. We identified driver somatic mutations in cancer genes including EGFR, KRAS, NRAS, BRAF, PIK3CA, MET, and CTNNB1. Candidates for novel driver mutations were also identified in genes newly implicated in lung adenocarcinoma such as LMTK2, ARID1A, NOTCH2, and SMARCA4. We found 45 fusion genes, eight of which were chimeric tyrosine kinases involving ALK, RET, ROS1, FGFR2, AXL, and PDGFRA. Among 17 recurrent alternative splicing events, we identified exon 14 skipping in the proto-oncogene MET as highly likely to be a cancer driver. The number of somatic mutations and expression outliers varied markedly between individual cancers and was strongly correlated with smoking history of patients. We identified genomic blocks within which gene expression levels were consistently increased or decreased that could be explained by copy number alterations in samples. We also found an association between lymph node metastasis and somatic mutations in TP53. These findings broaden our understanding of lung adenocarcinoma and may also lead to new diagnostic and therapeutic approaches.
Genome Research 09/2012; · 13.61 Impact Factor
-
Dongwan Hong,
Arang Rhie,
Sung-Soo Park,
Jongkeun Lee,
Young Seok Ju,
Sujung Kim,
Saet-Byeol Yu,
Thomas Bleazard,
Hyun-Seok Park,
Hwanseok Rhee,
Hyonyong Chong,
Kap-Seok Yang,
Yeon-Su Lee,
In-Hoo Kim,
Jin Soo Lee,
Jong-Il Kim,
Jeong-Sun Seo
ABSTRACT: FX is an RNA-Seq analysis tool, which runs in parallel on cloud computing infrastructure, for the estimation of gene expression levels and genomic variant calling. In the mapping of short RNA-Seq reads, FX uses a transcriptome-based reference primarily, generated from ~160 000 mRNA sequences from RefSeq, UCSC and Ensembl databases. This approach reduces the misalignment of reads originating from splicing junctions. Unmapped reads not aligned on known transcripts are then mapped on the human genome reference. FX allows analysis of RNA-Seq data on cloud computing infrastructures, supporting access through a user-friendly web interface. AVAILABILITY: FX is freely available on the web at (http://fx.gmi.ac.kr), and can be installed on local Hadoop clusters. Guidance for the installation and operation of FX can be found under the 'Documentation' menu on the website. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Bioinformatics 03/2012; 28(5):721-3. · 5.47 Impact Factor
-
Seung Hwan Paik,
Hyun-Jin Kim,
Ho-Young Son,
Seungbok Lee,
Sun-Wha Im,
Young Seok Ju,
Je Ho Yeon,
Seong Jin Jo,
Hee Chul Eun,
Jeong-Sun Seo,
Oh Sang Kwon,
Jong-Il Kim
ABSTRACT: To elucidate the genes responsible for constitutive human skin color, we measured the extent of skin pigmentation in the buttock, representative of lifelong non-sun-exposed skin, and conducted a gene mapping study on skin color in an isolated Mongolian population composed of 344 individuals from 59 families who lived in Dashbalbar, Mongolia. The heritability of constitutive skin color was 0.82, indicating a substantial genetic contribution to this trait. Through linkage analysis using 1,039 short tandem repeat (STR) microsatellite markers, we identified a novel genomic region regulating constitutive skin color on 11q24.2 with a logarithm of odds (LOD) score of 3.39. In addition, we found other candidate regions on 17q23.2, 6q25.1, and 13q33.2 (LOD ≥ 2). Family-based association tests on these regions with suggestive linkage peaks revealed ten and two significant single nucleotide polymorphisms (SNPs) in the linkage regions of chromosomes 11 and 17, respectively. We identified four possible candidate genes that may be implicated in regulating human skin color: ETS1, UBASH3B, ASAM, and CLTC.
Experimental and Molecular Medicine 12/2011; 44(3):241-9. · 2.48 Impact Factor
-
ABSTRACT: The identification of the molecular events that drive cancer transformation is essential to the development of targeted agents that improve the clinical outcome of lung cancer. Many studies have reported genomic driver mutations in non-small-cell lung cancers (NSCLCs) over the past decade; however, the molecular pathogenesis of >40% of NSCLCs is still unknown. To identify new molecular targets in NSCLCs, we performed the combined analysis of massively parallel whole-genome and transcriptome sequencing for cancer and paired normal tissue of a 33-yr-old lung adenocarcinoma patient, who is a never-smoker and has no familial cancer history. The cancer showed no known driver mutation in EGFR or KRAS and no EML4-ALK fusion. Here we report a novel fusion gene between KIF5B and the RET proto-oncogene caused by a pericentric inversion of 10p11.22-q11.21. This fusion gene overexpresses chimeric RET receptor tyrosine kinase, which could spontaneously induce cellular transformation. We identified the KIF5B-RET fusion in two more cases out of 20 primary lung adenocarcinomas in the replication study. Our data demonstrate that a subset of NSCLCs could be caused by a fusion of KIF5B and RET, and suggest the chimeric oncogene as a promising molecular target for the personalized diagnosis and treatment of lung cancer.
Genome Research 12/2011; 22(3):436-45. · 13.61 Impact Factor
-
Seung Hwan Paik,
Hyun-Jin Kim,
Seungbok Lee,
Sun-Wha Im,
Young Seok Ju,
Je Ho Yeon,
Seong Jin Jo,
Hee Chul Eun,
Jeong-Sun Seo,
Jong-Il Kim,
Oh Sang Kwon
ABSTRACT: Tanning ability is important, because it represents the ability of the skin to protect itself against ultraviolet (UV) radiation. Here, we sought to determine genetic regions associated with tanning ability. Skin pigmentation was measured at the outer forearm and buttock areas to represent facultative and constitutive skin color, respectively. In our study population consisting of isolated Mongolian subjects, with common histories of environmental UV exposure during their nomadic life, facultative skin color adjusted by constitutive skin color was used to indicate tanning ability. Through linkage analysis and family-based association tests of 345 Mongolian subjects, we identified 2 potential linkage regions regulating tanning ability on 5q35.3 and 12q13.2, having 6 and 7 significant single nucleotide polymorphisms (SNPs), respectively. Those significant SNPs were located in or adjacent to potential candidate genes related to tanning ability: GRM6, ATF1, WNT1, and SILV/Pmel17.
BMB reports 11/2011; 44(11):741-6. · 1.72 Impact Factor
-
Young Seok Ju,
Jong-Il Kim,
Sheehyun Kim,
Dongwan Hong,
Hansoo Park,
Jong-Yeon Shin,
Seungbok Lee,
Won-Chul Lee,
Sujung Kim,
Saet-Byeol Yu, [......],
Maryam Yavartanoo,
Hyunseok Peter Kang,
Omer Gokcumen,
Diddahally R Govindaraju,
Jung Hee Jung,
Hyonyong Chong,
Kap-Seok Yang,
Hyungtae Kim,
Charles Lee,
Jeong-Sun Seo
ABSTRACT: Massively parallel sequencing technologies have identified a broad spectrum of human genome diversity. Here we deep sequenced and correlated 18 genomes and 17 transcriptomes of unrelated Korean individuals. This has allowed us to construct a genome-wide map of common and rare variants and also identify variants formed during DNA-RNA transcription. We identified 9.56 million genomic variants, 23.2% of which appear to be previously unidentified. From transcriptome sequencing, we discovered 4,414 transcripts not previously annotated. Finally, we revealed 1,809 sites of transcriptional base modification, where the transcriptional landscape is different from the corresponding genomic sequences, and 580 sites of allele-specific expression. Our findings suggest that a considerable number of unexplored genomic variants still remain to be identified in the human genome, and that the integrated analysis of genome and transcriptome sequencing is powerful for understanding the diversity and functional aspects of human genomic variants.
Nature Genetics 07/2011; 43(8):745-52. · 35.53 Impact Factor
-
ABSTRACT: Comparative genomic hybridization (CGH) microarrays have been used to determine copy number variations (CNVs) and their effects on complex diseases. Detection of absolute CNVs independent of genomic variants of an arbitrary reference sample has been a critical issue in CGH array experiments. Whole genome analysis using massively parallel sequencing with multiple ultra-high resolution CGH arrays provides an opportunity to catalog highly accurate genomic variants of the reference DNA (NA10851). Using information on variants, we developed a new method, the CGH array reference-free algorithm (CARA), which can determine reference-unbiased absolute CNVs from any CGH array platform. The algorithm enables the removal and rescue of false positive and false negative CNVs, respectively, which appear due to the effects of genomic variants of the reference sample in raw CGH array experiments. We found that the CARA remarkably enhanced the accuracy of CGH array in determining absolute CNVs. Our method thus provides a new approach to interpret CGH array data for personalized medicine.
Nucleic Acids Research 11/2010; 38(20):e190. · 8.03 Impact Factor
-
Dongwan Hong,
Sung-Soo Park,
Young Seok Ju,
Sheehyun Kim,
Jong-Yeon Shin,
Sujung Kim,
Saet-Byeol Yu,
Won-Chul Lee,
Seungbok Lee,
Hansoo Park,
Jong-Il Kim,
Jeong-Sun Seo
ABSTRACT: High-throughput genomic technologies have been used to explore personal human genomes for the past few years. Although the integration of technologies is important for high-accuracy detection of personal genomic variations, no databases have been prepared to systematically archive genomes and to facilitate the comparison of personal genomic data sets prepared using a variety of experimental platforms. We describe here the Total Integrated Archive of Short-Read and Array (TIARA; http://tiara.gmi.ac.kr) database, which contains personal genomic information obtained from next generation sequencing (NGS) techniques and ultra-high-resolution comparative genomic hybridization (CGH) arrays. This database improves the accuracy of detecting personal genomic variations, such as SNPs, short indels and structural variants (SVs). At present, 36 individual genomes have been archived and may be displayed in the database. TIARA supports a user-friendly genome browser, which retrieves read-depths (RDs) and log2 ratios from NGS and CGH arrays, respectively. In addition, this database provides information on all genomic variants and the raw data, including short reads and feature-level CGH data, through anonymous file transfer protocol. More personal genomes will be archived as more individuals are analyzed by NGS or CGH array. TIARA provides a new approach to the accurate interpretation of personal genomes for genome research.
Nucleic Acids Research 11/2010; 39(Database issue):D883-8. · 8.03 Impact Factor
-
Hansoo Park,
Jong-Il Kim,
Young Seok Ju,
Omer Gokcumen,
Ryan E Mills,
Sheehyun Kim,
Seungbok Lee,
Dongwhan Suh,
Dongwan Hong,
Hyunseok Peter Kang, [......],
Hyeran Kim,
Song Ju Yang,
Kap-Seok Yang,
Hyungtae Kim,
Matthew E Hurles,
Stephen W Scherer,
Nigel P Carter,
Chris Tyler-Smith,
Charles Lee,
Jeong-Sun Seo
ABSTRACT: Copy number variants (CNVs) account for the majority of human genomic diversity in terms of base coverage. Here, we have developed and applied a new method to combine high-resolution array comparative genomic hybridization (CGH) data with whole-genome DNA sequencing data to obtain a comprehensive catalog of common CNVs in Asian individuals. The genomes of 30 individuals from three Asian populations (Korean, Chinese and Japanese) were interrogated with an ultra-high-resolution array CGH platform containing 24 million probes. Whole-genome sequencing data from a reference genome (NA10851, with 28.3x coverage) and two Asian genomes (AK1, with 27.8x coverage and AK2, with 32.0x coverage) were used to transform the relative copy number information obtained from array CGH experiments into absolute copy number values. We discovered 5,177 CNVs, of which 3,547 were putative Asian-specific CNVs. These common CNVs in Asian populations will be a useful resource for subsequent genetic studies in these populations, and the new method of calling absolute CNVs will be essential for applying CNV data to personalized medicine.
Nature Genetics 04/2010; 42(5):400-5. · 35.53 Impact Factor
-
Jong-Il Kim,
Young Seok Ju,
Hansoo Park,
Sheehyun Kim,
Seonwook Lee,
Jae-Hyuk Yi,
Joann Mudge,
Neil A Miller,
Dongwan Hong,
Callum J Bell, [......],
Gary P Schroth,
Thomas D Wu,
HyeRan Kim,
Kap-Seok Yang,
Woong-Yang Park,
Hyungtae Kim,
George M Church,
Charles Lee,
Stephen F Kingsmore,
Jeong-Sun Seo
ABSTRACT: Recent advances in sequencing technologies have initiated an era of personal genome sequences. To date, human genome sequences have been reported for individuals with ancestry in three distinct geographical regions: a Yoruba African, two individuals of northwest European origin, and a person from China. Here we provide a highly annotated, whole-genome sequence for a Korean individual, known as AK1. The genome of AK1 was determined by an exacting, combined approach that included whole-genome shotgun sequencing (27.8x coverage), targeted bacterial artificial chromosome sequencing, and high-resolution comparative genomic hybridization using custom microarrays featuring more than 24 million probes. Alignment to the NCBI reference, a composite of several ethnic clades, disclosed nearly 3.45 million single nucleotide polymorphisms (SNPs), including 10,162 non-synonymous SNPs, and 170,202 deletion or insertion polymorphisms (indels). SNP and indel densities were strongly correlated genome-wide. Applying very conservative criteria yielded highly reliable copy number variants for clinical considerations. Potential medical phenotypes were annotated for non-synonymous SNPs, coding domain indels, and structural variants. The integration of several human whole-genome sequences derived from several ethnic groups will assist in understanding genetic ancestry, migration patterns and population bottlenecks.
Nature 08/2009; 460(7258):1011-5. · 36.28 Impact Factor
-
ABSTRACT: Genetic maps provide specific positions of genetic markers, which are required for performing genetic studies. Linkage analyses of Asian families have been performed with Caucasian genetic maps, since appropriate genetic maps of Asians were not available. Different ethnic groups may have different recombination rates as a result of genomic variations, which would generate misspecification of the genetic map and reduce the power of linkage analyses.
We constructed the genetic map of a Mongolian population in Asia with CRIMAP software. This new map, called the GENDISCAN map, is based on genotype data collected from 1026 individuals of 73 large Mongolian families, and includes 1790 total and 1500 observable meioses. The GENDISCAN map provides sex-averaged and sex-specific genetic positions of 1039 microsatellite markers in Kosambi centimorgans (cM) with physical positions. We also determined 95% confidence intervals of genetic distances of the adjacent marker intervals. Genetic lengths of the whole genome, chromosomes and adjacent marker intervals are compared with those of Rutgers Map v.2, which was constructed based on Caucasian populations (Centre d'Etudes du Polymorphisme Humain (CEPH) and Icelandic families) by mapping methods identical to those of the GENDISCAN map, CRIMAP software and the Kosambi map function. Mongolians showed approximately 1.9 fewer recombinations per meiosis than Caucasians. As a result, genetic lengths of the whole genome and chromosomes of the GENDISCAN map are shorter than those of Rutgers Map v.2. Thirty-eight marker intervals differed significantly between the Mongolian and Caucasian genetic maps.
The new GENDISCAN map is applicable to the genetic study of Asian populations. Differences in the genetic distances between the GENDISCAN and Caucasian maps could facilitate elucidation of genomic variations between different ethnic groups.
BMC Genomics 12/2008; 9:554. · 4.07 Impact Factor
-
ABSTRACT: Elevated heart rate has been proposed as an independent risk factor for cardiovascular diseases, but their interrelationships are not well understood. In this study, we performed a genome-wide linkage scan in 1,026 individuals (mean age 30.6 years, 54.5% women) from 73 extended families of Mongolia and determined quantitative trait loci that influence heart rate. The DNA samples were genotyped with 1,039 deCODE microsatellite markers for a genome-wide linkage scan at 3 cM density. Correlation analysis was carried out to evaluate the relationship between the covariates and heart rate. T-tests of heart rate were also performed for sex, smoking and alcohol intake. The resulting model was used in a nonparametric genome-wide linkage analysis with a variance component model to obtain multipoint logarithm of odds (LOD) scores and corresponding P values. In the adjusted model, the heritability of heart rate was estimated as 0.32 (P<.0001), and a maximum multipoint LOD score of 2.03 was observed at 77 cM on chromosome 18. The second largest LOD score of 1.52 was seen on chromosome 5 at 216 cM. Genes located at these positions on chromosomes 5 and 18 may be involved in the regulation of heart rate.
Experimental and Molecular Medicine 11/2008; 40(5):558-64. · 2.48 Impact Factor