ABSTRACT: We performed a genomewide scan in six multiplex families with familial idiopathic pulmonary fibrosis (IPF) who originated from southeastern Finland. The majority of the Finnish multiplex families were clustered in the region, and the population history suggested that the clustering might be explained by an ancestor shared among the patients. The genomewide scan identified five loci of interest. The hierarchical fine mapping in an extended data set with 24 families originating from the same geographic region revealed a shared 110 kb to 13 Mb haplotype on chromosome 4q31, which was significantly more frequent among the patients than in population-based controls (odds ratio 6.3; 95% CI 2.5-15.9; P = .0001). The shared haplotype harbored two functionally uncharacterized genes, ELMOD2 and LOC152586, of which only ELMOD2 was expressed in lung and showed significantly decreased messenger-RNA expression in IPF lung (n = 6) when compared with that of healthy lung (n = 7; P = .05). Our results suggest ELMOD2 as a novel candidate gene for susceptibility in familial IPF.
The American Journal of Human Genetics 08/2006; 79(1):149-54. · 11.20 Impact Factor
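The odds ratio, confidence interval, and P value quoted in the IPF abstract above can be checked against each other with a few lines of arithmetic. The sketch below assumes a Wald-type 95% interval on the log-odds scale and a two-sided normal test; the abstract does not say which method the authors used, so this is only a consistency check, not a reconstruction of their analysis.

```python
import math

# Values quoted in the abstract.
odds_ratio = 6.3
ci_low, ci_high = 2.5, 15.9

# Assumption: a Wald-type 95% interval, exp(log(OR) +/- z * SE).
z_95 = 1.959964
se_log_or = (math.log(ci_high) - math.log(ci_low)) / (2 * z_95)

# Under that assumption the point estimate is the geometric mean of the bounds.
implied_or = math.sqrt(ci_low * ci_high)

# Two-sided normal p-value for the null hypothesis OR = 1.
z_stat = math.log(odds_ratio) / se_log_or
p_value = math.erfc(z_stat / math.sqrt(2))

print(f"implied OR = {implied_or:.2f}, SE(log OR) = {se_log_or:.3f}, p = {p_value:.1e}")
```

Under these assumptions the geometric mean of the interval bounds lands very close to the reported odds ratio, and the implied P value is of the same order as the reported one.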
ABSTRACT: We describe TreeDT, a novel association-based gene mapping method. Given a set of disease-associated haplotypes and a set of control haplotypes, TreeDT predicts likely locations of a disease susceptibility gene. TreeDT extracts, essentially in the form of haplotype trees, information about historical recombinations in the population: A haplotype tree constructed at a given chromosomal location is an estimate of the genealogy of the haplotypes. TreeDT constructs these trees for all locations on the given haplotypes and performs a novel disequilibrium test on each tree: Is there a small set of subtrees with relatively high proportions of disease-associated chromosomes, suggesting shared genetic history for those and a likely disease gene location? We give a detailed description of TreeDT and the tree disequilibrium tests, we analyze the algorithm formally, and we evaluate its performance experimentally on both simulated and real data sets. Experimental results demonstrate that TreeDT has high accuracy on difficult mapping tasks and comparisons to other methods (EATDT, HPM, TDT) show that TreeDT is very competitive.
IEEE/ACM Transactions on Computational Biology and Bioinformatics 01/2006; 3(2):174-85. · 1.62 Impact Factor
ABSTRACT: Bronchopulmonary dysplasia (BPD), the most common chronic lung disease in infancy, is influenced by a number of antenatal and postnatal risk factors and is mostly preceded by respiratory distress syndrome (RDS) in the newborn. Surfactant protein (SP-A, -B, -C and -D) gene variations may play a role in both BPD and RDS. An association study between these candidate genes and BPD was performed. A total of 365 preterm Finnish infants in a high-risk population with gestational age ≤32 weeks were genotyped for all SP genes. A multiparameter analysis was performed using data mining based on Agrawal's algorithm and conventional methods of statistical allelic association. In singletons and presenting multiples, the frequency of the SP-B intron 4 deletion variant allele was increased in BPD versus controls (P=0.008, OR=2.0, 95% CI 1.2-3.4). The presence of the SP-B intron 4 deletion variant was a risk factor for BPD even when essential external confounding factors were included in the analyses. No other SP polymorphisms were associated with BPD, and the SP-B intron 4 variation was not associated with RDS. Transcription Element Search Software predicted allele-specific differences at several putative transcription factor binding sites that may be important in SP-B regulation. The present multiparameter analysis demonstrates the presumably direct involvement of the SP-B intron 4 deletion variant allele as a genetic risk factor for BPD. We propose that two separate SP-B gene polymorphisms have a phenotypic significance via separate molecular mechanisms: the intron 4 length variation affecting transcriptional regulation, and the exonic Ile131Thr variation acting post-translationally.
Human Molecular Genetics 07/2004; 13(11):1095-104. · 7.69 Impact Factor
ABSTRACT: Functional polymorphisms in the genes encoding superoxide dismutases (SODs), that is, superoxide-scavenging antioxidant enzymes, may play an important role in the development of inflammatory airway diseases such as asthma.
The allele frequencies of two missense polymorphisms of SOD genes (Ala16Val in MnSOD (SOD2) and Arg213Gly in ECSOD (SOD3)) were investigated in Finnish patients with asthma and compared with family based controls. Both variants have been shown to be functionally interesting in the lung. The polymorphism at the exon-intron 3 boundary of a third SOD, CuZnSOD (SOD1), was also included in the analysis.
None of the SOD genetic variants studied appeared to be major genetic regulators in the development of asthma. We could exclude all models of inheritance that increased the risk of asthma more than 1.2 fold for MnSOD*Val (frequency of allele 0.74 in the population) and more than 6.6 fold for ECSOD*Gly213 (frequency of allele 0.03 in the population) compared with non-carriers. For the intronic polymorphism in CuZnSOD, a relative risk of more than 3.3 (frequency of allele 0.10 in the population) could be excluded.
It is highly unlikely that the functionally important genetic variants Ala16Val and Arg213Gly of SODs play a major role in the genetic susceptibility to asthma.
ABSTRACT: Epidemiological and genetic linkage studies have indicated a strong genetic basis for development of inflammatory bowel disease (IBD), which was recently supported by discovery of the Crohn's disease (CD) susceptibility gene termed NOD2/CARD15. We carried out a genome-wide linkage study in Finnish IBD families, providing a particular advantage to map susceptibility genes for ulcerative colitis (UC) within a genetic isolate. Initially, 92 IBD families with 138 affected sib-pairs (ASPs) were genotyped for 429 markers spaced at approximately 10 cM intervals. Next, the loci on chromosomes 2p13-11, 11p12-q13, and 12p13-12 were high-density mapped in the extended family cohort of 130 families with 173 ASPs. In this study, the most significant lod scores were observed for the UC families on chromosome 2p11 (D2S2333), in the vicinity of the REG gene cluster, which is strikingly overexpressed in the IBD mucosa. The maximum two-point lod score was 3.34 (dominant model), and the corresponding NPL score 2.61. For UC, the second highest two-point NPL score of 2.00 was observed at proximal 12p13, where some evidence for linkage disequilibrium also emerged (P=0.07 and P=0.007 for the basic and extended IBD cohorts, respectively). The highest two-point NPL score for the CD families was 2.34 at D12S78 (12q23) with significant evidence for linkage disequilibrium (P=0.004), and for the mixed (MX) families 2.07 at D4S406 near the linkage peak reported previously. This study confirmed several of the IBD loci that have previously been reported and gives evidence for new IBD loci on chromosomes 2p11, 11p12-q13, 12p13-12, 12q23, and 19q13.
European Journal of Human Genetics 03/2003; 11(2):112-20. · 4.32 Impact Factor
ABSTRACT: Preeclampsia is a common, pregnancy-specific disorder characterized by reduced placental perfusion, endothelial dysfunction, elevated blood pressure, and proteinuria. The pathogenesis of this heterogeneous disorder is incompletely understood, but it has a familial component, which suggests that one or more common alleles may act as susceptibility genes. We hypothesized that, in a founder population, the genetic background of preeclampsia might also show reduced heterogeneity, and we have performed a genomewide scan in 15 multiplex families recruited predominantly in the Kainuu province in central eastern Finland. We found two loci that exceeded the threshold for significant linkage: chromosome 2p25, near marker D2S168 (nonparametric linkage [NPL] score 3.77; P=.000761) at 21.70 cM, and 9p13, near marker D9S169 (NPL score 3.74; P=.000821) at 38.90 cM. In addition, there was a locus showing suggestive linkage at chromosome 4q32 between D4S413 and D4S3046 (NPL score 3.13; P=.003238) at 163.00 cM. In the present study the susceptibility locus on chromosome 2p25 is clearly different (21.70 cM) from the locus at 2p12 found in an Icelandic study (94.05 cM) and the locus at 2q23 (144.7 cM) found in an Australian/New Zealand study. The locus at 9p13 has been shown to be a candidate region for type 2 diabetes in two recently published genomewide scans from Finland and China. The regions on chromosomes 2p25 and 9p13 may harbor susceptibility genes for preeclampsia.
The American Journal of Human Genetics 02/2003; 72(1):168-77. · 11.20 Impact Factor
ABSTRACT: Previously, we have presented a data mining-based algorithmic approach to genetic association analysis, Haplotype Pattern Mining. We have now extended the approach with the possibility of analysing quantitative traits and utilising covariates. This is accomplished by using a linear model for measuring association. We present results with the extended version, QHPM, with simulated quantitative trait data. One data set was simulated with the population simulator package Populus, and another was obtained from GAW12. In the former, there were 2-3 underlying susceptibility genes for a trait, each with several ancestral disease mutations, and 1 or 2 environmental components. We show that QHPM is capable of finding the susceptibility loci, even when there are strong allelic heterogeneity and environmental effects in the disease models. The power of finding quantitative trait loci is dependent on the ascertainment scheme of the data: collecting the study subjects from both ends of the quantitative trait distribution is more effective than using unselected individuals or individuals ascertained based on disease status, but QHPM has good power to localize the genes even with unselected individuals. Comparison with quantitative trait TDT (QTDT) showed that QHPM has better localization accuracy when the gene effect is weak.
Annals of Human Genetics 12/2002; 66(Pt 5-6):419-29. · 2.22 Impact Factor
ABSTRACT: Coeliac disease is a common multifactorial disease with a strong genetic component, which is not entirely explained by the HLA association. Four previous whole-genome screens have produced somewhat inconsistent results suggesting genetic heterogeneity. We attempted to overcome this problem by performing a genome-wide scan in a Finnish sub-population, expected to be more homogeneous than the general population of Finland. The families in our study originate from the northeastern part of Finland, the Koilliskaira region, which has been relatively isolated since its founding in the 16th century. Genealogical studies have confirmed that the families share a common ancestor in the 16th century. Nine families with altogether 23 patients were genotyped for 399 microsatellite markers and the data were analysed with parametric linkage analysis using two dominant and one recessive model. A region on chromosome 15q11-q13 was implicated with a LOD score of 3.14 using a highly penetrant dominant model. Addition of more markers and one more sib-pair increased the LOD score to 3.74. This result gives preliminary evidence for existence of a susceptibility factor in this chromosomal region.
Human Genetics 08/2002; 111(1):40-5. · 4.63 Impact Factor
ABSTRACT: Epidemiological studies and case reports suggest that familial clustering of gliomas may occur in families that do not fit any known tumor syndromes. In the present study, 15 familial glioma pedigrees from a limited geographical area were hypothesized to carry the same low-penetrance susceptibility allele. We used a two-stage strategy for disease gene mapping. A genome scan in four glioma families revealed four interesting loci at chromosome arms 1q, 6q, 8p, and 15q. Additional markers in these regions provided evidence of significant linkage to 15q23-q26.3 with a maximum nonparametric linkage score of 3.35 with marker D15S130. Investigation of all 15 glioma families by association analysis (haplotype pattern mining) and through use of the transmission/disequilibrium test gave further evidence of significant association/transmission distortion at the same 15q locus (P = 0.02 and P = 0.03, respectively). No evidence of involvement of known tumor syndromes was obtained from the data provided by the linkage analysis or hospital records. Thus, the first genome-wide linkage analysis of familial glioma suggests a novel susceptibility locus at 15q23-q26.3.
Cancer Research 08/2002; 62(13):3798-802. · 8.65 Impact Factor
ABSTRACT: this paper a lot. This research has been funded by the Academy of Finland, Graduate School in Computational Biology, Bioinformatics, and Biometry (ComBi), and Helsinki Graduate School in Computer Science and Engineering (HeCSE).
ABSTRACT: We introduce and evaluate TreeDT, a novel gene mapping method which is based on discovering and assessing tree-like patterns in genetic marker data. Gene mapping aims at discovering a statistical connection from a particular disease or trait to a narrow region in the genome. In a typical case-control setting, data consists of genetic markers typed for a set of disease-associated chromosomes and a set of control chromosomes. A computer scientist would view this data as a set of strings.
ABSTRACT: We describe a new method for linkage disequilibrium mapping, Haplotype Pattern Mining (HPM). The method is based on discovering recurrent patterns, inspired by data mining methods. We define a class of useful haplotype patterns in genetic case-control data, and give an algorithm for finding disease-associated haplotypes. The haplotypes are ordered by their strength of association to the phenotype, and all haplotypes exceeding a given threshold level are used for prediction of disease susceptibility gene location. The method is model-free, in the sense that it does not require, nor is it able to utilize, any assumptions about the inheritance model of the disease. The statistical model is non-parametric. The haplotypes are allowed to contain gaps, which improves the robustness to mutations and to missing and erroneous data.
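The steps named in the HPM abstract above (enumerate haplotype patterns with gaps, rank them by association, keep those above a threshold, and use them to point at a location) can be illustrated with a minimal sketch. It is not the published implementation. The chi-square statistic on a 2x2 table, the per-marker count of retained patterns, the function names, and the toy haplotypes are all assumptions chosen for illustration.

```python
from itertools import combinations

def enumerate_patterns(haplotypes, max_len=4, max_gaps=1):
    """Collect every pattern (start, alleles) occurring in the given haplotypes.
    None in alleles marks a gap (wildcard); the end positions are never gaps."""
    patterns = set()
    for hap in haplotypes:
        for start in range(len(hap)):
            for length in range(1, max_len + 1):
                if start + length > len(hap):
                    break
                window = hap[start:start + length]
                interior = range(1, length - 1)
                for k in range(max_gaps + 1):
                    for gaps in combinations(interior, k):
                        alleles = tuple(None if i in gaps else a
                                        for i, a in enumerate(window))
                        patterns.add((start, alleles))
    return patterns

def matches(pattern, hap):
    start, alleles = pattern
    return all(a is None or a == hap[start + i] for i, a in enumerate(alleles))

def chi2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]] (assumed measure)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def hpm_sketch(cases, controls, threshold, max_len=4, max_gaps=1):
    """Return per-marker counts of retained patterns and the ranked patterns."""
    scores = [0] * len(cases[0])
    kept = []
    for pat in enumerate_patterns(cases, max_len, max_gaps):
        a = sum(matches(pat, h) for h in cases)
        c = sum(matches(pat, h) for h in controls)
        b, d = len(cases) - a, len(controls) - c
        stat = chi2(a, b, c, d)
        # Keep only patterns enriched among cases and above the threshold.
        if a * d > b * c and stat >= threshold:
            kept.append((stat, pat))
            start, alleles = pat
            for m in range(start, start + len(alleles)):
                scores[m] += 1  # assumption: every marker in the span counts
    kept.sort(key=lambda t: t[0], reverse=True)
    return scores, kept

# Hypothetical toy data: six markers, alleles coded "1"/"2".
cases = [("1", "2", "1", "1", "2", "1"), ("2", "2", "1", "1", "2", "2"),
         ("1", "1", "1", "1", "1", "2"), ("2", "2", "1", "1", "2", "1")]
controls = [("1", "2", "2", "1", "1", "1"), ("2", "1", "1", "2", "2", "2"),
            ("1", "1", "2", "2", "1", "1"), ("2", "2", "1", "2", "1", "2")]

scores, kept = hpm_sketch(cases, controls, threshold=5.0)
print(scores, kept)
```

In this toy data only the two-marker pattern shared by all cases and no controls passes the threshold, so the per-marker counts peak at the markers it covers. Real analyses would also need a permutation test or similar to judge significance, which this sketch omits.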
ABSTRACT: The Van der Woude syndrome (VWS) is a dominantly inherited developmental disorder characterized by pits and/or sinuses of the lower lip, cleft lip and/or cleft palate. It is the most common cleft syndrome. VWS has shown remarkable genetic homogeneity in all populations, and so far, all families reported have been linked to 1q32-q41. A large Finnish pedigree with VWS was recently found to be unlinked to 1q32-q41. In order to map the disease locus in this family, a genome-wide linkage scan was performed. A maximum lod score of 3.18 was obtained with the marker D1S2797, thus assigning the disease locus to chromosomal region 1p34. By analyses of meiotic recombinants, an approximately 30 cM region of shared haplotypes was identified. The results confirm the heterogeneity of VWS, and they place the second disease locus in 1p34. This finding is of special interest because the phenotype in VWS closely resembles the phenotype in non-syndromic forms of cleft lip and palate.
European Journal of Human Genetics 11/2001; 9(10):747-52. · 4.32 Impact Factor
ABSTRACT: Developmental dyslexia is a neurofunctional disorder characterised by an unexpected difficulty in learning to read and write despite adequate intelligence, motivation, and education. Previous studies have suggested mostly quantitative susceptibility loci for dyslexia on chromosomes 1, 2, 6, and 15, but no genes have been identified yet. We studied a large pedigree, ascertained from 140 families considered, segregating pronounced dyslexia in an autosomal dominant fashion. Affected status and the subtype of dyslexia were determined by neuropsychological tests. A genome scan with 320 markers showed a novel dominant locus linked to dyslexia in the pericentromeric region of chromosome 3 with a multipoint lod score of 3.84. Nineteen out of 21 affected pedigree members shared this region identical by descent (corrected p<0.001). Previously implicated genomic regions showed no evidence for linkage. Sequencing of two positional candidate genes, 5HT1F and DRD3, did not support their role in dyslexia. The new locus on chromosome 3 is associated with deficits in all three essential components involved in the reading process, namely phonological awareness, rapid naming, and verbal short term memory.
Journal of Medical Genetics 10/2001; 38(10):658-64. · 5.70 Impact Factor
ABSTRACT: To date, two major familial breast cancer predisposition genes, BRCA1 and BRCA2, have been identified with hundreds of germ-line mutations, accounting for 5-10% of all breast cancer and 40-60% of all inherited breast cancer. Unexpectedly elevated incidence of breast cancer, especially in the older age classes, was observed in a Western Finnish region representing a relatively homogeneous population. This study was designed to test the hypothesis that there are inherited BRCA1 or BRCA2 mutations, which confer variable and/or age-dependent penetrance on carriers. Expecting a founder effect, we searched for geographical clustering of breast cancer cases and searched for associations between the affected phenotype and shared genomic segments in the BRCA1 and BRCA2 genomic regions. Our haplotype association study did not reveal any founder effects for either BRCA1 or BRCA2. However, there were two mutations prevalent in this geographical area with minor founder effects, BRCA2 T8555G and 999del5. This is one of the few geographically ascertained, population-based studies that indicate an overall frequency of BRCA1 and BRCA2 mutations at about 2-3% in all breast cancer cases. The geographical clustering of breast cancer cases was not explained by BRCA1 or BRCA2 genes.
ABSTRACT: We used Haplotype Pattern Mining, HPM [Toivonen et al., Am J Hum Genet 67:133-45, 2000], for gene localization in Genetic Analysis Workshop (GAW) 12 isolate data. In HPM, association is analyzed by searching all trait-associated haplotype patterns. Data mining algorithms are utilized to make the search efficient. The strength of the haplotype-trait associations is measured by a linear model, into which a pre-selected set of covariates is incorporated. Marker-wise patterns of association are used for predicting the disease gene location. Genome-wide scans of susceptibility genes for affection status as well as for the quantitative traits (Q1-Q5) were performed. First analyses were made with small sample sizes, 63-94 trios per trait, which is comparable to a pilot study of a larger complex disease-mapping project. Subsequently, the analysis was repeated with approximately 600 cases and 600 controls per trait to give higher power to the analyses. With small sample sizes, only the susceptibility genes having the strongest effects on the traits could be localized. The larger sample size gave very good results: all susceptibility genes, except one, could be correctly localized. First experiments on candidate genes suggested that HPM is applicable even to fine mapping of mutations in DNA sequence.