ABSTRACT: Antarctic krill (Euphausia superba) is a key component of the Southern Ocean food web. It supports a large number of upper trophic-level predators and is also a major fishery resource. Understanding changes in krill abundance has long been a priority for research and conservation in the Southern Ocean. In this study, we performed stable isotope analyses on ancient Adélie penguin (Pygoscelis adeliae) tissues and inferred relative krill abundance during the Holocene epoch from the penguins' paleodiets, using the inverse of the δ15N value (based on the 15N/14N ratio) as a proxy. We find that variations in krill abundance during the Holocene accord with episodes of regional climate change, with greater krill abundance in cold periods. Moreover, the low δ15N values found in modern Adélie penguins indicate relatively high krill availability, which supports the hypothesis of a krill surplus in modern times caused by recent human hunting of krill-eating seals and whales.
ABSTRACT: Sources and sinks of methane, one of the most important greenhouse gases, have attracted intensive attention because of methane's role in global warming. We show that sea ice in the Arctic Ocean regulates methane levels through two mechanisms: shielding of methane emission from the ocean, and consumption of methane. Using a static chamber technique, we estimated that the methane flux from under-ice water averaged 0.56 mg(CH4) m−2 d−1 in the central Arctic Ocean, higher than that in other oceans, indicating considerable methane storage under sea ice in this region. The average methane flux from under-ice water was higher than that above sea ice, which suggests that sea ice could limit methane emission. In addition, negative fluxes on sea ice suggest that there are methane-consuming processes, possibly associated with both photochemical and biochemical oxidation. Our results provide a general understanding of how sea ice in the Arctic affects the regional and global methane balance.
ABSTRACT: Genomewide marker information can improve the reliability of breeding value predictions for young selection candidates in genomic selection. However, the cost of genotyping limits its use to elite animals, and how such selective genotyping affects the predictive ability of genomic selection models is an open question. We performed a simulation study to evaluate the quality of breeding value predictions for selection candidates based on different selective genotyping strategies in a population undergoing selection. The genome consisted of 10 chromosomes of 100 cM each. After 5,000 generations of random mating with a population size of 100 (50 males and 50 females), generation G(0) (reference population) was produced via a full factorial mating between the 50 males and 50 females from generation 5,000. Different selection intensities (selecting animals with the largest yield deviation values) in G(0), or random sampling (no selection), were used to produce the offspring of G(0) (generation G(1)). Five genotyping strategies were used to choose 500 animals in G(0) to be genotyped: 1) Random: randomly selected animals, 2) Top: animals with the largest yield deviation values, 3) Bottom: animals with the lowest yield deviation values, 4) Extreme: animals with the 250 largest and the 250 lowest yield deviation values, and 5) Less Related: less genetically related animals. The number of individuals in G(0) and G(1) was fixed at 2,500 each, and different levels of heritability were considered (0.10, 0.25, and 0.50). Additionally, all 5 selective genotyping strategies (Random, Top, Bottom, Extreme, and Less Related) were applied to an indicator trait in generation G(0), and the results were evaluated for the target trait in generation G(1), with the genetic correlation between the 2 traits set to 0.50.
The 5 genotyping strategies applied to individuals in G(0) (reference population) were compared in terms of their ability to predict the genetic values of the animals in G(1) (selection candidates). The lowest correlations between genomic-based estimates of breeding values (GEBV) and true breeding values (TBV) were obtained with the Bottom strategy. For the Random, Extreme, and Less Related strategies, the correlation between GEBV and TBV became slightly larger as selection intensity decreased and was largest when no selection occurred. These 3 strategies were better than the Top approach. In addition, the Extreme, Random, and Less Related strategies had the smallest predictive mean squared errors (PMSE), followed by the Top and Bottom methods. Overall, the Extreme genotyping strategy led to the best predictive ability of breeding values, indicating that animals with extreme yield deviation values in a reference population are the most informative when training genomic selection models.
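The four phenotype-based genotyping strategies described in this abstract can be illustrated with a minimal sketch. The function name `choose_genotyped`, its arguments, and the use of a plain list of yield deviations are assumptions made for illustration and do not come from the study. The Less Related strategy is omitted because it would need pedigree or relationship information that the abstract does not specify.

```python
import random


def choose_genotyped(yield_dev, n, strategy, seed=0):
    """Return sorted indices of the n animals to genotype (hypothetical helper)."""
    if not 0 < n <= len(yield_dev):
        raise ValueError("n must be between 1 and the number of animals")
    # Animal indices ordered from lowest to highest yield deviation.
    order = sorted(range(len(yield_dev)), key=lambda i: yield_dev[i])
    if strategy == "random":
        picked = random.Random(seed).sample(range(len(yield_dev)), n)
    elif strategy == "top":
        picked = order[-n:]
    elif strategy == "bottom":
        picked = order[:n]
    elif strategy == "extreme":
        # Half from the low tail, the remainder from the high tail.
        low = n // 2
        picked = order[:low] + order[len(order) - (n - low):]
    else:
        raise ValueError("unknown strategy: " + strategy)
    return sorted(picked)


# Toy data: six animals with made-up yield deviations.
toy = [5.0, 1.0, 9.0, 3.0, 7.0, 2.0]
top_two = choose_genotyped(toy, 2, "top")
extreme_four = choose_genotyped(toy, 4, "extreme")
```

In the simulation itself, 500 of the 2,500 animals in G(0) were chosen, so the Extreme strategy would correspond to `n=500` with 250 animals taken from each tail.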
ABSTRACT: The distribution of antimony (Sb) in topsoil and moss (Dicranum angustum) in disturbed and undisturbed areas, as well as in coal and gangue, was examined in Ny-Ålesund, in the Arctic. Results show that weathering of the coal bed could not have contributed to the increase of Sb concentrations in topsoil and moss in the study area. The distribution of Sb is partially associated with traffic and historical mining activities, and the maximum Sb concentration is attributable to human activities. In addition, the decrease of Sb content in topsoil near the coastline may be caused by washing by seawater. Compared with topsoil, moss could be a useful tool for monitoring Sb in both highly and lightly polluted areas.
ABSTRACT: It has become increasingly clear that mammalian genomes produce many long non-coding RNAs (lncRNAs). Accumulating evidence suggests important functions for lncRNAs in a variety of biological processes. However, little is known about lncRNA identity and characteristics in cattle. Using public bovine-specific expressed sequence tag sequences, we reconstructed transcript assemblies, from which reference sequences were obtained for RNAs. Intergenic regions with evidence of transcription were screened for putative lncRNAs using a combination of a gene-finding program and a support vector machine-based tool for calculating protein-coding potential. A total of 449 putative lncRNAs located in 405 intergenic regions were identified. Characterization of these putative bovine lncRNAs suggests that they are generally expressed in a tissue-specific manner, that their GC contents are higher than those of randomly selected intergenic sequences but lower than those of protein-coding genes, and that they are moderately conserved among mammals. This is the first genome-wide catalogue of putative intergenic lncRNAs in cattle and provides important targets for functional studies.
ABSTRACT: Genome-assisted prediction of genetic merit of individuals for a quantitative trait requires building statistical models that can handle data sets consisting of a massive number of markers and many fewer observations. Numerous regression models have been proposed in which marker effects are treated as random variables. Alternatively, multivariate dimension reduction techniques [such as principal component regression (PCR) and partial least-squares regression (PLS)] model a small number of latent components which are linear combinations of original variables, thereby reducing dimensionality. Further, marker selection has drawn increasing attention in genomic selection. This study evaluated two dimension reduction methods, namely, supervised PCR and sparse PLS, for predicting genomic breeding values (BV) of dairy bulls for milk yield using single-nucleotide polymorphisms (SNPs). These two methods perform variable selection in addition to reducing dimensionality. Supervised PCR preselects SNPs based on the strength of association of each SNP with the phenotype. Sparse PLS promotes sparsity by imposing some penalty on the coefficients of linear combinations of original SNP variables. Two types of supervised PCR (I and II) were examined. Method I was based on single-SNP analyses, whereas method II was based on multiple-SNP analyses. Supervised PCR II was clearly better than supervised PCR I in predictive ability when evaluated on SNP subsets of various sizes, and sparse PLS was in between. Supervised PCR II and sparse PLS attained similar predictive correlations when the size of the SNP subset was below 1000. Supervised PCR II with 300 and 500 SNPs achieved correlations of 0.54 and 0.59, respectively, corresponding to 80 and 87% of the correlation (0.68) obtained with all 32 518 SNPs in a PCR model. The predictive correlation of supervised PCR II reached a plateau of 0.68 when the number of SNPs increased to 3500.
Our results demonstrate the potential of combining dimension reduction and variable selection for accurate and cost-effective prediction of genomic BV.
Journal of Animal Breeding and Genetics 08/2011; 128(4):247-57. DOI:10.1111/j.1439-0388.2011.00917.x · 2.06 Impact Factor
ABSTRACT: A byproduct of genome-wide association studies is the possibility of carrying out genome-enabled prediction of disease risk or of quantitative traits. This study is concerned with predicting two quantitative traits, milk yield in dairy cattle and grain yield in wheat, using dense molecular markers as predictors. Two support vector regression (SVR) models, ε-SVR and least-squares SVR, were explored and compared to a widely applied linear regression model, the Bayesian Lasso, the latter assuming additive marker effects. Predictive performance was measured using predictive correlation and mean squared error of prediction. Depending on the kernel function chosen, SVR can model either linear or nonlinear relationships between phenotypes and marker genotypes. For milk yield, where phenotypes were estimated breeding values of bulls (a linear combination of the data), SVR with a Gaussian radial basis function (RBF) kernel performed slightly better than with a linear kernel, and was similar to the Bayesian Lasso. For the wheat data, where the phenotype was raw grain yield, the RBF kernel provided clear advantages over the linear kernel, e.g., a 17.5% increase in correlation when using the ε-SVR. SVR with an RBF kernel also compared favorably to the Bayesian Lasso in this case. It is concluded that a nonlinear RBF kernel may be an optimal choice for SVR, especially when the phenotypes to be predicted have a nonlinear dependency on genotypes, as might have been the case in the wheat data.
ABSTRACT: It has become increasingly clear from systems biology arguments that interaction and non-linearity play an important role in genetic regulation of phenotypic variation for complex traits. Marker-assisted prediction of genetic values assuming additive gene action has been widely investigated because of its relevance in artificial selection. Prediction when non-additive effects hold, on the other hand, has been less well studied. Here, we explored a nonparametric model, radial basis function (RBF) regression, for predicting quantitative traits under different gene action modes (additivity, dominance and epistasis). Using simulation, it was found that, in the presence of non-additive effects, RBF was better able to predict the merit of individuals in future generations (higher predictive correlations and lower predictive mean square errors) than a linear additive model, the Bayesian Lasso. This was true for populations undergoing either directional or random selection over several generations. Under additive gene action, RBF was slightly worse than the Bayesian Lasso. While prediction of genetic values under additive gene action is well handled by a variety of parametric models, nonparametric RBF regression is a useful counterpart for dealing with situations where non-additive gene action is suspected, and it is robust irrespective of mode of gene action.
ABSTRACT: The objective was to evaluate the effects of directional selection based on estimated genomic breeding values (GEBVs) for a quantitative trait. Selection affects GEBV prediction accuracy as well as genetic architecture via changes in allelic frequencies and linkage disequilibrium (LD), and the resulting changes are different from those in the absence of selection. How marker density affects long-term GEBV accuracy and selection response needs to be understood as well. Simulations were used to characterize the impact of selection based on GEBVs over generations. Single-nucleotide polymorphism (SNP) marker effects were estimated with the Bayesian Lasso method in the base generation, and these estimates were used to calculate the GEBVs in subsequent generations. GEBV accuracy decreased over generations of selection, and it was lower than under random selection, where a decay took place as well. In the long term, selection response tended to reach a plateau, but, at higher marker density, both the magnitude and duration of the response were larger. Selection changed quantitative trait loci (QTL) allele frequencies and generated new but unfavorable LD for prediction. Family effects had a considerable contribution to GEBV accuracy in early generations of selection.
ABSTRACT: Genomic data provide a valuable source of information for modeling covariance structures, allowing a more accurate prediction of total genetic values (GVs). We apply the kriging concept, originally developed in the geostatistical context for predictions in the low-dimensional space, to the high-dimensional space spanned by genomic single nucleotide polymorphism (SNP) vectors and study its properties in different gene-action scenarios. Two different kriging methods ["universal kriging" (UK) and "simple kriging" (SK)] are presented. As a novelty, we suggest use of the family of Matérn covariance functions to model the covariance structure of SNP vectors. A genomic best linear unbiased prediction (GBLUP) is applied as a reference method. The three approaches are compared in a whole-genome simulation study considering additive, additive-dominance, and epistatic gene-action models. Predictive performance is measured in terms of correlation between true and predicted GVs and average true GVs of the individuals ranked best by prediction. We show that UK outperforms GBLUP in the presence of dominance and epistatic effects. In a limiting case, it is shown that the genomic covariance structure proposed by VanRaden (2008) can be considered as a covariance function with corresponding quadratic variogram. We also prove theoretically that if a specific linear relationship exists between covariance matrices for two linear mixed models, the GVs resulting from BLUP are linked by a scaling factor. Finally, the relation of kriging to other models is discussed and further options for modeling the covariance structure, which might be more appropriate in the genomic context, are suggested.
ABSTRACT: A 118-cm-long, well-preserved sediment profile was collected from a paleo-notch formed by ocean wave action before rising to the terrace on Ny-Ålesund, Svalbard, Norway. A large number of mollusk shell fragments, predominantly Mya truncata, were found in the sediment profile. AMS 14C dating and stable oxygen and carbon isotope analyses were performed on the shell fragment samples. The reservoir-corrected radiocarbon ages averaged ~9,400 yr B.P., which accurately dates the raised terrace and the upper marine limit after Kongsfjorden was completely deglaciated. A calibrated aragonite isotopic temperature equation was established for Ny-Ålesund by comparing the δ18O profiles of modern mollusks: T (°C) = 16.26 − 3.68 (δ18Oaragonite–PDB − δ18Owater–VSMOW). The reconstructed paleotemperature range was −0.52 to +4.78 °C, about 1 °C warmer than today, which was further confirmed by reconstructed sea surface temperature (SST) in west Svalbard. Moreover, the mortality of mollusks was very likely caused by an abrupt cooling event at about 9,400 yr B.P., which was triggered by reduced insolation, weakened thermohaline circulation, and abruptly decreased SST. Further evidence for this distinct but short cooling event centered at about 9,400 yr B.P. has been found in Northern Siberia, the North Atlantic, the Alps, and Eastern Europe.
Keywords: 9,400 yr B.P.; Cooling event; Shell fragments; δ18O; Ny-Ålesund
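As a worked illustration of the temperature equation quoted in this abstract, the sketch below evaluates it in both directions. The function and argument names are hypothetical, and the isotope values used in the example are made up rather than taken from the study.

```python
def aragonite_temperature(d18o_aragonite_pdb, d18o_water_vsmow):
    """Temperature in degrees C from the equation quoted in the abstract."""
    return 16.26 - 3.68 * (d18o_aragonite_pdb - d18o_water_vsmow)


def isotope_difference(temp_c):
    """Invert the equation: the aragonite minus water difference for a temperature."""
    return (16.26 - temp_c) / 3.68


# Made-up inputs, chosen only to exercise the formula.
example_temp = aragonite_temperature(4.0, 0.0)

# Isotope differences implied by the reported temperature range.
diff_at_coldest = isotope_difference(-0.52)
diff_at_warmest = isotope_difference(4.78)
```

Because the slope is negative, a larger aragonite minus water difference corresponds to a colder reconstructed temperature, so the coldest end of the reported range implies the largest difference.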
ABSTRACT: A challenge when predicting total genetic values for complex quantitative traits is that an unknown number of quantitative trait loci may affect phenotypes via cryptic interactions. If markers are available, assuming that their effects on phenotypes are additive may lead to poor predictive ability. Non-parametric radial basis function (RBF) regression, which does not assume a particular form of the genotype-phenotype relationship, was investigated here by simulation and analysis of body weight and food conversion rate data in broilers. The simulation included a toy example in which an arbitrary non-linear genotype-phenotype relationship was assumed, and five different scenarios representing different broad sense heritability levels (0.1, 0.25, 0.5, 0.75 and 0.9) were created. In addition, a whole genome simulation was carried out, in which three different gene action modes (pure additive, additive+dominance and pure epistasis) were considered. In all analyses, a training set was used to fit the model and a testing set was used to evaluate predictive performance. The latter was measured by correlation and predictive mean-squared error (PMSE) on the testing data. For comparison, a linear additive model known as Bayes A was used as a benchmark. Two RBF models, with single nucleotide polymorphism (SNP)-specific (RBF I) and common (RBF II) weights, were examined. Results indicated that, in the presence of complex genotype-phenotype relationships (i.e. non-linearity and non-additivity), RBF outperformed Bayes A in predicting total genetic values using SNP markers. Extension of Bayes A to include all additive, dominance and epistatic effects could improve its prediction accuracy. RBF I was generally better than RBF II, and was able to identify relevant SNPs in the toy example.
Genetics Research 06/2010; 92(3):209-25. DOI:10.1017/S0016672310000157 · 2.20 Impact Factor
ABSTRACT: A 118-cm-long, well-preserved sediment profile was collected from a paleo-notch on the first terrace of Ny-Ålesund, Svalbard, in the Arctic; the notch was formed by ocean wave action before rising to the terrace. The bottom of this profile was dated to 9,400 years B.P. based on two radiocarbon dates of fossil mollusc shell fragments. The organic material in the sediment was identified from the δ13Corg–C/N plot and δ15Norg characteristics as predominantly seabird guano, which was transported from the ocean through seabird predation and excretion. These results indicate that seabirds have inhabited Ny-Ålesund since 9,400 years B.P., after Kongsfjorden was completely deglaciated. This is the first report on Holocene seabird occupation of Ny-Ålesund, and it provides a foundation for understanding the ecological history of seabirds in Svalbard during the Holocene.
ABSTRACT: The objective of the present study was to assess the predictive ability of subsets of single nucleotide polymorphism (SNP) markers for development of low-cost, low-density genotyping assays in dairy cattle. Dense SNP genotypes of 4,703 Holstein bulls were provided by the USDA Agricultural Research Service. A subset of 3,305 bulls born from 1952 to 1998 was used to fit various models (training set), and a subset of 1,398 bulls born from 1999 to 2002 was used to evaluate their predictive ability (testing set). After editing, data included genotypes for 32,518 SNP and August 2003 and April 2008 predicted transmitting abilities (PTA) for lifetime net merit (LNM$), the latter resulting from progeny testing. The Bayesian least absolute shrinkage and selection operator method was used to regress August 2003 PTA on marker covariates in the training set to arrive at estimates of marker effects and direct genomic PTA. The coefficient of determination (R²) from regressing the April 2008 progeny test PTA of bulls in the testing set on their August 2003 direct genomic PTA was 0.375. Subsets of 300, 500, 750, 1,000, 1,250, 1,500, and 2,000 SNP were created by choosing equally spaced and highly ranked SNP, with the latter based on the absolute value of their estimated effects obtained from the training set. The SNP effects were re-estimated from the training set for each subset of SNP, and the 2008 progeny test PTA of bulls in the testing set were regressed on corresponding direct genomic PTA. The R² values for subsets of 300, 500, 750, 1,000, 1,250, 1,500, and 2,000 SNP with largest effects (evenly spaced SNP) were 0.184 (0.064), 0.236 (0.111), 0.269 (0.190), 0.289 (0.179), 0.307 (0.228), 0.313 (0.268), and 0.322 (0.291), respectively.
These results indicate that a low-density assay comprising selected SNP could be a cost-effective alternative for selection decisions and that significant gains in predictive ability may be achieved by increasing the number of SNP allocated to such an assay from 300 or fewer to 1,000 or more.
ABSTRACT: Multi-category classification methods were used to detect SNP-mortality associations in broilers. The objective was to select a subset of whole genome SNPs associated with chick mortality. This was done by categorizing mortality rates and using a filter-wrapper feature selection procedure in each of the classification methods evaluated. Different numbers of categories (2, 3, 4, 5 and 10) and three classification algorithms (naïve Bayes classifiers, Bayesian networks and neural networks) were compared, using early and late chick mortality rates in low and high hygiene environments. SNPs selected by each classification method were evaluated by predicted residual sum of squares and a significance test-related metric. A naïve Bayes classifier, coupled with discretization into two or three categories, generated the SNP subset with the greatest predictive ability. Further, an alternative categorization scheme, which used only the two extreme portions of the empirical distribution of mortality rates, was considered. This scheme selected SNPs with greater predictive ability than those chosen by the methods described previously. Use of extreme samples seems to enhance the ability of feature selection procedures to select influential SNPs in genetic association studies.
ABSTRACT: The interplay between genetic and environmental factors, known as genotype × environment interaction (G × E), affects phenotypes of complex traits. A methodology for assessing G × E was investigated by detecting hygiene (low and high) environment-specific SNP subsets associated with broiler chicken mortality, followed by an examination of consistency between the SNP subsets selected from the 2 environments. The trait was mean progeny mortality rate in 253 sire families, after adjusting records for nuisance effects affecting mortality at the individual bird level. Over 5,000 whole-genome SNP were narrowed down via a machine-learning (filter-wrapper) feature selection procedure applied to mortality rates in each of the 2 environments. For both early and late mortality, it was found that the selected SNP subsets differed across hygiene environments, in terms of either across-environment predictive ability or extent of linkage disequilibrium between the subsets. Reduction in predictive ability due to G × E was assessed by the ratio of 2 predicted residual sum of squares statistics, one associated with the SNP selected from the same hygiene environment and the other associated with the SNP subset from a different environment. Reduction was 30 and 20% for early and late mortality, respectively. An extremely low level of linkage disequilibrium between the SNP subsets selected under low and high hygiene also indicated G × E. Findings suggest that there may not be a universally optimal SNP subset for predicting mortality and that interactions between genome and environmental factors need to be considered in association analysis of complex traits.
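The comparison of two predicted residual sum of squares (PRESS) statistics mentioned in this abstract can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes a linear model with a single predictor, computes PRESS by explicit leave-one-out refitting, and the function names and the direction of the ratio (same-environment over other-environment) are assumptions.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variation in the training data")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def press(x, y):
    """PRESS: squared error of each record predicted from a fit without it."""
    total = 0.0
    for i in range(len(x)):
        intercept, slope = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (intercept + slope * x[i])) ** 2
    return total


def press_ratio(x_same_env, x_other_env, y):
    """Ratio of PRESS with a same-environment predictor to PRESS with another."""
    return press(x_same_env, y) / press(x_other_env, y)


# Toy data: y depends exactly on x_same and only weakly on x_other.
y = [1.0, 3.0, 5.0, 7.0]
x_same = [0.0, 1.0, 2.0, 3.0]
x_other = [0.0, 1.0, 0.0, 1.0]
ratio = press_ratio(x_same, x_other, y)
```

A ratio well below 1 in this toy setting means the same-environment predictor generalizes better to held-out records than the predictor taken from the other environment.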
ABSTRACT: Four approaches using single-nucleotide polymorphism (SNP) information (F∞-metric model, kernel regression, reproducing kernel Hilbert spaces (RKHS) regression, and a Bayesian regression) were compared with a standard procedure of genetic evaluation (E-BLUP) of sires using mortality rates in broilers as a response variable, working in a Bayesian framework. Late mortality (14-42 days of age) records on 12,167 progeny of 200 sires were precorrected for fixed and random (nongenetic) effects used in the model for genetic evaluation and for the mate effect. The average of the corrected records was computed for each sire. Twenty-four SNPs seemingly associated with late mortality were included in three methods used for genomic assisted evaluations. One thousand SNPs were included in the Bayesian regression, to account for markers along the whole genome. The posterior mean of heritability of mortality was 0.02 in the E-BLUP approach, suggesting that genetic evaluation could be improved if suitable molecular markers were available. Estimates of posterior means and standard deviations of the residual variance were 24.38 (3.88), 29.97 (3.22), 17.07 (3.02), and 20.74 (2.87) for E-BLUP, the linear model on SNPs, RKHS regression, and the Bayesian regression, respectively, suggesting that RKHS accounted for more variance in the data. The two nonparametric methods (kernel and RKHS regression) fitted the data better, having a lower residual sum of squares. Predictive ability, assessed by cross-validation, indicated advantages of the RKHS approach, where accuracy increased by 25 to 150% relative to other methods.
ABSTRACT: In genome-wide association studies using single nucleotide polymorphisms (SNPs), typically thousands of SNPs are genotyped, whereas the number of phenotypes for which there is genomic information may be smaller. A two-step SNP (feature) selection method was developed, which consisted of filtering (using information gain) and wrapping (using naïve Bayesian classification). This was based on discretization of the continuous phenotypic values. The method was applied to chick early mortality rates (0-14 days of age) on progeny from 201 sires in a commercial broiler line, with the goal of identifying SNPs (over 5000) related to progeny mortality. Sires were clustered into two groups, low and high, according to two arbitrarily chosen mortality rate thresholds. By varying these thresholds, 11 different "case-control" samples were formed, and the SNP selection procedure was applied to each sample. To compare the 11 sets of chosen SNPs, predicted residual sum of squares (PRESS) from a linear model was used. Naïve Bayesian classification accuracy was improved over the case without feature selection (from 50% to 90%). Seventeen SNPs in the best case-control group (with smallest PRESS) accounted for 31% of the variance among sire family mortality rates.
ABSTRACT: Genome-wide association studies using single nucleotide polymorphisms (SNPs) can identify genetic variants related to complex traits. Typically thousands of SNPs are genotyped, whereas the number of phenotypes for which there is genomic information may be smaller. When predicting phenotypes, options for statistical model building range from incorporating all possible markers into the specification to including only sets of relevant SNPs (features). In the latter case, an efficient method of selecting influential features is required. A two-step feature selection method for binary traits was developed, which consisted of filtering (using information gain), and wrapping (using naïve Bayesian classification). The filter reduces the large number of SNPs to a much smaller size, to facilitate the wrapper step. As the procedure is tailored for discrete outcomes, an approach based on discretization of phenotypic values was developed, to enable feature selection in a classification framework. The method was applied to chick mortality rates (0-14 days of age) on progeny from 201 sires in a commercial broiler line, with the goal of identifying SNPs (over 5000) related to progeny mortality. To mimic a case-control study, sires were clustered into two groups, low and high, according to two arbitrarily chosen mortality rate cut points. By varying these thresholds, 11 different 'case-control' samples were formed, and the SNP selection procedure was applied to each sample. To compare the 11 sets of chosen SNPs, predicted residual sum of squares (PRESS) from a linear model was used. The two-step method improved naïve Bayesian classification accuracy over the case without feature selection (from around 50 to above 90% without and with feature selection in each case-control sample). The best case-control group (63 sires above or below the thresholds) had the smallest PRESS statistic among groups with model p-values below 0.003.
The 17 SNPs selected using this group accounted for 31% of the variation in raw mortality rates between sire families.
Journal of Animal Breeding and Genetics 01/2008; 124(6):377-89. DOI:10.1111/j.1439-0388.2007.00694.x · 2.06 Impact Factor
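The filter step of the two-step method described above ranks SNPs by information gain with respect to the discretized phenotype. The sketch below shows one common way to compute that quantity for a categorical genotype and a low/high class label. The function names, the genotype coding, and the toy data are assumptions for illustration, and the naïve Bayes wrapper step is not shown.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(genotypes, labels):
    """Entropy of the labels minus their entropy conditional on genotype."""
    n = len(labels)
    conditional = 0.0
    for g in set(genotypes):
        subset = [lab for gi, lab in zip(genotypes, labels) if gi == g]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def filter_snps(snp_columns, labels, keep):
    """Return the names of the `keep` SNPs with the highest information gain."""
    ranked = sorted(
        snp_columns,
        key=lambda name: information_gain(snp_columns[name], labels),
        reverse=True,
    )
    return ranked[:keep]


# Toy data: four sires labelled by mortality group, two hypothetical SNPs.
labels = ["low", "low", "high", "high"]
snps = {"snp_a": [0, 0, 2, 2], "snp_b": [0, 1, 0, 1]}
selected = filter_snps(snps, labels, keep=1)
```

In this toy example `snp_a` separates the two groups perfectly while `snp_b` carries no information about them, so the filter keeps `snp_a`.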