Article

Statistical and bio-computational applications in animal sciences

Authors:
  • Indian Agricultural Research Institute

Abstract

The demand for food proteins, including plant and animal proteins, is increasing at an exponential rate, and the demand for animal products is expected to nearly double by 2030. To improve livestock production and meet the demand for animal protein, it is therefore essential to apply interventions based on genomics, statistics and informatics. Such interventions are frequently used in animal improvement programs to develop offspring with desirable traits. More recently, with the emergence of high-throughput sequencing technologies, the genomes of farm animals, fishes and model organisms have been sequenced and are available in the public domain. Advances in computing (silicon) technology have also made it possible to manage the data generated by genome sequencing projects. The challenge now lies in analysing and interpreting sequence data in a biologically meaningful manner, for which many algorithm-based analytical techniques and high-performance computing methods have been developed. Here, a brief review is presented of the statistical and computational approaches used in genomic data analysis. Applications of these approaches to health management and to sustainable animal and fish production are also discussed, from the viewpoint of vaccine and drug design, disease risk management, epigenomics, and genome-wide SNP/CNV associations with traits. In addition, the methods outlined here should help molecular biologists and other application scientists analyse the overwhelming amount of genomic data now available.

Conference Paper
Genetic variations in the Fas receptor and its ligand (FasL) are observed in a wide spectrum of immune system disorders, most prominently Autoimmune Lymphoproliferative Syndrome (ALPS). To date, more than 300 families with approximately 500 patients have been diagnosed with hereditary ALPS worldwide over the last 20 years [1]. Nevertheless, how these deleterious genetic variations affect protein conformation, in terms of structural stability and protein binding affinity, remains unexplored. We therefore studied the structural and functional impacts of Fas/FasL mutations using in silico methods as an alternative to traditional in vivo and in vitro approaches. Details of Fas/FasL genetic variations were collected from different databases, and their clinical associations were confirmed by text mining. Various computational algorithms were first employed to categorize the genetic variations by how deleterious they are to Fas/FasL protein structure. Ab initio structures of the Fas/FasL wild-type proteins and mutant models of the most deleterious mutations were then built with the I-TASSER server and SWISS-MODEL (ExPASy), respectively. Molecular docking was also performed to assess the binding affinity of the wild-type and mutated Fas/FasL complexes. Five genetic variations were identified that map to the highly conserved death domain region (exon 9) of Fas and induce significant conformational changes in the mutant proteins, altering the stability of Fas-FasL interactions and resulting in ALPS. This study supports the in silico approach as a primary filter for assessing how plausibly deleterious a mutation is, based on evolutionary sequence conservation, structural homology and protein stability.
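The abstract ranks variants partly by evolutionary sequence conservation. One common way to quantify conservation is the Shannon entropy of each column of a multiple sequence alignment. The sketch below assumes a pre-computed alignment; the function names and the toy alignment are hypothetical, and this is not the specific scoring used by the tools in the study.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gaps.
    0.0 means the position is fully conserved; larger values mean more variable."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(residues).values())


def conservation_profile(alignment: list[str]) -> list[float]:
    """Per-position entropy for a list of equal-length, pre-aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    return [column_entropy("".join(seq[i] for seq in alignment)) for i in range(length)]


# Hypothetical toy alignment of one short protein segment from four species.
toy_alignment = ["MKVLE", "MKVLD", "MRVLE", "MKILQ"]
profile = conservation_profile(toy_alignment)
for position, entropy in enumerate(profile, start=1):
    print(position, round(entropy, 3))
```

Positions 1 and 4 have zero entropy (fully conserved); a substitution there would usually be treated as more likely to be deleterious than one at the variable position 5.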
Article
Tuberculosis (TB) is considered one of the most devastating global public health threats of the 21st century. It is the second leading cause of death from an infectious disease, after human immunodeficiency virus (HIV). Multidrug-resistant tuberculosis (MDR-TB) poses a grave challenge because treatment options are prolonged, limited and expensive, with 10 to 30 per cent of cases resulting in treatment failure and death. The aim of this study was to identify novel inhibitors of UDP-N-acetylglucosamine 1-carboxyvinyltransferase using both natural and synthetic ligand libraries. Homology modelling with Modeller 9.17 was used to determine the 3D structure of the enzyme and its structural features, and the reliability of the model was checked using the verify score and a Ramachandran plot. The generated model was used in molecular docking simulations to predict the best inhibitors, and the selected inhibitors were subjected to absorption, distribution, metabolism, excretion and toxicology (ADME/Tox) prediction, molecular dynamics simulation and in vitro studies. Twenty compounds showed high activity, with minimum binding free energies ranging from -10.73 to -8.76 kcal/mol. Of these 20 compounds, eight showed the most stable conformational fit in the binding site of UDP-N-acetylglucosamine 1-carboxyvinyltransferase in 5 ns molecular dynamics simulations using Amber10. The compounds were then validated for bioactivity using the resazurin microtiter assay, in which compound 3 showed high inhibitory activity, up to 85% at a concentration of 10 µg/mL, against the growth of the Mtb H37Rv strain. The identified ligands may therefore serve as lead compounds for future drug design against both multidrug-resistant and extensively drug-resistant Mtb.
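The model check above uses a Ramachandran plot, which places each residue by its backbone torsion angles: phi is the dihedral over the atoms C(i-1), N, CA, C and psi over N, CA, C, N(i+1). A minimal sketch of the signed dihedral calculation is given below; the function names are hypothetical, atom coordinates would in practice come from the model's PDB file (parsing is not shown), and the test coordinates are toy values chosen for hand checking.

```python
import math

Vec = tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle in degrees (-180..180) about the p1-p2 bond."""
    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)  # normals of planes (p0, p1, p2) and (p1, p2, p3)
    x = dot(n1, n2)
    y = dot(cross(n1, n2), b1) / math.sqrt(dot(b1, b1))
    return math.degrees(math.atan2(y, x))


# Toy coordinates (hypothetical) with a central bond along +z.
p0, p1, p2 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
print(round(dihedral(p0, p1, p2, (1.0, 0.0, 1.0)), 6))   # 0.0 (cis)
print(round(dihedral(p0, p1, p2, (0.0, 1.0, 1.0)), 6))   # 90.0
print(round(dihedral(p0, p1, p2, (-1.0, 0.0, 1.0)), 6))  # 180.0 (trans)
```

For residue i, phi would be `dihedral(C[i-1], N[i], CA[i], C[i])` and psi `dihedral(N[i], CA[i], C[i], N[i+1])`; plotting these pairs gives the Ramachandran plot.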
Article
Network motif search is useful for uncovering the important functional components of complex networks in biological, chemical, social and other domains. This paper proposes PATCOMP, a novel PATRICIA-based approach to network motif search. PATCOMP takes advantage of the memory compression and speed of the PATRICIA trie to store the collection of subgraphs in memory and to search them for classification and census of the network. The structure of the trie nodes, and how the data structure is used to count subgraphs, are also described. PATCOMP was compared with QuateXelero and G-Tries. The main benefit of this approach is a significant reduction in memory requirements, particularly for large network motifs, with acceptable time performance. Experiments with directed networks (E. coli, yeast, social and electronic) confirmed the advantage of PATCOMP, which reduced memory usage by 2.7-27.7% compared with QuateXelero for smaller motif sizes (except s=6 for E. coli and s=6 for social) and by 7.8-38.35% for larger motif sizes. For undirected networks, PATCOMP used 0.07%-43% less memory (except s=7 for electronic and s=6, 8 for dolphin networks).
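The memory saving described above comes from path compression: a PATRICIA (radix) trie merges chains of single-child nodes into one labelled edge. The sketch below is a generic radix trie that counts string keys, assuming each enumerated subgraph has already been reduced to a canonical adjacency-matrix bit string (canonical labelling is not shown). The class names and signatures are hypothetical, and this is not PATCOMP's actual node layout.

```python
class RadixNode:
    def __init__(self) -> None:
        self.children: dict[str, "RadixNode"] = {}  # compressed edge label -> child node
        self.count = 0  # times a key ending exactly at this node was inserted


class RadixTrie:
    """PATRICIA-style trie: chains of single-child nodes collapse into one edge label."""

    def __init__(self) -> None:
        self.root = RadixNode()

    def insert(self, key: str) -> None:
        node = self.root
        while key:
            for label, child in list(node.children.items()):
                i = 0  # length of the common prefix of this edge label and the remaining key
                while i < min(len(label), len(key)) and label[i] == key[i]:
                    i += 1
                if i == 0:
                    continue
                if i < len(label):  # key diverges inside the edge: split the edge at i
                    mid = RadixNode()
                    mid.children[label[i:]] = child
                    del node.children[label]
                    node.children[label[:i]] = mid
                    child = mid
                node, key = child, key[i:]
                break
            else:  # no edge shares a first symbol with the key: store the rest as one edge
                leaf = RadixNode()
                node.children[key] = leaf
                node, key = leaf, ""
        node.count += 1

    def count(self, key: str) -> int:
        node = self.root
        while key:
            for label, child in node.children.items():
                if key.startswith(label):
                    node, key = child, key[len(label):]
                    break
            else:
                return 0
        return node.count

    def node_count(self) -> int:
        """Number of trie nodes, a rough proxy for memory use."""
        stack, total = [self.root], 0
        while stack:
            current = stack.pop()
            total += 1
            stack.extend(current.children.values())
        return total


# Hypothetical canonical adjacency strings of enumerated 3-node subgraphs.
signatures = ["011100100", "011101110", "011100100", "010101010", "011101110", "011100100"]
trie = RadixTrie()
for sig in signatures:
    trie.insert(sig)
print(trie.count("011100100"), trie.count("011101110"), trie.count("010101010"))  # 3 2 1
print(trie.node_count())  # 6
```

Three distinct 9-symbol keys need only six nodes here, whereas an uncompressed trie would need one node per symbol along each distinct path.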
Article
Domestic animals are invaluable resources for studying the molecular architecture of complex traits. Although the mapping of quantitative trait loci (QTL) responsible for economically important traits in domestic animals has achieved remarkable results in recent decades, not all of the genetic variation in complex traits has been captured, because of the low density of markers used in QTL mapping studies. The genome-wide association study (GWAS), which uses high-density single-nucleotide polymorphism (SNP) markers, provides a new way to tackle this issue. The use of GWAS has led to encouraging achievements in dissecting the genetic mechanisms of complex diseases in humans. GWAS has now been applied to domestic animal breeding and genetics, with some advances: many genes or markers that affect economic traits of interest in domestic animals have been identified. In this review, advances in the use of GWAS in domestic animals are described.
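At its simplest, a GWAS tests each SNP in turn by regressing the phenotype on allele dosage (0, 1 or 2 copies of one allele) and ranking SNPs by their test statistic. The sketch below shows that single-marker scan on hypothetical data; the function name is hypothetical, and livestock GWAS in practice also fit population structure and relatedness (typically with mixed models) and correct for multiple testing, none of which is shown here.

```python
import math


def single_snp_test(genotypes: list[int], phenotypes: list[float]) -> tuple[float, float, float]:
    """Least-squares regression of phenotype on allele dosage (0, 1, 2) for one SNP.
    Returns (additive per-allele effect b, standard error of b, t statistic)."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("monomorphic SNP: no genotype variation to test")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    b = sxy / sxx
    a = mean_y - b * mean_g
    rss = sum((y - a - b * g) ** 2 for g, y in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    return b, se, b / se


# Hypothetical data: 8 animals genotyped at 3 SNPs, with one milk-yield record each.
snps = {
    "snp1": [0, 1, 2, 0, 1, 2, 0, 1],
    "snp2": [1, 1, 0, 2, 0, 1, 2, 0],
    "snp3": [0, 0, 1, 1, 2, 2, 1, 0],
}
milk = [20.1, 22.0, 24.2, 19.8, 21.9, 24.1, 20.3, 22.2]

results = {name: single_snp_test(g, milk) for name, g in snps.items()}
for name, (b, se, t) in sorted(results.items(), key=lambda kv: -abs(kv[1][2])):
    print(name, round(b, 3), round(se, 3), round(t, 2))
```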
Article
The omics revolution has generated voluminous genome sequence data. The discovery of genomic elements such as genes, splice sites, regulatory motifs and transcription factor binding sites has become a thrust area of bioinformatics. Transcription factors are proteins that bind to transcription factor binding sites on the genome to regulate gene expression. The identification of these binding sites and their genomic coordinates has therefore been of prime interest in genomic research for understanding the mechanisms underlying gene expression. Various experimental and computational approaches have been used to detect these sites. In this paper, Gibbs sampling is applied to identify transcription factor binding sites, and its parameters, model and procedure are discussed using sequence data from Arabidopsis thaliana.
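A basic Gibbs sampler for motif discovery can be sketched as follows, assuming a fixed motif width, exactly one motif occurrence per sequence, uppercase ACGT input, and a background model taken from overall base composition. The function names and toy sequences are hypothetical; this is not the exact model or parameter set used in the paper.

```python
import math
import random

ALPHABET = "ACGT"


def profile_from(sites: list[str], w: int, pseudo: float = 0.5) -> list[dict[str, float]]:
    """Position weight matrix (probabilities) from aligned sites, with pseudocounts."""
    prof = []
    for j in range(w):
        counts = {b: pseudo for b in ALPHABET}
        for site in sites:
            counts[site[j]] += 1
        total = sum(counts.values())
        prof.append({b: c / total for b, c in counts.items()})
    return prof


def gibbs_motif_search(seqs: list[str], w: int, iters: int = 2000, seed: int = 1) -> tuple[list[int], str]:
    rng = random.Random(seed)
    all_bases = "".join(seqs)
    background = {b: (all_bases.count(b) + 1) / (len(all_bases) + 4) for b in ALPHABET}
    pos = [rng.randrange(len(s) - w + 1) for s in seqs]  # random initial site per sequence
    for _ in range(iters):
        i = rng.randrange(len(seqs))  # sequence whose site is resampled this round
        others = [s[p:p + w] for k, (s, p) in enumerate(zip(seqs, pos)) if k != i]
        prof = profile_from(others, w)  # motif model built from all other sequences
        s = seqs[i]
        # motif-versus-background likelihood ratio for every start position in sequence i
        weights = [math.prod(prof[j][c] / background[c] for j, c in enumerate(s[start:start + w]))
                   for start in range(len(s) - w + 1)]
        pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    final = profile_from([s[p:p + w] for s, p in zip(seqs, pos)], w)
    consensus = "".join(max(col, key=col.get) for col in final)
    return pos, consensus


# Hypothetical toy promoter fragments, each containing one copy of a planted motif.
seqs = [
    "GCGTACGTTATAATGCAGCT",
    "TTGACCTATAATCGGACGTA",
    "ACGGCATCGTATAATGGCAT",
    "CGTATAATACGCGTGCATGC",
    "GGCTTACGCGACTATAATCG",
]
positions, consensus = gibbs_motif_search(seqs, w=6)
print(positions, consensus)
```

The sampler is stochastic and can settle on a shifted or local optimum, so practical implementations usually add phase-shift moves and several random restarts and keep the best-scoring alignment; this sketch does neither.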
Article
The BUGS (Bayesian inference Using Gibbs Sampling) algorithm has been used to obtain numerical estimates of the parameters of the posterior distribution, including the variance components and heritability (h²), under a 2-way nested random model. Using Monte Carlo simulation, heritability estimates obtained with the BUGS approach are compared with those from traditional approaches such as ANOVA, ML and REML for different family structures. The Bayesian approach appears superior to the traditional approaches for estimating heritability under the 2-way nested model.
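For comparison with the Bayesian approach, the traditional ANOVA estimator under the 2-way nested model (sires, dams within sires, progeny within dams) equates mean squares to their expectations. The sketch below assumes a balanced design; the function name and toy records are hypothetical, and the Gibbs sampling (BUGS) estimator itself is not shown.

```python
def nested_anova_h2(data: list[list[list[float]]]) -> dict[str, float]:
    """ANOVA (method-of-moments) estimates for y = mu + sire + dam(sire) + error.
    data[i][j] holds the progeny records of dam j mated to sire i (balanced design)."""
    s, d, n = len(data), len(data[0]), len(data[0][0])
    if any(len(sire) != d or any(len(dam) != n for dam in sire) for sire in data):
        raise ValueError("this sketch handles balanced designs only")
    grand = sum(y for sire in data for dam in sire for y in dam) / (s * d * n)
    sire_means = [sum(y for dam in sire for y in dam) / (d * n) for sire in data]
    dam_means = [[sum(dam) / n for dam in sire] for sire in data]
    ss_sire = d * n * sum((m - grand) ** 2 for m in sire_means)
    ss_dam = n * sum((dam_means[i][j] - sire_means[i]) ** 2 for i in range(s) for j in range(d))
    ss_err = sum((y - dam_means[i][j]) ** 2
                 for i, sire in enumerate(data) for j, dam in enumerate(sire) for y in dam)
    ms_sire = ss_sire / (s - 1)
    ms_dam = ss_dam / (s * (d - 1))
    ms_err = ss_err / (s * d * (n - 1))
    # Expected mean squares: E(MS_sire) = s2_e + n*s2_d + d*n*s2_s,
    # E(MS_dam) = s2_e + n*s2_d, E(MS_err) = s2_e.
    var_sire = (ms_sire - ms_dam) / (d * n)
    var_dam = (ms_dam - ms_err) / n
    var_err = ms_err
    total = var_sire + var_dam + var_err
    return {
        "var_sire": var_sire, "var_dam": var_dam, "var_err": var_err,
        "h2_sire": 4 * var_sire / total,  # paternal half-sib estimate
        "h2_dam": 4 * var_dam / total,    # may be inflated by maternal and dominance effects
    }


# Hypothetical records: 2 sires x 2 dams per sire x 2 progeny per dam.
toy = [[[10.0, 12.0], [14.0, 16.0]], [[20.0, 22.0], [24.0, 26.0]]]
print(nested_anova_h2(toy))
```

The toy data exaggerate the differences between sires, so h2_sire falls outside [0, 1]; plain ANOVA estimates are not constrained to the admissible range.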
Article
The initiation of work in statistical genetics, involving the analysis of breeding data on Beetal goats, dates back to 1940, when this Institute, in its formative stages, was working as the statistical section of ICAR. This work, under the illustrious leadership of Professor P. V. Sukhatme, led for the first time to an appreciation of the power of statistics in drawing inferences on issues in animal sciences and other fields. With the reorganization of research activities under the Indian Council of Agricultural Research during the 1970s, the stage was set for the impact of this discipline to be felt in the research arena of the country. After the Institute was recognized as a full-fledged Institute of ICAR, work on statistical genetics, involving research initiatives of both a theoretical and an applied nature in plant and animal breeding, was carried out in the Division of Animal Sciences. Research in statistical genetics was further strengthened by carving out a "Statistical Genetics Cell" from the Division of Animal Sciences in 1978. Recognizing the importance of this area of research and the amount of work done in the past, the 'Quinquennial Review Team (1971-1981)', as well as some of the UNDP experts who visited the Institute from time to time, recommended that research in statistical genetics and other areas such as biometry, bio-assay and biostatistics be carried out on a much larger scale than a small cell allowed. Following these recommendations, the Division of Bio-statistics and Statistical Genetics came into existence in March 1985; its name was changed to 'Biometrics' in 1998. New theoretical developments were made from time to time, and numerous methodologies were developed for application in plant and animal breeding and related areas. The major research contributions made during the last 25 years by the scientists and students in the area of statistical genetics are reviewed.
Work prior to this period has been exhaustively reviewed by Narain et al. (1987). The next few sections give a brief account of the important research and academic accomplishments in statistical applications in genetics and breeding, with the research achievements organized under different heads. A complete list of the publications of the scientists, which formed the main basis of the highlighted research achievements, is provided at the end of this article.
2. Estimation of Genetic Parameters
Article
Research has been conducted to screen in silico for α-glucosidase inhibitors among the chemical compounds of dringo (Acorus calamus L.), based on their binding sites, by docking the compounds to the enzyme/receptor with the program ArgusLab. The α-glucosidase model was obtained from the Protein Data Bank (code 1lwj), downloaded via the NCBI website. The chemical compounds contained in dringo (A. calamus L.) were obtained from the KNApSAcK Jamu database, and their 2D and 3D structures were drawn with ACD/ChemSketch. Docking results showed activity for the compound 1-ethenyl-1-methyl-2,4-bis(prop-1-en-2-yl)cyclohexane, with a free energy of -8.04385 kcal/mol, and for the compound isocaespitol, with a free energy of -8.28388 kcal/mol.
Article
Recent studies of mammalian genomes have uncovered the vast extent of copy number variations (CNVs) that contribute to phenotypic diversity. Compared with an SNP, a CNV can cover a wider chromosomal region, which may incur substantial sequence changes and induce more significant effects on phenotypes. CNVs are becoming a promising alternative genetic marker in genetic analyses. Here we report, for the first time, an account of CNV regions in the cattle genome in a Chinese Holstein population. Illumina Bovine SNP50K BeadChips were used to screen 2,047 Holstein individuals, and three different programs (PennCNV, cnvPartition and GADA) were used to detect potential CNVs. After a strict CNV-calling pipeline, a total of 99 CNV regions were identified in the cattle genome. These CNV regions cover 23.24 Mb in total, with an average size of 151.69 kb. Of these CNV regions, 52 have frequencies above 1%, and 51 completely or partially overlap with 138 cattle genes, which are significantly enriched for specific biological functions such as signaling pathways, sensory perception responses and cellular processes. The results provide valuable information for constructing a more comprehensive CNV map of the cattle genome and offer an important resource for investigating genome structure and the genomic variation underlying traits of interest in cattle.
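CNV regions are usually formed by merging overlapping CNV calls, after which their total coverage and average size can be summarized as in the abstract. The sketch below merges calls given as (chromosome, start, end) with 1-based inclusive coordinates; the function name and the calls are hypothetical. Some pipelines also merge calls separated by a small gap or keep gains and losses apart, which this sketch does not do.

```python
def merge_cnv_calls(calls: list[tuple[str, int, int]]) -> list[tuple[str, int, int]]:
    """Merge overlapping CNV calls (chrom, start, end; 1-based inclusive) into CNV regions."""
    regions: list[tuple[str, int, int]] = []
    for chrom, start, end in sorted(calls):  # sorted by chromosome, then start position
        if regions and regions[-1][0] == chrom and start <= regions[-1][2]:
            c, s, e = regions[-1]
            regions[-1] = (c, s, max(e, end))  # extend the current region
        else:
            regions.append((chrom, start, end))  # start a new region
    return regions


# Hypothetical CNV calls pooled across individuals and detection programs.
calls = [("5", 1000, 5000), ("5", 4000, 9000), ("5", 20000, 26000),
         ("12", 300, 800), ("12", 700, 1500)]
regions = merge_cnv_calls(calls)
lengths = [end - start + 1 for _, start, end in regions]
print(regions)
print(len(regions), "regions,", sum(lengths), "bp in total, mean", round(sum(lengths) / len(lengths), 1), "bp")
```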
Article
Recently, the amount of available single nucleotide polymorphism (SNP) marker data has considerably increased in dairy cattle breeds, both for research purposes and for application in commercial breeding and selection programs. Bayesian methods are currently used in the genomic evaluation of dairy cattle to handle very large sets of explanatory variables with a limited number of observations. In this study, we applied 2 Bayesian methods, BayesCπ and Bayesian least absolute shrinkage and selection operator (LASSO), to 2 genotyped and phenotyped reference populations consisting of 3,940 Holstein bulls and 1,172 Montbéliarde bulls with approximately 40,000 polymorphic SNP. We compared the accuracy of the Bayesian methods for the prediction of 3 traits (milk yield, fat content, and conception rate) with pedigree-based BLUP, genomic BLUP, partial least squares (PLS) regression, and sparse PLS regression, a variable selection PLS variant. The results showed that the correlations between observed and predicted phenotypes were similar for BayesCπ (with or without pedigree information) and Bayesian LASSO for most of the traits, regardless of breed. In the Holstein breed, Bayesian methods led to higher correlations than other approaches for fat content and were similar to genomic BLUP for milk yield and to genomic BLUP and PLS regression for conception rate. In the Montbéliarde breed, no method dominated the others, except BayesCπ for fat content. The better performance of the Bayesian methods for fat content in Holstein and Montbéliarde breeds is probably due to the effect of the DGAT1 gene. The SNP identified by the BayesCπ, Bayesian LASSO, and sparse PLS regression methods, based on their effect on the different traits of interest, were located at almost the same positions on the genome. 
As the Bayesian methods resulted in regressions of direct genomic values on daughter trait deviations closer to 1 than for the other methods tested in this study, Bayesian methods are suggested for genomic evaluations of French dairy cattle.
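The two validation criteria in this abstract, the correlation between predicted and observed values and a regression slope whose ideal value is 1, can be computed as sketched below. The function names and validation-bull values are hypothetical. `regression_slope(dgv, dtd)` follows the abstract's wording (DGV regressed on DTD); swap the arguments for the other direction.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation, used here as the predictive accuracy."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def regression_slope(dependent: list[float], explanatory: list[float]) -> float:
    """Least-squares slope of `dependent` on `explanatory`; 1.0 indicates no scale bias."""
    n = len(explanatory)
    mx, my = sum(explanatory) / n, sum(dependent) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(explanatory, dependent))
    sxx = sum((a - mx) ** 2 for a in explanatory)
    return sxy / sxx


# Hypothetical validation bulls: direct genomic values (DGV) and daughter trait deviations (DTD).
dgv = [1.2, -0.5, 0.3, 2.1, -1.0, 0.8]
dtd = [1.0, -0.7, 0.6, 1.9, -1.2, 0.5]
print("accuracy r =", round(pearson(dgv, dtd), 3))
print("slope of DGV on DTD =", round(regression_slope(dgv, dtd), 3))
```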
Article
Prediction of genetic risk for disease is needed for preventive and personalized medicine. Genome wide association studies have found unprecedented numbers of variants associated with complex human traits and diseases. However, these variants explain only a small proportion of genetic risk. Mounting evidence suggests that many traits, relevant to public health, are affected by large numbers of small-effect genes and that prediction of genetic risk to those traits and diseases could be improved by incorporating large numbers of markers into whole-genome prediction (WGP) models. We developed a WGP model incorporating thousands of markers for prediction of skin cancer risk in humans. We also considered other ways of incorporating genetic information into prediction models, such as family history or ancestry (using principal components, PC, of informative markers). Prediction accuracy was evaluated using the area under the receiver operating characteristic curve (AUC) estimated in a cross-validation. Incorporation of genetic information (i.e., familial relationships, PC or WGP) yielded a significant increase in prediction accuracy: from an AUC of 0.53 for a baseline model that accounted for non-genetic covariates to AUCs of 0.58 (pedigree), 0.62 (PC), and 0.64 (WGP). In summary, prediction of skin cancer risk could be improved by considering genetic information and using a large number of SNPs in a WGP model, which allows for the detection of patterns of genetic risk that are above and beyond those that can be captured using family history. We discuss avenues for improving prediction accuracy and speculate on the possible use of WGP to prospectively identify individuals at high risk.
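The AUC used above equals the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half; 0.5 corresponds to chance. A minimal pairwise sketch is given below with hypothetical scores; the function name is hypothetical, and cross-validation is not shown. For large samples a rank-based formula avoids the quadratic loop.

```python
def auc(scores: list[float], labels: list[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney pairwise definition.
    labels: 1 for cases (e.g. skin cancer), 0 for controls."""
    cases = [s for s, lab in zip(scores, labels) if lab == 1]
    controls = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in cases:
        for q in controls:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical predicted risks from some model, with observed case/control status.
risk = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
status = [1, 1, 1, 0, 0, 0]
print(round(auc(risk, status), 3))  # 0.889 (8 of 9 case-control pairs ranked correctly)
```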
Article
Recent advances in high-throughput genotyping have motivated genomic selection using high-density markers. However, an increasingly large number of markers raises both statistical and computational issues and makes it difficult to estimate breeding values. We propose applying the penalized orthogonal-components regression (POCRE) method to estimate breeding values. As a supervised dimension reduction method, POCRE sequentially constructs linear combinations of markers, i.e. orthogonal components, such that these components are most closely correlated with the phenotype. Such dimension reduction can group highly correlated predictors and accommodates collinear or nearly collinear markers. Unlike BayesB, which predetermines hyperparameters, POCRE uses an empirical Bayes thresholding method to obtain data-driven optimal hyperparameters and effectively selects important markers when constructing each component. Simulation studies demonstrate that POCRE greatly reduces computing time compared with BayesB. Moreover, unlike fBayesB, which slightly sacrifices prediction accuracy for fast computation, POCRE provides similar or even better accuracy than BayesB in predicting breeding values, in both simulation studies and real data analyses.
Article
Naive T lymphocytes exhibit extensive antigen-independent recirculation between blood and lymph nodes, where they may encounter dendritic cells carrying cognate antigen. We examine how long different T cells may spend in an individual lymph node by examining data from long term cannulation of blood and efferent lymphatics of a single lymph node in the sheep. We determine empirically the distribution of transit times of migrating T cells by applying the Least Absolute Shrinkage & Selection Operator ([Formula: see text]) or regularised [Formula: see text] to fit experimental data describing the proportion of labelled infused cells in blood and efferent lymphatics over time. The optimal inferred solution reveals a distribution with high variance and strong skew. The mode transit time is typically between 10 and 20 hours, but a significant number of cells spend more than 70 hours before exiting. We complement the empirical machine learning based approach by modelling lymphocyte passage through the lymph node [Formula: see text]. On the basis of previous two photon analysis of lymphocyte movement, we optimised distributions which describe the transit times (first passage times) of discrete one dimensional and continuous (Brownian) three dimensional random walks with drift. The optimal fit is obtained when drift is small, i.e. the ratio of probabilities of migrating forward and backward within the node is close to one. These distributions are qualitatively similar to the inferred empirical distribution, with high variance and strong skew. In contrast, an optimised normal distribution of transit times (symmetrical around mean) fitted the data poorly. The results demonstrate that the rapid recirculation of lymphocytes observed at a macro level is compatible with predominantly randomised movement within lymph nodes, and significant probabilities of long transit times. 
We discuss how this pattern of migration may contribute to facilitating interactions between low frequency T cells and antigen presenting cells carrying cognate antigen.
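The random-walk model above can be illustrated by simulating first passage times of a discrete one-dimensional walk with a small forward drift. In this sketch the node length of 20 steps, the forward probability of 0.52 and the reflecting boundary at the entry point are illustrative assumptions, not the fitted values from the study, and the function name is hypothetical.

```python
import random
import statistics


def transit_time(length: int, p_forward: float, rng: random.Random, max_steps: int = 1_000_000) -> int:
    """Steps for a walker starting at 0 to first reach `length` (the exit).
    Forward step with probability p_forward, otherwise backward; position 0 is
    reflecting, so a backward step there leaves the walker in place."""
    pos, steps = 0, 0
    while pos < length and steps < max_steps:
        steps += 1
        if rng.random() < p_forward:
            pos += 1
        elif pos > 0:
            pos -= 1
    return steps


rng = random.Random(42)
times = [transit_time(20, 0.52, rng) for _ in range(1000)]
print("mean", round(statistics.mean(times), 1),
      "median", statistics.median(times),
      "max", max(times))
```

With the ratio of forward to backward probabilities close to one, simulated transit times typically show a long right tail (mean above median and a maximum far beyond both), which is the qualitative shape the abstract reports.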
Article
The development of machine learning and soft computing techniques has given researchers many opportunities to establish new analytical methods in different areas of science. The objective of this study was to investigate the potential of two types of intelligent learning methods, artificial neural networks (ANN) and neuro-fuzzy systems (NFS), to estimate breeding values (EBV) of Iranian dairy cattle. First, the breeding values of lactating Holstein cows for milk and fat yield were estimated using conventional best linear unbiased prediction (BLUP) with an animal model. Next, a multilayer perceptron was used to build an ANN that predicts breeding values from the performance data of selection candidates. Fuzzy logic was then used to form an NFS, a hybrid intelligent system implemented via a local linear model tree algorithm. For milk yield, the correlations between the reference EBV and the EBV predicted by the ANN and NFS were 0.92 and 0.93, respectively; the corresponding correlations for fat yield were 0.93 and 0.93. When EBVs for milk and fat yield were predicted simultaneously (multitrait), the correlations with the reference EBV were 0.93 and 0.93 for the ANN and 0.94 and 0.95 for the NFS, for milk and fat production, respectively.
Article
Small hairpin RNAs (shRNAs) are useful in many ways, such as the identification of trait-specific molecular markers, gene silencing and the characterization of a species. Hardly any standalone software for shRNA prediction exists in the public domain. Hence, the software shRNAPred (1.0) is proposed here, offering a user-friendly Command-line User Interface (CUI) to predict 'shRNA-like' regions from a large set of nucleotide sequences. The software is developed in Perl version 5.12.5 and takes into account parameters such as stem and loop length combinations, specific loop sequences, GC content, melting temperature, position-specific nucleotides and a low-complexity filter. Each parameter is assigned a specific score, on the basis of which the software ranks the predicted shRNAs. High-scoring shRNAs are reported as potential shRNAs and provided to the user as a text file. The software also allows users to customize certain parameters when predicting specific shRNAs of interest. shRNAPred (1.0) is open-access software for academic users and can be downloaded free of charge along with a user manual, an example dataset and example output for easy understanding and implementation. Availability: The software is available for free at http://bioinformatics.iasri.res.in/EDA/downloads/shRNAPred_v1.0.exe.
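Several of the parameters listed above (stem and loop lengths, GC content, melting temperature) can be illustrated with a small scanner that looks for windows of the form stem + loop + reverse complement of the stem. The sketch below is not shRNAPred's scoring scheme: the function names, the 30-60% GC window and the example stem and loop are hypothetical, and the Wallace rule used for melting temperature is only a rough estimate for short oligonucleotides.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    return rna.translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Wallace rule: 2 degC per A/U plus 4 degC per G/C (rough, for short oligos)."""
    return 2 * (seq.count("A") + seq.count("U")) + 4 * (seq.count("G") + seq.count("C"))


def hairpin_stem(window: str, stem_len: int, loop_len: int) -> Optional[str]:
    """Return the stem if window == stem + loop + reverse_complement(stem), else None."""
    if len(window) != 2 * stem_len + loop_len:
        return None
    stem = window[:stem_len]
    return stem if window[stem_len + loop_len:] == reverse_complement(stem) else None


def scan_hairpins(seq: str, stem_len: int, loop_len: int, gc_range=(0.30, 0.60)):
    """Slide a window along an RNA sequence and report perfect hairpins within a GC range."""
    size = 2 * stem_len + loop_len
    hits = []
    for start in range(len(seq) - size + 1):
        stem = hairpin_stem(seq[start:start + size], stem_len, loop_len)
        if stem is not None and gc_range[0] <= gc_content(stem) <= gc_range[1]:
            hits.append((start, stem, wallace_tm(stem)))
    return hits


# Hypothetical 19-nt stem and 9-nt loop, embedded in flanking sequence.
stem = "GACUGAUCCGUAAGCUUAC"
loop = "UUCAAGAGA"
candidate = stem + loop + reverse_complement(stem)
hits = scan_hairpins("AAAA" + candidate + "CCCC", stem_len=19, loop_len=9)
print(hits)
```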
Article
There is a constant demand for new and improved vaccines, and nanovaccines are emerging as a novel approach to vaccination. Nanovaccines can be more efficient than conventional vaccines because they induce both humoral and cell-mediated immune responses. They promise to harness the body's immune system to clear infections and to prevent infections and diseases from spreading. Nanovaccines may also hold promise for chronic diseases, including autoimmune diseases such as multiple sclerosis and rheumatoid arthritis, and infectious diseases such as HIV and malaria.
Article
Copy number variations (CNVs) have been shown to be important in both normal phenotypic variability and disease susceptibility, and are increasingly accepted as an important source of genetic variation complementary to single nucleotide polymorphisms (SNPs). Comprehensive identification and cataloging of pig CNVs would benefit functional analyses of genome variation. In this study, we performed genome-wide CNV detection based on Porcine SNP60 genotyping data from 474 pigs from three purebred populations (Yorkshire, Landrace and Songliao Black) and one Duroc × Erhualian crossbred population. A total of 382 CNV regions (CNVRs) were identified across the genome, covering 95.76 Mb, or 4.23% of the autosomal genome sequence. The CNVRs ranged in length from 5.03 to 2,702.7 kb, with an average of 250.7 kb, and their frequencies varied from 0.42% to 20.87%. These CNVRs contain 1,468 annotated genes with a great variety of molecular functions, making them a promising resource for exploring the genetic basis of phenotypic variation within and among breeds. To confirm these findings, 18 CNVRs representing different predicted statuses and frequencies were chosen for validation by quantitative real-time PCR (qPCR), and 12 (66.67%) of them were successfully confirmed. Our results demonstrate that the currently available Porcine SNP60 BeadChip can be used to capture CNVs efficiently. Our study provides the first comprehensive map of copy number variation in the pig genome, which should help in understanding the pig genome and provide a preliminary foundation for investigating associations between various phenotypes and CNVs.
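qPCR validation of a CNVR is often based on the comparative Ct (2^-ΔΔCt) method: the target region is normalized to a reference gene and compared with a calibrator animal assumed to carry two copies. The sketch below assumes roughly 100% amplification efficiency for both assays; the function name and Ct values are hypothetical, and the study's exact qPCR protocol may differ.

```python
def relative_copy_number(ct_target: float, ct_reference: float,
                         ct_target_calibrator: float, ct_reference_calibrator: float,
                         calibrator_copies: int = 2) -> float:
    """Copy number by the 2^-ddCt method, assuming ~100% PCR efficiency for both assays
    and a calibrator animal assumed to carry `calibrator_copies` copies of the target."""
    d_ct_sample = ct_target - ct_reference  # normalise the target to the reference gene
    d_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    dd_ct = d_ct_sample - d_ct_calibrator
    return calibrator_copies * 2 ** (-dd_ct)


# Hypothetical Ct values: the calibrator shows target 25.0 and reference 22.0.
gain = relative_copy_number(24.0, 22.0, 25.0, 22.0)
loss = relative_copy_number(26.0, 22.0, 25.0, 22.0)
print(gain)  # 4.0 (gain)
print(loss)  # 1.0 (loss)
```

Each cycle of difference corresponds to a two-fold change in template, so an earlier Ct for the target (relative to the calibrator) implies extra copies.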
Article
Non-linear statistical models were used to study the growth pattern of four Indian goat breeds, viz. Jamunapari (A), Beetal (B), Barbari (C) and Black Bengal (D), and their crosses. Body weight data of male animals from birth to 12 months of age, corrected for three non-genetic fixed effects (type of birth, season and period), were used to fit different non-linear models: the monomolecular, Gompertz and logistic curves. On comparing R² values and error mean squares, the monomolecular curve gave the best fit, closely followed by the Gompertz curve. Since the monomolecular curve has the drawback of not providing a point of inflexion, the second-best Gompertz curve was used to obtain the age, body weight and maximum growth rate at the point of inflexion. The optimum ages at the point of inflexion for male animals of Jamunapari, Beetal, Barbari and Black Bengal are 11.5, 3.9, 8.3 and 5.9 months, respectively. The optimum age in crosses is around 6 months, except for Jamunapari × Barbari and Beetal × Black Bengal, for which it is 8.5 and 7.6 months, respectively. The growth rate is maximum at the optimum age and declines slowly thereafter. Body weight at the point of inflexion (optimum age) is considerably improved in crosses compared with the purebred parent with the lower body weight.
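For the Gompertz curve W(t) = A·exp(-b·exp(-kt)), the second derivative vanishes where b·exp(-kt) = 1, so the point of inflexion is at t* = ln(b)/k, with body weight A/e and maximum growth rate A·k/e. For the logistic curve W(t) = A/(1 + b·exp(-kt)) the same point gives A/2 and A·k/4, while the monomolecular curve A(1 - b·exp(-kt)) is concave for all t (positive b and k), which is why it has no inflexion point. The sketch below evaluates these closed forms; the parameter values are hypothetical, and fitting the curves (for example by non-linear least squares) is not shown.

```python
import math


def gompertz(t: float, A: float, b: float, k: float) -> float:
    """Gompertz growth curve W(t) = A * exp(-b * exp(-k * t))."""
    return A * math.exp(-b * math.exp(-k * t))


def gompertz_inflection(A: float, b: float, k: float) -> tuple[float, float, float]:
    """Age, body weight and maximum growth rate at the point of inflexion.
    W''(t) = 0 where b * exp(-k t) = 1, i.e. t* = ln(b) / k (positive only if b > 1)."""
    return math.log(b) / k, A / math.e, A * k / math.e


def logistic_inflection(A: float, b: float, k: float) -> tuple[float, float, float]:
    """Same quantities for the logistic curve W(t) = A / (1 + b * exp(-k t))."""
    return math.log(b) / k, A / 2, A * k / 4


# Hypothetical parameters: mature weight A in kg, rate k per month.
A, b, k = 30.0, 2.5, 0.25
age, weight, rate = gompertz_inflection(A, b, k)
print(f"Gompertz: age {age:.2f} months, weight {weight:.2f} kg, rate {rate:.3f} kg/month")
```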
Article
We have mapped and quantified mouse transcriptomes by deeply sequencing them and recording how frequently each gene is represented in the sequence sample (RNA-Seq). This provides a digital measure of the presence and prevalence of transcripts from known ...
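A widely used way to turn such read counts into a digital expression measure comparable across genes and samples is RPKM (reads per kilobase of transcript per million mapped reads), which corrects for both gene length and sequencing depth. The sketch below uses hypothetical counts, lengths and library size; the function name is hypothetical.

```python
def rpkm(read_counts: dict[str, int], gene_lengths_bp: dict[str, int],
         total_mapped_reads: int) -> dict[str, float]:
    """RPKM = reads * 1e9 / (total_mapped_reads * gene_length_bp)."""
    return {gene: count * 1e9 / (total_mapped_reads * gene_lengths_bp[gene])
            for gene, count in read_counts.items()}


# Hypothetical library of 1 million mapped reads.
counts = {"geneA": 500, "geneB": 1500, "geneC": 0}
lengths = {"geneA": 1000, "geneB": 3000, "geneC": 2000}
print(rpkm(counts, lengths, total_mapped_reads=1_000_000))
```

geneB receives three times as many reads as geneA but is also three times longer, so both end up with the same RPKM.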
Article
Genetic correlation is an important genetic parameter widely used by breeders and geneticists in selection and improvement programs. Methods reported in the literature for estimating genetic correlation are based on normality assumptions. This paper presents estimates of genetic correlation under non-normality, obtained from the variance-covariance components of a half-sib model. Estimates of the genetic correlation, together with its bias, standard error and mean square error, are reported.
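Under a half-sib model, the genetic correlation is estimated from sire components as r_g = cov_s(x, y) / sqrt(var_s(x) · var_s(y)), where each component comes from equating between- and within-sire mean squares (or mean cross-products) to their expectations. The sketch below assumes a balanced one-way layout with two traits recorded on the same progeny; the function names and records are hypothetical. The point estimator itself does not assume normality; normality mainly matters for its approximate standard error.

```python
import math


def sire_component(x: list[list[float]], y: list[list[float]]) -> float:
    """Between-sire (co)variance component from a balanced one-way half-sib layout.
    x[i] and y[i] hold two traits measured on the same progeny of sire i.
    Passing y = x returns the sire variance component of x."""
    s, n = len(x), len(x[0])
    means_x = [sum(f) / n for f in x]
    means_y = [sum(f) / n for f in y]
    gx, gy = sum(means_x) / s, sum(means_y) / s
    # mean cross-products between and within sire families
    mcp_between = n * sum((mx - gx) * (my - gy) for mx, my in zip(means_x, means_y)) / (s - 1)
    mcp_within = sum((a - mx) * (b - my)
                     for fx, fy, mx, my in zip(x, y, means_x, means_y)
                     for a, b in zip(fx, fy)) / (s * (n - 1))
    return (mcp_between - mcp_within) / n


def genetic_correlation(x: list[list[float]], y: list[list[float]]) -> float:
    var_x, var_y = sire_component(x, x), sire_component(y, y)
    if var_x <= 0 or var_y <= 0:
        raise ValueError("a sire variance component is not positive; r_g is undefined")
    return sire_component(x, y) / math.sqrt(var_x * var_y)


# Hypothetical records: 4 sires, 3 progeny each, two traits per progeny.
x = [[10.0, 12.0, 11.0], [15.0, 17.0, 16.0], [9.0, 8.0, 10.0], [14.0, 13.0, 15.0]]
y = [[5.0, 6.1, 5.4], [7.2, 7.9, 7.5], [4.1, 4.4, 3.9], [6.8, 6.2, 7.0]]
print(round(genetic_correlation(x, y), 3))
```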
Article
The formula for the standard error of the genetic correlation based on the half-sib intra-class correlation is highly approximate and depends on the estimates and standard errors of other genetic parameters, which are themselves estimated with low precision. This results in unreliable estimates of the standard error of the genetic correlation. The present investigation studies some statistical properties of this important genetic parameter using the bootstrap technique. The results showed that in 95% of cases the underlying distribution of the genetic correlation is non-normal. The bootstrap estimates of genetic correlation are biased by 5-10%. The higher standard errors obtained with this technique reveal that the traditional estimation procedure, using the approximate formulae available in the literature, underestimates the standard error. As expected, the confidence intervals obtained by the percentile method are shorter in the majority of cases than those obtained under the normality assumption.
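A bootstrap for the half-sib genetic correlation can resample whole sire families with replacement, re-estimate r_g each time, and read off the bias, standard error and a percentile interval. The sketch below assumes a balanced one-way half-sib layout; the function names, the 1,000 replicates and the records are hypothetical, and replicates in which a sire variance component is not positive are simply skipped, which is only one possible convention.

```python
import math
import random
import statistics


def sire_component(x, y):
    """Between-sire (co)variance component, balanced one-way half-sib layout."""
    s, n = len(x), len(x[0])
    mx = [sum(f) / n for f in x]
    my = [sum(f) / n for f in y]
    gx, gy = sum(mx) / s, sum(my) / s
    between = n * sum((a - gx) * (b - gy) for a, b in zip(mx, my)) / (s - 1)
    within = sum((a - ax) * (b - by) for fx, fy, ax, by in zip(x, y, mx, my)
                 for a, b in zip(fx, fy)) / (s * (n - 1))
    return (between - within) / n


def genetic_correlation(x, y):
    vx, vy = sire_component(x, x), sire_component(y, y)
    if vx <= 0 or vy <= 0:
        return None  # undefined for this (re)sample
    return sire_component(x, y) / math.sqrt(vx * vy)


def bootstrap_rg(x, y, n_boot=1000, seed=7):
    """Resample sire families with replacement and re-estimate r_g each time."""
    rng = random.Random(seed)
    s = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(s) for _ in range(s)]
        r = genetic_correlation([x[i] for i in idx], [y[i] for i in idx])
        if r is not None:
            estimates.append(r)
    se = statistics.stdev(estimates)
    cuts = statistics.quantiles(estimates, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return estimates, se, (cuts[0], cuts[-1])


# Hypothetical records: 6 sires, 3 progeny each, two traits per progeny.
x = [[10, 12, 11], [15, 17, 16], [9, 8, 10], [14, 13, 15], [12, 11, 13], [16, 18, 17]]
y = [[5.0, 6.1, 5.4], [7.2, 7.9, 7.5], [4.1, 4.4, 3.9], [6.8, 6.2, 7.0], [5.9, 5.5, 6.3], [7.8, 8.4, 8.1]]
point = genetic_correlation(x, y)
estimates, se, (lo, hi) = bootstrap_rg(x, y)
print("r_g", round(point, 3), "bias", round(statistics.mean(estimates) - point, 3),
      "SE", round(se, 3), "95% percentile CI", (round(lo, 3), round(hi, 3)))
```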
Article
The concept of vaccination has been around for centuries. Vaccines constitute cost-effective measures for preventing disease. Advances in biotechnology and an understanding of the inductive and effector components of immune responses have ushered in a "golden age" of vaccine development and implementation. Many licensed vaccines have one or more ideal characteristics, but none manifests them all. Of the generic vaccine technologies and vaccination strategies at different stages of development, some have already demonstrated their flexibility, practicality, robustness and potential simplicity of production, and others hold promise for the future. Although conventional methods of vaccine development have been successful in many cases, they took a long time to provide vaccines against pathogens for which the solution was easy, and failed to provide a solution for bacteria and parasites that lack obvious immunodominant protective antigens. The reverse approach to vaccine development takes advantage of the genome sequence of the pathogen. It allows not only the identification of all the antigens found by conventional methods but also the discovery of novel antigens that work on a totally different paradigm. With the genome sequences of many bacteria, parasites and viruses to be completed in the near future, many vaccines that were previously impossible to develop will become a reality, and novel vaccines using non-conventional antigens (i.e. non-structural proteins) can be developed.
Article
New sequencing methods generate data that can allow the assembly of microbial genome sequences in days. With such revolutionary advances in technology come new challenges in methodologies and informatics. In this article, we review the capabilities of high-throughput sequencing technologies and discuss the many options for getting useful information from the data.
Article
A modified ANOVA method, which enforces a restriction on the parameter space of the variance components, has been used to obtain heritability estimates in the admissible range. A Monte Carlo study was conducted to compare the performance of this method with traditional methods such as ANOVA, ML, REML and MIVQUE(0) in the presence of unbalancedness and scale contamination. The modified ANOVA estimator has a lower MSE than ANOVA for all parameter values of heritability under different sample sizes.
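Plain ANOVA can return a negative sire variance, and hence a heritability outside [0, 1]. The sketch below shows the simplest way of restricting the parameter space: truncating the sire component at zero and capping h² at 1. It assumes a balanced one-way sire model, and the truncation rule is an illustrative assumption that may differ from the paper's modification.

```python
def anova_heritability(families, restrict=True):
    """Half-sib heritability h2 = 4 * var_sire / (var_sire + var_within).

    families: list of sire families (balanced), each a list of phenotypes.
    restrict=True truncates a negative sire component at zero and caps h2 at 1,
    a simple way of keeping the estimate in the admissible range [0, 1].
    """
    s = len(families)
    n = len(families[0])
    if s < 2 or n < 2 or any(len(fam) != n for fam in families):
        raise ValueError("need >= 2 sires with an equal number (>= 2) of progeny")
    means = [sum(fam) / n for fam in families]
    grand = sum(means) / s
    ms_sire = n * sum((m - grand) ** 2 for m in means) / (s - 1)
    ms_within = sum((x - m) ** 2 for fam, m in zip(families, means) for x in fam) / (s * (n - 1))
    var_sire = (ms_sire - ms_within) / n  # plain ANOVA estimate, can be negative
    if restrict:
        var_sire = max(var_sire, 0.0)
    phenotypic = var_sire + ms_within
    if phenotypic <= 0:
        raise ValueError("estimated phenotypic variance is not positive")
    h2 = 4 * var_sire / phenotypic
    return min(h2, 1.0) if restrict else h2
```

Truncation trades a small bias for the guarantee of an admissible estimate, which is consistent with the lower MSE reported above for the restricted estimator.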
Article
The research work done in the field of bio-statistics and statistical genetics in India over the last 5 decades has been reviewed. For convenience in the presentation, the review is divided into 4 broad heads, viz. Theoretical developments, applications in plant breeding, applications in animal breeding and future challenges.
Article
Stayability, being a binary trait in animal breeding, needs thorough genetic analysis. The beta-binomial procedure was modified and applied to an unbalanced data set; the Dempster-Lerner method was also used to estimate the heritability of stayability and was compared empirically with the beta-binomial method. All comparisons considered unbalanced data. Unbalancedness led to estimates with large standard errors, and the relative root mean square error showed that the precision and accuracy of the estimates were affected by unbalancedness. The beta-binomial and Dempster-Lerner methods gave very similar results.
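The Dempster-Lerner method converts a heritability estimated on the observed 0/1 scale to the underlying liability scale using the incidence p of the trait and the standard normal density z at the corresponding threshold. A minimal sketch of that transformation follows; the example values are hypothetical.

```python
from statistics import NormalDist


def dempster_lerner(h2_observed, incidence):
    """Convert heritability on the observed 0/1 scale to the liability scale.

    h2_liability = h2_observed * p * (1 - p) / z**2, where p is the incidence of
    the trait (e.g. the proportion of cows that stay in the herd) and z is the
    standard normal density at the threshold separating a fraction p of the
    liability distribution (by symmetry, inv_cdf(p) gives the same density).
    """
    if not 0 < incidence < 1:
        raise ValueError("incidence must lie strictly between 0 and 1")
    std_normal = NormalDist()
    z = std_normal.pdf(std_normal.inv_cdf(incidence))
    return h2_observed * incidence * (1 - incidence) / z ** 2


# Hypothetical values: observed-scale h2 of 0.05 at 60% stayability.
print(round(dempster_lerner(0.05, 0.60), 3))
```

At p = 0.5 the conversion factor is p(1 − p)/z² = 0.25 · 2π = π/2, and it grows as the incidence moves towards 0 or 1.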
Article
Neisseria meningitidis is a major cause of bacterial septicemia and meningitis. Sequence variation of surface-exposed proteins and cross-reactivity of the serogroup B capsular polysaccharide with human tissues have hampered efforts to develop a successful vaccine. To overcome these obstacles, the entire genome sequence of a virulent serogroup B strain (MC58) was used to identify vaccine candidates. A total of 350 candidate antigens were expressed in Escherichia coli, purified, and used to immunize mice. The sera allowed the identification of proteins that are surface exposed, that are conserved in sequence across a range of strains, and that induce a bactericidal antibody response, a property known to correlate with vaccine efficacy in humans.
Article
Stayability, an all-or-none trait in dairy cattle breeding, is an important characteristic whose inheritance needs in-depth study. The beta-binomial method is modified for estimating the heritability of stayability and compared with other methods. For illustration, data sets with varying levels of unbalancedness are simulated; the results show significant superiority of the modified beta-binomial method. The effect of unbalancedness on the heritability estimates and their precision is also examined.
Article
Heritability is the ratio of additive genetic variance to total phenotypic variance and expresses the extent to which phenotypes are determined by the genes transmitted from the parents. Very few studies have established optimal methods for estimating variance components and heritability from data containing abnormal values, and the methods reported in the literature for this situation cover only one-way random effects models. We propose a robust method that exploits the sample covariance matrix to estimate the variance components, and thereby the heritability, for a two-way nested random effects model.
Article
Leptospirosis is an important global human and veterinary health problem. Humans can be infected by exposure to chronically infected animals and their environment. An important focus of the current leptospiral research is the identification of outer membrane proteins (OMPs). Due to their location, leptospiral OMPs are likely to be relevant in host–pathogen interactions, hence their potential ability to stimulate heterologous immunity. The existing whole-genome sequence of Leptospira interrogans serovar Copenhageni offers a unique opportunity to search for cell surface proteins. Predicted genes encoding potential surface proteins were amplified from genomic DNA by PCR methodology and cloned into an Escherichia coli expression system. The partially purified recombinant proteins were probed by Western blotting with sera from human patients diagnosed with leptospirosis. Sixteen proteins, out of a hundred tested, were recognized by antibodies present in human sera. Four of these proteins were conserved among eight serovars of L. interrogans and absent in the non-pathogenic Leptospira biflexa. These proteins might be useful for the diagnosis of the disease as well as potential vaccine candidates.
Article
Stayability, the ability to stay in the herd, is an important character in dairy cattle breeding and needs a thorough genetic analysis. Herdlife, which depends on many auxiliary characters, is one aspect of stayability. To study the genetic parameters of stayability, herdlife was therefore adjusted for production and other unrelated characters affecting it, and the corresponding methodology for estimating heritability was worked out. Stochastic simulation was used to validate the theory developed. Even small adjustments for characteristics such as production or foot angle have large effects on estimates of the heritability of stayability. The relative absolute bias was also studied and found to be significantly reduced by adjusting for unrelated auxiliary characters.
Article
New sources of genetic polymorphisms promise significant additions to the number of useful genetic markers in agricultural plants and animals, and prompt this review of potential applications of polymorphic genetic markers in plant and animal breeding. Two major areas of application can be distinguished. The first is based on the utilization of genetic markers to determine genetic relationships. These applications include varietal identification, protection of breeder's rights, and parentage determination. The second area of application is based on the use of genetic markers to identify and map loci affecting quantitative traits, and to monitor these loci during introgression or selection programs. A variety of breeding applications based on these possibilities can be envisaged for Selfers, particularly for those species having a relatively small genome size. These applications include: (i) screening genetic resources for useful quantitative trait alleles, and introgression of chromosome segments containing these alleles from resource strain to commercial variety; (ii) development of improved pure lines out of a cross between two existing commercial varieties; and (iii) development of crosses showing increased hybrid vigor. Breeding applications in segregating populations are more limited, particularly in species with a relatively large genome size. Potential applications, however, include: (i) preliminary selection of young males in dairy cattle on the basis of evaluated chromosomes of their proven sire; (ii) genetic analysis of resource strains characterized by high values for a particular quantitative trait, and introgression of chromosome segments carrying alleles contributing to the high values from resource strain to recipient strain.
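Parentage determination, the first application area above, commonly relies on exclusion: a putative parent is excluded at a codominant locus when it shares no allele with the offspring. A minimal sketch, assuming diploid genotypes stored as allele pairs per locus; the locus names and allele sizes are hypothetical.

```python
def excluding_loci(offspring, putative_parent):
    """Return the loci at which the putative parent cannot have transmitted an allele.

    offspring, putative_parent: dict locus -> (allele1, allele2) for a diploid,
    codominant marker (e.g. microsatellite allele sizes). Loci not typed in the
    parent are ignored.
    """
    loci = []
    for locus, child_alleles in offspring.items():
        parent_alleles = putative_parent.get(locus)
        if parent_alleles is None:
            continue  # no information at this locus
        if not set(child_alleles) & set(parent_alleles):
            loci.append(locus)  # no shared allele: Mendelian exclusion
    return loci


# Hypothetical genotypes at three marker loci.
calf = {"L1": (120, 124), "L2": (200, 200), "L3": (88, 92)}
sire = {"L1": (124, 130), "L2": (198, 202), "L3": (92, 92)}
print(excluding_loci(calf, sire))
```

In practice a single excluding locus may reflect a genotyping error, a mutation or a null allele, so exclusion is often declared only when two or more loci disagree.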
Article
Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ***, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
Article
An investigation was carried out on 12,854 fortnightly first-lactation test-day milk yield records of 643 Sahiwal cows sired by 51 bulls, spread over 49 years, at the National Dairy Research Institute, Karnal. The relative efficiency of multiple linear regression (MLR) analysis and an artificial neural network (ANN) was compared for predicting first lactation 305-day milk yield (FL305DMY) in Sahiwal cows. The ANN was trained using three back-propagation algorithms, viz. Bayesian regularization (BR), scaled conjugate gradient (SCG) and Levenberg–Marquardt (LM). These three algorithms were further compared using four partitions of the data into training and test sets: 66.67–33.33%, 75–25%, 80–20% and 90–10%. The coefficient of determination of the models increased as test-day milk yields were added as input variables. The study inferred that the ANN was better than MLR for predicting FL305DMY: almost all the ANN models achieved more than 80% accuracy at an early stage, i.e. by the 111th day of lactation, with lower RMSE than MLR. ANN is therefore recommended as a potential tool for predicting first lactation 305-day milk yield in Sahiwal cows.
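The two criteria used to compare MLR and ANN above, RMSE and the coefficient of determination, are computed on the test set from observed and predicted yields. A minimal sketch of both metrics; the yields and predictions in the usage lines are hypothetical and do not come from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error of predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical 305-day yields (kg) in a test set and two models' predictions.
observed = [1800.0, 2100.0, 1950.0, 2400.0]
model_a = [1850.0, 2050.0, 2000.0, 2300.0]
model_b = [1700.0, 2200.0, 1900.0, 2250.0]
for name, pred in (("model A", model_a), ("model B", model_b)):
    print(name, round(rmse(observed, pred), 1), round(r_squared(observed, pred), 3))
```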
Article
The present investigation compares the estimated, predicted, empirical and bootstrap standard errors (SE) for different combinations of population heritability and genetic and phenotypic correlations, for different family sizes and structures under a half-sib mating design. Data under the half-sib model are simulated with sire effects following a normal as well as a gamma distribution. The empirical SE of genetic correlation when sire effects follow a gamma distribution is invariably higher than when sire effects are normally distributed, irrespective of the sample size, heritability and genetic correlation of the traits. The empirical SE of the estimates of genetic correlation is very high for lowly heritable traits over the whole range of genetic correlation. The large-sample approximation of SE given by Tallis always underestimates the SE, even for large family sizes of 30 to 50, and should not be used in practice. Barring small sample sizes, the bootstrap estimates of SE are very close to the predicted SE and can be used as estimates of the SE of genetic correlation. For lowly heritable traits, the bootstrap estimates of SE of genetic correlation are very close to the predicted SE for sample sizes of 500 and above over the whole range of genetic correlation. For moderately and highly heritable traits, the bootstrap estimates of SE are very close to the predicted SE for all values of genetic correlation and for all sample sizes and family structures, except for small samples with moderately heritable traits. Hence, the bootstrap estimates of SE, being very close to the predicted values, can be used instead of the approximate formula given in the literature. However, for non-normal data sets with sire effects following a gamma distribution, the bootstrap estimates of SE of genetic correlation are always underestimated.
Article
Automated animal behaviour monitoring systems have become increasingly appealing for research and animal production management. However, many existing systems can measure only one or two behaviour patterns or activity states at a time. We aimed to develop and pilot a method for automatically measuring and recognising several behavioural patterns of dairy cows using a three-dimensional accelerometer and a multi-class support vector machine (SVM). SVM classification models were constructed based on nine features. The models were trained using observations of the behaviour of 30 cows fitted with a neck collar bearing an accelerometer that recorded horizontal, vertical and lateral acceleration. Measured behaviour patterns included standing, lying, ruminating, feeding, normal and lame walking, lying down, and standing up. Accuracy, sensitivity, precision, and kappa measures were used to evaluate model performance. The SVM classification models achieved reasonable recognition of standing (80% sensitivity, 65% precision), lying (80%, 83%), ruminating (75%, 86%), feeding (75%, 81%), walking normally (79%, 79%), and lame walking (65%, 66%). The results were poor for lying down (0%, 0%) and standing up (71%, 29%). The overall performance of the multi-class model was 78% precision with a kappa value of 0.69. Each behaviour category was most often confused with one or two other behaviour patterns; as expected, the problematic behaviours were those that resemble each other in terms of movement. Possible solutions for the classification problems are presented. In conclusion, accelerometers can be used to easily recognise various behaviour patterns in dairy cows, and support vector machines proved useful in classifying the measured behaviour patterns. However, further work is needed to refine the features used in the classification models to achieve the best possible classification performance, and the quality of the acceleration data also needs to be considered to improve the results.
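The evaluation measures quoted above (per-class sensitivity and precision, overall accuracy and Cohen's kappa) all follow from a confusion matrix of true versus predicted behaviours. A minimal sketch, assuming rows are true classes and columns are predicted classes; the 2 × 2 matrix in the usage line is hypothetical.

```python
def classification_metrics(confusion):
    """Per-class (sensitivity, precision), overall accuracy and Cohen's kappa.

    confusion[i][j] = number of observations of true class i predicted as class j.
    A class that is never observed (or never predicted) gets 0.0 for the
    undefined ratio.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    per_class = []
    for c in range(k):
        tp = confusion[c][c]
        sensitivity = tp / row_totals[c] if row_totals[c] else 0.0
        precision = tp / col_totals[c] if col_totals[c] else 0.0
        per_class.append((sensitivity, precision))
    p_observed = sum(confusion[c][c] for c in range(k)) / total
    # Agreement expected by chance from the marginal totals.
    p_expected = sum(row_totals[c] * col_totals[c] for c in range(k)) / total ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return per_class, p_observed, kappa


# Hypothetical counts for two behaviours, e.g. "standing" and "lying".
print(classification_metrics([[40, 10], [20, 30]]))
```

Kappa discounts the agreement a classifier would reach by chance given the class frequencies, which is why it is reported alongside raw precision.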
Motifs are biologically significant fragments of nucleotide or peptide sequences that follow a specific pattern. They are categorized as structural motifs and sequence motifs, and are discovered through phylogenetic studies of similar genes across species. Structural motifs are formed by three-dimensional arrangements of amino acids consisting of two or more α-helices or β-strands, whereas sequence motifs are formed by nucleotide fragments appearing in the exons of a gene. The residues of a structural motif need not be contiguous in the sequence, whereas those of a sequence motif are. Sequence motifs may encode structural motifs. Motif discovery algorithms are an important part of bio-computational studies. The purpose of motif discovery is to identify patterns in biopolymer (nucleotide or protein) sequences in order to understand the structure, function and evolution of the molecules. The main aim of this paper is to provide a systematic review of the approaches, databases and tools used in motif discovery.
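One of the most common motif representations used by discovery tools is the position weight matrix (PWM), which scores how well any window of a sequence matches a set of known sites. A minimal sketch, assuming DNA sites that are already aligned, a uniform background and a pseudocount of 1; the sites and the scanned sequence are hypothetical.

```python
import math


def position_weight_matrix(sites, alphabet="ACGT", pseudocount=1.0):
    """Log2-odds PWM from equal-length aligned sites against a uniform background."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    background = 1.0 / len(alphabet)
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        total = len(column) + pseudocount * len(alphabet)
        pwm.append({base: math.log2((column.count(base) + pseudocount) / total / background)
                    for base in alphabet})
    return pwm


def consensus(pwm):
    """Highest-scoring base at each position."""
    return "".join(max(col, key=col.get) for col in pwm)


def best_match(sequence, pwm):
    """Scan a sequence and return (start, score) of the highest-scoring window."""
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        score = sum(pwm[i][sequence[start + i]] for i in range(width))
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Hypothetical aligned binding sites and a sequence to scan (0-based start).
sites = ["TATAAT", "TATAAT", "TACAAT", "TATATT"]
pwm = position_weight_matrix(sites)
print(consensus(pwm), best_match("GGCGTATAATGCC", pwm))
```

Discovery algorithms such as expectation maximisation or Gibbs sampling essentially search for the set of sites whose PWM is most informative; this sketch covers only the scoring step once sites are known.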
Article
Anoxia is an important abiotic stress factor that negatively affects agricultural systems. Vitellogenin (VTG), an anoxia-tolerance gene, is widely reported in fish and other oviparous species. Similarly, Submergence 1, a submergence-tolerance gene, is reported in rice. Both genes are expected to share a common evolutionary mechanism conferring tolerance to reduced oxygen across species. Here, conserved key residues responsible for anoxia tolerance across species are identified in silico. First, the protein domain of the vitellogenin gene of Danio rerio was extracted and subjected to similarity searches across species. Proteins selected from the similarity search were compared with the Submergence 1 gene products of plant species, specifically Oryza species, to identify conserved regions of interest by multiple sequence alignment. Residue conservation across species was determined by in silico proteomics analysis. The results show that arginine was conserved at a defined position in the final alignment profile of the proteins studied, and it was identified as a key residue responsible for reduced-oxygen tolerance across species. From an evolutionary point of view, the proteins responsible for submergence tolerance in aquatic plants were found to be much closer to the proteins responsible for anoxia tolerance in fishes.
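Finding a residue that is conserved at a defined position amounts to scanning the columns of the multiple sequence alignment for those that are identical in every sequence. A minimal sketch, assuming gapped sequences of equal length with "-" as the gap character and strict identity as the conservation rule (real analyses often use softer scores); the toy alignment is hypothetical.

```python
def conserved_columns(alignment):
    """Return (1-based column, residue) for alignment columns that are identical
    in every sequence and contain no gap ("-")."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must have equal length")
    conserved = []
    for pos, column in enumerate(zip(*alignment), start=1):
        first = column[0]
        if first != "-" and all(residue == first for residue in column):
            conserved.append((pos, first))
    return conserved


# Hypothetical toy alignment of three protein fragments.
msa = ["MKR-LA", "MQRSLA", "MKRTIA"]
print(conserved_columns(msa))
```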
Article
Known genetic loci that affect metric traits may be useful in livestock improvement. Their value depends on the proportion ( R ) of the total additive genetic variation due to the known loci relative to the heritability of the trait concerned and on the form of selection practised. When normal selection is effective, further information on known loci can add only a little to the rate of improvement. But if normal selection is not very effective, as for characters of low heritability, or if indirect selection on relatives must be used (as for sex-limited or carcass traits) then known loci may add significantly to the rate of improvement possible. Sampling errors in the estimated effects and in the proportion ( R ) may cause selection effort to be misdirected and may even lead to losses rather than gains in improvement. Such errors are most likely to occur when the heritability of the character is low. Reports on several loci with large effects in the various farm species have been summarised, but the evidence is often inconsistent and contradictory. At present, there appear to be no loci that could be used with confidence in the improvement of economic traits in farm animals.
Article
Up-to-date research in biology, biotechnology and medicine requires fast genome and transcriptome analysis technologies for investigating cellular state, physiology and activity. Here, microarray technology and next-generation sequencing of transcripts (RNA-Seq) are the state of the art. Whereas microarray technology is limited with respect to the amount of RNA, the quantification of transcript levels and sequence information, RNA-Seq provides nearly unlimited possibilities in modern bioanalysis. This chapter presents a detailed description of next-generation sequencing (NGS), describes the impact of this technology on transcriptome analysis and explains its possibilities for exploring the modern RNA world.
Article
In the paper I give a brief review of the basic idea and some history and then discuss some developments since the original paper on regression shrinkage and selection via the lasso.
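The lasso penalises the absolute size of regression coefficients, which shrinks them and sets some exactly to zero. A minimal sketch of fitting it by cyclic coordinate descent with soft thresholding, assuming the penalised (Lagrangian) form of the objective, centred data with no intercept, and a fixed number of sweeps instead of a convergence check; the toy data are hypothetical.

```python
def soft_threshold(rho, lam):
    """S(rho, lam) = sign(rho) * max(|rho| - lam, 0)."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X: list of n rows with p predictors each; y: list of n responses.
    No intercept is fitted, so y and the columns of X should be centred first.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    residual = list(y)  # y - X b, with b = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of predictor j with the partial residual that excludes b_j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


# Hypothetical centred data with two orthogonal predictors.
X = [[1, 0], [0, 1], [-1, 0], [0, -1]]
y = [2, 1, -2, -1]
for lam in (0.0, 0.6, 5.0):
    print(lam, lasso_coordinate_descent(X, y, lam))
```

With λ = 0 this reproduces least squares; as λ grows, the weaker coefficient is set exactly to zero first, which is the selection property that distinguishes the lasso from ridge regression.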
Article
The cattle tick, Rhipicephalus microplus, is arguably the world's most economically important external parasite of cattle. Sustainable cattle tick control strategies are required to maximise the productivity of cattle in both large production operations and small family farms. Commercially available synthetic acaricides are commonly used in control and eradication programs, but indiscriminate practices in their application have resulted in the rapid evolution of resistance among populations in tropical and subtropical regions where the invasive R. microplus thrives. The need for novel technologies that could be used alone or in combination with commercially available synthetic acaricides is driving a resurgence of cattle tick vaccine discovery research efforts by various groups globally. The aim is to deliver a next-generation vaccine that has an improved efficacy profile over the existing Bm86-based cattle tick vaccine product. We present a short review of these projects and offer our opinion on what constitutes a good target antigen and vaccine, and what might influence the market success of candidate vaccines. The previous experience with Bm86-based vaccines offers perspective on marketing and producer acceptance aspects that a next-generation cattle tick vaccine product must meet for successful commercialisation.
Article
The discovery of copy number variation (CNV) in the genome has provided new insight into genomic polymorphism. Studies with chickens have identified a number of large CNV segments using a 385k comparative genomic hybridization (CGH) chip (mean length >140 kb). We present a detailed CNV map for local Chinese chicken breeds and commercial chicken lines using an Agilent 400k array CGH platform with custom-designed probes. We identified a total of 130 copy number variation regions (CNVRs; mean length = 25.70 kb), of which 104 (80.0%) are novel segments reported for the first time in chickens. Among the 104 novel CNVRs, 56 (53.8%) were non-coding sequences, 65 (62.5%) showed a gain of DNA and 40 (38.5%) showed a loss of DNA (one locus showed both loss and gain). By overlapping the CNVRs with previously reported selective sweep data and quantitative trait loci data, we identified four loci that might be considered high-confidence selective segments that arose during the domestication of chickens. Gene ontology analysis showed that, compared with previously reported CNVRs, genes for the positive regulation of phospholipase A2 activity were significantly over-represented in the novel CNVRs reported here. The availability of these results should facilitate further research on genetic variability in chicken breeds.
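Array CGH reports a log2 intensity ratio per probe, and CNV regions are runs of neighbouring probes showing a consistent gain or loss. The sketch below is a deliberately naive threshold-and-merge caller, not the segmentation method used in the study (published pipelines typically use algorithms such as circular binary segmentation); the ±0.3 thresholds, the three-probe minimum and the probe data are all hypothetical.

```python
def call_cnv_segments(probes, gain_threshold=0.3, loss_threshold=-0.3, min_probes=3):
    """Merge consecutive probes with the same gain/loss state into CNV segments.

    probes: list of (chromosome, position_bp, log2_ratio) sorted by chromosome
    and position. Returns (chromosome, start_bp, end_bp, state, n_probes) for
    runs of at least min_probes probes; start/end are probe positions.
    """
    segments = []
    current = None  # [chromosome, start, end, state, n_probes]
    for chrom, pos, ratio in probes:
        if ratio >= gain_threshold:
            state = "gain"
        elif ratio <= loss_threshold:
            state = "loss"
        else:
            state = None
        if current and state == current[3] and chrom == current[0]:
            current[2] = pos  # extend the running segment
            current[4] += 1
            continue
        if current and current[4] >= min_probes:
            segments.append(tuple(current))  # close the previous segment
        current = [chrom, pos, pos, state, 1] if state else None
    if current and current[4] >= min_probes:
        segments.append(tuple(current))
    return segments


# Hypothetical probes on two chromosomes.
probes = [("1", 1000, 0.05), ("1", 2000, 0.5), ("1", 3000, 0.6), ("1", 4000, 0.45),
          ("1", 5000, 0.0), ("1", 6000, -0.7), ("1", 7000, -0.8),
          ("2", 1000, -0.9), ("2", 2000, -0.6), ("2", 3000, -0.5)]
print(call_cnv_segments(probes))
```

Requiring several consecutive probes suppresses single-probe noise, at the cost of missing CNVs shorter than the probe spacing times the minimum run length.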