Questions related to Quantitative Genetics
Hi, I am having trouble getting a SNP input file to work in GenAlEx. I tried replacing the letters with numbers (A=1, C=2, G=3, T=4) in one column, but it does not work. Using numbers in two columns (codominant) does not seem to work either. Can anyone help? Attached is an example of my SNP data set. Thanks
Greeting to all researchers,
As we all know, YVMV in okra is not known to be a seed-transmitted viral disease. So what happens if we harvest seed from infected plants? Breeding for viral disease resistance is not the only trait of concern, so the question arises: can we harvest YVMV-infected plants to advance generations?
Hi, I am working on a theoretical quantitative genetics paper exploring rank distributions of parents and progeny based on genetic (narrow-sense heritability), GxE, and environmental (random, maternal effects) influences. I want to test some data, but public datasets of complete (not summarized by moments or correlations) parent-progeny phenotype data are hard to find.
Ideally, I would like a data set with animal ID, animal sex, sire and dam ID, sire and dam phenotype, and animal phenotype. The best case would be an approximately outbred population, but I accept that many agricultural and lab animals have some inbreeding present, so I can take what I can get as long as it is not a consciously inbred line or one with frequent backcrossing, etc.
Also, the number of progeny per coupling is an important variable so animals with a moderate number of offspring (5-10) per female would be appreciated.
Thanks for any help.
Hello everyone, I want to learn a new programming language. The purpose is to write software for animal quantitative genetics. I found that the three most popular software packages in animal quantitative genetics (BLUPF90, DMU, ASREML) are all written in Fortran. But some of my friends suggested that I learn Python.
I am good at the R language. I think there is not much difference between R and Python, and Julia is also one of my languages to consider.
So I am now confused about which language to learn? Which language is more suitable for animal quantitative genetics? Anyone who knows, please give me some suggestions. I still have three years to learn this new language.
I have a population of a dioecious species with significant phenotypic variation and want to select individuals from this population. What statistical methods can I use to perform multi-trait selection or dissimilarity studies with data from individual plants? An R package would be even better.
I am having a problem running PHASE. The error says that the number of alleles is too large and that I need to increase the KMAX value in constants.h and recompile. The manual does not explain how to do this. I have seen a file called constants.hpp in the folder phase-master\src\phase.2.1.1.source, but changing the KMAX value there does not work.
Could you help me with this issue?
I do not know how to do it in MS-DOS
Thank you for your help
I have unreplicated mean data for various grain yield traits from two consecutive years. I want to estimate broad-sense heritability from these data. Is that possible? If yes, please share an example or a link.
Apologies if this does not make sense, I am very new to GWAS analysis.
I have recently imputed my data using the Michigan imputation server which has aligned my data to the HRC reference panel and has returned chr.vcf.gz and chr.info files for analysis.
I am now trying to filter the data and convert it to plink bim/bed/fam files; however, my question is this. My data contains multi-allelic variants and so may have duplicate IDs. If I filter on MAF 0.01 using vcftools and one of the alt variants has MAF<0.01, will it also remove the other one because of the duplicate ID?
Alternatively, I was thinking of converting to plink format without filtering, however, I believe plink 1.9 default is to keep the most common alt allele and set the other to missing. I don't know what effect that would have on my analysis.
Hope this makes sense.
Any help is appreciated, thanks.
I am currently investigating post-harvest data from breeding field trials* and, besides phenotypic correlations among traits, I also receive genetic correlations from the analysis software.
For instance I receive the values
1.08 + for the relationship yield with test weight
0.71 ++ for the relationship yield with thousand kernel weight.
My questions are:
1. Why are there correlation coefficients above 1?
2. Is the "+" nomenclature a common nomenclature for significance levels in genotypic correlations?
3. What would I use as an indirect selection trait for yield in my case: test weight or thousand kernel weight?
Thanks in advance!!
*wheat yield trials, analysis with software Plabstat
I have data from randomly selected individuals for which I wish to determine the PI(sib). What is the best software program to calculate PI(sib) from the sequence data for 9 markers?
I am testing the hypothesis that being heterozygous at a certain gene locus, say X, increases the chances of having the good allele of X compared to being homozygous. I have data from more than 6000 individuals with their zygosity for gene X. What background distribution of allele frequency could I use to test whether the observed frequency of the good allele in heterozygotes is higher than expected by chance? Thank you so much.
The Best Linear Unbiased Estimates (BLUE) are to be used in a GWAS of resistance to yellow rust in durum wheat landraces. I have data from three environments generated through an alpha lattice design with 2 replications. Per replication, I have 10 blocks each containing 30 accessions, for a total of 300 materials. The accessions were sown in two rows 0.5 m long and 0.2 m apart; the distance between two accessions is 0.4 m.
I have 3 data sets of resistance evaluation from two locations generated in a field experiment with an alpha-lattice design (2 replications of 300 materials and, in each replication, 10 incomplete blocks containing 30 accessions). Two of the data sets are from the same location but different years, and the third is a single year of data from another location. So I am thinking of calculating the BLUE / BLUP for each location and an overall one for the combined data to be used in the GWAS.
I need to fine-map a 10 cM QTL region to a narrow interval to find candidate genes associated with the phenotype. I did QTL mapping with F2 populations, selected the recombinant F2 plants using the significant (flanking) markers, and selfed them to produce F2:3 seeds. Could someone show me how to find the number of F2:3 individuals needed for fine mapping based on the recombination frequency in the interval?
Thank you for sharing the thoughts in advance
Can correlation alone be useful? I have seen twin studies interpreted using the correlation coefficient alone. Is that enough to establish the role of genetics in a disease? If not, which other tests can be done? I came across two measures, the heritability estimate and proband concordance, but there are mixed reviews about them. Please suggest how to proceed.
So far I have assumed that environment variance + genotype variance + GxE variance must add up to 100%; however, my editor said it should be E + G + GxE + error variance = 100%. Further, I took the residual mean square as the error variance. Is that correct?
I am soon to be analyzing some nanopore data and wanted to see if anybody has any particularly good papers on the technical approaches of best pipeline for this type of data? Thanks.
I fed in my diploid data (29 samples, 18 primers (loci), and an average of 8 bands per locus) for dominant-marker analysis of the Shannon index, total allele diversity, percentage polymorphism, etc., and this ran successfully. But at the last step of the analysis, a dialog box showing OUT MEMORY was displayed. How do I overcome this problem?
Is it methodologically correct to normalize grain moisture data in maize to mean zero and unit variance and then calculate variance components?
Since grain moisture in full-season maize hybrids is environmentally limited (in the environments of interest), and dry-down before autumn frost is slower in some years and faster in others, would it be reasonable to perform the variance components analysis on data centered to the same mean? In some (rainy) years there is 250-290 g/kg water in the grain, while in dry years there is no more than 200 g/kg.
I am aware that this would shrink the environmental variance and give information only about genotype x environment interaction and genotypic variance. But if the goal of selection is only to get faster dry-down than some parental line, the information should be preserved, since the relations within environments would not change.
Any opinion is much appreciated.
I have 15 genotypes each of corn and sorghum, planted in 2 different irrigation regimes with 3 replications in each regime. I have data from these 6 replications: yield/area, yield/panicle, chlorophyll content, plant height, total plant weight, harvest index, etc. I need to estimate BLUEs, BLUPs, and a prediction and/or estimate table of the effect of each line for a major trait and the effect of each environment for the same trait, using a mixed model in JMP (Bayesian is also fine, but not REML, because some variance components come out negative).
Making inbreds is the most important step in developing hybrids, and it is not always easy to find genetic variation. So some researchers try to derive new inbreds from the same old inbred by selfing plants that differ in color, stature, ear length, etc. Which kind of character is better to start with, quantitative or qualitative?
When would one use QTL mapping to find loci underlying complex traits?
What are the advantages of QTL mapping over GWAS? When would you want to use GWAS instead?
I've noticed some recent papers that use a combination of both techniques, but I'm wondering what their key differences are.
Just as the title says:
Suppose I have a set of genotype data with hundreds of samples. I want to simulate gene expression data for the same samples and, at the same time, incorporate the genotype information. I know there are tools like ruvcorr. But can the simulated expression also integrate genotype information?
Thanks a lot.
I am working on germplasm materials collected from 9 different locations. Several different species were collected from each location. However, the number of species collected from each location is not equal (some locations do not have certain species). Which statistical model should I use to evaluate the genetic variances of this population? Attached is an example of my dataset. Any help is much appreciated.
I want to test how much variation in expression of 10 different genes (combined effect 10 genes) explains variation in one particular phenotype.
I have obtained microarray data for a large cohort (both sexes). I performed an initial GWAS on all SNPs from all chromosomes to check for genetic association with the trait I am interested in. I found some regions, but the most interesting one is on the X chromosome (in my opinion it is not an artifact). However, I am a bit confused because I do not know whether, and how, I can analyse these data. For women there is the standard three-genotype distribution, but men can have only two variants: presence or absence of the allele.
- Should I divide the cohort and analyse the male and female subsets separately?
- What kind of statistics should I use for men, given that I think a simple MAF cannot be used? And are PLINK results for the male subset alone reliable?
- Or do you have any other advice?
I would be very grateful for all your help.
Question : In a large random mating population, there are initially 91 percent dominants for a monogenic trait.
a. If there would be 20% selection against the dominants, what would be the expected frequency of the three genotypes in the next generation?
b. If selection would be against the recessives at an intensity of 0.10, what would be the frequency of the recessives after one generation? What would be the change in the frequency of the recessive allele?
c. If selection against the recessives would be complete every generation, how many generations would be needed to reduce the frequency of the recessive gene to 5%?
d. If the population would be selfed before completely discarding the recessives, what would be the frequency of the recessives after one generation of selection?
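A minimal sketch of the arithmetic behind parts a to c, assuming the starting population is in Hardy-Weinberg proportions (so 91% dominants implies q² = 0.09), that "20% selection" means a relative fitness of 0.8, and that selection is followed by random mating. The function names are illustrative, not from any package, and part d (selfing) is not covered.

```python
import math

def after_selection(freqs, fitness):
    """Genotype frequencies (AA, Aa, aa) among survivors of selection."""
    weighted = [f * w for f, w in zip(freqs, fitness)]
    mean_fitness = sum(weighted)
    return [x / mean_fitness for x in weighted]

def hardy_weinberg(q):
    """Genotype frequencies (AA, Aa, aa) after random mating."""
    p = 1.0 - q
    return [p * p, 2 * p * q, q * q]

def recessive_allele_freq(freqs):
    """Frequency of the recessive allele from (AA, Aa, aa) frequencies."""
    return 0.5 * freqs[1] + freqs[2]

# 91% dominant phenotypes -> 9% recessive homozygotes -> q0 = sqrt(0.09)
q0 = math.sqrt(1.0 - 0.91)
start = hardy_weinberg(q0)

# (a) 20% selection against dominants: relative fitness 0.8, 0.8, 1.0
survivors_a = after_selection(start, [0.8, 0.8, 1.0])
q_a = recessive_allele_freq(survivors_a)
next_gen_a = hardy_weinberg(q_a)  # genotypes after random mating

# (b) selection intensity 0.10 against recessives: fitness 1.0, 1.0, 0.9
survivors_b = after_selection(start, [1.0, 1.0, 0.9])
q_b = recessive_allele_freq(survivors_b)
delta_q_b = q_b - q0  # change in recessive allele frequency

# (c) complete selection against recessives: q_t = q0 / (1 + t * q0),
# so t = 1/q_t - 1/q0, rounded up to whole generations
generations_c = math.ceil(1.0 / 0.05 - 1.0 / q0)

print(next_gen_a, q_b, delta_q_b, generations_c)
```

Whether part a asks for the genotype frequencies among survivors or after the next round of random mating is a matter of interpretation; the sketch keeps both (`survivors_a` and `next_gen_a`).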
I have some categorical data that I would like to analyse with GenStat software using a GLMM. But I am not sure how to enter the data into the software: as raw data (please see attached file) or as a contingency table, since the variables are not continuous?
Thank you for your help
In GLM:
genetic variance (σ²G) = (MSp - MSe) / r
environmental variance (σ²A) = MSe
phenotypic variance (σ²F) = σ²G + σ²A
where: MSp = mean square of ..., MSe = mean square of experimental error, r = number of replications
In MLM: genetic variance (σ²G) = ?
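The GLM (ANOVA) formulas in this post can be written as a short function. This is only a sketch: it assumes the first mean square is the one for genotypes (the post does not say which term it belongs to), and the mean squares in the example call are made-up numbers.

```python
def variance_components(ms_genotype, ms_error, reps):
    """Method-of-moments estimates from a one-factor ANOVA with replications.

    ms_genotype: mean square assumed to belong to genotypes (assumption)
    ms_error:    mean square of the experimental error
    reps:        number of replications
    """
    sigma2_g = (ms_genotype - ms_error) / reps   # genetic variance
    sigma2_e = ms_error                          # environmental (error) variance
    sigma2_p = sigma2_g + sigma2_e               # phenotypic variance
    return sigma2_g, sigma2_e, sigma2_p

# Hypothetical mean squares, for illustration only
sigma2_g, sigma2_e, sigma2_p = variance_components(50.0, 10.0, 4)
print(sigma2_g, sigma2_e, sigma2_p)
```

In a mixed model the genetic variance is not computed from mean squares in this way; it is typically estimated directly (for example by REML) as the variance of the random genotype term.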
Hello everyone! Does anyone know how I can transform the data output from brlmm-p for the Affymetrix 5.0 genome-wide chip to a phd file?
probe_id A-100 A-1 A-2 A-4 A-3 A-5 A-6 A-11 A-12 A-13 A-14 A-15 A-16 A-17
SNP_A-1780520 1 -1 2 2 2 2 2 2 2 0 2 2 2 2 2 2 2
-1= no call
0=AA (homozygous for the reference allele)
2=BB (homozygous for the alternative allele)
Thank you in advance!
I'm an R user and would like to perform some quantitative genetics tests. The experimental design is a full-sib/half-sib design (each male mated with two females).
Thanks in advance
Is the mixed linear model (MLM) always better than the general linear model (GLM) in association mapping? I was told by a researcher that MLM is always better than GLM in association mapping, because MLM uses not only the population structure but also the kinship to control false positives, whereas GLM only considers population structure. However, I was also told by others that it is better to choose the right model and that MLM is not always better than GLM. Indeed, I have read some recently published papers that use only GLM in association mapping. My question is: under what circumstances is GLM a better option than MLM? How can I determine that? Thank you in advance for your help and answers!
I am not familiar with GenStat programming, so I am using just the Windows version. I would like to know how to calculate broad-sense heritability.
Hi folks - I am trying to run a Joint Scaling test but am having difficulties finding a step-by-step description of how to run one. Does anyone have a great go-to book? (Please note, my university library doesn't have a copy of the Lynch & Walsh (1998) textbook.) Any suggestions of where to look/read, would be much appreciated!
Can I compare the copy numbers of two genes estimated by qPCR? I am quantifying two genes from soil samples (gene A and gene B). Assuming I get copy numbers of 1x10^3 and 1x10^5 for gene A and gene B, respectively, can I say there is less gene A present in the soil than gene B? Or, in a quantitative way, that gene A is 100-fold less abundant than gene B?
Additionally, can I compare the copy number of gene A in other publications that estimated gene A using different primer set?
Thank you for your help
I have 100 genotypes, and they were tested in three replications. I need to know the model in R that allows me to test whether there is a significant replication x genotype interaction.
I know the model will be replication + genotype + replication x genotype. How can I test the significance of replication x genotype?
I'm working on a GWAS project, and running the data in plink (with the --assoc and --adjust options) originally gave me a genomic inflation factor of around 1.12. I conducted a PCA of the data using the GCTA software and tried to exclude samples seen as outliers, but my genomic inflation factor just seems to keep increasing (1.13 - 1.14) every time I exclude samples. Is there a reason why this is happening?
Edit: Forgot to add that the samples I am using are all cases (i.e. affected and unaffected groups are made up individuals positive for a particular phenotype). Would such sample sets lead to the slight increase in genomic inflation originally seen?
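For reference, the genomic inflation factor is usually defined as the median of the observed 1-df chi-square statistics divided by the median of the chi-square distribution with 1 df (about 0.455). The sketch below recomputes it from a list of p-values using only the Python standard library; it assumes the p-values come from 1-df tests and are strictly greater than zero, and the function name is made up.

```python
from statistics import NormalDist, median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df

def genomic_inflation(p_values):
    """Lambda_GC from two-sided p-values of 1-df association tests."""
    nd = NormalDist()
    # A two-sided p-value p corresponds to |z| = -inv_cdf(p / 2), chi2 = z**2
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN

# Evenly spread p-values (no inflation) should give a value close to 1
uniform_p = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation(uniform_p)
print(lam)
```

Recomputing the value this way after each exclusion step makes it easy to see whether the change comes from the sample set or from the set of SNPs that pass filters.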
I actually want to test association between disease genetic markers and quantitative traits other than the disease phenotype itself.
The disease markers have dominant mode of inheritance and are monogenic but I would like to know for testing association between them and other quantitative traits that are polygenic; which model serves better, Dominant genetic model or the Additive one?
I intend to obtain a kinship matrix from pedigree data through proc inbreed, to be used as input to proc mixed in order to obtain the additive genetic variance. The warning message is always "Individual clone=I011412 has been previously defined. Observation 27 corresponding to this individual will not be processed." How do I order individuals to avoid this warning message?
proc inbreed data=pedigree covar outcov=kinmat;
var clone female male;
run;
In plant breeding, we often talk about estimates of random effects, i.e. BLUPs, and variance components for quantitative variables such as yield, plant height, etc. My thought is that estimating BLUPs or variance components for qualitative traits that are ranked or ordinal is wrong and unreliable. Take, for example, disease severity scored from 1 = no symptoms to 5 = extremely severe symptoms. This is a ranked qualitative trait! Is it appropriate to estimate BLUPs or variance components of random effects for such a trait?
I would be grateful for your opinion.
I am working on archival FFPE prostate samples dating back as far as 1994. The RNA quantity varies between 991 and 14500 pg/ul (Agilent Bioanalyzer), with RIN values of 1.5-2.5 at best, and I have less than 2 ul of each sample. I plan to reverse transcribe the RNA to cDNA and move on to Fluidigm experiments.
What is the best kit for reverse transcription with low RNA concentration, low RIN number samples?
The 15 farms have new treatments in common but the local variety which is considered to be a check variety varies from one farm to another. There was NO replication of each treatment within a farm. Therefore, each farm is taken as a block in order to estimate block effect and residual term for comparison. Does it make sense to make a pairwise comparison since the local check is unique from one farm to another? The SAS script for the analysis is as follows
proc mixed data=onfarm covtest;
class farm variety;
model fyld=variety nohav/ddfm=satterth; /* nohav is a covariate */
lsmeans variety/diff adjust=tukey;
run;
I am working on QTL associated with seed yield using SNPs. 18 SNPs have been generated. My concern is: to what extent could this small number affect the reliability of my results? I limited the number because of the cost of increasing the SNP number. I am a self-sponsored female PhD student. Please kindly advise me.
I've been searching on the internet but can't find good examples, guides, or other tutorial-type documents for INLA (and AnimalINLA) applied to quantitative genetics.
Does anyone have any good material?
Dear all, can you please tell me which allele should be selected for a QTL with an additive effect in DH populations? I have a DH population; let's say it was generated by crossing cultivar (cv.) A with B, where cv. A is poor for a particular trait (e.g. protein content) while B is superior. In genotyping, parent A is scored as A and parent B as B. In QTL analysis I identified 5 QTLs, 2 with a negative additive effect and 3 with a positive additive effect. In this case, which type of QTL (with -ve or +ve additive effect) is desirable to increase protein content, and for those desirable QTL(s), which parental allele(s) should be selected?
I am very new to the topic of transposon sequencing. Can anyone suggest a good book for beginners? My question at this moment is:
I come across the term fitness very often when reading articles on transposon sequencing. For example: Tn-seq, a robust and sensitive method for the discovery of quantitative genetic interactions in microorganisms through massively parallel sequencing. The approach does not depend on a pre-existing array of mutants but is instead based on the assembly of a saturated transposon insertion library. After growth of the library under a test condition, the change in frequency of each insertion mutant is determined by sequencing the transposon flanking regions en masse. The change in frequency reflects the effect of the insertion on fitness. Fitness of every insertion in a genome can be determined in this way and is a quantitative measure of the growth rate.
I'm trying to run a matrix of co-dominant diploid data in POPGENE 1.32. I'm working with 3 populations and 9 loci.
My matrix was coded according to the manual. In fact, I have already got some results, but I want to know the gene flow between popA vs popB, popA vs popC, etc.
I was looking for a detailed manual on the web but have found nothing yet.
The results that I got are shown in the attached image.
Thanks in advance!
Currently, I am interested in several (around 100) genes in fish and would like to investigate their expression levels using publicly available RNA-Seq data. My strategy is to build the reference sequences (the genes of interest), index them with bowtie 2, and then align the publicly available RNA-Seq SRA data (filtered using the SRA toolkit) against them. The resulting SAM file is then processed with eXpress to estimate each gene's expression level as an FPKM value.
I have several questions about this strategy,
Firstly, when building the functional gene reference, what kind of sequences should I use if there is no genomic data available? For example, gene A may have been studied by several groups, and their sequences can be found in the NCBI Nucleotide database but with different lengths. Which one should I choose? Besides, RNA splicing occurs during expression, so introns may be spliced out. Therefore, which sequence should I use, before or after splicing (this is important because the length of the gene affects the final FPKM value), and how can I tell whether a given RNA sequence is spliced or not?
Secondly, is there any problem with the expression levels estimated using this strategy? Are they likely to be over- or underestimated?
Any other suggestions are strongly welcomed!
I have calculated genetic gain across two generations using mean values from two generations (F8 and F9) that were grown in two years. How should I write the units in this situation?
Ex: Yield = kg ha^-1 generation^-1 (gain in kg per hectare per generation)
Do I need to divide it by two to get the gain per generation? Or can it be considered one generation, because F8 and F9 are genetically almost fixed and our genetic gain calculations were based on the mean value across the two generations?
I am preparing a quantitative genetics post-doc project on age-related dynamics in vertebrates (probably only birds in a first instance) and there is a strong bias in the suitable datasets toward temperate and polar species. I am thus wondering whether anyone would have suitable datasets on tropical species. The main criteria would be a good pedigree (even if only a social pedigree) for the population and a time span sufficient to have individuals of known age measured in their late life.
I would be happy to describe the aims and methodology of the project in more details and discuss a collaboration with interested people.
Thanks for your help!
I want to calculate pooled genetic gain across 5 environments, but the trials in the 5 environments have different designs and different numbers of replications. So, practically speaking, I cannot do a combined analysis. Therefore I have done individual analyses. But I would still like to see the genetic gain across all five environments. Can I use the mean values of each of these environments to get a pooled mean across the 5 environments and use the pooled mean values to estimate genetic gain across the 5 environments? If not, what is the best way to estimate pooled genetic gain?
Sometimes we need to analyze diallel data using Hayman's approach along with Griffing's approach. Wr-Vr graph is one of the outputs of Hayman's approach. I know 'Dial98' can do the same. But I am looking for a script or package in R.
Can anyone suggest the best way to present qPCR data, including the ±SE? In many cases the fold change in expression is shown in the usual way, as 2^-ddCt or its log2, but an error bar is also plotted for the control. Where does the deviation shown for the control come from? Taking the delta-delta Ct makes the control equal to 1, and the log conversion makes it 0. So when the data are normalized to the control, how can the control have an error bar?
To calculate broad-sense heritability, we normally take the genetic variance and divide it by the total variance (for example, genetic variance + GxE variance + error variance).
Proc Mixed (Method=Type3, or REML as the default) is often used to get the variance estimates, with a model followed by a number of random terms.
However, if the analysis produces a negative covariance estimate, what should we do with this negative number?
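A small sketch of the ratio described above, with one possible treatment of negative estimates: setting them to zero before forming the total. That truncation is a common convention rather than the only defensible choice, and the component values in the example are invented.

```python
def broad_sense_heritability(var_g, var_ge, var_error, truncate_negative=True):
    """H2 = genetic variance / (genetic + GxE + error variance).

    If truncate_negative is True, negative estimates are set to zero first
    (an assumption made for this sketch, not a rule).
    """
    components = [var_g, var_ge, var_error]
    if truncate_negative:
        components = [max(0.0, v) for v in components]
    total = sum(components)
    return components[0] / total

# Hypothetical estimates with a negative GxE component
h2 = broad_sense_heritability(4.0, -0.5, 6.0)
print(h2)
```

The ratio here follows the post's plain "genetic over total" definition; entry-mean heritability would additionally divide the GxE and error terms by the numbers of environments and replications.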
In 2014, I planted 210 lines and 3 checks (1 repeated check and 2 random checks) in an augmented design with 1 replication at 1 location (2 locations were planted, but 1 was lost to late freeze damage). These 210 lines come from 12 populations (the family structure is complex: I have 7 wild relatives backcrossed to 2 elite parents). In 2015, 93 of the 210 lines were advanced to the next generation based on tillering ability (alpha lattice, 2 replications, 4 locations). I have calculated BLUPs and BLUEs for 2015 using META and obtained the heritability for 2015. I have done a moving mean analysis using Agro Base Gen II for 2014. In the end, I have to calculate the genetic gain we have achieved for grain yield by indirect selection for tillering ability. Please guide me step by step.
The observed frequencies of genotypes (AA, Aa, aa) and genotypes (BB, Bb, bb) in the controls were (160, 181, 79) and (182, 132, 6). Both demonstrated a significant departure from HWE (both p=0.03, goodness-of-fit chi-square test).
1. Genotyping errors? (The study followed well-described genotyping methods; 5% random samples were assayed twice, concordance >98%; The ’a’ and ‘b’ allele frequencies in controls are very close to that in similar populations from previous reports)
2. The influence of study design? (Case-control design was used in the study. Each case-control pair was matched on gender, 5-yr age group, and study site).
4. Any other possible explanations?
Possible impact on risk estimates?
Thank you for your attention and contribution in advance.
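The goodness-of-fit test mentioned in this post can be sketched with the standard library alone. The example assumes a biallelic locus and a 1-df chi-square test without continuity correction, and uses the first set of genotype counts from the post (160, 181, 79); the function name is illustrative.

```python
import math

def hwe_chi_square(n_hom1, n_het, n_hom2):
    """Chi-square goodness-of-fit test for Hardy-Weinberg proportions (1 df)."""
    n = n_hom1 + n_het + n_hom2
    p = (2 * n_hom1 + n_het) / (2 * n)  # frequency of the first allele
    q = 1.0 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_hom1, n_het, n_hom2]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

chi2, p_value = hwe_chi_square(160, 181, 79)
print(round(chi2, 2), round(p_value, 3))
```

The same function can be applied to the second set of counts. Comparing the observed and expected heterozygote counts shows the direction of the departure (deficit or excess), which helps when weighing the explanations listed above.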
Hi science folks! I am trying to explain the distribution of mutations in mutagenesis experiments. I calculate the mutation frequency as the number of mutants obtained out of the total number of cells grown, but to describe the distributions I have seen an old formula used for the mutation rate: -(ln(m0)/n), where "m0" is the number of experiments with no mutants found (the fraction of zeros) and "n" is the average number of bacteria per ml. However, the mutation rate should be a measure of mutations per unit time, so this does not seem entirely appropriate to me, or perhaps I do not understand the maths behind it. Has anyone used this or other parameters who can help me understand it? Thanks a lot!
We are having trouble determining the qPCR threshold line on an ABI 7500. We have always used the automatic settings to determine the threshold line, but recently the software has been setting the threshold below the reaction background. Do you have any idea what is causing this problem and how we can fix it? Attached is an amplification plot showing the wrong threshold. Thank you.
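The formula quoted here looks like the p0 method from fluctuation analysis. As a sketch of the reasoning: if mutations arise independently, the number of mutations per culture is Poisson with mean m, so the fraction of cultures with no mutants is p0 = exp(-m), which gives m = -ln(p0). Dividing m by the number of cells per culture gives a rate per cell (per division), not per unit of clock time. The culture counts below are invented, and using the final cell number as the denominator is an assumption.

```python
import math

def p0_mutation_rate(cultures_without_mutants, total_cultures, cells_per_culture):
    """Mutation rate per cell from the fraction of cultures with no mutants."""
    p0 = cultures_without_mutants / total_cultures
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    m = -math.log(p0)              # expected number of mutations per culture
    return m / cells_per_culture   # mutations per cell

# Hypothetical experiment: 10 of 20 cultures had no mutants, 1e8 cells each
rate = p0_mutation_rate(10, 20, 1e8)
print(rate)
```

The method only works when a reasonable share of cultures (neither all nor none) contain no mutants.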
I have been dealing with some data from PLINK. Actually, from those stored in the GWAS catalogue, which are GWAS performed by many groups. I want to understand the meaning of the p-values and the possible correlation with odds ratios.
I understand that a p-value refers to the statistical significance of the association with a particular SNP, and an odds ratio refers to the increased risk of having that particular SNP among individuals of a given population. PLINK presents beta values instead of odds ratios, which are basically the log of the odds ratios. But is it possible to predict or calculate a potential odds ratio from a p-value for a given SNP?
I fit GLMs where the response variable is the presence/absence of each allele. There are 6 alleles in total, the organism is diploid, and I suppose that using a Bonferroni correction with adjusted alpha = 0.05/6 may be too conservative, as there are two alleles per individual. Unfortunately, I cannot assign the alleles to loci. Has any of you dealt with a similar problem?
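A sketch of how these quantities relate, assuming the p-value comes from a two-sided Wald test (z = beta / SE). Under that assumption the odds ratio is exp(beta), and a p-value alone does not determine it: the standard error is also needed. Conversely, beta and the p-value together recover the standard error. The numbers are illustrative only.

```python
import math
from statistics import NormalDist

def odds_ratio(beta):
    """Odds ratio from a log-odds coefficient."""
    return math.exp(beta)

def two_sided_p(beta, se):
    """Two-sided Wald p-value for a coefficient and its standard error."""
    z = abs(beta) / se
    return math.erfc(z / math.sqrt(2.0))

def se_from_p(beta, p_value):
    """Standard error implied by a coefficient and its two-sided p-value."""
    z = -NormalDist().inv_cdf(p_value / 2.0)
    return abs(beta) / z

beta, se = 0.2, 0.05          # hypothetical values
p = two_sided_p(beta, se)
print(odds_ratio(beta), p, se_from_p(beta, p))
```

Two SNPs with the same p-value can therefore have very different odds ratios if their standard errors differ, for example because of different allele frequencies or sample sizes.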
I have recently come across a clinical study that reported gene expression in the following way: "RNA results were then reported as 40-DeltaCt values, which would correlate proportionally to the mRNA expression level of the target gene." (Here delta Ct was the difference between the Ct values of the gene of interest and a reference gene. In this case 40 cycles were used for amplification.) In what type of experiments is it useful to apply this (40 - delta Ct) calculation? How does it relate to the more frequently applied 2^(-deltaCt) method?
The qPCR standard curve for B. bifidum DNA, from 20 ng down to 0.00002 ng, is giving strange Ct values, even though I repeated the qPCR three times in case I had mixed up the serial dilution tubes. The Ct value for the most concentrated DNA is higher than for the less concentrated ones! What could be the reason, given that the efficiency of the reaction is 90.2%? Below are the average Ct values I got.
I have measured the sugar content of some potato landraces. These data are not normally distributed, I think mainly because I have a few individuals with extreme phenotypes. I have read in association mapping articles that many authors use normalized phenotypic data for the association analysis. The rationale for using transformed data in these analyses is not clear to me. I would be very grateful to hear your opinion on this matter.
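One way to see how the two scales relate, assuming roughly 100% amplification efficiency (a doubling per cycle): 2^(-deltaCt) is relative expression on a linear scale, and its log2 is simply -deltaCt, so 40 - deltaCt is the same log2 quantity shifted by a constant so that values are positive and increase with expression. The Ct values below are invented.

```python
import math

def forty_minus_delta_ct(ct_target, ct_reference, cycles=40):
    """Shifted log2-scale expression value: cycles - (Ct_target - Ct_reference)."""
    return cycles - (ct_target - ct_reference)

def relative_expression(ct_target, ct_reference):
    """Linear-scale expression relative to the reference gene: 2^(-deltaCt)."""
    return 2.0 ** -(ct_target - ct_reference)

# Hypothetical Ct values: target 28, reference 20, so deltaCt = 8
shifted = forty_minus_delta_ct(28.0, 20.0)
linear = relative_expression(28.0, 20.0)
# The two scales differ only by the constant offset and a log2 transform
print(shifted, linear, 40 + math.log2(linear))
```

A difference of one unit on the 40 - deltaCt scale corresponds to a twofold difference on the 2^(-deltaCt) scale.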