Single Nucleotide Polymorphism - Science method
A single nucleotide variation in a genetic sequence that occurs at appreciable frequency in the population.
Questions related to Single Nucleotide Polymorphism
All the data conversions (SNP file and metadata file) have been done correctly, and the impute command also runs correctly. I tried both the "frequency" and "neighbour" commands, but all the missing data are simply replaced by "NA" and no imputed values are assigned. Can someone tell me what is going wrong?
Note: my species is an orphan tree species with no population sequencing data to use as a reference panel, although the genome of a single tree has been sequenced. I want to perform de novo (reference-free) imputation using the existing data.
I'm researching one of the lignin biosynthesis genes (Si4CL1) in 9 foxtail millet accessions. I amplified the gene using a standard PCR protocol and sequenced it by the Sanger method. I have aligned the sequences against the reference genomic DNA sequence and against the CDS. The alignments show several SNPs in both exons and introns, plus a deletion of about 300 bp in an intron. However, there are no differences among the 9 accessions. The differences are only relative to the reference sequence, so I'm not sure whether I can still call them SNPs. I'm curious whether there is alternative splicing, but I don't have the budget for mRNA analysis. I plan to rely on prediction alone, though I have no idea how to do that. Is such a prediction analysis possible?
The Si4CL1 gene is about 4,500 bp long. To design amplification primers, I divided the reference sequence into 7 fragments of about 700 bp each. The deletion lies in the 4th and 5th fragments. When I amplified those regions, I got a clear single band of about 500 bp, and repeated amplifications gave the same result. So I don't think the result was caused by a technical error.
Here's the link to the reference I used in my research:
PS:
I'm still a master's student, and my previous degree isn't related to molecular biology, so this is quite new to me. Please ask for clarification if my question is unclear.
I have a few sample VCFs that are not of very good quality. They are 23andMe files from openSNP in the following format:
rsID, chromosome number, position, genotype
I have tried remapping them using Galaxy, but I suspect the error is due to the format. The files contain only SNPs.
Any ideas? How can I make it work?
These files are mapped to NCBI Build 36 (hg18) and need to be remapped to GRCh38 (hg38).
I have a specific list of SNPs (with hg38 coordinates) in CSV format. I need to extract these SNPs from each file after remapping.
Please suggest any alternative workflows that could help me make this work.
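One possible workaround is to filter the raw 23andMe-style rows by rsID before liftover, or instead of it. rsIDs are generally stable across genome builds, while coordinates are not; merged or retired rsIDs are an exception worth checking. The sketch below assumes a tab-separated raw file with `#` comment lines and an `rsID` column in the CSV. The function names are hypothetical. Positions in the output are still hg18 until they are lifted over.

```python
import csv
import io

def read_wanted_rsids(csv_text, column="rsID"):
    """Collect rsIDs from a CSV with a header row (the column name is an assumption)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column].strip() for row in reader if row[column].strip()}

def filter_raw_genotypes(raw_lines, wanted):
    """Keep 23andMe-style rows (rsID, chromosome, position, genotype) whose rsID is wanted.

    Blank lines and lines starting with '#' (header/comments) are skipped.
    """
    kept = []
    for line in raw_lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        rsid, chrom, pos, genotype = line.split("\t")[:4]
        if rsid in wanted:
            kept.append((rsid, chrom, int(pos), genotype))
    return kept

# Hypothetical data standing in for the real CSV and raw genotype file
snp_list_csv = "rsID,note\nrs123,candidate\nrs456,candidate\n"
raw = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs123\t1\t1000\tAG",
    "rs789\t2\t2000\tCC",
    "rs456\t3\t3000\tTT",
]
print(filter_raw_genotypes(raw, read_wanted_rsids(snp_list_csv)))
```

The much smaller set of kept rows can then be lifted over, or re-annotated with hg38 positions looked up by rsID.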
I am currently conducting a Mendelian randomization study. I was trying to use PhenoScanner to look for potential confounders associated with the selected SNPs (any SNPs significantly associated with risk factors for my outcome). I tried both the PhenoScanner website and accessing the database via R, but to no avail.
Hi, I am designing primers to discriminate SNP alleles, using 2 forward primers, each specific for one allele. To improve SNP discrimination, what is an acceptable ΔG value for the primers?
Hi, I am working on quantification of SNPs in miRNA. I designed a universal reverse primer and two forward primers whose 3′ ends are specific for each allele. However, I ran an experiment on an allele A template using the forward primer specific for allele C, instead of the primer specific for allele A. The allele C primer still bound to the allele A sequence and was extended.
How can I make the two primers specific when the sequences they bind differ by only one nucleotide?
I understand that VCFtools can do this, as can tools like vcf-kit. I have also read some high-profile papers that report Tajima's D computed from VCF files. However, I am always wary of calculating Tajima's D from a VCF file. In my experience, Tajima's D is very sensitive to low-frequency polymorphisms. In a VCF file, some low-frequency variants (say, below 1%) are caused by errors in sequencing or SNP calling and may not be real. So in most VCF-based analyses we filter SNPs and discard those with an allele frequency below 5%. If we discard low-frequency SNPs, Tajima's D will certainly be affected. So I find myself in a dilemma. Any help is appreciated.
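The concern can be illustrated with a minimal Python sketch of Tajima's D. It uses the standard Tajima (1989) constants and computes D from per-site alternative-allele counts among n sampled sequences. The toy data are hypothetical and assume no missing genotypes. This is not a description of how VCFtools or vcf-kit compute D internally. The example shows that removing rare variants with a MAF-style filter can flip the sign of D.

```python
import math

def tajimas_d(alt_counts, n):
    """Tajima's D from per-site alt-allele counts among n sequences.

    Sites with k == 0 or k == n are monomorphic and are ignored.
    """
    seg = [k for k in alt_counts if 0 < k < n]
    s = len(seg)
    if s == 0:
        raise ValueError("no segregating sites")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # pi: average number of pairwise differences, summed over sites
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in seg)
    theta_w = s / a1  # Watterson's estimator
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))

n = 10                       # hypothetical: 10 sequences
counts = [1, 1, 1, 1, 5, 5]  # four singletons plus two intermediate-frequency sites
print(tajimas_d(counts, n))                                  # negative
print(tajimas_d([k for k in counts if min(k, n - k) > 1], n))  # rare sites removed: positive
```

Filtering removes exactly the class of sites that drives D negative, so D from a MAF-filtered VCF is not comparable to D from unfiltered data.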
I am having trouble identifying the exact position of the SNP.
Seeking efficient tools for detecting disease-causing SNPs in large genomic datasets: any recommendations for advanced computational tools or strategies? I have already used SIFT, CADD, MSC, AlphaMissense, etc., and I only have data from 7 patients and their parents. Thank you for your help!
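With seven patients and their parents, one common strategy that complements per-variant predictors is to filter by inheritance pattern within each trio. The sketch below flags de novo and homozygous-recessive candidates. It assumes biallelic sites and unphased VCF-style genotype strings, and the helper names are hypothetical. Compound heterozygosity would need a separate per-gene pass over several sites.

```python
def alt_dosage(gt):
    """Number of non-reference alleles in a VCF-style genotype such as '0/1' or '1|1'."""
    a, b = gt.replace("|", "/").split("/")
    return int(a != "0") + int(b != "0")

def classify_trio(child, mother, father):
    """Flag a candidate inheritance pattern for one site in one trio."""
    if any("." in gt for gt in (child, mother, father)):
        return None  # missing calls are not classified in this sketch
    c, m, f = (alt_dosage(gt) for gt in (child, mother, father))
    if c >= 1 and m == 0 and f == 0:
        return "de novo candidate"
    if c == 2 and m == 1 and f == 1:
        return "recessive (homozygous) candidate"
    return None

print(classify_trio("0/1", "0/0", "0/0"))
print(classify_trio("1/1", "0/1", "0/1"))
```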
I am getting significant marker-trait associations for my traits, but interpreting the allelic association is difficult; NN denotes a missing allele. Can anyone help me?
We have done in silico analysis of SNPs (pathogenicity analysis, stability analysis, etc.), which identified deleterious SNPs affecting protein structure. Now we want to select 2 to 4 SNPs for experimental validation in a particular population using ARMS-PCR. My question is: how do we select the SNPs?
What if there is no literature on the SNPs, but they are predicted to be deleterious?
How do we deal with such SNPs in the experiment?
Hi, I have a question about my PCR products. I optimized primer 1 using a temperature gradient from 55 to 60 °C and chose 56.6 °C as my temperature because it gave a clear band. However, when I amplified all 60 of my samples with primer 1 at 65.5 °C, they showed double bands. This is puzzling because the second band did not appear at all during optimization. Can someone help me identify the problem? I used the same steps, the same primer, and the same machine. I have attached my result. For reference, 1e is the band at 56.6 °C.
Call for collaboration and joint research:
We recently conducted studies on BRCA1, BRCA2, EGFR, and ESR1 mutations in cancer. Our computational approach identified potentially pathogenic variants in these genes that may increase the risk of cancer development.
However, with limited funds, we intend to use ARMS-PCR rather than sequencing for experimental validation. Notably, the predicted SNPs have not been reported in the literature as associated with cancer risk.
How do we select and validate these SNPs for further investigation?
Let's discuss strategies for bridging the gap between bioinformatics predictions and experimental validation in cancer research. #CancerResearch #BRCA1 #BRCA2 #ESR1 #EGFR #Bioinformatics #WetLabE
No restriction enzyme site is affected by these SNPs. Is it still possible to use RFLP?
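Before concluding that RFLP is impossible, it can help to scan the SNP's flanking sequence systematically. This sketch checks which enzymes cut one allele but not the other. The enzyme table is a small, illustrative subset of common enzymes with palindromic sites, so scanning one strand is enough. A real search would use a complete enzyme database, and the flanking sequences in the example are hypothetical. If no enzyme turns up, a common alternative is mismatch PCR-RFLP: a primer with a deliberate mismatch that creates a site next to the SNP.

```python
# Illustrative subset of enzymes and their (palindromic) recognition sequences
ENZYMES = {
    "AluI": "AGCT",
    "HaeIII": "GGCC",
    "MspI": "CCGG",
    "TaqI": "TCGA",
    "HhaI": "GCGC",
    "EcoRI": "GAATTC",
}

def count_sites(seq, site):
    """Count (possibly overlapping) occurrences of a recognition site."""
    return sum(1 for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site)

def allele_specific_enzymes(left, ref, alt, right):
    """Return {enzyme: (sites in ref allele, sites in alt allele)} where the counts differ."""
    ref_seq = (left + ref + right).upper()
    alt_seq = (left + alt + right).upper()
    return {
        name: (count_sites(ref_seq, site), count_sites(alt_seq, site))
        for name, site in ENZYMES.items()
        if count_sites(ref_seq, site) != count_sites(alt_seq, site)
    }

print(allele_specific_enzymes("TTCCA", "T", "G", "CTTCC"))  # {'AluI': (0, 1)}
```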
Which database should I use to extract phenotype data related to protein and yield traits and genotype data consisting of SNP markers from the selected sources?
Hello,
I received some GWAS results with columns for SNP, position, REF/ALT allele, p-value, and SE, which are straightforward. However, I am confused by the column titled 'Effect', whose values range from -1.4 to 1.3. Any help would be greatly appreciated; thank you in advance.
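In most GWAS output, the 'Effect' column is the estimated per-allele regression coefficient (beta) for the effect allele. That allele is often, but not always, the ALT allele, so the file's documentation should confirm both the allele and the scale. This sketch assumes a beta from a logistic model for a binary trait. It converts the beta to an odds ratio and computes a normal-approximation p-value, which can be checked against the reported one. For a quantitative trait, beta is instead the change in trait units per copy of the allele and should not be exponentiated.

```python
import math

def summarize_effect(beta, se):
    """Interpret a GWAS 'Effect' value, assuming it is a per-allele log odds ratio."""
    z = beta / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # normal approximation
    odds_ratio = math.exp(beta)
    ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
    return {"z": z, "p": p_two_sided, "OR": odds_ratio, "OR_95CI": ci}

# Hypothetical row: Effect = -0.2, SE = 0.05
print(summarize_effect(beta=-0.2, se=0.05))
```

A negative beta then means the effect allele is associated with lower odds (OR < 1), and a positive beta with higher odds.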
Can anyone tell me the names of databases or websites where I can download human SNP datasets together with quantitative traits (phenotypes) for genome-wide association studies (GWAS)?
I'm using data from the NovaSeq 6000 and HiSeq 4000. Assembled alone, the HiSeq data have little missing data and many reads per sample; the NovaSeq data have more missing data but are still usable. When I assemble them together, the HiSeq individuals have few or no SNPs. I've checked trimming for both datasets, and that does not appear to be the issue.
I'm assembling with ipyrad and have tried both de novo assembly and mapping to a reference.
I am doing research on the pharmacogenetics of type 2 diabetes treatment. The SNPs I have selected are close to each other on the same chromosome, and I want to carry out linkage disequilibrium and haplotype analysis. Can anyone suggest the easiest way or program to do this, and explain how the results are interpreted?
Hi,
Does my TaqMan probe detect the gene only if its sequence matches the probe 100%? Or will I also get a signal if there is a SNP or a difference of a few bp? And if a few mismatches are tolerated, are there any consequences, e.g., weaker binding?
Thanks
Aleks
How can I identify a specific variant (specific SNPs) after receiving Sanger sequencing results? That is, how can I detect a specific variant in the sequence?
Thanks :)
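A simple way to locate a known SNP in a Sanger read is to take a short flanking sequence on each side of it, for example from the dbSNP or Ensembl page for the rsID. You then search for those flanks in the read and in its reverse complement. The sketch assumes exact flank matches and a single base between the flanks, and the function names are hypothetical. Mixed peaks that the base caller reports as IUPAC codes (e.g. Y for C/T) are returned as-is.

```python
# Complement table including IUPAC ambiguity codes used for mixed (possibly heterozygous) calls
COMPLEMENT = str.maketrans("ACGTRYKMSWN", "TGCAYRMKSWN")

def revcomp(seq):
    """Reverse complement, preserving IUPAC ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def base_at_snp(read, left_flank, right_flank):
    """Return (strand, base) for the position between two flanks, or None if not found."""
    left_flank, right_flank = left_flank.upper(), right_flank.upper()
    for strand, seq in (("+", read.upper()), ("-", revcomp(read))):
        start = seq.find(left_flank)
        if start == -1:
            continue
        pos = start + len(left_flank)
        if pos < len(seq) and seq[pos + 1:pos + 1 + len(right_flank)] == right_flank:
            return strand, seq[pos]
    return None

read = "AAAGGCTTACGTCAGTT"  # hypothetical Sanger read
print(base_at_snp(read, "GGCTTA", "GTCAG"))  # ('+', 'C')
```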
Hi,
I have made my LD plot in Haploview using the Linkage format (.ped and .info). How can I display the gene structure at the top of my LD plot?
I need something like this:
I am trying to predict the effect of different SNPs on protein stability. I tried DUET, PredictSNP, and DynaMut. DUET gives fast results but cannot handle double mutations, while PredictSNP and DynaMut take a long time to generate results in my case.
Please suggest other stability prediction tools that are both accurate and convenient.
I want to assess multiple SNPs across multiple sample treatment groups, and I am not sure which lab technique is best for that.
Hi, we recently used NCBI's dbSNP database to look up two variants in the ST18 gene, rs2304365 and rs17315309. According to dbSNP, for rs2304365 the wild-type allele is C and the variants are C or T. According to the publications, however, the risk allele is A, not C or T (Etesami, I., Seirafi, H., Ghandi, N., Salmani, H., Arabpour, M., Nasrollahzadeh, A., ... & Keramatipour, M. (2018). The association between ST18 gene polymorphism and severe pemphigus disease among Iranian population. Experimental Dermatology, 27(12), 1395-1398.)
Similarly, for rs17315309 the wild-type allele is A and the variants are G or T. The article cited in the database (Vodo, D., Sarig, O., Geller, S., Ben-Asher, E., Olender, T., Bochner, R., ... & Sprecher, E. (2016). Identification of a functional risk variant for pemphigus vulgaris in the ST18 gene. PLoS Genetics, 12(5), e1006008.) gives the risk allele as C. Most articles, including those cited in dbSNP, report alleles that differ from the dbSNP record, and I don't understand why.
I really want to understand this.
Thanks :)
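One frequent source of such discrepancies is that dbSNP and a publication report alleles on opposite DNA strands, for example the genomic forward strand versus the gene's transcribed strand. The sketch below tests whether the alleles quoted above become consistent after complementing. It is a simple consistency check, not a definitive explanation. A/T and C/G SNPs stay ambiguous under this check and need the flanking sequence to resolve.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def strand_explanation(db_alleles, reported_allele):
    """Relate a reported allele to the alleles listed in dbSNP.

    Returns 'same strand' if the allele is listed as-is, 'opposite strand'
    if only its complement is listed, otherwise 'unexplained'.
    """
    db = {a.upper() for a in db_alleles}
    allele = reported_allele.upper()
    if allele in db:
        return "same strand"
    if COMPLEMENT[allele] in db:
        return "opposite strand"
    return "unexplained"

# Alleles as quoted in the question
print(strand_explanation({"C", "T"}, "A"))       # A pairs with T
print(strand_explanation({"A", "G", "T"}, "C"))  # C pairs with G
```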
I need to calculate the number of different alleles and the number of effective alleles for a diversity study using a large SNP dataset. Normally I use GenAlEx, but I can't load the data because of its limit on the number of columns. Please advise.
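The column limit can be sidestepped by computing these statistics directly. This sketch assumes biallelic SNPs coded as alternative-allele dosages (0, 1, 2), which is an assumed input format. The number of different alleles (Na) is the count of alleles with non-zero frequency. The effective number of alleles is Ne = 1 / Σ p_i², the formula GenAlEx documents.

```python
def allele_diversity(genotypes):
    """Na and Ne per biallelic SNP.

    `genotypes`: list of per-locus lists of alt-allele dosages (0, 1, 2),
    with None for missing calls. Loci with no called genotypes are skipped.
    """
    results = []
    for locus in genotypes:
        calls = [g for g in locus if g is not None]
        if not calls:
            continue
        n_alleles = 2 * len(calls)
        alt = sum(calls)
        freqs = [f for f in ((n_alleles - alt) / n_alleles, alt / n_alleles) if f > 0]
        results.append((len(freqs), 1.0 / sum(f * f for f in freqs)))
    return results

data = [
    [0, 1, 2, 1],     # p = 0.5 -> Na = 2, Ne = 2.0
    [0, 0, 0, None],  # monomorphic -> Na = 1, Ne = 1.0
]
per_locus = allele_diversity(data)
mean_na = sum(na for na, _ in per_locus) / len(per_locus)
mean_ne = sum(ne for _, ne in per_locus) / len(per_locus)
print(per_locus, mean_na, mean_ne)
```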
I keep getting the same answer when I google this question (namely, that the difference is the variant's frequency in the population).
Is there any difference between a non-synonymous SNP and a mutation?
And between a synonymous SNP and a silent mutation?
What is the difference in their effects on protein function?
I'd appreciate a reference on the topic.
Thanks.
Dear All
I have some significant markers from different articles, and I want to convert their genetic positions (cM) to physical positions in base pairs (bp).
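If a map exists whose markers have both genetic (cM) and physical (bp) positions, one rough approach is linear interpolation between the flanking anchor markers. This assumes recombination is uniform between anchors, which often does not hold (e.g. near centromeres), so the result is only an approximation. The anchors below are hypothetical. If the marker sequences are available, aligning them to the reference genome gives exact positions and is preferable.

```python
from bisect import bisect_left

def cm_to_bp(cm, anchors):
    """Linearly interpolate a physical position (bp) from a genetic position (cM).

    `anchors`: (cM, bp) pairs on one chromosome, sorted by strictly increasing cM.
    Positions outside the anchored range return None instead of extrapolating.
    """
    cms = [c for c, _ in anchors]
    if cm < cms[0] or cm > cms[-1]:
        return None
    i = bisect_left(cms, cm)
    if cms[i] == cm:
        return anchors[i][1]
    (c0, b0), (c1, b1) = anchors[i - 1], anchors[i]
    return round(b0 + (cm - c0) / (c1 - c0) * (b1 - b0))

# Hypothetical anchor markers (cM, bp) for one chromosome
anchors = [(0.0, 0), (10.0, 1_000_000), (20.0, 3_000_000)]
print(cm_to_bp(5.0, anchors))   # 500000
print(cm_to_bp(15.0, anchors))  # 2000000
```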
Regarding the interpretation of sequence variants using the ACMG rules:
Are ACMG criteria PS3 and PP3 mutually exclusive?
In other words, if both PS3 and PP3 are fulfilled, can PP3 still be applied?
PS3 - Well-established in vitro or in vivo functional studies supportive of a damaging effect on the gene or gene product.
PP3 - Multiple lines of computational evidence support a deleterious effect on the gene or gene product (conservation, evolutionary, splicing impact, etc.).
P.S. In my view, they are not mutually exclusive.
When comparing SNP densities for intronic and exonic PAS, how did you account for the fact that exonic PAS lie within coding regions? Such regions are under selection to preserve the amino acid sequence. Did you consider only SNPs that do not alter the amino acid sequence?
I want to know which approach is recommended for specific SNPs (single nucleotide polymorphisms).
For those with experience: which is more recommended, GWAS or a candidate gene approach, for specific SNPs in specific genes related to a specific rare disease?
And for small, genetically homogeneous samples, what are the best methods and how do they differ?
Hi everyone,
I am working with SNP data and want to do logistic regression analysis.
In multinomial logistic regression, is it compulsory to choose the most common genotype as the reference, or can I choose any genotype?
For one of my SNPs (genotypes II, ID, DD), when I choose the most common genotype, II, as the reference, the odds ratio comes out as 0.57. When I choose ID as the reference, the odds ratio changes to 1.67 (p < 0.05). Is it fine to choose the heterozygous genotype as the reference?
Thanks,
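Any genotype can serve as the reference. The choice changes which comparison each odds ratio describes, not the underlying data. The most common genotype is usually chosen by convention because it tends to give the most stable estimates. In a model that differs only in its reference category, reversing a comparison gives exactly the reciprocal OR, as this sketch with hypothetical counts shows. If two reported ORs are not reciprocal, they likely describe different genotype pairs (e.g. DD versus II and DD versus ID).

```python
def odds_ratio(case_x, control_x, case_ref, control_ref):
    """Unadjusted odds ratio of genotype x relative to a reference genotype."""
    return (case_x / control_x) / (case_ref / control_ref)

# Hypothetical (cases, controls) counts per genotype
counts = {"II": (120, 100), "ID": (60, 90), "DD": (20, 10)}

def ors_against(reference):
    """Odds ratios of every other genotype against the chosen reference."""
    ref_case, ref_control = counts[reference]
    return {
        g: odds_ratio(case, control, ref_case, ref_control)
        for g, (case, control) in counts.items()
        if g != reference
    }

print(ors_against("II"))  # ID vs II is about 0.56
print(ors_against("ID"))  # II vs ID is about 1.8, i.e. 1 / 0.56
```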
I performed an analysis of a specific gene/region of interest using GWAS summary statistics for a binary phenotype (disease +/-). Using the corrected p-values in the summary statistics and mapping in FUMA, I found a significant missense (coding) SNP with p = 0.025. Other papers have shown that this SNP leads to a less functional protein. This SNP was not in LD with the other significant SNPs, which are intronic/non-coding. Could it be an independent signal? Beta is 0.002 and MAF is 0.005. Is it possible that this SNP is a false positive because of its rarity? The total sample size is n = 400k, with 1,100 cases. Any help would be appreciated!
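A quick back-of-the-envelope check shows why rarity matters here. With MAF = 0.005 and 1,100 cases, only about 11 copies of the minor allele are expected among cases. A handful of genotyping or imputation errors could therefore move the test statistic substantially. The sketch assumes Hardy-Weinberg proportions and that cases share the overall allele frequency, as expected under the null.

```python
def expected_minor_allele_counts(n_cases, maf):
    """Expected minor-allele copies and carriers among cases under HWE.

    Assumes the cases have the same allele frequency as the whole sample.
    """
    allele_copies = 2 * n_cases * maf
    carriers = n_cases * (1 - (1 - maf) ** 2)  # heterozygotes plus minor homozygotes
    return allele_copies, carriers

print(expected_minor_allele_counts(1100, 0.005))  # about 11 copies and about 11 carriers
```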
I am planning to process Illumina Omni5 SNP data. However, I am currently using a MacBook Pro provided by my new institute. Does anyone have suggestions for Mac-compatible software to replace the functions of Illumina's GenomeStudio? I need to analyze the raw data from the Omni5 SNP platform. Thanks a lot.
I need the CEL1 enzyme for my laboratory work to complete my PhD thesis. Unfortunately, I couldn't find it anywhere. I would welcome the names of companies where it is available, or anyone who could sell it to me.
Thank you.
What does the letter "c" mean in c.G421T?
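In HGVS nomenclature, the prefix "c." means the position refers to a coding DNA reference sequence, where c.1 is the A of the ATG start codon. "g." (genomic) and "p." (protein) are the corresponding prefixes for other reference types. In current HGVS syntax this variant would be written c.421G>T. The small sketch below parses the legacy form and works out the affected codon. It handles exonic substitutions only; intronic positions such as c.421+5 are not handled, and the function name is hypothetical.

```python
import re

def parse_legacy_cdna(variant):
    """Parse a legacy-style coding DNA substitution such as 'c.G421T'."""
    m = re.fullmatch(r"c\.([ACGT])(\d+)([ACGT])", variant)
    if m is None:
        raise ValueError(f"unrecognised format: {variant}")
    ref, pos, alt = m.group(1), int(m.group(2)), m.group(3)
    return {
        "position": pos,
        "ref": ref,
        "alt": alt,
        "hgvs": f"c.{pos}{ref}>{alt}",       # current HGVS spelling
        "codon": (pos - 1) // 3 + 1,         # codon number counted from the ATG
        "codon_position": (pos - 1) % 3 + 1, # 1st, 2nd or 3rd base of that codon
    }

print(parse_legacy_cdna("c.G421T"))
```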
As above, I'm wondering what the justification is for removing monomorphic SNP loci in a genetic diversity analysis.
Using a genome-wide association study, I am analysing SNP data for a wide-ranging animal species from multiple regions and want to be able to compare diversity between regions.
Screening for monomorphic SNPs removes up to 20% of SNPs in some regions and <5% in others. Is it reasonable to compare diversity between regions with monomorphic loci excluded?
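The concern can be made concrete. Statistics such as expected heterozygosity are averages over loci, so dropping monomorphic loci separately per region changes the denominator by different amounts in each region. In the hypothetical allele frequencies below, the ranking of two regions flips depending on whether monomorphic loci are removed per region. Defining one common locus set across all regions, and removing only loci that are monomorphic everywhere, avoids this.

```python
def mean_expected_het(alt_freqs):
    """Mean expected heterozygosity (He = 2pq) across biallelic loci."""
    return sum(2 * p * (1 - p) for p in alt_freqs) / len(alt_freqs)

# Hypothetical alt-allele frequencies at the same 10 loci in two regions
region_a = [0.5, 0.3, 0.2, 0.4, 0.1, 0.0, 0.0, 0.5, 0.3, 0.2]    # 2 monomorphic loci
region_b = [0.5, 0.3, 0.2, 0.4, 0.1, 0.25, 0.15, 0.5, 0.3, 0.2]  # none monomorphic

for name, freqs in (("A", region_a), ("B", region_b)):
    polymorphic = [p for p in freqs if 0 < p < 1]
    print(name, "all loci:", mean_expected_het(freqs),
          "polymorphic only:", mean_expected_het(polymorphic))
```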
I have been using a green PCR master mix from Promega (Green Taq), but this master mix does not work for my recent work on a particular gene. I borrowed Phusion DNA polymerase (high-fidelity, Cat. no. F-530XL) from a neighbouring lab for two samples, and the amplification did work. My aim is to amplify the gene's promoter region (PCR product size: 269 bp) and study a single nucleotide polymorphism in that region using RFLP. For this purpose, I have introduced a restriction site (AluI) near the 3′ end of my reverse primer through a single-base-pair mismatch. While checking the details of this Phusion polymerase, I found that it has 3′→5′ exonuclease activity and is known for its accuracy. I am therefore worried that the polymerase might remove the mismatched base during proofreading. If this happens, my SNP study will give false results. I have never used a high-fidelity polymerase before; I used Taq polymerase, which lacks 3′→5′ exonuclease activity. So I am unsure which polymerase to buy. The Thermo catalogue lists normal Taq polymerases, DreamTaq, and many other options, but I don't know their properties or which one to choose. At the moment I have no access to more Phusion DNA polymerase or any other Taq polymerase. Amplifying this particular gene has been complicated from the beginning. I have 300+ patient DNA samples to screen for SNPs, so I am also looking for an economical option. Kindly advise me on my options and which polymerase I should choose.
We have a myostatin SNP, g.6215942T>C, but cannot find any references about it. We need some help with this. Thanks.
Dear Folks,
To do a functional analysis of a specific gene in SIFT, we need the gene's FASTA sequence and the amino acid changes at specific positions. In our case, we only have the dbSNP IDs. How can we retrieve the amino acid changes using a dbSNP ID?
I only have the rsID of the SNP of interest, but I need to determine what proportion of my population of interest carries it.
Dear experts,
We have analyzed FOXP3 gene mutations in 10 healthy volunteers and 13 diseased samples. Of these, 3 healthy volunteers (30%) and 8 patients (61.54%) were found to have mutations at specific SNPs. We would now like to perform a statistical analysis of these results. Could you kindly guide us on how to conduct it and, if possible, suggest a software package for this purpose?
Thank you for your assistance
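With counts this small, a 2×2 comparison of mutation carriers (healthy 3/10 vs patients 8/13) is usually analysed with Fisher's exact test. R (`fisher.test`) and SciPy (`scipy.stats.fisher_exact`) both provide it. Below is a dependency-free sketch of the two-sided test, offered as an illustration rather than a replacement for those implementations.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table. Integer numerators are
    compared so that ties are handled exactly.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(x):  # numerator of P(top-left cell = x) given the margins
        return comb(col1, x) * comb(n - col1, row1 - x)

    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    observed = weight(a)
    return sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed) / comb(n, row1)

# Rows: healthy, patients; columns: mutation present, absent (counts from the question)
p = fisher_exact_two_sided(3, 7, 8, 5)
print(p)
```

For these counts the two-sided p-value is roughly 0.21. With this sample size, the difference would therefore not be significant at the 5% level.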
Dear Colleagues,
I am excited to reach out to experts in the field of bioinformatics to invite you to collaborate on an ongoing research project focused on validating and confirming the significant association of various gene polymorphisms with breast cancer risk within the Pashtun ethnicity. Our recent studies have successfully confirmed the association of the following gene polymorphisms with breast cancer risk: BRCA1 (rs1799950), BRCA2 (rs144848), TP53 (rs1042522), OPG (rs2073618, rs3102735), RANKL (rs9533156), ESR1 (rs2234693 and rs2046210), HER1 (rs11543848), and HER2 (rs1136201). These findings have been published in reputed journals, underscoring their significance.
To further strengthen the validity of our results and expand our knowledge on ethnic-specific polymorphisms, we seek collaboration with experts in bioinformatics who can contribute their expertise to our research. The proposed collaboration will involve the utilization of various bioinformatics tools and databases to enhance our understanding of the identified gene polymorphisms. Specifically, we plan to utilize the following tools:
- ENSEMBL, dbSNP, or NCBI databases: These databases will serve as invaluable resources to retrieve detailed information about the identified SNPs. We will explore their genomic locations, functional annotations, potential disease associations, and other relevant information to deepen our understanding of their implications in breast cancer risk.
- Population-specific variation databases: By leveraging population-specific variation databases, we aim to assess the frequency and distribution of the identified SNPs within different populations, including Pashtun and other ethnic groups. This analysis will enable us to evaluate the potential presence of ethnic-specific polymorphisms associated with breast cancer risk.
- Gene expression datasets, pathway databases, and functional annotation tools: Integrating gene expression datasets, pathway databases, and functional annotation tools will allow us to uncover the functional implications of the identified SNPs. By examining their potential involvement in breast cancer development and related pathways, we can gain insights into the underlying mechanisms and further refine our understanding.
We believe that collaborating with experts like you will significantly enhance the effectiveness and robustness of our studies. We welcome your expertise and recommendations for additional bioinformatics tools that can further enrich our research and facilitate the exploration of ethnic-specific polymorphisms in breast cancer risk.
By joining forces on this project, we can collectively advance our understanding of the genetic factors contributing to breast cancer risk within the Pashtun ethnicity and potentially identify other ethnic-specific polymorphisms. Moreover, our research outcomes will have broader implications for personalized medicine, risk assessment, and tailored interventions in breast cancer management.
If you are interested in collaborating on this research endeavor or have suggestions for other bioinformatics tools that could strengthen our studies, please do not hesitate to reach out to us. Together, we can make significant strides in breast cancer research and contribute to the development of more effective strategies for risk assessment, early detection, and management.
We eagerly anticipate the opportunity to collaborate with you and drive forward our collective understanding of breast cancer genetics.
Sincerely,
I have a list of SNPs in a rice gene, and I have two questions about them:
1. I want to make a figure showing where the haplotype's SNPs lie in the genomic region (UTR, exon, intron). How can I do this?
2. How can I replace the original sequence nucleotides with the haplotype's SNPs without doing it manually?
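For question 2, the substitution can be scripted once SNP positions are expressed relative to the start of the extracted sequence. Genomic coordinates would first need to be shifted by the region's start position; how the positions are stored is an assumption here. Only substitutions are handled, because indels would shift downstream coordinates. The sequence, SNPs, and function name are hypothetical.

```python
def apply_haplotype(reference, snps):
    """Return a copy of `reference` with haplotype SNPs substituted.

    `snps` maps 1-based positions (relative to the start of `reference`) to
    (ref_base, alt_base). Each ref_base is checked against the sequence so
    that coordinate mistakes raise an error instead of being applied silently.
    """
    seq = list(reference.upper())
    for pos, (ref, alt) in sorted(snps.items()):
        if seq[pos - 1] != ref.upper():
            raise ValueError(f"position {pos}: expected {ref}, found {seq[pos - 1]}")
        seq[pos - 1] = alt.upper()
    return "".join(seq)

gene_region = "ATGGCTTACGGA"                 # hypothetical reference fragment
haplotype = {4: ("G", "A"), 10: ("G", "T")}  # hypothetical haplotype SNPs
print(apply_haplotype(gene_region, haplotype))  # ATGACTTACTGA
```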
The 40 samples show only the heterozygous genotype CA (175 bp and 202 bp) and the homozygous genotype CC (202 bp), never the homozygous AA genotype (175 bp). Attached is a snippet of some samples showing the CA and CC genotypes. The outer primers produce a 326 bp band.
2.5% agarose gel electrophoresis; longitudinal; long run.
Please guide me in finding the SNPs in a specific gene region, for example a specific region of the CYP2D6 gene. How can I find them, and which approach is best? I tried NCBI's SNP database but couldn't find them.
Thank you,
I'm looking for Durango Diversity (Phaseolus vulgaris) SNP sequencing data. Can anyone briefly explain where to find it?
I have developed a multiplex PCR panel for next-generation sequencing library preparation. The panel is used to diagnose a particular bacterial infection and to detect some SNPs I'm interested in.
The panel works well with sputum samples, but it failed to detect some expected SNPs when we tested FFPE samples. The bacterial copy number might be lower (120) than my detection limit (250). We still obtained at least 5000× coverage at most SNP locations. However, only about half of the SNPs were called. Why not the others?
Hello everyone,
I calculated some pairwise FST estimates on my SNP dataset with hierfstat in R (pairwise.fstWC function). I then calculated confidence intervals (not p-values) for those estimates with the same package (function boot.ppfst). Do I need to correct my CIs for multiple testing?
As far as I know, I am estimating parameters rather than testing hypotheses (p-values), so I wouldn't need to correct for false positives. Am I correct? Can anyone explain this to me?
Thanks,
Giulia
I am working on cattle genomics and am looking for a validated set of cattle SNPs to use in the VQSR approach in GATK. Is there a validated cattle SNP resource? If not, should I use separate VCFs for SNPs and indels, or a combined file including both, for hard filtering?
I would like to know if you could help me with a theoretical problem. I know the proportions of European, African, and Amerindian ancestry in the population of Rio de Janeiro. I am studying the frequency of certain SNPs in a sample of leukemia patients. Ideally, I would have the frequencies of the wild-type and mutant alleles in the population of Rio de Janeiro, but this information is not available. I do have the frequencies of these SNPs in European, African, and Amerindian populations from the 1000 Genomes Project. Would it be sufficient to calculate a weighted average? Do you know of any books or articles that discuss this problem? I found many on the opposite scenario: determining the source populations of an admixed population. Could you provide some guidance?
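Under the simple admixture model, the expected allele frequency in the admixed population is the ancestry-weighted average p = Σ w_k p_k. The model assumes no drift, selection, or ancestry-assortative mating since admixture, which real populations only approximate. A minimal sketch with hypothetical inputs:

```python
def admixed_allele_frequency(ancestry_props, source_freqs):
    """Expected allele frequency in an admixed population as a weighted average.

    Assumes the ancestry proportions sum to 1 and that source-population
    allele frequencies are representative of the actual ancestral groups.
    """
    total = sum(ancestry_props.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("ancestry proportions must sum to 1")
    return sum(ancestry_props[pop] * source_freqs[pop] for pop in ancestry_props)

# Hypothetical numbers, not actual Rio de Janeiro or 1000 Genomes values
props = {"European": 0.6, "African": 0.3, "Amerindian": 0.1}
freqs = {"European": 0.10, "African": 0.40, "Amerindian": 0.20}
print(admixed_allele_frequency(props, freqs))  # about 0.20
```

One caution about the inputs: the 1000 Genomes AMR super-population consists of admixed American samples, not unadmixed Amerindian ones. Its frequencies therefore already mix ancestries.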
Dear All
I need a reference (a published study) that reports the minimum number of markers required for GWAS.
Recently, I have read many GWAS papers published in high-profile international journals. The number of markers used varies widely, from 200 to as many as 1,000,000 SNPs.
I have conducted a GWAS with GAPIT using almost 400k SNPs and nearly 500 accessions, but I can't locate the R2 values in the output. Can anyone help me resolve this, since GAPIT keeps changing? The last time I used it, in October 2022, it was not like this. Thank you in advance.
Hi,
I see that some SNPs are written as rs + number and others as rsl + number. What is the difference between these two kinds of ID?
Hi,
Other than randomForest, how do you analyze SNP genotyping data in a GWAS of categorical phenotypes (say, the host species of a pathogen)?
Any pointers would be great!
-Marcin
I have aligned Sanger sequencing data containing SNPs for about 80 specimens. How can I reconstruct the most likely haplotypes from them?
How can I download SNPs from NCBI after the latest update to the site?
Hello
I am reading papers about SNPs and came across this sentence: "a person with A/G and G/G type mutations in the skin of a healthy person".
I think A/G means that A turns into G. So what does G/G mean? I cannot understand it!
I'm a field biologist working on social evolution. I want to switch from microsatellites to SNPs to determine the kinship and sex of individuals (I'm working on a corvid bird). Does anybody know of commercial companies in Europe or North America that can do DArTseq for the SNP discovery step? I also need subsequent SNP typing (e.g., with ddRAD methods or similar). And what would the first and second steps cost for 3000 individuals?
Thanks a lot!
Hello,
I have aligned human TNF-A DNA sequences in FASTA format, and I want to import them into Haploview to show the linkage disequilibrium between SNPs.
Can someone explain how to convert my FASTA alignments into a format that Haploview accepts?
Thank you in advance.
I am trying to use ANGSD (Korneliussen et al. 2014) to calculate population allele frequencies from PL values (Phred-scaled genotype likelihoods). However, ANGSD expects a Beagle file containing genotype probabilities, not genotype likelihoods, so I was not able to get the allele frequencies. Is there a way to get genotype probabilities from the PL values in my VCF file or from the genotype likelihoods in my Beagle file?
Thank you in advance!
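PL values convert straightforwardly. Each PL is -10·log10 of a genotype likelihood, scaled so that the most likely genotype has PL = 0, so the likelihood is 10^(-PL/10). Normalising the three likelihoods gives genotype probabilities under a flat prior. Multiplying by Hardy-Weinberg prior weights first gives posteriors under an assumed allele frequency. Below is a minimal sketch for biallelic sites; the function name is hypothetical. Whether the resulting file matches ANGSD's expected Beagle input should be checked against the ANGSD documentation.

```python
def pl_to_probs(pl, alt_freq=None):
    """Convert Phred-scaled genotype likelihoods to genotype probabilities.

    pl: [PL(0/0), PL(0/1), PL(1/1)] for a biallelic site.
    alt_freq=None uses a flat prior (normalised likelihoods); otherwise a
    Hardy-Weinberg prior built from the given alt-allele frequency is used.
    """
    likelihoods = [10 ** (-x / 10) for x in pl]
    if alt_freq is None:
        prior = [1.0, 1.0, 1.0]
    else:
        q = alt_freq
        prior = [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]
    weighted = [lik * pr for lik, pr in zip(likelihoods, prior)]
    total = sum(weighted)
    return [w / total for w in weighted]

print(pl_to_probs([0, 30, 300]))
print(pl_to_probs([20, 0, 20], alt_freq=0.1))
```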
Hi!
I'm looking for a database where I can find SNPs in a set of genes in the context of NSCLC. I've tried the Haploview software, but I suspect its servers have been down for about 10 years. The software doesn't work because it can't connect to the HapMap database.
I would be very grateful if anyone could help me with my thesis project.
Hi there,
I have a PCR sequence covering a SNP in a particular gene, and I want to submit it to NCBI. How do I submit the sequence online to get an accession number?
Is there anyone who can help?
Regards
Dear community,
I am planning to conduct a GWAS comparing two groups of patients that differ in a binary characteristic. Because this cohort is naturally very rare, our sample size is limited to approximately 1500 participants in total, which is low for GWAS. We are therefore considering studying associations in pre-selected genes that might be phenotypically relevant to our outcome. No previous data or arrays have studied similar outcomes in a different patient cohort, so we need to identify regions of interest bioinformatically.
1) Do you know of any tools that could help me harvest genetic information on known pathways involved in relevant cell functions (e.g., overall thrombocyte function, endothelial cell function, immune function)? Ideally they would let me reduce the number of SNPs while still preserving the exploratory character of the study design.
2) Alternatively, are there bioinformatic approaches (AI, etc.) that circumvent the multiple-testing problem in GWAS? Such an approach would need to let me robustly explore my dataset for associations at lower sample sizes (n < 1500 participants).
Thank you very much in advance!
Kind regards,
Michael Eigenschink
I analyzed two SNPs for linkage disequilibrium using SNPStats, which gave D' = 0.9995 and r = 0.99. However, when I analyze my SNP data manually, the two SNPs are always either both present or both absent in all of my study subjects. This suggests that they are completely linked. Can I say these two SNPs are in perfect linkage disequilibrium even though D' and r are very slightly less than 1? Do we ever get D' and r values equal to 1 in practice? Do these values get closer to 1 for SNPs in perfect linkage disequilibrium as the sample size increases?
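Computed directly from haplotype counts, two SNPs that only ever occur as two haplotypes give D' = 1 and r = 1, up to floating-point rounding. Values such as 0.9995 may instead reflect how the software estimates haplotype frequencies from unphased genotypes, for example iteratively. That is a plausible explanation, not a confirmed description of SNPStats. The sketch also shows that D' = 1 only requires one of the four haplotypes to be absent, whereas r = 1 requires two to be absent. The haplotype counts are hypothetical, and both SNPs are assumed to be polymorphic.

```python
import math

def ld_stats(hap_counts):
    """D' and r for two biallelic SNPs from phased haplotype counts.

    hap_counts maps haplotypes 'AB', 'Ab', 'aB', 'ab' to counts.
    """
    n = sum(hap_counts.values())
    f = {h: hap_counts.get(h, 0) / n for h in ("AB", "Ab", "aB", "ab")}
    p_a = f["AB"] + f["Ab"]  # frequency of allele A at SNP 1
    p_b = f["AB"] + f["aB"]  # frequency of allele B at SNP 2
    d = f["AB"] - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max
    r = d / math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r

print(ld_stats({"AB": 60, "ab": 40}))           # both approximately 1
print(ld_stats({"AB": 60, "ab": 39, "Ab": 1}))  # D' approximately 1, r about 0.98
```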
What is the meaning of the asterisk (*) in CYP3A4*22, given that this SNP has the ID rs35599367?
Can someone please explain the logic behind identifying genes within 50 kb, 100 kb, and 500 kb (on both sides) of a SNP locus? How could the SNP affect the function of genes within these windows?
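The windows are a heuristic. An associated SNP is often not causal itself; through LD it may tag a causal variant or a regulatory element (e.g. an enhancer) that can act on genes tens to hundreds of kilobases away. Genes within a fixed distance are therefore treated as candidates, and the window size is a choice rather than a biological rule. Mechanically, the lookup is a simple interval-overlap test. It is sketched here with hypothetical coordinates; real ones would come from a gene annotation on the same genome build as the SNP.

```python
def genes_in_windows(snp_pos, genes, windows=(50_000, 100_000, 500_000)):
    """List genes overlapping [snp_pos - w, snp_pos + w] for each window size w.

    genes: (name, start, end) tuples on the SNP's chromosome, in the same
    build and coordinate convention as snp_pos.
    """
    hits = {}
    for w in windows:
        lo, hi = snp_pos - w, snp_pos + w
        hits[w] = [name for name, start, end in genes if start <= hi and end >= lo]
    return hits

# Hypothetical gene coordinates
genes = [("GENE1", 1_020_000, 1_030_000),
         ("GENE2", 1_090_000, 1_120_000),
         ("GENE3", 1_400_000, 1_450_000)]
print(genes_in_windows(1_000_000, genes))
```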
Hi,
Has anyone analyzed sequencing data for SNPs in specific genes?
Regards
Hello!
I am trying to find SNPs in linkage disequilibrium with my target SNP using LDproxy (https://ldlink.nci.nih.gov/?tab=ldproxy), but I get very different results with different genome builds (GRCh37/hg19 and GRCh38/hg38). In particular, one interesting LD SNP found with GRCh37 completely disappeared with GRCh38, although the SNP can still be found in Ensembl on GRCh38.
I do not think the query region was removed or changed between GRCh37 and GRCh38. I would really appreciate any explanation for the large differences between the two builds' results.
Thank you and have a great day! :)
Hi all,
I have a fairly large set of SNPs from several geographically distinct populations. For each SNP, I want to estimate average allele frequencies among, say, the two or three westernmost and the two or three easternmost populations.
As I understand it, this is a fairly trivial task for many pipelines, but I have no idea how to do it correctly.
Can I do it manually, by taking the AF value of each allele in each population from the VCF and then calculating the arithmetic mean across the selected populations? Or should this be done with a utility that uses more complex mathematics?
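The arithmetic mean of per-population AF values is a reasonable estimate if each population should count equally. If the goal is the allele frequency of the pooled western (or eastern) sample, weighting by the number of called alleles is more appropriate. The two differ when sample sizes are unequal, as this sketch with hypothetical values shows. To account for missing genotypes, use the number of called samples at each SNP rather than the nominal sample size.

```python
def pooled_alt_freq(pops):
    """Average allele frequency across populations.

    pops: list of (alt_freq, n_called_samples) per population, diploid.
    Returns (unweighted mean of population frequencies,
             allele-count-weighted pooled frequency).
    """
    unweighted = sum(f for f, _ in pops) / len(pops)
    total_alleles = sum(2 * n for _, n in pops)
    pooled = sum(f * 2 * n for f, n in pops) / total_alleles
    return unweighted, pooled

west = [(0.10, 50), (0.30, 10), (0.20, 40)]  # hypothetical (AF, called samples)
print(pooled_alt_freq(west))  # roughly (0.20, 0.16)
```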
I've been asked to help with a project genotyping the gene that codes for a specific enzyme in patient samples. The recent literature seems to mention only qPCR with specific probes for detecting the SNPs of interest. Because of the project's constraints, we can't do qPCR at the moment, only conventional PCR. We also have access to sequencing, so I wanted to know whether that is a feasible option, at least until we are able to perform qPCR.
I don't know much about how sequencing is currently done beyond undergraduate-level lessons. More specifically, I want to sequence a PCR-amplified fragment encompassing the SNP location. Could commercial sequencing pick up heterozygosity at the SNP as two smaller, roughly equal-sized peaks at that position instead of a single peak? And if so, how certain could I be that it is actually a SNP and not just an artifact (assuming it is at the expected location)?
Thanks in advance!