Phylogenetic Analysis - Science method
Phylogenetic Analysis is a forum for exchanging knowledge in the field of molecular systematics, phylogenetic reconstruction, and their application to systematics, biogeography, and evolutionary studies.
Questions related to Phylogenetic Analysis
I've run it three times to try different parameters, but it stops changing after running for a short while. Please help me understand how to fix this. Is there a problem with the sequence or the parameters?
6471000 -- (-192324.911) [-190903.882] (-191836.670) (-203192.846) * (-192291.336) (-195396.079) (-197058.880) [-190667.824] -- 36:24:32
6472000 -- (-192324.911) [-190903.882] (-191836.670) (-203199.097) * (-192291.336) (-195396.079) (-197058.880) [-190667.824] -- 36:23:56
6473000 -- (-192324.911) [-190903.882] (-191836.670) (-203206.485) * (-192291.336) (-195396.079) (-197058.880) [-190667.824] -- 36:23:22
6474000 -- (-192324.911) [-190903.882] (-191836.670) (-203186.976) * (-192291.336) (-195396.079) (-197058.880) [-190667.824] -- 36:22:47
6475000 -- (-192324.911) [-190903.882] (-191836.670) (-203176.349) * (-192291.336) (-195396.079) (-197058.880) [-190667.824] -- 36:22:12
Average standard deviation of split frequencies: 0.143548
6476000 -- (-192324.911) [-190903.882] (-191836.670) (-203174.778) * (-192291.336) (-195396.079) (-197058.880) [-190667.824] -- 36:21:37
6477000 -- (-192324.911) [-190903.882] (-191836.670) (-203190.326) * (-192291.336) (-195396.079) (-197058.880) [-190667.824] -- 36:21:02
6478000 -- (-192324.911) [-190903.882] (-191836.670) (-203201.748) * (-192291.336) (-195396.079) (-197058.880) [-190667.824] -- 36:20:27
6479000 -- (-192324.911) [-190903.882] (-191836.670) (-203196.704) * (-192291.336) (-195396.079) (-197058.880) [-190667.824] -- 36:19:52
6480000 -- (-192324.911) [-190903.882] (-191836.670) (-203173.040) * (-192291.336) (-195396.079) (-197058.880) [-190667.824] -- 36:19:18
Average standard deviation of split frequencies: 0.143548
6481000 -- (-192324.911) [-190903.882] (-191836.670) (-203178.787) * (-192291.336) (-195396.079) (-197058.880) [-190667.824] -- 36:18:43
6482000 -- (-192324.911) [-190903.882] (-191836.670) (-203151.441) * (-192291.336) (-195396.079) (-197058.880) [-190667.824] -- 36:18:08
6483000 -- (-192324.911) [-190903.882] (-191836.670) (-203159.321) * (-192291.336) (-195396.079) (-197058.880) [-190667.824] -- 36:17:33
6484000 -- (-192324.911) [-190903.882] (-191836.670) (-203156.963) * (-192291.336) (-195396.079) (-197058.880) [-190667.824] -- 36:16:58
6485000 -- (-192324.911) [-190903.882] (-191836.670) (-203142.242) * (-192291.336) (-195396.079) (-197058.880) [-190667.824] -- 36:16:24
Average standard deviation of split frequencies: 0.143548
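One way to confirm which chains have stopped moving is to parse the screen log and flag the chains whose log-likelihood is identical in every sample. This is a minimal sketch, assuming the line format shown above (generation, then one bracketed value per chain, then the remaining time); the function names are illustrative and not part of MrBayes.

```python
import re

# Matches one chain's log-likelihood, e.g. "(-192324.911)" or "[-190903.882]".
CHAIN_RE = re.compile(r"[\(\[](-?\d+\.\d+)[\)\]]")


def parse_log_line(line):
    """Return (generation, [lnL per chain]) or None for non-sample lines."""
    parts = line.split("--")
    if len(parts) < 2 or not parts[0].strip().isdigit():
        return None
    values = [float(v) for v in CHAIN_RE.findall(parts[1])]
    return int(parts[0].strip()), values


def frozen_chains(lines):
    """Indices (0-based) of chains whose lnL never changes across the samples."""
    samples = [s for s in map(parse_log_line, lines) if s is not None]
    if len(samples) < 2:
        return []
    n_chains = len(samples[0][1])
    return [
        i for i in range(n_chains)
        if len({values[i] for _, values in samples}) == 1
    ]
```

Run on the log above, this flags every chain except the fourth chain of the first run, which is the only one whose log-likelihood changes between samples.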
I am quite confused because some sources distinguish five types: folded, lamellar, villous, trabecular, and labyrinthine; whereas other sources distinguish only three: villous, trabecular, and labyrinthine.
So, who is right? Why do the folded and lamellar types not appear in some sources? In the sources where they appear, Suidae are said to have folded interdigitations, but in the sources where they do not appear, Suidae are said to have trabecular ones. Carnivora are said to have lamellar interdigitations, but their interdigitations are referred to as labyrinthine in sources where the word "lamellar" is not even mentioned. Is the distinction between folded and trabecular spurious, as well as the distinction between lamellar and labyrinthine? This seems odd, because in textbook diagrams these types look very different.
Guys, I need help with phylogeny. I am very new to this field and currently trying to position a bacterial strain. I have selected four crucial loci for the genus, but I am encountering limitations in employing a multilocus approach. I attempted to use Paup/Mrmodelblocks - Mrbayes; however, even though the individual tree for each locus yields an acceptable topology (confirmed through marker sequences), concatenating the different genetic sequences results in a significant error in the tree's topology. I have tried various solutions to rectify this issue, but despite seeking assistance from experienced friends, I haven't been able to resolve it.
This led me to explore new approaches, considering a phylogeny based more on genomes using orthofinder. Initially, it worked well, but when I attempted to repeat the process, my PC couldn't support it, even though it is a robust computer. I'm using Linux, and every time I try to use orthofinder again, the system restarts. Consequently, I sought another approach, utilizing iqtree. I repeated the process using a multilocus strategy, but once again, I can create individual trees for each region, yet I struggle to obtain an acceptable topology when concatenating the regions to generate the tree for the four chosen regions.
The program does concatenate the sequences, but when I try to run the command to generate the tree with nucleotide substitution models, it produces an error due to the size of the concatenated sequences.
I am in need of tips and alternatives on how to address this issue. Although I successfully used the new cognac package in R, considering it's a recent approach, I need to validate my data. Please assist me with any possible alternatives. Thanks.
Suppose we are performing a phylogenetic analysis among different fungal species and we wanted to see how closely our species of interest is related with the other species across different families belonging to the same Order. After performing the multiple sequence alignment, can we extract the aligned regions (which are common across all the species) and use only that much length of nucleotides for constructing the phylogeny (and remove the unaligned regions)?
For e.g. if the length of our sequence of interest is 700bp and only 200-300bp is aligning with the other retrieved sequences, then can we extract 200bp aligned region and use only this much portion for constructing the phylogeny? Or the sequence of the entire length is required for phylogenetic analysis?
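Restricting the analysis to the shared aligned region amounts to dropping alignment columns that contain gaps. A minimal sketch of that trimming step, assuming the alignment is held as a dict of equal-length strings; the function name and the `max_gap_fraction` parameter are illustrative, not taken from any particular package.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.0, gap_chars="-?"):
    """Keep only columns whose gap fraction is <= max_gap_fraction.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    With the default of 0.0, only columns present in every sequence are kept.
    """
    seqs = list(alignment.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        col for col in range(length)
        if sum(s[col] in gap_chars for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {name: "".join(s[c] for c in keep) for name, s in alignment.items()}
```

Raising `max_gap_fraction` keeps more columns at the cost of more missing data, so the threshold is a judgment call rather than a fixed rule.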
Hello, I am working with closely related endogenous retrovirus (ERV) sequences. Based on host phylogeny and geography, I suspect these may have resulted from several integration events by different but related exogenous retroviruses. Basically, distantly related hosts that share a geographical distribution have higher ERV identity with each other than with more closely related species. However, the ERVs from a single species that is closely related to a few others and shares their habitat are very different from those of its neighbors.
So there are two possibilities: each ERV clade comes from a different virus, or all of them descend from the same insertion event.
But this remains a hunch. Is there a statistical test, or some other kind of test, that might at least support or oppose this claim?
Thank you in advance.
Can I decide with no phylogenetic analysis if individuals within a population form an infraspecific taxon?
How can I differentiate between subspecies, variety, and forma? What are their formal definitions?
I would appreciate it if you could provide an up-to-date bibliography and information about this issue.
Kind regards,
AIJ
I have obtained both the forward and reverse sequences of a PCR amplicon, which contain a few ambiguous bases and gaps.
I am considering manually editing the forward sequence using the reverse complementary sequence of the reverse sequence as a reference. Is this a recommended approach for resolving the ambiguous bases?
Furthermore, I intend to conduct a phylogenetic analysis, but generating a consensus sequence from the forward sequence and the reverse complement of the reverse sequence leads to a shortened sequence, with around 50 bases deleted from the beginning and 70 bases from the end.
In light of this, would it be more appropriate to merge the forward sequence and the reverse complement of the reverse sequence into a single CONTIG sequence, instead of creating a consensus sequence?
Can this contig sequence be submitted to the NCBI database, and is it acceptable to use for phylogenetic analysis?
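The difference between a consensus of the overlap and a merged contig can be sketched in a few lines. This is a minimal illustration, assuming the two reads have already been aligned to the same length with "-" where a read gives no coverage; the function names and the rule of writing N at conflicts are assumptions for the example, not a recommended editing protocol.

```python
# IUPAC complements, including ambiguity codes.
COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVNacgtryswkmbdhvn-",
    "TGCAYRSWMKVHDBNtgcayrswmkvhdbn-",
)
UNAMBIGUOUS = set("ACGT")


def reverse_complement(seq):
    """Reverse complement of a DNA string, IUPAC ambiguity codes included."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_aligned_reads(fwd, rev_rc):
    """Merge a forward read and the reverse-complemented reverse read.

    Both inputs must be aligned to equal length. Where only one read has an
    unambiguous base, that base is used; true conflicts are written as N.
    """
    if len(fwd) != len(rev_rc):
        raise ValueError("reads must be aligned to the same length")
    merged = []
    for a, b in zip(fwd.upper(), rev_rc.upper()):
        if a == b:
            merged.append(a)
        elif a in UNAMBIGUOUS and b not in UNAMBIGUOUS:
            merged.append(a)
        elif b in UNAMBIGUOUS and a not in UNAMBIGUOUS:
            merged.append(b)
        else:
            merged.append("N")  # disagreement that neither read resolves
    return "".join(merged)
```

The merged contig keeps the ends covered by only one read, which is why it is longer than an overlap-only consensus; those single-read ends are supported by less evidence and deserve a closer look at the chromatogram.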
I want to do a phylogenetic analysis to classify organisms using the following sequences.
I have downloaded some sequences (ITS region) from the NCBI database and aligned them, but a few of the sequences have produced long gaps in the alignment (see picture).
My question is: what should I do with these kinds of sequences? Can I remove the middle portion of these sequences to reduce the gaps, or should I just leave it as it is? Can I exclude those sequences from my study because of these long gaps?
I wish to use NONA or Hennig86 through Winclada to perform cladistic analysis based on morpho-cladistics characters of Coleoptera families. I am unable to find the two anywhere. I tried searching for the same but did not come across anything useful. What should I do? Thanks for your help.
Hi, so I am trying to follow the steps in the MrBayes manual to create a tree. I start MrBayes with mb. Then I type execute <file_name>. The manual (and some YouTube videos as well) shows that it should only load my file and then give me another prompt. However, it doesn't ask for anything else and just goes straight into the analysis. The manual also shows that every 100th generation I should be asked if I want to continue with the analysis, but that doesn't happen either. Why am I not seeing these prompts? I know I'd like to eventually set "sumt burnin=2500". Do I need to do that before I do the execute command?
Lastly, how do I see what my final standard deviation is besides the terminal screen? I closed out of my first run and don't know what the number was and am not sure which output file it would be in.
Thanks!
I have been given 4 organisms (insects; they are arthropods) and need to manually construct the maximum number of possible trees, then choose the most parsimonious one and back this up with research.
First, how can I verify the number of possible trees? I have already drawn 12, but got feedback that that is not enough. I am using 10 characters.
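For reference, the number of distinct fully bifurcating, labelled trees follows a double-factorial formula: (2n-3)!! rooted trees and (2n-5)!! unrooted trees for n taxa. A minimal sketch of that count (function names are illustrative), assuming strictly bifurcating trees:

```python
def double_factorial(k):
    """Product k * (k-2) * (k-4) * ... down to 1 or 2; equals 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_rooted_trees(n_taxa):
    """Number of rooted bifurcating trees for n labelled taxa: (2n-3)!!"""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    return double_factorial(2 * n_taxa - 3)


def count_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating trees for n labelled taxa: (2n-5)!!"""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n_taxa - 5)
```

For 4 taxa this gives 15 rooted and 3 unrooted trees, so 12 drawings would fall short of the rooted count.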
I'm trying to find an oligonucleotide primer set for HA gene amplification of H5N1.
I would appreciate it if you could point out a primer set for phylogenetic analysis based on H5N1 HA gene sequencing.
I am currently working on a paper and we need to discuss the relationship of these two variables. I already saw someone ask about this (May 2022) but the only answer provided was not that helpful.
Why do we partition our aligned molecular data in phylogenetic analysis?
Also I would like to ask what are the different ways of partitioning data in phylogenetic analysis and what are the best tools for data partitioning?
Thank you for your time and consideration
here is the code:
library(babette)
library(seqinr)
library(BeastJar)
library(beastier)
fasta <- read.fasta("nuc.fasta")
get_default_beast2_bin_path(
  beast2_folder = get_default_beast2_folder(),
  os = rappdirs::app_dir()$os
)
fasta_filename <- "nuc.fasta"
output <- bbt_run(fasta_filename)
##############after this run it gives following error:
Error in beastier::check_input_filename_validity(beast2_options) :
'input_filename' must be a valid BEAST2 XML file. File 'C:\Users\User\AppData\Local\beastier\beastier\Cache\beast2_9ec3aae162d.xml' is not a valid BEAST2 file. FALSE
Why does a phylogenetic analysis need to converge?
What does a bootstrapping value of 70 mean?
Which part of the result shows the bootstrapping value after running the following script?
!raxml-ng --bsconverge --bs-trees T2.raxml.bootstraps --prefix Test --threads 2 --bs-cutoff 0.03
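On the meaning of the number itself: a bootstrap value is usually read as the percentage of bootstrap replicate trees that contain a given clade, so 70 would mean the clade was recovered in 70% of the replicates. A minimal sketch of that calculation, assuming each replicate tree is represented simply as a set of clades (each clade a frozenset of taxon names); this representation and the function name are illustrative and not part of RAxML-NG.

```python
def bootstrap_support(clade, replicate_trees):
    """Percentage of replicate trees that contain the given clade.

    clade: iterable of taxon names.
    replicate_trees: list of sets, each holding the clades (frozensets)
    found in one bootstrap replicate tree.
    """
    if not replicate_trees:
        raise ValueError("need at least one replicate tree")
    clade = frozenset(clade)
    hits = sum(clade in tree for tree in replicate_trees)
    return 100.0 * hits / len(replicate_trees)
```

The value therefore describes how consistently the data recover a clade under resampling, not the probability that the clade is true.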
Currently I do not have access to a wet lab facility for DNA extraction and sequencing.
I want to reconstruct the phylogeny of four subfamilies of Malvaceae using matK, rbcL, and ITS.
Can I use public data from GenBank, that is, download the gene sequences (FASTA) of different plants, analyze the clusters with software (MEGA, MrBayes, or other software), compare the trees from the different genes to see whether they are similar or dissimilar, and then draw a conclusion based on my observations?
Is that workflow scientifically sound?
Please help. Sincerely-Sunzid.
I am a beginner studying phylogenetic analysis. For analyzing my angiosperm family, Sterculiaceae, how should I approach outgroup selection? Should the outgroup be closely related, like members of Tiliaceae or Malvaceae?
I have learned using MEGA 11 to some extent.
Also I need some resources / papers/ slides/ anything that might be helpful for me to start as a beginner.
Thank you so much.
Hello all, I have a sequence alignment of ~2000 sequences, which is likely more than is necessary. If I begin to remove sequences manually or using some software program I'm sure I can reduce the number of gaps, but this will of course reduce the size of the alignment (and may introduce some amount of bias/subjectivity). Is it better to keep the larger dataset at the expense of greater gap character? Is there a rough criteria for minimum amount of gaps an alignment should contain for reconstruction? Thanks very much.
Hi, at a conference I once heard someone say, "The isolates within group A are similar to group B with R > 0.2."
Is there a way for me to calculate the R value of two different groups of isolates based on their nucleotide/amino acid sequences or on their sequence homology, so that in the end I could reach a conclusion just like the example I've provided? Thank you so much!
Hi everyone,
I am working with 14074 PAVs (presence/absence variations) in a panel of 99 accessions.
I converted the PAVs table into a binary matrix to construct a PAV-based NJ. Now I would like to test its stability by bootstrapping the tree.
Can someone tell me the R code to do that?
Attached is the CSV input file.
Below are the commands that I ran in R:
library(ape)  # provides nj()
Data <- read.csv("Table.csv", sep=";", row.names=1)
dist_mat <- dist(Data, method = 'binary')
tree <- nj(dist_mat)
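The resampling part of a bootstrap on a presence/absence matrix can be sketched as follows: draw PAV columns with replacement, then recompute the distances for each pseudo-replicate. This is a minimal sketch, assuming accessions are rows and PAVs are columns, and that the distance is the share of positions where only one accession is "on" among positions where at least one is "on" (my reading of R's "binary" method; returning 0.0 when neither is on is also an assumption). Tree building and support counting are left out, and all names are illustrative.

```python
import random


def binary_distance(a, b):
    """Share of positions where exactly one is on, among those where any is on."""
    either = sum(1 for x, y in zip(a, b) if x or y)
    if either == 0:
        return 0.0  # assumption: two all-zero rows are treated as identical
    only_one = sum(1 for x, y in zip(a, b) if bool(x) != bool(y))
    return only_one / either


def distance_matrix(rows):
    """Pairwise distances as a dict keyed by (name1, name2)."""
    names = list(rows)
    return {
        (p, q): binary_distance(rows[p], rows[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }


def bootstrap_replicates(rows, n_reps, seed=0):
    """Yield pseudo-replicate matrices built by resampling columns with replacement."""
    rng = random.Random(seed)  # fixed seed so the replicates are repeatable
    n_cols = len(next(iter(rows.values())))
    for _ in range(n_reps):
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        yield {name: [vals[c] for c in cols] for name, vals in rows.items()}
```

Each replicate's distance matrix would then go to the same NJ routine as the original data, and a clade's support is the share of replicate trees that recover it.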
We need to sequence markers to perform phylogenetic analysis of a snake but the only material we currently have available is a dried blood sample. I would like to know if it is feasible or possible to obtain information from this type of sample.
Thank you.
Hi. I have acquired 177 original (yet similar) sequences in my taxonomy research. In order to produce a concatenated background dataset for phylogenetic analysis, I did BLAST searches for each original sequence and ended up with ~7000 sequences in one file.
There is much redundancy in the BLAST results as expected, however, manually removing the repeat sequences is frustratingly tedious and not reliable.
Please advise how I can reduce my background dataset to only unique sequences? That is, the file must have only one identifier for each sequence found in it- whether my original sequence or one from the BLAST search.
Thank you for any advice offered.
Regards, Tamiko
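Collapsing a FASTA file to unique sequences can be done with a short script. A minimal sketch using only the standard library, assuming duplicates are defined as identical residues after ignoring case and gap characters, and that the first identifier seen for each sequence is the one to keep; the function names are illustrative.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def unique_sequences(records):
    """Keep the first record for each distinct sequence (case and gaps ignored)."""
    seen = {}
    for header, seq in records:
        key = seq.upper().replace("-", "")
        seen.setdefault(key, (header, seq))
    return list(seen.values())
```

Because the first occurrence wins, placing the 177 original sequences at the top of the file keeps their identifiers whenever a BLAST hit is identical to one of them.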
Greetings to you all. I wonder if anyone here has the same problem as me: I often cannot access the TreeBASE website. Also, can anyone suggest an alternative database that I can use to deposit my phylogenetic tree? Thank you very much for your time and your support.
I am currently working on a project that aims to characterise in R on a pool of 500 bird species the traits that may be at the origin of their introduction outside their natural habitat and thus allowing them to become invasive or not.
Thus, out of my pool of 500 species, I ended up with 150 bird species that were introduced elsewhere (introduction = 1) versus 350 others that were not introduced (introduction = 0), with approximately 80 life history traits for each of them.
My idea was therefore to use PGLS (linear models correcting for the phylogenetic effect of species on their traits) on my pool of 500 species and see which traits could explain the "introduction" variable.
The problem is that by doing this my results are biased by the presence of many more non-introduced birds than introduced birds. My initial idea was to use bootstrapping to resample my n=350 birds to n=150 and run my PGLS on this new pool of 300 species (n=150 introduced and n=150 unintroduced), repeat it and then do some model averaging.
However by doing this my final models obtained in this way are completely different at each of my R sessions. I have tried increasing the number of bootstrap runs to 10,000 but this does not solve the problem. When I do this with basic GLMs I do not encounter this problem of non-repeatability.
Would you have a solution to solve this problem of repeatability with the PGLS in my process?
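One thing worth ruling out is that the resampling itself differs between sessions because the random number generator is not seeded. A minimal sketch of seeded, balanced subsampling, assuming the introduced species are the smaller class and are all kept while the non-introduced species are drawn without replacement; the function name and default seed are illustrative, and the model fitting step is left out.

```python
import random


def balanced_subsamples(minority, majority, n_reps, seed=42):
    """Yield n_reps balanced samples: all minority items plus an equally
    sized random draw (without replacement) from the majority class."""
    rng = random.Random(seed)  # fixed seed: same draws in every session
    minority = list(minority)
    majority = list(majority)
    if len(majority) < len(minority):
        raise ValueError("majority class must be at least as large as minority")
    for _ in range(n_reps):
        yield minority + rng.sample(majority, len(minority))
```

In R the equivalent step would be calling set.seed() before the resampling loop. A fixed seed only makes the draws repeatable; if the averaged models still change a lot when the seed changes, the instability lies in the models rather than in the sampling.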
Hello everyone, I'm working on phylogenetic analysis and sequence retrieval by tBLASTn tool of NCBI. But the page is showing a maximum of 100 alignments and I wish to see all the alignments i.e. at least more than 100. How can I do so?
I'm seeking your quick answers.
Thanks.
Traditional search in TNT in my case is retaining 7500 trees with the best score of 382. How do I find the particular tree with the best score for saving it?
Is there any option that would give results much faster for 1 million generations?
At the moment it takes one month to finish the phylogenetic analysis.
I am analyzing population structure of a fungal organism collected from various locations using RADseq data. For this purpose I ran few analyses, but want to focus on maximum likelihood phylogeny and estimation of inbreeding coefficient (Fis) here. Phylogenetic analysis identified three well supported clades within one population (similar branching was observed with phylogeny based on protein coding loci). When I estimate inbreeding coefficient using the same dataset as for the phylogeny I get a value of 0.009. These two results do not seem to agree with each other, since phylogeny seems to detect signature of subpopulation structure within this population, but Fis doesn't. What may explain this pattern?
Thanks for the ideas!
Olga
Hello everyone!
I'm trying to replicate analysis 1.3 from the paper: "Phylogenetic analysis of a new morphological dataset elucidates the evolutionary history of Crocodylia and resolves the long-standing gharial problem" of J.P.Rio and P.D.Mannion. With their supplementary files "Raw data: TNT file with continuous and discrete morphological characters (used in Analysis 1 and 3)." and the following protocol :
piwe=
open the matrix
piwe=12 ; (weight of 12)
xpiwe= ; (active EIW)
piwe; (see if all is ok weight of 12 and EIW ON)
Then NewTechnologySearch with :
- on left side : Sect Search + Ratchet + Drift + Tree fusing (on default setting)
- on right side : Driven search with init addseqs of 5 + Stabilize consensus 5 times with 75 factor
Random seed of 1 and Auto-constrain and replace existing trees ON
Then TBR with Tree on ram.
But I get a tree with the best score of 8900 something whereas in the paper the best score is 8181.9.
Does anyone have an idea where I went wrong?
I have performed a phylogenetic analysis using CLUSTALW and obtained a phylogenetic tree. I know how to read a phylogenetic tree and how to relate the sequence similarity between two sequences by looking at the tree, but what I am concerned about is how one can decipher the father-daughter relationship by looking at the tree? Also, if I want to isolate the sequence of the father corresponding to a daughter sequence how can I do that?
I have attached the corresponding image. In the image, I know that sequences "QXN18196" and "QXN18436" are closely related. But will it be safe to say that sequence "QXN18520" is the father sequence for both the sequences? Or if I want to establish a father-daughter relationship for the whole tree, what will be the relation between the sequences?
I’m a master's student and I’m going to start working with genomic data in the field of population genomics. I need to buy a new laptop and I was wondering if the new MacBooks with the M1 processor are suitable for genomic and population genomic analyses. I’m mainly afraid of incompatibility issues with the most commonly used programs. I would also like to know if it’s possible to run programs for phylogenetic analysis, such as BEAST, MEGA, etc., on an M1 MacBook.
Thank you so much for the help!
Andrea.
I'm trying to partition a set of characters. Is there a way of doing this by specifying the interval (like 0-1044) instead of the whole sequence of numbers?
I'm very new to phylogenetic analysis, especially Bayesian inference. I'm building a phylogenetic tree with 3 mitochondrial markers, and after running PartitionFinder I obtained two partitions whose best substitution models are HKY+I+G+X and HKY+G+X. I would like some advice on how to set up these models, either directly in the BEAUti interface or by modifying the XML file.
Thank you in advance
Based on 16S rRNA gene sequencing
I would like to add a data matrix of morphological data, assembled in the software Mesquite, to a manuscript. I would either add an electronic supplement (MS Excel format) or a table as *.txt or *.dic file. Anyone with experience around? I find Mesquite to be a bit user-unfriendly with this regard.
I am trying to calculate the origin time of some bacterial lineages, and I am testing BEAST2 with a very simple dataset of only 12 taxa and one protein sequence of 1000 amino acids, with the WAG model. Under "Priors" in BEAUti, I used a root age prior of 3500 Ma and a prior of 1200 Ma for one cyanobacterial lineage, with a normal distribution, and a calibrated Yule model with a fixed starting tree (with 4 parameters set to 0). However, I keep getting results with very short branch lengths, and the ESS is always low even when I set the chain length to 40000000. Could anyone give me some suggestions? Thanks a lot!
I'm currently learning the hows and whys of bioinformatics. This is the third tree I have built, and I noticed that the more sequences I add, the lower the bootstrap values become.
My sequences were trimmed using trimAl, and I chose to obtain them with no gaps. The evolutionary model was chosen using jModelTest, and the phylogenetic tree was generated in MEGA.
This is a Maximum Likelihood tree; my log likelihood is -24465.21. The tree was built using the General Time Reversible model, G+I, gamma 6, NNI, 25 threads, and 500 bootstrap replicates.
Can anyone explain why this is happening and what I should keep in mind when constructing the next tree?
Thank you very much
Hi everyone, do you know any phylogenetic analysis to assess the dependence between both categorical and continuous independent variables (plant traits in this case) and phylogenetic relationships?
I have looked at 'phytools' but I'm not sure if it has a function for such a variety of variables at the same time.
I saw that some papers use both cpDNA and nrDNA in a single phylogenetic study. Why not just choose one? What are the functions of each type of DNA? What is the benefit of using both?
We have a phylogenetic tree based on the 16S rRNA sequences of many bacteria, and we also have a phylogenetic tree based on protein sequences from those bacteria. Now, how can we correlate these two types of sequence in terms of evaluation?
I am eager to be a researcher in an international project about statistical, mutational, phylogenetic analysis management topics.
Dear RG community,
For robust phylogenetic analysis, it is often necessary to base a species tree on concatenated alignments for many genes rather than for just one gene such as 18S rDNA etc. Finding multi-gene sequences that are available for every species in the tree, with one's species of interest as a starting point, is the most time-consuming step in this procedure.
I wonder if there is any commercial or free software that can conduct such a search automatically?
For example, Wiegmann et al. (2009) reconstructed the phylogeny of holometabolous insects using six genes: AATS, CAD, TPI, SNF, PGD, and RNA POL II. This group obtained sequences of these genes from 29 species based on their own sequencing data, which is very cool. When one needs to get such information manually from GenBank, the task becomes very tedious and time-consuming, because it is almost impossible to find species for which sequences of all six genes are available.
Thank you.
What are the important points one must keep in mind when running a BEAST 2 analysis for molecular dating?
Dear All,
I am struggling with a persistent problem with a CSV file while preparing data for a MuSSE model analysis. I have tried a number of things to fix it, but without success: I always get the same error ("All names must be length 1"). I would be very grateful for your help! :)
library(diversitree)
dat="MuSSE_hosts.csv"
dat<- read.table("MuSSE_hosts.csv", header=TRUE, dec=".", sep=",", row.names=1)
mat <- dat[,2:ncol(dat)]
lik.0 <- make.musse.multitrait(tree, mat, depth=0)
Error in check.states.musse.multitrait(tree, states, strict = strict, :
All names must be length 1
Thank you a lot in advance!
What average genetic distance values for the mtDNA control region (D-loop) are indicative of conspecific populations versus valid species in Chiroptera? Thank you
Greetings to all,
Could anybody please advise me? How important is Bayesian posterior probability in phylogenetic analysis? Are MP and ML sufficient for phylogenetic analysis? I didn't get consistent PP values across the analyses I performed using MrBayes. What could be the reason? Also, the SD of split frequencies never falls below 0.01, even after increasing the number of generations.
Rather than using sequence alignment data, I want to build a phylogenetic tree from a distance matrix and run a bootstrap as part of the statistical analysis. Can anyone tell me how to carry out this analysis?
I am using Unipro UGENE for sequence alignment. After alignment, I usually export the file in MEGA format and it always works. But for some reason it is not working now. Is anyone else facing the same problem? Is there an alternative, or an online tool to fix the line errors?
Thanks in Advance
How would this appear on a tree if COI only resolved those closely related species and not more distantly related species? Thank you
I have a question. When I used a picante function to analyze phylogenetic community structure and phylogenetic signal in R, I got the error "'phylo' is not rooted and fully dichotomous". I don't understand what the problem is. The attached file is my phylogenetic information; please check it. I am sorry to trouble you all, but I really want to solve this problem. Thanks a lot.
I will perform a comparative genomic analysis among avian pathogenic E. coli (APEC) to explore the characteristics of type 3 secretion system (T3SS) loci in APEC. The idea is to isolate and compare APEC strains from chickens' brains with septicaemic symptoms and from chickens without symptoms (control). First, I will use RT-PCR to screen all APEC isolates for the presence of conserved T3SS structural core genes. Then, I will use whole genome sequencing and bioinformatics to screen the genomes for the presence/absence of a full set of genes known to be part of a complete T3SS, MLST, and other genes. To compare the APEC field strains with the control, I will use virulence and AMR gene profiling and phylogenetic analysis (based on MLST and average nucleotide identity). I wondered which is/are the best way/ways to analyze the results and what I should expect in terms of results. Thank you for your thoughts; I appreciate it.
Hello, everyone,
One of the aims of my current study is to place a particular plant species within the context of the whole genus phylogenetically. The systematic position of these plant species is well known. But the species that I have in hand is rare and endemic and its phylogenetic position is obscured.
I will launch a phylogenetic analysis of the DNA sequences of this plant species. I have a few DNA sequences of 5 markers. Please I need help and answers to the following points:
1- I did a BLASTn search; should I use a megablast search instead?
2- If the retrieved list includes a plant species with several accessions, should I download all accessions, or is one accession per species enough for the phylogenetic analysis?
3- What threshold of percent similarity to the query sequence should I select?
4- How many accessions are needed to cover the diversity of the plant species under investigation?
5- Which phylogenetic analysis methodology suits systematic research (e.g. Bayesian analysis, maximum likelihood, or something else)?
I am interested in the different tools that can be used to create custom databases for targeted sequencing and how to trim the databases based on the amplicon size? Also, should custom databases contain species not assigned to a species level?
I am carrying out a phylogenetic analysis on the relationships of several families within Anura and was planning on using cytb and COI in my analysis but I am not sure whether I should be choosing less conserved genes?
What are the benefits of concatenating two or more gene sequences in alignments, and is it better to use nucleotide sequences or amino acid sequences in this? Also, does this work for all genes from the same taxa or are there exceptions?
Hello!
I now have a complete list of about 23k plant species that I want to check if Genbank has the sequencing data for a phylogenetic analysis later, and I am really new to the topic.
Is there a way to do this using R?
Thanks!
I am getting negative branch lengths with the Neighbor-joining method, which I set to zero. However, I've read that I should transfer the negative distances somewhere else, and I do not know how. Does anyone have a script or method to transfer negative distances to the corresponding branches? Thanks in advance for any help.
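A commonly described correction is to set the negative branch to zero and add its (negative) length to the sibling branch, so that the path length between the two tips stays the same. A minimal sketch for one pair of sibling branches, assuming this is the adjustment meant; the function name is illustrative, and applying it across a whole tree would need a tree library.

```python
def fix_negative_pair(len_a, len_b):
    """Zero out a negative branch and move the deficit to its sibling.

    Returns the adjusted (len_a, len_b); their sum is unchanged.
    """
    if len_a + len_b < 0:
        # The sibling cannot absorb the deficit without going negative itself.
        raise ValueError("combined length is negative; cannot transfer")
    if len_a < 0:
        return 0.0, len_b + len_a
    if len_b < 0:
        return len_a + len_b, 0.0
    return len_a, len_b
```

For example, a pair with lengths -0.25 and 1.0 becomes 0.0 and 0.75, which preserves the tip-to-tip distance of 0.75.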
I have a wide range of freshwater microalgae collected from various districts of Tamil Nadu. I am looking for a collaborator who could help me with molecular identification of some rare unknown taxa. Please ping me for further details.
Normally dN/dS ratios are calculated and interpreted as follows: below 1 indicates negative selection, above 1 indicates positive selection, and 1 indicates neutral evolution. How should dS/dN ratios be interpreted? Programs like SNAP provide dS/dN graphs and ratios.
For example:
Averages of all pairwise comparisons: ds = 0.1678, dn = 0.4090, ds/dn = 0.4072, ps/pn = 0.4707
Please see image as well.
Can somebody explain it in simple words, as I am not very familiar with this?
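Since dS/dN is just the reciprocal of dN/dS, one way to read such output is to convert it back and apply the usual thresholds. A minimal sketch using the averages quoted above; the function names and the tolerance are illustrative, and a single gene-wide average is only a rough summary.

```python
def dn_ds(dn, ds):
    """Ratio of nonsynonymous to synonymous substitution rates (omega)."""
    if ds == 0:
        raise ValueError("dS is zero; the ratio is undefined")
    return dn / ds


def classify_selection(omega, tol=1e-9):
    """Apply the usual reading of dN/dS: <1 negative, 1 neutral, >1 positive."""
    if abs(omega - 1.0) <= tol:
        return "neutral"
    return "positive" if omega > 1.0 else "negative (purifying)"


omega = dn_ds(0.4090, 0.1678)   # from the averaged dn and ds values
omega_from_ratio = 1 / 0.4072   # from the reported ds/dn
```

Both routes give a dN/dS above 1 for this example, so a dS/dN below 1 corresponds to the "positive selection" side of the usual scale. The two numbers differ slightly, possibly because the reported ds/dn is an average of pairwise ratios rather than a ratio of the averages.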
Hello all,
I am trying to obtain information about the specific conditions for preserving mammal feces samples (preservative type, preservative concentration, preservation temperature, etc.) in order to perform a phylogenetic analysis.
Thank you
I am conducting a phylogenetic analysis on LuxR solo of alpha, beta and gammaproteobacteria. I am building three separate trees for each. What would serve as the best outgroup in all three trees?
I found some bases with disagreement in the assembly of the contig. I read that because it is a nuclear marker, this disagreement may not be due to a failure in sequencing but due to heterozygosity, and therefore I should not change the base in the contig by the highest-quality criterion, but set it as an IUPAC ambiguity code.
On the other hand, I was told that when it is heterozygous, the peaks are usually the same size. But it is not always easy to determine how equal it is.
So I would like to know if there is a cut-off point or defined protocol to determine if a mismatch is due to heterozygosity or an error in sequencing on one of the strands?
What is the cophenetic correlation coefficient, and how is it computed for a clustering method?
Also, how can it be used to compare two different distance matrices?
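As usually defined, the cophenetic correlation coefficient is the Pearson correlation between the original pairwise distances and the cophenetic distances, where the cophenetic distance of two items is the dendrogram height at which they first join. A minimal sketch, assuming both sets of distances are supplied as dicts keyed by the same item pairs; the function names are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cophenetic_correlation(original, cophenetic):
    """Correlate original distances with dendrogram (cophenetic) distances."""
    pairs = sorted(original)
    return pearson([original[p] for p in pairs], [cophenetic[p] for p in pairs])


# Three items; single linkage joins A and B at height 1, then C at height 4.
original = {("A", "B"): 1.0, ("A", "C"): 4.0, ("B", "C"): 5.0}
cophenetic = {("A", "B"): 1.0, ("A", "C"): 4.0, ("B", "C"): 4.0}
ccc = cophenetic_correlation(original, cophenetic)
```

The same correlation can be computed between the matching entries of two distance matrices over the same items; judging its significance would need a permutation approach such as a Mantel test, because the entries are not independent.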
Hi,
I'm currently trying to do a multiple sequence alignment for a gene family using the coding nucleotide sequences. I want to create an alignment based on the amino acid sequences but that is in the form of nucleotides, so that I'm able to account for synonymous mutations in my subsequent analyses. The ClustalW (codons) and MUSCLE (codons) alignment options in MEGA-X sound like they fit my needs. However, I am unable to select them in the drop down menu and they are greyed out (see picture attached).
Does anyone know of any reasons why these options might not be available to me?
Thanks in advance,
Jacob
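Independent of why the MEGA-X options are unavailable, the underlying idea (align the proteins, then thread each coding sequence back onto its aligned protein, three nucleotides per residue) can be sketched in a few lines of Python. The function name and the toy sequences below are hypothetical, and the sketch assumes each CDS is in frame and matches its protein exactly; dedicated tools such as PAL2NAL handle real data more robustly.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def protein_to_codon_alignment(aligned_protein, cds, gap="-"):
    """Thread an unaligned CDS onto its gapped protein sequence."""
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    codons = [cds[i:i + 3].upper() for i in range(0, len(cds), 3)]
    residues = sum(1 for aa in aligned_protein if aa != gap)
    # Tolerate one trailing stop codon that has no residue in the protein.
    if len(codons) == residues + 1 and codons[-1] in STOP_CODONS:
        codons = codons[:-1]
    if len(codons) != residues:
        raise ValueError("CDS and protein lengths do not match")
    codon_iter = iter(codons)
    # Each protein gap becomes a three-nucleotide gap; each residue its codon.
    return "".join(gap * 3 if aa == gap else next(codon_iter)
                   for aa in aligned_protein)


# Toy protein alignment and matching coding sequences (made up).
protein_alignment = {"seqA": "MK-L", "seqB": "MKTL"}
coding = {"seqA": "ATGAAACTG", "seqB": "ATGAAGACCCTG"}

codon_alignment = {
    name: protein_to_codon_alignment(protein_alignment[name], coding[name])
    for name in protein_alignment
}
for name, seq in codon_alignment.items():
    print(name, seq)
```

Because gaps are inserted only in multiples of three, codon boundaries are preserved and synonymous versus non-synonymous changes remain identifiable downstream.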
Hi,
In my study I performed a coalescent-model phylogenetic analysis, but before that I ran an admixture analysis to identify the probable number of ancestors. I want to relate my admixture results to the coalescent-model tree, that is, to derive a connection between these two analyses. If someone knows how to do this, please help.
Hi all,
I'm a structural biologist and I'm attempting to trace the evolution of a protein domain family of interest. I have a very large collection of domain sequences that (we hypothesise) have diverged in sequence and structure over time from a common ancestor to the specialised domains that we're looking at (~15 domains in total). This is challenging because the domains are very different from each other in sequence, so we're trying to see if we can trace the evolution by increasing the number of sequences to tease out the phylogenetic signal. To trace the evolution, I acquired a large number of sequences of these domains (~10,000-15,000 each) and performed a super-huge alignment of all of them using PASTA over 20 iterations, coming out at ~150,000 sequences (~300aa long on average).
I now need to perform a phylogenetic analysis, and I did attempt to use PhyML on my university's supercomputer. I get error messages saying that I should not use more than 4000 taxa. My question to you, dear phylogenetic fellows, is: which software do you think is best for performing phylogenetic analysis of such a large number of aligned sequences? Any help would be appreciated!
P.S. as a side question - how many bootstraps should I be aiming for? I've read that ~100 is generally acceptable, and I set PhyML to run 500 (though I fear this may be unrealistic given the size of the input data).
Thanks very much!
Rob
I am studying two types of ophidiid fishes that are morphologically similar and would like to classify them genetically.
I have already investigated the COI region of mtDNA, but the genetic difference is quite small (2 to 7 mutational steps between the nearest and farthest haplotypes), so I am considering experiments using nuclear DNA.
However, I am not sure which genetic marker would show a clearer difference than COI.
So far, I am considering the RAG1 region, which has been studied in related species, or ITS, where base substitutions are likely to occur because it is a non-coding region.
Could you tell me if there are better genetic markers that have a fast base substitution rate and are effective in clarifying interspecific or intraspecific differences?
I've tried several programs (e.g. TreeView, FigTree, MEGA, Archaeopteryx, Mesquite) but they just display node numbers and branch lengths. I'm using the software SYMMETREE to infer diversification rate shifts on particular nodes within a tree. Results refer to "branch numbers" which, according to the manual, can be displayed using MacClade. However, I'm not using a Mac OS platform.
Is it possible to enter two different types of information into Mesquite?
I have two sets of species data:
- the divergence times (in millions of years) i.e. how old each species is and the relationships between them i.e. how they are grouped on a tree, using brackets.
- the sexual system status of each species (e.g. 0 = gonochoristic, 1 = hermaphroditic).
It seems I can only open one of the following to make a tree:
- a nexus file containing the divergence times/relationship data (begin trees;)
or
- a nexus file containing character states (begin data;)
I have combined both sets of information (see attachment) into a single nexus file but this doesn't seem to work; I get a brilliant tree with all the species in the correct positions and branching correctly, but the character states are not shown.
It appears the only way to combine them is to open the character nexus file into a tree, re-position every branch/species until they match a reliable existing phylogeny, and then add divergence times manually too.
As the fish families I am studying have upwards of 500 species in them, this is a long, slow and mind-numbing process!
Does anyone know of a better way? I feel I am missing something really obvious that could save me a heap of time!
Many thanks!
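One thing worth checking is that the combined file has a TAXA block, a CHARACTERS block and a TREES block whose taxon names match exactly. The Python sketch below generates such a file from a dictionary of character states and a dated Newick string; the species names, branch lengths, states and the function name `build_nexus` are all hypothetical placeholders. In Mesquite the states are typically displayed only once a character is traced on the tree (for example via Trace Character History), so a correct file may still open as a plain tree at first.

```python
def build_nexus(states, newick, tree_name="dated"):
    """Build a NEXUS string with matching TAXA, CHARACTERS and TREES blocks.

    `states` maps taxon name -> single-character state (e.g. "0" or "1").
    `newick` is a Newick tree whose tip labels are the same taxon names.
    """
    taxa = sorted(states)
    missing = [name for name in taxa if name not in newick]
    if missing:
        raise ValueError(f"taxa absent from tree: {missing}")
    if not newick.rstrip().endswith(";"):
        newick = newick.rstrip() + ";"
    lines = [
        "#NEXUS",
        "BEGIN TAXA;",
        f"  DIMENSIONS NTAX={len(taxa)};",
        "  TAXLABELS " + " ".join(taxa) + ";",
        "END;",
        "BEGIN CHARACTERS;",
        "  DIMENSIONS NCHAR=1;",
        '  FORMAT DATATYPE=STANDARD SYMBOLS="01" MISSING=? GAP=-;',
        "  MATRIX",
    ]
    lines += [f"    {name} {states[name]}" for name in taxa]
    lines += ["  ;", "END;", "BEGIN TREES;",
              f"  TREE {tree_name} = {newick}", "END;"]
    return "\n".join(lines) + "\n"


# Hypothetical example: 0 = gonochoristic, 1 = hermaphroditic;
# branch lengths stand in for divergence times in millions of years.
sexual_system = {"Species_a": "0", "Species_b": "1", "Species_c": "0"}
dated_tree = "((Species_a:1.5,Species_b:1.5):2.0,Species_c:3.5)"
nexus_text = build_nexus(sexual_system, dated_tree)
print(nexus_text)
```

For families with hundreds of species, generating the file from a table in this way avoids re-positioning branches by hand; the state table and the Newick string only need consistent names.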
What is the best way to date a phylogenomic tree using fossil calibration? It is more or less straightforward with a few Sanger loci using programs like BEAST, but it becomes intractable with hundreds of genes, as produced with phylogenomic approaches (e.g., target capture). Just wondering if anyone had any opinions?
Thanks a lot!
Which software can be used to prepare a NEXUS file for phylogenetic analysis in MrBayes? which form