Phylogeography and Phylogenetic Biogeography

  • Santiago Ramírez-Barahona added an answer:
    How to construct the input matrix to infer haplotype network using TCS software?

    Hi everybody! I've been trying to construct a haplotype network using the TCS software (Molecular Ecology (2000) 9, 1657–1659), but I'm having problems with the input matrix. I realize that TCS was developed to infer haplotype networks from DNA sequences. However, many studies infer haplotype networks from cpSSRs. How can I construct a distance matrix from cpSSRs when I only have the allele sizes? Or can I use a matrix with the "raw" cpSSR data?

    Thank you in advance!
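
    One possible way to get from allele sizes to a distance matrix, sketched in base R. It assumes a stepwise mutation model, so the distance between two haplotypes is taken as the summed absolute difference in repeat number across loci. The allele sizes, locus names and repeat-unit lengths are hypothetical, and whether TCS can read such a matrix directly is not addressed here.

```r
# Hypothetical cpSSR allele sizes in bp: rows are haplotypes, columns are loci.
sizes <- matrix(c(101, 88, 120,
                  102, 88, 120,
                  104, 89, 121),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("H1", "H2", "H3"),
                                c("locusA", "locusB", "locusC")))

# Assumed repeat-unit length per locus (1 bp = mononucleotide repeats).
repeat_unit <- c(locusA = 1, locusB = 1, locusC = 1)

# Express each allele in repeat units, then sum absolute differences over loci.
steps <- sweep(sizes, 2, repeat_unit, "/")
d <- as.matrix(dist(steps, method = "manhattan"))
print(d)
```

    With these made-up sizes, H1 and H2 differ by one step, while H3 is four to five steps away from the other two.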

  • Lauri Kaila added an answer:
    When inferring deep branches of an ancient protein phylogeny, is analyzing thousands of sequences better than a more sophisticated analysis of fewer?

    I work on an ancient protein family that exists in both eukaryotes and prokaryotes and my interest is in elucidating its deeper branches. When I pull down the data from NCBI I get about 6000 sequences. If I use an algorithm to help me cluster similar proteins (e.g. 80% similar) it gets reduced, and if I do my clustering with 50% similarity I get about 550.

    The question is, in your experience, which one would you choose:

    - use the smaller dataset (or even smaller) and do the most sophisticated, yet computationally intensive, analyses you can hoping that if there is a signal, these analyses would pick it up and thus the deeper branches get better support.
    - use a lot of data and hope that then the intermediate evolutionary steps could be inferred more easily and the deeper branches get more support?

    Lauri Kaila

    Matthews' answer makes an important point that I want to highlight, pointing towards an intermediate approach. I might first try the larger data set. If you find lots of near-duplicates in some clusters, thinning the taxon sampling there may make more sophisticated methods feasible without losing phylogenetic signal for the broader patterns. But, definitely, all the more divergent taxa should be kept; otherwise severe bias due to long-branch attraction (or rather short-branch expulsion) is almost inevitable.
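
    The thinning idea can be sketched in base R as follows. This is only an illustration under stated assumptions: the pairwise distances (1 minus fractional identity) are hypothetical, and complete-linkage clustering with a 0.20 cut-off stands in for whatever clustering tool is actually used.

```r
# Hypothetical pairwise distances (1 - fractional identity) among five proteins.
ids <- c("p1", "p2", "p3", "p4", "p5")
d <- matrix(0.60, 5, 5, dimnames = list(ids, ids))
d["p1", "p2"] <- d["p2", "p1"] <- 0.05
d["p1", "p3"] <- d["p3", "p1"] <- 0.10
d["p2", "p3"] <- d["p3", "p2"] <- 0.08
diag(d) <- 0

# Complete linkage cut at 0.20: every pair inside a cluster is >= 80% identical.
clusters <- cutree(hclust(as.dist(d), method = "complete"), h = 0.20)

# Keep the first member of each cluster; divergent singletons are left untouched.
keep <- names(clusters)[!duplicated(clusters)]
print(keep)
```

    Here the three near-duplicates collapse to one representative, while the two divergent sequences are both retained.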

  • Csaba Csuzdi added an answer:
    Can paleoclimate be correlated to population genetic divergence to explain distribution of a sp. of extant tropical highland plant?

    What kind of treatment can be done and what kind of data is needed? And what meaningful implication can we observe? (to plant conservation, etc)

    Csaba Csuzdi

    Here is a nice paper dealing with the speciation pattern of blind mole-rats (Spalacidae). The authors showed that, long before the Quaternary climatic fluctuations, dry/wet climatic cycles caused significant speciation in the group investigated (of course, tectonic changes also resulted in speciation).

  • Bjarne Larsen added an answer:
    How to make a correct bootstrap in R using dominant markers?
    I am trying to analyze some AFLP data and make dendrograms. I need bootstrap support, but something seems to be wrong. I built my tree like this:

    dist=vegdist(gel, method="jaccard", binary=TRUE, diag=FALSE, upper=FALSE, na.rm = FALSE)
    dendro=hclust(dist,"complete")
    dendro2<-as.phylo(dendro,cex=0.5)

    Then I used the function boot.phylo:

    boot.phylo(dendro2,gel,make.tree,B=100,rooted=TRUE)

    Most of my values equal 0, even for clades that are far from the others on the tree and that look sensible (different places, etc.). Is it the bootstrapping method that is wrong, or is it something else? Any ideas?
    Bjarne Larsen

    I am working with a polyploid SSR dataset. I have successfully made a distance matrix using the package "polysat":

    testmat <- meandistance.matrix(appleDK)

    and a NJ tree using the package “ape”:

    appleDK <- nj(testmat)


    Now, I need to add bootstrap values to my NJ tree.

    How do I do that?
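
    For the dominant-marker case in the question, a minimal sketch of how ape's boot.phylo is usually called. The point is that the function passed as FUN has to rebuild the whole tree (distance plus clustering) from each resampled matrix. The 0/1 matrix and the helper make_tree are hypothetical, and base R's dist(method = "binary"), which gives the Jaccard distance for 0/1 data, is used here in place of vegdist. The polysat case is not shown; it would need the loci to be resampled and the distance matrix recomputed in the same way.

```r
library(ape)

set.seed(1)
# Hypothetical AFLP matrix: 6 individuals (rows) x 40 bands (columns), scored 0/1.
gel <- matrix(rbinom(6 * 40, 1, 0.5), nrow = 6,
              dimnames = list(paste0("ind", 1:6), NULL))

# Rebuilds the tree from any (resampled) marker matrix.
make_tree <- function(x) {
  as.phylo(hclust(dist(x, method = "binary"), method = "complete"))
}

tree <- make_tree(gel)

# boot.phylo resamples the columns of gel and calls make_tree on each replicate.
support <- boot.phylo(tree, gel, make_tree, B = 100, rooted = TRUE, quiet = TRUE)
print(support)   # number of replicates (out of B) recovering each internal node
```

    The returned values are counts per internal node, in the order of the node numbering of the tree, so they can be passed to nodelabels() after plot(tree). For a neighbour-joining tree, make_tree would call nj() instead of hclust() and the rooted argument would be left at its default.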

  • Santiago Sanchez-Ramirez added an answer:
    What could possibly be the reason for getting really low ESS when a discrete trait (location) is added to the BEAST analysis?

    I was trying to do Bayesian analysis on some of my sequence data using BEAST 1.7.5 to see how closely related they are and their migration patterns. 

    The substitution model used was GTR+I+G (strict molecular clock). I ran 10 million iterations, primarily to get a better ESS and thus a well-sampled posterior. It worked fine, and for each run I had ESS >700.

    But once their locations (discrete trait) were added to the analysis, ESS dropped to <10. Even after combining 4 independent runs, ESS remained low (<75). The trees generated by each run were markedly different, and the location patterns don't seem right. The branch colours were really confusing.

    Can anyone help me to get this analysis right with the discrete trait (location)?

    I guess if everything goes right, the posterior probability values I got w/o locations should be similar to with locations, right?

    My expertise with Bayesian algorithms and BEAST/BEAUti is extremely low.


    Santiago Sanchez-Ramirez

    Hi Harindra,

    I meant the number of states in your trait (e.g. areas). Also, what trait reconstruction model are you using (symmetrical or asymmetrical)? Try running more generations; I think it's just a convergence issue. If the model gets more complex, it needs more time to converge. If you increase the run to, say, 50 M generations, remember also to set the sampling interval to every 5,000 generations so that you end up with 10,000 sampled states.
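
    The arithmetic behind that suggestion, as a small sketch (the variable names are only illustrative):

```r
chain_length <- 50e6   # generations in the longer run suggested above
log_every    <- 5000   # log one state every 5,000 generations

n_samples <- chain_length / log_every
print(n_samples)       # 10000

# Interval that would give the same number of samples for the original 10 M run.
original_interval <- 10e6 / n_samples
print(original_interval)   # 1000
```

    Keeping the number of logged states roughly constant keeps the log and tree files at a manageable size as the chain gets longer.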


  • Cristian Cornejo Latorre added an answer:
    Could somebody please advise on how to make haplotype networks in DnaSP or MEGA 6?

    I am looking to make haplotype networks in the above-mentioned software or any other such freeware.

    Cristian Cornejo Latorre

    I know three options for making your haplotype networks:

    1) Arlequin Software (manually)

    2) Network 

    3) TCS with statistical parsimony


  • Danny J Gustafson added an answer:
    Any ideas for mitochondrial primer sequence for species level discrimination in plants?

    I am working with closely related species in the family Caprifoliaceae. Full ITS amplification is a major problem in this group. I have used several chloroplast regions, but not much variation was detected.

    Could you suggest some potential mitochondrial regions for plants, or specifically for Caprifoliaceae?

    Danny J Gustafson

    I agree with Keir Wefferling: the series of papers by Joey Shaw published in the American Journal of Botany is just what you need. The alternative is the typical barcode sequences (also mentioned above).

  • Eneas Konzen added an answer:
    Does anyone have experience in using GenGIS?

    The software merges geographic, ecological and phylogenetic biodiversity data in a single interactive visualization and analysis environment.

    Does anybody know how to build the various layers, such as maps, genetic information and geographic locations?

    Any help would be much appreciated. 

    Eneas Konzen

    OK, glad you made it! 

  • Cintia Souto added an answer:
    Does anybody have information about plant DNA barcoding as a tool for identifying illegal traffic of endangered species?

    I am looking for information about plant DNA barcoding as a tool for identifying illegal traffic of endangered species. Thank you very much in advance.

    Cintia Souto

    You can check the Barcode of Life database at 

    Also the Scientific abstracts from the 6th International Barcode of Life Conference in the last issue of Genome

  • Jingming Zheng added an answer:
    How do I compare traits in the context of phylogenetic dependence?

    When I try to compare trait differences among trees, shrubs and lianas in a large database of Chinese woody species, I know that ANOVA, the Kruskal-Wallis rank sum test and the Wilcoxon rank sum test could be used in traditional statistics. However, in a phylogenetic context, I am not clear on how or when to use the right technique.

    Phylogenetic analysis is not easy for me; it seems like a brand-new field to an ordinary ecological researcher. Thus, I am looking for help or a potential collaborator.

    Thanks for any suggestions.

    Jingming Zheng

    Thanks Vladislav! I will try this.

    May I ask another specific question? As my phylogenetic tree is too big to handle for PGLS, I want to use family-level phylogenetic means to regress one trait against others. I realize that the family mean is not the arithmetic mean, and that the family-level PIC is not the average of the species PICs in the family. Some papers (Zhang et al., 2010, GEB) mention that this could be solved with "analysis of traits" in PHYLOCOM or "pic3" in the R package picante; the latter has been removed and the former is too much for me. Would you recommend other approaches or literature?

    Thanks in advance.


  • Jaspreet Kaur added an answer:
    What is the minimum genetic distance required (under the K80 model) to split a species into two varieties?

    I am barcoding geographically separated plant populations of a species. I am finding some genetic divergence within the species (intraspecific), but I am not sure whether it is large enough to propose two varieties. I would appreciate it if someone could help me decide on a criterion for interpreting the genetic distances.

    Jaspreet Kaur

    The taxa I am working on belong to the same species, but there is controversy regarding their varietal status. Thanks all for your valuable suggestions; I think Andrew is right about the hybridization concept. I am not able to distinguish the two taxa as varieties. The genetic distances I am getting among my samples appear random.
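
    For reference, the K80 (Kimura two-parameter) distance mentioned in the question is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites showing transitions and transversions. The base-R function below is a minimal sketch of that formula; the two 10-bp sequences are hypothetical. In practice, ape's dist.dna(model = "K80") computes the same quantity for whole alignments. No single distance value is universally accepted as a threshold for recognizing varieties, so the number is only a starting point for comparison.

```r
# Kimura two-parameter (K80) distance between two aligned sequences.
k80_distance <- function(seq1, seq2) {
  a <- strsplit(toupper(seq1), "")[[1]]
  b <- strsplit(toupper(seq2), "")[[1]]
  stopifnot(length(a) == length(b))

  # Skip sites with gaps or ambiguity codes in either sequence.
  ok <- a %in% c("A", "C", "G", "T") & b %in% c("A", "C", "G", "T")
  a <- a[ok]
  b <- b[ok]

  purine <- c("A", "G")
  differs <- a != b
  same_class <- (a %in% purine) == (b %in% purine)

  P <- mean(differs & same_class)    # proportion of transitions
  Q <- mean(differs & !same_class)   # proportion of transversions
  -0.5 * log((1 - 2 * P - Q) * sqrt(1 - 2 * Q))
}

# Hypothetical example: one transition (A/G) and one transversion (C/A) in 10 sites.
d_k80 <- k80_distance("AACCGGTTAC", "GACCGGTTAA")
print(round(d_k80, 4))
```

    The uncorrected p-distance for this pair is 0.2; the K80 value is slightly larger because it corrects for multiple substitutions at the same site.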

  • Claudia Pätzold added an answer:
    Can chloroplast regions with relatively few informative characters in one species show many more informative characters in other species?

    Dear all,

    It is probably widely known that chloroplast regions can be ranked by the number of informative characters they contain (Fig. 4 in American Journal of Botany 94(3): 275–288. 2007).

    Recently, I planned to look for genetic variation between two subspecies. I sequenced some chloroplast regions with relatively many informative characters, such as rpl32-trnL, trnQ-5'rps16, psbJ-petA, 3'rps16-5'trnK and atpI-atpH, and other high-ranked regions.

    But only one indel and one substitution were detected, in rpl32-trnL and trnQ-5'rps16, respectively. I feel it may not be easy to detect enough variation even if more regions are sequenced. So, I plan to use other species.

    But I have heard that, in some plant species, the ranking of potentially informative characters does not always match Fig. 4 in the paper mentioned above. Some low-ranked regions may show far more informative sites than high-ranked regions. If so, I will sequence more regions to find informative sites.

    So, would anyone like to share their experience in primer screening? Is it possible for a low-ranked region to show far more informative sites than high-ranked regions?

    Thanks in advance.

    Claudia Pätzold

    Hi Yue Li,

    It is probably a good idea to go with a nuclear single- or low-copy gene. However, if you are still interested in an improved chloroplast data set (e.g. for comparison with a nuclear region to search for indications of hybridization events), I would recommend a look at a paper series called "The Tortoise and the Hare", in which the relative informativeness of up to 21 noncoding cp regions was compared across different angiosperm families. Maybe you will find a region with higher variability in there.

    Good luck to you

  • Biswajit Bose added an answer:
    Can I differentiate between closely related plant species?

    Could anybody kindly suggest highly variable markers for discriminating closely related plant species or for identification at lower taxonomic levels? Can DNA barcoding work at that level? I have already used rbcLa, matK, ITS2, trnL-F and psbA-trnH, but they were not able to discriminate. What should I use now? Any other highly variable intergenic spacers, NGS or RAD-seq?

    Biswajit Bose

    Yes Mark, I want to do transcriptome sequencing... But how do I do that?

    I am working on the genera Nardostachys and Valeriana from the family Caprifoliaceae.

  • Birutė Frercks added an answer:
    Does anyone have an idea about the phylogenetic analysis of SSR allelic data?

    I have a project assessing the genetic diversity of breeds of an animal using SSR markers. I have the allelic data from GeneMapper v.3.7, but I have no idea how to plot a dendrogram showing the genetic distances, or which software I can use to calculate the cophenetic coefficient for my data. Can anyone guide me through this?

    I have also attached an excel file containing the allelic data.

    Thank you

    Birutė Frercks
    • If you have the SSR data as a binary matrix (i.e. your organism is not diploid), then I suggest using the Treecon software (as I wrote in my last comment).
    • If you have SSR data for diploids, then you don't need to convert them to a binary matrix; just normalize the GeneMapper data and use the PowerMarker software.
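
    As an alternative to the programs above, the dendrogram and the cophenetic correlation coefficient asked about in the question can also be obtained in base R once a genetic distance matrix is available. This is a minimal sketch: the four breeds and their distances are hypothetical, and UPGMA (average linkage) is just one common choice of clustering method.

```r
# Hypothetical genetic distances among four breeds.
breeds <- c("B1", "B2", "B3", "B4")
d <- matrix(c(0.00, 0.20, 0.50, 0.55,
              0.20, 0.00, 0.45, 0.60,
              0.50, 0.45, 0.00, 0.30,
              0.55, 0.60, 0.30, 0.00),
            nrow = 4, dimnames = list(breeds, breeds))
d <- as.dist(d)

# UPGMA dendrogram.
upgma <- hclust(d, method = "average")

# Cophenetic correlation: agreement between the original distances
# and the distances implied by the dendrogram.
ccc <- cor(d, cophenetic(upgma))
print(round(ccc, 3))
```

    Calling plot(upgma) draws the dendrogram. Values of the coefficient close to 1 indicate that the tree distorts the original distances only slightly.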
  • Ignazio Avella added an answer:
    What's the best nuclear marker to show evidence of hybridization and/or introgression in two congeneric snake species?

    I'll probably use Rag-1 or C-mos, but I don't know which one of these two is the best.

    Ignazio Avella

    Thank you everybody!

  • Basil V. Iannone III added an answer:
    Any advice on phylogenetic species evenness and phylogenetic clustering (sensu Helmus 2007) in bacterial communities?

    I calculated PSE and PSC in bacterial freshwater communities and get very low values for PSE (< 0.2) but consistently high values for PSC (>0.8). How does that fit together? Shouldn't low tip clustering correlate with higher phylogenetic evenness? Admittedly, species abundance is often quite uneven, which drags down PSE, but PSV is also low (<0.5).

    Also: does anyone know of some studies that used these metrics in bacterial communities in order to get an idea of the range of values that are normally found?

    Basil V. Iannone III

    I apologize.  In my prior post, rho should equal -0.65 and not positive 0.65.  Thanks again for any thoughts.

  • Mahmoud Magdy Elmosallamy added an answer:
    Can anyone tell me why I am getting an empty result file in Structure?
    Hi everyone. The Structure program itself runs smoothly. I scored the data as 0 and 1 and ran the analysis, but the result files are empty at the end of the simulation. I tried both Windows 7 and 8 with Structure versions 2.3.3 and 2.3.4, but the problem is still there. I prepared the data file as a binary matrix, with the species assumed to be haploid in the input format when loading it into Structure. The program was set with a burn-in period of 20,000 and 100,000 MCMC repetitions, using the admixture model with correlated allele frequencies and assuming a set of 10 populations. I set K=2 to K=10; the program ran smoothly, but I could not get a result file even after many changes. One more thing: I am getting Ln Like = --- and Ln PD = 0. Is this the reason for the missing results? And if so, how can I fix it? Kindly suggest exactly what needs to be done.
    Mahmoud Magdy Elmosallamy

    Right click on the program icon => run as administrator. This should be an alternative to changing the program directory.

    Best wishes

  • Ursula Torres added an answer:
    Ecological niche modeling in freshwater fishes: What is the best way to use ENMs to predict fish distributions? How can we make better data layers?
    Ecological niche modeling is increasingly important for understanding the factors that shape species distributions, as well as for testing biogeographical hypotheses about species' past, present, and future distributions and the role of ecology in speciation. However, most niche modeling work has focused on terrestrial and marine species (sounds like conservation biology in general?). I have previously used MAXENT to develop and project models of fish distributions, and the models we have published exhibited excellent predictive performance. I am very interested in continuing to do so, particularly by coupling ENMs with phylogeographic analyses and/or using them to inform phylogeographic hypothesis testing. However, I am skeptical of all models to some degree, and I would like to learn whether techniques other than MAXENT (which is obviously the most convenient for me) exist that would be more suitable for freshwater fish ecological niche modeling and paleoclimatic modeling. I am also interested in what the best data layers are for ENM analyses of freshwater habitats. I always want to learn more about these topics, so I figured I would ask here.

    So, first, do such 'better' ENM models exist that could/should be used instead of or in combination with MAXENT? And, if so, what is required to run such other models, and how would the assumptions of these potentially 'better' models differ from those of MAXENT in different cases?

    Second, it seems that a limitation of ecological niche modeling for freshwater taxa is a lack of sufficiently high resolution data layers for aquatic habitats. However, I am unsure about geospatial data repositories or resources for generating more suitable layers, and I would like specific advice about GIS procedures and data layers for making better data coverages. I am aware that some people are already doing this, but usually at very fine spatial scales. The broader community of ecologists and evolutionary biologists interested in fishes would therefore benefit much more from more comprehensive coverages.

    FYI, I should indicate that I am not really interested in using masks over bioclimatic variables to restrict model output to the boundaries of stream and river networks, because many pilot analyses I have run on North American species suggest this does not add much or produce different results relative to running the models without such masks. So, I would prefer to avoid such discussion unless you know or can show me that doing so improves model performance. Thanks in advance for your replies. Take care.
    Ursula Torres

    Hi Justin,

    I agree with you: we are lacking global datasets for freshwater habitats, and I feel that they are a bit ignored compared to terrestrial habitats.

    I am really interested to know if you managed to find ANY data layers for aquatic habitats.

    Thanks in advance


  • Valiollah Yousefi added an answer:
    Do you have any experience or any papers about the quartet method?

    The quartet method is one of the methods for molecular clock analysis. It has some advantages over other methods and some weaknesses relative to them. I am therefore looking into the strengths and weaknesses of the quartet method; if you have any experience or papers, please share them.

    Valiollah Yousefi

    These papers are useful for your work.


  • Caner Aktas added an answer:
    Is there a convenient software package for drawing minimum spanning networks for phylogenetic studies?
    I have struggled to do this with PowerPoint, but I am sure that there must be simpler, better, more efficient software for this purpose. Any suggestions? How have others done this? This is a good example:
    Caner Aktas

    My pleasure, feel free to ask further questions.

  • Robert A D Cameron added an answer:
    Why are common ancestors extinct?
    Look at any phylogenetic tree and you will find internal nodes. However, do the common ancestors that occupy those internal nodes exist in the real world? Sometimes we come across fossils of missing links. But why are the common ancestors of the missing links mostly extinct?
    Robert A D Cameron

    When we deal with terminal branches, species alive now, the question seems to me to be a matter of definition only. An ancestral species may split into two or more lineages over time. We can separate the lineages, but since we do not have the relevant data for the actual ancestor, we do not know to what extent each has changed. It is likely that the various lineages have changed to different extents, as well as in different ways, relative to their common ancestor, especially if selection has influenced which changes persist. Whether you call any terminal lineage "ancestral" depends on your assessment of the amount of change that has happened, and on your sense of how much (or how little) change permits you to regard forms as being the same or different species. And you don't have real data for the actual ancestor!

  • Alejandro Gonzalez-Voyer added an answer:
    How does one interpret the output from a 'PGLS-Relationships' analysis in Compare 4.6b?

    The output includes results for PGLS at a given alpha value, Felsenstein's independent contrasts and TIPS (where phylogeny is not accounted for). In my case, mean PGLS carries high alpha values and both FIC and TIPS have alpha values of 0. Should I simply focus on the PGLS and treat the others as hypothetical scenarios? Should I use the PGLS alpha as my guide for acceptance/rejection of the other two? Or alternatively, should trust be given to the set with the highest likelihood? Also am I correct in stating that R^2 measures the extent to which phylogeny accounts for the observed correlation?

  • Olutolani Smith added an answer:
    Does anyone know a software to carry out rarefaction analyses on microsatellite data sets?

    I would like to calculate and compare allelic richness (or similar parameters) of 10 populations with different sample sizes. I plan to use a rarefaction method to adjust the estimates for the different sample sizes. My dataset consists of codominant markers (6 microsatellite loci with more than 20 alleles per locus). Can anyone suggest a program for this purpose? Thanks.

    Olutolani Smith

    ADZE also works well and can handle batch processing of multiple files:
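
    To make clear what such programs compute, here is a minimal base-R sketch of rarefied allelic richness at a single locus. It uses the standard hypergeometric rarefaction formula: the expected number of alleles in a subsample of g gene copies is the sum over alleles of 1 - C(N - N_i, g) / C(N, g), where N_i is the count of allele i and N the total number of gene copies. The allele counts are hypothetical, and the function is not a substitute for the dedicated software mentioned above.

```r
# Expected number of distinct alleles in a random subsample of g gene copies.
rarefied_richness <- function(allele_counts, g) {
  N <- sum(allele_counts)
  stopifnot(g <= N)
  sum(1 - choose(N - allele_counts, g) / choose(N, g))
}

# Hypothetical allele counts at one locus in two populations.
pop_small <- c(a1 = 5, a2 = 5)             # 10 gene copies
pop_large <- c(a1 = 20, a2 = 15, a3 = 5)   # 40 gene copies

# Rarefy both populations to the size of the smallest sample.
g <- sum(pop_small)
richness <- sapply(list(small = pop_small, large = pop_large),
                   rarefied_richness, g = g)
print(richness)
```

    For diploid microsatellite data, the number of gene copies is twice the number of genotyped individuals, and g is usually set to the smallest sample across populations.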

  • Don Anushka Sandaruwan Elvitigala added an answer:
    What is a possible reason for significant sequence similarity between evolutionarily distant proteins?

    I generated a phylogenetic reconstruction of protein homologs belonging to different taxa using the NJ method with 5000 bootstrap replicates. However, two sequences that showed relatively high sequence similarity (obtained with the MatGAT software) clustered separately in the tree, while each clustered closely with less similar sequences. Is there any reasonable explanation for this type of observation?

    Don Anushka Sandaruwan Elvitigala

    Thanks all for joining in the discussion. It's really helpful to find new ways to look at the scenario....

  • Ferruccio Maltagliati added an answer:
    How does BEAST determine the length of x-axis in mtDNA-based Bayesian Skyline Plot analysis?

    I have a question concerning the time scale (x-axis) of Bayesian Skyline Plot (BSP) analysis as implemented in BEAST. I am not very familiar with the model underlying this analysis.

    I ran BSP analyses on three different datasets (one per population) for the mitochondrial COI (537 bp). The first population included 97 sequences, the second 167 sequences, and the third 48 sequences. The programs (BEAUti + BEAST + Tracer) produced three BSPs with different time scales: first population, 0 to 450 Kya; second population, 0 to 1800 Kya; third population, 0 to 350 Kya. My question is: how is the oldest value on the x-axis obtained? It seems related to the number of sequences in the original dataset. Is it? If so, what is the relationship between the number of sequences and the length of the x-axis (time) of the BSP?

    Any suggestion will be greatly appreciated.



    Ferruccio Maltagliati

    The paper by Atkinson et al (Proc Roy Soc B, 276:367-373, 2009) shed light on the problem addressed in my question. The answer is that Bayesian Skyline Plots are truncated to the median estimate of each population's TMRCA. Problem fixed! Thanks!

  • Yue Li added an answer:
    What is the difference between species-specific loci and the loci under positive selection?

    Hi colleagues,

    I plan to look for genetic differentiation between two recently diverged lineages (maybe sister lineages) using nuclear markers. Can anyone tell me what the difference is between species-specific loci and loci under positive selection?


    Yue Li

    Hi Pablo,

    Thank you for your kind response. I think it is an interesting article.

  • Michael Cunningham added an answer:
    In phylogenetic studies of snakes, besides Cytochrome b, which sequence is most appropriate, ND1, ND2, ND3 or ND4?

    In phylogenetic studies of snakes, different sequences are used, including cytochrome b together with another sequence such as ND1 or ND2. Besides cytochrome b, which sequence is most appropriate: ND1, ND2, ND3 or ND4?

    Michael Cunningham

    The mitochondrial coding genes are all similar in terms of phylogenetic signal and mutational pattern. ND4 has greater representation in GenBank, although ND2 has also been used extensively.

  • Jenő Nagy added an answer:
    How do you add a missing species into a phylogenetic tree for a PGLS analysis?

    I have a phylogenetic tree in Newick format for species from two sister genera. I would like to use this tree to do a PGLS analysis on some physiological traits I have measured. However, one of my 13 study species is not included in the published phylogeny. Is there a way to add it in manually by estimating branch length somehow and rooting it on a common ancestor with the other 3 species I have from the same genus?
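
    One way this is sometimes handled is sketched below with phytools' add.species.to.genus, which attaches a new tip to the clade formed by its congeners. The Newick tree and species names are hypothetical, and the placement (at the most recent common ancestor of the genus, with where = "root") is an assumption about the missing species rather than an estimate from data, so the sensitivity of the PGLS results to that placement should be checked.

```r
library(ape)
library(phytools)

# Hypothetical ultrametric tree: two species of genus A, three of genus B.
tree <- read.tree(text =
  "((A_one:1,A_two:1):1,(B_one:1.5,(B_two:1,B_three:1):0.5):0.5);")

# Attach the missing species at the MRCA of its congeners (genus B).
# The genus is read from the part of the name before the underscore.
new_tree <- add.species.to.genus(tree, "B_four", where = "root")

print(new_tree$tip.label)
write.tree(new_tree)   # Newick string of the augmented tree
```

    Using where = "random" instead places the new tip at a random position within the genus, which can be repeated many times to see how much the downstream analysis depends on the placement.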

About Phylogeography and Phylogenetic Biogeography

Study of taxa relationships and their distributions
