Functional Genomics - Science topic
Questions related to Functional Genomics
I want to learn how to work with these databases. I do not know how to learn them step by step, and there is no instructional video.
Thanks for all your help.
There are very few publications in which genes identified through GWAS have been functionally characterized (cloning, overexpression, silencing, etc.). Most publications on functional characterization instead concern genes identified through transcriptome studies. Why is this? Is GWAS actually useful for crop improvement? If so, could you give some examples of successful publications?
I need to compare the genomes of human gut microbes with the human genome to identify functional similarity. Is there an online tool for comparing genomes between different species?
How do I find candidate genes whose role can be validated through functional genomics experiments such as cloning, transgenics, overexpression, localization, and interaction with other proteins and DNA?
1) Do we need to study a lot of literature and see which genes' roles have not been deciphered for a particular trait, e.g. drought stress?
2) Do we need to perform our own transcriptome or comparative genomic studies, or analyze already published studies from the literature?
3) Do we need to perform our own marker-trait association (QTL) study, or use already published studies?
4) Some people functionally characterize already known genes (say, from Arabidopsis) in a plant of their interest (e.g. legumes). But is that a significant or novel research problem to work on?
5) All of the above.
I am looking to employ a good CRISPR/Cas9 system with a selectable marker for functional genomics of important genes from environmental fungi and yeast.
I have several ORFs whose function I am trying to determine. I am wondering what the newest available tools are, or whether there are any obscure tools I could use. I have already tried the following tools: BLASTP, InterProScan, PredictProtein, I-TASSER, DeepGOPlus.
Any help or advice would be much appreciated.
I ran an amino acid sequence through BLASTP and received multiple partial matches (multiple helix-turn-helix domains, multiple DNA-binding regions, multiple replication protein matches), all with low expect values. However, when I search the same amino acid sequence with InterProScan I receive no results. Why would this be the case?
Hi all, I want to download a gene sequence from the Genome Browser, but I am not sure about the direction of the sequence I get. Does it always show the coding-strand sequence of a gene 5' to 3', from left to right?
In NCBI, we can see whether a specific gene is on the plus or minus strand of the chromosome, and so can choose whether or not to select "showing reverse complement" depending on the direction. I wonder if there is a similar function in the Genome Browser?
Thank you for answering!
It's known that the yeast Saccharomyces cerevisiae does not have P450 enzymes required for biotransformation of exogenous compounds.
However, many pharmacological and toxicological studies utilize yeast as a model organism.
Does anyone have an idea on how yeast cells metabolize these chemical agents before they exert their biological activity?
I would like to perform gene diversity analysis on bacterial whole-genome data sets to study the ecological role and function of the genes. This will be a comparative study of different bacterial genomes, to be published as a research review article. So, can I download and use genome data sets submitted to GenBank by someone else? Is it legal to use those data sets for my analysis?
Valuable suggestions and comments are welcome.
Thanks in advance.
I have performed a codon-based Z-test (MEGA) on a protein-coding gene alignment from 30 species to compare dN-dS values, used as a measure of selective pressure indicating neutrality (dN/dS = 1), positive selection (dN/dS > 1) or purifying selection (dN/dS < 1). My Z-test rejected neutrality and purifying selection and indicated positive selection with a significant p-value (p < 0.05). I was further interested in finding which codons are under positive selection, so I used HyPhy and various tools within it to find the sites. But HyPhy found no evidence of positive selection at individual codon positions. I also used BUSTED to test for gene-wide episodic diversifying selection, finding no signs, and FUBAR on the alignment to identify sites that have experienced pervasive diversification over the entire phylogenetic tree. This also gave negative results. Then I ran aBSREL to test for episodic diversification, which gave strong evidence of episodic diversifying selection on one characteristic branch of the tree. Could you give a reason why the codon-based Z-test indicates positive selection while the other tests do not, instead pointing to episodic diversifying selection on one branch of the tree, and what this could mean?
I want to be able, in some quantitative way, to compare several plant species with respect to how much is known about the functions of all known loci in their genomes.
It seems like looking at the tags for each locus in different functional annotation or protein databases (i.e. Pfam, PANTHER, KOG, GO BP) might just be reflective of the species' evolutionary relationship to Arabidopsis, for which most of what we know about plant protein function has been determined. Are there any publicly available datasets that I could use to approximate the sum of all functional work done in each plant that would be comparable across species? How could I analyze them to roughly answer the question: "what proportion of the genome has been functionally characterized?"
How can the same modification of the standard genetic code happen in two organelle genomes (chloroplasts and mitochondria) and in the same organism?
Hi, I am characterizing a transcriptional regulator of Mycobacterium tuberculosis.
Because we don't have a facility for M. tuberculosis culture, I am trying to overexpress it in M. smegmatis (using pVV16, which has the Hsp60 promoter), but I never get colonies even though the clone and vector backbone are fine. So I thought overexpression might be toxic to M. smegmatis. To check its toxicity I made frameshift mutations and found that its overexpression is not lethal.
However, ChIP-seq data for the same protein are available in the MTB Network Portal. Going through their publication in NAR, I realised they overexpressed the corresponding transcription factors in M. tuberculosis using a Tet-inducible system and did ChIP-seq analysis.
My question is: is overexpression of my protein toxic to M. smegmatis but not to M. tuberculosis, or am I simply doing something wrong?
How can I get an overexpression strain of M. smegmatis?
Functional genomics is a field of molecular biology.
What are the common and uncommon tools between them?
I am studying secondary metabolite biosynthesis gene clusters in fungal whole-genome sequences (specifically basidiomycetes). antiSMASH and SMURF have identified several terpene, PKS and NRPS clusters but did not provide any putative end products for such clusters. NaPDoS, however, specified some end products, and one of them (epothilone) was confirmed through analytical studies. When the biosynthesis cluster of epothilone was compared using local BLAST, it showed almost 2 kb of similarity out of the 80 kb epothilone cluster. However, the 2 kb is not a continuous sequence (segments average 50 bp); it is scattered among all the PKS clusters identified by antiSMASH and SMURF. It may be noted that the biosynthesis gene cluster of epothilone has to date been found only in bacteria.
Is scattered sequence similarity relevant? Am I on the right track in searching for sequence similarity across different kingdoms? The problem is similar for several other clusters. Suggestions and comments will be highly appreciated.
Is there a reference file (bed) for enhancer regions in the mouse genome (mm9) ?
I want to clone an enhancer that has been moved upstream of a regulatory gene by a translocation. How can I do that? Which protocol should I follow?
I could see the upregulation of the gene by RNA sequencing, now I need to figure out which enhancer is the responsible one for this upregulation.
I want to check the similarity of the 5′-UTR regions of a list of genes across different species. Could anybody recommend an efficient way to do it, please?
I'm confused whether UCSC includes the promoter region or starts at the TSS
Does anyone know where I can download CGAP (a new comprehensive platform for the comparative analysis of chloroplast genomes)? The link in their paper is broken.
I am working on S. aureus genomics. I have sequenced a genome on the Illumina NextSeq 500 platform.
I have the raw sequence files generated by the NextSeq 500.
I am looking for a workflow for genome analysis and comparative analysis.
Otherwise, please suggest tools and servers.
I have a list of genes ranked according to their expression. I want to know which genes encode transcription factors. I cannot search one by one, because there are too many genes. Any suggestions for finding genes encoding transcription factors? Any tools? Thanks.
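One way to avoid searching gene by gene is to intersect the ranked list with a curated list of transcription factor symbols. The sketch below assumes such a list has already been downloaded from a TF catalogue for the species in question; the function name and the example symbols are illustrative only.

```python
def select_transcription_factors(ranked_genes, tf_symbols):
    """Return the genes from ranked_genes that appear in tf_symbols, keeping rank order."""
    # Compare case-insensitively, since symbol capitalisation varies between sources.
    tf_set = {symbol.strip().upper() for symbol in tf_symbols}
    return [gene for gene in ranked_genes if gene.strip().upper() in tf_set]


# Hypothetical example data; a real TF list would come from a curated catalogue.
ranked = ["ACTB", "MYC", "GAPDH", "SOX2", "TP53"]
known_tfs = ["TP53", "SOX2", "MYC", "GATA1"]
tf_hits = select_transcription_factors(ranked, known_tfs)
```

Because the result keeps the original order, the expression ranking is preserved among the transcription factors that are found. Matching on symbols is fragile when aliases are involved, so stable gene identifiers are a safer key where both lists provide them.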
I have RNA-seq data for testis and ovary samples of two Drosophila species. I don't have biological or experimental replicates. How do I quantify differential expression in this case? I used reads from the species X and species Y testis data to build one assembly with Trinity, and aligned the raw reads of both species for abundance estimation using RSEM. I then used edgeR to perform differential expression analysis; edgeR did pairwise comparisons of the samples. Is this the right approach? From the literature it appears this is the only possible way to compare and quantify genes if replicates are not available. Any suggestions?
The 'Viral Informatics Resources for Metagenome Exploration (VIROME)' uses the CD-HIT-454 algorithm during its screening. Does this mean that the portal is more useful when the data set was generated on a 454 platform?
I am trying to detect differentially expressed genes using human RNA-seq data, and I don't care about alternative splicing and so on right now.
Could I just use TopHat to map my reads and Cuffdiff to process the BAM file and get the DEGs?
Or must I follow the TopHat-Cufflinks-Cuffmerge-Cuffdiff pipeline strictly?
I have got the GTF file from UCSC and iGenomes.
I am currently deciding whether to use DNA microarray or RNA-seq for my project. I am interested in studying mechanosensitive genes: genes whose expression level changes upon mechanical stimuli. I was told that RNA-seq can identify unknown transcripts, while DNA microarrays are limited to probes for known sequences only.
My question is: aren't all the DNA sequences of human genes known? What do these unknown transcripts refer to?
Why are segmented genomes (bipartite, tripartite, etc.) not seen in double-stranded DNA plant viruses? Is there any unique reason for this?
This article represents the beautiful research you all do. Thanks Pat. You have amassed the strongest and creative minds around and this article does a great duty to them and itself by occupying the space and depth of the research. From cover to end we follow the drama of life. The figures are a great way to give the reader a progression from top to bottom.
Here the familia extensa circumdederunt plays out armchair style. First the cover art sets the scene; Ancient Andes fauna where eons of the universe have played out, and we tell the story of one little flower and the marriage of life. From the start we were intrigued by the question of speciation, and asked: can we capture the meaning of this by exploring biological and genetic mechanisms at play in the reproduction of Solanum pennellii? The physical space this organism occupies is the isolating and chaotic habitat of the Atacama desert. A dry niche. However rife with physical barriers to survival, this organism is a successful example of the Atacama experience. Bright ornaments are on display seeking and listening for a link... Action occurs but it's not what was expected. Through a market of energetic exchange a message was delivered.... But that's only the first space, the physical. As biologists, read on. So many messages: what do they say, what is the genetic space? Some messages of unknown content, others very familiar, and even still those we wrote. The flux of life is photographed and the courtship is a puzzled plot, answering some questions and raising more. Life is a struggle with one's own self (SI), and that of another (UI). None other than the true recipient and that of the original writer may know for certain, but we've opened the envelope and have peeked inside, inside the drama of life.
I recently downloaded the histone modifications BAM and broadPeak files for GM12878 cells from UCSC (ENCODE histone modifications > Broad histone) from the link provided below:
However, the 'Additional Details' column for some of the files states origAssembly=hg18 while it is hg19 for the others, yet the 'Alignment' sub-section in the 'Methods' section at the bottom of the page says GRCh37/hg19.
So my question is which human genome assembly (hg18 or hg19) was used for generating these files?
I have an interval.list file with columns including chr, start, end, gene_name, TSS and Strand, as shown below:
chr19 58864565 58865165 A1BG 58864865 -
chr19 58863035 58863635 A1BG-AS1 58863335 +
chr10 52645135 52645735 A1CF 52645435 -
If any variant falls into a listed interval, how can I add the gene name, TSS and strand information for that interval to the variant's record in the VCF?
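A minimal sketch of this kind of annotation in plain Python, assuming the interval file is whitespace-separated with the six columns listed in the question, that its coordinates are 1-based and inclusive, and that the INFO keys `GENE`, `TSS` and `STRAND` are free to use. These key names and both function names are illustrative and not part of any standard. For large files, dedicated tools such as bcftools annotate or bedtools intersect are the usual route.

```python
from collections import defaultdict


def load_intervals(lines):
    """Parse rows of: chr start end gene_name TSS strand (whitespace separated)."""
    by_chrom = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        chrom, start, end, gene, tss, strand = line.split()
        by_chrom[chrom].append((int(start), int(end), gene, tss, strand))
    return by_chrom


def annotate_record(vcf_line, intervals):
    """Add GENE, TSS and STRAND keys to INFO if POS lies inside an interval."""
    line = vcf_line.rstrip("\n")
    if line.startswith("#"):
        return line  # header lines pass through unchanged
    fields = line.split("\t")
    chrom, pos = fields[0], int(fields[1])
    # Assumption: interval coordinates are 1-based and inclusive, like VCF POS.
    hits = [iv for iv in intervals.get(chrom, []) if iv[0] <= pos <= iv[1]]
    if not hits:
        return line
    # Several overlapping intervals become comma-separated values per key.
    extra = ";".join([
        "GENE=" + ",".join(h[2] for h in hits),
        "TSS=" + ",".join(h[3] for h in hits),
        "STRAND=" + ",".join(h[4] for h in hits),
    ])
    # INFO is the 8th VCF column; "." marks an empty INFO field.
    fields[7] = extra if fields[7] in (".", "") else fields[7] + ";" + extra
    return "\t".join(fields)
```

For a strictly valid VCF, matching `##INFO` definition lines for the three new keys would also need to be added to the header; the sketch leaves the header untouched.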
I want to study the expression of certain histone deacetylase (HDAC) genes and the expression of two pro-inflammatory genes – RIPK2 and COX2. Please let me know the best tools and methods for the same.
Hi, I have also worked on exon regions of candidate genes in sugarcane and prepared a manuscript. Would you be available to review my article?
I need a BAC that spans the entire human FLI1 locus. There is one annotated on the UCSC genome browser but, as bad luck would have it, this clone is no longer available. How can I get a BAC that spans a ~200-kb interval of the FLI1 locus?
Dear RG users,
I know that a series of genes are expressed in the same tissue of a eukaryotic organism. These genes map in close proximity in the genome, that is, they lie hundreds of base pairs apart on the same chromosome. My question is: is it possible to predict whether all of the genes, or at least some of them, are under the control of the same promoter or are expressed as a polycistronic mRNA? Any advice will be highly appreciated.
I am doing drug discovery work related to DNA methylation levels in cancer cells. I wonder if there is a new and good assay that allows me to efficiently track DNA methylation status at a specific promoter?
I know that I can do bisulfite sequencing, but it's not fast and I cannot track many compounds. I wonder if there are newer assays out there that provide high-throughput capacity?
I find it confusing how super-enhancers can be defined in practice. 'Standards' have been set to define mouse embryonic stem cell super-enhancers using specific transcription factors. Are there any more general markers (like the enhancer marks H3K27ac or H3K4me1 and H3K4me2) that can be used to define super-enhancers genome-wide in all cell types?
I want to develop a research proposal to investigate possible hybridization in a bird species. I also need to understand the direction of gene flow. If anybody knows the correct procedure for this, or any reference that I can use, please let me know.
Which molecular techniques can tell us about the functional quality of a plant? For example, what if the plant is economically important for its timber yield?
I want to explore candidate transcription factors for a human gene promoter. Suppose I do not know of any possible factor. How can I find a candidate transcription factor for the promoter? What are the methods? Is it done using software only, or is there a manual method through which I can explore my query?
Please share, as I am new to this field of cellular biology.
For example, how can I find it on NCBI?
I've been wondering about gene model quality coming out of eukaryotic genome sequencing projects. Could anyone direct me to some good reviews on gene model prediction approaches and resulting prediction quality metrics? In short: what makes a eukaryotic gene model a good model?
Identifying bacteria present in healthy versus disease conditions in a given biological sample
Illumina HiSeq 2000 and Illumina HiSeq 2500
Taxonomy, genetics, genome and something-else.
A Drosophila testis transcriptome sample shows a very high number of transcripts, 26-29 thousand.
1. Can one expect such a high transcript count from a testis sample?
2. After de novo assembly, we see that the number of annotated transcripts showing homology with known genes of Drosophila melanogaster is around 8000. The others show no homology. What about the remaining transcripts? This is too large a number to be neglected.
How do I choose a genome to be used as a reference if I want to order the contigs once they are assembled? I know that it is supposed to be the most closely related one. But in my case, for example, I am sequencing a clinical isolate of Klebsiella pneumoniae and I have no information about this bacterium or its most closely related species. On what basis, therefore, do I choose the reference that is most similar to my bacterium?
I am looking for genes that have altered expression levels. When I use the GEO database I find multiple results for the same gene within one dataset (for example MCM4 in dataset "GSE9750"). In this example MCM4-results appear 4 times: 2x MCM4 is significantly elevated and 2x MCM4 shows no significant elevation. I'm wondering why one dataset contains multiple results for the same gene that also appear to contradict each other.
I appreciate your help!
KBM-7 cells are a suspension myeloid cell line that are near-haploid. They are often used for gene-trap mutagenesis screens, or other functional genomic screens.
Can anybody suggest an appropriate tool for aligning and assembling 574 sequences (68 Mb of data) together?
I am planning to overexpress a transcription factor (let's say transcription factor X) in INS1 cells and at the same time knock down a receptor protein in the same cells. For overexpression Lipofectamine LTX Plus will be used, and for knockdown Lipofectamine RNAiMAX. My protocol will be: first, transcription factor X overexpression will be performed; after 4 hours the media will be replaced with complete media, knockdown of the second gene will be done, and the cells will be incubated for 48 h. Do you think this procedure will work? Constructive criticism and suggestions are welcome.
Can someone tell me whether the Eppendorf biospectrophotometer with a micro cuvette is really better than the NanoDrop for nucleic acid quantification, even at low concentrations?
I am new to ChIP data and methylation data. If I am given methylation data and RNA-seq data, how can I find out whether a given TF has affinity for a given methylated promoter, or whether the TF will not bind to the methylated site?
To find this out, do I first need information on the binding affinities of the different transcription factors, or will processing the RNA-seq and methylation data itself give me this information?
I have a 110-amino-acid sequence corresponding to a functional transporter. Which online tools and approaches could I use to confirm the structure, function, or any other aspect of this stretch of amino acids? Can a 110-amino-acid sequence be a transporter? Please give your valuable suggestions.
I have amplified a functional gene from my environmental samples and have already done the cloning and sequencing. I'm trying to cluster similar genes and came across two commonly used terms, but I do not know the difference between them, as essentially both cluster similar genes into one consensus gene, right? Can anybody explain the difference between the two?
Classical definition of SPECIFICATION says:
"cell type is not yet determined and any bias the cell has toward a certain fate can be reversed or transformed to another fate"
Do you agree with this definition?
If yes, can anyone tell me how we can quantify the difference between SPECIFICATION and DETERMINATION in an experiment that has perfect positive and negative controls and a perfect null hypothesis to test this concept?
If anyone has come across a paper that specifically tests this question, please tell me about it.
We have many samples and so want to set up 3 plates with an eBioscience kit, and we have the Bio-Rad machine and software. The 1st plate has standards and samples and the other 2 have samples only (we assume that if we prepare all at once, there should be minimal variation between plates). Can the software handle this, or do we always need to run a standard curve on each plate?
The information encoded in DNA is more complex than previously thought:
• Alternative promoters.
• Products of RNA splicing.
• Non-coding repetitive DNA.
• Separate gene products from overlapping DNA sequences.
• Gene sense and antisense transcripts.
• Multiple gene products combine to form a functional protein.
• Trans-action of gene enhancers located on different chromosomes.
The question is whether or not there is already a definition that encompasses these facts?
How does "junk DNA" help in regulation of gene expression ?
In my lab, a target gene was knocked down by RNAi. RNAi and control samples were prepared separately to construct small RNA libraries that were deep sequenced on the Illumina platform (36 cycles). The sequencing results (performed in triplicate) showed a clear difference in the read length distribution between control and RNAi samples. Most reads from RNAi samples range from 10 to 17 nt, and many of them are rRNA, whereas reads from control samples are longer than 20 nt. What do these different distribution patterns mean? How can the high abundance of ribosomal RNA in the RNAi samples and its low abundance in controls be explained? Have you had similar results, or do you have advice on how to interpret these data? Do you know of any article discussing this? Please share your expertise and opinions.
As epigenetic studies in eukaryotes have been increasing significantly, I would like to ask whether bioinformaticians have created new databases or software to predict the epigenetic modifications and changes of a specific gene in comparison with homologous or orthologous genes. Existing resources such as the chromatin database (ChromDB: The Chromatin Database), EPIC https://www.plant-epigenome.org/links or even NCBI http://www.ncbi.nlm.nih.gov/epigenomics may have limited information and data on plant epigenetics. Or should we dig out this information by going back to the experimental assays?
Long read sequencing technologies are on the rise. Therefore it would be great to have a cost effective method for target selection using long DNA fragments.
Could molecular inversion probes be applied to capture long (>5kb) target fragments? Those would add the benefit that molecular indexing would be included.
What kind of (established?) capture technology would be best for such long fragments?
The complete genome of my standard bacterial strain is not available, and I have seen some people use a scaffold-level genome of this strain for RNA-seq. Is this really OK? Some parts of the sequence might be only roughly known, and some fragments might remain unmapped and be lost during mapping. What is your opinion?
Thank you in advance for your answers.
For example, if a tRNA gene's predicted "secondary structure score" is under 20 and a "cove score" is under 10 (applying tRNAscan-SE, Lowe, v. 1.21) could one assume this gene is fully functional? The gene is expressed as a typical clover leaf structure.
When we run a known protein in BLAST, we see many homologous proteins. Some hits have a protein name and an organism name, while others are labelled as hypothetical proteins with an organism name. My question is: when collecting homologous proteins, can we also collect the hypothetical proteins?
Recently, I obtained three paired-end RNA-seq libraries for transcriptome assembly using Trinity. The total read number is nearly 290M, while my server has only 64 GB of memory, which does not meet the requirement.
Could I do a transcriptome assembly for each library and then merge them later somehow? Or are there other tools better than Trinity in terms of memory efficiency?
The huntingtin gene has 67 exons. In which exon does the disease-causing variable number of CAG repeats occur?
I am scoring SSR markers for genetic diversity analysis in sugarcane genotypes and found some markers with more than 4 alleles, but when I analysed them I got negative values for PIC. Does anybody have a suggestion for me?
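A quick way to see how a negative PIC can arise is to compute it directly. The sketch below uses the commonly cited form PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2, which stays non-negative as long as the allele frequencies of a marker sum to 1. One possible cause of negative values, offered here as an assumption rather than a diagnosis, is that per-band presence frequencies from a polyploid such as sugarcane were entered as allele frequencies, so that they sum to more than 1. The function names are illustrative.

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphism information content for one marker (commonly cited form)."""
    sum_sq = sum(p * p for p in allele_freqs)
    # Cross term over all unordered allele pairs: 2 * p_i^2 * p_j^2.
    cross = sum(2 * p * p * q * q for p, q in combinations(allele_freqs, 2))
    return 1.0 - sum_sq - cross


def frequencies_are_valid(allele_freqs, tol=1e-6):
    """True if every frequency lies in [0, 1] and they sum to 1 within tol."""
    in_range = all(0.0 <= p <= 1.0 for p in allele_freqs)
    return in_range and abs(sum(allele_freqs) - 1.0) <= tol


balanced = pic([0.25, 0.25, 0.25, 0.25])   # frequencies sum to 1
band_like = pic([0.6, 0.6, 0.6, 0.6, 0.6])  # frequencies sum to 3, PIC goes negative
```

Running `frequencies_are_valid` on each marker before computing PIC flags the inputs for which the formula is not meaningful.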
Due to the absence of well-annotated gene boundaries, I trained a gene predictor without UTR training parameters. Under such conditions, what is the minimum acceptable accuracy of a gene predictor at the 'gene level'?
I have a very basic question. In genomic analysis we deal with coordinates, but I am a little confused about how the coordinate numbering system works for the positive and negative strands. I am aware of the 5'-3' orientations of the positive and negative strands, but can someone explain to me how coordinates are numbered for these strands? For example, if I have a nucleotide A at position 432990 on the positive strand of chromosome 11, will the same position 432990 correspond to the nucleotide T on the negative strand? Though I am not sure, I guess there are no separate coordinates for the negative strand and the coordinate always refers to the positive strand. However, I would like someone to give me a little more detail on this. Thanks.
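The convention guessed at in the question can be illustrated with a short sketch: coordinates are counted along the forward (plus) strand only, the base on the minus strand at a given coordinate is the complement of the forward base, and a minus-strand feature read 5' to 3' is the reverse complement of the forward-strand slice. The function names and the toy sequence are illustrative, and positions are assumed to be 1-based and inclusive.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def base_on_strand(forward_seq, pos, strand):
    """Base at 1-based forward-strand coordinate pos, as seen on the given strand."""
    base = forward_seq[pos - 1]
    return base if strand == "+" else base.translate(COMPLEMENT)


def feature_sequence(forward_seq, start, end, strand):
    """5'->3' sequence of a feature given 1-based inclusive forward coordinates."""
    sub = forward_seq[start - 1:end]
    # Minus-strand features are read right to left on the complementary strand.
    return sub if strand == "+" else sub.translate(COMPLEMENT)[::-1]


toy = "ACGTTG"
plus_base = base_on_strand(toy, 1, "+")         # base read from the forward strand
minus_base = base_on_strand(toy, 1, "-")        # its complement on the reverse strand
minus_feature = feature_sequence(toy, 2, 4, "-")  # reverse complement of positions 2-4
```

In this toy case the same coordinate names an A on the plus strand and a T on the minus strand, which matches the example in the question. A feature's start is always the smaller forward coordinate, even when the feature lies on the minus strand.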
I am interested in soil metagenomics and wish to do shotgun sequencing in order to see changes in populations' diversity and in functional genes. I want a library of 2.4 Gbp. I will start with ~200bp fragments.
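As a rough back-of-the-envelope check on the numbers above, the fragment count follows from dividing the target yield by the fragment length. The sketch assumes each ~200 bp fragment contributes its full length to the 2.4 Gbp total; adapter sequence, read length shorter than the fragment, and quality filtering would all raise the number needed. The function name is illustrative.

```python
import math


def fragments_needed(target_bp, fragment_bp):
    """Number of fragments whose combined length reaches target_bp."""
    return math.ceil(target_bp / fragment_bp)


# 2.4 Gbp target with ~200 bp fragments, as stated in the question.
n_fragments = fragments_needed(2_400_000_000, 200)
```

Under these assumptions the target corresponds to 12 million fragments.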
I am Juan Garcia, from the Marine Biology department at the University of Vienna. I have found a very interesting article, attached.
I am working with Nitrosopumilus genomes and I would like to create a figure similar to your figure S2, so I wonder if you would be so kind to help me with that.
I know it was done with Circos, but I am not sure which option was used to create one of the inner circles, specifically the one showing the regions of genomic plasticity compared to N. maritimus and C. symbiosum.
Is it possible that you used a circular stacked bar plot? I appreciate your help.
Thanks a lot for your attention
Are there kits available? If yes, which ones can you recommend?
Tumor genotype affects therapeutic outcome. I wonder if there is a way to modulate tumor genotypes to make them favorable to therapy.
I have an average total DNA quantity of 1.5 ul/ml in my vine DNA samples. In this case, how should I calculate a touchdown PCR reaction of 20-25 ul?
I want to investigate the functional genes of fungi responsible for N2O emission. Who can tell me how to do it? Many thanks in advance