Normalization - Science topic
Questions related to Normalization
Hi all,
my lab has been using the SequalPrep Normalisation Plates from Invitrogen for a couple of years now; however, we have never been able to get the expected concentration of 1-2 ng/ul when using a 20 ul elution volume. We usually get a concentration in the range of 0.2-0.8 ng/ul. The starting material we use is >250 ng of amplicon per well. We have contacted customer support before, but they could not explain why our concentration was so low.
I would just like to hear from others who have used these plates what your experiences were, and whether there are any tips or tricks for performing the normalisation?
Thanks.
Greetings. The question is probably not complex at all, but I can't find an answer.
I have RT-qPCR gene expression data for a sample with multiple analytical replicates. To compare it with data obtained in other experiments, I need to normalize the expression of the genes of interest to the expression of a reference gene (which is constitutively expressed).
How do I perform this when there are replicates and the expression of both the genes of interest and the reference gene is given as a mean expression value with a standard error of the mean (SEM)?
Is there a formula to adjust the GOI SEM using the RG SEM?
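A minimal sketch of one common way to handle this, assuming the GOI and reference-gene means are independent so that standard error propagation for a quotient applies (all numbers below are hypothetical):
# Propagate SEM when normalizing GOI expression to the reference gene
goi_mean <- 12.4; goi_sem <- 0.9   # gene of interest (hypothetical values)
ref_mean <- 3.1;  ref_sem <- 0.2   # reference gene (hypothetical values)
ratio     <- goi_mean / ref_mean
# relative errors add in quadrature for a ratio of independent quantities
ratio_sem <- ratio * sqrt((goi_sem / goi_mean)^2 + (ref_sem / ref_mean)^2)
c(normalized_expression = ratio, sem = ratio_sem)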
1. For the gene expression data (a microarray dataset extracted from the Gene Expression Omnibus (GEO) platform), which of the following normalisation techniques is best suited to handling outliers: quantile, log, z-score, …? I was following articles that normalised by combining quantile and log transformation, but when I check the dataset I am working on, there are outliers and the values are negatively skewed after normalisation. Is it normal to still have skewness after normalisation? If not, are there other ways to normalise the data without introducing skewness?
2. I was using the Student t-test and fold change values to identify the DEGs for two different cores; I ended up with 202 genes in total, of which 44 are common between the two cores. Is it normal to get some common differentially expressed genes for two different conditions? If not, what mistake might have occurred?
3. Is there a precise formula to calculate fold change values from gene expression values? There are plenty of formulas on the internet, so I am confused about which one to use.
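Regarding question 3, one widely used convention (hedged; not the only one) is that on log2-scale expression values the fold change is simply 2 raised to the difference of group means. A small R sketch with invented values:
# Hypothetical log2-scale expression of one gene in two cores
core1 <- c(7.9, 8.2, 8.0, 8.1)
core2 <- c(9.1, 9.4, 9.0, 9.3)
log2fc <- mean(core2) - mean(core1)   # log2 fold change = difference of log2 means
fc     <- 2^log2fc                    # linear-scale fold change
c(log2FC = log2fc, FC = fc)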
This might be basic, but I have a question about how to normalise a western blot when you have multiple controls.
I have some western blots using human samples and they go roughly like this:
Control, Control, disease 1, disease 1, disease 1, disease 2, disease 2, disease 2.
I have 2 controls on there and I want to normalise to HSP90 as a loading control, but as I have 2 control samples, do I take the average of these? And when I calculate fold change, is that again relative to the average of the 2 normalised controls?
Thanks
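A hedged sketch of the averaging approach described in the question, assuming band intensities have already been quantified (all numbers invented):
# Hypothetical densitometry values for the target protein and HSP90 (loading control)
target <- c(ctrl1 = 1.00, ctrl2 = 1.20, d1a = 2.10, d1b = 1.90, d1c = 2.30)
hsp90  <- c(ctrl1 = 0.95, ctrl2 = 1.05, d1a = 1.00, d1b = 0.90, d1c = 1.10)
loading_norm <- target / hsp90                           # normalise each lane to its HSP90 band
ctrl_mean    <- mean(loading_norm[c("ctrl1", "ctrl2")])  # average of the two normalised controls
fold_change  <- loading_norm / ctrl_mean                 # fold change vs. that average
round(fold_change, 2)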
Dear ResearchGate community,
I am fairly new to RNA-Seq analysis and wanted to ask for your input regarding accounting for different sequencing depths across my samples. I am aware that there are several normalization techniques (e.g. TMM) for this case; however, some of my samples have considerably higher sequencing depths than others. Specifically, my samples (30) range from 20M to 46M reads/sample in sequencing depth (single-end). Can I still normalize this using the tools provided in the various packages (DESeq2, limma etc.), or is it preferable to apply random subsampling of the fastq files prior to alignment (I am using kallisto)?
Many thanks in advance!
Best,
Luise
In data clustering, what is the best normalization method? And, what is the influence of each method on the results?
As is known, for AI/ML applications, operations such as data transformation/normalization are performed in the data preprocessing stage. Normalization aims to transform the data so that features/values of different scales/sizes are on the same scale. At first glance, this seems straightforward. Generally, in AI and ML applications, Min-Max or Z-score transformations are the techniques most frequently preferred for data normalization. Although the traditional advice is that "which normalization method to choose may vary depending on the characteristics of the data set and the context of the application", in practice one of these two methods is selected automatically, without questioning. However, when there are alternatives such as Vector, Max, Logarithmic and Sum normalization, why are these two considered enough?
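For reference, the alternatives mentioned above are all one-liners in R; a hedged sketch on a toy vector (definitions follow common usage and may differ slightly between textbooks):
x <- c(2, 5, 9, 40, 100)                     # toy feature values
minmax <- (x - min(x)) / (max(x) - min(x))   # Min-Max to [0, 1]
zscore <- (x - mean(x)) / sd(x)              # Z-score
maxnrm <- x / max(x)                         # Max normalization
sumnrm <- x / sum(x)                         # Sum normalization
vecnrm <- x / sqrt(sum(x^2))                 # Vector (L2) normalization
lognrm <- log10(x)                           # Logarithmic transform (requires x > 0)
rbind(minmax, zscore, maxnrm, sumnrm, vecnrm, lognrm)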
Hi,
I have a question on qRT-PCR gene expression data analysis. I have a gene panel with 300 genes (292 test genes) + 8 housekeeping genes. The samples were run in batches over a period of time (first batch = 90 samples + 3 pooled controls, second batch = 50 + 3 pooled controls, third batch = 70 + 3 pooled controls). Please let me know how I should normalize this type of data and handle the batch effect.
Does the approach below make sense?
- Combine the data (Ct values) from all 3 batches (90 + 50 + 70 samples) and save them in a *.csv file.
- Calculate Delta Ct = Ct(gene of interest) - arithmetic mean Ct of the 8 housekeeping genes, or the negative Delta Ct = Ct(reference genes) - Ct(gene of interest).
- Plot PCA, heatmap etc.
Thank you,
Toufiq
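A hedged sketch of the negative Delta Ct step described in the post, assuming a samples-by-genes matrix of Ct values with the housekeeping genes identified by name (batch correction, e.g. with ComBat on the Delta Ct values, would be a separate step and a separate judgment call):
# Hypothetical Ct matrix: 6 samples x (2 housekeeping + 8 test) genes
set.seed(1)
ct <- matrix(rnorm(6 * 10, mean = 25, sd = 2), nrow = 6,
             dimnames = list(paste0("S", 1:6), c("HK1", "HK2", paste0("G", 1:8))))
hk <- c("HK1", "HK2")                      # in the real panel this would be the 8 housekeeping genes
hk_mean <- rowMeans(ct[, hk])              # arithmetic mean Ct of the housekeeping genes per sample
neg_dct <- hk_mean - ct[, setdiff(colnames(ct), hk)]   # negative Delta Ct = Ct(ref) - Ct(GOI); rows align
round(head(neg_dct), 2)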
We are trying to nanocoat cells to protect them. Normally the cell surface is negatively charged, so we treated the cells with a positively charged material as the first layer and then with a negatively charged material as the second layer. How can I measure the charge of the cell surface to demonstrate that the nanocoating of the cells is successful? We have tried using zeta potential to measure the surface charge, but it didn't work.
Dear all,
I am currently trying to determine the number of archaea present in an ecosystem with the 16S rRNA gene.
I am normalizing the mcrA gene to the total archaeal 16S rRNA gene. The mcrA gene measures the methanogenic activity.
The problem that I am currently facing is that some archaea have multiple 16S rRNA genes, which makes it difficult to normalize to the mcrA gene.
Does anyone know how to solve this problem?
Thanks in advance!
Set effects are usually quite pronounced and mask sample characteristics when dealing with human samples and TMT-labelled non-targeted proteomics.
What is in your view the best approach to preserve the experimental differences while flattening down set effects (technical artifacts)?
The Background of the Question and a Suggested Approach
Consider that, e.g., a tensile strength test has been performed with, say, three replicate specimens per specimen type on an inhomogeneous or anisotropic material like wood. Why do the strength property determinations typically not consider the number of collected data points? As a simplification, imagine, e.g., that replicate specimen 1 fails at 1.0 % strain with 500 collected data points, replicate 2 at 1.5 % strain with 750 data points and replicate 3 at 2.0 % strain with 1,000 data points. For the sake of argument, let us assume that the replicates with a lower strain are not defective specimens, i.e., they are accounted for by natural variation(s). Would it not make sense to use the ratio of the collected data points per replicate specimen (i.e., the number of data points a given replicate specimen has divided by the total number of data points for all replicates of a given specimen type combined) as a weighting factor to potentially calculate more realistic results? Does this make sense if one were to, e.g., plot an averaged stress-strain curve that considers all replicates by combining them into one plot for a given specimen type?
Questioning the Weighting
Does this weighting approach introduce bias and significant error(s) in the results by emphasising the measurements with a higher number of data points? For example, suppose the idea is to average all repeat specimens to describe the mechanical properties of a given specimen type. In that case, the issue is that the number of collected data points can vary significantly; therefore, the repeat specimen with a higher number of data points is emphasised in the weighted averaged results. Then again, if no weighting is applied, there are still 500 more data points between replicates 1 and 3 in the above hypothetical situation, i.e., the averaging is still biased since there is a 500-data-point difference in the strain and other load data and, e.g., replicate 3 has some data points that neither of the preceding replicates has. Is the "answer" that we assume a similar type of behaviour even when the recorded data vary, i.e., that the trends of the stress-strain curves should be the same even if the specimens fail at different loads, strains and times?
Further Questions and Suggestions
If this data-point-based weighting of the average mechanical properties is by its very nature an incorrect approach, should at least the number of collected data points or the time taken in the test per replicate be reported to give a more realistic understanding of the research results? Furthermore, when averaging the results from repeat specimens, the assumption is that the elapsed times in the recorded data match the applied load(s). However, this is never the case with repeat specimens; matching the data meticulously as an exact function of time is tedious and time-consuming. So, instead of just weighting the data, should the data be somehow normalised with respect to the elapsed time of the test in question? Consider that the overall strength of a given material might, e.g., have contributions from only one repeat specimen that simply took much longer to fail, as is the case in the above hypothetical example.
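For concreteness, a tiny sketch of the proposed weighting on the hypothetical numbers above, which makes the bias toward long-running replicates easy to see:
# Failure strain (%) and number of collected data points per replicate (from the example above)
strain <- c(1.0, 1.5, 2.0)
points <- c(500, 750, 1000)
w <- points / sum(points)                  # proposed weighting factors
c(unweighted_mean = mean(strain),
  weighted_mean   = weighted.mean(strain, w))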
I have quantified the intensity of bands obtained in gels in which I have loaded experimental samples with WT enzyme and several mutants. My goal is to compare whether the WT enzyme or the mutants are better based on the intensity obtained for each band (the higher intensity I measure, the more activity the enzyme has).
I have several replicates of the experiments, I have normalized the values with respect to WT in each case, and then I have calculated the mean and standard error values. So I get a value of 1 for the WT enzyme and values below 1 or above 1 for the mutants. Would this be a correct way to express the results?
I wonder if it would be more appropriate to apply log2 to the values obtained, so that values close to 0 will indicate that the mutants are similar to WT, positive values will indicate that it is better, and negative values that it is worse.
ADVANTAGE:
Applying log2 may be useful because the same absolute value is obtained when an enzyme is 10 times worse or 10 times better (compared with not applying log2, where the enzyme that is 10 times worse has a value of 0.1 and the one that is 10 times better has a value of 10).
PROBLEM:
There are cases in which the mutants have a value of 0 because no band is obtained to quantify. So I can't apply log2 to 0 and the graph is incomplete for those mutants. There are also times when the intensity is very low, and applying log2 gives an extreme value that distorts the graph.
ADDITIONAL PROBLEM: HOW TO NORMALIZE WHEN REFERENCE VALUE IS 0
Under certain conditions I obtain values of 0 or very close to 0 for the WT enzyme, which is the one I use as reference to calculate the ratios. What can I do in this case? Would it be correct to do this: I normalize the data with respect to a mutant that has high values, calculate the mean of the ratios, and normalize the means with respect to WT. In the Excel file “Problem” I have put an example of this under the heading “Situation 1”.
I'm not sure if it's correct to work with the data in this way. In the spreadsheet I have written an example called "Situation 2" in which I have normalized the data in both ways (with respect to WT and with respect to mutant 1) and I see that the final results are completely different.
What do you recommend to do to display the data correctly?
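One hedged workaround for the log2-of-zero problem is to add a small pseudocount to every intensity before forming ratios; the offset is an arbitrary choice and should be reported. A sketch with invented numbers:
# Hypothetical mean band intensities; one mutant gives no detectable band
intensity <- c(WT = 120, mut1 = 310, mut2 = 15, mut3 = 0)
eps   <- 1                                   # pseudocount, e.g. a value below the detection limit
ratio <- (intensity + eps) / (intensity["WT"] + eps)
log2r <- log2(ratio)
round(rbind(ratio, log2r), 2)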
I'm doing a study comparing data from two orbital sensors, and the study I am basing mine on uses this normalization formula for the rasters: ((Bi <= 0) * 0) + ((Bi >= 10000) * 1) + ((Bi >= 0) & (Bi < 10000)) * Float(Bi / 10000), where "Bi" means "band i". Is there someone who understands it and could explain this formula? Thanks very much.
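A hedged reading of the expression: it appears to be map-algebra syntax that clamps each band to [0, 1], sending values <= 0 to 0, values >= 10000 to 1, and dividing everything in between by 10000 (a common scale factor for reflectance products). A small R sketch of the same piecewise logic on toy pixel values:
Bi <- c(-50, 0, 2500, 9999, 10000, 15000)    # toy pixel values for one band
normalized <- (Bi <= 0) * 0 +                            # non-positive pixels -> 0
              (Bi >= 10000) * 1 +                        # saturated pixels -> 1
              ((Bi >= 0) & (Bi < 10000)) * (Bi / 10000)  # everything else scaled into [0, 1)
rbind(Bi, normalized)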
I'm planning to implement a controlled cooling method for forged parts instead of the normalizing process. I need to know how to design the metallic box and the implementation method.
Kindly share your valuable feedback.
Regards,
Vignesh
My question concerns the problem of calculating odds ratios in logistic regression analysis when the input variables are on different scales (i.e.: 0.01-0.1, 0-1, 0-1000). Although the coefficients of the logistic regression look fine, the odds ratio values are, in some cases, enormous (see example below).
In the example there were no outlier values in any of the input variables.
What is the general rule: should we normalize all input variables before analysis to obtain reliable OR values?
Sincerely
Mateusz Soliński
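A hedged sketch of the standardization approach, assuming a binary outcome; after z-scoring, each odds ratio reads "per one standard deviation of the predictor", which keeps the values on comparable magnitudes (all data simulated):
set.seed(42)
d <- data.frame(x1 = runif(200, 0.01, 0.1),   # predictors on very different scales
                x2 = runif(200, 0, 1),
                x3 = runif(200, 0, 1000))
d$y <- rbinom(200, 1, plogis(-1 + 20 * d$x1 + 1.5 * d$x2 + 0.002 * d$x3))
d[, c("x1", "x2", "x3")] <- scale(d[, c("x1", "x2", "x3")])   # z-score the predictors only
fit <- glm(y ~ x1 + x2 + x3, family = binomial, data = d)
exp(cbind(OR = coef(fit), confint.default(fit)))              # odds ratios per 1 SD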
I am working on a project where it is necessary to build an LED board to replace UVA-340/UVA-351 fluorescent tubes (represented in the graphic by the blue line).
I have to use LEDs with wavelengths from 280 to 420 nm and achieve a maximum point of 1 W/m² at 340 nm.
How can I normalize and distribute my LEDs in order to achieve the goal?
EDIT: Please see below for the edited version of this question first (02.04.22)
Hi,
I am searching for a reliable normalization method. I have two ChIP-seq datasets to be compared with a t-test, but the RPKM values are biased, so I need to fix this before the t-test. For instance, when a value is high, it does not necessarily mean it is high in reality; there can be another factor that makes the value appear high, and in reality I should see a value closer to the mean. Likewise, if a value is low and the factor is strong, we can say that is the reason we see the low value, and we should have seen a value much closer to the mean. In brief, what I want is to eliminate the effect of this factor.
In line with this purpose, I have another dataset showing how strong this factor is for each value in the ChIP-seq datasets (again as RPKM values). Should I simply divide my RPKM values by the corresponding factor RPKM to get unbiased data, or is it better to divide the RPKM values by the ratio RPKM / mean(RPKMs)?
Do you have any other suggestions? How should I eliminate the factor?
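For concreteness, both options asked about can be written in two lines (hedged sketch, assuming matched vectors of values at the same regions); dividing by the mean-scaled factor keeps the corrected values on roughly the original scale, whereas dividing by the raw factor changes the units:
rpkm        <- c(10, 25, 40, 5, 60)        # hypothetical signal of interest
factor_rpkm <- c(1.0, 2.5, 4.0, 0.5, 6.0)  # hypothetical RPKM of the confounding factor
option1 <- rpkm / factor_rpkm                         # simple ratio
option2 <- rpkm / (factor_rpkm / mean(factor_rpkm))   # ratio to the mean-scaled factor
rbind(option1, option2)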
Dear colleagues,
I am trying to train a neural network. I normalized the data with the minimum and maximum:
normalize <- function(x) {
return ((x - min(x)) / (max(x) - min(x)))
}
maxmindf <- as.data.frame(lapply(mydata, normalize))
and the results:
results <- data.frame(actual = testset$prov, prediction = nn.results$net.result).
So I can see the actual and predicted values only in normalized form.
Could you tell me please, how do I scale the real and predicted data back into the "unscaled" range?
P.S.
minvec <- sapply(mydata, min)
maxvec <- sapply(mydata, max)
denormalize <- function(x, minval, maxval) {
  x * (maxval - minval) + minval
}
does not work correctly in my case.
Thanks a lot for your answers
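A hedged guess at the missing piece, reusing the objects defined in the question (minvec, maxvec, testset, nn.results) and assuming the target column is `prov`: the back-transformation must use the min and max of the original target column, not of the whole data frame.
denormalize <- function(x, minval, maxval) {
  x * (maxval - minval) + minval            # inverse of the min-max normalization above
}
pred_unscaled   <- denormalize(nn.results$net.result, minvec["prov"], maxvec["prov"])
actual_unscaled <- denormalize(testset$prov,          minvec["prov"], maxvec["prov"])
results <- data.frame(actual = actual_unscaled, prediction = pred_unscaled)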
I have raw data from a global/untargeted mass spectrometry metabolomics experiment. I have processed the data and now have the peak intensities of all the m/z values. I had also spiked the samples with an internal standard. Can anyone tell me how I can normalize my data using the internal standard?
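One common, simple convention (hedged; details vary between workflows) is to divide every feature's intensity in a sample by that sample's internal-standard intensity, so that differences in injection volume and instrument response cancel out. A sketch assuming a features-by-samples matrix with one row holding the internal standard:
set.seed(7)
intensities <- matrix(rlnorm(5 * 4, meanlog = 10), nrow = 5,
                      dimnames = list(c(paste0("mz", 1:4), "IS"), paste0("S", 1:4)))
is_row     <- intensities["IS", ]                                    # internal standard per sample
normalized <- sweep(intensities[rownames(intensities) != "IS", ], 2, is_row, "/")
round(normalized, 3)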
Hi, I have some DTI data and I want to use ANTs to make a group-wise b0 template to normalize my data to the IIT_mean_b0 image. I want to use buildtemplateparallel.sh and the SyN algorithm, but I am a novice. How am I supposed to run this, and what are the inputs?
Apologies if this has been answered, I was not able to find anything similar to my question.
On 3 separate days, I have analysed cells from 10 WT and 10 KO animals by flow cytometry. Below I show some made-up MFI values of a fluorophore to make my point. You can see that between experiments the readings are shifted, but in all 3 experiments there is an approx. 40% increase in the MFI values of the KO compared to the WTs. For this experiment, a 40% increase is biologically relevant, but without any normalization a t-test does not show statistical significance.
If I were comparing two different treatments applied to the same cells, I would run a paired t-test on the unnormalized data, but since here my KO and WT cells are not coming from the same animal, can I also do that?
Since I cannot repeat the experiments, my way of getting around this is to divide all values from each experiment by the average of the controls of that experiment (last 3 columns). The average of the control group will always be one, but there is an SD as well. The t-test run with these normalized values now shows statistical significance, but is this correct?
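A hedged sketch of exactly that normalization (each value divided by the mean of the same day's WT controls); whether a plain t-test on these ratios is the best analysis is a separate question (a log transform, or a mixed model with day as a factor, are common alternatives), but the normalization itself looks like this:
# Hypothetical MFI values from 3 experiments (days), 2 WT and 2 KO animals per day
d <- data.frame(day   = rep(1:3, each = 4),
                group = rep(c("WT", "WT", "KO", "KO"), times = 3),
                mfi   = c(100, 110, 140, 150,
                          200, 210, 280, 300,
                           50,  55,  70,  75))
wt_mean_by_day <- tapply(d$mfi[d$group == "WT"], d$day[d$group == "WT"], mean)
d$norm_mfi     <- d$mfi / wt_mean_by_day[as.character(d$day)]   # divide by that day's control mean
t.test(norm_mfi ~ group, data = d)                               # the test described in the question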
I'm trying to follow through using the hyperbolic tangent for score normalization as here:
Conference Paper An Evaluation of Score Level Fusion Approaches for Fingerpri...
It states there that the final values should be between 0 and 1, however my final output is in the range between 0.47 and 0.51 for a number of sets of scores.
Most of these sets are already between the range [0,1] - although some have quite a different range of separability between genuine and mismatch scores.
The process I am performing is to calculate the mean of all genuine match scores, and the standard deviation of the entire set (as described in the paper) - and then I parse it into a tanh normalization function. I notice some other papers use a different set of means/standard deviations, but all combinations I try end up with similar results.
Here is my normalization code (written in Rust). Constants is just a struct containing all stats for genuine/mismatch/all.
pub fn tanh_normalization(score: f32, constants: &ScoreConstants) -> f32 {
let internal = 0.01 * (score - constants.genuine.mean) / constants.all.standard_deviation;
return 0.5 * (internal.tanh() + 1.);
}
Does anyone have any ideas that could help me? Or any other papers related to this that might help?
Thanks in advance.
Hey,
I recently ran into some confusion about single-cell ATAC-seq integration analysis between samples. I have read many discussions about this issue and have summarized them into two solutions, as follows:
SOLUTION 1. (data QC ignored here) find the union feature set from different samples -> generate count matrix for each sample -> merge them into one large count matrix -> normalization/Scaling/cell clustering/ cluster annotations……
SOLUTION 2. generate the count matrix for each sample -> normalization/scaling/cell clustering/cluster annotation for each sample -> find common features among all samples -> generate the count matrix against the selected common features for each sample -> merge the data using pipelines, e.g. Signac/Harmony, to perform cell clustering, cluster annotation and the other downstream analysis (which usually gives a new assay for the common features).
My questions:
Whichever solution I select, I will end up with cell clusters. The next step for me is retrieving differential features for each cell type/cluster, which will be the key to further investigation of biological functions.
Q1. I know that batch effect indeed exists between samples, but for SOLUTION 1, will normalization and scaling for a single large count matrix work for differential enrichment analysis between samples?
Q2. If SOLUTION 1 is not reasonable, SOLUTION 2 will give rise to a new assay containing only the selected common features, based on which the batch effect should be well corrected and the cells might be better clustered. However, how do I perform the differential analysis for non-common features in each cluster? (That is to say, will the batch-effect correction in the assay newly integrated by SOLUTION 2 work for differential feature detection across all features in the raw assays at the sample level?)
Thanks and best regards!
The interactive wavelet plot that was once available on the University of Colorado webpage (C. Torrence and G. P. Compo, 1998) does not exist anymore. Are there any other trusted sites to compare our plots against? And in what cases do we normalize our data by the standard deviation before performing the continuous wavelet transform (Morlet)? I have seen that it is not necessary all the time. A few researchers also transform the time series into a series of percentiles, believing that the transformed series reacts 'more linearly' to the original signal. So what should we actually do? I expect an explanation focusing mainly on data-processing techniques (standardization, normalization, or leaving the data as they are).
If in a multivariate model we have several continuous variables and some categorical ones, we have to change the categoricals to dummy variables containing either 0 or 1.
Now to put all the variables together to calibrate a regression or classification model, we need to scale the variables.
Scaling a continuous variable is a meaningful process. But doing the same with columns containing 0 or 1 does not seem to be ideal. The dummies will not have their "fair share" of influencing the calibrated model.
Is there a solution to this?
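For what it is worth, a common pragmatic choice is to scale only the continuous columns and leave the 0/1 dummies untouched; whether that is the right answer depends on the model, so treat this as a hedged sketch:
# Toy design matrix: two continuous variables and one dummy-coded categorical
d <- data.frame(age    = c(23, 45, 31, 52, 60),
                income = c(20e3, 55e3, 37e3, 80e3, 62e3),
                sexM   = c(1, 0, 0, 1, 1))
continuous <- c("age", "income")
d[continuous] <- scale(d[continuous])   # z-score only the continuous columns; dummies stay 0/1
d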
I am running an RNA-sequencing experiment where I am analyzing the differential gene expression of oysters collected from different locations. I plan on using the CyVerse DNA Subway Green Line platform, which utilizes kallisto and sleuth. Since I will be conducting multiple comparisons (i.e. oysters from Site 1 vs oysters from Site 2 vs oysters from Site 3, etc.), I understand that this could run into significant statistical issues involving inferential and individual variation of each sample. Will the kallisto and sleuth algorithms correct for this? I imagine I will need to run all of my samples simultaneously through kallisto so that normalization is done across all samples. Will this be sufficient to mitigate the noise from individual sample variation and make biological variation more apparent? Or would I need to employ normalization methods such as TMM via edgeR? I am pretty new to this and learning along the way, so any feedback is much appreciated!
Thanks in advance.
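If TMM via edgeR does turn out to be needed, the normalization step itself is short; a hedged sketch assuming a genes-by-samples matrix of raw counts (edgeR is installed from Bioconductor, and library sizes of 20M-46M reads are within the range these methods are designed to handle):
library(edgeR)
# Simulated stand-in for a genes x samples count matrix with site labels
counts <- matrix(rpois(1000 * 6, lambda = 50), nrow = 1000,
                 dimnames = list(paste0("g", 1:1000), paste0("S", 1:6)))
group  <- factor(rep(c("Site1", "Site2", "Site3"), each = 2))
y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y, method = "TMM")   # TMM normalization factors per library
logcpm <- cpm(y, log = TRUE)              # normalized log2-CPM values for exploration
head(y$samples)                           # library sizes and norm.factors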
Hello everyone,
Based on my reading, I have found that it is not possible for all researchers to reproduce the same conditions for producing samples. Therefore, each researcher reports the working conditions and the relevant parameters, produces a sample, and performs a series of experiments to extract response data based on their experimental design. The issue with such reports arises when one intends to study and then compare the results: because several parameters differ between studies, direct comparisons are not possible. My question is: is there a general way to normalize response data with respect to multiple independent parameters?
I am a layman in metabolomics and LC-MS, and I am confused about which internal standards are suitable for a metabolomics study, e.g. in urine, faeces or plasma samples, if I don't use isotope-labelled metabolites as internal standards. Alternatively, how can I normalize peak intensities without any standards? Sorry for the basic questions. Thank you all!
I have used geolocation grid data and interpolated the values between to get the incidence angles, but I am not able to assign the incidence angles to specific pixels on the image. Is there any way to get a 2d matrix of the incidence angles or any other way that may help me get to this?
Hi everyone,
I would really appreciate it if someone could tell me when and why a raw fluorescence curve starts below zero. In the picture attached below, the blue curve starts from a negative value. My delta Rn values are quite low, and I believe the negative start of the blue curve (raw data) is affecting the normalisation.
Can anyone give some insight?
I am trying to analyze Alzheimer's and healthy (control) human brain slices. After loading the data and normalizing it through the default options of the SCTransform command, when I try to plot the expression levels of some genes with VlnPlot I see that the expression values are changed to integers, or rather they are categorized into some defined levels and not a continuous range of numbers.
I have rechecked this issue with the default mouse brain tutorial and I have got the same problem.
Here is the violin plot from the mouse brain dataset which is provided by Seurat as a tutorial, so it should be reproducible for you as well.
brain <- Load10X_Spatial(Directory, filename = "filtered_feature_bc_matrix.h5",
                         assay = "Spatial",
                         slice = "slice1",
                         filter.matrix = TRUE,
                         to.upper = FALSE)
brain <- SCTransform(brain, assay = "Spatial", verbose = FALSE, do.scale = TRUE)
VlnPlot(brain, features = "APP")
Here is the result of my sessionInfo():
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
sctransform_0.3.2 dplyr_1.0.2 patchwork_1.1.1 ggplot2_3.3.2 stxBrain.SeuratData_0.1.1 panc8.SeuratData_3.0.2
SeuratData_0.2.1 Seurat_3.2.3
I have metabolite concentrations from mammalian cells, and also total protein concentration in each replicate. I do not have an internal standard.
Please could someone describe or provide a literature reference/software method etc. of the best statistical method to normalize metabolite concentrations to total protein?
Many thanks
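The simplest widely used option (hedged; more elaborate schemes exist) is to divide each metabolite concentration by the total protein of the same replicate, giving amount per mg protein. A sketch with invented numbers:
# Hypothetical metabolite concentrations: rows = metabolites, columns = replicates
met <- matrix(c(10, 12, 9,
                 5,  6, 4), nrow = 2, byrow = TRUE,
              dimnames = list(c("lactate", "glutamine"), paste0("rep", 1:3)))
protein <- c(rep1 = 0.8, rep2 = 1.1, rep3 = 0.7)   # total protein (mg) per replicate
met_per_mg <- sweep(met, 2, protein, "/")          # concentration per mg protein
round(met_per_mg, 2)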
Dear to whom it may concern,
I would like to ask you about the normalization methods used to remove non-biological variation from metabolomics data.
Because many normalization methods have been reported to date, I am confused about the criteria on which to select the best normalization method for a particular metabolomics dataset. Also, what is the rationale behind each normalization method?
I hope you can spare a little time to clear up my questions and, if convenient, point me to relevant documents or tips.
Thank you so much,
Pham Quynh Khoa.
Hello,
This question pertains to comparing peaks from spectra of two or more different samples in a fair way through normalization.
I know Origin normalizes based on the highest peak; however, if I have PL or EL spectra with two peaks for many samples, what is the best way to compare them?
One method I observed was dividing the spectral intensity across the wavelengths by the integrated area of the peak. Would this be the best way?
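A hedged sketch of area (integral) normalization with the trapezoidal rule; it puts spectra with different overall intensities on a common footing while preserving the relative heights of the two peaks (toy spectrum below):
wl <- seq(400, 800, by = 1)                                     # wavelength grid (nm)
I  <- 5 * dnorm(wl, 550, 15) + 3 * dnorm(wl, 700, 20) + 0.001   # toy two-peak spectrum
trapz <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)   # trapezoidal area
I_area_norm <- I / trapz(wl, I)     # area-normalized spectrum (integrates to ~1)
I_max_norm  <- I / max(I)           # max-normalized spectrum, for comparison
c(area_after_norm = trapz(wl, I_area_norm), max_after_norm = max(I_max_norm))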
In our hands, Drosophila spike-in often fails to generate enough reads for a statistically meaningful normalization, and it is pretty expensive. This computational spike-in-free method has recently been published and seems to produce results similar to the Drosophila spike-in and ChIP-Rx data it was benchmarked against. Does anyone else have an opinion of this method? Any thoughts or concerns? Is anyone willing to try it against their current ChIP-seq normalization methods?
Hi all,
I have done qPCR with a housekeeping gene and a gene of interest (GOI).
My goal here is to check the expression of a particular gene, not to compare it between treated and untreated controls.
Is there a way, other than absolute quantification, to interpret the Ct values from the housekeeping gene and GOI data?
Thanks in advance.
Hi.
How should I normalize noisy spectra? Normalization to the 0-1 range will change the spectral shape, while normalization by dividing by the maximum will not change the spectral shape; however, the spectra will remain noisy and difficult to combine with other spectra (e.g., spectra for the NIR and VIS regions). Thank you.
I am planning to apply EMS to upper extremity muscles and observe the muscles using sEMG and MMG. I am wondering how to do the normalization in the case of EMS: is it the same as for normal sEMG and MMG, i.e. dividing the sub-maximal signals by the signals obtained during MVC?
I built a collaborative filtering recommender system using the surprise library in Python. My dataset consists of three columns ('ReviewerID', 'ProductID', 'Rating'), and the rating scale is [-30, 40]. I calculated the RMSE and it equals 0.9. Then I normalized the ratings to change the scale to [-0.4, 0.4], and when I calculate the RMSE it equals 0.003. The difference in RMSE is big and does not seem reasonable; is it wrong to normalize the rating scale in CF?
Hi,
In the LogNormalize function, feature counts for each cell are divided by the total counts for that cell and multiplied by the scale.factor prior to log transformation. My question is: what exactly does multiplying by the scale factor do, and why is it 10,000 by default?
Thank you
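To the best of my understanding (worth checking against the Seurat documentation), LogNormalize amounts to "counts divided by the cell's total, times scale.factor, then natural log of 1 + that value"; 10,000 is simply a convenient counts-per-ten-thousand constant that keeps the values in a readable range. A sketch of the arithmetic on a toy count matrix:
# Toy count matrix: genes x cells
counts <- matrix(c(10, 0, 3,
                    5, 2, 8,
                    0, 1, 4), nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3)))
scale.factor <- 1e4
# per cell: divide by that cell's total counts, multiply by scale.factor, then log(1 + x)
lognorm <- log1p(sweep(counts, 2, colSums(counts), "/") * scale.factor)
round(lognorm, 2)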
I have a keystroke model which is one of the modes in my multimodal biometric system. The keystroke model gives me an EER of 0.09 using the Manhattan scaled distance. I am then normalizing this distance to fit the range [0, 1] using tanh normalization, and when I run a check on the normalized scores I get an EER of 0.997. Is there something I am doing wrong? I calculate the tanh normalization based on the mean and standard deviation of the matching scores for genuine users.
I would like to use ordinary one-way ANOVA. I have 4 groups with 4 samples and more than 10 variables. Most variables show a non-skewed distribution based on the QQ plot, but some do not fit well.
I get the expected results if I log-transform only the skewed data. Is it right to transform only those variables (in each group, of course) that appear skewed on the QQ plot?
In this case, I will have log-transformed variables (where necessary) and variables in their original form. The samples and variables are independent. Is this statistically acceptable?
Can anyone show me a similar situation in the published literature?
(MANOVA and Kruskal-Wallis are excluded.)
Thanks for the help!
Is there any standard procedure/sequence of tools to process hyperspectral tabular data before PLSR regression modelling?
Examples of such tools are: 1) de-resolve, 2) second derivative, 3) normalization, 4) de-trending, 5) baseline correction, etc.
The application is field spectroradiometer data of soil and crops.
Or does the sequence of tools differ for different datasets?
Hi all,
I have gone through previous threads and numerous publications trying to find the best gene to use for qPCR with LPS-activated macrophages (BMDMs) and microglia, and I understand everyone has their own preferences.
Between HMBS, ActB, HPRT, and 18S rRNA, which one is the most popular and dependable among macrophage and microglia researchers? I work primarily with primary mouse cells and sometimes iPSC-derived microglia.
I have even noticed some recent publications still using GAPDH.
Any advice will be appreciated!
Tanya
I am using Lagrange multipliers to maximize a normal/Gaussian objective function under some constraints. I am searching for techniques better than the Lagrange multiplier method.
I have a task that delivers several measures of metacognition. I want to check that the task aligns with my model of metacognition using CFA. However, the results are on very different scales: 0-1; 0-10,000; 0-10; etc. SEM will not work with such disparate scales. What is the best way to prepare the data for CFA? Thank you for your help.
Hi, I'm running an antibody-dependent phagocytosis assay, where we add serum to fluorescent antigen-coated beads, then donor neutrophils, and subsequently measure the amount of phagocytosis as MFI (fluorescence) within our neutrophil population to give us a phagocytosis score.
As we run the assay on different days with different neutrophil donors we are also running a titration of a serum standard. I was wondering how best to utilise this standard curve?
One method we're using is similar to what you would do with an ELISA, whereby you interpolate the values of our diluted sample from the standard curve. But this is only giving me a somewhat arbitrary value of the dilution required (which I can convert to phagocytic units, but again this is just done arbitrarily). The other method I've tried is min max normalisation using the top and bottom of the standard curve values.
What I would ideally like is a method of normalisation that gives an output as phagocytic score, as this seems biologically most relevant. Any input is appreciated.
Cheers,
Mari
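For reference, the min-max option mentioned above takes only a couple of lines once the bottom and top of the day's standard curve are defined; a hedged sketch with invented numbers (whether those plateaus are the right anchors is the judgment call being asked about):
std_bottom <- 150     # MFI at the bottom plateau of that day's standard curve (hypothetical)
std_top    <- 5200    # MFI at the top plateau (hypothetical)
samples    <- c(480, 1200, 3900, 5100)
phago_score <- (samples - std_bottom) / (std_top - std_bottom)   # min-max vs. the standard curve
round(phago_score, 3)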
Does normalizing observed variables (i.e. bringing them to zero mean and unit variance) influence the fit statistics in SEM? I am running maximum likelihood confirmatory factor analysis with the Satorra-Bentler correction for the measurement model. The SB RMSEA is 0.042 (non-corrected RMSEA is 0.055) and the SRMR is 0.056, which with a sample size of 267 signal good model fit (Hu & Bentler, 1998, 1999), but my SB CFI is 0.92 (non-corrected CFI is 0.896) and SB TLI is 0.906 (uncorrected value is 0.877). The latter two values remain under the 0.95 threshold. I'm looking for ways to improve model fit. I have already looked at the MIs and nothing can be changed. Any suggestions? Can normalization of the observed variables help? Can you direct me to any readings relevant to this question? Thank you for your answers.
I'm currently preparing for some in situ hybridisation, which requires maleic acid buffer with Tween (MABT) for some of the wash steps. The recipe I have been given is for 1 L of 5x concentrated MAB, but it seems to be taking ages to dissolve. I have to have the solution prepared for autoclaving by 10 am as my lab can only autoclave once per day at the moment. Is it normal for MAB to dissolve slowly? Does anyone have any recommendations?
I am working on some code where I am using intervals, and I have to pass these intervals through the softmax function e^x / sum(e^x). If I pass the intervals as they are, I get infinite primals and infinite partials. So I want to know a way to normalize them.
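A standard remedy for overflow in softmax is to subtract the maximum before exponentiating; the result is unchanged because e^(x-m)/sum(e^(x-m)) = e^x/sum(e^x). A plain numeric sketch (how this interacts with interval arithmetic and the primals/partials of your AD library is a separate question):
softmax_stable <- function(x) {
  z <- x - max(x)            # shift so the largest exponent is 0, preventing overflow to Inf
  exp(z) / sum(exp(z))
}
x <- c(1000, 1001, 999)      # naive exp(x) would overflow here
softmax_stable(x)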
I want to measure protein with the BCA assay for normalization purposes.
Let's say I have two flasks: one with cells irradiated with a large dose (>10 Gy) and one with non-irradiated cells.
The number of cells in the non-irradiated flask will increase,
and a lot of the cells in the irradiated flask will certainly be dead.
So if I measure BCA after, for example, 3 days of incubation, will I also measure protein from cells that are already dead?
Dear experts,
I am dealing with the synthesis and characterization of PIR foams. In particular, I am monitoring the kinetics of these foams by FTIR (please find the spectra attached below). As is well known from the literature, the asymmetric CH stretching band at 2972 cm-1 (which remains constant during the reaction) is typically used as an internal reference band to correct for the density changes during the foaming process. In the same vein, my question is whether you know from the literature a reference band that may be used for PIR for the same purpose.
Please note that for PU a polyether polyol is used, while for PIR a polyester polyol is used.
Thanks in advance.
Hi,
I have a question on qRT-PCR data analysis. The gene panel has more than 200 genes along with multiple reference/housekeeping genes. I have 50 paired samples (i.e. Before vs After vitamin supplementation) and 6 pooled controls (3 pooled controls in one plate and 3 in another plate as inter-plate controls). In other words, 50 samples in plate 1 (Before) + 3 pooled controls and 50 samples in plate 2 (After) + 3 pooled controls.
I would like to see the gene expression changes in the Before vs After Vitamin supplementation. Please let me know if the workflow followed looks fine.
- Calculation of Delta Ct = Difference between the Gene of interest and Geometric Mean of Multiple housekeeping genes
- Calculation of Delta Delta Ct = Difference between the samples (before and after vitamin supplementation) and average of pooled control samples
- Calculation of the 2 to the power of (negative Delta Delta Ct) to evaluate fold gene expression levels
Following the calculation of the Delta Ct, does the Delta Delta Ct calculation look fine? I am a bit confused here about whether the average of the pooled controls should be subtracted from the individual samples, or whether the difference between the Before and After samples should be taken individually.
In addition, which values should be used for the statistical analysis, such as the paired t-test, PCA, scatter plots and other visualizations: the negative Delta Ct or the Delta Delta Ct?
Best Regards,
Toufiq
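A hedged sketch of the Delta Delta Ct arithmetic described above, using the average Delta Ct of the pooled controls as the calibrator (hypothetical numbers; whether to calibrate against the pooled controls or against the paired Before sample is exactly the design decision being asked about):
# Hypothetical Delta Ct values (Ct_GOI - geometric-mean Ct of the housekeeping genes)
dct_samples  <- c(before_S1 = 3.2, after_S1 = 2.1, before_S2 = 3.5, after_S2 = 2.6)
dct_controls <- c(pc1 = 3.0, pc2 = 3.1, pc3 = 2.9)     # pooled controls
ddct <- dct_samples - mean(dct_controls)               # Delta Delta Ct vs. average pooled control
fold <- 2^(-ddct)                                      # fold expression relative to the calibrator
round(rbind(ddct, fold), 2)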
I have a number of time series (different lengths, different amplitudes) that I want to cluster according to their similarity. Does anyone know of a method for normalizing DTW scores to make them comparable? Thanks.
Hello all,
I am data mining Wikipedia to discern which titles are edited in the most countries by geolocating edits performed with IP addresses. I am only interested in the top 100 titles edited in the most countries. I am arguing that these titles represent global ideas because their edits are the most spatially widespread. With these counts, I can then measure per country how many of these global titles are edited in that particular country. This can then be used to create a type of globalization index per country (e.g. Germany edited 95 of the titles edited in the most countries). I eventually would like to do a correlation of this index with a well-established globalization index that relies on counting objects crossing borders (e.g., import/export). My argument is that the higher the connectivity of a country, the higher the globalized title index. I am only interested in the subject matter and discourses in the top 100 titles, so I need my sample to be manageable.
My question is regarding the normalization of data. The number of individual editing IPs does affect the number of titles edited per country. However, this is not a normal per-capita situation; for example, a murder rate is all murders/population. In my case, I am arbitrarily selecting only the top one hundred titles on a list of titles ranked by the number of countries in which they are edited. It would be analogous to restricting a per-capita murder rate to the 100 most gruesome murders/population. A title that is 101st in rank on the list could still be considered global in this respect, but it just didn't make it into the top 100. So, I am uneasy about normalizing the data.
What would be the best way to normalize/standardize this data by number of individual editing IP's within Wikipedia per country given the situation that the numerator is an arbitrarily delimited group of a phenomenon?
Your help is greatly appreciated,
Tom
Do you think the 'data value conflict' issue can be resolved using data normalization techniques? From my understanding, de-normalization is suggested by practitioners for data warehouse (DW) development. However, normalizing a database includes, amongst other aspects, arranging data into logical groupings such that each part describes a small part of the whole; normalization also implies that modifying data in one place will suffice, and it minimizes the impact of duplicate data. What do you suggest?
Hi,
I am using geNorm, NormFinder, RefFinder, and BestKeeper to identify and select appropriate reference genes for normalization in RT-qPCR analysis. I was trying to identify tools to calculate the pairwise variations (V n/n+1) used to determine the optimal number of reference genes, i.e. V values such as V2/3, V3/4, etc. Do these calculations need to be performed manually, and if so, how can this be done, or is there a tool that specifically performs this task? Please assist me with this.
Best Regards,
Toufiq
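A hedged sketch of the geNorm pairwise-variation calculation as described by Vandesompele et al. (2002): V n/n+1 is the standard deviation of the log2 ratio of the normalization factors computed with n and with n+1 reference genes. It assumes a samples-by-genes matrix of relative quantities (e.g. 2^-DeltaCt) whose columns are already ordered from most to least stable gene:
geo_mean <- function(x) exp(mean(log(x)))
pairwise_variation <- function(rq) {
  n_genes <- ncol(rq)
  v <- sapply(2:(n_genes - 1), function(n) {
    nf_n  <- apply(rq[, 1:n,       drop = FALSE], 1, geo_mean)   # NF with n genes
    nf_n1 <- apply(rq[, 1:(n + 1), drop = FALSE], 1, geo_mean)   # NF with n+1 genes
    sd(log2(nf_n / nf_n1))                                       # V n/n+1
  })
  names(v) <- paste0("V", 2:(n_genes - 1), "/", 3:n_genes)
  v
}
set.seed(1)
rq <- matrix(2^rnorm(24), nrow = 6,                      # made-up relative quantities
             dimnames = list(paste0("S", 1:6), paste0("HK", 1:4)))
round(pairwise_variation(rq), 3)                         # values below ~0.15 are usually taken as sufficient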
I measured behavioural data (human body inclination) in two different groups (15 subjects each) at three different time points (sessions). The data were not normal (Shapiro-Wilk test) and did not respect homogeneity of variance (Levene's test). I ran non-parametric tests (Mann-Whitney U test for independent comparisons and Friedman and Wilcoxon signed-rank tests for paired-sample comparisons).
The reviewer of my article asked me to transform data to be normal and then run parametric test (rmANOVA). I applied many transformation, only one of them was good: the LMS approach proposed by Cole and Green (1992). It is also known as LMS quantile regression with the Box-Cox transformation to normality as well as it is known as Box-Cox Cole-Green (BCCN) transformation. The formula is: Z = ((y/μ)^L-1) / (S*L), where L is a constant parameter, μ is the mean value and S is generalized coefficient of variation (i.e., σ/μ and σ is standard deviation). It is not so common transformation.
My question is: can I use any transformation from the literature to make the data meet the assumptions of a parametric test such as rmANOVA? Or should I only use the transformations that are well established in the field I am working in?
Many thanks in advance for your comments!
Cole TJ, Green PJ. Smoothing reference centile curves: the LMS method and penalized likelihood. Stat. Med. 1992;11:1305–1319.
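For what it is worth, the transformation quoted in the question is only a few lines of R; a hedged sketch that estimates mu and S from the data and takes L as a fixed power parameter (the log limit covers L = 0):
bccg_transform <- function(y, L) {
  mu <- mean(y)
  S  <- sd(y) / mu                       # generalized coefficient of variation
  if (abs(L) < 1e-8) {
    log(y / mu) / S                      # limiting case as L -> 0
  } else {
    ((y / mu)^L - 1) / (S * L)           # Z = ((y/mu)^L - 1) / (S * L)
  }
}
set.seed(123)
y <- rlnorm(30, meanlog = 0, sdlog = 0.5)   # positively skewed toy data
z <- bccg_transform(y, L = 0.3)             # L would be tuned in practice
shapiro.test(z)                             # informal check of the transformed values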
I do not think the data I have require log transformation to carry out PCA, as the distribution is not skewed. Should I normalise my data using scale = TRUE (in RStudio) before carrying out PCA? This lowers the variance compared to the raw data and does not seem to affect the separation drastically.
My goal is to filter a polyline shapefile, eliminating all the features with a complex shape. So I calculated, for each feature, the length and the number of dangling endpoints. My goal is now to create a new field that relates these two parameters so that, for example, a feature with 3 dangling endpoints and a length of 1 meter is deleted, but a feature with 3 dangling endpoints and a length of 50 meters is kept. I think I need to normalize the values, but I am not an expert in statistics.
How to normalize numeric ordinal data from [1, Infinity) to [1,5]? I want to normalize relative scores which is ordinal by nature. But range of score can be from [1, Infinity). So, I need to bring it on scale of [1,5]. Can anybody help me figure it out?
Data values are double (float) type values.
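Two hedged options (neither is a standard; both are assumptions chosen for illustration): a fixed monotone map through 1/x that sends 1 to 1 and infinity towards 5, or rank/quantile binning of the observed scores into five ordered levels:
map_to_1_5 <- function(x) 1 + 4 * (1 - 1 / x)          # closed-form monotone map, 1 -> 1, Inf -> 5
bin_to_1_5 <- function(x) as.numeric(cut(rank(x), breaks = 5, labels = 1:5))   # quantile-style binning
scores <- c(1, 2, 3.5, 10, 120, 10000)
rbind(option1 = round(map_to_1_5(scores), 2),
      option2 = bin_to_1_5(scores))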
Do I understand correctly that scRNA-seq data in RPKM format are normalized within one sample but not normalized between samples? And that, therefore, they can still contain duplicate copies of a gene from the PCR amplification step?
Do you know any good recommendations or articles which can help with RPKM data normalization and downstream analysis?
Thank you in advance!
Hello everyone, I performed an interferon-gamma ELISA after a co-culture of my cell lines transduced with vectors containing antigens and PBMCs transduced with 8 T-cell receptors recognizing the antigens. My problem is that the transduction efficiency between the TCRs is very different, from 1% to 94%. I would like to analyze the interferon-gamma secretion on the basis of transduction efficiency, but I think I have to normalize my data. How can I do this, and what is the easiest way to do it? I am writing my master's thesis and I don't have any experience with normalization yet, so it would be great if someone has an idea. Thank you so much!
Hi,
I am working on breast cancer and I'm trying to use MCF 12F as a normal breast cell line. I purchased it from ATCC and I am having problems growing it. It doesn't seem to attach, and the few cells that have attached are floating after a few days. The cells look like this, and I have added everything to the medium as recommended by ATCC. They have replaced the cells twice for me, but it is still the same. Does anyone else have this problem? Suggestions are welcome.
Hello,
I am trying to normalize the data of GSE8397 with MAS 5.0 using R:
setwd("D:/justforR/GSE8397")
source("http://bioconductor.org/biocLite.R")
biocLite()
library(affy)
affy.data = ReadAffy()
However, the data used to 2 platforms: Affymetrix Human Genome U133A and Affymetrix Human Genome U133B Array.
The code gave me the warning message: "Error in affyio::read_abatch(filenames, rm.mask, rm.outliers, rm.extra, :
Cel file D:/justforR/GSE8397/GSM208669.cel does not seem to be of HG-U133A type"
So, how can I keep normalizing the data when they are in both U133A and B? Should I try another method of normalization (RMA or GCRMA?)
Do you have any ideas about this problem?
Thank you so much!
I am working on a dataset in which almost every feature has missing values. I want to impute the missing values with the KNN method. As KNN works on distance metrics, it is advised to normalize the dataset before using it. I am using the scikit-learn library for this.
But how can I perform normalization when there are missing values?
During meta-analysis, we have to normalize the data in the preprocessing step and then analyze the differentially expressed genes. But after we do the normalization, how can we know whether the data we have are good or not? I mean, is there any validation step for normalization? If yes, how do we validate the normalization?
Thank you!
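A common informal check (hedged; not a formal test) is to compare per-sample distributions before and after normalization, e.g. with boxplots or RLE plots, expecting the medians to line up afterwards. A minimal sketch on simulated data:
set.seed(1)
expr <- sweep(matrix(2^rnorm(1000 * 4, mean = 6), nrow = 1000),   # toy genes x samples matrix
              2, c(1, 2, 0.5, 1.5), "*")                          # with unequal per-sample scaling
colnames(expr) <- paste0("S", 1:4)
norm_expr <- sweep(expr, 2, colSums(expr) / mean(colSums(expr)), "/")   # simple total-intensity scaling
par(mfrow = c(1, 2))
boxplot(log2(expr),      main = "Before normalization", ylab = "log2 intensity")
boxplot(log2(norm_expr), main = "After normalization",  ylab = "log2 intensity")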
I am studying how to do normalization. At first, I thought that people calculate the log2 ratio as a primary step to convert the T = R/G ratio in a microarray into a simpler number that shows the difference between up- and down-regulated genes, but then I saw that in some cases people do log2 transformation as a step of normalization. So, is log2 transformation a normalization method? Is log2 transformation the same as the log2 ratio calculation, or are they different?
Thank you for your attention. Please ask me if my question confuses you!
It's a comparison of liposomal vitamin C to non-liposomal vitamin C. There were 21 geometric mean data points taken from each test group. I have the SD for each data point, there is some significant inter-subject variation. Do I need to dose-normalise this data before presenting the geometric mean and SD datasets on the graph? Is normalisation of data in this case necessary and why?