Science topic
Data Analysis - Science topic
Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains.
Questions related to Data Analysis
I collected my data in November 2021 for my PhD thesis in banking, so it was gathered after COVID happened. I had to take a break last year due to personal circumstances and am starting my data analysis now, aiming to submit this year. Is there any need for further data collection? Please advise.
I've been trying to install Xcalibur in my office to analyze data. My computer is a Mac mini M1 (2020), fully up to date, running Windows 11 through Parallels. I cannot make it work!
I have installed the .NET Framework, Java is up to date, everything...
I have Xcalibur on my Intel-based Mac Pro, but with the M1 it has been impossible so far.
Suggestions are appreciated!
Hi all,
I've run a common latent factor (CLF) test. Could you recommend a threshold value for a good CLF (if possible, please add some references)? I used the 0.2 threshold recommended by James Gaskin (on YouTube) and found that 4 out of 17 values exceeded 0.2 (ranging from 0.20 to 0.24). Is that acceptable?
P.S. The common method variance test is the last step of my data analysis; all the other tests (preliminary analysis and SEM) gave good results.
Or do you recommend another threshold value?
Many thanks!
How can data analytics be used to detect and mitigate online fraud?
One of the most significant steps in solving multi-criteria decision-making (MCDM) problems is the normalization of the decision matrix. Normalizing the data in a judgment matrix is an essential step, as it can influence the ranking list.
Is there any other normalization method for the "nominal-is-better" case besides the normalization that is possible through gray relational analysis (GRA)?
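For illustration, a hedged base-R sketch of one simple target-value ("nominal-is-better") normalization sometimes used in MCDM outside GRA; the target and data below are hypothetical, and scores are scaled so that 1 is best:
normalize_nominal <- function(x, target) {
  # 1 at the target value, falling toward 0 at the furthest observed deviation
  1 - abs(x - target) / max(abs(x - target))
}
x <- c(4.8, 5.0, 5.3, 4.6, 5.1)   # hypothetical criterion column
normalize_nominal(x, target = 5)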
To measure sustainability and food security, what parameters need to be included when surveying urban agriculture? Is it possible to assess data on the food security of a specific metropolitan area and its sustainability through a survey? If yes, what kinds of data and models must be incorporated?
I am conducting a descriptive survey study with a population of fewer than 20 respondents. What is the right statistical tool for the data analysis?
I included a few secondary data analyses in my systematic review. As I fill in my PRISMA flow chart, I would like to know whether these are considered unique studies. The data and research question in these analyses differ from those of the parent study, though the data used to answer the question come from previously conducted unique studies.
I have a set of data from three groups across 3 time points (two groups are subjected to an intervention and the third acts as a control). I want to evaluate intra- and inter-group differences between these time points for a given variable. However, at the baseline evaluation, one group's data are significantly lower than the others'. I thought I could analyze the data from a percentage standpoint: say the first assessment is 0%, and afterwards the intragroup difference is 15%, or whatever. I believe this is possible for the intragroup analysis, but I'm not sure whether it is possible for the inter-group analysis.
On the other hand, I could just analyze the raw data instead of transforming it, but this may lead to some issues that I'll probably have to address in the discussion.
Any ideas on what I can do?
Thanks in advance!
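If it helps, a minimal base-R sketch of the percent-change-from-baseline transformation (the data, column names, and ordering by time within group are assumptions). One aside: when baselines differ between groups, ANCOVA on the raw follow-up scores with baseline as a covariate is often recommended over change scores, so it may be worth considering too:
df <- data.frame(group = rep(c("A", "B", "control"), each = 3),
                 time  = rep(1:3, times = 3),
                 value = c(10, 11.5, 12, 6, 7.2, 7.5, 9, 9.1, 9.3))
baseline <- ave(df$value, df$group, FUN = function(v) v[1])  # first time point per group
df$pct_change <- 100 * (df$value - baseline) / baseline
df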
Dear Sir
I'm using the openair package in RStudio for air-quality data analysis. Does anyone know whether the polar map function in the openairmaps package (i.e., tools to create maps of air pollution data) works for the longitudes and latitudes of Middle Eastern countries?
I'll be very grateful if anyone can help me with this issue
Regards
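In case it is useful, a hedged sketch (assuming the openairmaps package is installed; the data frame and column names are hypothetical). polarMap() renders onto leaflet base maps, which are global, so Middle East coordinates should work as long as latitude/longitude are valid decimal degrees and the data include wind speed (ws) and direction (wd):
library(openairmaps)
polarMap(my_aq_data,          # hypothetical data with ws, wd, no2, lat, lon columns
         pollutant = "no2",
         latitude  = "lat",   # e.g. 35.7 (decimal degrees)
         longitude = "lon")   # e.g. 51.4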
How should one deal with quantitative data? Which methods are there for data analysis?
In our study, our professors recommended setting a deadline: if at least 20% of the total population has responded within the allotted time, we may proceed with data analysis and interpretation. However, we cannot find a reliable source to support this method.
I have tried to separate a direct coculture of MSCs (mesenchymal stromal cells) and macrophages to do bulk RNA-seq on the macrophages, as I want to find out how MSCs change gene expression in macrophages. I have tried different methods to separate the coculture as much as possible, but I can only manage to retrieve a cell population with 95% macrophages, with 5% MSCs still present.
Therefore, I want to know whether anyone has experience analyzing data when the population is not completely pure for one cell type, and how I should handle such data.
Is it wise to proceed with bulk RNA-seq when 5% of my cells are still MSCs, well aware that some of the observed gene expression could come from the 5% MSCs?
Does anyone have experience analyzing data from a survey based on the original TAM3 questionnaire, in Stata? What kinds of analysis are relevant so that the results can be used within TAM3?
Also, since some of the items are negatively phrased, do you recode them in Stata so that every item is positive?
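For the recoding itself, the usual logic is (max + 1) - x; a one-line sketch in R for illustration (the same arithmetic applies in Stata's generate/replace commands; the item name and 1-7 scale are assumptions):
likert_max <- 7
df$item3_rev <- (likert_max + 1) - df$item3   # 1 -> 7, 2 -> 6, ..., 7 -> 1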
Exploratory questions:
1. Is there a relationship between empathic concern and fantasy?
2. Is there a difference between general scores and task-specific scores (before and after reading)?
3. Do individuals who spend more time reading have a higher empathy score?
This is a correlational study; therefore, would I need to use SPSS, and would it be a t-test?
Also, would there be an independent variable, given that I'm looking for a difference and I'm not manipulating any variables?
Thank you.
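If it helps, a hedged sketch of tests matching the three questions, written in R for concreteness (SPSS offers the same procedures; variable names are assumptions). Q1 and Q3 are correlations, and Q2, being a before/after comparison within the same people, is a paired t-test, which needs no manipulated independent variable:
cor.test(df$empathic_concern, df$fantasy)                # Q1
t.test(df$score_before, df$score_after, paired = TRUE)   # Q2
cor.test(df$reading_time, df$empathy_score)              # Q3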
Dear All,
I have results for the pH, Temperature, TDS, Total Solids, Salinity, Total Hardness, Ca-Hardness, Mg-Hardness, Turbidity, and EC of the water samples collected from the Ponds and borewells of my targeted area.
- Please suggest how I should analyze these data in a report format.
- Can a Water Quality Index (WQI) be prepared using these parameters?
Thanks in Advance.
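On the WQI question: yes, a weighted arithmetic WQI is one common formulation, sketched here with hedges (the permissible limits S and ideal values Ci below are placeholders; you would substitute standards such as WHO/BIS values for each of your parameters):
C  <- c(pH = 7.8, TDS = 640, turbidity = 3.2)   # measured values (hypothetical)
S  <- c(pH = 8.5, TDS = 500, turbidity = 5)     # permissible limits (assumed)
Ci <- c(pH = 7.0, TDS = 0,   turbidity = 0)     # ideal values (assumed)
w  <- (1 / S) / sum(1 / S)                      # unit weights, proportional to 1/S
q  <- 100 * (C - Ci) / (S - Ci)                 # quality rating per parameter
WQI <- sum(q * w)                               # compare against published WQI class limits
WQI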
I am trying to perform an RNA-seq data analysis, and at the first step a few questions came to mind that I would like to understand. Please help me understand the following.
1) In the first image, the raw data from NCBI SRA have 1 and 2 marked at the ends of the reads. What does this mean? Are these forward and reverse reads?
2) In the second image, I was trying to run Trimmomatic on this dataset. I chose "paired-end as a collection", but it does not accept any input even though my data are there in "fastqsanger.gz" format. Why is that? Should I treat these paired-end data as single-end data when running Trimmomatic?
3) In the 3rd and 4th images, I downloaded the same data from ENA, where they provide two separate files for the 1- and 2-marked data in SRA. I then processed them in Trimmomatic using "paired-end as individual datasets" and ran it. Trimmomatic gives me 4 output files. Why is that, and which ones should be used for alignment?
A big thank you in advance :)
I am detecting an additional band aside from the expected VP1, VP2, and VP3 bands. This is AAV9, and the extra band appears at 100-110 kDa. Ideas?
I have been working with MS data with a Bruker TIMS TOF machine, and on Bruker's Data Analysis software.
I would like to export the data (e.g., as .mzXML or just .XML), but currently the Data Analysis software only allows doing this one spectrum at a time, so bulk exporting takes ages.
I know Data Analysis supports Visual Basic scripts, so perhaps a loop or another script could enable bulk exporting. Unfortunately, I have absolutely no experience with Visual Basic and have no idea how to get started, even after reading the manual...
Has anyone out there already written a script to do this? Or have an idea/template of how to start?
Hi,
Kindly help me understand when to use AMOS versus SmartPLS for data analysis.
Regards,
I am currently working on a study titled "Knowledge, Competency, Adoptability and Sustainability of Artificial Intelligence (AI) Technology Among Physician Entrepreneurs in GCC Countries". I am in a dilemma about which SEM package, AMOS or SmartPLS, will suit the study. The total number of responses received so far is 220 out of a population of 400. The main objective is to assess the knowledge, competency, adoptability, and sustainability of artificial intelligence among physician entrepreneurs.
I will be much obliged if you can offer your kind feedback on this.
Regards
Dr. Sharfras Navas
I have Likert-scale data at pre- and post-test, with an experimental group and a control group. I don't have any experience using ANCOVA or Likert-scale data. I have read many PDFs and posts, but I am not certain whether ANCOVA can be used to compare means at pre- and post-test when the responses are not paired. Can I compare the means of the control vs. experimental group at pretest (posttest)?
Thank you in advance
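A hedged sketch of the usual set-up, in R for concreteness (SPSS and other packages fit the same model; the data frame and column names are assumptions): model the posttest score with the pretest score as covariate, so the group term estimates the baseline-adjusted group difference. A plain t-test on the pretest scores alone answers the "are the groups comparable at baseline?" question:
fit <- lm(post ~ pre + group, data = df)
summary(fit)   # 'group' coefficient: experimental vs control, adjusted for pretest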
Although preprint submissions can be an interesting way to gauge reactions, criticisms, and suggestions, I think a poor choice of platform can limit the results. Among the platforms I researched, OSF, SciELO, and Elsevier preprints offer a reasonable structure for preprint submission and analysis. In this sense, I kindly ask for suggestions on which platform may be most suitable for a preprint submission in the field of social sciences.
I think such applications have limitations, and the data analysis they perform is questionable. Do you agree?
I used LC-MS to detect metabolites, and after a PCA plot was generated, I noticed there were some sample outliers. What could be the cause of this?
I'm starting a discussion on a topic that has bothered me for some time. Many missiological research pieces, especially doctoral dissertations, use qualitative research methods. Mission scholars collect rare and precious data related to missions from various cultural settings around the world. We mostly do a good job of collecting, organizing, and perhaps summarizing the data, but it seems to me that when it comes to data analysis, we often fall flat. Our analyses tend to remain at the descriptive level, often not going beyond mere descriptions and summaries. It is rare to find missiological qualitative inquiries that reach a more conceptual, interpretive stage of data analysis. There seem to be a variety of reasons for this, but I wonder whether the curriculum and structure of PhD and DMiss programs are the primary reasons for the lack of depth in analysis.
I would welcome your inputs on the issue, the cause, and solutions, or your outright disagreement with my argument as well! Thanks!
Dear all
I'm starting out in fNIRS data analysis and I'm planning to use open-source toolboxes.
I found that Homer and NIRSLab are the best for MATLAB, and MNE for Python.
Which one do you prefer, and why?
Best
Lucas
This discussion focuses on two questions:
1. How do we technically teach the process of writing code, and what are the implications of generative AI for this process?
2. Would you begin teaching data analysis with GUI tools (like Excel or Google Sheets), or dive straight into programming?
I am very interested in hearing your opinion
I will be comparing males and females, the unemployed and the employed, or those who drink vs. those who don't, with regard to their overall life satisfaction. I am undecided about which data analysis is best to use.
Dear colleagues
Hi
I am a PhD Candidate in Microbiology.
Could you kindly share the practical details of PCR and qPCR methods in the form of training exercises, in a useful and comprehensive way?
Thanks in Advance
- An up-to-date book covering every important aspect of next-generation sequencing and genomic data analysis with the newest available tools
- A book with examples at the command line
Hi everyone,
I prepared my data prior to the actual data analysis (CFA and SEM). First, I looked at univariate outliers: I checked the univariate outliers belonging to each sub-scale, one by one, using box plots and histograms, and diagnosed them from the box plots. The flagged values ranged from the lowest to the highest values of the sub-scales. For example, participants rate their answers from 1 (lowest) to 7 (highest), and all of the asterisked outliers fell within that range. There were no extreme or absurd values (e.g., 100) in the data.
Here, I have some questions.
1. Do you think I should remove these outliers? If so, why should I remove them?
2. How can I remove these outliers? (See the sketch after this post.)
3. Should I remove them one by one or all together?
4. Should I remove an outlying case from all items, or only from the item the outlier belongs to?
5. If I remove it only from the item it belongs to, this will create a missing value. How can I deal with that situation? Should I leave it as a missing value?
Best,
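On questions 2-3, a hedged R sketch of flagging (rather than silently deleting) the box-plot outliers per item; boxplot.stats() uses the same 1.5 x IQR rule that draws the plot's asterisks/points. Since the flagged values here are legitimate 1-7 responses rather than impossible entries, many analysts keep them and check robustness instead. The item name is an assumption:
out_vals <- boxplot.stats(df$item1)$out                        # values flagged by the 1.5*IQR rule
which(df$item1 %in% out_vals)                                  # inspect the flagged rows
df$item1_na <- replace(df$item1, df$item1 %in% out_vals, NA)   # option: set to missing, not delete the case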
I will measure the statistical significance of differences between treatment groups exposed to certain toxins, and also examine their depuration kinetics / pharmacokinetic model / log-regression line diagram and the graphs mentioned in the question. How can I start, and which is the best way? Which books, sites, tutorials, etc., can help me learn to do this? I have no previous coding experience. Please see the pictures of the kinds of graphs I want to produce.
Long story short: the Chromeleon software I am currently using is old (and on an old computer), which makes it buggy and difficult to actually analyze data and generate attractive chromatogram overlays.
I would like to use OriginLab; however, I don't really see any easy way to do this. I figured out how to bulk-import data; however, it seems that I would have to write a bunch of very long IF() functions to incorporate information from the fit equations of standard curves and to identify/label peaks based on retention-time windows.
Is there a more reasonable way to do this, such as a built-in app or a standardized workflow similar to the one included in Chromeleon?
Hello,
I recently started a new position analyzing seagrass and other submerged aquatic vegetation (SAV) within an aquatic preserve. Our data are collected using a modified Braun-Blanquet (BB) score: if a species is present in our quadrat, we give it a score of 1 (less than 5%), 2 (5-25%), 3 (25-50%), 4 (50-75%), or 5 (75-100%). Historically, the data have been analyzed using average BB scores and percent occurrence. In each system we collect four quadrats from 25 sites, and then the system as a whole is analyzed. I have not used BB methods before, so I looked into ways to analyze the data and learned that averaging these scores is basically not a good method for analyzing BB scores. Does anyone have suggestions or insight on the best way to analyze these data? I've been reading many different sources that all suggest different methods, so I was wondering if anyone can point me in the right direction for starting out; we have large datasets with 5 species of seagrasses and multiple species of algae. Any advice or suggested reading is highly appreciated.
Thank you
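One frequently used alternative to averaging the raw ordinal scores, sketched with hedges: convert each BB score to the midpoint of its cover class and summarize percent cover (the midpoints follow the class limits in the post above; treating midpoints as percent cover is an approximation, and the data below are hypothetical):
bb_midpoint <- c(`1` = 2.5, `2` = 15, `3` = 37.5, `4` = 62.5, `5` = 87.5)
quadrats <- data.frame(site = c(1, 1, 2, 2),
                       species = c("Thalassia", "Halodule", "Thalassia", "Halodule"),
                       bb_score = c(3, 1, 4, 2))
quadrats$cover <- bb_midpoint[as.character(quadrats$bb_score)]
aggregate(cover ~ species, data = quadrats, FUN = mean)   # mean percent cover per species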
Phenomenology: the van Manen method of data analysis.
How should one choose between SEM in AMOS and SEM in SmartPLS for data analysis?
Are both of them internationally accepted?
Chi-Square Tests
                               Value      df   Asymptotic Significance (2-sided)
Pearson Chi-Square             125.476a    4   .000
Likelihood Ratio               107.862     4   .000
Linear-by-Linear Association    53.751     1   .000
N of Valid Cases               247
a. 0 cells (0.0%) have an expected count less than 5. The minimum expected count is 5.23.
Greetings,
I'm a bit confused about how to select the best statistical method to start my data analysis. When should one use ANCOVA versus ANOVA? And how can I run an ANCOVA in Prism? Thanks for replying in advance.
Please list them all, with some relevant examples/links. Thank you!
Hi all,
I'm new to NVivo, and I want to use it for quantitative statistical data analysis, and also for building a literature review.
So my question is: is it possible to use NVivo in these ways? Thanks
Hi, I kindly need your help. I am doing research on experiential marketing, using a structured qualitative survey questionnaire. What should my methodology and data analysis process be, given that, due to the COVID-19 era, my school has discontinued face-to-face interviews for this year's theses? Thank you
Hello, I am currently analyzing data on insect counts. I am comparing insect orders (counts) across four seasons and three elevations, collected using three different methods. My primary question is: "How do insect order counts vary with season and elevation?" I am using the 'glmmadmb' function in R to fit a model. I came across many possible model combinations and tried to choose the best model based on AIC, BIC, overdispersion, and log-likelihood values. However, I am not sure whether the model indicated by these criteria is appropriate. For example, the best model indicated is:
mod <- glmmadmb(counts ~ season+(elevation|order), family="nbinom", data =x)
But I believe mod1 should be used:
mod1 <- glmmadmb(counts ~ season+elevation+(1|order), family="nbinom", data =x)
Further, what if I want to incorporate the variable "method of the collection" into the model? Would it be something like this?
mod3 <- glmmadmb(counts ~ season+elevation+(1|order)+(1|method), family="nbinom", data =x) or even more complicated?
After reading so many papers, I am confused about the various combinations of random effects like e|g; (1|e)+(1|g); (1|e/g); (0|e/g) etc.
I have spent days trying to figure this out. The more I read, the more confused I am. Any help would be highly appreciated.
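For what it's worth, a hedged sketch of what these formula terms mean, written with lme4 syntax (glmmADMB shares it; glmer.nb is lme4's negative-binomial analogue, used here only for illustration with your data frame x):
library(lme4)
# (1 | order): each insect order gets its own random intercept; the season
# and elevation effects are assumed common across orders (your mod1).
m1 <- glmer.nb(counts ~ season + elevation + (1 | order), data = x)
# (elevation | order): a random intercept AND a random elevation slope, i.e.
# the elevation effect is allowed to differ between orders (your mod).
m2 <- glmer.nb(counts ~ season + (elevation | order), data = x)
# Crossed random intercepts for order and collection method (your mod3);
# method could instead be a fixed effect if you want to compare methods directly.
m3 <- glmer.nb(counts ~ season + elevation + (1 | order) + (1 | method), data = x)
A practical rule of thumb: a term belongs in the random part when its levels are exchangeable groupings whose individual effects are not of direct interest, and the random-slope form (your mod) is justified only if you believe the elevation effect genuinely varies by order.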
Hi guys,
I have a question about conducting the two-sample Kolmogorov-Smirnov test. I have two sets of data, and I'd like to compare the similarity of their distributions. After running the K-S test, I got p < 0.001, which rejects the null hypothesis and suggests the two datasets do not have similar distributions (please see the attached figure). However, I also got D = 0.121 from the result. As far as I know, when D is close to 1 it indicates the two datasets have different distributions, and when D is close to 0 it indicates they have similar distributions. Looking at this very low D value, does it mean the cumulative distributions of the two datasets are actually similar? It seems contradictory.
So, how should I interpret the result? And is it reasonable to use the D value to quantify the similarity between these two datasets?
Thanks in advance
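A hedged illustration of why these two results are not actually contradictory: the K-S p-value scales with sample size, so with large n even a modest D is "significant"; D itself is the better measure of how different the two distributions are:
set.seed(1)
x <- rnorm(5000)
y <- rnorm(5000, mean = 0.25)   # mildly shifted normal
ks.test(x, y)                   # D around 0.1, yet p far below 0.001 at this n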
Hi there, I appreciate your attention to this question.
Nowadays, it is quite common to consider masking or padding when dealing with missing values in sequential data for RNNs (or perhaps in image data for CNNs).
However, I have not found a paper that proposes/establishes the exact method.
The reason I'm looking for it is that I would like to learn what exactly happens in an RNN or LSTM layer when it is masked to 'skip the input'.
If you know of any, please let me know.
Thanks from Japan.
---I would like to add a few things---
When reading about the masking technique, we often see descriptions like "ignore the missing value" or "skip the input". On the other hand, there are few references in the literature that explain, in mathematical formulas, the skipping of missing input values.
So, I wonder what exactly happens inside the RNN layer when the input is masked.
As you can see in the attached image of the formula, the RNN state z^t at timestep t can be expressed in terms of the input x^t, the recurrent input z^{t-1}, and the weights W.
And if the input is masked to be ignored, how is z^t calculated?
Is z^t calculated by imputing x^t (e.g., using the value from the previous timestep as input)?
Or is z^t not calculated at all and exported as a NaN value?
I'm sorry, but I'm not looking for an imputation method; I want to know the mechanism inside an RNN or LSTM when it is masked to ignore the input.
Again, thank you from Japan.
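For what it's worth, a common convention (e.g., Keras-style masking; whether your framework does exactly this should be verified against its source) is that a masked timestep is neither imputed nor NaN: the update is simply skipped and the previous state is carried forward. With a mask m^t in {0, 1} and the notation above, a sketch of this rule is:
z^t = m^t * f(W_x x^t + W_z z^{t-1} + b) + (1 - m^t) * z^{t-1}
so when m^t = 0 (masked), z^t = z^{t-1} and x^t never enters the computation.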
Hi, everyone. I am calculating NRI and IDI using Stata. I want to compare the discrimination ability of two separate models (Model A and Model B). The nri program in Stata seems only to calculate an NRI reflecting the discrimination between a base model and the same model with a new marker added. How should I calculate an NRI that reflects the discrimination of two models that include different covariates?
Thanks in advance!
Can anyone help us perform balanced isometric log-ratio (ilr) transformations for geochemical data from marine phosphorites?
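In case it helps, a hedged R sketch using the 'compositions' package (assuming it is installed; the element columns and data below are hypothetical). A fully "balanced" ilr additionally needs a sequential binary partition of your elements, encoded as a contrast/basis matrix and passed to ilr() via its V argument:
library(compositions)
set.seed(1)
geo <- acomp(matrix(runif(40, 1, 100), ncol = 4,
                    dimnames = list(NULL, c("P2O5", "CaO", "SiO2", "Al2O3"))))
z <- ilr(geo)   # default ilr coordinates; supply V = <your balance basis> for custom balances
head(z)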
I mean including the selection weights of the statistical units in the data analysis.
When the sample is a probability, stratified sample, it is necessary to account for the weight of each cluster during the analyses, the regressions, and the calculation of the errors, in order to obtain statistics that are representative of the population. Otherwise, the results are all biased.
I haven't seen an article that deals with this in any detail.
In fact, I have data collected on 16 countries by two different institutions last year. They used probability, stratified sampling for this survey, but they did not specify the weights of the strata, nor did they document the methodology. These data need to be merged with the data collected this year to move forward with the work.
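As one concrete route, a hedged sketch with R's 'survey' package (the stratum and weight columns here are assumptions; the difficulty described above is precisely that the providers did not publish them, so the weights would first have to be reconstructed or approximated):
library(survey)
set.seed(1)
d <- data.frame(stratum = rep(1:2, each = 50),
                wt      = rep(c(2, 4), each = 50),   # hypothetical design weights
                y = rnorm(100), x = rnorm(100))
des <- svydesign(ids = ~1, strata = ~stratum, weights = ~wt, data = d)
svymean(~y, des)                                 # design-weighted mean and SE
summary(svyglm(y ~ x, design = des))             # design-weighted regression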
Hello all,
I am a graduate student preparing my research proposal on the project idea I share below.
I would like to conduct research on an incumbent company facing threats to its business from new entrants into the market as well as from other established competitors.
My question is: what kind of data should I be looking to collect for this research topic, and how should I analyze that data?
Any help would be well appreciated.
Thank You
Dear all, I'm having trouble analyzing data using a two-way repeated-measures ANOVA. The reason I use this technique is that my experiment measures survival of cells treated with different concentrations of a drug at 3 time points, measured continually, and the experiment was repeated 5 times. Incubation time is therefore a within-subject (dependent) factor. I found that for some cells treated with one drug concentration at one time point, the data from the 5 experiments are not normally distributed. I have been trying to find an alternative method, such as the Scheirer-Ray-Hare test, but that test's assumptions do not accommodate dependent data. Moreover, in discussions of this or similar problems I have seen claims that some ANOVA assumptions can be violated, but those are quite controversial and confusing. I'm very new to statistics; any suggestion would be humbly appreciated :')
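One commonly suggested alternative, sketched here with hedges: a linear mixed model keeps the repeated-measures dependency (replicate/experiment as a random effect) without requiring normality in every cell, and survival can be log-transformed if residuals look skewed. The data frame and column names are assumptions:
library(lme4)
m <- lmer(survival ~ concentration * time + (1 | replicate), data = cells)
summary(m)   # inspect residuals with plot(m) before trusting the fit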
Hello, I am currently working on my data analysis, but I am not sure which statistical test to use.
My research objective is: To determine the effect of age, gender, and GPA on the work readiness of graduating students
My hypotheses are:
- H1: Age significantly affects students' work readiness
- H2: Gender significantly affects students' work readiness
- H3: GPA significantly affects students' work readiness
In my study, work readiness is measured through a Likert-scale instrument (from 1 to 5), and I'll derive the mean scores to interpret work readiness.
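If it helps, a minimal R sketch of one standard approach (other software would fit the same model; variable names are assumptions): with a roughly continuous mean work-readiness score, multiple linear regression tests each predictor while controlling for the others, and the per-coefficient tests map onto H1-H3:
fit <- lm(readiness ~ age + gender + gpa, data = survey)
summary(fit)   # t-tests for age (H1), gender (H2), and gpa (H3)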
My lab developed an LC-MS analysis method using the Agilent MassHunter Qualitative software; however, we find it very hard to generate reports with it, especially when it comes to finding concentrations of our analyte in the samples. I want to start using the Quantitative software, but I do not know whether there is a way to reuse the existing method in the quantitative software or whether I would need to create one from scratch.
Qualitative data includes interviews, open-ended questions, etc.
Dear Colleagues and researchers
I'm working on a longitudinal study in which several distinct variables and constructs are being investigated simultaneously. My challenge is how to explore the relationships between or among them. I'll be grateful if anybody could help me design the framework of my study, decide how to test the expected relations, and choose the statistical tests and analytic procedures (such as cluster analysis, multi-categorical multiple mediation analysis, or ...) that would allow me to explore those relations.
Help from anybody who is an expert in research design (preferably EFL) or statistics would be highly appreciated.
Sincerely Yours
I used three data analysis methods (descriptive statistics, ANOVA, and linear regression) in my research. Can I say I used mixed-methods analysis?
OriginLab is such a great tool for data analysis and plotting graphs. Why has it not attracted the interest of the software developers to create a Mac version, so that we don't have to go through the tedious and less efficient route of using virtualization software or Boot Camp?
In the JBI checklist for critical appraisal of prevalence studies, how can I know whether the data analysis of a study was conducted with sufficient coverage of the identified sample?
I am looking forward to the answer.
Thank you!
For data analyses such as logistic regression, linear regression, etc., which software packages are easily available and free?
Dear connections
Recently I was analyzing data in AMOS. While calculating reliability and validity, the AVE values for a few constructs were less than 0.50 and the CR values were less than 0.70. These values were obtained after eliminating indicators with low factor loadings. I was wondering whether I could report the AVE of 0.469 and CR of 0.689; however, I was afraid to report these values due to a lack of supporting evidence.
So, can we report these values in the model? If yes, can you share any paper supporting such findings?
Regards
Dr. Bhuvanesh
Hello researchers
Please suggest freeware for data analysis.
Currently I am using the OriginLab software with a 21-day license. Please suggest other software.
Dear colleagues,
What can you suggest for filling in the missing values shown in the picture?
Best
Ibrahim
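One option among several, sketched with hedges: multiple imputation via the 'mice' package (reasonable when values are missing at random; for simple time series, interpolation or last-observation-carried-forward may be more appropriate). The data frame name is an assumption:
library(mice)
imp <- mice(df, m = 5, method = "pmm", seed = 1)   # predictive mean matching
completed <- complete(imp, 1)                      # extract one completed dataset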
I have a bachelor's in Software Engineering, and I am interested in data analytics. I am now applying for a master's degree program in Canada, but I am unable to find a good research problem in data analytics or data science.
I want to use data from some telecom companies and work in this field, but I couldn't find a specific problem statement. Secondly, I have some rough ideas about combining big data analytics with IoT or networking (correct me if I am wrong; I am not so sure).
Kindly suggest some good research topics in this regard.
Thanks in advance.
Dear scholars,
I have a database with 100 geolocated samples from a given area; in each sample, 38 chemical elements were quantified.
Some of these samples contain values below the detection level (BDL) of the instrument. Clearly, when 100% of the samples have BDL values there is not much to do, but what can be done when, for example, only 20% are BDL? What do we do with them, and with what value do we replace a BDL sample?
Some papers show that a BDL sample can be replaced by the detection limit (the instrument's minimum detection level for that chemical element) multiplied by 0.25; others show that you should multiply it by 0.5. What would you do in each case, and is there any literature you would recommend? If it matters, I am mostly interested in copper and arsenic.
Regards
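For what it's worth, a base-R sketch of the simple-substitution convention (here DL/2; the values and detection limit are hypothetical). Note that substitution itself is debated: Helsel's work on censored environmental data argues for methods such as regression on order statistics or Kaplan-Meier instead, which may be worth consulting for the copper and arsenic data:
dl_cu <- 0.5                                   # hypothetical detection limit
cu <- c(1.2, NA, 3.4, NA, 2.1)                 # NA = reported as BDL
cu_sub <- ifelse(is.na(cu), dl_cu / 2, cu)     # substitute DL/2 for BDL values
cu_sub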
Hi, I performed a ChIP-seq experiment on potato samples. The sequencing company did the library preparation, sequencing, and data analysis. In the results, I found too many target genes. When I looked at the peaks in the genome browser using the relevant files received from the company, I noticed that there are too many peaks distributed throughout the genomic region, resulting in too many targets, and hence I am not able to decide which ones to trust. Further, peaks are distributed everywhere, including introns, exons, and intergenic regions (which could be true). But I am wondering why I get a high number of peaks in the intron-exon and coding regions of the genes, and what could be the reason for getting so many peaks overall? The peaks in the immunoprecipitated DNA samples have been normalized against the control input DNA samples. I am attaching a picture for reference: the middle row of peaks is the input control, and the other two rows are two independent IP experiments.
I will appreciate your suggestions!
Qualitative research has its own benefits and issues, so I want to know about any online workshops or courses on qualitative research and data analysis.
I need a (tabular, i.e., not imaging or text) dataset with a hierarchically structured outcome to use as an example dataset in a new R package (the dataset can be in any format, e.g., txt, csv, or arff). It should be single-label and tree-structured, e.g., first level: classes 1, ..., 4; second level: 1.1, 1.2, 1.3, 2.1, 2.2; third level: 1.1.1, 1.1.2, 1.2.1, 1.2.2, 1.2.3, 1.3.1, 1.3.2, ...
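If no suitable public dataset turns up, one fallback (a sketch under the stated structure; all values are synthetic) is to ship a small simulated dataset with the package:
set.seed(42)
labels <- c("1.1.1", "1.1.2", "1.2.1", "1.2.2", "1.2.3", "1.3.1", "1.3.2", "2.1", "2.2")
n <- 200
d <- data.frame(x1 = rnorm(n),
                x2 = runif(n),
                label = factor(sample(labels, n, replace = TRUE)))   # single tree-structured label
write.csv(d, "hierarchical_example.csv", row.names = FALSE)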
Hi,
I am conducting a public questionnaire asking about people's decision-making. The dependent variable is binary: whether the respondent chose to do action X or not.
In the questionnaire, I asked two major questions about the reasons for choosing to do action X or, for those who did not do it, the possible reasons. I used a five-point Likert scale.
Questions such as, I did action X because:
- o 1
- o 2
- o …
And I did NOT do action X because:
- o 1
- o 2
- o …
What is the most appropriate method to analyze the data? Can I use binary logistic regression for this data set?
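A hedged sketch of the binary logistic model in R (variable names are assumptions). One caveat worth flagging: the "reasons" items were asked conditionally (only of those who did, or did not, do action X), so they cannot all enter a single model as ordinary predictors of the outcome; unconditional covariates such as demographics can:
fit <- glm(did_action_x ~ age + attitude_score, data = survey, family = binomial)
summary(fit)
exp(coef(fit))   # odds ratios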