Science topic

Data Analysis - Science topic

Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains.
Questions related to Data Analysis
  • asked a question related to Data Analysis
Question
2 answers
I collected my data in November 2021 for my PhD thesis in banking; that is, after COVID had happened. I had to take a break last year due to personal circumstances and am starting my data analysis now, aiming to submit this year. Is there any need for further data collection? Please advise.
Relevant answer
Answer
The data can still be used for analysis, and any conclusions drawn should be referenced to the period/time the data was collected.
In short, it all depends on your research questions, research problems, and research objectives.
  • asked a question related to Data Analysis
Question
6 answers
I've been trying to install Xcalibur in my office to analyze data. My computer is a Macmini M1 2020, up to date, using Parallels and Windows 11. I cannot make it work!
I have downloaded the .NET Framework, Java is up to date, everything...
I have Xcalibur on my Intel-based MacPro, but with the M1 it has been impossible so far.
Suggestions are appreciated!
Relevant answer
Answer
Thanks!!! I will try once more.... with your suggestion.
  • asked a question related to Data Analysis
Question
4 answers
Hi all,
I've run a common latent factor (CLF) test. Could you recommend a threshold value for a good CLF (if possible, please add some references)? I used 0.2, as recommended by James Gaskin (on YouTube), and found that 4 of my 17 values exceeded 0.2 (ranging from 0.20 to 0.24). Is that acceptable?
P.S. The test of common method variance is the last step of my data analysis; the other tests (preliminary analysis and SEM) all have good results.
Or do you recommend another threshold value?
MANY THANKS!!!!!!!!!!!!!!
Relevant answer
Answer
so nice of you Sir
Thanks
  • asked a question related to Data Analysis
Question
3 answers
How can data analytics be used to detect and mitigate fraud online?
Relevant answer
Answer
Detecting and mitigating fraud requires two things:
1. Identifying normal patterns
2. Identifying deviations from the norm
The specifics will look different depending on what type of data you are looking to detect and mitigate fraud in. Let's take financial banking as an example. In this field, the size of the transaction is one of the largest indicators of fraud. Researchers have determined that the majority of transactions are relatively small. They have also determined that the majority of fraudulent transactions are large (given that it is more profitable for the fraudulent actor). Thus, one factor financial fraud mitigation algorithms look at is the size of the transaction, seeking to identify transactions that are outside of the normal range. Of course, there are hundreds of other factors that go into these algorithms as well. However, the premise of almost all fraud detection and mitigation can be broken down into those two steps.
In terms of the specific technologies used, Bayesian filters are very common; this is the form of algorithm used in email spam detection. Machine learning models are also very common, particularly for detecting and mitigating fraud in big data.
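As an illustration of those two steps, here is a minimal Python sketch: learn the "normal" range of transaction sizes from historical data, then flag deviations with a simple z-score rule. The synthetic amounts and the 3-sigma cutoff are purely illustrative, not a production fraud rule.

import numpy as np

# Step 1: learn the norm from (synthetic) historical transaction amounts.
rng = np.random.default_rng(42)
normal_amounts = rng.lognormal(mean=3.0, sigma=0.5, size=10_000)
mu, sigma = normal_amounts.mean(), normal_amounts.std()

# Step 2: flag deviations from the norm with a simple z-score rule.
def is_suspicious(amount, z_cutoff=3.0):
    return abs(amount - mu) / sigma > z_cutoff

print(is_suspicious(25.0))   # a typical small transaction -> False
print(is_suspicious(500.0))  # an unusually large one -> True

A real system would combine hundreds of such features (as noted above) in a Bayesian or machine learning model, but the anomaly-scoring premise is the same.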
  • asked a question related to Data Analysis
Question
2 answers
One of the most significant steps in solving multi-criteria decision-making (MCDM) problems is the normalization of the decision matrix. The normalization of the data in a judgment matrix is an essential step, as it can influence the ranking list.
Is there any other normalization method for the "nominal-is-better" case besides the normalization that is possible through gray relational analysis (GRA)?
Relevant answer
Answer
Hi Adnan,
Here are a few common ones:
  1. Min-max normalization: This method involves scaling the data so that the minimum value is mapped to 0, and the maximum value is mapped to 1. All other values are then scaled proportionally between 0 and 1. This method assumes that all criteria have equal importance.
  2. Vector normalization: This method involves dividing each value by the Euclidean norm of its criterion vector. It assumes that all criteria are of equal importance and independent of each other.
  3. Standardization: This method involves transforming the data so that it has a mean of 0 and a standard deviation of 1. This method assumes that the criteria are normally distributed and that they are of equal importance.
  4. Logarithmic normalization: This method involves taking the logarithm of each value in the matrix. This method is useful when the data has a large range of values or when the values are highly skewed.
It's important to note that the choice of normalization method should depend on the nature of the data and the decision problem at hand.
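To make these concrete, here is a small NumPy sketch of all four normalizations applied per criterion (column) of an illustrative 3x3 decision matrix; the numbers are made up.

import numpy as np

# Rows = alternatives, columns = criteria; values must be positive for the log case.
X = np.array([[250.0, 16.0, 12.0],
              [200.0, 20.0,  8.0],
              [300.0, 32.0, 16.0]])

min_max  = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))  # 1. min-max
vector   = X / np.linalg.norm(X, axis=0)                          # 2. vector (Euclidean)
standard = (X - X.mean(axis=0)) / X.std(axis=0)                   # 3. standardization
log_norm = np.log(X)                                              # 4. logarithmic

print(min_max.round(3))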
  • asked a question related to Data Analysis
Question
3 answers
To measure sustainability and food security, what parameters need to be included when surveying urban agriculture? Or is it possible to assess data regarding the food security of a specific metropolitan area and its sustainability through a survey? If yes, what kinds of data and models must be incorporated?
Relevant answer
Answer
You may include other related parameters that serve the main purposes of urban agriculture, under different measures.
  • asked a question related to Data Analysis
Question
7 answers
I am conducting a descriptive survey study with a population of fewer than 20 respondents. What is the right statistical tool for the data analysis?
Relevant answer
Answer
It depends on your research questions and the nature of the variables. Having identified these, you can apply the appropriate non-parametric test. For each parametric test there is a non-parametric counterpart, so choose the one appropriate to the nature of your data.
  • asked a question related to Data Analysis
Question
2 answers
I included a few secondary data analyses in my systematic review. As I am filling in my PRISMA flow chart, I would like to know if these are considered unique studies. The data and inquiry proposed in these are different from those of the parent study, though the data used to answer the question come from previously done unique studies.
Relevant answer
Answer
Secondary data analysis is a research method that involves the use of existing data that was collected for a different purpose or by another researcher. While it may involve original research and analysis, secondary data analysis is generally not considered a unique study in the sense that the data being analyzed has already been collected and is not unique to the researcher conducting the analysis.
Instead, secondary data analysis is often used as a way to answer new research questions or test new hypotheses using existing data sources. It can be a valuable method for researchers who want to study a topic that would be difficult or impractical to study using primary data collection methods, or for researchers who want to compare their findings to those of previous studies.
That being said, secondary data analysis still requires careful planning, analysis, and interpretation, and can lead to novel and important findings. Therefore, while it may not be considered a unique study in the traditional sense, it is still a valuable research method that can contribute to the advancement of knowledge in a particular field.
  • asked a question related to Data Analysis
Question
3 answers
I have a set of data from three groups across 3 time points (2 groups are subjected to an intervention and the third acts as control). I want to evaluate intra- and inter-group differences between these time points for a given variable. However, at baseline evaluation, one group's data is significantly lower compared to the others. I thought maybe I could analyze the data from a percentage standpoint, say the first assessment is 0% and afterwards the intragroup difference is 15% or whatever. I believe this is possible for intragroup analysis, but I'm not sure if it is possible for inter-group analysis.
On the other hand, I could just analyze the raw data instead of transforming it, but this may lead to some issues that I'll probably have to address in the discussion.
Any idea on what can I do?
Thanks in advance!
Relevant answer
Answer
There are other choices in the same vein as Bruce suggested. I attached one I found useful. Best wishes, David Booth
  • asked a question related to Data Analysis
Question
3 answers
Dear Sir
I'm using the openair package in RStudio for air quality data analysis. Does anyone know if the polar map function in the openairmaps package (i.e., tools to create maps of air pollution data) works for the longitudes and latitudes of Middle East countries?
I'll be so grateful if anyone could help me with this issue
Regards
Relevant answer
Answer
Yes, as Omar said, it should work. Also, check that your formats for lon and lat are correct according to the manual.
  • asked a question related to Data Analysis
Question
8 answers
Ans?
Relevant answer
Answer
IMO R is the best. To get started with R, see the book R for Everyone by Lander, which contains research-grade code. Best wishes, David Booth. PS: There are many more books, but I found this one to be best for me. It includes download information too.
  • asked a question related to Data Analysis
Question
4 answers
How to deal with quantitative data? Which methods are there for data analysis?
Relevant answer
Answer
Hi. It would depend on the type of data you have and the sample size. There are many approaches, depending on what you wish to find out. For example, there are frequency statistics for basic descriptive presentation. Then you have inferential statistics, such as correlations, to determine whether there are directly proportional relationships between variables. Factor analysis can reduce data to common factors. A more complex one is Structural Equation Modeling, which is used to test a theory through your results... these are just some. There are many more.
  • asked a question related to Data Analysis
Question
2 answers
In our study, it was recommended by our professors to set a deadline and if at least 20% of the total population has responded within the time allotted, we may proceed with data analysis and interpretation. However, we cannot find a reliable source to support this method.
Relevant answer
Answer
Results are liable to be highly biased. Those who respond and respond quickly may represent a stratum or subpopulation which may be quite different from the unknown population in general.
If you have good covariate data, you might use that. See "Comparing Alternatives for Estimation from Nonprobability Samples," by Richard Valliant, December 2019, Journal of Survey Statistics and Methodology 8(2),
DOI: 10.1093/jssam/smz003
However, that seems unlikely.
  • asked a question related to Data Analysis
Question
3 answers
I have tried to separate a direct coculture of MSCs (mesenchymal stromal cells) and macrophages to do bulk RNA-seq on macrophages, as I want to find out how MSCs change gene expression in macrophages. I have tried different methods to separate the coculture as much as possible, but I can only manage to retrieve a cell population with 95% macrophages, with 5% MSCs still present.
Therefore, I want to know whether anyone has experience with analyzing data when the population is not completely pure for one cell type, and how to handle such data.
Is it wise to proceed with bulk RNA-seq when 5% of my cells are still MSCs, being well aware that some of the expressed genes observed could come from the 5% MSCs?
Relevant answer
Answer
Dear Kian,
have you tried improving your purity by FACS? It's fairly easy to choose markers to distinguish MSCs and macrophages and sort highly pure populations.
  • asked a question related to Data Analysis
Question
2 answers
Does anyone have experience with analyzing data from a survey based on the original TAM3 questionnaire, in Stata? What kind of analysis is relevant to do so you can use it with TAM3?
Also, since some of the items are negatively phrased, do you recode them in Stata so that every item is scored in the positive direction?
Relevant answer
  • asked a question related to Data Analysis
Question
3 answers
Exploratory questions:
1.   Is there a relationship between empathy concern and fantasy?
2.   Is there a difference between general scores and task-specific scores (before and after reading)?
3.   Do individuals who spend more time reading have a higher empathy score?
This is a correlational study; therefore, would I need to use SPSS, and would it be a t-test?
Also, would there be an independent variable, if I'm looking for a difference and I'm not changing any variables?
Thank you.
Relevant answer
Answer
Laerd Statistics is considered a gold-standard resource for assisting with test selection, test procedure, and interpreting and reporting results. It covers SPSS, Stata, etc.
  • asked a question related to Data Analysis
Question
9 answers
Dear All,
I have results for the pH, Temperature, TDS, Total Solids, Salinity, Total Hardness, Ca-Hardness, Mg-Hardness, Turbidity, and EC of the water samples collected from the Ponds and borewells of my targeted area.
  • Please suggest to me how to analyze this data in a report format.
  • Can a Water Quality Index (WQI) be prepared using these parameters?
Thanks in Advance.
Relevant answer
Hello,
I calculated a WQI value from pH, temperature, and TDS/TSS parameters in Microsoft Excel just a few days ago.
Would you like to try my formula in an Excel file? I can send it to you by email.
The Excel file comes from my Water Quality Assessment coursework at my university.
Thank you,
regards.
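For readers who want a starting point without the Excel file: one widely used formulation is the weighted arithmetic WQI, sketched below in Python. The standards and weights here are placeholders (use the permissible limits and weightings appropriate to your guideline), and the simple quality rating q = 100 x measured/standard omits the ideal-value correction that some versions apply (e.g., for pH).

# Weighted arithmetic WQI sketch; standards and weights are illustrative only.
measured = {"pH": 7.8, "TDS": 450.0, "turbidity": 4.0}
standard = {"pH": 8.5, "TDS": 500.0, "turbidity": 5.0}   # assumed permissible limits
weights  = {"pH": 0.4, "TDS": 0.3, "turbidity": 0.3}     # assumed; must sum to 1

# WQI = sum(w_i * q_i) with q_i = 100 * measured_i / standard_i
wqi = sum(weights[p] * 100.0 * measured[p] / standard[p] for p in measured)
print(f"WQI = {wqi:.1f}")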
  • asked a question related to Data Analysis
Question
1 answer
I'm currently trying to perform an RNA-seq data analysis, and at the first step a few questions came to mind that I would like to understand. Please help me with them.
1) In the 1st image, the raw data from NCBI-SRA have 1 and 2 marked at the ends of the reads. What does this mean? Are these forward and reverse reads?
2) In the second image, I was trying to run Trimmomatic on this dataset. I chose "paired-end as a collection", but it does not accept any input, even though my data are there in "fastqsanger.gz" format. Why is that? Should I treat this paired-end data as single-end data when running Trimmomatic?
3) In the 3rd and 4th images, I collected the same data from ENA, which provides two separate files for the 1- and 2-marked data in SRA. I then tried to process them in Trimmomatic using "paired-end as individual datasets" and ran it. Trimmomatic gives me 4 files. Why is that, and which ones should be used for alignment?
A big thank you in advance :)
Relevant answer
Answer
I would highly recommend that, before jumping straight into using certain tools for the analysis, you try to understand the basics behind the data: data types and structures, and the whys and whats of data processing. As for Galaxy, it is a very good platform with very good tutorials. Please go through a tutorial before asking questions that can easily be solved with minimal self-study.
  • asked a question related to Data Analysis
Question
3 answers
I am detecting an additional band aside from the expected VP1, VP2, and VP3 bands. This is AAV9, and the extra band appears at 100-110 kDa. Ideas?
Relevant answer
Answer
It's very unlikely that you would get such a strong, defined band as a contaminant and nothing else (like a smear above) in between the three expected VP bands and this band. It could be an artifact of the gel system (dimerized or oligomerized VPs). Did you try fresh DTT or b-ME in your SDS-PAGE loading buffer? A Western blot could show whether it's VP or not.
  • asked a question related to Data Analysis
Question
1 answer
I have been working with MS data with a Bruker TIMS TOF machine, and on Bruker's Data Analysis software.
I would like to export the data out (e.g. .mzXML, or just .XML), but currently the Data Analysis software only allows doing this one spectrum at a time, so bulk exporting takes ages.
I know Data Analysis supports Visual Basic scripts, so perhaps a loop or other script may enable bulk exporting. Unfortunately I have absolutely no experience with Visual Basic, and have no idea how to even get started, even after reading the manual...
Has anyone out there already written a script to do this? Or have an idea/template of how to start?
Relevant answer
Answer
I have found a workaround - export the native ".d" file and use ProteoWizard's MSConvert to convert it to a ".mzxml" file that contains all spectra.
  • asked a question related to Data Analysis
Question
5 answers
Hi,
Kindly help me understand when to use AMOS versus SmartPLS for data analysis.
Regards,
Relevant answer
Answer
It depends on your aim: if you want to test a model and show model fit, AMOS is the best option. If your aim is an exploratory study and theory development, PLS-SEM is the best option.
Please be careful: small sample size is a myth in PLS-SEM, and researchers should always justify the minimum required sample size.
Please see the book by Hair et al. (2017).
  • asked a question related to Data Analysis
Question
3 answers
I am currently working on a study titled "Knowledge, Competency, Adoptability and Sustainability of Artificial Intelligence (AI) Technology Among Physician Entrepreneurs in GCC Countries". I am in a dilemma about which SEM tool, AMOS or SmartPLS, will suit the study. The total number of responses received so far is 220, out of a population size of 400. The main objective is to assess the knowledge, competency, adoptability, and sustainability of artificial intelligence among physician entrepreneurs.
I will be very thankful if you can share your feedback on this.
Regards
Dr.Sharfras Navas
Relevant answer
Answer
Sharfras Navas Based on the title of your study, either Amos or SmartPLS could be appropriate for analyzing the relationships between knowledge, competency, adoptability, sustainability, and AI technology among physician entrepreneurs in GCC countries. However, the choice between the two software programs depends on various factors such as the complexity of the research questions, the sample size, and the nature of the data. Amos is better suited for analyzing more complex models and larger datasets, whereas SmartPLS is suitable for smaller sample sizes and more exploratory research. It is recommended to consult with a SEM expert or a statistician to help determine which software would be most appropriate for your specific research study.
  • asked a question related to Data Analysis
Question
13 answers
I have data from Likert-scale responses at pre- and post-test, with an experimental and a control group. I don't have any experience using ANCOVA or data from Likert scales. I have read many PDFs and posts, but I am not certain whether ANCOVA can be used to compare means at pre-test and post-test when the responses are not paired. Can I compare the means of the control vs. experimental group at pre-test (post-test)?
Thank you in advance
Relevant answer
Answer
Reynaldo Senra, you are not explicit here, but there are several models which could be termed "ANCOVA", since they follow the same principle; in all cases, you need to know which value of your DV is paired with your CV (and, of course, to which independent group the subject belongs). But to clarify the term "paired data":
1) In your case (when you have the pairing information) you have a paired repeated measure AND a between measure (a split-plot design). Here, you have the pairing in your focal DV, which has been measured twice (t1 and t2). In the case of random assignment to the between factor, an ANCOVA with t2 as DV, Group as between factor, and t1 as CV is appropriate and should have more power than a split-plot ANOVA with t1-t2 as within factor and Group as between factor.
2) Imagine a purely factorial between design, where the possible CV is not the focal variable, e.g., 1st factor sex and 2nd factor experimental condition. Possible covariables could be income or any other variable you are interested in "controlling for" (the problem here is beyond the scope of this topic, but look up directed acyclic graphs, DAGs, if interested). Here there is no paired or repeated-measures design per se, but you still need to know the pairing of the DV and the CV; of course, they have to be paired to account for the (co-)variance of the CV.
3) It gets more complicated if you have the split-plot design mentioned above, with a within variable and a between variable, but as in case 2) you want to account for non-focal CVs. You could analyse your t1-t2 plus between-factor design with a split-plot ANOVA AND incorporate an additional non-focal CV. I would not recommend this, since the interpretation may get complicated, but it is possible in principle. It would be something like a split-plot ANCOVA, with a repeated-measures factor, a between factor, and additional non-focal CVs.
Therefore, if you want to incorporate CVs, they always have to be "paired" with the DV, but this is not the same as a "paired" or "repeated-measures" design. I hope this clarifies it a bit and is in accordance with what you have read so far.
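As a minimal Python (statsmodels) sketch of case 1), with t2 as DV, Group as between factor, and t1 as CV; the data and column names are illustrative:

import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# One row per subject: baseline t1, follow-up t2, and group membership.
df = pd.DataFrame({
    "t1":    [4.1, 3.8, 5.0, 4.4, 3.9, 4.7],
    "t2":    [4.9, 4.2, 5.6, 4.6, 4.0, 5.1],
    "group": ["treat", "treat", "treat", "ctrl", "ctrl", "ctrl"],
})

# ANCOVA: the group effect on t2, adjusted for the baseline covariate t1.
model = smf.ols("t2 ~ C(group) + t1", data=df).fit()
print(anova_lm(model, typ=2))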
  • asked a question related to Data Analysis
Question
4 answers
I am currently working on a study titled "Knowledge, Competency, Adoptability and Sustainability of Artificial Intelligence (AI) Technology Among Physician Entrepreneurs in GCC Countries". I am in a dilemma about which SEM tool, AMOS or SmartPLS, will suit the study. The total number of responses received so far is 220, out of a population size of 400. The main objective is to assess the knowledge, competency, adoptability, and sustainability of artificial intelligence among physician entrepreneurs.
I will be very thankful if you can share your feedback on this.
Regards
Dr.Sharfras Navas
Relevant answer
Answer
The choice of SEM model in Amos or SmartPLS depends on the research design, research questions, and hypotheses, as well as the type of variables involved in the study (e.g., continuous, categorical, ordinal).
Amos is commonly used for structural equation modeling (SEM) and can handle both confirmatory and exploratory models. It can also handle models with latent variables, and has a graphical user interface to specify models and inspect results.
SmartPLS is a software specifically designed for partial least squares (PLS) SEM. It is often used for predicting relationships between latent variables and measured variables in both small and large datasets. It can also handle models with formative and reflective constructs.
Both Amos and SmartPLS have their strengths and limitations, and the choice between the two will depend on the specifics of the research study. It is recommended to consult SEM literature and experts to determine the best fit for a particular study.
  • asked a question related to Data Analysis
Question
9 answers
Although preprint submissions can be interesting for checking the repercussions, criticisms, and suggestions they attract, I think a poor choice of platform can limit the results. Among the platforms I researched, OSF, SciELO, and Elsevier preprints offer a reasonable structure for preprint submission and analysis. In this sense, I kindly ask for suggestions on which platform may be the most suitable for a preprint submission in the field of social sciences.
Relevant answer
Answer
https://psyarxiv.com/ is common for psychology, but if your goal is to get a lot of readers, any of these will be helped by posting links on social media. And since we are on ResearchGate, you can post here too.
  • asked a question related to Data Analysis
Question
1 answer
I think such applications have limitations, and that data analysis using them is questionable. What do you think?
Relevant answer
Answer
All of the programs you mention are very effective for marking and retrieving segments of text. Since those are the core activities in most forms of content analyses, they are useful tools for those goals. I think there are reasonable questions about the application of these programs in more interpretive approaches, but less so for content analysis.
  • asked a question related to Data Analysis
Question
3 answers
I used LC-MS to detect metabolites, and after a PCA plot was generated, I noticed there were some sample outliers. What could be the cause of this?
Relevant answer
Answer
There are several causes for the presence of outliers: impurities, errors during extraction, a needle that for some reason did not inject the same amount of sample, and samples that are naturally different from the others.
You can troubleshoot the problem using this checklist:
- check the look of the chromatograms and verify that all of them look more or less the same;
- check IS intensities and verify that their peak areas are within an acceptable range for all the samples;
- normalize the data across samples;
- scale the data using UV scaling;
- evaluate the presence of outliers using robust PCA.
good luck
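As a rough Python sketch of the scaling and PCA steps (using a plain PCA with a score-distance cutoff; a true robust PCA, as suggested above, would need a dedicated implementation), with synthetic data and an illustrative threshold:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic intensity matrix: 30 samples x 200 features, with one planted outlier.
X = np.random.default_rng(0).normal(size=(30, 200))
X[0] += 5.0

# UV scaling (unit variance per feature), then project onto 2 components.
scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

# Flag samples whose standardized score distance is extreme.
dist = np.linalg.norm((scores - scores.mean(axis=0)) / scores.std(axis=0), axis=1)
print("candidate outliers:", np.where(dist > 3.0)[0])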
  • asked a question related to Data Analysis
Question
3 answers
I'm starting a discussion on a topic that has bothered me for some time. Many missiological research pieces, especially doctoral dissertations, use qualitative research methods. Mission scholars collect rare and precious data related to missions from various cultural settings around the world. We mostly do a good job of collecting, organizing, and perhaps summarizing the data, but it seems to me that when it comes to data analysis, we often fall flat. Our analyses tend to remain at the descriptive level, often not going beyond mere descriptions and summaries. It is rare to find missiological qualitative inquiries that reach a more conceptual, interpretive stage of data analysis. There seem to be a variety of reasons for this, but I wonder if the curriculum and structure of PhD and DMiss programs are the primary reasons for the lack of depth in analysis?
I would welcome your inputs on the issue, the cause, and solutions, or your outright disagreement with my argument as well! Thanks!
Relevant answer
Answer
I agree as well. Frankly, in my case the faculty was not entirely prepared to deal with a quantitatively analyzed thesis, which actually required taking coursework from another institution. There were not sufficient statistics courses offered there to meet the level of expertise expected for writing such a thesis.
  • asked a question related to Data Analysis
Question
1 answer
Dear all, I'm starting on fNIRS data analysis and I'm planning to use open-source toolboxes. I found that Homer and NIRSLab are the best for MATLAB, and MNE for Python. Which one do you prefer, and why? Best, Lucas
Relevant answer
Answer
All of the options you mentioned, Homer, NIRSLab, and MNE, are open-source tools for analyzing functional near-infrared spectroscopy (fNIRS) data.
Homer is a MATLAB-based software tool for processing, analyzing, and visualizing fNIRS data.
NIRSLab is a MATLAB-based toolbox for processing and analyzing fNIRS data.
MNE is Python-based software for analyzing MEG, EEG, and fNIRS data, providing a comprehensive and unified framework for the analysis of neurophysiological data.
In terms of specific capabilities and suitability, it depends on your needs and preferences for data processing, analysis, and visualization. You may want to try out each of these tools and see which one works best for your particular use case.
  • asked a question related to Data Analysis
Question
4 answers
This discussion focuses on two questions:
1. How should one technically teach the process of writing code, and what are the implications of generative AI for this process?
2. Would you begin teaching data analysis with GUI tools (like Excel or Google Sheets), or dive straight into programming?
I am very interested in hearing your opinions.
Relevant answer
Answer
I would start right away teaching data analysis using GNU R, without spreadsheets. There are good learning materials, such as https://r4ds.had.co.nz/ (Tidyverse approach) or https://github.com/matloff/fasteR (base R). Generative AI is not relevant here in my opinion: first you need to understand data summarization, filtering, grouping, plotting, all the basics. Then statistical inference. Maybe in your later courses it would be good to teach students SPSS / SAS / whatever is used in the industry. Spreadsheets are great for accounting tasks.
  • asked a question related to Data Analysis
Question
4 answers
I will be comparing males and females, those who are unemployed and employed, and those who drink vs. those who don't, in regard to their overall life satisfaction. I am undecided on which data analysis is best to use.
Relevant answer
Answer
As Christian Geiser mentioned, you can adopt two approaches: investigating the variables separately or collectively.
For the former, you can also use non-parametric tests (like the Mann-Whitney test). However, regression is the most common model for the second approach.
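For the separate-variable route, a minimal SciPy sketch of the Mann-Whitney test on illustrative life-satisfaction scores:

from scipy.stats import mannwhitneyu

# Illustrative scores for two independent groups.
employed   = [7, 8, 6, 9, 7, 8, 5, 7]
unemployed = [5, 6, 4, 7, 5, 6, 4, 5]

stat, p = mannwhitneyu(employed, unemployed, alternative="two-sided")
print(f"U = {stat}, p = {p:.4f}")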
  • asked a question related to Data Analysis
Question
4 answers
Dear colleagues
Hi
I am a PhD candidate in microbiology.
Could you kindly share the practical details of PCR and qPCR methods, in the form of training exercises, in a useful and comprehensive way?
Thanks in Advance
Relevant answer
Hi Saliha,
Maybe the MIQE guidelines can help you with some information about PCR/qPCR procedures.
  • asked a question related to Data Analysis
Question
1 answer
  1. A recent book covering every important aspect of next-generation sequencing and genomic data analysis with the newest available tools
  2. A book with examples on the command line
Relevant answer
Answer
Have a look at this one which I've previously purchased:
  • asked a question related to Data Analysis
Question
5 answers
Hi everyone,
I prepared my data prior to the actual data analysis (CFA and SEM). First, I looked at univariate outliers. To do that, I checked the univariate outliers belonging to each sub-scale one by one, through box plots and histograms. I diagnosed univariate outliers in box plots and checked them. They ranged from the lowest to the highest values of the sub-scales. For example, participants rate their answers from 1 (lowest) to 7 (highest); when I checked the outliers, all asterisked outliers were within the lowest and highest values. There were no extreme or absurd values (e.g., 100) in the data.
Here, I have some questions.
1. Do you think I should remove these outliers? If so, why?
2. How can I remove these outliers?
3. Should I remove them one by one or together?
4. Should I remove an outlying case from all items, or only from the item to which the outlier belongs?
5. If I remove only the item to which the outlier belongs, this will create a missing value; how can I deal with that situation? Should I leave it as a missing value?
Best,
Relevant answer
Answer
Outlier is a fuzzy term, and it helps to be clear about what you mean. Personally, I'd define outliers as observations that aren't from the distribution/population of interest, so checking for impossible values and data-entry errors makes sense. Some people define outliers in terms of extremeness (which I think is not useful); others in relation to a model, in which case outliers have extreme residuals rather than being extreme themselves (it's possible for an outlier in this sense to be too average/central rather than too extreme).
Generally there isn't much reason to remove outliers or potential outliers, and I'd avoid it. If the problem is extremeness, it may make sense to switch to a more robust model rather than delete data. More importantly, I'd rather look at influence than at the extremeness of a residual. Quantities like Cook's distance (other influence statistics are available) give an idea of whether an observation changes the parameter estimates in the model. Generally you want all Cook's distances to be below 0.5 and might worry if you have values above 1. In these cases it can be helpful to check the parameter estimates to see if the model and findings change much when you remove high-influence observations. (I'd probably not delete these, but would report what happens in the alternative model as a potential limitation.)
Quick summary:
1) Do check for anomalies/data errors
2) Consider whether there's a better model that fits the data with fewer extreme residuals/more reasonable assumptions
3) Look at influence rather than extreme values
4) Avoid dropping data but consider reporting alternative analyses with/without extreme/influential observations
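A minimal Python (statsmodels) sketch of point 3), computing Cook's distances for a simple regression with one planted high-influence point; the data and the 0.5 cutoff follow the rule of thumb above:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 2 * x + rng.normal(size=50)
x[0], y[0] = 4.0, -8.0   # plant a high-leverage, badly fitting observation

model = sm.OLS(y, sm.add_constant(x)).fit()
cooks_d = model.get_influence().cooks_distance[0]
print("influential observations:", np.where(cooks_d > 0.5)[0])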
  • asked a question related to Data Analysis
Question
4 answers
I will measure the statistical significance of differences between treatment groups exposed to certain toxins, and also examine their depuration kinetics / pharmacokinetic models / log-regression line diagrams, plus the graphs mentioned in the question. How can I start? Which is the best way? Which books, sites, tutorials, etc. can help me learn to do this? I have no previous coding experience. Please see the pictures of the kinds of graphs I want to produce.
Relevant answer
Answer
To analyze data and create various types of plots in Python, you can use a variety of libraries and tools. Here are a few options to consider:
  1. Matplotlib: Matplotlib is a widely-used library for creating static plots in Python. It provides many options for customizing the appearance of the plots and for adding features such as regression lines. You can use Matplotlib to create boxplots, whisker plots, and other types of plots.
  2. Seaborn: Seaborn is a library built on top of Matplotlib that provides additional functionality for creating and customizing plots. Seaborn is particularly good for creating statistical plots and visualizing relationships between variables. It also makes it easy to create plots with regression lines.
  3. Pandas: Pandas is a library that provides data structures and data analysis tools for Python. It includes functions for creating plots, such as boxplots and histograms, directly from dataframes.
  4. Plotly: Plotly is a library for creating interactive plots in Python. It can be used to create a wide range of plots, including boxplots, whisker plots, and regression plots.
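As a small illustration combining options 1-3, here is a sketch that draws a box-and-whisker plot and a scatter plot with a fitted regression line; the dose/response data are invented:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "dose":     np.repeat([1, 5, 10], 20),
    "response": np.concatenate([rng.normal(m, 1, 20) for m in (3, 5, 8)]),
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
sns.boxplot(data=df, x="dose", y="response", ax=ax1)   # box-and-whisker plot
sns.regplot(data=df, x="dose", y="response", ax=ax2)   # scatter + regression line
plt.tight_layout()
plt.show()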
  • asked a question related to Data Analysis
Question
1 answer
Long story short: the Chromeleon software I am currently using is old (and on an old computer), which makes it buggy and difficult to actually analyze data and generate attractive chromatogram overlays.
I would like to use OriginLab; however, I don't really see an easy way to do this. I figured out how to bulk-input data; however, it seems I would have to write a bunch of very long IF() functions to incorporate information from the fit equations of standard curves and to identify/label peaks based on retention-time windows.
Is there a more reasonable way to do this, such as a built-in app or a standardized workflow similar to that included in Chromeleon?
Relevant answer
Answer
Updating your Chromeleon software or getting OriginLab are your two best choices, it seems to me. David Booth
  • asked a question related to Data Analysis
Question
6 answers
Hello,
I have recently started in a new position analyzing seagrass and other submerged aquatic vegetation (SAV) within an aquatic preserve. Our data are collected using a modified Braun-Blanquet (BB) score: if a species is present in our quadrat, we give it a score of 1 (less than 5%), 2 (5-25%), 3 (25-50%), 4 (50-75%), or 5 (75-100%). Historically the data had been analyzed using average BB scores and percent occurrence. In each system we collect four quadrats from 25 sites, and then the system as a whole is analyzed. I have not used BB methods before, so I was looking into ways to analyze the data and learned, basically, that averaging these scores is not a good method for analyzing BB scores. Does anyone have suggestions or insight on the best way to analyze these data? I've been reading many different sources that all suggest different methods. I was wondering if anyone can help point me in the right direction for starting out; we have large datasets with 5 species of seagrasses and multiple species of algae. Any advice or suggested reading is highly appreciated.
Thank you
  • asked a question related to Data Analysis
Question
1 answer
Phenomenology. Van Manen method of data analysis
Relevant answer
Answer
There are several research question and response forums that may be of interest to you, particularly if you are interested in phenomenology and Van Manen's method of analysis. Here are a few options that you might consider:
  1. The Qualitative Research Forum: This forum is hosted by the Journal of Phenomenological Psychology, and it provides a platform for researchers to discuss qualitative research methods, including phenomenological approaches.
  2. Phenomenology and Qualitative Research: This forum is hosted by the Journal of Phenomenological Psychology, and it focuses specifically on qualitative research methods and techniques in the field of psychology, including phenomenological approaches.
  3. ResearchGate: This is a general research forum that allows researchers to ask and answer questions, share research findings, and connect with other researchers. You can search for specific topics or methods, such as "phenomenology" or "Van Manen method," to find discussions that are relevant to your interests.
  4. LinkedIn: LinkedIn is a professional networking site that allows researchers to connect with one another and engage in discussions about research topics and methods. You can search for groups or communities that are focused on qualitative research, phenomenology, or Van Manen's method to find discussions that may be of interest to you.
  5. Reddit: Reddit is a general discussion forum that has many different subforums or "subreddits" on a wide range of topics. You can search for subreddits that are relevant to your research interests, such as "qualitative research" or "phenomenology," to find discussions and ask questions of other researchers.
It is worth noting that while these forums can be a useful source of information and guidance, it is important to carefully evaluate the quality and credibility of the information that you find, as not all discussions and responses may be based on rigorous research or reliable sources.
  • asked a question related to Data Analysis
Question
12 answers
How to choose SEM-AMOS or SEM-SmartPLS for data analysis?
Are both of them internationally acceptable?
Relevant answer
  • asked a question related to Data Analysis
Question
6 answers
Chi-Square Tests
                                Value      df   Asymptotic Significance (2-sided)
Pearson Chi-Square              125.476a   4    .000
Likelihood Ratio                107.862    4    .000
Linear-by-Linear Association    53.751     1    .000
N of Valid Cases                247
a. 0 cells (0.0%) have an expected count less than 5. The minimum expected count is 5.23.
Relevant answer
Answer
As I assume you know, with such a large value you have to reject the assumption of randomness (if you cared to test it).
Regarding any errors, only you can verify them by carefully reviewing the data matrix.
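For anyone wanting to re-check such output against the raw counts, here is a SciPy sketch; the 3x3 table below is invented, not the poster's data:

import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[30, 10,  5],
                  [15, 40, 12],
                  [ 8, 20, 60]])

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, df = {dof}, p = {p:.4f}")
print("minimum expected count:", expected.min().round(2))  # checks footnote a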
  • asked a question related to Data Analysis
Question
5 answers
Greetings,
I'm a bit confused about how to select the best statistical method to start my data analysis. When can I use ANCOVA versus ANOVA? And how can I run an ANCOVA in Prism? Thanks for replying in advance.
Relevant answer
Answer
One typically uses an ANCOVA when one has a continuous independent variable along with the categorical independent variable. For example, say you are studying the effect of a drug upon the recovery of patients. With an ANOVA you can study recovery with or without the drug. If, in addition, you have information about, say, the age of your patients, you can run an ANCOVA to study the recovery of patients while "controlling for" age.
  • asked a question related to Data Analysis
Question
4 answers
Please list them all, with some relevant examples/links. Thank you!
Relevant answer
Answer
An advanced data-analysis method for time-to-event data (e.g., response times, saccade latencies, fixation durations, etc.) that replaces ANOVA on mean RTs is a longitudinal or distributional method known variously as event history analysis, survival analysis, hazard analysis, transition analysis, or duration analysis:
  • asked a question related to Data Analysis
Question
4 answers
Hi dear all,
I'm new to NVivo and I want to use it for quantitative statistical data analysis, and also for building a literature review.
My question is: is it possible to use NVivo in these ways? Thanks
Relevant answer
Answer
Yes, qualitative software programs such as NVivo are frequently used in constructing literature reviews. In that case, each item in the literature review is treated as a document, and you can apply codes just as you would with any text document. But NVivo would be expensive for this purpose, so you might take a look at either Mendeley or Zotero.
NVivo can create a schematic diagram that links any set of coded texts, but if all you want is a "concept map", there are any number of free programs that will do that.
  • asked a question related to Data Analysis
Question
3 answers
Hi, I kindly need your help. I am doing research on experiential marketing, using a structured qualitative survey questionnaire. What should my methodology and data analysis process be, given that in this COVID-19 era my school has discontinued the use of face-to-face interviews for this year's theses? Thank you
Relevant answer
Answer
It is possible to collect qualitative data from non-sampled respondents through methods such as key informant interviews and focus group discussions, held in the open environment of the villages or other settings of interest, where these respondents have in-depth knowledge of historical events and can recall them quickly when asked.
Having collected those data, you can enter them into the ATLAS.ti 9 software, which will help you produce the expected results for your research.
You also need good practice, as well as good rapport with the community, to achieve validity and reliability in the research.
  • asked a question related to Data Analysis
Question
1 answer
Hello, I am currently analyzing data on insect counts. I am comparing insect order counts across four seasons and three elevations, collected using three different methods. My primary question is: "How do insect order counts vary with season and elevation?" I am using the 'glmmadmb' function in R to fit a model. I came across many possible combinations and tried to choose the best model based on AIC, BIC, overdispersion, and log-likelihood values. However, I am not sure whether the model indicated by these criteria is appropriate. For example, the best model indicated is:
mod <- glmmadmb(counts ~ season+(elevation|order), family="nbinom", data =x)
But I believe mod1 should be used:
mod1 <- glmmadmb(counts ~ season+elevation+(1|order), family="nbinom", data =x)
Further, what if I want to incorporate the variable "method of the collection" into the model? Would it be something like this?
mod3 <- glmmadmb(counts ~ season+elevation+(1|order)+(1|method), family="nbinom", data =x) or even more complicated?
After reading so many papers, I am confused about the various combinations of random effects like e|g; (1|e)+(1|g); (1|e/g); (0|e/g) etc.
I have spent days trying to figure this out. The more I read, the more confused I am. Any help would be highly appreciated.
Relevant answer
Answer
You are using count data so I suggest that you read the attached R information. I'm also attaching a paper of ours that uses similar methods for binomial data that may be of some help to you. If you have questions please ask. Best wishes David Booth
  • asked a question related to Data Analysis
Question
3 answers
Hi guys,
I have a question about conducting the two-sample Kolmogorov-Smirnov test. I have two sets of data, and I'd like to compare the similarity of their distributions. After I ran the K-S test, I got p < 0.001, which rejects the null hypothesis and suggests the two datasets do not have the same distribution (please see the attached figure). However, I also got D = 0.121 from the result. As far as I know, when D is close to 1 it indicates the two datasets have different distributions, and when D is close to 0 it indicates they have similar distributions. Looking at this very low D value, does it mean the cumulative distributions of the two datasets are actually similar? It seems contradictory.
So, how should I interpret the result? And is it reasonable to use the D value to quantify the similarity between these two datasets?
Thanks in advance
Relevant answer
Answer
You are comparing two groups of data for normality using the Kolmogorov-Smirnov statistic. First, the samples should be large (more than 30). The hypotheses are:
H0: the observations are normal
H1: the observations are not normal
Without going into alpha or the other steps of the test: if both groups of data are normal, both p-values must be greater than alpha in order not to reject H0, and I would then keep the dataset with the larger p-value. Why? Well, that takes a deeper analysis, but it is simple; I can explain it. Tell me if you would like the explanation!
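Note that the original question concerns the two-sample K-S test, where D is the maximum gap between the two empirical CDFs: D is an effect-size-like quantity, while the p-value also depends on sample size, so with large samples even a small D (such as 0.121) can be highly significant. The SciPy sketch below, on synthetic data, reproduces exactly this situation:

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 2000)
b = rng.normal(0.2, 1.0, 2000)   # slightly shifted distribution

D, p = ks_2samp(a, b)
print(f"D = {D:.3f}, p = {p:.2e}")  # small D, yet tiny p at this sample size

So the two results are not contradictory: the distributions differ detectably, but the difference is small.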
  • asked a question related to Data Analysis
Question
4 answers
Hi there, I appreciate your attention to this question.
Nowadays, it is quite common to consider masking or padding when dealing with missing values in sequential data for RNNs (or perhaps in image data for CNNs).
However, I have not found a paper that proposes/establishes the exact method.
The reason I'm looking for it is that I would like to learn what exactly happens in an RNN or LSTM layer when it is masked to 'skip the input'.
If you know of any, please let me know.
Thanks from Japan.
---I would make a few additions---
When reading about masking techniques, we often see descriptions like “ignore the missing value” or “skip the input”. On the other hand, there are few references in the literature that explain, in mathematical formulas, the skipping of missing input values.
So, I wonder what exactly happens inside the RNN layer when the input is masked.
As you can see in the attached image of the formula, the RNN state z^t at timestep t can be expressed in terms of the input x^t, the recurrent input z^{t-1}, and the weights W.
And, if the input is masked to be ignored, how is z^t calculated?
Is z^t calculated by imputing x^t (e.g., using the value from the previous timestep as input)?
Or is z^t not calculated at all, and exported as a NaN value?
I'm sorry, but I'm not looking for an imputation method; I'm asking about the mechanism inside the RNN or LSTM when it is masked to ignore the input.
Again, thank you from Japan.
Relevant answer
Answer
Thank you all for responding to my question.
I would make a few additions:
When reading about masking techniques, we often see descriptions like “ignore the missing value” or “skip the input”. On the other hand, there are few references in the literature that explain, in mathematical formulas, the skipping of missing input values.
So, I wonder what exactly happens inside the RNN layer when the input is masked.
As you can see in the attached image of the formula, the RNN state z^t at timestep t can be expressed in terms of the input x^t, the recurrent input z^{t-1}, and the weights W.
And, if the input is masked to be ignored, how is z^t calculated?
Is z^t calculated by imputing x^t (e.g., using the value from the previous timestep as input)?
Or is z^t not calculated at all, and exported as a NaN value?
Thank you very much.
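To make the mechanism concrete, here is a NumPy sketch of the state-copy behavior that masking typically implements (e.g., in Keras RNN layers): at a masked timestep the update is skipped and z^{t-1} is carried forward unchanged, so z^t is neither imputed from x nor NaN. Dimensions and weights are illustrative.

import numpy as np

rng = np.random.default_rng(0)
W_x, W_z = rng.normal(size=(4, 3)), rng.normal(size=(4, 4))

def rnn_step(x_t, z_prev, m_t):
    z_candidate = np.tanh(W_x @ x_t + W_z @ z_prev)  # the usual recurrence
    return z_candidate if m_t else z_prev            # masked step: copy the state

z = np.zeros(4)
xs = rng.normal(size=(5, 3))
mask = [1, 1, 0, 1, 1]   # timestep 2 is "missing"
for x_t, m_t in zip(xs, mask):
    z = rnn_step(x_t, z, m_t)
print(z)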
  • asked a question related to Data Analysis
Question
6 answers
Both pieces of software are used for data analysis.
Relevant answer
Answer
Such questions don't make much sense.
Each tool or software has its pros and cons, and it always depends on the specific kind of problems you want to use the tool for. Generally, it does no harm to know a couple of tools, so you can select the one best suited to a particular problem.
What is more relevant for a beginner choosing a (first) programming language is which language is typically used in the field, and whether there are accessible resources and colleagues to get help from.
After knowing one programming language well, you start to understand programming principles and to think like a programmer, which considerably facilitates learning another language.
  • asked a question related to Data Analysis
Question
3 answers
Hi, everyone. I am calculating the NRI and IDI using Stata. I want to compare the discrimination ability of two separate models (Model A and Model B). The nri program in Stata seems only to calculate an NRI reflecting the discrimination between a base model and the model in which a new marker is added to the base model. How should I calculate an NRI that reflects the discrimination of two models that include different covariates?
Thanks in advance!
Relevant answer
Answer
Would you please tell us how you calculated the NRI and IDI in Stata?
  • asked a question related to Data Analysis
Question
1 answer
Can anyone help us compute balanced isometric log-ratios for geochemical data on marine phosphorites?
Relevant answer
Answer
Dear Ambil,
Do you mean you have some data and you need to transform the data using the ilr transformation? If so, please read the following post.
You need to install the package "compositions" in R.
Any further question is welcomed.
Good luck,
Hamid
  • asked a question related to Data Analysis
Question
5 answers
I mean including the selection weights of the statistical units in the data analysis, when the sample is probabilistic and stratified. It is necessary to consider the weight of each cluster during the analyses, the regressions, and the calculation of the errors, to obtain statistics that are representative of the population; otherwise, the results are all biased.
I haven't seen an article that deals with this in any detail.
In fact, I have data collected in 16 countries by two different institutions last year. They used probability-based, stratified sampling for this survey, but they did not specify the weights of the strata, nor did they document the methodology. These data need to be merged with the data collected this year to move forward with the work.
Relevant answer
Answer
Yes of course, you have to use the proper weights. In my regression software this is done automatically. Just enter your measurements and their standard deviations (or any other estimation of the uncertainty).
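A minimal statsmodels sketch of such a weighted fit; here the weights vector could hold survey selection weights (or 1/sd^2 if weighting by measurement uncertainty, as described above), and the data are invented. Note that for proper survey inference the standard errors should also account for the sampling design, not just the point estimates.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.5 * x + rng.normal(size=100)
w = rng.uniform(0.5, 3.0, size=100)   # per-unit weights (e.g., from the design)

model = sm.WLS(y, sm.add_constant(x), weights=w).fit()
print(model.params)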
  • asked a question related to Data Analysis
Question
5 answers
What do you think about the distribution above? Is it normal?
Relevant answer
Answer
According to the Kolmogorov-Smirnov test and also the shape of the plot, I think yes.
  • asked a question related to Data Analysis
Question
5 answers
Hello all,
I am a graduate student preparing my research proposal on the project idea I share below.
I would like to conduct research on an incumbent company facing threats to its business from new entrants into the market as well as from other established competitors.
My question is: what kind of data should I be looking to obtain for this research topic, and how should I analyze the data?
Any help would be well appreciated.
Thank You
Relevant answer
Answer
Conduct a literature review:
  • To familiarize yourself with the current state of knowledge on your topic
  • To ensure that you're not just repeating what others have already done
  • To identify gaps in knowledge and unresolved problems that your research can address
  • To develop your theoretical framework and methodology
  • To provide an overview of the key findings and debates on the topic
Following are a few acceptable sources for literature reviews:
1. Peer reviewed journal articles.
2. Edited academic books.
3. Articles in professional journals.
4. Statistical data from government websites.
5. Website material from professional associations
  • asked a question related to Data Analysis
Question
6 answers
Dear all, I'm having trouble analyzing data using two-way repeated-measures ANOVA. The reason I use this technique is that my experiment measured survival of cells treated with different concentrations of a drug at 3 time points, continually. The experiment was repeated 5 times. Therefore, incubation time is the dependent factor, since I measured cell survival continually. I found that for some cells treated with one concentration of the drug at one time point, the data from the 5 experiments are not normally distributed. I have been trying to find an alternative method, like the Scheirer-Ray-Hare test, but the assumptions of this test don't cover data dependency. Moreover, I found discussions of this or similar problems saying that some ANOVA assumptions can be violated, but those are quite controversial and confusing. I'm very new to statistics. Any suggestion would be humbly appreciated :')
Relevant answer
Answer
Have you tried using logarithms, or a gamma or quasi-Poisson GLM?
  • asked a question related to Data Analysis
Question
5 answers
Hello, I'm currently working on my data analysis but I'm not sure what statistical test to use.
My research objective is: To determine the effect of age, gender, and GPA on the work readiness of graduating students
My hypotheses are:
  • H1: Age significantly affects students' work readiness
  • H2: Gender significantly affects students' work readiness
  • H3: GPA significantly affects students' work readiness
In my study, work readiness is measured through a Likert-scale instrument (from 1 to 5), and I'll derive the mean scores to interpret work readiness.
Relevant answer
Answer
If your IVs have subgroups, e.g.:
Age in ranges (14-16, 17-19, 20-22)
Gender = m x f
GPA in ranges (1.5-2.00, 2.1-2.5, 2.6-3.00, 3.1-3.5, 3.6-4.00, 4.1-4.5, 4.6 and above)
then you can use a 3-way ANOVA.
But if there are no groups, you can use multiple linear regression.
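A minimal Python (statsmodels) sketch of the regression route, with the mean work-readiness score regressed on age, gender, and GPA; the data and column names are invented:

import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "readiness": [3.8, 4.1, 3.2, 4.5, 3.9, 4.0, 3.5, 4.2],
    "age":       [21, 22, 20, 23, 21, 22, 20, 24],
    "gender":    ["m", "f", "m", "f", "f", "m", "f", "m"],
    "gpa":       [3.1, 3.6, 2.8, 3.9, 3.4, 3.2, 3.0, 3.7],
})

# Each coefficient's t-test addresses one of H1-H3.
model = smf.ols("readiness ~ age + C(gender) + gpa", data=df).fit()
print(model.summary())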
  • asked a question related to Data Analysis
Question
3 answers
My lab developed a method for LC-MS analysis using the qualitative Agilent MassHunter software; however, we find it very hard to generate reports with it, especially when it comes to finding concentrations of our analyte in the samples. I want to start using the quantitative software, but I do not know whether there is a way to reuse the existing method in the quantitative software, or whether I would need to create one from scratch.
Relevant answer
Answer
I would say you must build a new quantitative method in the "MS Quantitative Analysis" software. It takes some time to build a method; however, once you have done it you'll find it easy to do again.
  • asked a question related to Data Analysis
Question
6 answers
Qualitative data means interviews, open-ended questions, etc.
Relevant answer
Answer
The analysis tool NVivo is recommended... thematic analysis or IPA are also used for qualitative data analysis.
  • asked a question related to Data Analysis
Question
3 answers
Dear Colleagues and researchers
I'm working on a longitudinal study in which several distinct variables and constructs are being investigated simultaneously. My challenge is how to explore the relationships between or among them. I'll be grateful if anybody could help me design the framework of my study, explain how I can test the expected relations, and indicate which kinds of statistical tests and analytic procedures (such as cluster analysis or multi-categorical multiple mediation analysis) I should use to explore those relations.
It would be highly appreciated if anybody who is an expert in research design (preferably EFL) or statistics could help.
Sincerely Yours
Relevant answer
Answer
Longitudinal studies are conducted over a reasonably long period of time, for example in medicine and in testing forecasting models. They are conducted when an on-the-spot or single-time measurement of data is not relevant, unlike routine commercial surveys or laboratory experiments where the lapse of time is not essential to draw conclusions.
Long-term side effects of medicines are one such case. Another is building weather-forecasting models, which need reasonably long-term data. So what you have to do is vary the forms/levels of the control variables in your study over, say, a semester. In such studies it is essential to identify the control variables. Please first search the literature to see what relationships among the variables already exist, so you can set up constructs. I can advise more after seeing your further inputs.
  • asked a question related to Data Analysis
Question
8 answers
I used three data analysis methods (descriptive statistics, ANOVA, and linear regression) in my research. Can I say I used mixed-methods analysis?
Relevant answer
Answer
OK, let's say that some statistics obtained from regression models are sometimes used to describe a sample (the sample mean being the most frequently used). If you also include quantile regression, all sample quantiles and derived quantities may likewise be called statistics obtained from regression models.
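To illustrate the point, a small R sketch: an intercept-only linear regression recovers the sample mean, and an intercept-only quantile regression at tau = 0.5 recovers the sample median (the latter assumes the quantreg package is installed):

set.seed(7)
y <- rexp(50)

# The intercept of a predictor-free regression is the sample mean
coef(lm(y ~ 1))              # equals mean(y)

# Intercept-only quantile regression recovers the sample median
library(quantreg)
coef(rq(y ~ 1, tau = 0.5))   # equals median(y)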
  • asked a question related to Data Analysis
Question
4 answers
Origin is such a great tool for data analysis and plotting graphs. Why has it not attracted the developers' interest to build a Mac version, so that we don't have to go through the tedious and less efficient route of using virtualization software or Boot Camp?
Relevant answer
Answer
I guess OriginLab is working on it. There is now a simple Origin viewer tool with which you can open existing .opj projects on a Mac (it can be downloaded for free from the OriginLab website).
Alternatively, you can try using Qt Grace.
  • asked a question related to Data Analysis
Question
6 answers
In the JBI checklist for critical appraisal of prevalence studies, how can I tell whether a study's data analysis was conducted with sufficient coverage of the identified sample?
I am looking forward to the answer.
Thank you!
Relevant answer
Answer
In a quantitative study, the sample size can be determined using a calculation procedure that takes into account the sampling error, the significance level, and the confidence level. We usually use a sampling error of 5% (significance level α = .05) with a confidence level of 95%. To determine the sample size, we can also refer to prepared sample-size tables such as those of Krejcie & Morgan (1970) and Cohen et al. (2001).
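For reference, the Krejcie & Morgan (1970) table follows from the formula s = X²NP(1−P) / (d²(N−1) + X²P(1−P)); a minimal R sketch with the usual defaults (X² = 3.841 for 1 df at 95% confidence, P = 0.5, d = 0.05):

# Krejcie & Morgan (1970) sample size for a finite population of size N
km_sample_size <- function(N, chi_sq = 3.841, P = 0.5, d = 0.05) {
  ceiling(chi_sq * N * P * (1 - P) / (d^2 * (N - 1) + chi_sq * P * (1 - P)))
}

km_sample_size(100)    # 80, matching the published table
km_sample_size(1000)   # 278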
  • asked a question related to Data Analysis
Question
20 answers
For data analyses such as logistic regression and linear regression, which software packages are easily available and free to use?
  • asked a question related to Data Analysis
Question
2 answers
Dear connections
Recently I was analyzing data in AMOS. While calculating reliability and validity, the AVE values for a few constructs were below 0.50 and the CR values below 0.70. These values were obtained after eliminating indicators with low factor loadings. I was wondering whether I could report an AVE of 0.469 and a CR of 0.689, but I was hesitant to do so due to a lack of supporting evidence.
So, can we report these values in the model? If yes, can you share any paper supporting such findings?
Regards
Dr. Bhuvanesh
Relevant answer
Answer
In my opinion, there's nothing that prohibits you from reporting values that fall below some "magical" threshold. Most of such cut-off values are more or less arbitrary anyway. "Negative" findings can be just as important as "positive" findings when it comes to scale evaluation.
What I see as more important is to figure out and explain in the paper why you are getting suboptimal values for your measurements. Perhaps the results indicate that the measures/scales are rather heterogeneous and/or multidimensional or that some of the items should be revised, dropped, or replaced.
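For reference, AVE and CR can be computed directly from the standardized factor loadings; a minimal R sketch with hypothetical loadings for one construct:

# Standardized loadings for one construct (illustrative values only)
lambda <- c(0.62, 0.68, 0.71, 0.65)
theta  <- 1 - lambda^2   # indicator error variances under standardization

ave <- mean(lambda^2)                                 # Average Variance Extracted
cr  <- sum(lambda)^2 / (sum(lambda)^2 + sum(theta))   # Composite Reliability

ave
cr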
  • asked a question related to Data Analysis
Question
4 answers
Hello researchers
Please suggest any free software for data analysis.
Currently I am using Origin with a 21-day trial license. Please suggest any other software.
Relevant answer
Answer
Fellow researcher, if you'll allow me some undeserved familiarity: duuude, Python and R are the best data analysis tools there are, and both are free and have tons of awesome, freely available tutorials.
  • asked a question related to Data Analysis
Question
9 answers
Dear colleagues,
What can you suggest for filling in the missing values, as shown in the picture?
Best
Ibrahim
Relevant answer
Answer
There are two primary ways of handling missing values: deleting the observations that contain them, or imputing them.
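A minimal R sketch of both approaches, using a made-up data frame with scattered NAs (the mice package, shown commented out, is one common choice for model-based multiple imputation):

df <- data.frame(x = c(1, 2, NA, 4), y = c(10, NA, 30, 40))

# 1) Deletion: drop every row containing a missing value
df_complete <- na.omit(df)

# 2) Simple imputation: replace each NA with the column mean
df_mean <- df
for (col in names(df_mean)) {
  df_mean[[col]][is.na(df_mean[[col]])] <- mean(df_mean[[col]], na.rm = TRUE)
}

# 3) Model-based multiple imputation (requires the mice package)
# library(mice)
# imp    <- mice(df, m = 5, printFlag = FALSE)
# df_imp <- complete(imp)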
  • asked a question related to Data Analysis
Question
3 answers
I have a bachelor's in Software Engineering and I am interested in data analytics. I am now applying for a master's degree program in Canada, but I am unable to find a good research problem in data analytics or data science.
I want to use data from telecom companies and work in this field, but I couldn't find a specific problem statement. Secondly, I have some raw ideas about combining big data analytics with IoT or networking (correct me if I am wrong; I am not so sure).
Kindly suggest some good research topics in this regard.
Thanks in advance.
Relevant answer
Answer
Dear Iqra Imtiaz,
Taking into account your research interests, your scientific specialization, and the discipline of your master's studies, I propose the following research topic for your thesis: improving analytical methods based on Big Data Analytics for analyzing trends and forecasting the future behavior of citizens who use the services of companies in the telecommunications industry and key information services available, among others, on smartphones with Internet access.
As part of this topic, you can combine sentiment analysis of Internet users in any specific field of interest with a research method based on Big Data Analytics, and possibly also with other available Industry 4.0 technologies. I described how to do this in my articles with the concept of Big Data Analytics in the title; these articles are available on my ResearchGate profile.
In addition, under the proposed topic you can explore the use of individual Industry 4.0 technologies, typical of the current fourth technological revolution, including Data Science, artificial intelligence, machine learning, the personal and industrial Internet of Things, cloud computing, web applications operating as robots, horizontal and vertical data system integration, multi-criteria simulation models, digital twins, additive manufacturing, Blockchain, smart technologies, cybersecurity instruments, Virtual and Augmented Reality, 5G, etc., for improving Internet information services available from a smartphone with Internet access. On the basis of this type of research, you can point to the Industry 4.0 technologies that are likely to be used over the next few years to improve Internet information service offerings. You can also study the prospects for continuing to improve analytical methods based on Big Data Analytics with the help of the technologies mentioned above.
Kind regards,
Dariusz Prokopowicz
  • asked a question related to Data Analysis
Question
5 answers
Dear scholars,
I have a database of 100 geolocated samples from a given area; in each sample, 38 chemical elements were quantified.
Some of these samples contain values below the detection level (BDL) of the instrument. Clearly, when 100% of the samples are BDL there is not much to do, but what can be done when, for example, only 20% are BDL? What value do we replace a BDL sample with?
Some papers replace a BDL value with a fraction of the instrument's detection limit for that element; some suggest a factor of 0.25, others 0.5. What would you do in each case, and is there any literature you would recommend? If it matters, I am mostly interested in copper and arsenic.
Regards
Relevant answer
Answer
What fraction of values below the DL is acceptable? Why are you making the measurements? The why determines what is acceptable.
If you are concerned about an upper limit, then BDLs are of no concern. If you are concerned about a lower limit, it will depend on the nature of your concern.
There is no universal recommendation and no rule of thumb. You decide, from the criteria associated with the why, whether you have enough information. Too many BDLs might mean you need a different technique, but it always comes back to the why.
If, say, a customer wants an answer at each location, use the actual result and note the uncertainty; such a result is usually close to meaningless because of the high uncertainty.
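For what it's worth, a hedged R sketch of the common substitution choices (DL/2 and DL/√2), keeping a censoring flag so the information is not lost; the vector of copper values and the detection limit are made up, and the NADA package (commented out) offers more defensible censored-data methods:

dl <- 0.5                            # instrument detection limit (illustrative)
cu <- c(1.2, 0.8, NA, 2.1, NA, 0.9)  # NA marks values reported as BDL

censored <- is.na(cu)                # keep a flag rather than discarding the info

cu_half  <- ifelse(censored, dl / 2,       cu)  # DL/2 substitution
cu_sqrt2 <- ifelse(censored, dl / sqrt(2), cu)  # DL/sqrt(2) substitution

# More defensible: censored-data statistics, e.g. regression on order statistics
# library(NADA)
# cu_obs <- ifelse(censored, dl, cu)   # censored entries carry the DL itself
# cenros(cu_obs, censored)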
  • asked a question related to Data Analysis
Question
3 answers
Dear scholars,
I have a database of 100 geolocated samples from a given area; in each sample, 38 chemical elements were quantified.
Some of these samples contain values below the detection level (BDL) of the instrument. Clearly, when 100% of the samples are BDL there is not much to do, but what can be done when, for example, only 20% are BDL? What value do we replace a BDL sample with?
Some papers replace a BDL value with a fraction of the instrument's detection limit for that element; some suggest a factor of 0.25, others 0.5. What would you do in each case, and is there any literature you would recommend? If it matters, I am mostly interested in copper and arsenic.
Regards
Relevant answer
Answer
You don't give a research question, so I would simply report exactly what you said, in full detail. You may also want to give information on the detection limits of the methods you used. Just let Prof. Kan know that detection limits are determined by Nature's own chemistry and not by personal choice. Best wishes, David Booth
  • asked a question related to Data Analysis
Question
2 answers
Hi, I performed a ChIP-Seq experiment on potato samples. The sequencing company did the library preparation, sequencing, and data analysis. The results contain too many target genes. When I looked at the peaks in a genome browser using the files received from the company, I noticed that peaks are distributed throughout the genome, which produces too many targets, and hence I cannot decide which ones to trust. Peaks appear everywhere, including introns, exons, and intergenic regions (which could be true). But why do I get a high number of peaks in the intron, exon, and coding regions of genes? And what could be the reason for getting so many peaks? The peaks in the immunoprecipitated DNA samples have been normalised against the control input DNA samples. I am attaching a picture for reference: the middle track is the input control and the other two tracks are two independent IP experiments.
I will appreciate your suggestions!
Relevant answer
Answer
Hi Mathew, thank you for your suggestions! Yes, I used the Diagenode Bioruptor sonicator. I will go through your suggestions! Since I am a molecular biologist, it takes me some time to understand the bioinformatics-related material.
  • asked a question related to Data Analysis
Question
3 answers
Qualitative research has its own benefits and issues, so I would like to know about any online workshops or courses on qualitative research and data analysis.
Relevant answer
Answer
This list server has periodic announcements about online courses in qualitative research:
  • asked a question related to Data Analysis
Question
2 answers
I need a tabular (i.e., not imaging or text) dataset with a hierarchically structured outcome to use as an example dataset in a new R package (the file can be in any format, e.g., txt, csv, or arff). It should be single-label and tree-structured, e.g., first level: classes 1, ..., 4; second level: 1.1, 1.2, 1.3, 2.1, 2.2; third level: 1.1.1, 1.1.2, 1.2.1, 1.2.2, 1.2.3, 1.3.1, 1.3.2, ...
Relevant answer
Answer
Thank you, Amogh Shukla.
While I appreciate the large EdNet hierarchical dataset, it is too complex for my use case (an example dataset for an R package). I need a simple tabular dataset (i.e., observations/units in rows and variables/features in columns).
Hierarchical *clustering* datasets are unfortunately not suitable, because I'm interested in hierarchical *classification*, and clustering is an unsupervised methodology without an outcome.
Actually, the stackoverflow question you cite is by me, but the answer given there does not help, unfortunately, because with the data in that answer the outcome is not hierarchical; only the covariates are.
For now, I use a simulated dataset from the collection given in https://github.com/jona2510/ADforHC. However, a real dataset would make a better example dataset.
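In case it helps anyone with the same need, a minimal R sketch for simulating such a dataset (tabular covariates plus a single-label, tree-structured outcome; the class tree and all names are made up):

set.seed(123)
n <- 200

# Hypothetical three-level class tree, e.g. "1" -> "1.2" -> "1.2.3"
level1 <- sample(1:4, n, replace = TRUE)
level2 <- paste(level1, sample(1:3, n, replace = TRUE), sep = ".")
level3 <- paste(level2, sample(1:2, n, replace = TRUE), sep = ".")

df <- data.frame(
  x1      = rnorm(n),            # tabular covariates
  x2      = runif(n),
  outcome = factor(level3)       # single label; hierarchy encoded in the path
)
head(df)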
  • asked a question related to Data Analysis
Question
3 answers
Hi,
I am conducting a public questionnaire about people's decision-making. The dependent variable is binary: whether or not the respondent chose to do action X.
In the questionnaire, I asked two major questions about the reasons for choosing to do action X or, if they did not, the possible reasons as well. I used a five-point Likert scale.
The questions took the form: I did action X because:
  • 1
  • 2
  • …
And I did NOT do action X because:
  • 1
  • 2
  • …
What is the most appropriate method to analyze the data? Can I use binary logistic regression for this data set?
Relevant answer
Answer
Yes, I would have thought logistic regression, but I'm testing my memory here, as it has been over a decade since I did something similar.
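A minimal R sketch of that suggestion, assuming a binary outcome did_x and two Likert-scale reason scores r1 and r2 as predictors (all names and data are hypothetical):

set.seed(99)
n  <- 150
r1 <- sample(1:5, n, replace = TRUE)   # Likert-scale reason scores
r2 <- sample(1:5, n, replace = TRUE)
did_x <- rbinom(n, 1, plogis(-2 + 0.4 * r1 + 0.3 * r2))

fit <- glm(did_x ~ r1 + r2, family = binomial)
summary(fit)
exp(coef(fit))   # coefficients as odds ratios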