Science topic

Dataset - Science topic

Explore the latest questions and answers in Dataset, and find Dataset experts.
Questions related to Dataset
  • asked a question related to Dataset
Question
7 answers
I have a dataset for a between-group analysis. There are missing values in the control group data, so I performed multiple imputation on the control group and obtained 5 imputed datasets for it.
When I run an independent samples t test, do I use the pooled results of the control group data together with the original intervention group data?
Thanks.
  • asked a question related to Dataset
Question
4 answers
I tried opening it with different computer programs (Notepad, Excel, ...) but could not. This is my first time using climate datasets, and I am genuinely confused about how to access the data.
Relevant answer
Answer
The JRA-55 data I have is most probably in GRIB format. The file has no extension, but the JRA-55 documentation mentions that the data is in GRIB format.
  • asked a question related to Dataset
Question
6 answers
Where can I get a free, standard 3D face dataset that is available for download for research purposes?
Relevant answer
Answer
BU-3DFE
  • asked a question related to Dataset
Question
2 answers
Al-Salamo Alikom;
What is the benefit, in terms of the number of citations for the researcher and the journal, of adding code to a paper to make the research reproducible, tested on some of the most widely used available datasets?
Kind regards,
Osman
Relevant answer
Answer
According to the No Free Lunch theorem, there is no algorithm that dominates on all datasets. Thus, making the code available in a research paper, so that the work is reproducible by other researchers and in other studies, will increase the number of citations of the original paper and the standing of the journal.
  • asked a question related to Dataset
Question
4 answers
Good morning,
Can anyone suggest a dataset presenting historic reference evapotranspiration in the different provinces?
Thanks a lot !
Relevant answer
  • asked a question related to Dataset
Question
4 answers
Hello, I am currently conducting a moderated mediation analysis in AMOS and want to mean-centre my IV and moderator. To calculate the mean, do I use the original dataset or the dataset from which I have removed some items with low factor loadings?
Thank you.
Relevant answer
Answer
Thank you all very much.
  • asked a question related to Dataset
Question
12 answers
skin disease image dataset
Relevant answer
Answer
  • asked a question related to Dataset
Question
2 answers
How can I make Google Colab Pro run faster with an image dataset? I am already using it with Google Drive, but it is too slow.
Relevant answer
Answer
Ajay Krishan Gairola Are you getting slower performance when you train the model or when you load the data? Perhaps you should use the TensorFlow 2.x tf.data pipeline, which is specifically optimized for such scenarios.
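For illustration, a minimal tf.data sketch (assuming a recent TensorFlow and that the images have first been copied from Drive to the Colab VM's local disk, e.g. under a hypothetical /content/images folder, since per-file reads from Drive are the usual bottleneck):
import tensorflow as tf
# Build a dataset from a local directory; copying the data from Drive to the
# VM's disk once is far faster than streaming each file from Drive.
ds = tf.keras.utils.image_dataset_from_directory(
    "/content/images",  # hypothetical local path
    image_size=(224, 224),
    batch_size=32,
)
# Cache decoded images and prefetch batches so the GPU is not starved.
ds = ds.cache().prefetch(tf.data.AUTOTUNE)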
  • asked a question related to Dataset
Question
5 answers
medical image analysis problems with datasets
Relevant answer
Answer
Computer Vision. Discovery Radiomics. Evolutionary Deep Intelligence. Image Segmentation/Classification
General health and scientific research
  1. NLM's MedPix. A free online Medical Image Database with over 59,000 indexed and curated images from over 12,000 patients.
  2. The Cancer Imaging Archive (TCIA) ...
  3. Re3Data. ...
  4. V7 COVID-19 X-Ray dataset. ...
  5. COVID-19 image dataset. ...
  6. COVID-19 CT scans. ...
  7. CT Medical Images. ...
  8. Deep Lesion.
GOOD LUCK
  • asked a question related to Dataset
Question
2 answers
I've done an RNA-seq analysis on a dataset downloaded from GEO, looking at immune gene expression in asthmatic, COPD, and normal epithelial lung cells. I am trying to do a t-test for my statistical analysis, but I need to group my data into asthmatic, healthy, and COPD samples/cells, as it doesn't show in R which samples belong to which group. How can I group them?
Relevant answer
Answer
Hi,
You need to compare the disease versus control samples when doing the statistical test. I didn't fully get whether the problem is that you lack the information on which sample belongs to which category, or whether it is a coding problem. As for the former, datasets usually have a metadata file in which the sample names you find in the gene expression table are present, together with the treatment information. If it is a coding problem, you can index the sample names to divide the data into the Asthmatic, Healthy, and COPD groups.
  • asked a question related to Dataset
Question
2 answers
I have done 1:5 case-matching in my study, so my dataset has 100 intervention-group and 500 control-group participants.
When I run independent samples t test, do I use the 100 intervention group vs all 500 control group?
When I present participant characteristics, do I use 100 intervention group vs all 500 control group?
Thanks.
Relevant answer
Answer
David Schmidt Thanks a lot :)
  • asked a question related to Dataset
Question
6 answers
Hello! I am putting together a dataset of benthic macroinvertebrate monitoring / count data from estuaries and coastlines along the North American Coast (Canada and US). I know of larger datasets like the NCCOS National Benthic Inventory and EMAP, but I was wondering if anyone knew of other regional datasets. It would be preferable if the data were collected using Young-modified Van Veen grab samplers along with information on water quality and sediment quality, but any dataset recommendations will be greatly appreciated!
Relevant answer
Answer
Marcus W. Beck Yes, I am still on the hunt for more data! Thank you for sharing the links to the Tampa Bay monitoring data, I did not know about them. I will take a look at them!
  • asked a question related to Dataset
Question
2 answers
cycleGAN performs well on unpaired datasets, and the attention mechanism has become a hot topic in recent years, so can we combine attention and cycleGAN?
Is there such a project? Papers and code are preferably available, thanks.
Relevant answer
Answer
Dear Chen Yijia ,
The application of deep learning in the field of drug discovery brings the development and expansion of molecular generative models along with new challenges in this field. One of challenges in de novo molecular generation is how to produce new reasonable molecules with desired pharmacological, physical, and chemical properties.
Regards,
Shafagat
  • asked a question related to Dataset
Question
4 answers
Hello!
I would like to get the average curve from several curves on a plot. Is there a way to do this in Excel?
(Background information: I have drawn three different curves for three different x-y data sets. However, each of these has a different number of XY points and a different length, so I can't simply take the average across the rows. Any solutions?) See the picture.
Thanks!
Relevant answer
The following Python code can average multiple curves and plot them.
I've shared it on Github and it's available from the following link:
  • asked a question related to Dataset
Question
8 answers
I want to use image dataset (that is stored in my personal computer) in google colab. Please help.
Relevant answer
Answer
Hi, you can use a dataset in Colab by using Google Drive.
  • asked a question related to Dataset
Question
1 answer
Hello everyone!
I have a doubt regarding the forecast data set. In this data set, the forecast for different lead hours is given with the first day of the month as the initialization day. How do I find the lead-hour forecast data for the 2nd day of the month?
Do I have to purchase this, or are there other methods to get the data? Are there any alternate data sets?
Relevant answer
Answer
Why are dates 2 to 30 or 31 deactivated? How can I get the lead-hour forecast information for different dates of the month?
  • asked a question related to Dataset
Question
2 answers
Dear all,
I tried to download some 3D reflectivity datasets over CONUS. The Earthdata Search has NEXRAD mosaic but is only available for 3 months in 2020. The radar data for NEXRAD is also station data. The only one I found that is 3D (mosaic) is the National Reflectivity Mosaic data but it is not available for download at https://www.ncei.noaa.gov/maps/radar/ . Does anyone know how to download this data or where to acquire the 3D reflectivity dataset?
Best regards,
Haochen
Relevant answer
Answer
@Shafagat
Dear Shafagat,
Thank you for your reply. I am wondering whether you know of any more recent data? Something like this, but not station data.
Best regards,
Haochen
  • asked a question related to Dataset
Question
7 answers
Suppose I have a dataset f(x). I want to fit this dataset with a fitting function
g(f1,f2,f3,x) = a*x +b*f1(x)+c*f2(x)+d*f3(x). Here, f1(x), f2(x) and f3(x) are three different datasets. Can anyone tell me how to fit f(x) with g(f1,f2,f3,x)? I tried to fit in Origin using this method: https://www.originlab.com/doc/Tutorials/Fitting-Datasets, but it didn't work very well. Is there any other suggestion? Thanks for your help
Relevant answer
Answer
I'm not sure if I understood correctly but I would use mathematica. One can simply make an interpolation for f1(x), f2(x) and f3(x):
(*Data to be fitted, coefficients like 0.005, 0.12 etc. to be found from the fitting*)
fxData=Table[{x,RandomReal[{0.997,1.003}] (0.005 x+0.12 Sqrt[x]+0.18 Exp[-x]-0.015 x^1.2)},{x,0,10,0.01}];
(*other data,for fxData,f1xData and so on one can use Import[],here are just random lists as an example*)
f1xData=Table[{x,Sqrt[x]},{x,0,10,0.01}];
f2xData=Table[{x,Exp[-x]},{x,0,10,0.01}];
f3xData=Table[{x,x^1.2},{x,0,10,0.01}];
(*Interpolation of f1x,f2x,f3x*)
f1x=Interpolation[f1xData];
f2x=Interpolation[f2xData];
f3x=Interpolation[f3xData];
(*fitting function*)
g[x_,a_,b_,c_,d_]:=a x+b f1x[x]+c f2x[x]-d f3x[x]
(*fitting*)
nlm=NonlinearModelFit[fxData,g[x,a,b,c,d],{a,b,c,d},x];
(*plot and parameter table*)
Show[{ListPlot[fxData,PlotRange->All],Plot[nlm[x],{x,0,10},PlotStyle->Red]}]
nlm["ParameterTable"]
Copy and paste should work :)
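For anyone without Mathematica, a rough Python equivalent of the same idea, using scipy.optimize.curve_fit with interpolated datasets (the synthetic arrays below mirror the Mathematica example and are only stand-ins for your real data):
import numpy as np
from scipy.interpolate import interp1d
from scipy.optimize import curve_fit
# Synthetic stand-ins for the four datasets; replace with your own arrays
x = np.linspace(0.01, 10, 500)
f1, f2, f3 = np.sqrt(x), np.exp(-x), x**1.2
f = 0.005 * x + 0.12 * f1 + 0.18 * f2 - 0.015 * f3
# Interpolate the three auxiliary datasets so they can be evaluated anywhere
f1i = interp1d(x, f1, fill_value="extrapolate")
f2i = interp1d(x, f2, fill_value="extrapolate")
f3i = interp1d(x, f3, fill_value="extrapolate")
def g(xv, a, b, c, d):
    # fitting function: a*x + b*f1(x) + c*f2(x) + d*f3(x)
    return a * xv + b * f1i(xv) + c * f2i(xv) + d * f3i(xv)
popt, pcov = curve_fit(g, x, f)
print(popt)  # should recover roughly [0.005, 0.12, 0.18, -0.015]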
  • asked a question related to Dataset
Question
5 answers
Basically I have a great interest in brain signal processing and analysis. It would be great if anyone can help to find out an open access EEG or fNIRS dataset of hand movement or human gait.
Relevant answer
Answer
Check with the Georgia Tech Behavioral Medicine department. I know they were doing studies on individuals' gaits.
  • asked a question related to Dataset
Question
10 answers
I'm working on a supervised Classification task with seven classes. The problem is that the dataset is very large and hugely imbalanced, with the number of data points for the major class being 100 times the minor class.
First, I tried to subsample the dataset into a smaller balanced dataset randomly, and the highest accuracy I could obtain after tuning hyperparameters was around 90%.
Then I decided to train the tuned models over the whole dataset (70% of data for training and 30% for test), and surprisingly, the accuracy of the models reached 95% or higher.
My question now is, which procedure is the correct one: training on the subsampled dataset and testing on the large dataset, or training and testing on the whole dataset?
Relevant answer
Answer
The imbalanced data is a common problem especially in Medicine.
I suggest you need to perform the following:
1- Split your entire data into train and test sets. The usual ratio is 80% for the train set and 20% for the test set. Make sure you stratify your split, meaning that your train and test sets have the same imbalance ratio as your original data. As an example, you can use the sklearn library, which splits your data and also performs the stratification, as below:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, shuffle=True, stratify=y)
2- Use up-sampling or down-sampling techniques to balance your data ONLY on your train set. DO NOT perform this on your test set! Your test set should remain intact! (A sketch follows after this list.)
3- Fit your ML model on your train set which is already balanced from the previous step.
4- Evaluate your model accuracy on your test set which is already imbalanced!
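A minimal Python sketch of steps 1 and 2, assuming X and y are NumPy arrays (sklearn.utils.resample is used here for the up-sampling; libraries like imbalanced-learn offer more sophisticated options):
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils import resample
X = np.random.randn(1000, 5)            # placeholder features
y = np.random.randint(0, 7, size=1000)  # placeholder labels for 7 classes
# Step 1: stratified split so both sets keep the original class ratios
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, shuffle=True, stratify=y)
# Step 2: up-sample every class in the TRAINING set to the majority size
classes, counts = np.unique(y_train, return_counts=True)
n_max = counts.max()
X_parts, y_parts = [], []
for c in classes:
    Xc = X_train[y_train == c]
    X_parts.append(resample(Xc, replace=True, n_samples=n_max, random_state=42))
    y_parts.append(np.full(n_max, c))
X_train_bal = np.vstack(X_parts)
y_train_bal = np.concatenate(y_parts)
# X_test / y_test stay untouched and keep the original imbalance.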
  • asked a question related to Dataset
Question
4 answers
A bottom-up stepwise regression on a 140-variable, standardized dataset (all features have mean 0 and stddev 1) selected 10 variables as the best predictors for a certain target.
The stepwise regression first selected the predictor variable with the highest adjusted R2 (R2adj), then added the second predictor variable which increased R2adj the most, and so on, until R2adj started to decrease again (this happened after 10 added variables). All selected predictors needed to have p<0.05, or they were discarded. Hence, this stepwise regression implicitly ranked the 10 selected predictor variables, out of 140, from most important to less important in terms of R2adj.
I expected that the absolute values of the regression coefficients of the selected predictors would also decrease, together with the decline in added R2adj. This, however, turned out not to be the case. For instance, the most important explanatory predictor (in terms of R2adj) did not have the highest absolute regression coefficient when compared to the other 9 selected predictors. Remember that all predictors are standardized.
What could be the reason for this?
Relevant answer
Answer
Hello Jan,
If your goal was to identify some "optimal" subset of IVs to best explain differences on an outcome of importance to you, stepwise models are among the least dependable means to accomplish this goal. In point of fact, you are in no way guaranteed to identify the best ensemble of IVs (for a given proportion of the total available IVs) via step methods. As well, they suffer from a number of other technical issues.
There are far better options available. Have a look at this list: https://scikit-learn.org/stable/modules/linear_model.html (and note the adaptive lasso option, among others).
Finally, with respect to your question: Why do the relative sizes of regression coefficients change as you add IVs to a model?
Because of overlap/redundancy among IVs (when excessive, this is referred to as multicollinearity). Later-entered variables can supplant earlier-entered variables, even though they might not have been as potent an individual IV as the first IV entered. If all IVs were genuinely independent of each other, then sequence of entry into a model would not change the explanatory power of a given IV as the model grew in number of predictors.
Good luck with your work.
  • asked a question related to Dataset
Question
2 answers
For example, suppose we collected data (Dataset.csv) with 7 million records and want to take a sample of just 1 million records.
What are the first step, second step, and so on, if the dataset needs the following steps:
(labeling \ numeric conversion \ normalization \ balancing \ sampling 1 million records \ cleansing)?
Additional question: is it okay if we balance only the normal records and not the attack records, or is there a problem with that?
Relevant answer
Answer
To randomly select rows (packets) using a Pandas DataFrame:
a)
df = df.sample()  # randomly select a single row
b)
df = df.sample(n=500)  # randomly select a specified number of rows; for example, to select 500 random rows, set n=500
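If the 1-million-record sample should also preserve the class proportions of the full 7-million-record file, a hedged sketch (assuming a label column named "label", which is hypothetical):
import pandas as pd
df = pd.read_csv("Dataset.csv")  # the 7-million-record file
# Take the same fraction from every class so the sample keeps the
# original label proportions (requires pandas >= 1.1 for groupby.sample)
frac = 1_000_000 / len(df)
sample = df.groupby("label", group_keys=False).sample(frac=frac, random_state=42)
sample.to_csv("Dataset_1M.csv", index=False)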
  • asked a question related to Dataset
Question
7 answers
When carrying out scientific research, is it better to use a public data set from the field or a self-made data set that we collect ourselves?
Relevant answer
Answer
That can hardly be said in such abstract terms. It depends on how good the public data set is and how good your own is. The best thing is to use both and compare the results.
  • asked a question related to Dataset
Question
2 answers
I am interested in the study of features that can determine gender and age from short speech samples (1 to 9 seconds). The audio is from a public set (the Mozilla Common Voice dataset), where the duration and the quality are variable.
Relevant answer
Answer
1. F0 would be one indicator of the influence of gender on voice.
2. Many parameters have been evaluated to answer this question.
3. By acoustic analysis, I assume you mean pitch and intensity analysis. The parameters may differ depending on the analysis technique one uses.
4. Commonly, age-wise and gender-wise norms are established for F0, harmonic-to-noise ratio, jitter, and shimmer, to name a few. We use such norms routinely in clinics when trying to identify abnormalities in voice.
5. Cepstral peak prominence (CPP) and related parameters are often used in cepstral analysis.
6. A routine search on PubMed should give you lots of articles on this.
  • asked a question related to Dataset
Question
3 answers
I am close to submitting a paper for publication in an Economics Journal. My paper is based on my empirical cross-country analysis with a sample of 126 countries. This analysis includes around 20 variables averaged over the 10-year sample period. These variables come from multiple databases such as the World Bank's World Development Indicators and the IMF's International Financial Statistics. To create the dataset used in my analysis I simply downloaded each respective database into excel, removed the countries that are not in my sample, averaged each variable for the sample period, and then copy and pasted these variables into a column in my dataset. Is this an appropriate way to source and format data for academic research?
Relevant answer
Answer
No!
  • asked a question related to Dataset
Question
4 answers
Pearson test can be used to find out the correlation between two continuous data sets.
Spearman test can be used to find out the correlation between two ordinal data sets.
Do we have any test to find out the correlation between ordinal data (the mean score of a Likert-scale data set) and continuous data (academic performance in terms of exam scores)? If not, how can we do that?
Relevant answer
Answer
If you're using the mean of multiple Likert-scale items, you're already treating the data as continuous (i.e., interval), in which case you can go ahead with Pearson correlation.
An alternative option is to use the median or mode of the Likert-scale items instead and conduct an ordinal regression.
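Either coefficient is one line with SciPy; a minimal sketch with placeholder arrays standing in for the Likert-item means and the exam scores:
import numpy as np
from scipy.stats import pearsonr, spearmanr
scores = np.array([3.2, 4.1, 2.8, 4.6, 3.9])  # placeholder Likert-scale means
exam = np.array([62, 75, 58, 88, 71])         # placeholder exam scores
r, p = pearsonr(scores, exam)       # treats the Likert means as interval data
rho, p_s = spearmanr(scores, exam)  # rank-based, if you prefer an ordinal treatment
print(r, rho)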
  • asked a question related to Dataset
Question
6 answers
Descriptive research, such as research describing a large dataset (for example, a travel behaviour dataset) via preliminary analysis.
Relevant answer
Answer
  • asked a question related to Dataset
Question
3 answers
Hello there,
I am searching for freely available pixel-based datasets (derived from satellites, or mixed products like the CRU or IRI data library) with a resolution of less than 500 m (preferably less than 100 m). It would be nice if you could name some!
Thank you so much for your attention and participation.
Relevant answer
Answer
Dear Sakib,
There are literally thousands of freely available data sets worldwide. What exactly do you need? No instrument is perfect and each data set has its own advantages and drawbacks. You should select those that are most appropriate for your purposes, and in particular determine your accuracy requirements, as they will imply close looks at the calibration issues as well as considerations regarding post-processing. Once you have clearly identified the parameters you need, the spatial and temporal extents and resolutions required, and the minimum accuracy needed, then you can search for the best inputs for your purpose.
By the way, NASA does offer a wide range of data sets but it is not the only source of information: the European Space Agency (ESA), as well as national space agencies of Japan, China, France, UK or Brazil (and many others) also have worthwhile offerings. You will find useful links to those data sources by searching the web.
Best regards, Michel.
  • asked a question related to Dataset
Question
10 answers
In a machine learning-based approach, can a dataset used to check the accuracy of model prediction be part of the training set?
Relevant answer
Answer
A dataset in machine learning is, quite simply, a collection of data pieces that can be treated by a computer as a single unit for analytic and prediction purposes. This means that the data collected should be made uniform and understandable for a machine that doesn't see data the same way as humans do.
Most data can be categorized into 4 basic types from a Machine Learning perspective: numerical data, categorical data, time-series data, and text.
training set—a subset to train a model. test set—a subset to test the trained model.
The “training” data set is the general term for the samples used to create the model, while the “test” or “validation” data set is used to qualify performance. Perhaps traditionally the dataset used to evaluate the final model performance is called the “test set”.
  • asked a question related to Dataset
Question
4 answers
So, with some difficulty, I have been able to do wavelet analysis for the time series datasets that I have. The thing is, all these data sets can be combined to form a year-long dataset, with some gaps as big as a month.
A solution to this is to interpolate the data. But considering that my data has a sampling rate of 10 minutes, a one-month gap would not allow me to pursue this solution.
For discontinuous (unevenly spaced) data, the Lomb-Scargle periodogram is used instead of the FFT. If someone can suggest a similar workaround for wavelet analysis, it would be highly appreciated.
Relevant answer
Answer
Dear Ranjan Kumar Sahu,
If you have sufficient data points, I would suggest you analyze the periods before and after the gap separately.
  • asked a question related to Dataset
Question
2 answers
I would like to know why SBERT takes less time than BERT on a large text data set.
Relevant answer
Answer
A followup question: Doesn't BERT rely on context or embeddings of nearby words to produce embedding of a word?
  • asked a question related to Dataset
Question
4 answers
Where can I find a Twitter dataset for Preliminary Flu Outbreak Prediction Using Twitter Posts Classification?
Relevant answer
Answer
The following is a list of some datasets which might be helpful for your work.
Dataset #1
MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022 MonkeyPox Outbreak - The dataset consists of a total of 255,363 Tweet IDs of the same number of tweets about monkeypox that were posted on Twitter from 7th May 2022 (date when the first case of this outbreak was detected) to 23rd July 2022. Link to the dataset - https://doi.org/10.5281/zenodo.6898178
Dataset #2
Twitter Conversations about the COVID-19 Omicron Variant - It presents a total of 522,886 Tweet IDs of the same number of tweets about the SARS-CoV-2 Omicron Variant posted on Twitter since the first detected case of this variant on November 24, 2021. Link to the dataset - https://doi.org/10.5281/zenodo.6893676
Dataset #3
Twitter Chatter about Online Learning during the Current COVID-19 Omicron Wave - The dataset comprises a total of 52,984 Tweet IDs of the same number of tweets about online learning posted since the first detected case of the Omicron variant. Link to the dataset - https://doi.org/10.5281/zenodo.6837118
  • asked a question related to Dataset
Question
5 answers
This dataset includes CSV files that contain the tweet IDs. The tweets have been collected by the model deployed here at https://live.rlamsal.com.np. The model monitors the real-time Twitter feed for coronavirus-related tweets, using filters: language “en”, and keywords “corona”, "coronavirus", "covid", "covid19" and variants of "sarscov2".
As per the Twitter Developer Policy, it is not possible for me to provide information other than the Tweet IDs (this dataset has been completely re-designed on March 20, 2020, to comply with data sharing policies set by Twitter). Note: This dataset should be solely used for non-commercial research purposes. A new list of tweet IDs will be added to this dataset every day. Bookmark the dataset page for further updates.
Dataset status as of May 24, 2020: 116,962,112 Global Tweets (EN) 
Relevant answer
Answer
The following is a list of some datasets which might be helpful for your work.
Dataset #1
MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022 MonkeyPox Outbreak - The dataset consists of a total of 255,363 Tweet IDs of the same number of tweets about monkeypox that were posted on Twitter from 7th May 2022 (date when the first case of this outbreak was detected) to 23rd July 2022. Link to the dataset - https://doi.org/10.5281/zenodo.6898178
Dataset #2
Twitter Conversations about the COVID-19 Omicron Variant - It presents a total of 522,886 Tweet IDs of the same number of tweets about the SARS-CoV-2 Omicron Variant posted on Twitter since the first detected case of this variant on November 24, 2021. Link to the dataset - https://doi.org/10.5281/zenodo.6893676
Dataset #3
Twitter Chatter about Online Learning during the Current COVID-19 Omicron Wave - The dataset comprises a total of 52,984 Tweet IDs of the same number of tweets about online learning posted since the first detected case of the Omicron variant. Link to the dataset - https://doi.org/10.5281/zenodo.6837118
  • asked a question related to Dataset
Question
6 answers
I am looking for a dataset where the users along with the vehicles are in motion. If the data set contains any social information, that would help me a lot.
Relevant answer
Answer
The following is a list of some more datasets which might be helpful for your work.
Dataset #1
MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022 MonkeyPox Outbreak - The dataset consists of a total of 255,363 Tweet IDs of the same number of tweets about monkeypox that were posted on Twitter from 7th May 2022 (date when the first case of this outbreak was detected) to 23rd July 2022. Link to the dataset - https://doi.org/10.5281/zenodo.6898178
Dataset #2
Twitter Conversations about the COVID-19 Omicron Variant - It presents a total of 522,886 Tweet IDs of the same number of tweets about the SARS-CoV-2 Omicron Variant posted on Twitter since the first detected case of this variant on November 24, 2021. Link to the dataset - https://doi.org/10.5281/zenodo.6893676
Dataset #3
Twitter Chatter about Online Learning during the Current COVID-19 Omicron Wave - The dataset comprises a total of 52,984 Tweet IDs of the same number of tweets about online learning posted since the first detected case of the Omicron variant. Link to the dataset - https://doi.org/10.5281/zenodo.6837118
  • asked a question related to Dataset
Question
1 answer
Collaborating Filtering or Content-based Recommender Systems.
Relevant answer
Answer
The following is a list of some datasets which might be helpful for your work.
Dataset #1
MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022 MonkeyPox Outbreak - The dataset consists of a total of 255,363 Tweet IDs of the same number of tweets about monkeypox that were posted on Twitter from 7th May 2022 (date when the first case of this outbreak was detected) to 23rd July 2022. Link to the dataset - https://doi.org/10.5281/zenodo.6898178
Dataset #2
Twitter Conversations about the COVID-19 Omicron Variant - It presents a total of 522,886 Tweet IDs of the same number of tweets about the SARS-CoV-2 Omicron Variant posted on Twitter since the first detected case of this variant on November 24, 2021. Link to the dataset - https://doi.org/10.5281/zenodo.6893676
Dataset #3
Twitter Chatter about Online Learning during the Current COVID-19 Omicron Wave - The dataset comprises a total of 52,984 Tweet IDs of the same number of tweets about online learning posted since the first detected case of the Omicron variant. Link to the dataset - https://doi.org/10.5281/zenodo.6837118
  • asked a question related to Dataset
Question
1 answer
I would like to simulate a disaster environment. Can any one provide me a dataset for urban/sub-urban uneven terrain please.
Relevant answer
Answer
The following is a list of some more datasets which might be helpful for your work.
Dataset #1
MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022 MonkeyPox Outbreak - The dataset consists of a total of 255,363 Tweet IDs of the same number of tweets about monkeypox that were posted on Twitter from 7th May 2022 (date when the first case of this outbreak was detected) to 23rd July 2022. Link to the dataset - https://doi.org/10.5281/zenodo.6898178
Dataset #2
Twitter Conversations about the COVID-19 Omicron Variant - It presents a total of 522,886 Tweet IDs of the same number of tweets about the SARS-CoV-2 Omicron Variant posted on Twitter since the first detected case of this variant on November 24, 2021. Link to the dataset - https://doi.org/10.5281/zenodo.6893676
Dataset #3
Twitter Chatter about Online Learning during the Current COVID-19 Omicron Wave - The dataset comprises a total of 52,984 Tweet IDs of the same number of tweets about online learning posted since the first detected case of the Omicron variant. Link to the dataset - https://doi.org/10.5281/zenodo.6837118
  • asked a question related to Dataset
Question
1 answer
Is there any website form where i can dowload the free data sets related to the biomedical image processing or some other text related data to build but the Deep lerning model.
Relevant answer
Answer
The following is a list of some datasets which might be helpful for your work.
Dataset #1
MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022 MonkeyPox Outbreak - The dataset consists of a total of 255,363 Tweet IDs of the same number of tweets about monkeypox that were posted on Twitter from 7th May 2022 (date when the first case of this outbreak was detected) to 23rd July 2022. Link to the dataset - https://doi.org/10.5281/zenodo.6898178
Dataset #2
Twitter Conversations about the COVID-19 Omicron Variant - It presents a total of 522,886 Tweet IDs of the same number of tweets about the SARS-CoV-2 Omicron Variant posted on Twitter since the first detected case of this variant on November 24, 2021. Link to the dataset - https://doi.org/10.5281/zenodo.6893676
Dataset #3
Twitter Chatter about Online Learning during the Current COVID-19 Omicron Wave - The dataset comprises a total of 52,984 Tweet IDs of the same number of tweets about online learning posted since the first detected case of the Omicron variant. Link to the dataset - https://doi.org/10.5281/zenodo.6837118
  • asked a question related to Dataset
Question
5 answers
Hello,
I am studying Computer Science and I am currently working on my Bachelor thesis. For that, I am looking for suitable datasets. My goal is to apply Process Mining to these datasets to identify and analyze interesting processes. However, the problem is that these datasets need to be in a certain format to be suitable for Process Mining. The data needs to have a Case Id, Activity, and Timestamp column. In other words, the data needs to be activity-based so that processes with different activity sequences can be found.
I wanted to ask if someone has any idea where I could find such datasets? I'd be most interested in datasets in sectors such as energy, waste management, public work (but other input would be helpful as well). So far I mainly could find the datasets from previous years' BPI challenges.
Here is a short page with more information about Process Mining and the desired format (including a brief example):
Any feedback would be highly appreciated.
Thanks in advance,
Louis
Relevant answer
Answer
The following is a list of some more datasets which might be helpful for your work.
Dataset #1
MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022 MonkeyPox Outbreak - The dataset consists of a total of 255,363 Tweet IDs of the same number of tweets about monkeypox that were posted on Twitter from 7th May 2022 (date when the first case of this outbreak was detected) to 23rd July 2022. Link to the dataset - https://doi.org/10.5281/zenodo.6898178
Dataset #2
Twitter Conversations about the COVID-19 Omicron Variant - It presents a total of 522,886 Tweet IDs of the same number of tweets about the SARS-CoV-2 Omicron Variant posted on Twitter since the first detected case of this variant on November 24, 2021. Link to the dataset - https://doi.org/10.5281/zenodo.6893676
Dataset #3
Twitter Chatter about Online Learning during the Current COVID-19 Omicron Wave - The dataset comprises a total of 52,984 Tweet IDs of the same number of tweets about online learning posted since the first detected case of the Omicron variant. Link to the dataset - https://doi.org/10.5281/zenodo.6837118
  • asked a question related to Dataset
Question
2 answers
I am training a Beta-VAE using the BDD-100k driving dataset. Here are my hyperparameters: Adam optimizer, learning rate 0.0001, latent dimension 16; the loss function is the reconstruction loss (MSE) plus the KLD loss multiplied by the Beta factor. After a while of training, the model seems to have learned something, but with different samples the exact same model's performance is completely different. Can anyone give me a hint on how to understand what is going on? Thanks! Here are examples of the same model generating different results.
Relevant answer
Answer
color and clarity
  • asked a question related to Dataset
Question
4 answers
For a dataset like
1. BCI motor imagery EEG signals (example: BCI Competition IV),
2. SEED dataset,
which Python library is best suited for processing and feature extraction tasks?
Relevant answer
Answer
Check this link.
You can also use dfa, numpy, scipy, and sklearn.
I used those libraries in my EEG signal filtering and classification project.
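As a small illustration of the SciPy route, a band-pass filter of the kind typically applied to motor-imagery EEG; the 8-30 Hz band and the 250 Hz sampling rate are assumptions to adjust to the dataset at hand:
import numpy as np
from scipy.signal import butter, filtfilt
fs = 250.0  # sampling rate in Hz (assumed; check the dataset documentation)
# 4th-order Butterworth band-pass over the mu/beta band used in motor imagery
b, a = butter(4, [8.0, 30.0], btype="bandpass", fs=fs)
eeg = np.random.randn(32, 2500)          # placeholder (channels, samples) array
filtered = filtfilt(b, a, eeg, axis=-1)  # zero-phase filtering along time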
  • asked a question related to Dataset
Question
13 answers
I need a security dataset with challenges; I mean, I want the baseline accuracy to be low so that I can improve it using ML techniques. I tried several datasets, but they already give high accuracy without any enhancement.
Relevant answer
Answer
Mohamed Amine Ferrag, Othmane Friha, Djallel Hamouda, Leandros Maglaras, Helge Janicke, "Edge-IIoTset: A New Comprehensive Realistic Cyber Security Dataset of IoT and IIoT Applications for Centralized and Federated Learning", IEEE Access, April 2022, DOI: 10.1109/ACCESS.2022.3165809
  • asked a question related to Dataset
Question
2 answers
I am looking for resting-state (eyes closed) EEG datasets for any kind of psychiatric disorder. These can include, but are not limited to:
  • Alcohol use disorder
  • Acute stress disorder
  • Addictive disorder
  • Anxiety disorder
  • Behavioral addiction disorder
  • Schizophrenia
  • Post traumatic stress disorder
  • Depressive disorder
  • Bipolar disorder
etc.
I would prefer it if the datasets contain raw EEG data, e.g. EDF files. If anyone can assist, I would really appreciate that. Thank you in advance.
Relevant answer
Answer
  • asked a question related to Dataset
Question
1 answer
I am working on classifying sentiments for a tweets dataset in an unsupervised manner. I have used TextBlob polarity, AFINN, and the VADER sentiment analyser for the sentiment classification. Among these, I have got relatively better results with VADER. However, the results are still not good enough in terms of accuracy: VADER gave an accuracy of around 50%.
Is there any way to improve the accuracy of VADER, or is there any other pre-trained model that can be used to get a better classification?
Any help would be highly appreciated.
Thank You.
Relevant answer
Answer
NVIVO
Also the text analysis engine in the IBM Modeler suite (similar to the one in PALANTIR) but much less expensive.
Richard E. Gilder, RN MS
Bioinformatics Scientist
The Gilder Company
TTUHSC Adjunct Faculty
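Another option sometimes tried for tweets, for what it's worth, is a pretrained transformer via the Hugging Face pipeline API (a sketch; the default checkpoint is a general-purpose sentiment model, so a Twitter-specific one may do better):
from transformers import pipeline
# Downloads a default DistilBERT sentiment model on first call
clf = pipeline("sentiment-analysis")
print(clf(["I love this!", "This is awful."]))
# -> [{'label': 'POSITIVE', 'score': ...}, {'label': 'NEGATIVE', 'score': ...}]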
  • asked a question related to Dataset
Question
2 answers
Dear Ones,
We are in the process of data analysis for our research study on HYPOGLYCAEMIA using SPSS version 23. One of the challenges we faced after completing data entry is that the variable "time of interview", which was coded as NUMERIC during tool design (i.e., variable definition in SPSS), could NOT be transformed to a time because it is read as NUMERIC. Well, that was our MISTAKE!
Solution: we decided to change the variable TIME OF INTERVIEW back to DATE/TIME format, specifying the time as hh:mm, in "variable view" inside SPSS, thinking that it would then be read as a TIME variable (in 24-hour format), but to no avail.
At present, it records a completely different time (actually shifting the time forward by about 6 hours for each entry) instead of the original time.
E.g., for entry number 1, instead of reading 0930 hrs as intended, it currently reads 1530 hrs. It does so for all other entries.
We also tried to convert the same variable "time of interview" back to DATE/TIME using HH:MM:SS, but we ended up with a new problem:
at present, instead of, say, 09:30 hrs, it reads 00:15.
How can we correct our mistake without jeopardising our dataset for the named variable?
Relevant answer
Answer
@shahab : thanks. I normally use SAS myself. It is a friend's research that she just incorporated me. She uses SPSS!
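For what it's worth, the arithmetic of the reported shifts suggests the numeric codes are being reinterpreted as raw time units: SPSS stores times internally as seconds, so an entry like 930 (meant as 09:30) displays as 15:30 if treated as 930 minutes, and as roughly 00:15 if treated as 930 seconds, which matches both symptoms described above. If that is the cause, converting the hhmm-coded numbers to seconds before assigning a time format (e.g. with a COMPUTE along the lines of TRUNC(x/100)*3600 + MOD(x,100)*60) should recover the original times. A quick sanity check of that conversion in Python:
def hhmm_to_seconds(code):
    # Convert an hhmm-coded numeric (e.g. 930 for 09:30) to seconds
    hours, minutes = divmod(int(code), 100)
    return hours * 3600 + minutes * 60
print(hhmm_to_seconds(930))  # 34200 seconds, i.e. 09:30:00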
  • asked a question related to Dataset
Question
9 answers
I have the EEG DEAP dataset in .dat format; by the usual process I can see the complete data for each candidate, but I want to store those data in a CSV file. Can you please help me with this?
Relevant answer
Answer
Sorry for replica, there were errors in the webpage..
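For what it's worth, the preprocessed Python version of DEAP is commonly described as a pickled dict with 'data' and 'labels' arrays; assuming that layout, a sketch for dumping one participant file to CSV:
import pickle
import pandas as pd
# DEAP's preprocessed .dat files are Python 2 pickles; latin1 avoids decode errors
with open("s01.dat", "rb") as fh:
    subject = pickle.load(fh, encoding="latin1")
data = subject["data"]      # reported shape: (40 trials, 40 channels, 8064 samples)
labels = subject["labels"]  # reported shape: (40 trials, 4 ratings)
# Flatten trials x channels into rows so the 3-D array fits a 2-D CSV
rows = data.reshape(data.shape[0] * data.shape[1], data.shape[2])
pd.DataFrame(rows).to_csv("s01_data.csv", index=False)
pd.DataFrame(labels).to_csv("s01_labels.csv", index=False)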
  • asked a question related to Dataset
Question
3 answers
I want data sets of blood and bone cancer. I want to analyze them in Python using artificial neural networks.
  • asked a question related to Dataset
Question
3 answers
Please, can anyone help me download the dataset "Columbia MVSO Image Sentiment Dataset"? I tried to use the link mentioned in the paper, but it's not working!
Relevant answer
  • asked a question related to Dataset
Question
1 answer
I need Python code for the MoleculeNet benchmark datasets, to find the graph embedding for each dataset.
Relevant answer
Answer
Souvik Panda A graph embedding gives each item (typically each node) in the network a fixed-length vector representation. These embeddings are a lower-dimensional representation of the graph that preserves its topology.
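As a toy illustration of the idea (not a MoleculeNet-specific pipeline), a spectral embedding of a graph's nodes from the eigenvectors of its adjacency matrix, using networkx and NumPy; the graph and the embedding size are placeholders:
import networkx as nx
import numpy as np
G = nx.karate_club_graph()  # stand-in graph; replace with your molecular graph
# Adjacency spectral embedding: top-k eigenpairs of the (symmetric) adjacency matrix
A = nx.to_numpy_array(G)
vals, vecs = np.linalg.eigh(A)  # eigenvalues in ascending order
k = 8
embedding = vecs[:, -k:] * np.sqrt(np.abs(vals[-k:]))  # one k-dim vector per node
print(embedding.shape)  # (34, 8) for the karate club graph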
  • asked a question related to Dataset
Question
7 answers
Please let me know the name or URL of any comprehensive Bangla corpus data for SA or ER.
  • asked a question related to Dataset
Question
2 answers
Hello,
I would like to compute the MAF of each SNP in my large data set. Is there a quick way to do so in R or with some module in bash?
Thanks,
Giulia
  • asked a question related to Dataset
Question
3 answers
Hello all, I have a sequence alignment of ~2000 sequences, which is likely more than is necessary. If I begin to remove sequences manually or using some software program, I'm sure I can reduce the number of gaps, but this will of course reduce the size of the alignment (and may introduce some amount of bias/subjectivity). Is it better to keep the larger dataset at the expense of greater gap content? Is there a rough criterion for the minimum amount of gaps an alignment should contain for reconstruction? Thanks very much.
Relevant answer
There are a lot of things to consider. You have ~2000 sequences, but: are these sequences orthologs? Are there paralogs? Do you know the species tree for the organisms from which these sequences were obtained? Is the taxonomic sampling of sequences balanced across the species tree? What is the mean sequence length? What is the MSA length? What is your ASR approach (maximum likelihood, Bayesian, or parsimony)? Are those gap positions due to true insertions and deletions (indels), or are some of them caused by incomplete sequences? If they are indels, have you considered performing an MSA using a statistical model for indels (the Poisson Indel Process) as implemented in the ProPIP MSA software? Have you compared different MSA programs? Are these gaps present in important, conserved sites, or are they only in variable sites?
I believe that the approach to dealing with the trade-off will vary depending on the answer to each of the above questions.
For instance, if you will use maximum-likelihood or Bayesian approaches for ASR, you should be careful with gap positions because the stochastic substitution models used in these approaches do not account for indels. If there are many incomplete sequences, I would remove them as much as possible. If the taxonomic sampling is biased toward one specific clade, and if many sequences of this clade are gappy, I would prune this clade.
I think that perhaps the closest thing to a general rule of thumb is to worry about the quality of your taxonomic sampling rather than the quantity.
Best wishes,
Pedro
  • asked a question related to Dataset
Question
5 answers
So I have calculated the accuracy of my model's predictions using both the training and testing datasets, and I found that it has higher testing accuracy than training accuracy. Is this a normal situation, and how do I interpret it?
Relevant answer
Answer
Hello Irfan Ripat,
as you already guessed, it is not a normal situation to obtain better accuracy on test data than on training data. However, there are some things you can check to help you figure out this situation. The two most common ones are:
  1. Too-strong regularization: as you may know, regularization terms are used to prevent the model from fitting the training data too closely (they give the model something to do other than only fitting the training data). In this case, since you've strongly limited the capacity of your model during training but use the model's full power at test time, you may obtain such accuracies.
  2. Incorrectly split data: there is a high chance that, while splitting the data into train and test sets, the same sample appears in both the train and test data (due to either a bad split or the existence of duplicated data).
Besides the above plausible reasons, I searched for similar issues and found the following links useful:
I hope it helps you!
  • asked a question related to Dataset
Question
3 answers
I want to train a CNN to segment Ground Glass Opacities (GGO) in Lungs CT-scans.
I would need a dataset with CT scans and corresponding masks indicating for every voxel if it is GGO or not (i.e. the ground truth for the segmentation).
Do you know any dataset like that?
Many thanks for your help!!
Relevant answer
  • asked a question related to Dataset
Question
5 answers
I am applying multiple regression analysis to my datasets for prediction purposes. I would like to know the most suitable method for calculating the relative contribution of each predictor.
Relevant answer
Answer
The relative weight analysis addresses the multicollinearity problems and helps calculate each predictor's importance rank.
  • asked a question related to Dataset
Question
7 answers
Can we simulate an IoT-style network using the NetSim 5G library? I would also like to model different kinds of attacks and generate a data set to train an ML classifier.
  • asked a question related to Dataset
Question
4 answers
Hello!
I need help-
I have a data set with around 35-40% missing.
I work with SPSS, what can I do?
I am looking at change in technology anxiety over time in older adults.
Thanks in advance!
Kind regards,
Jessica
Relevant answer
Answer
Under the assumptions of missing at random (MAR) or missing completely at random (MCAR) data, you can use multiple imputation. Alternatively, many other software programs offer full information maximum likelihood estimation which can be applied to many common statistical procedures such as regression and ANOVA and relies on the same assumptions as does multiple imputation.
  • asked a question related to Dataset
Question
2 answers
Non-Invasive Skin Cancer Diagnosis Using Hyperspectral Imaging
Relevant answer
Answer
Thank you very much for your help, but I am looking for multispectral or hyperspectral skin images, whereas the ISIC images are RGB. Thanks again.
  • asked a question related to Dataset
Question
2 answers
I need to do PCA or a classification method on a spectroscopic (LIBS) dataset in MATLAB. How can I get the scripts?
Relevant answer
Answer
Maybe try it on Github.
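If Python is an acceptable fallback while you hunt for MATLAB scripts, PCA on a spectra matrix takes a few lines with scikit-learn; the random matrix below is only a placeholder for your LIBS spectra:
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
spectra = np.random.rand(50, 2048)  # placeholder: 50 spectra x 2048 wavelength bins
X = StandardScaler().fit_transform(spectra)  # centre/scale each wavelength channel
pca = PCA(n_components=3)
scores = pca.fit_transform(X)                # per-spectrum scores, e.g. for plotting
print(pca.explained_variance_ratio_)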
  • asked a question related to Dataset
Question
3 answers
I have a nonlinear data set of continuous data points, which consists of 141 rows and 5 columns (4 independent variables and 1 output).
Which machine learning algorithms can I choose to get a good start?
Relevant answer
Answer
You can start with linear regression, and then introduce the nonlinearity in the variables, not the coefficients, for example by using PolynomialFeatures from the sklearn library in Python.
In other words, given 4 variables X1, X2, X3, X4, you can start with first-degree LR => Y ~ X1 + X2 + X3 + X4; then move on to second degree without and with interactions => Y ~ X1 + X2 + X3 + X4 + X1*X1 + X1*X2 + X1*X3 + X1*X4 + ... + X4*X4; then move on to introducing nonlinear kernels => Y ~ exp(X1) + log(X2) + (1/X3) + ..., something like this.
Once all of these fail to capture the variation in the Y data (which can be determined to an extent from the residual plot), move on to other nonlinear techniques like PCR/PLS, SVR, etc.
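A minimal scikit-learn sketch of the first two stages; the arrays are placeholders shaped like the 141 x 4 problem described above:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
X = np.random.rand(141, 4)  # placeholder for the 4 independent variables
y = np.random.rand(141)     # placeholder for the output
# Degree-1 baseline: Y ~ X1 + X2 + X3 + X4
lin = LinearRegression().fit(X, y)
# Degree 2 with interactions: adds X1^2, X1*X2, ..., X4^2 automatically
quad = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)
print(lin.score(X, y), quad.score(X, y))  # R^2 of each fit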
  • asked a question related to Dataset
Question
3 answers
I want to look at country of origin by state, in order to establish communication lines with groups with specific language skills. This is for analytics practice.
Relevant answer
Answer
I don't use Python for this; instead, I'll suggest a really simple tool.
Try Orange. It's simple, as it doesn't need coding, and it's extremely flexible.
  • asked a question related to Dataset
Question
2 answers
I am planning to create a predictive model. However, my approach uses a dataset which is not similar to those of prior works, so I can't apply my model to the previous datasets, and those models will not work with my dataset. How can I validate my research in such cases?
Relevant answer
Answer
You will have to develop a new model based on the new variables and only then test it.
  • asked a question related to Dataset
Question
1 answer
The dataset is cheddar, which you can find in the 'faraway' package.
The response variable Y is 'taste', and X is H2S.
I used this code to calculate the MSE:
m=lm(taste~H2S, data=cheddar)
test.lm <- lm(taste~H2S, data=cheddar)
mean(test.lm$residuals^2)
The result was 109.538; however, the expected value of the MSE is 10.83^2.
Relevant answer
Answer
Perhaps check which denominator each approach uses. mean(test.lm$residuals^2) divides the residual sum of squares by n = 30, while the value you expect (10.83^2 ≈ 117.3) is the squared residual standard error, which divides by the residual degrees of freedom, n - 2 = 28; indeed 109.538 * 30/28 ≈ 117.4. In R, sigma(test.lm)^2 or sum(test.lm$residuals^2)/df.residual(test.lm) reproduces it.
  • asked a question related to Dataset
Question
4 answers
Hello all, I have an hourly dataset whose mean value is around 30. I was able to tune an LSTM model to RMSE = 1.7. Please let me know if this is acceptable or whether it should be tuned further.
Relevant answer
Answer
The mean of the dataset is around 32, and my RMSE is 1.70; is it acceptable then, @Medhat Sir?
  • asked a question related to Dataset
Question
6 answers
I plan to use it in a machine learning class and I want the students to be motivated. Ideally it will be an image dataset.
Relevant answer
Answer
Compound databases of natural products have a major impact on drug discovery projects and other areas of research. The number of databases in the public domain with compounds with natural origins is increasing. Several countries, Brazil, France, Panama and, recently, Vietnam, have initiatives in place to construct and maintain compound databases that are representative of their diversity.
Regards,
Shafagat
  • asked a question related to Dataset
Question
1 answer
Hi,
I have two data sets: one consists of make/age/fuel, and the other includes make/age/fuel/engine size but is a much larger dataset. I need to find the engine size for the cars in the first data set from the second one. What is the faster way, in R or Excel?
thank you
Relevant answer
Answer
Hello Samaneh,
In R, there are usually multiple ways to accomplish a given task. For your query, consider using the match(a, b) command.
Below is sample R code implementing this with two demonstration data.frames. All variables are treated as strings here, so if your variables include other types, be sure to modify the code accordingly. Any unmatched cases in the original data.frame will have "NA" values for the appended variable.
Good luck with your work.
# pick matching cases across two data.frames, add a variable from
# second data.frame to first data.frame
# for simplicity, all vectors are treated as string variables
# so be sure to modify code if numeric, boolean, or other variable types are used
# sample data.frame to be augmented
df <- data.frame(make=c('Chrysler', 'Fiat', 'Ford', 'Ford', 'Volkswagen'),
year=c('2001', '2010', '2016', '2019', '2020'),
mileage=c('23.1', '33.4', '26.4', '15.5', '28.0'))
# sample of "larger" data.frame from which new variable (displacement) will come
df2 <- data.frame(make=c('Audi', 'Chrysler', 'Dodge', 'Fiat', 'Ford', 'Volkswagen', 'Volvo'),
year=c('2015', '2001', '2007', '2018', '2019', '2020', '2021'),
mileage=c('24.7', '23.1', '18.0', '33.4', '15.5', '28.0', '26.5'),
displacement=c('2.8', '3.6', '5.4', '1.6', '5.0', '2.5', '2.5'))
# show original data.frame
df
# show data.frame to be searched for the displacement variable
df2
# add new vector to original dataframe, for cases which match
# note that all unmatched instances in original data.frame will have "NA" values
df$litres = df2$displacement[match(paste(df$make, df$year, df$mileage), paste(df2$make, df2$year, df2$mileage))]
# show revised original data.frame
df
  • asked a question related to Dataset
Question
3 answers
I have developed a multiple regression model, and the number of responses was 164. Now I want to validate the model using a new data set. Is there any rule for the sample size that I can use? One colleague suggested I use 20% of the sample used to develop my model. Please, I need advice.
Relevant answer
Answer
With smaller samples (depending on what you have as a population), you can try to validate your model with additional data collection, especially when you are modelling social phenomena that are highly fragile and thus exhibit unexpected variances. I would not suggest a fixed sample size here, but advise that you continue expanding your validation sample until the change in goodness of fit in response to new data is insignificant. Of course, this assumes a very good sampling design and strategy.
  • asked a question related to Dataset
Question
4 answers
Please share
Relevant answer
Answer
Thanks
  • asked a question related to Dataset
Question
2 answers
Hello,
If both analyses have the same name, please suggest the name of software for performing this analysis and which type of dataset is needed.
Thank you
Devanand Maurya
  • asked a question related to Dataset
Question
3 answers
I have one thousand frames converted from a video taken at a particular location. From that dataset, I need to detect the blurred images and segregate them. How can I do that?
Relevant answer
Answer
If you don't have any reference image in your dataset, then it is problematic; you are right, you can't use that function. What about applying an FFT to the images? You can look at the distribution of low and high frequencies. A low amount of high frequencies can indicate that the image is blurry, but you would need to estimate the right threshold for that "low amount". Maybe this approach is worth a try :)
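A rough NumPy sketch of that FFT idea; the 60-pixel radius and any decision threshold are assumptions to tune on your own frames:
import numpy as np
def high_freq_energy(gray, radius=60):
    # Mean log-magnitude outside a low-frequency disc; low values suggest blur
    f = np.fft.fftshift(np.fft.fft2(gray))
    mag = np.log1p(np.abs(f))
    h, w = gray.shape
    cy, cx = h // 2, w // 2
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 > radius ** 2  # keep high frequencies
    return mag[mask].mean()
gray = np.random.rand(240, 320)  # placeholder for a grayscale frame
print(high_freq_energy(gray))
# Flag a frame as blurred when its score falls below a threshold estimated
# from a handful of known-sharp frames.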
  • asked a question related to Dataset
Question
1 answer
I have an analyzed RNA-seq data set. The analysis part, including differential gene expression, clustering analysis, and enrichment analysis, has been done. I am aware that the bioinformatics part is done, and most of the analysis part is also done. Could someone please guide me on how to extract the biological relevance from the data set? What should be the starting point for working with these data? Should I start by looking at the differentially expressed genes in the different comparisons, or start from the cluster analysis and try to look for the genes there?
Relevant answer
Answer
In recent years, RNA sequencing (in short RNA-Seq) has become a very widely used technology to analyze the continuously changing cellular transcriptome, i.e. the set of all RNA molecules in one cell or a population of cells. One of the most common aims of RNA-Seq is the profiling of gene expression by identifying genes or molecular pathways that are differentially expressed (DE) between two or more biological conditions. This tutorial demonstrates a computational workflow for the detection of DE genes and pathways from RNA-Seq data by providing a complete analysis of an RNA-Seq experiment profiling Drosophila cells after the depletion of a regulatory gene.
Regards,
Shafagat
  • asked a question related to Dataset
Question
3 answers
I am working on a school project and am having trouble finding data to reference. I know I’ve seen similar studies before. Thanks so much!!!
Relevant answer
Answer
Hi, I don't have a data set, but an interesting question. Try looking at relevant papers and contacting those researchers, they may share data with you if you collect your own.
An example might be:
Renzoni, A., Pirrea, A., Novello, F., Lepri, A., Cammarata, P., Tarantino, C., ... & Perra, A. (2018). The tattooed population in Italy: a national survey on demography, characteristics and perception of health risks. Annali dell'Istituto superiore di sanita, 54(2), 126-136.
Good luck with your research!
  • asked a question related to Dataset
Question
9 answers
Hi
Who knows how I can compare two datasets (with different data types)? The data types do not have the same parameters.
What is the best way to link both datasets? Thanks.
Relevant answer
Answer
You use the terms "compare" and "correlate". Those are two very different activities (one is useful for answering questions like "which is bigger", while the other is useful for answering questions like "does variable A increase as variable B does?"). Without using either term (or any statistical term at all, hopefully), what is it that you're trying to find out?
  • asked a question related to Dataset
Question
3 answers
I need a dataset related to vehicular ad hoc networks for reflection-based DDoS attacks.
Any idea or suggestion is welcome.
Thank you!
Relevant answer
Answer
Dear Samara Mayhoub,
You may find some useful info below:
DDoS Attack Detection in SDN-based VANET Architectures
_____
A Multivariant Stream Analysis Approach to Detect and Mitigate DDoS Attacks in Vehicular Ad Hoc Networks
_____
naveenrj98/Security_Attacks_VANET
_____
  • asked a question related to Dataset
Question
3 answers
I have 79 final line items from a questionnaire, and now I want to cluster the line items into distinct latent variables. Kindly guide me on how I can cluster the line items. Thanks in anticipation.
Relevant answer
Answer
The number of observations does not really matter. To regroup your lines into meaningful groups (or 'patterns', which might be seen as distinct variables in a way), you need a metric: a way to measure how two lines differ from one another. For instance, you may consider a score taking into account all the columns of a line; in general, one often uses the Euclidean distance, but it will not be the best in your case. Anyway, the idea remains to find a way to automatically compare two lines: this is where your knowledge of Business and Economics comes in.
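One concrete way to implement such a metric, sketched in Python: treat 1 - |correlation| between items as the distance and cluster the items hierarchically (the DataFrame below is a placeholder for your respondents-by-items table):
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
df = pd.DataFrame(np.random.rand(200, 79),
                  columns=[f"item{i}" for i in range(1, 80)])  # placeholder responses
corr = df.corr().abs()
dist = 1.0 - corr  # highly correlated items end up close together
condensed = squareform(dist.values, checks=False)
Z = linkage(condensed, method="average")
groups = fcluster(Z, t=5, criterion="maxclust")  # e.g. force 5 item groups
print(dict(zip(df.columns, groups)))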
  • asked a question related to Dataset
Question
9 answers
I am using a panel dataset (N=73; T=9). Dataset Timeframe: 2010-2018
In the GMM estimate on the total dataset, the AR(1) and AR(2) values are fine.
But to investigate the impact of the European crisis, I had to split the data (5 Years during and immediately after the crisis, and the subsequent 4 years). But when GMM is run on the second set of data, (2015-2018), in one of the models, AR(1) and AR(2) values were not generated.
Is the result still usable? What are the potential problems of using this specific result?
  • asked a question related to Dataset
Question
9 answers
I welcome Answers and opinions.
Relevant answer
Answer
Citing it under References is enough, but if you want to publish any data table you need to get permission from the publisher.
  • asked a question related to Dataset
Question
2 answers
Time predicted from a predictive maintenance algorithm.
Relevant answer
Answer
Hello there! What kind of predictive maintenance algorithm are you using in your application?
  • asked a question related to Dataset
Question
6 answers
Hello,
I need to understand this type of analysis; please suggest the name of software and the dataset type that would help me carry out this analysis.
Thank you
Devanand Maurya
Relevant answer
Answer
I can see why that article is no help, because it does not provide a citation for its method. As far as I can tell, "correspondence factor analysis" is simply correspondence analysis.
Why don't you try reposting your question and include correspondence analysis as a search term?
  • asked a question related to Dataset
Question
3 answers
I am looking for a dataset that could be useful for our project.
Relevant answer
Answer
You can try to get similar datasets from UCI machine learning repository or from Kaggle.
  • asked a question related to Dataset
Question
5 answers
Hi All,
I'm carrying out an Ordinal Regression on my dataset. I have continuous predictors (about 12) and an ordinal response variable. When looking into Ordinal Regression in SPSS they have two different procedures to carry this out: PLUM and GENLIN. It is said that GENLIN is better because it is quicker and easier to carry out than PLUM. I wonder if GENLIN has other advantages?
Many thanks!
Laura
  • asked a question related to Dataset
Question
3 answers
I am developing an ML model working with medical specialties.
My dataset contains different specialties (general surgery, urology, etc.). First I built a model with all my data with the specialties mixed, and then I applied the same ML algorithm at the specialty level. I achieve better evaluation metrics for all the specialty models except one. Can someone help me understand the reason?
Relevant answer
Answer
Dear Alice,
as far as I know, there is usually more than one image from each patient in medical datasets. Given that, when splitting your dataset or working with a small subset of the whole dataset, you should make sure that:
  • No two images (scans) of the same patient appear in the train and test sets simultaneously
  • [Softly] the ratio of classes is kept
So in your case, there is a chance that the subset you chose and trained your model on does not contain enough samples of that specific specialty.
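As a concrete illustration, here is a minimal scikit-learn sketch of a patient-level split; the arrays and the patient_id layout are made up, and StratifiedGroupKFold can additionally keep the class ratio roughly constant:

import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# placeholder data: 100 images from 20 patients, 5 images each
X = np.random.rand(100, 16)
y = np.random.randint(0, 2, size=100)
patient_ids = np.repeat(np.arange(20), 5)

# splitting on groups guarantees no patient appears in both sets
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X, y, groups=patient_ids))

# sanity check: train and test share no patient
assert not set(patient_ids[train_idx]) & set(patient_ids[test_idx])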
I hope it helps you!
  • asked a question related to Dataset
Question
4 answers
Hello everyone!
I have an RNA-seq dataset for two groups of mouse samples, knockout and wild type. I have the normalized values (quants) for all datasets. Please guide me on how to perform PCA on the normalized values. I am not a bioinformatician, so kindly suggest non-coding methods.
Thanks in advance!
Relevant answer
Answer
Hi! I recommend using LatchBio for RNA seq data. I've used it several times, and it is super easy to use since it has a non-coding interface.
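If a small amount of code turns out to be acceptable after all, here is a minimal scikit-learn sketch of PCA on a normalized expression matrix; the random matrix below merely stands in for your quant values, with one row per sample:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# placeholder matrix standing in for normalized counts:
# rows = samples (KO and WT mice), columns = genes
rng = np.random.default_rng(0)
expr = rng.random((6, 2000))

# standardize each gene, then project samples onto the first two PCs
pcs = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(expr))
print(pcs)  # one (PC1, PC2) coordinate per sample, ready for a scatter plot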
Hope that helps!
  • asked a question related to Dataset
Question
3 answers
Hello everyone! I am in search of a suitable dataset for a nail tracking application. I found one, but I want the images to be more variable. If you have one, please respond to my request.
Relevant answer
Answer
Look at the link; it may be useful.
Regards,
Shafagat
  • asked a question related to Dataset
Question
6 answers
Hi everyone,
I have a panel dataset of 4 periods and 29 countries. Which method/technique would be suitable for my dataset?
Thanks for your answer in advance.
Relevant answer
Answer
Kelvyn Jones I really appreciate you taking the time. You've broadened my horizons, sir. I am going to start working out these details. Thank you so much, again.
  • asked a question related to Dataset
Question
1 answer
I am working on an object recognition model that detects whether the person in a picture is wearing any type of headwear (hat/cap/helmet/scarf/raincoat, etc.). I am unable to find any large publicly available datasets of this kind. For now I am resorting to writing scripts that scrape images of people wearing hats/caps etc. from the web, using the Bing image downloader API and the Google image downloader API. Please let me know of any publicly available datasets of this kind. Thank you.
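For reference, a minimal sketch of the scraping step described above using the bing-image-downloader package (pip install bing-image-downloader); the queries, limits, and output directory are all illustrative:

from bing_image_downloader import downloader

# one query per headwear type; downloads land in headwear_dataset/<query>/
for query in ["person wearing hat", "person wearing helmet",
              "person wearing scarf", "person without headwear"]:
    downloader.download(
        query,
        limit=100,               # images to fetch per query
        output_dir="headwear_dataset",
        adult_filter_off=False,  # keep the safe-search filter on
        timeout=30,
    )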
Relevant answer
Answer
Check this Dataset, could be useful:
  • asked a question related to Dataset
Question
8 answers
Hello everyone
I am a student working on a project about predicting the privacy setting applied to textual posts on Facebook. The objective is to predict, for a post by a specific user, whether they would share it with the public, with their friends, or with some more specific audience.
In terms of previous work on the subject, I found some articles that do this:
The first and second papers (and, to a lesser extent, the fourth one too) use a model that is trained only on the data of that user, which gives them a model specific to that user, as opposed to the methodology in the third paper. The first one uses only 20 posts, and for the second the precise number is unknown (maybe more than 60).
My first question is: isn't this too small a dataset to train a text classification model? The tf-idf vectors used would have high dimensionality, and the number of words that appear in multiple posts would be small.
I tried to replicate their results with some data collected thanks to some friends (I asked 7 people to label 20 of their posts each), and any model with tf-idf seems to give pretty bad results (the models just act like dummy classifiers and predict the majority class).
I tried adding a small number of features next to the tf-idf vector, such as the length of the post or its positivity/negativity/objectivity score obtained with a sentiment analysis tool, but it doesn't seem to affect the model at all (a sketch of this setup follows below).
The first paper got high accuracy with only 20 posts per user (could this be because the majority class had a high ratio, above 70%?), while the fourth one couldn't get past 65% with a much bigger dataset.
My second question is: am I missing something? What do you think of the feasibility of the approach (using a very small dataset), and how could the results be improved?
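For reference, a minimal scikit-learn sketch of the setup described above, combining tf-idf with a side feature; the toy posts and column names are invented, and class_weight="balanced" is one common counter to always-predicting-the-majority-class behaviour:

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# toy posts with one side feature (length) and the chosen audience as the label
df = pd.DataFrame({
    "text": ["going out with friends tonight", "new job announcement!",
             "family dinner photos", "check out my latest blog post"],
    "length": [30, 21, 20, 29],
    "audience": ["friends", "public", "friends", "public"],
})

# tf-idf on the text column; pass the numeric feature through unchanged
features = ColumnTransformer([
    ("tfidf", TfidfVectorizer(), "text"),
    ("num", "passthrough", ["length"]),
])

clf = Pipeline([
    ("features", features),
    ("model", LogisticRegression(max_iter=1000, class_weight="balanced")),
])
clf.fit(df[["text", "length"]], df["audience"])
print(clf.predict(df[["text", "length"]]))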
Relevant answer
Answer
An interesting related work that uses BERT.
I know what you did on Venmo
  • asked a question related to Dataset
Question
4 answers
I have two datasets (.edf) of EEG recordings, one for healthy people and one for depressive people.
Each recording has 20 channels. So far I have opened the data in MATLAB with edfread() as a timetable.
How can I add white noise to that timetable?
Relevant answer
Answer
Artificial Intelligence's answer would work, but it might not generate what you need. Have you considered including noise that is true to the body, like bodily functions, sounds, and surrounding RF? Eleonora Adelina Dănilă
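For what it's worth, here is a minimal NumPy sketch of adding Gaussian white noise at a chosen SNR, assuming the 20-channel recording has been exported from the timetable to an array (MATLAB's randn can be used the same way):

import numpy as np

# placeholder standing in for an EEG recording: (n_samples, n_channels)
rng = np.random.default_rng(1)
eeg = rng.standard_normal((5000, 20))

# scale the noise so each channel ends up at the target SNR (in dB)
target_snr_db = 10.0
signal_power = np.mean(eeg ** 2, axis=0)                 # per-channel power
noise_power = signal_power / (10 ** (target_snr_db / 10))
noise = rng.standard_normal(eeg.shape) * np.sqrt(noise_power)

noisy_eeg = eeg + noise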
  • asked a question related to Dataset
Question
29 answers
Dear All,
I am looking for some partial discharge (PD) datasets to download. I would appreciate it if you could mention some data sources from which I can download PD datasets.
Regards,
Anis
Relevant answer
Please share with me: soronilameitei@gmail.com
  • asked a question related to Dataset
Question
6 answers
Hello Friends,
I am applying ML algorithms (DT, RF, ANN, SVM, KNN, etc.) in Python to my dataset, which has features and target variables as continuous data. For example, when I use DecisionTreeRegressor I get an r_2 score of 0.977. However, I'm interested in deploying classification metrics like the confusion matrix, accuracy score, etc. For this, I converted the continuous target values into categorical ones. Now when I apply DecisionTreeClassifier, I get an accuracy score of 1.0, which I think indicates overfitting. I then applied normality checks and correlation techniques (Spearman), but the accuracy remains the same.
My question is: am I right to convert numeric data into categorical data?
Secondly, if both a regressor and a classifier are used on the same dataset, will the accuracy change?
I need your valuable suggestions, please.
For details, please see the attached files.
Thanks for your time.
Relevant answer
Answer
I think there are two misconceptions here.
1) There is no reason to expect similar accuracies for regression and classification on the same data set. Turning a regression problem into a classification problem is tricky, and essentially pointless.
2) r_2 on its own is not a valid index of the quality of a regression model. Imagine a model that systematically gives a prediction equal to 10 times the observation. The squared correlation r_2 between predictions and observations will be equal to 1, although the model is obviously very poor. For regression, the most useful quality index is the root mean squared error, computed on a test set, i.e. on data that have never been used for designing the model, neither for training nor for model selection.
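As an illustration of the test-set RMSE point, here is a minimal scikit-learn sketch; the synthetic data merely stands in for the real features and target:

import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# synthetic continuous data standing in for the real dataset
rng = np.random.default_rng(0)
X = rng.random((500, 5))
y = X @ np.array([1.5, -2.0, 0.7, 0.0, 3.1]) + rng.normal(0, 0.1, 500)

# hold out a test set that plays no part in training or model selection
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = DecisionTreeRegressor(random_state=42).fit(X_train, y_train)
rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
print(f"test RMSE: {rmse:.3f}")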
  • asked a question related to Dataset
Question
2 answers
We are building an Arabic speech emotion dataset with 508 recorded speakers; each speaker recorded the same ten phrases, divided across five emotions. The WAV files are noise-free and will be converted to MFCC and LDD features.
The validation process is in progress, carried out manually by a team of neurolinguists and psychologists. The dataset will be publicly available free of charge.
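As an aside, for the MFCC step a minimal librosa sketch might look like this (the file path and the n_mfcc value are illustrative):

import librosa

# load one recording at its native sampling rate (path is hypothetical)
y, sr = librosa.load("recordings/speaker001_phrase01.wav", sr=None)

# 13 MFCCs per frame is a common starting point for speech emotion work
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
print(mfcc.shape)  # (13, n_frames)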
What is the process of publishing this dataset?
What is the best journal to publish in?
Relevant answer
Answer
Ángel Carrión-Tavárez thank you for your reply.
  • asked a question related to Dataset
Question
1 answer
I am looking for a publicly available video/image dataset from surveillance cameras that can be used to detect whether violence, such as a fight, has occurred. A surveillance-camera dataset containing such a class would also be helpful. I have collected the UCF-Crime and NTU-CCTV datasets; I want to know how to get more datasets like these, or how to collect such videos myself.
  • asked a question related to Dataset
Question
4 answers
Hello seniors, I hope you are doing well.
Recently I've read some very good research articles in which the datasets were taken from V-Dem, Polity, and Freedom House. Although the authors shared links to the supplementary datasets and briefly described how they analyzed them in SPSS or R, I couldn't understand and replicate their findings. It may be because I am not very good at quantitative data analysis.
So I want to know how I could more easily get to grips with analyzing datasets like V-Dem. Is there a good online course, lecture series, conference video, or book?
Article links
Any help would be appreciated.
Thanks in anticipation.
Relevant answer
Answer
You can find online courses for learning R on the edX and Coursera platforms.
Thanks ~PB
  • asked a question related to Dataset
Question
1 answer
Date     Price   Volume   Turnover
1/1/22      10       12        120
1/1/22      11       10        110
1/1/22      13       20        260
1/1/22      12       15        180
1/1/22      10       13        130
1/1/22       9        9         81
Once I sort turnover in ascending order for each day, I have 81, 110, 120, 130, 180, 260. Now I need to create categorical variable 1 for 81 and 110, categorical variable 2 for 120 and 130, and categorical variable 3 for 180 and 260. My dataset contains many years of data, and for each day there are thousands of transactions.
Relevant answer
Answer
I've been working with your example data.
Here is the code to add the new variable:
# df is your data frame (avoid calling it data.frame, which masks the base function)
# count the number of columns so the new variable goes in the next one
ncolData <- ncol(df)
for (i in 1:nrow(df)) {
  # df[i, 4] is the Turnover column
  value <- as.character(df[i, 4])
  # write the category into the new column df[i, ncolData + 1]:
  # 81 and 110 -> 1, 120 and 130 -> 2, 180 and 260 -> 3
  switch(value,
    "81"  = { df[i, ncolData + 1] <- 1 },
    "110" = { df[i, ncolData + 1] <- 1 },
    "120" = { df[i, ncolData + 1] <- 2 },
    "130" = { df[i, ncolData + 1] <- 2 },
    "180" = { df[i, ncolData + 1] <- 3 },
    "260" = { df[i, ncolData + 1] <- 3 },
    { print("nothing") }
  )
}
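Since the real dataset has thousands of transactions per day, hardcoding each turnover value will not scale. If Python is an option, here is a pandas sketch that ranks within each date instead (column names taken from the example):

import pandas as pd

# the example data; the same code works with thousands of rows per date
df = pd.DataFrame({
    "Date": ["1/1/22"] * 6,
    "Price": [10, 11, 13, 12, 10, 9],
    "Volume": [12, 10, 20, 15, 13, 9],
    "Turnover": [120, 110, 260, 180, 130, 81],
})

# split each day's turnovers into three equal-sized groups (1 = lowest third)
df["Category"] = (
    df.groupby("Date")["Turnover"]
      .transform(lambda s: pd.qcut(s, 3, labels=[1, 2, 3]))
)
print(df.sort_values(["Date", "Turnover"]))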
Regards,
  • asked a question related to Dataset
Question
8 answers
I am looking for nitrogen, potassium, phosphorus, organic carbon stock, pH value, and added-nutrients features in a soil dataset.
Kindly suggest relevant data sources.
Relevant answer
Answer
@Manoj, you may get it from the Soil & Land Use Survey of India (Govt. of India). You can also download data from the FAO Soils Portal, GloSIS Global (Beta), the European digital archive on soil maps (EuDASM), the International Council for Science (ICSU) World Data System, and the GEOSS (Global Earth Observation System of Systems) portal.
  • asked a question related to Dataset
Question
1 answer
Hi
I have a western blot dataset for three experimental and three control samples, covering one housekeeping protein (actin) and three target proteins. Kindly guide me in detail on what type of statistics I should apply, and how to normalize the data.
Thanks in Advance!
Relevant answer
Answer
Hi Nisha,
In order to normalize the data, you will first have to scan the blots (as image files) and then use software like ImageJ (a free tool) to automatically detect and measure signal intensity and band size.
The normalization process can be done by following this procedure:
Now, the statistical analysis can be done using ANOVA. You can find excellent guides anywhere on the internet, both for using ImageJ and for performing the ANOVA test.
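For illustration, a minimal Python sketch of the normalization and test; the band intensities are made-up numbers standing in for ImageJ measurements, and with only two groups the ANOVA is equivalent to a t test:

import numpy as np
from scipy.stats import f_oneway

# hypothetical ImageJ band intensities: 3 control and 3 experimental lanes
target_ctrl = np.array([1520.0, 1485.0, 1610.0])
actin_ctrl = np.array([2010.0, 1995.0, 2100.0])
target_exp = np.array([2380.0, 2455.0, 2290.0])
actin_exp = np.array([2005.0, 1980.0, 2060.0])

# normalize each lane's target signal by its own actin (loading control)
norm_ctrl = target_ctrl / actin_ctrl
norm_exp = target_exp / actin_exp

# one-way ANOVA on the normalized values; repeat per target protein
stat, p = f_oneway(norm_ctrl, norm_exp)
print(f"F = {stat:.2f}, p = {p:.4f}")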
Hope this helps you
  • asked a question related to Dataset
Question
2 answers
This database contains 494,414 face images of 10,575 actors from IMDb. The face images comprise random pose variations, illumination, facial expressions, and resolutions.
I need to remove the background and keep the frontal pose of each face image.
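As one possible starting point, a minimal OpenCV sketch that crops the face region (a rough proxy for removing the background); the bundled Haar cascade only detects roughly frontal faces, so it also filters for frontal pose. The file paths are illustrative:

import cv2

# load the bundled frontal-face Haar cascade (ships with opencv-python)
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("casia_webface/actor_0001/001.jpg")  # hypothetical path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# detectMultiScale returns (x, y, w, h) boxes; keep only the first face
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
if len(faces) > 0:
    x, y, w, h = faces[0]
    cv2.imwrite("cropped/001.jpg", img[y:y + h, x:x + w])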
  • asked a question related to Dataset
Question
1 answer
I need a dataset for my research.
Relevant answer
Answer
This may be helpful.
High precision automated face localization in thermal images: Oral cancer dataset as test case
DOI:10.1117/12.2254236
  • asked a question related to Dataset
Question
1 answer
Natural Language Processing
Relevant answer
Answer
Sketch Engine is quite robust.
  • asked a question related to Dataset
Question
2 answers
Dear Researchers,
Greetings!
I need an IFDB dataset for my research work. Can anyone help in this regard? I'm not able to download it directly from the website.
Let me know if anyone can help.
Thanks in advance.
Relevant answer
Answer