Question
Asked 3rd Dec, 2012

What's the difference between a training set and a test set?

In the chemoinformatics field?

Most recent answer

12th Jun, 2019
Sargol Mazraedoost
Pukyong National University
Hello,
In a dataset, a training set is used to build a model, while a test (or validation) set is used to validate the model that was built. Data points in the training set are excluded from the test (validation) set.
Check these links. They might help you.

Popular Answers (1)

3rd Dec, 2012
Ping-Chang Lin
University of Chicago
In a dataset, a training set is used to build a model, while a test (or validation) set is used to validate the model that was built. Data points in the training set are excluded from the test (validation) set. Usually a dataset is divided into a training set and a validation set (some people use 'test set' instead) in each iteration, or into a training set, a validation set and a test set in each iteration.
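As a rough illustration of the three-way division described above, here is a minimal Python sketch. The function name `three_way_split`, the 60/20/20 proportions and the fixed seed are assumptions made for the example, not something prescribed in the answer; the point is only that the three index lists never overlap.

```python
import random

def three_way_split(n_points, frac_train=0.6, frac_val=0.2, seed=0):
    """Shuffle the indices 0..n_points-1 and cut them into three disjoint parts.

    The fractions are illustrative; whatever is left after the training and
    validation parts becomes the test part.
    """
    rng = random.Random(seed)
    indices = list(range(n_points))
    rng.shuffle(indices)
    n_train = round(frac_train * n_points)
    n_val = round(frac_val * n_points)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test

train_idx, val_idx, test_idx = three_way_split(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

The model would be fitted on `train_idx`, tuned or compared on `val_idx`, and only scored once at the end on `test_idx`.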
10 Recommendations

All Answers (7)

3rd Dec, 2012
George Fitzgerald
Universal Display Corporation
Just to add a bit to Ping-Chang's answer: the training set can be selected by applying a random filter to the data, e.g., select 20% of the points at random to generate the model and test against the remaining 80%. If you want to be especially careful, then do this multiple times, i.e., select different random training sets and compare the resulting models. If you get similar models, then your model has probably captured the essential chemistry and physics of the problem. If the models are very different, then you are just fitting equations without a good physical basis.
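The repeated random selection described above could be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic (a straight line plus noise), the model is a plain least-squares line, and the names `fit_line` and `repeated_random_fits` are made up for the example. With real chemical data you would substitute your own descriptors and model.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def repeated_random_fits(xs, ys, train_fraction=0.2, repeats=5, seed=0):
    """Fit on a random training subset several times; score on the remaining points."""
    rng = random.Random(seed)
    n_train = max(2, round(train_fraction * len(xs)))
    results = []
    for _ in range(repeats):
        train = rng.sample(range(len(xs)), n_train)
        train_set = set(train)
        test = [i for i in range(len(xs)) if i not in train_set]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        squared_errors = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test]
        rmse = (sum(squared_errors) / len(test)) ** 0.5
        results.append((slope, intercept, rmse))
    return results

# Synthetic data (an assumption for the demo): y = 2x + 1 plus small Gaussian noise
data_rng = random.Random(42)
xs = [i / 10 for i in range(200)]
ys = [2 * x + 1 + data_rng.gauss(0, 0.1) for x in xs]

for slope, intercept, rmse in repeated_random_fits(xs, ys):
    print(f"slope={slope:.3f} intercept={intercept:.3f} test RMSE={rmse:.3f}")
```

If the printed slopes and intercepts stay close to each other from one random selection to the next, the model is stable in the sense described above; widely scattered coefficients would suggest it is not.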
8 Recommendations
21st Aug, 2015
Netra Pal Singh
MVN University
In my view we need all three sets of data. All three are different. We need to have all three, depending on which algorithm we are using.
31st Aug, 2017
Ibrahim Farhani
Golestan University of Medical Sciences
Hi to researchers.
In data mining, the data are divided into a training set (most of the data) and a testing set (a smaller portion). After the model has been built using the training set, you test it by making predictions against the test set, using the values determined from the training set. If your data are divided into n parts, this process is repeated n times. For example, if your data are divided into four training parts and one testing part, the process is repeated five times, with each part taking its turn as the testing set. Now it is easy to know whether your model's guesses are correct or not!
Thanks for your question.
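The rotation described in this answer is usually called k-fold cross-validation. A minimal sketch of how the parts take turns, assuming 10 data points and 5 parts (the helper `k_fold_indices` is a made-up name for illustration, and in practice the indices would normally be shuffled first):

```python
def k_fold_indices(n_points, k):
    """Yield (train, test) index lists; each of the k parts is the test set exactly once."""
    all_indices = list(range(n_points))
    # Part i takes every k-th index starting at i, so the parts are disjoint.
    folds = [all_indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

for round_number, (train, test) in enumerate(k_fold_indices(10, 5), start=1):
    print(f"round {round_number}: train={sorted(train)} test={test}")
```

In each round the model would be fitted on `train` and scored on `test`; averaging the five scores gives the overall estimate.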
2 Recommendations
14th Oct, 2018
Huynh Vuong Thu Minh
Can Tho University
Hi Netra Pal Singh,
Do you mean that the data will be divided into 3 parts, one of which is used for calibration? I am confused: does it depend on the algorithm or on the purpose? Thank you.
31st May, 2019
Engr. Omit Debnath
Southwest University of Science and Technology
1. The Training Set is a subset used to train a model, and the Test Set is a subset used to test the trained model.
2. The Training Set is an initial set of data used to help a program understand how to apply technologies like neural networks to learn and produce sophisticated results.
The Training Set is also known as Training Data, a Training Dataset or a Learning Set.
Again,
The Test Set is a secondary (or tertiary) data set that is used to test a machine learning program after it has been trained on an initial training data set.
The Test Set is also known as a Test Dataset or Test Data.
