Question
Asked 26 May 2015

How to code categorical inputs for a neural network?

I am having many categorical variables as inputs to a neural network (eg: month (12 categories), day (seven categories)) I have coded them into binary values when using a NARX network to predict electricity demand. Please advice me on this.
Thank you in advance.

Most recent answer

I use:
for month  JANUARY-1.........DECEMBER-12
for week  MONDAY-1........SUNDAY-7
for weekend - 1
for workday- 2
for hour  1-24
1 Recommendation

Popular answers (1)

Not sure if NARX network differ to the common approach, but generally there are 2 ways to do this: use one input for each category and scale the integer values, e.g. (0,...,11) for month to continuous values in the range of other input variables. However, this approach would assume that you have some hierarchy in the categories, let's say February is 'better' or 'higher' than January. Since this is not the case, you would need to create one input node for each possible realization of a category, i.e. 12 input variables for the month where all are set to '0' except the category that is active, which is set to '1'. I would not advice to encode the integer categorical variables into binary values, and thus reduce the number of required input nodes. You would end up with a correlation bias among categories that have the value set to '1' at the same time, e.g. for (cold, mild, hot) = ([0,1], [1,0], [1,1]), the encoding for 'hot' would mean an artificial similarity to 'cold' and 'mild', since they share the same bit.  
6 Recommendations

All Answers (8)

Not sure if NARX network differ to the common approach, but generally there are 2 ways to do this: use one input for each category and scale the integer values, e.g. (0,...,11) for month to continuous values in the range of other input variables. However, this approach would assume that you have some hierarchy in the categories, let's say February is 'better' or 'higher' than January. Since this is not the case, you would need to create one input node for each possible realization of a category, i.e. 12 input variables for the month where all are set to '0' except the category that is active, which is set to '1'. I would not advice to encode the integer categorical variables into binary values, and thus reduce the number of required input nodes. You would end up with a correlation bias among categories that have the value set to '1' at the same time, e.g. for (cold, mild, hot) = ([0,1], [1,0], [1,1]), the encoding for 'hot' would mean an artificial similarity to 'cold' and 'mild', since they share the same bit.  
6 Recommendations
Deshani K.A.D.
University of Colombo
Thank you very much for your reply. Since there is no order, I have to use 12 inputs for the month variable as January, February, ...., December having 1's only at particular month and 0 otherwise.
Deshani K.A.D.
University of Colombo
Thank you very much Mahboobeh.
Bojan Ploj
College of Ptuj
You can use one bit on the input of the NN for each category.
This could be useful to you.
I use:
for month  JANUARY-1.........DECEMBER-12
for week  MONDAY-1........SUNDAY-7
for weekend - 1
for workday- 2
for hour  1-24
1 Recommendation

Similar questions and discussions

Related Publications

Article
Full-text available
Resumo. Este artigo descreve o simulador LegGen, que realiza a configuração automática do caminhar em robôs simulados dotados de pernas. No simulador LegGen, algoritmos genéticos são utilizados para a evolução dos parâmetros do caminhar em robôs móveis simulados através da biblioteca de simulação ba-seada em física Open Dynamics Engine (ODE). Diver...
Got a technical question?
Get high-quality answers from experts.