Question
Asked 26 May 2015
How to code categorical inputs for a neural network?
I am having many categorical variables as inputs to a neural network (eg: month (12 categories), day (seven categories)) I have coded them into binary values when using a NARX network to predict electricity demand. Please advice me on this.
Thank you in advance.
Most recent answer

I use:
for month JANUARY-1.........DECEMBER-12
for week MONDAY-1........SUNDAY-7
for weekend - 1
for workday- 2
for hour 1-24
1 Recommendation
Popular answers (1)
Not sure if NARX network differ to the common approach, but generally there are 2 ways to do this: use one input for each category and scale the integer values, e.g. (0,...,11) for month to continuous values in the range of other input variables. However, this approach would assume that you have some hierarchy in the categories, let's say February is 'better' or 'higher' than January. Since this is not the case, you would need to create one input node for each possible realization of a category, i.e. 12 input variables for the month where all are set to '0' except the category that is active, which is set to '1'. I would not advice to encode the integer categorical variables into binary values, and thus reduce the number of required input nodes. You would end up with a correlation bias among categories that have the value set to '1' at the same time, e.g. for (cold, mild, hot) = ([0,1], [1,0], [1,1]), the encoding for 'hot' would mean an artificial similarity to 'cold' and 'mild', since they share the same bit.
6 Recommendations
All Answers (8)
Not sure if NARX network differ to the common approach, but generally there are 2 ways to do this: use one input for each category and scale the integer values, e.g. (0,...,11) for month to continuous values in the range of other input variables. However, this approach would assume that you have some hierarchy in the categories, let's say February is 'better' or 'higher' than January. Since this is not the case, you would need to create one input node for each possible realization of a category, i.e. 12 input variables for the month where all are set to '0' except the category that is active, which is set to '1'. I would not advice to encode the integer categorical variables into binary values, and thus reduce the number of required input nodes. You would end up with a correlation bias among categories that have the value set to '1' at the same time, e.g. for (cold, mild, hot) = ([0,1], [1,0], [1,1]), the encoding for 'hot' would mean an artificial similarity to 'cold' and 'mild', since they share the same bit.
6 Recommendations
University of Colombo
Thank you very much for your reply. Since there is no order, I have to use 12 inputs for the month variable as January, February, ...., December having 1's only at particular month and 0 otherwise.
Similar questions and discussions
Related Publications
Resumo. Este artigo descreve o simulador LegGen, que realiza a configuração automática do caminhar em robôs simulados dotados de pernas. No simulador LegGen, algoritmos genéticos são utilizados para a evolução dos parâmetros do caminhar em robôs móveis simulados através da biblioteca de simulação ba-seada em física Open Dynamics Engine (ODE). Diver...