We know that no matter what learning algorithm is used, no matter what the initial weights are, and no matter what convergence parameters are used, a single-layer perceptron cannot be trained to solve the XOR problem. However, a two-layer perceptron with a sufficient number of hidden units is capable of solving such a problem. This paper focuses on the architecture, specifically the number of hidden cells required to decide whether supervised learning will be successful: for m-bit training examples, the number of hidden cells required for successful learning is 2^(m-1).
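To make the representational claim concrete, here is a minimal sketch (the weights and activation are illustrative assumptions, not taken from the paper): a two-layer perceptron with step-threshold units that computes XOR. For m = 2 input bits the formula above gives 2^(m-1) = 2 hidden cells, which is exactly what this hand-built construction uses.

```python
# Minimal sketch (not from the paper): a hand-weighted two-layer
# perceptron with step activations that computes XOR using the
# 2^(m-1) = 2 hidden cells predicted for m = 2 input bits.
import numpy as np

def step(z):
    # Heaviside threshold activation: 1 if z > 0, else 0.
    return (z > 0).astype(int)

# Hidden layer: unit 1 fires for OR(x1, x2), unit 2 for AND(x1, x2).
W_hidden = np.array([[1.0, 1.0],    # OR unit weights
                     [1.0, 1.0]])   # AND unit weights
b_hidden = np.array([-0.5, -1.5])   # OR / AND thresholds

# Output unit computes OR AND NOT(AND), i.e. XOR.
w_out = np.array([1.0, -1.0])
b_out = -0.5

def xor_net(x):
    h = step(W_hidden @ x + b_hidden)
    return step(w_out @ h + b_out)

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", xor_net(np.array(x)))
# Prints 0, 1, 1, 0: the XOR truth table, which no single-layer
# perceptron can reproduce.
```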