For complex networks this is currently difficult to analyze, but for shallow, interpretable networks the problem can be addressed; see my article “Training neural networks for solving 1-D optimal piecewise linear approximation”.
As far as I can tell, it is difficult to judge the maximum number of layers for a task without a proper ablation study. If the data is linearly separable, then you don't need any hidden layers at all. If the data is less complex and has fewer dimensions or features, then a neural network with 1 to 2 hidden layers will usually work. If the data has many dimensions or features, then 3 to 5 hidden layers can be used to reach a good solution. Keep in mind that adding hidden layers also increases the complexity of the model, and choosing many hidden layers, say 8, 9, or double digits, may sometimes lead to overfitting.
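To make this guidance concrete, here is a minimal Keras sketch (the task, `n_features`, `n_classes`, and the unit counts are all illustrative assumptions) where the number of hidden layers is a single parameter:

```python
# A minimal sketch (illustrative sizes): an MLP whose depth follows the
# rule of thumb above -- 0 hidden layers for linearly separable data,
# 1-2 for simpler data, 3-5 for high-dimensional data.
import tensorflow as tf

def build_mlp(n_features, n_classes, n_hidden_layers, units=64):
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(n_features,)))
    for _ in range(n_hidden_layers):      # 0 hidden layers = a linear model
        model.add(tf.keras.layers.Dense(units, activation="relu"))
    model.add(tf.keras.layers.Dense(n_classes, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

shallow = build_mlp(n_features=20, n_classes=3, n_hidden_layers=1)   # simpler data
deeper = build_mlp(n_features=200, n_classes=3, n_hidden_layers=4)   # higher-dimensional data
```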
Najla Matti Isaacc: In principle, a deep network can have any number of layers; you can keep increasing the number of layers as many times as you desire.
You only have hardware limitations; theoretically, if you ignore practicality and the problems that arise as layers are added (vanishing gradients, etc.), you can make your network as deep as you want.
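As a rough illustration of one such problem (a sketch only; the random pre-activations are an assumption), the gradient backpropagated through a stack of sigmoid layers is scaled by sigmoid'(z) <= 0.25 at each layer, so it shrinks roughly geometrically with depth:

```python
# Sketch: why gradients vanish in very deep sigmoid networks.
# Each layer scales the backpropagated gradient by sigmoid'(z) <= 0.25.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
for depth in [5, 10, 20, 40]:
    z = rng.normal(size=depth)                 # one pre-activation per layer
    factors = sigmoid(z) * (1.0 - sigmoid(z))  # per-layer derivative factors
    print(f"{depth:2d} layers -> gradient scale ~ {np.prod(factors):.2e}")
```

Architectural fixes such as ReLU activations and residual connections exist precisely to counter this decay.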
Earlier versions of neural networks such as the first perceptrons were shallow, composed of one input layer, one output layer, and at most one hidden layer in between. A network with more than three layers (including input and output) qualifies as “deep” learning.
The number of hidden neurons should be between the size of the input layer and the size of the output layer.
The number of hidden neurons should be 2/3 the size of the input layer, plus the size of the output layer.
The number of hidden neurons should be less than twice the size of the input layer.
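Applied to concrete sizes, these three rules of thumb look like this (a plain-Python sketch; `n_in` and `n_out` stand for the input and output layer sizes):

```python
# Sketch: the three common rules of thumb for hidden-layer size.
def hidden_size_rules(n_in, n_out):
    return {
        "between input and output size": (min(n_in, n_out), max(n_in, n_out)),
        "2/3 * input size + output size": round(2 * n_in / 3 + n_out),
        "less than 2 * input size": 2 * n_in - 1,   # upper bound
    }

print(hidden_size_rules(n_in=30, n_out=5))
# {'between input and output size': (5, 30),
#  '2/3 * input size + output size': 25,
#  'less than 2 * input size': 59}
```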
A hidden layer in an artificial neural network is a layer between the input and output layers, where artificial neurons take a set of weighted inputs and produce an output through an activation function.
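In code, that definition amounts to a matrix-vector product followed by a nonlinearity (a minimal NumPy sketch; the sizes and ReLU activation are arbitrary choices):

```python
# Sketch: one hidden layer = weighted inputs passed through an activation.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=4)        # activations coming from the input layer
W = rng.normal(size=(3, 4))   # weights: 3 hidden neurons, 4 inputs each
b = np.zeros(3)               # biases

hidden = np.maximum(0.0, W @ x + b)   # ReLU activation
print(hidden)                          # outputs fed to the next layer
```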
My response to your three questions at this point:
"It depends."
Cheers,
Bill
p.s. To those who may think mine is a non-response, I have learned over time that before announcing "I got it," one needs to know more about each circumstance intended by the questions, in the mind of the person asking.
It depends on your application and hardware resources. Traditional artificial neural networks mostly have one to three hidden layers for simple tasks (shallow networks), at some cost in performance, while deeper networks (e.g., CNN architectures) have more parameters and generally perform better. So it depends on what you want to do and what result you want to get. Hidden layers play an important role: they learn and extract the features that produce the output. Keep in mind the type of input data you are using and how many training instances you have, because they affect overall performance: more hidden layers mean more complexity and a tendency to overfit the model, while too few layers make the model so simple that you face underfitting. I think you should play with the number of hidden layers and check the performance of the trained model.
It depends on the distribution of the training data. If the distribution of your data is complex, it carries a lot of information, and learning the non-linear function that separates the classes is hard.
In that case you can use a proportionally complex network with more layers to learn from the information within the data, as the sketch below illustrates.
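A small scikit-learn sketch of this point (the toy dataset, layer sizes, and iteration budget are all illustrative assumptions), comparing networks of increasing depth on a non-linearly separable problem:

```python
# Sketch: a complex (non-linear) class boundary can benefit from more layers.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=1000, noise=0.25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for layers in [(4,), (8, 8), (16, 16, 16)]:   # 1, 2, and 3 hidden layers
    clf = MLPClassifier(hidden_layer_sizes=layers, max_iter=2000,
                        random_state=0).fit(X_tr, y_tr)
    print(layers, "-> test accuracy:", clf.score(X_te, y_te))
```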
In my opinion, you should first take some relevant papers as your benchmark models and try to build on and improve their performance, because building a deep model from scratch is quite hard given the many hyperparameters you have to set on your own.
There is a tool called "Keras Tuner" that you can use to explore different numbers of hidden layers and their impact on the training procedure; a short sketch follows below.
In general, the number of hidden layers depends on the amount of training data you have and the complexity of your problem, so the related published papers can guide you in this regard.
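A minimal Keras Tuner sketch (the input size, layer ranges, and the placeholder data `x_train`, `y_train` are assumptions) that makes the number of hidden layers itself a searchable hyperparameter:

```python
# Sketch: search over the number of hidden layers with Keras Tuner.
import keras_tuner as kt
import tensorflow as tf

def build_model(hp):
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(20,)))        # assumed input size
    for i in range(hp.Int("num_layers", 1, 5)):   # try 1 to 5 hidden layers
        model.add(tf.keras.layers.Dense(
            units=hp.Int(f"units_{i}", 32, 256, step=32),
            activation="relu"))
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

tuner = kt.RandomSearch(build_model, objective="val_accuracy", max_trials=10)
# With real data in hand (x_train, y_train are placeholders here):
# tuner.search(x_train, y_train, validation_split=0.2, epochs=5)
# best_model = tuner.get_best_models(num_models=1)[0]
```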
No one can be specific about that, because it depends on the steps you break your algorithm into, that is, its complexity. You might want to handle each of those steps with different hidden layers.