# Deep Feed Forward Neural Networks with TensorFlow

## October 22, 2018

The deep feed-forward neural network is similar to the softmax regression model, but adds one or more hidden layers between the input and the output. Deep feed-forward networks have the following characteristics:

• Perceptrons are arranged in layers, with the first layer taking in inputs and the last layer producing outputs. The middle layers have no connection with the external world, and hence are called hidden layers.
• Each perceptron in one layer is connected to every perceptron in the next layer. Information is constantly "fed forward" from one layer to the next, which is why these networks are called feed-forward networks.
• There is no connection among perceptrons in the same layer.

Now that we understand what a deep feed forward neural network is, let’s import TensorFlow and our MNIST data set.

Now we need to define the number of layers and the number of hidden nodes within each hidden layer.
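One way to write these definitions, with three hidden layers (the variable names and the sizes below are illustrative, not prescribed):

```python
# Number of nodes in each of the three hidden layers (illustrative sizes)
n_nodes_hl1 = 500
n_nodes_hl2 = 500
n_nodes_hl3 = 500
```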

You can have as many hidden nodes in each layer as you want, and the layers don't have to be the same size: the first layer could have 800 nodes, the second 1,000, and so on. Now we need to define the number of classes we have and the batch size that we want to train our neural network with for every epoch.

The number of classes is 10, since we are classifying the digits 0 through 9. Our batch size is 100; you can change this to your own preference as well. Now we need to reserve a place in our code for the flattened MNIST images and their labels.
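The two constants just described could look like this (the variable names are assumptions):

```python
n_classes = 10    # one class per digit, 0 through 9
batch_size = 100  # number of images fed to the network per training step
```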

x is a placeholder for the flattened images, with shape [None, 784]: None leaves the number of images (the batch size) open, and 784 = 28 × 28 pixels per flattened image. y is a placeholder for our labels. Now let's define the weights and biases for our hidden layers.

Each weight matrix is a tf.Variable initialized with a tf.random_normal of shape [input_node_size, output_node_size]. Each bias is likewise a tf.Variable initialized with a tf.random_normal of shape [number_of_nodes_in_hidden_layer]. Now let us define each of our layers.

For each of our layers, we multiply the incoming tensor by the hidden layer's weights and then add the hidden layer's biases. We then pass the result through the tf.nn.relu activation function, which applies the rectified linear unit, max(0, x), element-wise. As with all of our earlier models, we need to train our model and calculate the loss.

We have already discussed the scope and usage of every line of the training function in our earlier models, so there is no need to go through them individually. Running the model should give you an accuracy of about 95%: reasonably good, but not good enough.