
Our first fully connected neural network in TensorFlow/Keras

This notebook provides a small example of how to implement and train a fully connected neural network with TensorFlow/Keras on the MNIST handwritten digits dataset.

In [1]:
%tensorflow_version 2.x
In [2]:
import numpy as np
import matplotlib.pyplot as plt

from tensorflow import keras

%matplotlib inline

Load the MNIST data, check its dimensions, and look at a few random examples

In [3]:
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 0s 0us/step
In [4]:
x_train.shape, y_train.shape, x_test.shape, y_test.shape
Out[4]:
((60000, 28, 28), (60000,), (10000, 28, 28), (10000,))
In [5]:
def show_train_imgs(n=8, m=5):
    # Show m rows of n randomly chosen training images with their labels.
    for i in range(m):
        for j in range(n):
            idx = np.random.randint(len(y_train))
            plt.subplot(1, n, j + 1)
            plt.imshow(x_train[idx], cmap='gray')
            plt.title(y_train[idx], fontsize=30)
            plt.axis('off')
        plt.show()
In [6]:
plt.rcParams['figure.figsize'] = (15, 5)
show_train_imgs(8)
In [7]:
x_train.min(), x_train.max()
Out[7]:
(0, 255)

Normalize the data & reshape each 2D image matrix to a 1D array.

In [8]:
x_train = x_train.reshape(60000, 28*28)/255
x_test = x_test.reshape(10000, 28*28)/255

x_train.shape, x_test.shape, x_train.min(), x_train.max()
Out[8]:
((60000, 784), (10000, 784), 0.0, 1.0)
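
As an aside, the manual reshape can be avoided by letting the model flatten the images itself. A minimal sketch (the layer sizes here are illustrative, not the architecture used below):

# Sketch: a Flatten first layer accepts the raw (28, 28) images directly,
# so x_train could stay shaped (60000, 28, 28).
flatten_model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(512, activation='relu'),
    keras.layers.Dense(10, activation='softmax'),
])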
In [9]:
y_train[:5]
Out[9]:
array([5, 0, 4, 1, 9], dtype=uint8)

Convert the labels to a one-hot encoding

In [10]:
y_train_oh = keras.utils.to_categorical(y_train)
y_test_oh = keras.utils.to_categorical(y_test)
y_train_oh[:5]
Out[10]:
array([[0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]], dtype=float32)
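
As an aside, the one-hot step is optional: with loss='sparse_categorical_crossentropy' Keras accepts the integer labels directly. A minimal sketch, assuming a model like the one built below:

# Sketch: the sparse loss expects integer labels (0-9), exactly the
# format load_data() returns, so to_categorical can be skipped.
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=keras.optimizers.SGD(learning_rate=1e-2),
              metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=64, epochs=15)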

We will use the so-called Sequential API.

This API lets us build neural networks with one limitation: each layer's input is the output of the previous layer. To build more flexible neural networks we will later use the Functional API (a small sketch follows the list below).

The Sequential API...

  • builds up the model layer by layer
  • you can pass activation functions as an argument to most layers
  • or you can add a separate activation layer
  • .summary() prints the layers and their parameter counts
  • the model needs to be compiled before training; here you set
    • the loss function
    • the optimizer
    • the metrics
  • after compiling you can train your model; here you set
    • the number of epochs
    • the batch size
    • the training data
    • optionally, validation data
    • any callbacks (functions run before/after epochs, batches, etc.)
  • you can also generate predictions with a trained model
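
As referenced above, here is a minimal sketch of a similar model in the Functional API, where each layer is explicitly called on the output of the previous one (the layer sizes are illustrative):

inputs = keras.Input(shape=(784,))
x = keras.layers.Dense(512, activation='relu')(inputs)
x = keras.layers.Dense(256, activation='relu')(x)
outputs = keras.layers.Dense(10, activation='softmax')(x)
functional_model = keras.Model(inputs=inputs, outputs=outputs)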
In [11]:
model = keras.Sequential()
model.add(keras.layers.Dense(784, activation='relu', input_dim=784))
model.add(keras.layers.Dense(512, activation='relu'))
model.add(keras.layers.Dense(256, activation='relu'))
model.add(keras.layers.Dense(128, activation='relu'))
model.add(keras.layers.Dense(10, activation='softmax'))
In [12]:
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 784)               615440    
_________________________________________________________________
dense_1 (Dense)              (None, 512)               401920    
_________________________________________________________________
dense_2 (Dense)              (None, 256)               131328    
_________________________________________________________________
dense_3 (Dense)              (None, 128)               32896     
_________________________________________________________________
dense_4 (Dense)              (None, 10)                1290      
=================================================================
Total params: 1,182,874
Trainable params: 1,182,874
Non-trainable params: 0
_________________________________________________________________
In [13]:
784*784+784, 784*512+512, 512*256+256, 256*128+128, 128*10+10
Out[13]:
(615440, 401920, 131328, 32896, 1290)
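
These counts follow the general rule for a Dense layer: (inputs + 1) × units, i.e. one weight per input-output pair plus one bias per unit. A small helper reproducing the numbers above:

def dense_params(n_in, n_out):
    # weight matrix of shape (n_in, n_out) plus one bias per output unit
    return n_in * n_out + n_out

sizes = [784, 784, 512, 256, 128, 10]
[dense_params(a, b) for a, b in zip(sizes, sizes[1:])]
# -> [615440, 401920, 131328, 32896, 1290]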
In [14]:
model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.SGD(learning_rate=1e-2), metrics=['accuracy'])
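As an aside, plain SGD is a deliberately simple choice here; an adaptive optimizer such as Adam typically converges faster on this task. A sketch of the alternative compile call:

# Sketch: same loss and metric, Adam instead of SGD
model.compile(loss='categorical_crossentropy',
              optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              metrics=['accuracy'])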

With a GPU, one epoch takes ~3 s.

If your model is much slower, activate the GPU on Google Colab via Runtime → Change runtime type → Hardware accelerator → GPU. During training the most important summary statistics are shown. You can also save the training history (see the sketch after the training log below).

In [15]:
history = model.fit(x=x_train, y=y_train_oh, batch_size=64, epochs=15, validation_data=(x_test, y_test_oh))
Epoch 1/15
938/938 [==============================] - 3s 3ms/step - loss: 0.7563 - accuracy: 0.8127 - val_loss: 0.3102 - val_accuracy: 0.9127
Epoch 2/15
938/938 [==============================] - 3s 3ms/step - loss: 0.2813 - accuracy: 0.9187 - val_loss: 0.2352 - val_accuracy: 0.9310
Epoch 3/15
938/938 [==============================] - 3s 3ms/step - loss: 0.2201 - accuracy: 0.9361 - val_loss: 0.1909 - val_accuracy: 0.9423
Epoch 4/15
938/938 [==============================] - 3s 3ms/step - loss: 0.1820 - accuracy: 0.9474 - val_loss: 0.1770 - val_accuracy: 0.9488
Epoch 5/15
938/938 [==============================] - 3s 3ms/step - loss: 0.1552 - accuracy: 0.9545 - val_loss: 0.1462 - val_accuracy: 0.9556
Epoch 6/15
938/938 [==============================] - 3s 3ms/step - loss: 0.1343 - accuracy: 0.9609 - val_loss: 0.1324 - val_accuracy: 0.9609
Epoch 7/15
938/938 [==============================] - 3s 3ms/step - loss: 0.1185 - accuracy: 0.9656 - val_loss: 0.1183 - val_accuracy: 0.9644
Epoch 8/15
938/938 [==============================] - 3s 3ms/step - loss: 0.1046 - accuracy: 0.9694 - val_loss: 0.1085 - val_accuracy: 0.9671
Epoch 9/15
938/938 [==============================] - 3s 3ms/step - loss: 0.0936 - accuracy: 0.9731 - val_loss: 0.1078 - val_accuracy: 0.9691
Epoch 10/15
938/938 [==============================] - 3s 3ms/step - loss: 0.0841 - accuracy: 0.9759 - val_loss: 0.1029 - val_accuracy: 0.9672
Epoch 11/15
938/938 [==============================] - 3s 3ms/step - loss: 0.0762 - accuracy: 0.9782 - val_loss: 0.1005 - val_accuracy: 0.9678
Epoch 12/15
938/938 [==============================] - 3s 3ms/step - loss: 0.0686 - accuracy: 0.9810 - val_loss: 0.0888 - val_accuracy: 0.9727
Epoch 13/15
938/938 [==============================] - 3s 3ms/step - loss: 0.0622 - accuracy: 0.9820 - val_loss: 0.0877 - val_accuracy: 0.9728
Epoch 14/15
938/938 [==============================] - 3s 3ms/step - loss: 0.0565 - accuracy: 0.9839 - val_loss: 0.0841 - val_accuracy: 0.9735
Epoch 15/15
938/938 [==============================] - 3s 3ms/step - loss: 0.0514 - accuracy: 0.9859 - val_loss: 0.0782 - val_accuracy: 0.9755
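
As mentioned above, the History object returned by fit keeps the per-epoch metrics in a plain dict, history.history, so persisting it is straightforward. A minimal sketch using JSON (the file name is just an example):

import json

# history.history maps metric names ('loss', 'accuracy', 'val_loss',
# 'val_accuracy') to lists of per-epoch values; cast to float in case
# any entries are NumPy scalars, which json cannot serialize.
with open('history.json', 'w') as f:
    json.dump({k: [float(v) for v in vals]
               for k, vals in history.history.items()}, f)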
In [16]:
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='val loss')
plt.xlabel('epochs', fontsize=15)
plt.legend(fontsize=20)
plt.show()
plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='val accuracy')
plt.xlabel('epochs', fontsize=15)
plt.legend(fontsize=20)
plt.show()

>97%, not too bad, but why not 100%?

Let's check the predictions to see where the model goes wrong. Erroneous predictions are highlighted with a red dot. Also, from the learning curves above we can see that the model is not yet fully trained: the results are still improving.
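
Rather than sampling random windows as below, you could also collect all misclassified test images at once; a short sketch:

# Predict probabilities for the whole test set, then compare the
# argmax (predicted digit) against the true labels.
preds = model.predict(x_test)
wrong_idx = np.where(preds.argmax(axis=1) != y_test)[0]
print(f'{len(wrong_idx)} of {len(y_test)} test images misclassified')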

In [17]:
def show_predictions(n=5, m=5):
    # Show m rows of n consecutive test images with predicted vs. true labels;
    # misclassified images are marked with a red dot.
    for j in range(m):
        idx_start = np.random.randint(len(x_test) - n)
        preds = model.predict(x_test[idx_start:idx_start+n])
        true_labels = y_test[idx_start:idx_start+n]

        for i in range(n):
            plt.subplot(1, n, i + 1)
            predstr = f'pred: {preds[i].argmax()}, prob: {preds[i].max():.0%}'
            plt.title(f'{predstr} / true: {true_labels[i]}', fontsize=10)
            plt.imshow(x_test[idx_start+i].reshape(28, 28), cmap='gray')
            if preds[i].argmax() != true_labels[i]:
                plt.scatter([14], [14], s=500, c='r')
            plt.axis('off')
        plt.show()
In [18]:
show_predictions(m=20)
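
For a single aggregate test-set score you can also call evaluate, which returns the compiled loss and metrics:

# evaluate returns [loss, accuracy] for the model compiled above
test_loss, test_acc = model.evaluate(x_test, y_test_oh, verbose=0)
print(f'test accuracy: {test_acc:.4f}')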

What accuracy can you achieve with a 5-layer fully connected neural network?