Computer Vision | Deep Learning with TensorFlow & Keras (ResNet50, GPU training)

Objective

  • Implement ResNet50 from scratch using TensorFlow and Keras

  • Train on CPU, then switch to GPU to compare training speed

If you want to jump right to using a ResNet, have a look at Keras' pre-trained models. In this repo I am implementing a 50-layer ResNet from scratch, not out of need (implementations already exist) but as a learning exercise.

See the project repository on GitHub.

Packages used

  • python 3.7.9

  • tensorflow 2.7.0 (includes keras)

  • scikit-learn 0.24.1

  • numpy

  • pillow 8.2.0

  • opencv-python 4.4.0.46

GPU support

The following NVIDIA software must be installed on your system:

  • NVIDIA® GPU drivers: CUDA® 11.2 requires 450.80.02 or higher.

  • CUDA® Toolkit: TensorFlow supports CUDA® 11.2 (TensorFlow >= 2.5.0).

  • CUPTI, which ships with the CUDA® Toolkit.

  • cuDNN SDK 8.1.0 (see supported cuDNN versions).

Dataset

For this project I'm using the 10 Animals dataset available on Kaggle.

Implementation

ResNet50

ResNet is a family of deep neural network architectures introduced in 2015 by He et al. The original paper discussed five variants: 18-, 34-, 50-, 101- and 152-layer networks. I am implementing the 50-layer version, ResNet50.

ResNets proposed a solution to the exploding/vanishing gradient problem that arises when building deeper and deeper NNs: take the output of one layer, skip over a few layers, and feed it back in deeper into the network. This is called a residual block (also known as an identity block), and the authors illustrate the mechanism in their article like this:

image
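As a sketch of how such a residual block looks in code, here is a possible Keras implementation, assuming the paper's 1×1 / 3×3 / 1×1 bottleneck layout (the filter counts and names below are illustrative, not necessarily those used in this repo):

```python
import tensorflow as tf
from tensorflow.keras import layers

def identity_block(x, filters, kernel_size=3):
    """Residual block whose skip path is the identity:
    usable only when input and output shapes already match."""
    f1, f2, f3 = filters
    shortcut = x  # the skip connection carries x over unchanged

    x = layers.Conv2D(f1, 1)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)

    x = layers.Conv2D(f2, kernel_size, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)

    x = layers.Conv2D(f3, 1)(x)
    x = layers.BatchNormalization()(x)

    # add the skip path back in before the final activation
    x = layers.Add()([x, shortcut])
    return layers.Activation("relu")(x)
```

Because the skip path does nothing to x, the block only type-checks when the main path preserves the spatial dimensions and channel count.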

The identity block can be used when the input x has the same dimensions (width and height) as the output of the layer where x is fed forward; otherwise the addition wouldn't be possible. When this condition is not met, I use a convolution block like in the image below:

image

The only difference between the identity block and the convolution block is that the latter has an extra convolution layer (plus a batch normalization) on the skip connection path. That convolution layer resizes x so that its dimensions match the main-path output, making the addition possible.
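Building on the identity-block idea, a convolution block can be sketched like this (again a sketch with illustrative filter counts; the projection on the skip path is the only structural difference):

```python
import tensorflow as tf
from tensorflow.keras import layers

def convolution_block(x, filters, kernel_size=3, stride=2):
    """Residual block with a projection (1x1 conv + batch norm) on the
    skip path, so x is resized to match the main-path output."""
    f1, f2, f3 = filters

    # skip path: resize x with a strided 1x1 convolution
    shortcut = layers.Conv2D(f3, 1, strides=stride)(x)
    shortcut = layers.BatchNormalization()(shortcut)

    # main path: same bottleneck as the identity block, but strided
    x = layers.Conv2D(f1, 1, strides=stride)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)

    x = layers.Conv2D(f2, kernel_size, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)

    x = layers.Conv2D(f3, 1)(x)
    x = layers.BatchNormalization()(x)

    x = layers.Add()([x, shortcut])  # shapes now agree, so addition works
    return layers.Activation("relu")(x)
```

With stride 2 the block halves the spatial dimensions and changes the channel count, which is exactly the case where a plain identity skip would fail.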

Following the ResNet50 architecture described in He et al. 2015, the architecture I'm implementing in this repo has the structure illustrated below:

image

GPU versus CPU training

The easiest way to see the difference in training duration is to open the notebook in this repository, resnet-keras-code-from-scratch-train-on-gpu.ipynb, on Kaggle and follow the instructions for activating the GPU contained in the notebook. This is what I did in my case, as I don't have a dedicated GPU on my laptop.

To set up GPU support on a physical machine, follow these instructions.
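Once the NVIDIA software above is installed, a quick way to verify that TensorFlow actually sees the GPU is:

```python
import tensorflow as tf

# List the GPUs TensorFlow can see; an empty list means training
# will silently fall back to the CPU.
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    print(f"{len(gpus)} GPU(s) available:", [g.name for g in gpus])
else:
    print("No GPU found; TensorFlow will train on the CPU.")
```

If the list is empty despite a GPU being present, the driver, CUDA Toolkit, or cuDNN versions are usually the mismatch to check first.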

Project contents

To process the data into square images of the pre-defined size (as per the model architecture definition), run the make_dataset.py script from the src/data folder.
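I haven't reproduced make_dataset.py here, but one way to square an image with Pillow (which this project uses) looks like the sketch below; the 224×224 target is an assumption based on ResNet50's standard input size:

```python
from PIL import Image

def make_square(img: Image.Image, size: int = 224) -> Image.Image:
    """Center-crop an image to a square, then resize it to size x size.
    Illustrative helper, not the repo's actual make_dataset.py logic."""
    side = min(img.size)              # shorter edge defines the square
    left = (img.width - side) // 2    # crop box centered horizontally
    top = (img.height - side) // 2    # and vertically
    square = img.crop((left, top, left + side, top + side))
    return square.resize((size, size))
```

Center-cropping before resizing avoids distorting the aspect ratio, which padding-free direct resizing would do.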

To train a model, run the example_train.py script, check its available parameters, and select your preferred training options.
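The exact options of example_train.py are not shown here, but the kind of training call it wraps can be illustrated with Keras' built-in ResNet50 standing in for the from-scratch model (optimizer, loss, and the dummy batch below are assumptions for the sketch):

```python
import numpy as np
import tensorflow as tf

# Untrained ResNet50 with 10 output classes, matching the 10 Animals dataset.
model = tf.keras.applications.ResNet50(
    weights=None, input_shape=(224, 224, 3), classes=10
)
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Tiny random batch standing in for the processed dataset images.
x = np.random.rand(4, 224, 224, 3).astype("float32")
y = np.random.randint(0, 10, size=(4,))
history = model.fit(x, y, epochs=1, batch_size=2, verbose=0)
```

On real data you would feed a tf.data pipeline or image generator instead of in-memory arrays, but the compile/fit mechanics are the same.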

To make predictions using a pre-trained model, use the example_predict.py script and choose the desired settings.
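As with training, the prediction mechanics can be sketched independently of example_predict.py; the model below is untrained (weights=None), so only the shape of the workflow is meaningful, not the predictions themselves:

```python
import numpy as np
import tensorflow as tf

# Assumption: 10 output classes to match the 10 Animals dataset.
model = tf.keras.applications.ResNet50(
    weights=None, input_shape=(224, 224, 3), classes=10
)

# A random array standing in for a preprocessed 224x224 RGB image.
batch = np.random.rand(1, 224, 224, 3).astype("float32")
probs = model.predict(batch, verbose=0)          # softmax over 10 classes
predicted_class = int(np.argmax(probs, axis=1)[0])
```

In practice you would load saved weights with model.load_weights before predicting, and map the class index back to an animal label.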
