Computer Vision | Deep Learning with TensorFlow & Keras (ResNet50, GPU training)
- Implement ResNet from scratch using TensorFlow and Keras
- Train on CPU, then switch to GPU to compare training speed
If you want to jump right to using a ResNet, have a look at the implementation that ships with Keras (`tf.keras.applications.ResNet50`). In this repo I am implementing a 50-layer ResNet from scratch, not out of need, as implementations already exist, but as a learning exercise.
The following packages are required:
- python 3.7.9
- tensorflow 2.7.0 (includes keras)
- scikit-learn 0.24.1
- numpy
- pillow 8.2.0
- opencv-python 4.4.0.46
The following NVIDIA software must be installed on your system:
- NVIDIA® GPU drivers: CUDA® 11.2 requires driver version 450.80.02 or higher.
- CUDA® Toolkit: TensorFlow supports CUDA® 11.2 (for TensorFlow >= 2.5.0).
- CUPTI, which ships with the CUDA® Toolkit.
- cuDNN SDK 8.1.0 (see the list of supported cuDNN versions).
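Once these are installed, a quick way to confirm that TensorFlow actually sees the GPU (a generic sanity check, not specific to this repo):

```python
import tensorflow as tf

# Lists the GPUs visible to TensorFlow; an empty list means CPU-only.
print(tf.config.list_physical_devices('GPU'))
# True if this TensorFlow build was compiled with CUDA support.
print('Built with CUDA:', tf.test.is_built_with_cuda())
```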
ResNets propose a solution to the exploding/vanishing gradients problem that becomes common when building deeper and deeper neural networks: take the output of one layer, skip over a few layers, and feed it back into the network at a deeper point. This construct is called a residual block (also called an identity block), and the authors illustrate the mechanism in their article like this:
The identity block can be used when the input x has the same dimensions (width, height, and channels) as the output of the layers we are feedforwarding x through; otherwise the addition wouldn't be possible. When this condition is not met, I use a convolution block like in the image below:
The only difference between the identity block and the convolution block is that the latter has an extra convolution layer (plus a batch normalization) on the skip-connection path. That convolution layer resizes x so that its dimensions match the block's output, which makes the addition possible.
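As a minimal sketch of how these two blocks translate to Keras (the bottleneck layout of 1x1, 3x3, 1x1 convolutions follows the paper; function names and filter arguments are illustrative, not necessarily the exact ones used in this repo):

```python
import tensorflow as tf
from tensorflow.keras import layers

def identity_block(x, filters):
    """Residual block with an identity shortcut: input and output
    dimensions match, so x can be added to the block output directly."""
    f1, f2, f3 = filters
    shortcut = x

    x = layers.Conv2D(f1, 1)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)

    x = layers.Conv2D(f2, 3, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)

    x = layers.Conv2D(f3, 1)(x)
    x = layers.BatchNormalization()(x)

    # Add the skip connection, then apply the final non-linearity.
    x = layers.Add()([x, shortcut])
    return layers.Activation('relu')(x)

def convolution_block(x, filters, strides=2):
    """Same as the identity block, but with a Conv2D + BatchNorm on the
    skip path that resizes x so the addition becomes possible."""
    f1, f2, f3 = filters

    shortcut = layers.Conv2D(f3, 1, strides=strides)(x)
    shortcut = layers.BatchNormalization()(shortcut)

    x = layers.Conv2D(f1, 1, strides=strides)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)

    x = layers.Conv2D(f2, 3, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)

    x = layers.Conv2D(f3, 1)(x)
    x = layers.BatchNormalization()(x)

    x = layers.Add()([x, shortcut])
    return layers.Activation('relu')(x)
```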
For this project I'm using the .
ResNet is a family of deep neural network architectures introduced in 2015 by He et al. in the paper Deep Residual Learning for Image Recognition. The original paper discusses five different architectures: 18-, 34-, 50-, 101- and 152-layer neural networks. I am implementing the 50-layer variant, ResNet50.
Following the ResNet50 architecture described in the original paper, the implementation in this repo has the structure illustrated below:
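In code, that overall structure can be sketched roughly like this (a sketch reusing the identity_block and convolution_block functions from above; the [3, 4, 6, 3] stage layout and filter counts follow the paper, while names and defaults are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

def resnet50(input_shape=(224, 224, 3), num_classes=1000):
    """ResNet50 = stem + 4 stages of [3, 4, 6, 3] bottleneck blocks + head.
    Assumes identity_block / convolution_block as sketched earlier."""
    inputs = tf.keras.Input(shape=input_shape)

    # Stem: 7x7 conv with stride 2, then 3x3 max-pool with stride 2.
    x = layers.Conv2D(64, 7, strides=2, padding='same')(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.MaxPooling2D(3, strides=2, padding='same')(x)

    # Each stage starts with a convolution block (resizing shortcut),
    # followed by identity blocks. The first stage keeps stride 1
    # because the max-pool has already downsampled the input.
    for filters, blocks, strides in [
        ([64, 64, 256], 3, 1),
        ([128, 128, 512], 4, 2),
        ([256, 256, 1024], 6, 2),
        ([512, 512, 2048], 3, 2),
    ]:
        x = convolution_block(x, filters, strides=strides)
        for _ in range(blocks - 1):
            x = identity_block(x, filters)

    # Classification head.
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)
    return tf.keras.Model(inputs, outputs)
```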
The easiest way to see the difference in training duration is to open the notebook in this repository on Kaggle and follow the instructions for activating the GPU contained in the notebook. This is what I did in my case, as I don't have a dedicated GPU on my laptop.
To set up GPU support on a physical machine, follow the official TensorFlow GPU installation instructions.
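The CPU-versus-GPU comparison can also be reproduced locally by pinning the same training run to each device in turn. A minimal sketch, assuming both a CPU and a GPU are visible to TensorFlow, and using the stock Keras ResNet50 plus synthetic data purely for timing:

```python
import time
import tensorflow as tf

# Tiny synthetic dataset, just to time a few training steps.
x = tf.random.normal((64, 224, 224, 3))
y = tf.random.uniform((64,), maxval=10, dtype=tf.int32)

for device in ['/CPU:0', '/GPU:0']:
    with tf.device(device):
        # Fresh, untrained ResNet50 for each device.
        model = tf.keras.applications.ResNet50(weights=None, classes=10)
        model.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy')
        start = time.time()
        model.fit(x, y, batch_size=16, epochs=1, verbose=0)
        print(f'{device}: {time.time() - start:.1f}s per epoch')
```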
To process the data into square images of the pre-defined size (as per the model architecture definition), run the data-processing script from the src/data folder.

To train a model, run the training script; check its available parameters and select your preferred training options.

To make predictions using a pre-trained model, use the prediction script and choose the desired setting from its available options.
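For illustration, predicting with a saved Keras model generally looks like the sketch below (the model path, image path, and input size are hypothetical, not the prediction script's actual arguments):

```python
import numpy as np
import tensorflow as tf

# Hypothetical path; substitute the model file produced by training.
model = tf.keras.models.load_model('models/resnet50.h5')

# Load an image, resize it to the square input size the model expects,
# scale pixels to [0, 1], and add a batch dimension.
img = tf.keras.preprocessing.image.load_img('sample.jpg',
                                            target_size=(224, 224))
arr = tf.keras.preprocessing.image.img_to_array(img)[np.newaxis] / 255.0

pred = model.predict(arr)
print('Predicted class:', int(np.argmax(pred, axis=-1)[0]))
```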