ndarray slicing with index out of bounds
While training a GAN, I accidentally set the loop that iterates through batches of training data to stop at a value way beyond the length of the training set. More exactly, I did this:
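In essence, the loop was something like the following (a minimal sketch with placeholder data; the batch size of 50 real samples, the array shapes, and the `disc_model` comment are assumptions filled in from the numbers later in this post, only X_real and X_real_train come from the actual code):

```python
import numpy as np

# Placeholder stand-ins for the real data (shapes are illustrative only).
X_real = np.zeros((70000, 1), dtype=np.float32)   # full dataset: train + test
X_real_train = X_real[:59500]                     # training split only
n_batch = 50                                      # 59500 / 50 = 1190 batches per epoch

for i in range(len(X_real) // n_batch):           # BUG: should be len(X_real_train)
    # slice one batch of real samples (label 1) out of the training set
    X_real_batch = X_real_train[i * n_batch:(i + 1) * n_batch]
    # ... generate a batch of fake samples (label 0) and train disc_model on both
```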
Notice that instead of len(X_real_train) I accidentally wrote len(X_real).
During each batch iteration I extracted one slice from X_real_train and fed it to a classifier for training.
Training seemed to go well for the first few hundred batches. I had a sanity check after each batch to see how the model's metrics changed.
But since print slows down training on a GPU so much, I removed this sanity check and used a print every ten epochs (1400 batches) instead of after each batch.
And after the first epoch I noticed that the classifier (disc_model) would stabilize at *always* predicting 0 and never get out of that zone for the rest of the training.
I couldn't figure out why. I went over the architecture of the generator, of the discriminator, etc. When I finally got to the training loop, I saw the X_real mistake, and I couldn't figure out why I had not gotten an error.
It turns out this is how ndarray works: slicing doesn't raise an error if both the start and stop indices are larger than the sequence length. This is in contrast to simple indexing: when indexing an element that is out of bounds, Python throws an index-out-of-bounds error, whereas slicing simply returns an empty sequence.
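A quick illustration with a small NumPy array (not from the original code):

```python
import numpy as np

a = np.arange(5)        # array([0, 1, 2, 3, 4])

print(a[2:10])          # stop index past the end: fine -> [2 3 4]
print(a[10:20])         # start AND stop past the end: no error -> [] (empty array)

try:
    a[10]               # simple indexing past the end
except IndexError as e:
    print("IndexError:", e)
```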
Because my training data had 59500 samples instead of 70000 (train + test), my batch iteration was not stopping at 1190 but at 1400. From iteration 1190 onward, I was in index-out-of-bounds territory.
But since ndarray would not throw an error and instead returned an empty array, I didn't notice.
Which means that for iterations 1190 to 1400 I was training the classifier on an empty array of real data (label 1) and a batch of 100 fake samples (label 0).
And in those 210 iterations, my classifier learned to always predict 0. The next epoch, we would start over: it would learn to predict correctly, and then towards the end of the epoch there would again be 210 iterations of training on an empty array of real data (label 1) and a batch of 100 fake samples (label 0).
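Concretely, the first out-of-range iteration looked something like this (same placeholder data and assumed batch size of 50 real samples as in the sketch above):

```python
import numpy as np

X_real_train = np.zeros((59500, 1), dtype=np.float32)  # stand-in for the 59500 training samples
n_batch = 50                                           # assumed batch size of real samples

i = 1190                                               # first out-of-range iteration
X_real_batch = X_real_train[i * n_batch:(i + 1) * n_batch]
print(X_real_batch.shape)                              # (0, 1): the slice starts at 59500, past the end

# For the last 210 iterations of the epoch the discriminator saw 0 real samples
# (label 1) and only the 100 fake samples (label 0), so every label in its batch was 0.
```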