ndarray slicing with index out of bounds
While training a GAN, I accidentally set the loop that iterates through batches of training data to stop at a value way beyond the length of the training set. More exactly, I did this:
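In essence, the loop was something like the following (a minimal sketch with placeholder data; the batch size of 50 real samples, the array shapes, and the `disc_model` comment are assumptions filled in from the numbers later in this post, only X_real and X_real_train come from the actual code):

```python
import numpy as np

# Placeholder stand-ins for the real data (shapes are illustrative only).
X_real = np.zeros((70000, 1), dtype=np.float32)   # full dataset: train + test
X_real_train = X_real[:59500]                     # training split only
n_batch = 50                                      # 59500 / 50 = 1190 batches per epoch

for i in range(len(X_real) // n_batch):           # BUG: should be len(X_real_train)
    # slice one batch of real samples (label 1) out of the training set
    X_real_batch = X_real_train[i * n_batch:(i + 1) * n_batch]
    # ... generate a batch of fake samples (label 0) and train disc_model on both
```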
Notice that instead of len(X_real_train) I accidentally wrote len(X_real).
During each batch iteration I extracted one slice from X_real_train and fed it to a classifier for training.
Training seemed to go well for the first few hundred batches. I had a sanity check after each batch to see how the model's metrics changed.
But since print slows down training on a GPU so much, I removed this sanity check and used a print every ten epochs (1400 batches) instead of after each batch.
And after the first epoch I noticed that the classifier (disc_model) would stabilize at *always* predicting 0 and never get out of that zone for the rest of the training.
I couldn't figure out why. I went over the architecture of the generator, of the discriminator, etc. When I finally got to the training loop, I saw the X_real mistake, and I couldn't figure out why I had not gotten an error.
It turns out this is how ndarray works: slicing doesn't raise an error if both the start and stop indices are larger than the sequence length. This is in contrast to simple indexing: when indexing an element that is out of bounds, Python throws an index-out-of-bounds error, whereas slicing simply returns an empty sequence.
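A quick illustration with a small NumPy array (not from the original code):

```python
import numpy as np

a = np.arange(5)        # array([0, 1, 2, 3, 4])

print(a[2:10])          # stop index past the end: fine -> [2 3 4]
print(a[10:20])         # start AND stop past the end: no error -> [] (empty array)

try:
    a[10]               # simple indexing past the end
except IndexError as e:
    print("IndexError:", e)
```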
Because my training data had 59500 samples instead of 70000 (train + test), my batch iteration was not stopping at 1190 but at 1400. From iteration 1190 onward, I was in index-out-of-bounds territory.
But since ndarray would not throw an error and instead returned an empty array, I didn't notice.
Which means that for iterations 1190 to 1400 I was training the classifier on an empty array of real data (label 1) and a batch of 100 fake samples (label 0).
And in those 210 iterations, my classifier learned to always predict 0. The next epoch, we would start over: it would learn to predict correctly, and then towards the end of the epoch there would again be 210 iterations of training on an empty array of real data (label 1) and a batch of 100 fake samples (label 0).
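Concretely, the first out-of-range iteration looked something like this (same placeholder data and assumed batch size of 50 real samples as in the sketch above):

```python
import numpy as np

X_real_train = np.zeros((59500, 1), dtype=np.float32)  # stand-in for the 59500 training samples
n_batch = 50                                           # assumed batch size of real samples

i = 1190                                               # first out-of-range iteration
X_real_batch = X_real_train[i * n_batch:(i + 1) * n_batch]
print(X_real_batch.shape)                              # (0, 1): the slice starts at 59500, past the end

# For the last 210 iterations of the epoch the discriminator saw 0 real samples
# (label 1) and only the 100 fake samples (label 0), so every label in its batch was 0.
```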