How does the shuffle parameter in DataLoader work? Is it related to “cross validation”?

Does only the “training data” get shuffled before every epoch while the validation data remains the same for each epoch?

Or does it get shuffled together with the “validation data”?

And the other question is: if shuffle=True is not cross validation, how could I do cross validation (dividing the data into folds and rotating which fold is used for validation) instead of using the regular method?

Thanks in advance!

Hi @100375195, it is best practice to shuffle the training dataset, because randomizing the order of the data can help the model converge. Some datasets may be ordered by class, etc., which can benefit from randomizing the order of the data samples.

The validation dataset doesn’t need to be shuffled because the order of its samples has no impact on the model. It doesn’t get shuffled along with the training dataset (unless you explicitly set shuffle=True on the validation DataLoader).
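A minimal sketch of this setup, using a small hypothetical TensorDataset in place of your real data:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy dataset: 10 samples with 2 features each
dataset = TensorDataset(torch.arange(20, dtype=torch.float32).reshape(10, 2))

# Training loader: the sample order is re-randomized at the start of every epoch
train_loader = DataLoader(dataset, batch_size=5, shuffle=True)

# Validation loader: shuffle defaults to False, so the order is fixed across epochs
val_loader = DataLoader(dataset, batch_size=5, shuffle=False)

# Iterating the validation loader twice yields samples in the same order
first_epoch = torch.cat([batch[0] for batch in val_loader])
second_epoch = torch.cat([batch[0] for batch in val_loader])
assert torch.equal(first_epoch, second_epoch)
```

Each fresh iteration over train_loader draws a new random permutation, while val_loader always walks the dataset in index order.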

I don’t believe PyTorch natively implements k-fold cross validation, so you may want to refer to these resources instead:
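One common pattern is to build the folds yourself and wrap each split with torch.utils.data.Subset. Here is a minimal sketch (the toy dataset and fold count are assumptions for illustration; in practice you would train a fresh model per fold):

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Hypothetical toy dataset of 10 samples
dataset = TensorDataset(torch.arange(10, dtype=torch.float32))

k = 5
indices = torch.randperm(len(dataset))      # shuffle once before splitting
folds = [indices[i::k] for i in range(k)]   # k roughly equal folds

for fold in range(k):
    # Current fold is the validation set; the rest form the training set
    val_idx = folds[fold].tolist()
    train_idx = torch.cat([folds[i] for i in range(k) if i != fold]).tolist()

    train_loader = DataLoader(Subset(dataset, train_idx),
                              batch_size=2, shuffle=True)
    val_loader = DataLoader(Subset(dataset, val_idx),
                            batch_size=2, shuffle=False)
    # ... train a fresh model on train_loader, then evaluate on val_loader ...
```

If you already use scikit-learn, its KFold class produces the same (train_idx, val_idx) splits and can be combined with Subset in the same way.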

Excellent

Thanks a lot for your answer!
