I have split my dataset into k equally sized folds for cross-validation, but I want to perform additional sampling on the training set within each fold, and this may make the folds differ in size.

Concretely: suppose I have an imbalanced dataset that I split into k equally sized folds. Within each fold, I apply a resampling technique such as SMOTE, ADASYN, or random under-sampling to the training folds only (see the sketch below). This changes the size of each training set, so I can end up with folds of different sizes and different train/test proportions.

Would this be acceptable, or would it break the premises of traditional k-fold cross-validation? Has any research been done on the matter? Thank you for your insights.
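For concreteness, here is a minimal sketch of the procedure I am describing, assuming the scikit-learn and imbalanced-learn libraries; the synthetic dataset and the logistic-regression model are placeholders for illustration, not part of my actual setup:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score
    from sklearn.model_selection import StratifiedKFold
    from imblearn.over_sampling import SMOTE

    # Synthetic imbalanced data (roughly 90% / 10% classes) for illustration.
    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        X_train, y_train = X[train_idx], y[train_idx]
        X_test, y_test = X[test_idx], y[test_idx]

        # Resample the training fold only; the test fold keeps its
        # original size and class proportions. After SMOTE, the
        # training sets can differ in size across folds, which is
        # exactly the situation the question asks about.
        X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)

        model = LogisticRegression(max_iter=1000).fit(X_res, y_res)
        scores.append(f1_score(y_test, model.predict(X_test)))

    print(f"Mean F1 across folds: {np.mean(scores):.3f}")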
Fernando Rubio Garcia
asked Sep 20, 2023 at 14:06
Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking.
Commented Sep 20, 2023 at 14:12