I have split my dataset into k equally sized folds for cross-validation, but I want to perform additional sampling on the training set within each fold, and this may make the folds differ in size.

Concretely: suppose I have an imbalanced dataset that I split into k equally sized folds. Within each fold, I apply a resampling technique such as SMOTE, ADASYN, or random under-sampling to the training folds only (see the sketch below). This changes the size of each training set, so I can end up with folds of different sizes and different train/test proportions.

Would this be acceptable, or would it break the premises of traditional k-fold cross-validation? Has any research been done on the matter? Thank you for your insights.
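For concreteness, here is a minimal sketch of the procedure I am describing, assuming the scikit-learn and imbalanced-learn libraries; the synthetic dataset and the logistic-regression model are placeholders for illustration, not part of my actual setup:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score
    from sklearn.model_selection import StratifiedKFold
    from imblearn.over_sampling import SMOTE

    # Synthetic imbalanced data (roughly 90% / 10% classes) for illustration.
    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        X_train, y_train = X[train_idx], y[train_idx]
        X_test, y_test = X[test_idx], y[test_idx]

        # Resample the training fold only; the test fold keeps its
        # original size and class proportions. After SMOTE, the
        # training sets can differ in size across folds, which is
        # exactly the situation the question asks about.
        X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)

        model = LogisticRegression(max_iter=1000).fit(X_res, y_res)
        scores.append(f1_score(y_test, model.predict(X_test)))

    print(f"Mean F1 across folds: {np.mean(scores):.3f}")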
Fernando Rubio Garcia
asked Sep 20, 2023 at 14:06
Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking.
Commented Sep 20, 2023 at 14:12