Cross-validation (CV) is a standard technique used across science to test how well a model predicts new data. Data are split into $K$ ``folds,'' one of which (the hold-out set) is used to evaluate the model's predictive ability; in standard $K$-fold CV, each fold takes a turn as the hold-out set. Researchers typically rely on convention when choosing the hold-out size, commonly an $80/20$ split or $K=5$, even though this choice can affect inference and model evaluation. In principle, the split should be determined by balancing predictive accuracy (bias) against the uncertainty of that accuracy estimate (variance), a tradeoff governed by the size of the hold-out set: more training data yield more accurate models, but fewer test data make the evaluation more uncertain, and vice versa. The challenge is that this evaluation uncertainty cannot be identified directly from the data without strong assumptions. We propose a procedure to determine the optimal hold-out size by deriving a finite-sample exact expression and an upper bound on the evaluation uncertainty, depending on the error assumption, and by adopting a utility-based approach that makes the tradeoff explicit. Analyses of real-world datasets using linear regression and random forests demonstrate the procedure in practice, providing insight into implicit assumptions, robustness, and model performance. Critically, the results show that the optimal hold-out size depends on both the data and the model, and that conventional choices implicitly make assumptions about the fundamental characteristics of the data. Our framework makes these assumptions explicit and provides a principled, transparent way to select the split based on the data and model rather than convention. By replacing a one-size-fits-all choice with context-specific reasoning, it enables more reliable comparisons of predictive performance across scientific domains.
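The bias-variance tradeoff described above can be illustrated empirically. The following sketch is purely illustrative and is not the paper's derivation: it fits ordinary least squares on synthetic data for several hold-out fractions and reports the hold-out mean squared error alongside the standard error of that estimate, under the assumption of i.i.d. Gaussian noise. All names and parameters (`holdout_eval`, the sample size, the noise scale) are hypothetical choices for this example.

```python
# Illustrative sketch (not the paper's procedure): how hold-out size trades
# off accuracy of the fitted model against uncertainty of its evaluation.
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 5
X = rng.normal(size=(n, p))
y = X @ np.ones(p) + rng.normal(scale=1.0, size=n)  # true linear signal + noise

def holdout_eval(frac):
    """Fit OLS on a (1 - frac) training split; return the hold-out MSE
    and the standard error of that MSE estimate."""
    n_test = int(n * frac)
    X_tr, X_te = X[n_test:], X[:n_test]
    y_tr, y_te = y[n_test:], y[:n_test]
    b, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    sq_err = (y_te - X_te @ b) ** 2
    return sq_err.mean(), sq_err.std(ddof=1) / np.sqrt(n_test)

for frac in (0.1, 0.2, 0.5):
    mse, se = holdout_eval(frac)
    print(f"hold-out {frac:.0%}: MSE={mse:.3f}, SE={se:.3f}")
```

A larger hold-out set shrinks the standard error of the evaluation but leaves less data for training; the utility-based procedure in the paper makes this exchange explicit rather than fixing the fraction by convention.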