From Train-Test to Cross-Validation: Advancing Your Model’s Evaluation

Many beginners rely on the train-test split to evaluate their models. It is straightforward and seems to give a clear indication of how well a model performs on unseen data. However, this approach can often lead to an incomplete understanding of a model’s capabilities. In this post, we’ll discuss why it’s important to go beyond the basic train-test split and how cross-validation can offer a more thorough evaluation of model performance. Join us as we guide you through the essential steps to achieve a deeper and more accurate assessment of your machine learning models.

Let’s get started.

Photo by Belinda Fewings. Some rights reserved.

Overview

This post is divided into three parts; they are:

  • Model Evaluation: Train-Test vs. Cross-Validation
  • The “Why” of Cross-Validation
  • Delving Deeper with K-Fold Cross-Validation

Model Evaluation: Train-Test vs. Cross-Validation

A machine learning model is determined by its design (such as linear vs. non-linear) and its parameters (such as the coefficients in a linear regression model). Before considering how to fit the model, you need to make sure its design is suitable for the data.

The performance of a machine learning model is gauged by how well it performs on previously unseen (or test) data. In a standard train-test split, we divide the dataset into two parts: a larger portion for training the model and a smaller portion for testing its performance. If the performance on the test portion is acceptable, the model is deemed suitable. This approach is straightforward but doesn’t always make the most effective use of our data.

Cross-validation goes a step further. In 5-fold cross-validation, the dataset is split into five “folds.” In each round of validation, a different fold is used as the test set while the remaining four form the training set. This process is repeated five times, ensuring that every data point is used for both training and testing.

Here is an example to illustrate the above:

# Load the Ames dataset
import pandas as pd
Ames = pd.read_csv('Ames.csv')

# Import Linear Regression, Train-Test, Cross-Validation from scikit-learn
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score

# Select features and target
X = Ames[['GrLivArea']]  # Feature: GrLivArea, a 2D matrix
y = Ames['SalePrice']    # Target: SalePrice, a 1D vector

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Linear Regression model using Train-Test
model = LinearRegression()
model.fit(X_train, y_train)
train_test_score = round(model.score(X_test, y_test), 4)
print(f"Train-Test R^2 Score: {train_test_score}")

# Perform 5-Fold Cross-Validation
cv_scores = cross_val_score(model, X, y, cv=5)
cv_scores_rounded = [round(score, 4) for score in cv_scores]
print(f"Cross-Validation R^2 Scores: {cv_scores_rounded}")

While the train-test method yields a single R² score, cross-validation provides us with a spectrum of five different R² scores, one from each fold of the data, offering a more comprehensive view of the model’s performance:

Train-Test R^2 Score: 0.4789
Cross-Validation R^2 Scores: [0.4884, 0.5412, 0.5214, 0.5454, 0.4673]

The roughly equal R² scores across the five folds indicate that the model is stable. You can then decide whether this model (i.e., linear regression) provides acceptable predictive power.
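
A quick way to quantify this stability is to summarize the five scores with their mean and standard deviation. Here is a minimal sketch, assuming cv_scores from the block above is still in scope:

# Summarize the spread of the cross-validation scores
import numpy as np
print(f"Mean R^2: {round(np.mean(cv_scores), 4)}")
print(f"Std of R^2: {round(np.std(cv_scores), 4)}")

A small standard deviation relative to the mean suggests the model’s performance does not depend heavily on which slice of the data it happens to be tested on.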

The “Why” of Cross-Validation

Understanding the variability of our model’s performance across different subsets of data is crucial in machine learning. The train-test split method, while useful, only gives us a snapshot of how our model might perform on one particular set of unseen data.

Cross-validation, by systematically using multiple folds of data for both training and testing, offers a more robust and comprehensive evaluation of the model’s performance. Each fold acts as an independent test, providing insights into how the model is expected to perform across varied data samples. This multiplicity not only helps identify potential overfitting but also ensures that the performance metric (in this case, R² score) is not overly optimistic or pessimistic, but rather a more reliable indicator of how the model will generalize to unseen data.
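
One way to make this concrete is to rerun the train-test evaluation with a few different random seeds and watch the single score move around. This is a minimal sketch, assuming X, y, model, and train_test_split from the first example are still in scope; the seed values are arbitrary, and the exact scores will differ from split to split:

# Repeat the train-test evaluation with different random splits
for seed in [0, 1, 42, 100, 2024]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    model.fit(X_tr, y_tr)
    print(f"Seed {seed}: R^2 = {round(model.score(X_te, y_te), 4)}")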

To visually demonstrate this, let’s consider the R² scores from both a train-test split and a 5-fold cross-validation process:

# Import Seaborn and Matplotlib
import seaborn as sns
import matplotlib.pyplot as plt

# Assuming cv_scores_rounded contains your cross-validation scores
# and train_test_score is your single train-test R^2 score

# Plot the box plot for cross-validation scores
cv_scores_df = pd.DataFrame(cv_scores_rounded, columns=['Cross-Validation Scores'])
sns.boxplot(data=cv_scores_df, y='Cross-Validation Scores', width=0.3, color='lightblue', fliersize=0)

# Overlay individual scores as points
plt.scatter([0] * len(cv_scores_rounded), cv_scores_rounded, color='blue', label='Cross-Validation Scores')
plt.scatter(0, train_test_score, color='red', zorder=5, label='Train-Test Score')

# Label the visual
plt.title('Model Evaluation: Cross-Validation vs. Train-Test')
plt.ylabel('R^2 Score')
plt.xticks([0], ['Evaluation Scores'])
plt.legend(loc='lower left', bbox_to_anchor=(0, 0.1))
plt.show()

This visualization underscores the difference in insights gained from a single train-test evaluation versus the broader perspective offered by cross-validation.

Through cross-validation, we gain a deeper understanding of our model’s performance, moving us closer to developing machine learning solutions that are both effective and reliable.

Delving Deeper with K-Fold Cross-Validation

Cross-validation is a cornerstone of reliable machine learning model evaluation, with cross_val_score() providing a quick and automated way to perform this task. Now, we turn our attention to the KFold class, a component of scikit-learn that offers a deeper dive into the folds of cross-validation. The KFold class provides not just a score but a window into the model’s performance across different segments of our data. We demonstrate this by replicating the example above:

# Import K-Fold and necessary libraries
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Select features and target
X = Ames[['GrLivArea']].values  # Convert to numpy array for KFold
y = Ames['SalePrice'].values    # Convert to numpy array for KFold

# Initialize Linear Regression and K-Fold
model = LinearRegression()
kf = KFold(n_splits=5)

# Manually perform K-Fold Cross-Validation
for fold, (train_index, test_index) in enumerate(kf.split(X), start=1):
    # Split the data into training and testing sets
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Fit the model and predict
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # Calculate and print the R^2 score for the current fold
    print(f"Fold {fold}:")
    print(f"TRAIN set size: {len(train_index)}")
    print(f"TEST set size: {len(test_index)}")
    print(f"R^2 score: {round(r2_score(y_test, y_pred), 4)}\n")

This code block will show us the size of each training and testing set and the corresponding R² score for each fold:

Fold 1:
TRAIN set size: 2063
TEST set size: 516
R^2 score: 0.4884

Fold 2:
TRAIN set size: 2063
TEST set size: 516
R^2 score: 0.5412

Fold 3:
TRAIN set size: 2063
TEST set size: 516
R^2 score: 0.5214

Fold 4:
TRAIN set size: 2063
TEST set size: 516
R^2 score: 0.5454

Fold 5:
TRAIN set size: 2064
TEST set size: 515
R^2 score: 0.4673

The KFold class shines in its transparency and control over the cross-validation process. While cross_val_score() simplifies the process into one line, KFold opens it up, allowing us to view the exact splits of our data. This is incredibly valuable when you need to:

  • Understand how your data is being divided.
  • Implement custom preprocessing before each fold.
  • Gain insights into the consistency of your model’s performance.

By using the KFold class, you can manually iterate over each split and apply the model training and testing process. This not only helps in ensuring that you’re fully informed about the data being used at each stage but also offers the opportunity to modify the process to suit complex needs.
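
As an illustration of the second point above, here is a minimal sketch of custom per-fold preprocessing: a StandardScaler is fit on the training fold only and then applied to the test fold, so no information from the test data leaks into the preprocessing step. It assumes X, y, model, and kf from the previous block are still in scope:

# Per-fold preprocessing: fit the scaler on the training fold only
from sklearn.preprocessing import StandardScaler

for fold, (train_index, test_index) in enumerate(kf.split(X), start=1):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # The scaler never sees the test fold during fitting
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    model.fit(X_train_scaled, y_train)
    print(f"Fold {fold} R^2 score: {round(model.score(X_test_scaled, y_test), 4)}")

With a single feature and plain linear regression, scaling leaves the R² scores unchanged; the value of this pattern shows up with scale-sensitive models and richer preprocessing pipelines.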

Further Reading

  • Ames Housing Dataset & Data Dictionary

Summary

In this post, we explored the importance of thorough model evaluation through cross-validation and the KFold method. Both techniques meticulously avoid the pitfall of data leakage by keeping training and testing data distinct, thereby ensuring the model’s performance is accurately measured. Moreover, by validating each data point exactly once and using it for training K-1 times, these methods provide a detailed view of the model’s ability to generalize, boosting confidence in its real-world applicability. Through practical examples, we’ve demonstrated how integrating these strategies into your evaluation process leads to more reliable and robust machine learning models, ready for the challenges of new and unseen data.
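
To make the “validated exactly once” claim concrete, you can collect the test indices from every fold and check that they cover the dataset with no overlaps. A small sketch, assuming X and kf from the K-Fold example above:

# Gather the test indices from all five folds
import numpy as np
all_test_indices = np.concatenate([test_idx for _, test_idx in kf.split(X)])

# Each row should appear exactly once across the test folds
assert len(all_test_indices) == len(X)
assert len(np.unique(all_test_indices)) == len(X)
print("Every data point is tested exactly once and trained on K-1 times.")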

Specifically, you learned:

  • The efficiency of cross_val_score() in automating the cross-validation process.
  • How KFold offers detailed control over data splits for tailored model evaluation.
  • How both methods ensure full data utilization and prevent data leakage.

Do you have any questions? Please ask your questions in the comments below, and I will do my best to answer.

Source: machinelearningmastery.com
