
Tips for Tuning Hyperparameters in Machine Learning Models


If you’re familiar with machine learning, you know that the training process allows the model to learn the optimal values for the parameters—or model coefficients—that characterize it. But machine learning models also have a set of hyperparameters whose values you must set before training begins, because they are not learned from the data. So how do you find the optimal values for these hyperparameters?

You can use hyperparameter tuning to find the best values for the hyperparameters. By systematically adjusting hyperparameters, you can optimize your models to achieve the best possible results.

This tutorial provides practical tips for effective hyperparameter tuning—starting from building a baseline model to using advanced techniques like Bayesian optimization. Whether you’re new to hyperparameter tuning or looking to refine your approach, these tips will help you build better machine learning models. Let’s get started.

1. Start Simple: Train a Baseline Model Without Any Tuning

When beginning the process of hyperparameter tuning, it’s good to start simple by training a baseline model without any tuning. This initial model serves as a reference point to measure the impact of subsequent tuning efforts.

Here’s why this step is essential and how to execute it effectively:

  • Establish a benchmark: A baseline model provides a reference point against which you can compare tuned models. This helps in quantifying the improvements achieved through hyperparameter tuning.
  • Select a default model: Choose a model that fits the problem at hand. For example, a decision tree for a classification problem or a linear regression for a regression problem.
  • Use default hyperparameters: Train the model using the default hyperparameters provided by the library. For instance, if using scikit-learn, instantiate the model without specifying any parameters.

Assess the performance of the baseline model using appropriate metrics. This step involves splitting the data into training and testing sets, training the model, making predictions, and evaluating the results:

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

# Load data
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=25)

# Initialize model with default parameters
model = DecisionTreeClassifier()

# Train model
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
baseline_accuracy = accuracy_score(y_test, y_pred)
print(f'Baseline Accuracy: {baseline_accuracy:.2f}')

Document the performance metrics of the baseline model. This will be useful for comparison as you proceed with hyperparameter tuning.
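
One lightweight way to keep that record (a minimal sketch of my own, using only a plain Python dictionary; the key names are illustrative, not from the article) is to store every experiment’s score alongside the baseline so later tuned models can be compared at a glance:

# Hypothetical experiment log; the dictionary keys are illustrative
results = {}
results['baseline_decision_tree'] = baseline_accuracy  # score from the snippet above

# After each tuning run, add its best score, e.g. results['grid_search'] = best_score_grid
for name, score in results.items():
    print(f'{name}: {score:.2f}')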

2. Use Hyperparameter Search with Cross-Validation

Once you have established a baseline model, the next step is to optimize the model’s performance through hyperparameter tuning. Utilizing hyperparameter search techniques with cross-validation is a robust approach to finding the best set of hyperparameters.

Why use hyperparameter search with cross-validation?

  • Cross-validation provides a more reliable estimate of model performance by averaging results across multiple folds, reducing the risk of overfitting to a particular train-test split.
  • Hyperparameter search methods like Grid Search and Random Search allow for systematic exploration of the hyperparameter space, ensuring a thorough evaluation of potential configurations.
  • This method helps in selecting hyperparameters that generalize well to unseen data, leading to better model performance in production.

Choose a search technique: Select a hyperparameter search method. The two most common strategies are:

  • Grid search, which performs an exhaustive search over a parameter grid
  • Randomized search, which samples parameter values at random from specified lists or distributions

Define hyperparameter grid: Specify the hyperparameters and their respective ranges or distributions to search over.

from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

# Load data
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=25)

# Initialize model
model = DecisionTreeClassifier()

# Define hyperparameter grid for Grid Search
param_grid = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10]
}

Use cross-validation: You don’t need to define a cross-validation strategy separately. GridSearchCV runs k-fold cross-validation internally; pass the cv argument to set the number of folds, and each hyperparameter combination is scored by averaging across those folds.

# Grid Search with 5-fold cross-validation
grid_search = GridSearchCV(model, param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

best_params_grid = grid_search.best_params_
best_score_grid = grid_search.best_score_
print(f'Best Parameters (Grid Search): {best_params_grid}')
print(f'Best Cross-Validation Score (Grid Search): {best_score_grid:.2f}')

Using hyperparameter tuning with cross-validation this way ensures more reliable performance estimates and improved model generalization.
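
As an optional sanity check (a minimal sketch, not part of the original snippet), you can re-score the best estimator found by the search with cross_val_score on the training data and confirm it agrees with the reported cross-validation score:

from sklearn.model_selection import cross_val_score
import numpy as np

# Re-evaluate the winning configuration with the same 5-fold scheme
cv_scores = cross_val_score(grid_search.best_estimator_, X_train, y_train, cv=5, scoring='accuracy')
print(f'Mean CV accuracy of best estimator: {np.mean(cv_scores):.2f}')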

3. Use Randomized Search for Initial Exploration

When starting hyperparameter tuning, it’s often beneficial to use randomized search for initial exploration. Randomized search provides a more efficient way to explore a wide range of hyperparameters compared to grid search, especially when dealing with high-dimensional hyperparameter spaces.

Define hyperparameter distribution: Specify the hyperparameters and their respective distributions from which to sample.

from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
import numpy as np

# Load data
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

# Initialize model
model = DecisionTreeClassifier()

# Define hyperparameter distribution for Random Search
param_dist = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [None] + list(range(10, 31)),
    'min_samples_split': range(2, 11),
    'min_samples_leaf': range(1, 11)
}

Set up randomized search with cross-validation: Use randomized search with cross-validation to explore the hyperparameter space.

# Random Search
random_search = RandomizedSearchCV(model, param_dist, n_iter=100, cv=5, scoring='accuracy')
random_search.fit(X_train, y_train)

best_params_random = random_search.best_params_
best_score_random = random_search.best_score_
print(f'Best Parameters (Random Search): {best_params_random}')
print(f'Best Cross-Validation Score (Random Search): {best_score_random:.2f}')

Evaluate the model: Train the model using the best hyperparameters and evaluate its performance on the test set.

from sklearn.metrics import accuracy_score

best_model = DecisionTreeClassifier(**best_params_random)
best_model.fit(X_train, y_train)

y_pred = best_model.predict(X_test)
final_accuracy = accuracy_score(y_test, y_pred)
print(f'Final Model Accuracy: {final_accuracy:.2f}')

Randomized search is, therefore, better suited for high-dimensional hyperparameter spaces and computationally expensive models.
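
Randomized search is especially convenient when hyperparameters are drawn from wide integer or continuous ranges. As a minimal sketch (assuming scipy is installed and reusing the model and training data from above; the ranges are illustrative, not from the original article), you can pass scipy.stats distributions instead of fixed lists:

from scipy.stats import randint

# Sample integer hyperparameters from distributions rather than fixed lists
param_dist_wide = {
    'max_depth': randint(2, 31),          # uniform integers in [2, 30]
    'min_samples_split': randint(2, 21),  # uniform integers in [2, 20]
    'min_samples_leaf': randint(1, 21)    # uniform integers in [1, 20]
}

random_search_wide = RandomizedSearchCV(model, param_dist_wide, n_iter=50, cv=5, scoring='accuracy')
random_search_wide.fit(X_train, y_train)
print(random_search_wide.best_params_)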

4. Monitor Overfitting with Validation Curves

Validation curves help visualize the effect of a hyperparameter on the training and validation performance, allowing you to identify overfitting or underfitting.

Here’s an example. This code snippet evaluates how the performance of a Random Forest classifier varies with different values of the n_estimators hyperparameter using validation curves. It does this by calculating training and cross-validation scores for a range of n_estimators values (10, 100, 200, 400, 800, 1000) across 5-fold cross-validation.

from sklearn.model_selection import validation_curve
from sklearn.ensemble import RandomForestClassifier
import matplotlib.pyplot as plt
import numpy as np

# Define hyperparameter range
param_range = [10, 100, 200, 400, 800, 1000]

# Calculate validation curve
train_scores, test_scores = validation_curve(
    RandomForestClassifier(), X_train, y_train,
    param_name="n_estimators", param_range=param_range,
    cv=5, scoring="accuracy")

# Calculate mean and standard deviation
train_mean = np.mean(train_scores, axis=1)
train_std = np.std(train_scores, axis=1)
test_mean = np.mean(test_scores, axis=1)
test_std = np.std(test_scores, axis=1)

It then plots the mean accuracy scores along with their standard deviations for both training and cross-validation sets. The resulting plot helps to visualize whether the model is overfitting or underfitting at different values of n_estimators.

# Plot validation curve
plt.plot(param_range, train_mean, label="Training score", color="r")
plt.fill_between(param_range, train_mean - train_std, train_mean + train_std, color="r", alpha=0.3)
plt.plot(param_range, test_mean, label="Cross-validation score", color="g")
plt.fill_between(param_range, test_mean - test_std, test_mean + test_std, color="g", alpha=0.3)
plt.title("Validation Curve with Random Forest")
plt.xlabel("Number of Estimators")
plt.ylabel("Accuracy")
plt.legend(loc="best")
plt.show()
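
Beyond eyeballing the plot, you can also read off a reasonable value programmatically. The short sketch below builds on the arrays computed above (the tie-breaking rule of simply taking the highest mean cross-validation score is my own illustrative choice, not from the original article):

# Pick the n_estimators value with the highest mean cross-validation score
best_idx = int(np.argmax(test_mean))
print(f'Best n_estimators by CV score: {param_range[best_idx]} '
      f'(CV accuracy {test_mean[best_idx]:.2f}, train accuracy {train_mean[best_idx]:.2f})')

# A large gap between training and cross-validation accuracy signals overfitting
gap = train_mean[best_idx] - test_mean[best_idx]
print(f'Train/CV gap at that value: {gap:.2f}')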

5. Use Bayesian Optimization for Efficient Search

Using Bayesian optimization for hyperparameter tuning is a highly efficient and effective approach. It uses probabilistic modeling to explore the hyperparameter space—requiring fewer evaluations and computational resources.

You’ll need libraries like scikit-optimize or hyperopt to perform Bayesian optimization. Here, we’ll use scikit-optimize:

!pip install scikit-optimize

Define the hyperparameter space: Specify the hyperparameters and their respective ranges to search over.

from skopt import BayesSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load data
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=25)

# Initialize model
model = DecisionTreeClassifier()

# Define hyperparameter space for Bayesian Optimization
param_space = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [None] + list(range(10, 31)),
    'min_samples_split': (2, 10),
    'min_samples_leaf': (1, 10)
}

Set up Bayesian optimization with cross-validation: Use Bayesian optimization with cross-validation to explore the hyperparameter space.

# Bayesian Optimization
opt = BayesSearchCV(model, param_space, n_iter=32, cv=5, scoring='accuracy')
opt.fit(X_train, y_train)

best_params_bayes = opt.best_params_
best_score_bayes = opt.best_score_
print(f'Best Parameters (Bayesian Optimization): {best_params_bayes}')
print(f'Best Cross-Validation Score (Bayesian Optimization): {best_score_bayes:.2f}')

Evaluate the model: Train a final model using the best hyperparameters found by Bayesian optimization and evaluate its performance on the test set.

best_model = DecisionTreeClassifier(**best_params_bayes)
best_model.fit(X_train, y_train)

y_pred = best_model.predict(X_test)
final_accuracy = accuracy_score(y_test, y_pred)
print(f'Final Model Accuracy: {final_accuracy:.2f}')
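
If you want finer control over how each hyperparameter is sampled, scikit-optimize also lets you declare the dimensions explicitly. The sketch below is an optional variation rather than part of the original article: it uses Integer and Categorical from skopt.space, and it restricts max_depth to integer values, so the None option from the earlier space is dropped:

from skopt.space import Categorical, Integer

# Explicit dimension objects instead of plain lists and tuples
param_space_explicit = {
    'criterion': Categorical(['gini', 'entropy']),
    'max_depth': Integer(2, 30),
    'min_samples_split': Integer(2, 10),
    'min_samples_leaf': Integer(1, 10)
}

opt_explicit = BayesSearchCV(model, param_space_explicit, n_iter=32, cv=5, scoring='accuracy')
opt_explicit.fit(X_train, y_train)
print(opt_explicit.best_params_)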

Summary

Effective hyperparameter tuning can make a substantial difference in the performance of your machine learning models.

By starting with a simple baseline model and progressively using search techniques, you can systematically explore and identify the best hyperparameters. From initial exploration with randomized search to efficient fine-tuning with Bayesian optimization, we went over practical tips to optimize your model’s hyperparameters.

So happy hyperparameter tuning!

Source: machinelearningmastery.com
