In the first two parts of this series, we explored the importance of time series forecasting and traditional methods like ARIMA and Exponential Smoothing. While these techniques excel when data is relatively limited, they often underperform when richer data is available. Machine learning (ML) methods offer a flexible, data-driven approach that can uncover intricate patterns and improve forecasting accuracy, provided there is enough data to learn from. In this third part, we’ll dive into the key machine learning techniques used for time series forecasting, explaining how each one works and when to use them.

Why Machine Learning?

Traditional time series models like ARIMA assume a linear relationship in the data and often require the data to be stationary. However, real-world data can be highly non-linear and influenced by external factors, making these assumptions restrictive. Machine learning models, on the other hand, don’t require the same strict assumptions and can model complex, non-linear relationships, making them particularly useful for more challenging forecasting tasks.

Key Machine Learning Techniques for Time Series Forecasting

Tree-based Models

  1. Decision Trees: Decision trees split the data into subsets based on feature values, creating a tree-like structure where each node represents a decision based on a feature. Although decision trees can model non-linear relationships, they tend to overfit the data, leading to poor generalization on new data.
  2. Random Forests: A random forest is an ensemble of decision trees, typically trained on different subsets of the data. By averaging the predictions of many trees, random forests reduce overfitting and improve accuracy. This approach is robust, can handle a variety of data types, and is effective for medium-sized datasets. However, random forests can still struggle with very large datasets or highly complex time series patterns.
  3. Gradient Boosting Machines (GBMs): GBMs, including popular implementations like XGBoost and LightGBM, build decision trees sequentially. Each new tree attempts to correct the errors of the previous ones, gradually improving the model’s accuracy. GBMs are powerful for time series forecasting because they can capture complex patterns and interactions between features. However, they require careful tuning of hyperparameters to avoid overfitting and can be computationally intensive.
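
To make this concrete, here is a minimal sketch of a gradient-boosted model forecasting one step ahead from lagged values of a synthetic series, using scikit-learn’s GradientBoostingRegressor. The number of lags and the hyperparameters are illustrative assumptions, not recommendations.

```python
# Minimal sketch: gradient boosting on lagged values of a synthetic series.
# The series, lag count, and hyperparameters are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(42)
n = 500
t = np.arange(n)
y = 10 + 0.05 * t + 5 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, n)
series = pd.Series(y)

# Build a supervised dataset from lagged observations.
n_lags = 12
df = pd.DataFrame({f"lag_{k}": series.shift(k) for k in range(1, n_lags + 1)})
df["target"] = series
df = df.dropna()

# Temporal split: train on the earlier part, test on the most recent part.
split = int(len(df) * 0.8)
X_train, y_train = df.drop(columns="target").iloc[:split], df["target"].iloc[:split]
X_test, y_test = df.drop(columns="target").iloc[split:], df["target"].iloc[split:]

model = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05, max_depth=3)
model.fit(X_train, y_train)
preds = model.predict(X_test)
print("Test MAE:", np.mean(np.abs(preds - y_test)))
```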

Support Vector Machines (SVMs)

Support Vector Machines are a type of supervised learning algorithm that can be used for both classification and regression tasks. In classification, SVMs find the hyperplane that best separates the data points into different categories; for time series forecasting, the regression variant, Support Vector Regression (SVR), instead fits a function that keeps most observations within a margin of tolerance. In both cases, kernel functions map the input data into a higher-dimensional space, which allows the model to capture complex, non-linear patterns that linear models might miss. While powerful, SVMs can be sensitive to the choice of hyperparameters and may not scale well with very large datasets.
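
As a rough illustration, the sketch below applies scikit-learn’s SVR to lagged values of a synthetic series. The kernel and hyperparameter values are assumptions chosen for brevity; scaling the inputs, as done here, generally matters for SVMs.

```python
# Minimal sketch: Support Vector Regression (SVR) on lagged features.
# The kernel and hyperparameters are illustrative guesses, not recommendations.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
y = np.sin(np.linspace(0, 40, 400)) + rng.normal(0, 0.1, 400)  # synthetic series

# Each row holds the previous n_lags values; the target is the next value.
n_lags = 10
X = np.column_stack([y[i:len(y) - n_lags + i] for i in range(n_lags)])
target = y[n_lags:]

split = int(len(target) * 0.8)
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.01))
model.fit(X[:split], target[:split])
preds = model.predict(X[split:])
print("Test MAE:", np.mean(np.abs(preds - target[split:])))
```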

Neural Networks

  • Feedforward Neural Networks (FNNs): FNNs are the most basic type of neural network, consisting of an input layer, one or more hidden layers, and an output layer. For time series forecasting, FNNs can be trained to predict future values by using lagged observations as inputs. However, they are limited in their ability to capture temporal dependencies, as they do not have a memory of past inputs beyond what is explicitly provided.
  • Recurrent Neural Networks (RNNs): RNNs are specifically designed to handle sequential data, making them well-suited for time series forecasting. Unlike FNNs, RNNs maintain a hidden state that captures information from previous time steps, allowing them to model temporal dependencies more effectively. Variants like Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRUs) further enhance this capability by addressing the problem of vanishing gradients, making them capable of learning long-term dependencies. These models are highly effective for complex, non-linear time series but require a significant amount of data and computational resources to train (a minimal LSTM sketch follows this list).
  • Convolutional Neural Networks (CNNs): Originally developed for image processing, CNNs have also been applied to time series forecasting. In this context, CNNs can be used to detect patterns across different time windows by applying convolutional filters. This makes them particularly useful for capturing local patterns in the data, such as short-term trends or seasonality. CNNs can be combined with RNNs or used in hybrid models to enhance forecasting accuracy.
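
For a flavor of the neural approach, here is a minimal LSTM sketch in Keras that predicts the next value from a fixed window of past values. It assumes TensorFlow is installed; the window length, layer size, and training settings are arbitrary illustrative choices.

```python
# Minimal sketch: a small LSTM predicting the next value from a window of past values.
# Assumes TensorFlow/Keras is installed; window size, layer size, and epochs are arbitrary.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(1)
y = np.sin(np.linspace(0, 60, 600)) + rng.normal(0, 0.1, 600)  # synthetic series

window = 24
X = np.array([y[i:i + window] for i in range(len(y) - window)])
target = y[window:]
X = X[..., np.newaxis]  # shape (samples, timesteps, features)

split = int(len(target) * 0.8)
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(window, 1)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X[:split], target[:split], epochs=5, batch_size=32, verbose=0)

preds = model.predict(X[split:], verbose=0).ravel()
print("Test MAE:", np.mean(np.abs(preds - target[split:])))
```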

Hybrid Models

Hybrid models combine different machine learning techniques to leverage the strengths of each approach. For example, a neural network might be used to model non-linear patterns in the data, while a linear regression model captures the trend component. Hybrid models can be particularly powerful in scenarios where a single model struggles to capture all aspects of the time series. However, they require careful design and tuning to ensure that the different components work well together.
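
As a simplified illustration of the idea, the sketch below fits a linear model to the trend and a random forest (standing in for the neural network mentioned above) to the residuals. The series and all settings are illustrative assumptions.

```python
# Minimal sketch of a two-stage hybrid: a linear model captures the trend,
# and a random forest (a simple stand-in for a neural network) models the
# non-linear residuals from their own lags.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)
n = 400
t = np.arange(n)
y = 0.1 * t + 4 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, n)

# Stage 1: fit the trend with a linear model on the time index.
trend_model = LinearRegression().fit(t.reshape(-1, 1), y)
residuals = y - trend_model.predict(t.reshape(-1, 1))

# Stage 2: model the residuals from their lagged values.
n_lags = 12
X = np.column_stack([residuals[i:len(residuals) - n_lags + i] for i in range(n_lags)])
r_target = residuals[n_lags:]
resid_model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, r_target)

# Combined one-step-ahead forecast = predicted trend + predicted residual.
next_trend = trend_model.predict(np.array([[n]]))[0]
next_resid = resid_model.predict(residuals[-n_lags:].reshape(1, -1))[0]
print("One-step-ahead forecast:", next_trend + next_resid)
```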

Feature Engineering for Time Series Forecasting

Feature engineering plays a critical role in machine learning for time series forecasting. By creating new input variables that capture the underlying structure and patterns in the data, you can significantly improve the performance of your models. Common feature engineering techniques include:

  • Lag Features: Previous time points (e.g., values from the previous day, week, or month) are used as predictors for future values.
  • Rolling Statistics: Calculating moving averages, standard deviations, or other statistics over different window sizes to capture trends and variability in the data.
  • Seasonal Indicators: Creating variables that capture seasonal patterns, such as the day of the week, month, or holidays.
  • External Variables: Incorporating exogenous variables, such as economic indicators, weather data, or other relevant factors that might influence the time series.
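
The pandas sketch below builds several of these features on a made-up daily sales series; the column names and window sizes are illustrative. Note that the rolling statistics are computed on shifted values so that each feature only uses information available before the point being predicted.

```python
# Minimal sketch of common time series features with pandas.
# `sales` is a made-up daily series; column names and windows are illustrative.
import numpy as np
import pandas as pd

idx = pd.date_range("2022-01-01", periods=365, freq="D")
rng = np.random.default_rng(3)
df = pd.DataFrame({"sales": 100 + rng.normal(0, 10, len(idx))}, index=idx)

# Lag features: previous day and previous week.
df["lag_1"] = df["sales"].shift(1)
df["lag_7"] = df["sales"].shift(7)

# Rolling statistics over a 7-day window, shifted to avoid leaking the target.
df["roll_mean_7"] = df["sales"].shift(1).rolling(7).mean()
df["roll_std_7"] = df["sales"].shift(1).rolling(7).std()

# Seasonal indicators from the calendar.
df["day_of_week"] = df.index.dayofweek
df["month"] = df.index.month

# External variables would be joined the same way, e.g. a weather or price
# column aligned on the same dates (not shown here).
df = df.dropna()
print(df.head())
```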

Testing

Testing a time series model is a crucial step in the forecasting process. It involves assessing how well the model performs on unseen data, ensuring that the model generalizes well beyond the data it was trained on. Unlike typical machine learning tasks where data can be randomly split into training and testing sets, time series data requires more careful handling due to its sequential nature. Here’s how you can effectively test time series models.

1. Train-Test Split

In time series forecasting, the data is split into training and testing sets based on time. The training set contains the earlier data points, which the model uses to learn patterns, while the testing set includes later data points to evaluate the model’s performance.

  • Fixed Training and Test Sets: A common approach is to reserve a portion of the time series as the test set, typically the most recent observations. For instance, if you have 5 years of daily data, you might use the first 4 years for training and the last year for testing.
  • Walk-Forward Validation (Rolling Window): To get a more robust estimate of model performance, you can use walk-forward validation, also known as time series cross-validation. In this approach, the model is trained on a rolling window of data and tested on the subsequent time step. This process is repeated as the window moves forward in time, allowing you to assess how the model performs over different periods.
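
Here is a minimal sketch of walk-forward validation with a fixed-size rolling window. A simple linear model on lagged values stands in for whatever model is actually being evaluated; the window length and lag count are arbitrary choices.

```python
# Minimal sketch of walk-forward validation with a fixed-size rolling window.
# The model, window length, and series are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
y = np.cumsum(rng.normal(0, 1, 300))  # synthetic random-walk series

n_lags, window = 5, 100
errors = []
for end in range(window, len(y) - 1):
    history = y[end - window:end + 1]  # rolling training window ending at time `end`
    X = np.column_stack([history[i:len(history) - n_lags + i] for i in range(n_lags)])
    target = history[n_lags:]
    model = LinearRegression().fit(X, target)
    next_pred = model.predict(history[-n_lags:].reshape(1, -1))[0]
    errors.append(abs(next_pred - y[end + 1]))  # test on the subsequent time step

print("Walk-forward MAE:", np.mean(errors))
```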

2. Backtesting

Backtesting involves simulating how the model would have performed in the past, using historical data to generate forecasts, and comparing them to actual outcomes. This technique is particularly valuable in financial forecasting, where you can assess how well a trading strategy or economic model would have performed based on historical market data.

  • Expanding Window Backtesting: Here, you start with an initial training period, make a forecast for the next time step, and then expand the training set to include that time step before making the next forecast. This method mimics how a model would be used in real-time, continually updating as new data becomes available (a minimal sketch follows this list).
  • Rolling Window Backtesting: Similar to walk-forward validation, rolling window backtesting uses a fixed-size window that moves forward in time. The model is trained on this rolling window and tested on the next time step, repeating the process across the time series.
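
Below is a minimal expanding-window backtest. A naive last-value forecast stands in for the model being backtested, and the initial training length is an arbitrary choice; swapping in a rolling window instead would give the second variant.

```python
# Minimal sketch of expanding-window backtesting: the training set grows by one
# observation after every forecast. The naive "last value" forecaster is a
# placeholder for whatever model is actually being backtested.
import numpy as np

rng = np.random.default_rng(11)
y = 50 + np.cumsum(rng.normal(0, 2, 200))  # synthetic series

initial_train = 100
forecasts, actuals = [], []
for end in range(initial_train, len(y)):
    train = y[:end]           # expanding training window
    forecast = train[-1]      # placeholder model: naive last-value forecast
    forecasts.append(forecast)
    actuals.append(y[end])

errors = np.abs(np.array(forecasts) - np.array(actuals))
print("Backtest MAE:", errors.mean())
```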

3. Out-of-Sample Testing

Out-of-sample testing evaluates the model on data that was not used during training. This helps to assess how well the model generalizes to new, unseen data. It is crucial for detecting overfitting, where a model performs well on the training data but poorly on new data.

  • Temporal Train-Test Split: For out-of-sample testing, ensure that the test set only contains data from periods not included in the training set. This ensures the model is truly tested on unseen data, reflecting its real-world performance.

4. Cross-Validation Techniques

While traditional cross-validation methods like k-fold cross-validation are not suitable for time series due to the importance of temporal order, modified cross-validation techniques can be used.

  • Time Series Cross-Validation: This technique involves splitting the time series into multiple folds while respecting the temporal order. Each fold represents a point in time, where the model is trained on data up to that point and validated on the next time step. This approach helps in understanding the model’s performance across different periods, providing a more comprehensive evaluation.
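
A minimal sketch using scikit-learn’s TimeSeriesSplit, which produces folds whose validation data always comes after the training data; the ridge model and lag count are illustrative stand-ins.

```python
# Minimal sketch of time series cross-validation with scikit-learn's TimeSeriesSplit.
# Each validation fold lies strictly after its training fold in time.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.linear_model import Ridge

rng = np.random.default_rng(8)
y = np.cumsum(rng.normal(0, 1, 300))  # synthetic series

# Lagged-feature design matrix (5 lags, illustrative choice).
n_lags = 5
X = np.column_stack([y[i:len(y) - n_lags + i] for i in range(n_lags)])
target = y[n_lags:]

tscv = TimeSeriesSplit(n_splits=5)
for fold, (train_idx, val_idx) in enumerate(tscv.split(X)):
    model = Ridge().fit(X[train_idx], target[train_idx])
    preds = model.predict(X[val_idx])
    mae = np.mean(np.abs(preds - target[val_idx]))
    print(f"Fold {fold}: train size {len(train_idx)}, MAE {mae:.3f}")
```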

5. Evaluation Metrics for Time Series Testing

When testing time series models, selecting appropriate evaluation metrics is critical. Common metrics include:

  • Mean Absolute Error (MAE): Measures the average magnitude of errors between the predicted and actual values, providing a straightforward measure of forecast accuracy.
  • Root Mean Squared Error (RMSE): Similar to MAE but gives more weight to larger errors, making it useful for scenarios where big misses are particularly costly.
  • Mean Absolute Percentage Error (MAPE): Expresses the forecast error as a percentage, allowing for easier comparison across different datasets or time series with different scales.
  • Mean Squared Logarithmic Error (MSLE): This metric is particularly useful when predicting values that vary over several orders of magnitude. It penalizes underestimations more than overestimations.
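
The short sketch below computes all four metrics for a handful of made-up forecasts, using scikit-learn where a ready-made function exists and plain NumPy otherwise.

```python
# Minimal sketch: computing MAE, RMSE, MAPE, and MSLE for illustrative numbers.
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             mean_squared_log_error)

y_true = np.array([100.0, 120.0, 130.0, 125.0, 140.0])
y_pred = np.array([110.0, 115.0, 128.0, 130.0, 150.0])

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
msle = mean_squared_log_error(y_true, y_pred)  # requires non-negative values

print(f"MAE:  {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"MAPE: {mape:.2f}%")
print(f"MSLE: {msle:.4f}")
```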

The Dangers of Relying Too Much on New Techniques

While machine learning techniques are powerful, they require a great deal more data than traditional time series techniques do. As a result, you cannot simply discard the old “passé” methods and embrace the cool new ones. If you are working with just a few years’ worth of, say, monthly sales data, chances are a traditional ARIMA model will outperform a machine-learning-based one. Forecasters will have to continue to learn both the traditional techniques and the new ones. Machine learning augments the traditional set of techniques; it does not replace them.

Read more: Interest rate forecasting

Conclusion

Machine learning techniques provide a powerful alternative to traditional time series forecasting methods, especially when dealing with complex, non-linear data. By leveraging models like decision trees, SVMs, and neural networks, and applying thorough feature engineering, you can unlock new levels of forecasting accuracy. However, it’s crucial to understand the strengths and limitations of each technique and to carefully evaluate your models using appropriate metrics.

In the next part of this series, we’ll move from theory to practice, demonstrating how to implement these machine learning techniques for time series forecasting using Python, with real-world datasets to showcase their effectiveness.

Read more: A Series on Time Series, Part I: Why Forecast?
