Overview of Vector Autoregressive Models and Examples of Applications and Implementations


Vector Autoregressive Models

The vector autoregression model (VAR model) is a time series modeling method used in fields such as statistics and economics. It is applied when multiple variables interact with each other over time.

An ordinary autoregressive (AR) model expresses the value of a single variable as a linear combination of its own past values. The VAR model extends this idea to multiple variables: the current value of each variable is predicted from the past values of all variables in the system.

Specifically, a VAR model of order p for k variables is represented by the following equation (k = 2 gives the simple two-variable case).

\[Y_t = c + A_1 Y_{t-1} + A_2 Y_{t-2} + \dots + A_p Y_{t-p} + \epsilon_t\]

Where,

\(Y_t\) is the k-dimensional vector of variables in period t.
\(c\) is the k-dimensional constant term (offset).
\(A_1, A_2, \dots, A_p\) are the k × k coefficient matrices applied to \(Y_{t-1}, Y_{t-2}, \dots, Y_{t-p}\), respectively, where p is the order of the model.
\(\epsilon_t\) is the k-dimensional error term (white noise), which is assumed to have zero mean and no autocorrelation.
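
To make the recursion concrete, the following is a minimal sketch that simulates a two-variable VAR(2) directly from the equation above using NumPy. The coefficient values, the standard normal noise, and the helper name simulate_var are assumptions chosen for illustration; they do not come from any particular data set.

```python
import numpy as np

def simulate_var(c, coef_mats, n_obs, rng, burn_in=100):
    """Simulate Y_t = c + sum_i A_i Y_{t-i} + eps_t with standard normal noise."""
    k = len(c)
    p = len(coef_mats)
    total = n_obs + burn_in
    y = np.zeros((total + p, k))  # the first p rows are zero start-up values
    for t in range(p, total + p):
        y[t] = c + rng.standard_normal(k)
        for i, a in enumerate(coef_mats, start=1):
            y[t] += a @ y[t - i]
    return y[-n_obs:]  # drop the start-up values and the burn-in period

# Hypothetical coefficients for a stable two-variable VAR(2)
c = np.array([0.5, -0.2])
A1 = np.array([[0.5, 0.1],
               [0.2, 0.3]])
A2 = np.array([[0.1, 0.0],
               [0.0, 0.1]])

rng = np.random.default_rng(0)
Y = simulate_var(c, [A1, A2], n_obs=500, rng=rng)

# For a stable VAR the long-run mean is (I - A_1 - ... - A_p)^{-1} c
theoretical_mean = np.linalg.solve(np.eye(2) - A1 - A2, c)
print(Y.shape)
print("sample mean:     ", Y.mean(axis=0))
print("theoretical mean:", theoretical_mean)
```

The burn-in period is discarded so that the arbitrary zero starting values have little influence on the returned sample.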

VAR models are usually estimated using methods such as least squares, and the order p (the number of past periods included) must be chosen appropriately. Since VAR models can handle many interrelated variables at once, they are widely applied in macroeconomic analysis and financial market forecasting.

Extensions and close relatives of the VAR model include the Vector Error Correction Model (VECM), the Vector Moving Average Model (VMA), and the Vector Autoregressive Moving-Average Model (VARMA). These models are useful for more complex time series data analysis.

Vector Error Correction Model

The Vector Error Correction Model (VECM) is used to analyze non-stationary (with a unit root) time series data and is an extension of the VAR model, which is often used in fields such as economics and finance.

VECM is applied when the variables in the time series data are non-stationary and a long-run equilibrium relationship (cointegration) exists between them. For example, economic variables such as GDP and consumption may each be non-stationary and be pushed apart by temporary shocks, yet an equilibrium relationship between them may hold in the long run. VECM takes the approach of modeling the equilibrium relationships among such variables.

In a VECM, the non-stationary variables are differenced so that the left-hand side of the model is stationary, and lagged differences enter as they would in a VAR model. Differencing alone would discard the information contained in the levels, so an error correction term (ECT) built from the lagged levels is added to capture the long-term equilibrium relationship. The error correction term indicates how far the variables deviated from the equilibrium relationship in the previous period, and its coefficients determine how strongly each variable adjusts back toward equilibrium.

The mathematical expression of the VECM for k variables is as follows (k = 2 gives the two-variable case).

\[\Delta Y_t = \Pi Y_{t-1} + \Gamma_1 \Delta Y_{t-1} + \dots + \Gamma_p \Delta Y_{t-p} + \epsilon_t\]

Where,

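\(\Delta Y_t = Y_t - Y_{t-1}\) is the differenced k-dimensional vector of variables.
\(\Pi\) is the k × k coefficient matrix of the equilibrium relationship and is the key parameter of the VECM. When the variables are cointegrated, \(\Pi\) has reduced rank and can be written as \(\Pi = \alpha\beta'\), where \(\beta\) contains the cointegrating vectors and \(\alpha\) the adjustment speeds.
\(\Gamma_1, \dots , \Gamma_p\) are the coefficient matrices of the lagged differences and describe the short-run dynamics.
\(\epsilon_t\) is the white noise (error term).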
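
As an illustration of the error correction term, the sketch below simulates the simplest possible case, a two-variable VECM with no lagged-difference terms, so that \(\Delta Y_t = \Pi Y_{t-1} + \epsilon_t\). The factorization \(\Pi = \alpha\beta'\), the numerical values of alpha and beta, and the standard normal noise are assumptions made for this example only. Each variable wanders like a random walk, while the combination \(\beta' Y_t\) stays close to equilibrium.

```python
import numpy as np

# Hypothetical parameters: beta defines the equilibrium y1 - y2 = 0,
# alpha says how strongly each variable reacts to last period's deviation.
alpha = np.array([[-0.4], [0.2]])
beta = np.array([[1.0], [-1.0]])
Pi = alpha @ beta.T            # 2 x 2 matrix of rank 1
A1 = np.eye(2) + Pi            # equivalent VAR(1) in levels: Y_t = A1 Y_{t-1} + eps_t

rng = np.random.default_rng(1)
n_obs = 2000
Y = np.zeros((n_obs, 2))
for t in range(1, n_obs):
    dY = Pi @ Y[t - 1] + rng.standard_normal(2)   # error correction step
    Y[t] = Y[t - 1] + dY

spread = Y @ beta[:, 0]        # deviation from equilibrium, beta' Y_t

print("rank of Pi:", np.linalg.matrix_rank(Pi))
print("eigenvalues of A1:", np.sort(np.linalg.eigvals(A1).real))
print("variance of y1:    ", np.var(Y[:, 0]))
print("variance of spread:", np.var(spread))
```

One eigenvalue of A1 equals 1 (the shared stochastic trend) and the other is below 1 (the stationary deviation from equilibrium), which is the pattern a rank-1 \(\Pi\) produces in a two-variable system.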

Because the VECM captures long-run equilibrium relationships, it has a structure in which the effects of short-term shocks on the equilibrium deviation decay and the system returns to equilibrium. Because of this property, VECMs are often used in economics to model relationships such as the money demand function, purchasing power parity, and interest rate parity. VECMs can be estimated using Maximum Likelihood Estimation or Bayesian methods as described in “Overview and Various Implementations of Bayesian Estimation”.

Vector Moving Average Model

The Vector Moving Average Model (VMA), like the Vector Autoregressive Model (VAR model), is a method for modeling multiple time series simultaneously. The VMA model uses past white noise (error terms) to model the current value.

The VMA model expresses the current value as a linear combination of the current and past values of the white noise. A VMA model of order q for k variables can be written as follows (k = 2 gives the simple two-variable case).

\[Y_t = \mu + \Theta_1\epsilon_{t-1} + \Theta_2 \epsilon_{t-2} + \dots + \Theta_q\epsilon_{t-q} + \epsilon_t\]

Where,

\(Y_t\) is a k-dimensional vector of variables, representing the data in period t.
\(\mu\) is the constant term (offset).
\(\Theta_1, \Theta_2, \dots, \Theta_q\) are the k × k coefficient matrices applied to the past white noise terms \(\epsilon_{t-1}, \epsilon_{t-2}, \dots, \epsilon_{t-q}\), respectively.
\(\epsilon_t\) is a k-dimensional white noise (error term) vector that is assumed to have zero mean and no autocorrelation.
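
A defining property of this model is that its memory is finite: the covariance between \(Y_t\) and \(Y_{t-h}\) is zero for every lag h greater than q. The sketch below checks this numerically for a two-variable VMA(1). The values of mu and Theta1, the identity noise covariance, and the helper lagged_cov are assumptions made for illustration. With identity noise covariance, the lag-1 covariance should be close to \(\Theta_1\) and the lag-2 covariance close to zero.

```python
import numpy as np

def lagged_cov(y, lag):
    """Sample estimate of Cov(Y_t, Y_{t-lag}) as a k x k matrix."""
    yc = y - y.mean(axis=0)
    n = len(yc)
    return yc[lag:].T @ yc[:n - lag] / n

# Hypothetical VMA(1) parameters
mu = np.array([1.0, -1.0])
Theta1 = np.array([[0.6, 0.2],
                   [0.0, 0.4]])

rng = np.random.default_rng(2)
n_obs = 20000
eps = rng.standard_normal((n_obs + 1, 2))   # white noise with identity covariance

# Y_t = mu + eps_t + Theta1 eps_{t-1}, written for all t at once
Y = mu + eps[1:] + eps[:-1] @ Theta1.T

print("lag-1 covariance (close to Theta1):")
print(np.round(lagged_cov(Y, 1), 2))
print("lag-2 covariance (close to zero):")
print(np.round(lagged_cov(Y, 2), 2))
```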

The VMA model, like the VAR model, uses historical information and thus can model data that reflect the past. However, unlike the VAR model, the VMA model uses past values of the error terms rather than past values of the variables themselves. The white noise itself has no autocorrelation, but the weighted sum of current and past noise terms gives the observed series autocorrelation that lasts for at most q periods. For data with such short-lived dependence, a VMA model may need fewer parameters than a VAR model.

The VMA model is also used in combination with the VAR model as the vector autoregressive moving average (VARMA) model. The VARMA model can represent more complex time series data by combining multiple past values of the variables with multiple past white noise terms.

Vector Autoregressive Moving Average Model

The Vector Autoregressive Moving-Average Model (VARMA model) is a statistical model for simultaneously modeling multiple time series variables. The VARMA model is characterized by the inclusion of both an autoregressive (AR) term and a moving average (MA) term.

The VARMA model is a combination of the VAR and VMA models: the VAR model uses the past values of the variables to predict their current values, while the VMA model uses past white noise (error terms). The VARMA model combines these ideas and, for k variables with orders p and q, is expressed as follows (k = 2 gives the simple two-variable case).

\[Y_t = c + A_1 Y_{t-1} + A_2 Y_{t-2} + \dots + A_p Y_{t-p} + \epsilon_t + \Theta_1\epsilon_{t-1} + \Theta_2\epsilon_{t-2} + \dots + \Theta_q\epsilon_{t-q}\]

Where,

\(Y_t\) is a k-dimensional vector of variables, representing the data in period t.
c represents the constant term (offset).
\(A_1, A_2, \dots, A_p\) are the coefficient matrices of the VAR part, applied to \(Y_{t-1}, Y_{t-2}, \dots, Y_{t-p}\), respectively.
\(\epsilon_t\) is the k-dimensional white noise (error term) vector, which is assumed to have zero mean and no autocorrelation.
\(\Theta_1, \Theta_2, \dots, \Theta_q\) are the coefficient matrices of the VMA part, applied to the past white noise terms \(\epsilon_{t-1}, \epsilon_{t-2}, \dots, \epsilon_{t-q}\), respectively.
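
One way to see how the two parts interact is to trace the effect of a single shock over time. For a VARMA(1,1), substituting the equation into itself gives the impulse response matrices \(\Psi_0 = I\), \(\Psi_1 = A_1 + \Theta_1\), and \(\Psi_j = A_1 \Psi_{j-1}\) for \(j \ge 2\). The following sketch computes them and cross-checks the result by feeding a unit shock through the model equation. The coefficient values and the function name are assumptions for illustration.

```python
import numpy as np

def varma11_impulse_responses(A1, Theta1, horizon):
    """Return Psi_0..Psi_horizon for Y_t = A1 Y_{t-1} + eps_t + Theta1 eps_{t-1}."""
    k = A1.shape[0]
    psi = [np.eye(k)]                 # Psi_0: the shock enters one-for-one
    if horizon >= 1:
        psi.append(A1 + Theta1)       # Psi_1: AR carry-over plus the MA term
    for _ in range(2, horizon + 1):
        psi.append(A1 @ psi[-1])      # Psi_j: only the AR part propagates
    return np.stack(psi)

# Hypothetical VARMA(1,1) coefficients
A1 = np.array([[0.5, 0.1],
               [0.2, 0.3]])
Theta1 = np.array([[0.3, 0.0],
                   [0.1, 0.2]])

psi = varma11_impulse_responses(A1, Theta1, horizon=5)

# Cross-check: push a unit shock to variable 0 through the model equation
shock = np.array([1.0, 0.0])
path = [shock]                                # Y_0 = eps_0
path.append(A1 @ path[-1] + Theta1 @ shock)   # Y_1 = A1 Y_0 + Theta1 eps_0
for _ in range(2, 6):
    path.append(A1 @ path[-1])                # no further shocks
path = np.array(path)

print(np.round(psi[:, :, 0], 4))   # response of both variables to a shock in variable 0
print(np.round(path, 4))           # the same numbers obtained by direct recursion
```

Setting Theta1 to zero reduces this to the impulse responses of a VAR(1), and setting A1 to zero leaves only \(\Psi_0\) and \(\Psi_1\), as in a VMA(1).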

The VARMA model can represent more complex time series data by combining multiple past values and multiple past white noise terms. Its parameters can be estimated using techniques such as the method of least squares and maximum likelihood estimation (MLE). The VARMA model is also very general and has been applied in various fields such as economics, finance, meteorology, and social sciences.

Algorithms used in vector autoregressive models

The following algorithms are commonly used when estimating vector autoregressive models:

Ordinary Least Squares (OLS): The most common and basic approach. Each equation of the VAR model is a linear regression with the past values of all variables as explanatory variables and the current value as the objective variable, so the coefficient matrices can be estimated by minimizing the sum of squared residuals. This method is relatively simple to compute and is suitable for small-scale VAR models.
Bayesian Methods: Bayesian methods estimate parameters by combining the data with prior information (a prior distribution). Because the parameters of the VAR model are estimated as probability distributions, the uncertainty of the estimates can be quantified. Bayesian VAR models are estimated using sampling methods such as the MCMC (Markov Chain Monte Carlo) method described in “Overview and Implementation of Markov Chain Monte Carlo Methods”.
Unit Root Tests: Unit root tests check each (univariate) time series for the existence of a unit root. Since the VAR model assumes stationarity, it cannot be applied directly to non-stationary data. If stationarity is not ensured, pre-processing such as taking differences is necessary.
Information Criteria: Information criteria are used as indicators for selecting the order (number of past periods) of the VAR model. By selecting the order that minimizes an information criterion such as AIC (Akaike information criterion) or BIC (Bayesian information criterion), an appropriate model can be selected while preventing overfitting.
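
The OLS and information-criterion items above can be sketched together in a few lines of NumPy. The code below is a minimal illustration, not a replacement for a statistics library: the simulated data, the function names, and the particular form of the criteria (log-determinant of the residual covariance plus a penalty on the number of parameters) are assumptions made for the example.

```python
import numpy as np

def fit_var_ols(y, p):
    """OLS fit of a VAR(p) with intercept; returns (coefficients, residual covariance)."""
    n, k = y.shape
    # Regressors for each target row y[t]: [1, y[t-1], ..., y[t-p]]
    X = np.hstack([np.ones((n - p, 1))] + [y[p - i:n - i] for i in range(1, p + 1)])
    target = y[p:]
    B, *_ = np.linalg.lstsq(X, target, rcond=None)   # shape (1 + k*p, k)
    resid = target - X @ B
    sigma = resid.T @ resid / (n - p)
    return B, sigma

def information_criteria(y, p, max_p):
    """AIC and BIC for order p, fitted on a common sample so orders are comparable."""
    y_common = y[max_p - p:]          # every order then has n - max_p target rows
    n_eff = len(y_common) - p
    k = y.shape[1]
    _, sigma = fit_var_ols(y_common, p)
    n_params = k * (1 + k * p)
    logdet = np.linalg.slogdet(sigma)[1]
    aic = logdet + 2 * n_params / n_eff
    bic = logdet + np.log(n_eff) * n_params / n_eff
    return aic, bic

# Simulated data from a hypothetical VAR(1)
rng = np.random.default_rng(3)
A1_true = np.array([[0.5, 0.1],
                    [0.2, 0.3]])
y = np.zeros((600, 2))
for t in range(1, 600):
    y[t] = A1_true @ y[t - 1] + rng.standard_normal(2)
y = y[100:]                           # discard burn-in

B, _ = fit_var_ols(y, 1)
A1_hat = B[1:3].T                     # rows 1..k of B hold the lag-1 coefficients (transposed)
print("estimated A1:")
print(np.round(A1_hat, 2))

bic_by_order = {p: information_criteria(y, p, max_p=4)[1] for p in range(1, 5)}
best_p = min(bic_by_order, key=bic_by_order.get)
print("order selected by BIC:", best_p)
```

Fitting every candidate order on the same set of target observations keeps the criteria comparable; otherwise lower orders would be evaluated on slightly more data than higher ones.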

These algorithms are useful for estimating the VAR model and selecting the order. It is important to select an appropriate algorithm depending on the nature and purpose of the data. For large-scale VAR models, it is necessary to pay attention to numerical stability and computational load.
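
The stationarity assumption mentioned in the list above can also be checked on the coefficient matrices themselves. A VAR(p) is stable when all eigenvalues of its companion matrix lie strictly inside the unit circle. The sketch below is a minimal version of that check; the function names and the example coefficients are assumptions for illustration.

```python
import numpy as np

def companion_matrix(coef_mats):
    """Stack A_1..A_p into the (k*p) x (k*p) companion matrix of a VAR(p)."""
    k = coef_mats[0].shape[0]
    p = len(coef_mats)
    top = np.hstack(coef_mats)
    if p == 1:
        return top
    # Identity block shifts Y_{t-1}, ..., Y_{t-p+1} down by one position
    bottom = np.hstack([np.eye(k * (p - 1)), np.zeros((k * (p - 1), k))])
    return np.vstack([top, bottom])

def is_stable(coef_mats):
    """True if every eigenvalue of the companion matrix has modulus below 1."""
    eigvals = np.linalg.eigvals(companion_matrix(coef_mats))
    return bool(np.max(np.abs(eigvals)) < 1.0)

# Hypothetical stable VAR(2)
A1 = np.array([[0.5, 0.1],
               [0.2, 0.3]])
A2 = np.array([[0.1, 0.0],
               [0.0, 0.1]])
print(is_stable([A1, A2]))        # True

# Two independent random walks: unit roots, so not stable
print(is_stable([np.eye(2)]))     # False
```

An eigenvalue with modulus at or above 1 indicates that differencing, or a VECM if the variables are cointegrated, should be considered before fitting a VAR in levels.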

Libraries and platforms used for vector autoregressive models

Various programming languages, statistical packages, and libraries are available to implement vector autoregressive models (VAR models). The following are examples of typical programming languages and libraries.

<Python>

The major libraries for implementing VAR models in Python include:

statsmodels: a library for statistical models in Python that supports estimation and analysis of VAR models.
pmdarima: a library dedicated to the estimation and automatic order selection of univariate ARIMA models. It does not estimate VAR models itself, but can be useful for building univariate baselines to compare against a VAR model.

<R>

The following packages are used to implement VAR models in R.

vars: This package is used to estimate and analyze VAR models in R. It is suitable for the analysis of multivariate time series data.
tsDyn: This package is for time series analysis, and can be used to estimate and analyze VAR models.

<MATLAB>

The following functions are available to implement VAR models in MATLAB.

varm: A function included in MATLAB’s Econometrics Toolbox that creates a VAR model object, which is then fitted to data with the estimate function.

<Julia>

The following libraries are available to implement VAR models in Julia.

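TimeSeries: A Julia package (TimeSeries.jl) that provides data structures and basic operations for time series data. It is mainly useful for preparing the data; estimation of the VAR model itself typically relies on a separate package or a hand-written least squares fit.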

These libraries and packages can be used to easily implement and analyze VAR models. As a platform, development environments such as Jupyter Notebook and RStudio can be used to smoothly execute and visualize the code.

Application of vector autoregressive models

Vector autoregressive models have been widely applied in a variety of fields, including economics, finance, meteorology, and social sciences. Some specific applications are listed below.

Macroeconomics: In economics, VAR models are used to model economic indicators such as GDP, inflation, unemployment, and interest rates.
Finance: In the field of finance, VAR models can be used to model time-series data such as stock prices, exchange rates, and interest rates to understand market trends and correlations. VAR models are also used for risk management and portfolio optimization.
Meteorology: In the field of meteorology, VAR models are used to model weather factors (temperature, humidity, pressure, etc.). This allows us to understand the interrelationships of weather variability and climate change, and to analyze weather forecasts and the effects of climate change.
Social Sciences: In the social sciences, socioeconomic variables such as demographics, labor markets, education, and health are sometimes analyzed with VAR models. This allows us to understand social interrelationships and influences, which contributes to policy making and improvement of social systems.
Marketing: In the field of marketing, VAR models can be used to evaluate marketing strategies and forecast market trends by modeling advertising expenditures, sales volume, and competitor trends.

As these examples show, VAR models are a widely used method for modeling and forecasting time series data while taking into account the interrelationships among multiple variables.

Finally, examples of these implementations are discussed.

Example implementation of a vector autoregressive model in Python

In Python, VAR models can be implemented with a library called statsmodels. statsmodels is a package for handling statistical models and can be used to estimate VAR models and to forecast with them.

The following is an example of implementing a VAR model using statsmodels, specifically estimating a two-variable VAR model and forecasting with it.

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Creating dummy data
# Example of creating a 2-dimensional array (100 observations of two variables)
np.random.seed(0)
n_obs = 100
data = np.random.randn(n_obs, 2)
df = pd.DataFrame(data, columns=['var1', 'var2'])

# VAR model estimation
lag_order = 2  # Specify the order of the VAR model
model = sm.tsa.VAR(df)
result = model.fit(lag_order)

# Display VAR model parameters
print(result.summary())

# Prediction by VAR model
forecast_period = 10  # Specify future time period
forecast = result.forecast(df.values[-lag_order:], steps=forecast_period)

# Displays forecast results
print("Prediction by VAR model:")
print(forecast)

In this example, dummy data is generated and a VAR model is built from it using the VAR class. The fit method estimates the parameters of the VAR model, and the forecast method predicts future values from the most recent lag_order observations.
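
Conceptually, a multi-step forecast of a VAR model iterates the model equation forward with future error terms set to their expected value of zero. The sketch below spells out that recursion in NumPy for given coefficients. It illustrates the general idea rather than the internals of any library; the function name and the example coefficients are assumptions.

```python
import numpy as np

def forecast_var(c, coef_mats, history, steps):
    """Iterate Y_hat_{T+h} = c + sum_i A_i Y_hat_{T+h-i}, with future errors set to zero."""
    p = len(coef_mats)
    window = [row for row in np.asarray(history, dtype=float)[-p:]]  # oldest first
    out = []
    for _ in range(steps):
        nxt = np.array(c, dtype=float)
        for i, a in enumerate(coef_mats, start=1):
            nxt = nxt + a @ window[-i]     # window[-i] is the value i periods back
        out.append(nxt)
        window.append(nxt)                 # forecasts feed later forecasts
    return np.array(out)

# Hypothetical VAR(1): each variable halves every period
c = np.zeros(2)
A1 = 0.5 * np.eye(2)
history = np.array([[2.0, 4.0]])

print(forecast_var(c, [A1], history, steps=3))
# [[1.   2.  ]
#  [0.5  1.  ]
#  [0.25 0.5 ]]
```

Because each step reuses earlier forecasts as inputs, forecast uncertainty accumulates with the horizon, which is why long-horizon VAR forecasts of a stable model drift toward the long-run mean.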

It is important to note that the order of the VAR model (lag_order) and appropriate pre-processing of the data must be adjusted appropriately for the actual data, and further analysis is required to interpret and evaluate the results of the VAR model.

Example implementation of a vector autoregressive model in R

In R, vector autoregressive models (VAR models) can be implemented using the vars package. The vars package provides useful tools for multivariate time series analysis, and below we describe an example implementation of a VAR model using R.

First, install and load the vars package.

install.packages("vars") # Install vars package
library(vars) # Load the vars package

Next, dummy data is generated to estimate the VAR model.

set.seed(0)
n_obs <- 100
data <- matrix(rnorm(n_obs*2), nrow=n_obs) # Create data with two variables
colnames(data) <- c("var1", "var2") # Name the columns so the output is labeled

# VAR model estimation
lag_order <- 2 # Specify the order of the VAR model
var_model <- VAR(data, p=lag_order, type="const") # Include constant terms with type="const"

# View a summary of the VAR model
summary(var_model)

# Prediction by VAR model
forecast_period <- 10 # Specify future time period
forecast <- predict(var_model, n.ahead=forecast_period)

# Displays forecast results
print("Prediction by VAR model:")
print(forecast)

In this example, the matrix function is used to generate dummy data with two variables, and the VAR function is used to estimate the VAR model. The p argument is set to lag_order, the order of the VAR model, and the type argument is set to "const" so that the model includes a constant term. The estimation and prediction results of the VAR model can be displayed using the summary and predict functions.

Reference Information and Reference Books

For more details on time series data analysis, please also refer to “Time Series Data Analysis”.

Reference books include “Practical Time-Series Analysis: Master Time Series Data Processing, Visualization, and Modeling using Python”

“Time Series Analysis Methods and Applications for Flight Data“

“Time series data analysis for stock indices using data mining technique with R“

“Time Series Data Analysis Using EViews“

“Practical Time Series Analysis: Prediction with Statistics and Machine Learning“

“Vector Autoregressive Models for Multivariate Time Series” by Patrick T. Brandt and John T. Williams

“Time Series Analysis” by James D. Hamilton

“New Introduction to Multiple Time Series Analysis” by Helmut Lütkepohl

“Applied Time Series Econometrics” by Helmut Lütkepohl and Markus Krätzig

“Forecasting, Structural Time Series Models and the Kalman Filter” by Andrew C. Harvey