Overview of time series data analysis
Time-series data is data whose values change over time, such as stock prices, temperatures, and traffic volumes. Applying machine learning to such data makes it possible to learn from large volumes of past observations and to predict unseen values, which supports business decision making and risk management.
Time-series data typically contains a trend, seasonality, and a random component. The trend is the long-term direction of the series, seasonality is a pattern that repeats at a fixed period, and the random component is unpredictable noise. Forecasting methods for time-series data are designed to take these factors into account.
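To make the three components concrete, the following is a minimal sketch that builds an artificial hourly series as the sum of a trend, a daily seasonal pattern, and noise. The slope, amplitude, and noise level are arbitrary values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_hours = 24 * 14              # two weeks of hourly observations
t = np.arange(n_hours)

# Trend: a slow linear increase over time (slope is an arbitrary choice)
trend = 0.05 * t
# Seasonality: a pattern that repeats every 24 hours
seasonality = 10.0 * np.sin(2 * np.pi * t / 24)
# Random component: unpredictable noise
noise = rng.normal(loc=0.0, scale=2.0, size=n_hours)

# Additive model: the observed series is the sum of the three parts
series = trend + seasonality + noise
```

Decomposition methods attempt the reverse: given only `series`, they estimate the three parts.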
Representative methods include ARIMA, described in “Examples of implementations for general time series analysis using R and Python”; Prophet, described in “Time series analysis using Prophet”; LSTM, described in “Overview of LSTM and Examples of Algorithms and Implementations”; and state-space models. All of these fit a model to past time-series data and use it to predict future values.
Because the observations are ordered in time, the data must be preprocessed appropriately before a model can learn from the past and predict the future, for example by decomposing the series into trend, seasonal, and residual components. This preprocessing often requires some ingenuity.
Implementing Time Series Data Analysis in Python
Various libraries and tools exist for time series analysis using Python. Here, we describe a concrete implementation of time series analysis using pandas, one of the most popular libraries.
First of all, time series data is created using the datetime data type in pandas. This can be done, for example, by creating hourly data as follows.
import pandas as pd
import numpy as np
# Creation of time series data
date_rng = pd.date_range(start='1/1/2020', end='1/10/2020', freq='H')
df = pd.DataFrame(date_rng, columns=['date'])
df['data'] = np.random.randint(0,100,size=(len(date_rng)))
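As an optional next step, the date column can be turned into a DatetimeIndex, which enables time-based selection and resampling. The sketch below rebuilds the same kind of data so that it runs on its own; the names `ts`, `daily_mean`, and `first_day` are illustrative and not part of the original example.

```python
import numpy as np
import pandas as pd

# Same kind of hourly data as above (a Timedelta is used for the frequency)
date_rng = pd.date_range(start="2020-01-01", end="2020-01-10",
                         freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"date": date_rng})
df["data"] = np.random.randint(0, 100, size=len(date_rng))

# Use the timestamps as the index to get a time-indexed Series
ts = df.set_index("date")["data"]

# Aggregate the hourly values into daily means
daily_mean = ts.resample("D").mean()

# Select all observations from a single day by its date string
first_day = ts.loc["2020-01-01"]
```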
Next, we will visualize the time series data, which can be easily plotted using the plot method in pandas. The following is an example of visualization of the data created above.
import matplotlib.pyplot as plt
# Data Visualization
df.plot(x='date', y='data')
plt.show()
In time series analysis, it is important to separate components such as trend and seasonality. The statsmodels library, which works directly with pandas objects, provides the seasonal_decompose function for this purpose. The following is an example of decomposing the above data into trend, seasonality, and residuals, using a period of 24 to match the daily cycle of hourly data.
from statsmodels.tsa.seasonal import seasonal_decompose
# Decomposition of time series data
result = seasonal_decompose(df['data'], model='additive', period=24)
# Visualization of decomposition results
result.plot()
plt.show()
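To show roughly what an additive decomposition does internally, here is a minimal sketch of the classical moving-average approach using only NumPy and pandas. The function name `additive_decompose` is hypothetical, and this is an illustration of the idea rather than a reproduction of the statsmodels implementation.

```python
import numpy as np
import pandas as pd

def additive_decompose(values, period):
    """Classical additive decomposition: observed = trend + seasonal + resid."""
    x = np.asarray(values, dtype=float)
    n = len(x)
    # Centered moving average; an even period uses half weights at both ends
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)   # the edges have no full window, so they stay NaN
    trend[half:n - half] = np.convolve(x, weights, mode="valid")
    # Seasonal effect: mean of the detrended values at each position in the cycle
    detrended = x - trend
    means = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    means -= means.mean()        # center the seasonal effects around zero
    seasonal = np.tile(means, n // period + 1)[:n]
    resid = x - trend - seasonal
    return pd.DataFrame({"observed": x, "trend": trend,
                         "seasonal": seasonal, "resid": resid})

# Usage on random hourly data similar to the example above
data = np.random.default_rng(0).integers(0, 100, size=217)
parts = additive_decompose(data, period=24)
```

Because the example data is random, the estimated trend and seasonal components mostly reflect noise; on real data with a daily cycle, the seasonal column would show that cycle.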
Based on the results of trend and seasonality decomposition, forecasts can also be made using ARIMA models, etc. An example of a forecast using an ARIMA model is shown below.
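# The old statsmodels.tsa.arima_model module has been removed from statsmodels
from statsmodels.tsa.arima.model import ARIMA
# Model definition and fitting
model = ARIMA(df['data'], order=(1, 1, 1))
model_fit = model.fit()
# Obtain forecast results for the next 24 steps
forecast = model_fit.forecast(steps=24)
# Visualization of observed data and forecast results
plt.plot(df['data'], label='observed')
plt.plot(forecast, label='forecast')
plt.legend()
plt.show()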
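The middle value in order=(1, 1, 1) is the differencing order d of ARIMA(p, d, q): the series is differenced d times before the autoregressive and moving-average parts are fitted. The following minimal sketch, using only pandas and made-up data, shows first-order differencing and how a cumulative sum undoes it. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
series = pd.Series(rng.integers(0, 100, size=48)).astype(float)

# First-order differencing (d=1): change from one step to the next
diffed = series.diff().dropna()

# Undo the differencing: start from the first value and accumulate the changes
restored = series.iloc[0] + diffed.cumsum()
restored = pd.concat([series.iloc[:1], restored])
```

Forecasts made on the differenced scale are brought back to the original scale in the same way, by accumulating the predicted changes from the last observed value.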
Although this example used pandas together with statsmodels, time series analysis can also be implemented with other libraries such as scikit-learn.
Implementation of time series data analysis using R
The R language is well suited to time series analysis and has a specialized package for it called “forecast”. In the following, we describe a concrete implementation example of time series analysis using the forecast package.
First of all, time series data are created. The following is an example of creating hourly data.
library(forecast)
# Creation of time series data
date_rng <- seq(as.POSIXct("2020-01-01 00:00:00"),
                as.POSIXct("2020-01-10 23:00:00"),
                by="hour")
df <- data.frame(date=date_rng, data=rnorm(length(date_rng)))
Next, we will visualize the time-series data; the ggplot2 package can be used to easily draw graphs in the R language. The following is an example of visualization of the data created above.
library(ggplot2)
# Data Visualization
ggplot(df, aes(x=date, y=data)) +
  geom_line() +
  xlab("Date") +
  ylab("Data")
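In time series analysis, it is important to separate components such as trend and seasonality. R's stl() function, which is part of the built-in stats package, decomposes a time series, and the forecast package provides an autoplot() method for plotting the result. Because stl() requires a ts object with a seasonal period, the hourly data is first converted with frequency=24. The following is an example of decomposing the above data into trend, seasonality, and residuals.
# Decomposition of time series data
# stl() needs a ts object with a seasonal period (24 hours per day)
ts_data <- ts(df$data, frequency=24)
result <- stl(ts_data, s.window="periodic")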
# Visualization of decomposition results
autoplot(result)
Based on the results of trend and seasonality decomposition, forecasts can also be made using ARIMA models, etc. The forecast package provides the auto.arima() function, which automatically selects the orders of an ARIMA model. Below is an example of forecasting the above data using the ARIMA model.
# ARIMA Model Predictions
model <- auto.arima(df$data)
# Store the result under a name that does not shadow the forecast() function
fc <- forecast(model, h=24)
# Visualization of forecast results
autoplot(fc)
In addition to the forecast package, R has numerous other packages suitable for time series analysis, such as the built-in stats package and the TSA package.
Reference Information and Reference Books
For more details on time series data analysis, see “Time Series Data Analysis”.