Example implementation for general time series analysis using R or Python

Overview of time series data analysis

Time-series data are data whose values change over time, such as stock prices, temperatures, and traffic volumes. Applying machine learning to such data lets a model learn from large volumes of past observations and predict values that have not yet been observed, which supports business decision making and risk management.

Time-series data typically contain three components: trend, seasonality, and random elements. The trend is the long-term direction of the series, seasonality is a pattern that repeats at a fixed period, and the random elements are unpredictable noise. Forecasting methods differ mainly in how they account for these components.
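
To make the three components concrete, the following minimal sketch builds a synthetic hourly series by adding them together. The slope, amplitude, noise level, and random seed are illustrative assumptions, not values taken from any real data set.

```python
import numpy as np
import pandas as pd

# Hypothetical example: ten days of hourly observations.
index = pd.date_range(start="2020-01-01", periods=240, freq="h")
t = np.arange(len(index))

rng = np.random.default_rng(seed=0)                   # fixed seed for reproducibility
trend = 0.05 * t                                      # slow upward drift
seasonality = 10 * np.sin(2 * np.pi * t / 24)         # cycle that repeats every 24 hours
noise = rng.normal(loc=0.0, scale=2.0, size=len(t))   # unpredictable part

# Additive model: observation = trend + seasonality + noise
series = pd.Series(trend + seasonality + noise, index=index, name="data")
print(series.head())
```

Plotting series alongside trend and seasonality shows how much of the visible movement each component explains.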

Representative methods include ARIMA (covered in this article), Prophet (described in “Time series analysis using Prophet”), LSTM (described in “Overview of LSTM and Examples of Algorithms and Implementations”), and state-space models. All of these learn from past time-series data in order to predict future values.

Because the observations are ordered in time, the data must be prepared appropriately before a model can learn from the past and predict the future. One common step is to decompose the series into trend, seasonal, and residual components, and doing this well takes some care.
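
As a rough illustration of what such a decomposition involves, the sketch below splits a synthetic hourly series by hand using only pandas. The data, the 25-hour centered window (a simplification of the classical moving-average filter), and the variable names are all assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series: linear trend plus a daily cycle plus noise.
index = pd.date_range(start="2020-01-01", periods=240, freq="h")
t = np.arange(len(index))
rng = np.random.default_rng(seed=1)
series = pd.Series(
    0.05 * t + 10 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 1, len(t)),
    index=index,
)

# Trend: centered moving average over roughly one daily cycle.
trend = series.rolling(window=25, center=True).mean()

# Seasonality: mean detrended value for each hour of the day, centered on zero.
detrended = series - trend
hourly_effect = detrended.groupby(detrended.index.hour).mean()
hourly_effect -= hourly_effect.mean()
seasonal = pd.Series(hourly_effect.loc[series.index.hour].to_numpy(), index=series.index)

# Residual: whatever the trend and seasonal parts do not explain.
residual = series - trend - seasonal
print(residual.dropna().describe())
```

The trend is undefined for the first and last 12 hours because the centered window does not fit there, which is why the residual is summarized after dropping missing values.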

Implementing Time Series Data Analysis in Python

Various libraries and tools exist for time series analysis in Python. Here, we describe a concrete implementation using pandas, one of the most popular data-handling libraries, together with statsmodels for decomposition and forecasting.

First, time series data is created using pandas datetime values. For example, hourly data can be generated as follows.

import pandas as pd
import numpy as np

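# Creation of time series data: hourly timestamps with random integer values
# ('h' is the hourly alias; the older 'H' spelling is deprecated in recent pandas)
date_rng = pd.date_range(start='1/1/2020', end='1/10/2020', freq='h')
df = pd.DataFrame(date_rng, columns=['date'])
df['data'] = np.random.randint(0, 100, size=len(date_rng))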
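
Once the timestamps exist, it is often convenient to use them as the index so that pandas can slice and aggregate by time. The sketch below rebuilds the same kind of data with a fixed seed (an assumption added for reproducibility) and shows day selection and daily resampling; the names ts, first_day, and daily_mean are illustrative.

```python
import numpy as np
import pandas as pd

# Same shape as the data above, generated with a fixed seed.
date_rng = pd.date_range(start="2020-01-01", end="2020-01-10", freq="h")
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({"date": date_rng, "data": rng.integers(0, 100, size=len(date_rng))})

# Use the timestamps as the index so pandas can slice and resample by time.
ts = df.set_index("date")["data"]

first_day = ts.loc["2020-01-01"]       # partial-string indexing selects one calendar day
daily_mean = ts.resample("D").mean()   # aggregate hourly values into daily averages

print(len(first_day))
print(daily_mean.head())
```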

Next, we visualize the time series data, which can be plotted easily with the pandas plot method. The following example plots the data created above.

import matplotlib.pyplot as plt

# Data Visualization
df.plot(x='date', y='data')
plt.show()

In time series analysis, it is important to separate components such as trend and seasonality. The statsmodels library (not pandas itself) provides the seasonal_decompose function for this, and it accepts a pandas Series directly. The following example decomposes the above data into trend, seasonality, and residuals.

from statsmodels.tsa.seasonal import seasonal_decompose

# Decomposition of time series data (period=24: one daily cycle of hourly observations)
result = seasonal_decompose(df['data'], model='additive', period=24)

# Visualization of decomposition results
result.plot()
plt.show()

Building on the decomposition of trend and seasonality, forecasts can be made with models such as ARIMA. An example of a forecast using an ARIMA model is shown below.

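from statsmodels.tsa.arima.model import ARIMA

# Model definition and fitting
# (the older statsmodels.tsa.arima_model module and fit(disp=0) are not available in recent statsmodels)
model = ARIMA(df['data'], order=(1, 1, 1))
model_fit = model.fit()

# Obtain forecast results for the next 24 hours (returned as a pandas Series)
forecast = model_fit.forecast(steps=24)

# Visualization of forecast results
plt.plot(df['data'], label='observed')
plt.plot(forecast, label='forecast')
plt.legend()
plt.show()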

This example combined pandas for data handling with statsmodels for decomposition and forecasting. Time series analysis can also be implemented with other libraries, such as scikit-learn.
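
As one possible way to use scikit-learn here, the sketch below turns a series into a supervised-learning table of lagged values and fits a linear regression for one-step-ahead prediction. The synthetic data, the chosen lags (1, 2, and 24 hours), and the column names are assumptions for this example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical hourly series with a daily cycle.
index = pd.date_range(start="2020-01-01", periods=240, freq="h")
t = np.arange(len(index))
rng = np.random.default_rng(seed=3)
series = pd.Series(10 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 1, len(t)), index=index)

# Supervised table: predict y(t) from y(t-1), y(t-2), and y(t-24).
lags = [1, 2, 24]
features = pd.concat({f"lag_{k}": series.shift(k) for k in lags}, axis=1)
table = features.assign(target=series).dropna()

# Chronological split: the last 24 rows are used only for evaluation.
train, test = table.iloc[:-24], table.iloc[-24:]
model = LinearRegression().fit(train.drop(columns="target"), train["target"])
pred = model.predict(test.drop(columns="target"))

mae = np.mean(np.abs(pred - test["target"].to_numpy()))
print(f"one-step-ahead MAE: {mae:.2f}")
```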

Implementation of time series data analysis using R

R is well suited to time series analysis and has a package specialized for forecasting called “forecast”. In the following, we describe a concrete implementation example of time series analysis using the forecast package.

First of all, time series data are created. The following is an example of creating hourly data.

library(forecast)

# Creation of time series data
date_rng <- seq(as.POSIXct("2020-01-01 00:00:00"), 
                 as.POSIXct("2020-01-10 23:00:00"), 
                 by="hour")
df <- data.frame(date=date_rng, data=rnorm(length(date_rng)))

Next, we will visualize the time-series data; the ggplot2 package can be used to easily draw graphs in the R language. The following is an example of visualization of the data created above.

library(ggplot2)

# Data Visualization
ggplot(df, aes(x=date, y=data)) +
  geom_line() +
  xlab("Date") +
  ylab("Data")

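In time series analysis, it is important to separate components such as trend and seasonality. In R this can be done with the stl() function, which belongs to the base stats package; the forecast package supplies the autoplot() method used to plot the result. Because stl() requires a ts object with a seasonal frequency, the hourly values are first converted with frequency = 24 (one daily cycle). The following is an example of decomposing the above data into trend, seasonality, and residuals.

# Decomposition of time series data
ts_data <- ts(df$data, frequency=24)
result <- stl(ts_data, s.window="periodic")

# Visualization of decomposition results
autoplot(result)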

Building on the decomposition of trend and seasonality, forecasts can be made with models such as ARIMA. The forecast package provides the auto.arima() function, which automatically selects the orders of an ARIMA model. Below is an example of forecasting the above data using the ARIMA model.

# ARIMA model forecast
model <- auto.arima(df$data)
fc <- forecast(model, h=24)  # named fc so the forecast() function is not masked

# Visualization of forecast results
autoplot(fc)

In addition to this forecast package, there are numerous other packages suitable for time series analysis in R, such as the stats package and the TSA package.

Reference Information and Reference Books

For more details on time series data analysis, see “Time Series Data Analysis”.

Reference books:

“Practical Time-Series Analysis: Master Time Series Data Processing, Visualization, and Modeling using Python”

“Time Series Analysis Methods and Applications for Flight Data”

“Time series data analysis for stock indices using data mining technique with R”

“Time Series Data Analysis Using EViews”

“Practical Time Series Analysis: Prediction with Statistics and Machine Learning”
