Reading notes for Iwanami Data Science Series “Time Series Analysis”

Summary

Time-series data refers to data whose values change over time, such as stock prices, temperatures, and traffic volumes. By applying machine learning to time-series data, large amounts of historical data can be learned and predictions can be made on unseen data, supporting business decision making and risk management.

Typical methods include ARIMA (autoregressive integrated moving average) models, LSTM (long short-term memory) models, Prophet (a library developed by Facebook that specializes in time-series forecasting), and state-space models. These are machine learning based forecasting methods that learn from past time-series data to predict the future.

Here, we describe the application of state-space models to time-series data analysis, based on the Iwanami Data Science Series volume “Time-Series Analysis: State Space Models, Causal Analysis, and Business Applications”.

The state-space model is one of the most commonly used statistical models and a general-purpose framework for analyzing time-series data. In the state-space model, the observed time series is regarded as being generated by some stochastic process, and a mathematical model is constructed to describe that process. The model assumes an unobservable “state variable” that governs the stochastic process, and consists of a transition model representing the temporal change of the state variable and an observation model that generates the observed values from the state variable.

In this article, we discuss the reading notes.

Reading notes for Iwanami Data Science Series “Time Series Analysis”

Reading notes are provided below.

State Space Model

1. state space model

Overview
A state space model is a model for handling many time series models in a unified manner
Handles problems such as time-series prediction, interpolation, component decomposition, and parameter estimation as state estimation problems

(1) What is a state? Prediction and the state of a time series
A record of a phenomenon that fluctuates with time, such as temperature or stock prices, is called a time series
The observed value at time n is denoted yn
In a physical system, past information is conveyed to the future via the present
Past information relevant to future movements is aggregated at the present time
The aggregated information is called the state
Given the state, the future can be predicted from it alone, and past information can be discarded

(2) State Space Model
How is a time series represented using states?
How is the state xn at the next time expressed when the state xn-1 is known?
State Space Model, Observation Model, System Model
System Model
xn = Fn(xn-1) + Gn(vn)
Gn(vn): system noise term (vn is the system noise)
Observation model
yn = Hn(xn) + wn, where wn is the observation noise
State Space Model
Coupled observation model and system model
General form
System model
xn ~ qn(·|xn-1)
Observation model
yn ~ rn(·|xn)
Example 1: Linear-Gaussian state-space model
The functions Fn, Gn, Hn are linear functions
Fn(x) = Fnx, Gn(v) = Gnv, Hn(x) = Hnx
System model
xn = Fnxn-1 + Gnvn
Observation model
yn = Hnxn + wn
Example 2: AR (autoregressive) model
Second-order AR (autoregressive) model
yn = a1yn-1 + a2yn-2 + vn, vn ~ N(0, σ²)
This can be written as a linear-Gaussian state-space model
Dividing a time series into frequency components is a popular way to capture its variability characteristics
When a Fourier transform is used to obtain the periodogram, many peaks appear, making it difficult to determine which peaks are meaningful
Smoothing the periodogram has a strong empirical component
Once the order m and coefficients of the AR model are estimated, the power spectrum at each frequency f is obtained automatically
The AR order must be at least 2k for the spectrum to have k peaks
Example 3: Component Decomposition Model
A time series yn is expressed as the sum of J components xn(j), j = 1, …, J, and observation noise wn
yn = xn(1) + … + xn(J) + wn
Each component xn(j) is represented by a state-space model
System model
xn(j) = F(j)xn-1(j) + G(j)vn(j)
Observation model
yn = Hxn + wn
State-space model for the combined state
xn = Fxn-1 + Gvn
F, G, H are block matrices assembled from the F(j), G(j), H(j)
Example 4: Time-varying coefficient model
In the autoregressive model, the autoregressive coefficients aj are replaced by coefficients an,j that vary with time
an,j = an-1,j + vn,j, vn,j ~ N(0, τ²)
Combining the autoregressive model with the random walk model described in “Overview of Random Walks, Algorithms, and Examples of Implementations” for the coefficient changes gives a state-space model
xn = Fxn-1 + Gvn
vn ~ N(0, τ²)
yn = Hnxn + wn
wn ~ N(0, σ²)
Example 5: Time-varying variance model
The time series yn follows a normal distribution, but its variance changes from moment to moment: yn = εn, εn ~ N(0, σn²)
The variance σn² changes with time: log σn² = log σn-1² + vn
Taking the state to be xn = log σn², the state-space model is
xn = xn-1 + vn
yn = exp(xn/2)wn
wn follows the standard normal distribution
Stationary time series model
Stochastic structures such as mean, variance, and covariance functions do not change with time
Non-stationary time series model
Those for which stationarity does not hold.
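To make the state-space form of Examples 1 and 2 concrete, the following R sketch simulates a second-order AR model written as xn = Fxn-1 + Gvn, yn = Hxn. The coefficients a1 = 0.9, a2 = -0.5 and the noise variance are illustrative assumptions, not values from the book.

```r
# Sketch: an AR(2) model simulated via its state-space form.
# Coefficients and noise variance are illustrative assumptions.
set.seed(1)
a1 <- 0.9; a2 <- -0.5; sigma <- 1
Fmat <- matrix(c(a1, a2,
                 1,  0), nrow = 2, byrow = TRUE)  # transition matrix F
Gmat <- matrix(c(1, 0), nrow = 2)                 # noise loading G
Hmat <- matrix(c(1, 0), nrow = 1)                 # observation matrix H
n <- 200
x <- matrix(0, 2, n)                              # state x_n = (y_n, y_{n-1})'
y <- numeric(n)
for (t in 2:n) {
  x[, t] <- Fmat %*% x[, t - 1] + Gmat * rnorm(1, sd = sigma)  # system model
  y[t]   <- Hmat %*% x[, t]                                    # observation model
}
plot(y, type = "l", main = "AR(2) simulated via its state-space form")
```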

2. filtering – sequential estimation of states

(1) State estimation problem

Given a time series of observations Yj = {y1, …, yj} and a state-space model
The problem of estimating the state xn is called “state estimation”
Depending on the relationship between the last time j of the observations and the time n of the state to be estimated, there are three cases
For n > j: prediction
For n = j: filter
For n < j: smoothing
Not only the predicted value of a state, but also its variance or standard deviation, must be known in order to help make a rational decision.
Even if the forecast for tomorrow’s temperature is the same (10 degrees Celsius), the judgment will differ depending on whether the standard deviation is 1 degree or 10 degrees
Predicted values and variances are also needed to compute the likelihood

(2) Sequential filter and smoothing

Given an observation Yj, the problem of state estimation becomes the problem of finding the conditional distribution p(xn|Yj) of state xn under Yj
By the Markov property of the system model, p(xn|xn-1,Yn-1) = p(xn|xn-1)
Can be calculated sequentially using the following formulas
One-step-ahead prediction
Given the filter distribution at time n-1, the predictive distribution at time n can be computed
Filter
Obtained from the predictive distribution and the latest observation via Bayes’ rule
A sequential filter is obtained by repeating the prediction and filter steps
Smoothing Algorithm
Fixed interval smoothing
Fixed lag smoothing
Fixed point smoothing

(3) Linear and Gaussian state-space models and Kalman filter

The model is called a “linear-Gaussian state-space model” when the functions Fn, Gn, and Hn are linear and the white noises vn and wn follow normal distributions
Kalman filter algorithm
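The notes only name the algorithm, so here is a hedged, hand-written scalar Kalman filter for the local-level model as a reference. The Nile series (built into R) stands in for real data, and the variances q = 1469, r = 15099 are roughly the maximum-likelihood values often quoted for that series; both are assumptions, not numbers from the book.

```r
# Sketch: scalar Kalman filter for the local-level model
#   x_n = x_{n-1} + v_n, v_n ~ N(0, q);  y_n = x_n + w_n, w_n ~ N(0, r)
kalman_filter <- function(y, q, r, x0 = 0, P0 = 1e7) {
  n <- length(y)
  xf <- numeric(n); Pf <- numeric(n)
  x <- x0; P <- P0
  for (t in seq_len(n)) {
    xp <- x; Pp <- P + q            # one-step-ahead prediction
    K  <- Pp / (Pp + r)             # Kalman gain
    x  <- xp + K * (y[t] - xp)      # filter (measurement update)
    P  <- (1 - K) * Pp
    xf[t] <- x; Pf[t] <- P
  }
  list(mean = xf, var = Pf)         # filtered means and variances
}
res <- kalman_filter(as.numeric(Nile), q = 1469, r = 15099)
```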

(4) Likelihood calculation and parameter tests for time series models

Given a time series YN = {y1, …, yN}, the likelihood of a time series model with parameter θ is L(θ) = fN(YN|θ), using the joint probability density function fN. By the product decomposition L(θ) = ∏n p(yn|Yn-1, θ), the likelihood is obtained from the one-step-ahead predictive distributions computed by the filter.
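In practice this likelihood is maximized numerically. A hedged sketch using dlm::dlmMLE, with the variances parameterized on the log scale so the optimizer works unconstrained (the Nile data and initial values are assumptions for illustration):

```r
# Sketch: maximum-likelihood estimation of a local-level model's noise
# variances via the prediction error decomposition (dlm::dlmMLE).
library(dlm)
build <- function(par) {
  dlmModPoly(order = 1, dV = exp(par[1]), dW = exp(par[2]))
}
fit <- dlmMLE(as.numeric(Nile), parm = c(0, 0), build = build)
exp(fit$par)   # estimated observation and system noise variances
```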

(5) Component decomposition by state-space models: seasonal adjustment

Applications of linear and Gaussian state-space models
Component decomposition of economic time series
Monthly economic time series show medium- to long-term trends such as upswings and downswings, and annual cyclical fluctuations that repeat similar patterns every year.
A method to decompose an observed time series into a long-term component called the trend, a seasonal component, and irregular (chance) variation.
Seasonal adjustment method
Typical Trend Models
Second-order trend model: tn = 2tn-1 - tn-2 + vn
The “seasonal component” is the pattern of variation in the time series that appears repeatedly every year.
Add a “day-of-week effect” term
Day of the week effect in the state space model
Dealing with missing values
If an observation is missing in the sequential filter, simply omit the corresponding step
Smoothing after filtering interpolates the missing values, as in the sketch below
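Both dlm and KFAS accept NA observations; the filter simply skips the update at missing steps, and smoothing interpolates them. A hedged sketch with dlm, where the deleted spans and the variances are illustrative assumptions:

```r
# Sketch: interpolating missing values by filtering + smoothing with dlm.
library(dlm)
y <- as.numeric(Nile)
y[c(21:40, 61:80)] <- NA                       # artificially remove observations
mod  <- dlmModPoly(order = 1, dV = 15099, dW = 1469)
filt <- dlmFilter(y, mod)                      # missing steps are skipped
smo  <- dlmSmooth(filt)                        # smoothing fills the gaps
plot(y); lines(dropFirst(smo$s), col = "red")  # interpolated level estimate
```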

(6) Filtering nonlinear and non-Gaussian state-space models

Overview
In linear and Gaussian state-space models, the conditional distribution of states p(xn|Yj) is normally distributed and can be efficiently computed by the Kalman filter

1. analytical approximation

If the model is nonlinear or the noise distribution is non-Gaussian, the Kalman filter cannot be applied directly and another approach is needed
Gaussian approximation
Mixed Gaussian approximation
Approximated by sum of M normal distributions φi
Kalman filter can be applied
Computational complexity is enormous

2. numerical approximation

Staircase function approximation
Approximated by a staircase function with n quantiles
Can approximate arbitrary distributions with fairly high accuracy when the number of quantiles is several hundred or more
Difficult to apply to state-space models with more than 4 dimensions

3. particle approximation

Particle approximation
Conceptual diagram
Represents a distribution using a large number of particles that can be considered independently generated
Many particles are concentrated where the density function is high

(7) Particle filter

Overview
Particle Filters
Approximates the predictive, filter, and smoothing distributions using m particles
Create a staircase function that steps up by 1/m at each particle point and call it the “empirical distribution function”
It converges to the true distribution function as the number of particles m increases
Depending on the complexity of the model and the required accuracy, a number of particles m between 1,000 and 100,000 is used
Algorithm for a particle filter to sequentially generate particles that approximate the predicted and filtered distributions
Particles {pn(1), …, pn(m)} following the predictive distribution can be generated from the filter particles {fn-1(1), …, fn-1(m)} of the previous period
Filter particles {fn(1), …, fn(m)} are obtained from the particles {pn(1), …, pn(m)} of the predictive distribution

General Prediction
The prediction step starts from the conditional distribution p(xn-1|Yn-1) of the state xn-1 one period ago
Assume that m particles {fn-1(1), …, fn-1(m)} following that distribution and m particles {vn(1), …, vn(m)} following the distribution of the system noise vn are given
Define the new particles pn(j) by pn(j) = Fn(fn-1(j), vn(j))

Filter
Filter Steps
Why is resampling necessary?
If only the prediction and filter steps are repeated without resampling, the weights of many particles will be zero and the distribution will be degenerate
The purpose of resampling is to re-express the particles {pn(1), …, pn(m)} with weights {αn(1), …, αn(m)} as an empirical distribution function with equal weights
Rigorous random sampling is not always necessary
Stratified sampling gives a more accurate re-approximation than simple random sampling

Particle Filter Algorithm
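The notes later promise an R implementation, so here is a hedged sketch of a bootstrap particle filter with stratified resampling for the local-level model. The particle count, variances q and r, and initial spread are illustrative assumptions.

```r
# Sketch: bootstrap particle filter with stratified resampling for
#   x_n = x_{n-1} + v_n, v_n ~ N(0, q);  y_n = x_n + w_n, w_n ~ N(0, r)
particle_filter <- function(y, m = 10000, q = 1, r = 1) {
  n <- length(y)
  f <- rnorm(m, mean = y[1], sd = 100)   # initial filter particles (diffuse guess)
  med <- numeric(n)
  for (t in seq_len(n)) {
    p <- f + rnorm(m, 0, sqrt(q))                 # prediction: apply system model to each particle
    alpha <- dnorm(y[t], mean = p, sd = sqrt(r))  # weights from the observation model
    alpha <- alpha / sum(alpha)
    u <- (seq_len(m) - runif(m)) / m              # stratified uniforms, one per 1/m stratum
    idx <- findInterval(u, cumsum(alpha)) + 1     # invert the weighted empirical distribution
    f <- p[pmin(idx, m)]                          # resampled, equally weighted filter particles
    med[t] <- median(f)                           # filtered median
  }
  med
}
med <- particle_filter(as.numeric(Nile), m = 10000, q = 1469, r = 15099)
```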

(8) Smoothing

Application of particle filter to smoothing

(9) Example: estimating the variance of seismic waves

Example of seismic wave observation record
The arrival of the P- and S-waves causes a significant change in the variance
The changing variance is estimated via the variable transformation zn = log((y2n-1² + y2n²)/2)
The magnitude of the variance of the original series corresponds to the level of the transformed series
Even if the original data are normally distributed, the noise after the transformation follows a double exponential distribution
Trend Analysis

Effect of the number of particles m
When the number of particles is set to 1,000
Even with m = 100 the median can be estimated, but the shape of the distribution can no longer be determined

What can be done with a nonlinear state space model?

(1) Self-organizing state-space model

Simultaneous state and parameter estimation

(2) Non-linear smoothing

(3) Non-Gaussian distribution model

Missing value handling
Is resampling necessary?
Is faithful random sampling necessary in resampling?
Should the lag L be large?

State Space Model with R

1. the dlm package

Overview
R package developed by Giovanni Petris
Handles dynamic linear models, i.e., linear state-space models with normally distributed noise
Model Notation
yt = Ftθt + vt, vt ~ N(0, Vt)
yt is the observed value at time t
θt is the state at time t
vt is the observation noise
Ft is the coefficient matrix
Vt is the variance-covariance matrix of the observation noise
θt = Gtθt-1 + wt, wt ~ N(0, Wt)
wt is the system noise
Gt is the coefficient matrix
Wt is the variance-covariance matrix of the system noise
θ0 ~ N(m0, C0)
m0 is the expected value vector of the initial state
C0 is the variance-covariance matrix of the initial state

(1) Local level model with dlm

Sample data
Model with random walk only at the level (first-order difference model)
yt,θt,Ft,Gt,Vt,Wt are all scalars
Ft,Gt,Vt,Wt are all constant regardless of time t (F,G,V,W)
Use the dlmModPoly function when dealing with local-level models in dlm
It constructs a dynamic linear model expressed in polynomial form
Setting the order argument to 1 specifies a local-level model
dlm builds models as objects and operates on them
The variance of the observation noise (dV) and the variance of the system noise (dW) must be given
Provisionally set both to 1, as in the sketch below
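A minimal sketch following the notes, with the Nile series standing in for the sample data (an assumption) and dV = dW = 1 as the provisional values:

```r
# Local-level model with dlmModPoly, provisionally setting dV = dW = 1.
library(dlm)
mod  <- dlmModPoly(order = 1, dV = 1, dW = 1)
filt <- dlmFilter(as.numeric(Nile), mod)   # Kalman filter on stand-in data
plot(as.numeric(Nile), type = "l")
lines(dropFirst(filt$m), col = "red")      # filtered level estimate
```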

(2) Seasonal adjustment model by dlm
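A hedged sketch of trend plus seasonal decomposition in dlm, adding a second-order polynomial (trend) model and a monthly seasonal model; log(AirPassengers) and the variance values are illustrative assumptions:

```r
# Trend + seasonal decomposition with dlm (assumed data and variances).
library(dlm)
y   <- log(AirPassengers)
mod <- dlmModPoly(order = 2, dV = 1e-2, dW = c(0, 1e-3)) +
       dlmModSeas(frequency = 12, dW = c(1e-3, rep(0, 10)))
smo <- dlmSmooth(y, mod)
trend    <- dropFirst(smo$s[, 1])   # smoothed trend component
seasonal <- dropFirst(smo$s[, 3])   # smoothed seasonal component
```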

2. the KFAS package

(1) Seasonal adjustment model by KFAS
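A hedged sketch of the same seasonal adjustment in KFAS; the NA entries mark variances to be estimated by maximum likelihood with fitSSM, and log(AirPassengers) with the given initial values are assumptions:

```r
# Seasonal adjustment with KFAS (assumed data and initial values).
library(KFAS)
y   <- log(AirPassengers)
mod <- SSModel(y ~ SSMtrend(degree = 2, Q = list(NA, NA)) +
                   SSMseasonal(period = 12, Q = NA), H = NA)
fit <- fitSSM(mod, inits = rep(-8, 4), method = "BFGS")
out <- KFS(fit$model)          # Kalman filtering and smoothing
# out$alphahat holds the smoothed trend and seasonal states
```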

(2) Poisson distribution model
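A hedged sketch of a Poisson observation model in KFAS, where the linear-Gaussian Kalman filter no longer applies directly and KFAS works through an approximating Gaussian model and importance sampling internally; the toy count series is an assumption:

```r
# Poisson distribution model in KFAS (toy count data as an assumption).
library(KFAS)
set.seed(1)
counts <- rpois(120, lambda = exp(1 + sin(2 * pi * (1:120) / 12)))
mod <- SSModel(counts ~ SSMtrend(degree = 1, Q = NA),
               distribution = "poisson")
fit <- fitSSM(mod, inits = -5, method = "BFGS")
out <- KFS(fit$model, smoothing = "mean")   # smoothed intensity estimates
```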

3. conclusion

Parameter estimation for state-space model

Try to implement a particle filter

Program in R language
Program Description
Try to run it
Additional Explanation
Try using the Cauchy distribution
The concept of “data assimilation”
What’s ahead

Comparison of calculation methods for estimation

Applications of State Space Models to Marketing

1. market response model basic structure

2. model extensions

(1) Evolution and deepening
(2) Making regression coefficients time-varying
(3) Incorporating latent structure

3. analysis examples

4. summary

Causal Inference with VAR Models

1. data preprocessing

Interpolation of missing values
Ensure stationarity

2. VAR model estimation and lag selection

Estimation of VAR model
Lag selection for VAR model

3. estimation of constrained VAR models

4. causal estimation and impulse response functions

Causal Estimation
Impulse response function
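A hedged sketch of this section's workflow using the vars package, with the Canada dataset shipped with vars standing in for the book's data (an assumption):

```r
# VAR workflow: lag selection, estimation, Granger causality, impulse responses.
library(vars)
data(Canada)                                   # quarterly Canadian macro series
VARselect(Canada, lag.max = 8, type = "const") # lag selection by AIC/BIC etc.
fit <- VAR(Canada, p = 2, type = "const")      # estimate the VAR(2)
summary(fit)
causality(fit, cause = "e")                    # Granger causality test for "e"
plot(irf(fit, impulse = "e", n.ahead = 10))    # impulse response functions
```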
In lieu of a summary

Hidden Markov and state-space models

Reconstructing the Shape of Objects from Time-Series Data: An Invitation to Time-Domain Astronomy
What is an accretion disk?
Oscillation Phenomena Reflecting the Shape of the Accretion Disk
New Tomographic Techniques for Accretion Disks
The accretion disk of V455 Andromedae

Simulation and Data Science

Simulation, Data Assimilation, and Emulation

1. interpolation and extrapolation
2. data assimilation
3. state-space models and sequential filters
4. emulation
5. coupling with DNN

Use of Emulators

Weather Forecasting and Data Science

The Quiet Revolution

1. how weather forecasting works
2. data assimilation
3. typhoon forecast and mobile observation
The Weather Forecaster’s Job

Shaking Proteins and Aging Me

Proteins Fold
Proteins Fluctuate
Proteins Misfold
Protein Simulation
Multivariate analysis of simulations
Extract dynamic information from time series
Generalized Eigenvalue Problem

What is an “inverse problem” in molecular simulation?

Realistic SimCity Dreams

1. SimCity as a problem-solving tool
2. challenges in realizing a realistic SimCity
3. technology for realistic SimCity
4. conclusion

Dreams, Brain, and Machine Learning

Physiology of Dreams
Dream Functions and Freud’s Dream Theory
Dreams and the “Generative” Model
Dream Content and the Brain
Brain Decoding
Experiments in Dream Brain Measurement
Decoding of Dreams
Conclusion
