State Space Model with Clojure: Implementation of Kalman Filter
The state-space model described in “Time Series Data Analysis (1) State Space Model” was mainly used as a tool for analyzing changes along a time axis, but, as described in “Difference between Hidden Markov Model and State Space Model and Parameter Estimation of State Space Model“, it can also be applied as a model of continuous state changes that are not tied to a time axis. In this article, we describe a Clojure implementation of the Kalman filter, one of the applications of the state-space model.
The Kalman filter is an infinite impulse response filter used to estimate time-varying quantities (e.g., the position and velocity of an object) from discrete observations containing errors. Because it is easy to use, it is applied in a wide range of engineering fields such as radar and computer vision. Typical applications include fusing error-prone information from built-in accelerometers and GPS to estimate the continuously changing position of a vehicle, as well as satellite and rocket control.
Like the hidden Markov model described previously, the Kalman filter is based on a state-space model with hidden states and observation data generated from them; here, however, the states are continuous, and changes in the state variables are described statistically using noise that follows a Gaussian distribution.
The linear dynamical system (LDS) model used in the Kalman filter is expressed by the following equation:
\[p(y,x)=p(y_1|x_1)p(x_1)\displaystyle\prod_{t=2}^Tp(y_t|x_t)p(x_t|x_{t-1})\quad(1)\]
In the LDS, the likelihood p(y_t|x_t), the conditional distribution p(x_t|x_{t-1}), and the prior distribution p(x_1) all take the form of (multivariate) normal distributions, so the posterior distribution can be computed analytically by a forward-backward (message-passing) computation.
We now define the stochastic generative model as follows.
\[x_0\sim Norm(\mu_0,V_0)\\x_t=A\cdot x_{t-1}+\delta_{t-1}\quad \delta_{t-1}\sim Norm(0,Q)\\y_t=C\cdot x_t+\epsilon_t\quad \epsilon_t\sim Norm(0,R)\quad(2)\]
The respective parameters are as follows.
- μ0: Mean of the initial state
- V0: Variance of the initial state
- A: Transition matrix
- Q: Transition variance matrix
- C: Observation matrix
- R: Observation variance matrix
As an example, consider the task of estimating the trajectory of a noisy periodic motion in a two-dimensional latent space that gives rise to a 20-dimensional signal in the observation space. To make the motion periodic, we assume that the transition matrix is a simple rotation, as shown below, and that both the transition and observation covariances are diagonal.
\[A=\begin{bmatrix} \cos\omega&-\sin\omega\\ \sin\omega&\cos\omega\end{bmatrix}\quad Q=qI_2\quad C_{\cdot,k}\sim Dirichlet(0.1,\dots,0.1)\ (k=1,2)\quad R=rI_D\quad(3)\]
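For example, with ω = 4π/100 ≈ 0.1257 (the value used in the code below), each step rotates the latent state by 7.2°, so the state completes two full revolutions over the T = 100 time points, and the transition matrix is approximately
\[A\approx\begin{bmatrix}0.9921&-0.1253\\0.1253&0.9921\end{bmatrix}\]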
See “Getting Started with Clojure” for the Clojure environment settings and “Probabilistic Programming with Clojure” for the probabilistic programming environment settings.
First, create a project with “lein new app gr-anglican” (the project name is arbitrary) and edit the project.clj file in the project folder as follows:
(defproject gr-anglican "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "EPL-2.0 OR GPL-2.0-or-later WITH Classpath-exception-2.0"
            :url "https://www.eclipse.org/legal/epl-2.0/"}
  :dependencies [[org.clojure/clojure "1.11.1"]
                 [anglican "1.1.0"]
                 [org.clojure/data.csv "1.0.1"]
                 [net.mikera/core.matrix "0.62.0"]
                 ;[plotly-clj "0.1.1"]
                 ;[org.nfrac/cljbox2d "0.5.0"]
                 ;[quil "3.1.0"]
                 ]
  :main ^:skip-aot gr-anglican.core
  :target-path "target/%s"
  :plugins [[org.clojars.benfb/lein-gorilla "0.6.0"]]
  :profiles {:uberjar {:aot :all
                       :jvm-opts ["-Dclojure.compiler.direct-linking=true"]}})
At the top of the directory you created, type “lein gorilla” in the terminal to start the Gorilla REPL server, then open the displayed URL (e.g. http://127.0.0.1:49259/worksheet.html; the port is generated automatically) in your browser to run Gorilla REPL.
First, set up the main namespace and load the libraries.
(ns kalman-test
  (:require [gorilla-plot.core :as plot]
            [gorilla-repl.image :as image]
            [clojure.java.io :as io]
            [clojure.core.matrix :as mat
             :refer [matrix identity-matrix zero-vector
                     shape inverse transpose
                     mul mmul add sub div]]
            :reload)
  (:import [javax.imageio ImageIO])
  (:use [anglican core emit runtime stat]))
Next, the input data and model are set up. Set the parameters of equation (3) to ω=0.1256…, q=0.01, and r=0.01.
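The listing below calls a rotation-matrix helper that is not defined anywhere in the article; a minimal sketch of such a helper (assuming it builds the 2×2 rotation matrix of equation (3)) would be:
;; Helper assumed by the code below (not shown in the original listing):
;; builds the 2x2 rotation matrix of equation (3) for a given angle.
(defn rotation-matrix
  [angle]
  (matrix [[(Math/cos angle) (- (Math/sin angle))]
           [(Math/sin angle) (Math/cos angle)]]))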
;; Utility
(mat/set-current-implementation :vectorz)
(def +plot-colors+ ["#05A" "#A05" "#0A5" "#50A" "#A50" "#5A0"])
;; number of time points
(def +T+ 100)
;; dimension of observation space
(def +D+ 20)
;; dimension of latent space
(def +K+ 2)
;; parameters
(def +init-state+ [0. 1.])
(def +omega+ (/ (* 4.0 Math/PI) +T+))
(def +trans-matrix+ (rotation-matrix +omega+))
(def +q+ 0.01)
(def +trans-cov+ (mul +q+ (identity-matrix +K+)))
(def +obs-matrix+
  (transpose
   (matrix
    (repeatedly 2 #(sample* (dirichlet (repeat +D+ 0.1)))))))
(def +r+ 0.01)
(def +obs-cov+ (mul +r+ (identity-matrix +D+)))
(def +init-mean+ +init-state+)
(def +init-cov+ +trans-cov+)
(def +parameters+ [+init-mean+ +init-cov+
                   +obs-matrix+ +obs-cov+
                   +trans-matrix+ +trans-cov+])
;; sample state sequence and observations
(def +states+
  (let [trans-dist (mvn (zero-vector +K+) +trans-cov+)]
    (reduce (fn [states _]
              (conj states
                    (add (mmul +trans-matrix+ (peek states))
                         (sample* trans-dist))))
            [+init-state+]
            (range 1 +T+))))
(def +observations+
  (let [obs-dist (mvn (zero-vector +D+) +obs-cov+)]
    (map (fn [state]
           (add (mmul +obs-matrix+ state)
                (sample* obs-dist)))
         +states+)))
Plot the state of the latent space.
(defn get-limits
  ([points pad]
   (let [xs (mat/reshape points [(mat/ecount points)])
         x-min (reduce min xs)
         x-max (reduce max xs)
         x-pad (* pad (- x-max x-min))]
     [(- x-min x-pad) (+ x-max x-pad)]))
  ([points]
   (get-limits points 0.1)))
(def +x-limits+ (get-limits +states+))
(def +y-limits+ (get-limits +observations+))
;; plot latent states
(plot/list-plot (map #(into [] %) +states+)
                :joined true
                :color (first +plot-colors+)
                :plot-range [+x-limits+ +x-limits+])
The resulting plot shows noisy, roughly circular data in the latent space.
Plot the data for the first six dimensions of the observation space.
;; plot observations
(apply plot/compose
       (for [[ys color] (map vector
                             (transpose +observations+)
                             +plot-colors+)]
         (plot/list-plot ys
                         :joined true
                         :color color
                         :plot-range [:all +y-limits+])))
Based on this observed data, the parameters of the latent space are estimated. First, we set up an inference query for the Kalman filter.
(with-primitive-procedures [mmul add sub shape matrix identity-matrix zero-vector]
  (defquery kalman
    "A basic Kalman smoother. Predicts a state sequence
    from the posterior given observations"
    [observations obs-matrix obs-cov
     trans-matrix trans-cov init-mean init-cov]
    (let [;; D is dimensionality of data,
          ;; K is dimensionality of latent space
          [D K] (shape obs-matrix)
          ;; prior on observation noise
          obs-dist (mvn (zero-vector D) obs-cov)
          ;; prior on initial state
          start-dist (mvn init-mean init-cov)
          ;; prior on transition noise
          trans-dist (mvn (zero-vector K) trans-cov)]
      (matrix
       (reduce (fn [states obs]
                 (let [;; sample next state
                       prev-state (peek states)
                       state (if prev-state
                               (add (mmul trans-matrix prev-state)
                                    (sample trans-dist))
                               (sample start-dist))]
                   ;; observe next data point (when available)
                   (observe (count states)
                            obs-dist
                            (sub (mmul obs-matrix state) obs))
                   ;; append state to sequence and continue with next obs
                   (conj states state)))
               ;; start with empty sequence
               []
               ;; loop over data
               observations)))))
Next, we run inference with this query; it takes 9498.048292 msecs in a Mac M1 environment.
(def samples
  (->> (doquery :smc
                kalman
                [+observations+
                 +obs-matrix+ +obs-cov+
                 +trans-matrix+ +trans-cov+
                 +init-mean+ +init-cov+]
                :number-of-particles 1000)
       (take 10000)
       doall
       time))
Calculate the mean and variance parameters of the posterior distribution.
(defn empirical-moments [predict-id samples]
  (let [weighted-states (collect-by predict-id samples)]
    {:mean (empirical-mean weighted-states)
     :var (empirical-variance weighted-states)}))
(def moments (empirical-moments :result samples))
Output the results.
(defn plot-moments [moments color]
  (let [mean (:mean moments)
        err (mul 2 (mat/sqrt (:var moments)))
        [T K] (shape mean)]
    (plot/compose
     (plot/list-plot (into [] (mat/slice mean 1 0))
                     :joined true
                     :plot-range [:all +x-limits+]
                     :color color)
     (plot/list-plot (into [] (mat/slice (add mean err) 1 0))
                     :plot-range [:all +x-limits+]
                     :joined true
                     :color (second +plot-colors+))
     (plot/list-plot (into [] (mat/slice (sub mean err) 1 0))
                     :plot-range [:all +x-limits+]
                     :joined true
                     :color (second +plot-colors+))
     (plot/list-plot (into [] (mat/slice mean 1 1))
                     :plot-range [:all +x-limits+]
                     :joined true
                     :color color)
     (plot/list-plot (into [] (mat/slice (add mean err) 1 1))
                     :plot-range [:all +x-limits+]
                     :joined true
                     :color (second +plot-colors+))
     (plot/list-plot (into [] (mat/slice (sub mean err) 1 1))
                     :plot-range [:all +x-limits+]
                     :joined true
                     :color (second +plot-colors+)))))
The plot shows, for each latent dimension, the posterior mean over time together with a ±2σ band, confirming that the noisy data is summarized by a mean and variance.
(plot-moments moments (nth +plot-colors+ 0))
For comparison, we perform the computation with the forward-backward algorithm (FBA), the exact inference algorithm used for hidden Markov models, here in its linear-Gaussian form (the Kalman filter and smoother).
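For reference, the forward (filtering) pass implemented by kalman-filter below follows the standard Kalman predict/update recursion for the model in equation (2), where m^p_t, V^p_t, m^f_t, V^f_t correspond to mp, Vp, mf, Vf in the code comments:
\[m^p_t=A\,m^f_{t-1},\quad V^p_t=A\,V^f_{t-1}A^T+Q\\K_t=V^p_tC^T(C\,V^p_tC^T+R)^{-1}\\m^f_t=m^p_t+K_t(y_t-C\,m^p_t),\quad V^f_t=(I-K_tC)\,V^p_t\]
The backward pass then combines these filtered estimates into smoothed estimates conditioned on all T observations.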
;; Parameters (following Matlab implementation by Murphy)
;;
;; http://www.cs.ubc.ca/~murphyk/Software/Kalman/kalman.html
;;
;; m0 init-mean
;; V0 init-cov
;; C obs-matrix
;; R obs-cov
;; A trans-matrix
;; Q trans-cov
;; Filtering notation
;;
;; y observation y(t)
;; mf filtering expectation mf(t) = E[x(t) | y(1:t)]
;; Vf filtering covariance Vf(t) = Cov[x(t) | y(1:t)]
;; mp prior predictive expectation mp(t) = E[x(t) | y(1:t-1)]
;; Vp prior predictive covariance Vp(t) = Cov[x(t) | y(1:t-1)]
(defn kalman-filter
  [observations
   obs-matrix obs-cov
   trans-matrix trans-cov
   init-mean init-cov]
  (let [;; set-up some shorthands
        C obs-matrix
        Ct (transpose C)
        R obs-cov
        A trans-matrix
        At (transpose A)
        Q trans-cov
        I (apply identity-matrix (shape init-cov))
        z (zero-vector (first (shape obs-matrix)))]
    (loop [ys observations
           mp init-mean
           Vp init-cov
           mfs []
           Vfs []
           cs []]
      (if-let [y (first ys)]
        (let [S (add (mmul C (mmul Vp Ct)) R)
              Sinv (inverse S)
              ;; K(t) = Vp(t) Ct (C Vp(t) Ct + R)^-1
              K (mmul Vp (mmul Ct Sinv))
              ;; mf(t) = mp(t) + K(t) (y(t) - C mp(t))
              e (sub y (mmul C mp))
              mf (add mp (mmul K e))
              ;; Vf(t) = (I - K(t) C) Vp(t)
              Vf (mmul (sub I (mmul K C)) Vp)
              ;; c(t) = N(y(t) | C mp(t), C Vp(t) Ct + R)
              c (observe* (mvn z S) e)]
          (recur (rest ys)
                 ;; mp(t+1) = A mf(t)
                 (mmul A mf)
                 ;; Vp(t+1) = A Vf(t) At + Q
                 (add (mmul A (mmul Vf At)) Q)
                 (conj mfs mf)
                 (conj Vfs Vf)
                 (conj cs c)))
        [mfs Vfs cs]))))
;; Backward pass notation
;;
;; ms smoothing expectation ms(t) = E[x(t) | y(1:T)]
;; Vs smoothing covariance Vs(t) = Cov[x(t) | y(1:T)]
;; mpp predictive expectation mpp(t) = mp(t+1) = E[x(t+1) | y(1:t)]
;; (shifted by one relative to mp in filtering pass)
;; Vpp predictive covariance Vpp(t) = Vp(t+1) = Cov[x(t+1) | y(1:t)]
;; (shifted by one relative to Vp in filtering pass)
(defn kalman-backward
  [trans-matrix trans-cov mfs Vfs]
  (let [;; set-up shorthands
        A trans-matrix
        At (transpose A)
        Q trans-cov]
    ;; initialize the smoothed moments at time T with the filtered
    ;; moments (loop bindings are sequential, so mss and Vss must be
    ;; bound before mfs and Vfs are popped), then recurse backwards
    (loop [mss (list (peek mfs))
           Vss (list (peek Vfs))
           mfs (pop mfs)
           Vfs (pop Vfs)]
      (if (empty? mfs)
        [mss Vss]
        (let [;; mpp(t) = mp(t+1) = A mf(t)
              mpp (mmul A (peek mfs))
              ;; Vpp(t) = Vp(t+1) = A Vf(t) At + Q
              Vpp (add (mmul A (mmul (peek Vfs) At)) Q)
              ;; J(t) = Vf(t) At Vp(t+1)^-1
              J (mmul (peek Vfs) (mmul At (inverse Vpp)))
              Jt (transpose J)
              ;; ms(t) = mf(t) + J (ms(t+1) - mp(t+1))
              ms (add (peek mfs) (mmul J (sub (peek mss) mpp)))
              ;; Vs(t) = Vf(t) + J (Vs(t+1) - Vp(t+1)) Jt
              Vs (add (peek Vfs) (mmul J (mmul (sub (peek Vss) Vpp) Jt)))]
          (recur (conj mss ms)
                 (conj Vss Vs)
                 (pop mfs)
                 (pop Vfs)))))))
(defn kalman-smoother
  [observations
   obs-matrix obs-cov
   trans-matrix trans-cov
   init-mean init-cov]
  (let [[mfs Vfs cs] (kalman-filter observations
                                    obs-matrix obs-cov
                                    trans-matrix trans-cov
                                    init-mean init-cov)
        [mss Vss] (kalman-backward trans-matrix trans-cov mfs Vfs)]
    [mss Vss cs]))
Calculate the mean and variance parameters of the posterior distribution using these functions.
(def true-moments
  (time
   (let [[mean covar _]
         (kalman-smoother +observations+
                          +obs-matrix+ +obs-cov+
                          +trans-matrix+ +trans-cov+
                          +init-mean+ +init-cov+)]
     {:mean (matrix mean)
      :var (matrix (map mat/diagonal covar))})))
Comparison of stochastic generative model and FBA. Blue is the result with the stochastic model and green is the result with FBA.
(plot/compose
 (plot/list-plot (map #(into [] %) (:mean true-moments))
                 :joined true
                 :plot-range [+x-limits+ +x-limits+]
                 :color (nth +plot-colors+ 2))
 (plot/list-plot (map #(into [] %) (:mean moments))
                 :joined true
                 :plot-range [+x-limits+ +x-limits+]
                 :color (nth +plot-colors+ 0)))
We can confirm that the trajectory of the original data is reproduced and that the results of the stochastic model and FBA are consistent.
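As a rough numerical check of this agreement, the element-wise mean absolute difference between the two posterior means can be computed directly (a sketch, assuming both means are T×K arrays of matching shape; since the SMC estimate is stochastic, the exact figure will vary from run to run):
;; Sketch: element-wise mean absolute difference between the SMC
;; posterior mean and the exact smoother mean (both T x K).
(let [m-smc (matrix (:mean moments))
      m-exact (matrix (:mean true-moments))]
  (/ (mat/esum (mat/abs (sub m-smc m-exact)))
     (mat/ecount m-smc)))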
Plotting the posterior moments computed with the FBA in the same way shows similar results.
(plot-moments true-moments (nth +plot-colors+ 2))
We now modify the original model to perform parameter estimation. One of the advantages of probabilistic programming systems is that encoding assumptions about the data is almost trivial. In this case, instead of estimating the full matrices, we reduce the complexity of the problem by writing the rotation form of A and the diagonal form of Q directly into the program code.
(with-primitive-procedures [mmul mul add sub shape matrix identity-matrix zero-vector rotation-matrix]
  (defquery kalman-params
    "A basic Kalman smoother. Predicts a state sequence from the posterior
    given observations"
    [observations obs-matrix obs-cov init-mean init-cov]
    (let [;; D is dimensionality of data,
          ;; K is dimensionality of latent space
          [D K] (shape obs-matrix)
          ;; prior on observation noise
          obs-dist (mvn (zero-vector D) obs-cov)
          ;; prior on initial state
          start-dist (mvn init-mean init-cov)
          ;; sample transition matrix
          omega (sample (gamma 1.0 10.0))
          trans-matrix (rotation-matrix omega)
          ;; sample transition covariance
          q (sample (gamma 1.0 100.0))
          trans-cov (mul q (identity-matrix K))
          ;; prior on transition noise
          trans-dist (mvn (zero-vector K) trans-cov)]
      ;; loop over observations
      (reduce (fn [states obs]
                (let [;; sample next state
                      prev-state (peek states)
                      state (if prev-state
                              (add (mmul trans-matrix prev-state)
                                   (sample trans-dist))
                              (sample start-dist))]
                  ;; observe next data point (when available)
                  (observe (count states)
                           obs-dist
                           (sub (mmul obs-matrix state) obs))
                  ;; append state to sequence and continue with next obs
                  (conj states state)))
              ;; start with empty sequence
              []
              ;; loop over data
              observations)
      ;; output parameters
      {:omega omega
       :q q})))
We now run the sample-based estimation; it takes 9909.22075 msecs.
(def param-samples
  (->> (doquery :smc
                kalman-params
                [+observations+
                 +obs-matrix+ +obs-cov+
                 +init-mean+ +init-cov+]
                :number-of-particles 10000)
       (take 10000)
       doall
       time))
Output the estimate of ω. The result “[0.12566370614359174 {:mean 0.12840811816524908, :var 8.02931252691968E-5}]” shows that the true value of 0.1256… is recovered, together with its variance.
[+omega+ (empirical-moments (comp :omega :result) param-samples)]
Then output the estimate of q. The result “[0.01 {:mean 0.013420267138197581, :var 1.564053612139262E-5}]” shows that the true value of 0.01 is recovered, together with its variance.
[+q+ (empirical-moments (comp :q :result) param-samples)]
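For a quick visual check of the posterior over ω, one can also histogram the raw samples (a sketch: this ignores the SMC importance weights, so it is only indicative):
;; Sketch: histogram of the (unweighted) posterior samples of omega.
(plot/histogram (map (comp :omega :result) param-samples)
                :bins 40)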