Weather Forecasting and Data Science

Summary

From the Iwanami Data Science Series: “Time Series Analysis: State Space Models, Causal Analysis, and Business Applications”. In the previous article, we discussed data assimilation and emulation as techniques for integrating machine learning with simulation. In this article, we discuss weather forecasting and data science.

Weather Forecasts and Simulations

There are several methods used by weather agencies and other institutions to produce weather forecasts. One of the main ones is the kinematic method, which predicts the future rainfall distribution by extrapolating from the current one. Weather radar now covers all of Japan, and rainfall conditions can be monitored almost in real time on the JMA website. The movement of rain clouds can therefore be determined from differences in the time direction, and where those rain clouds will move in the future can be predicted.

The forecasts that the JMA publishes on its website as very-short-range forecasts are made in this way. However, kinematic forecasts are effective only for very short lead times, within a few hours, because the winds that drive the rain clouds themselves change from moment to moment due to various physical processes. Forecasts more than a few hours ahead are therefore made by constructing a discretized physical model based on a system of partial differential equations describing atmospheric flow, such as the basic equations of fluid dynamics, and obtaining approximate solutions by numerical calculation. This is referred to here as the physics-based method, as distinguished from the kinematic method.
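As a concrete illustration, the following is a minimal sketch of kinematic extrapolation, assuming synthetic radar frames: a single motion vector is estimated between two consecutive frames by brute-force correlation matching, and the latest frame is advected forward. The single shared motion vector and the periodic boundaries are toy simplifications; operational nowcasting estimates dense motion fields and handles boundaries more carefully.

```python
import numpy as np

def estimate_motion(prev_frame, curr_frame, max_shift=5):
    """Estimate a single (dy, dx) motion vector between two radar
    rainfall frames by brute-force search over integer shifts,
    maximizing the overlap with the current frame."""
    best, best_shift = -np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(prev_frame, dy, axis=0), dx, axis=1)
            score = np.sum(shifted * curr_frame)
            if score > best:
                best, best_shift = score, (dy, dx)
    return best_shift

def extrapolate(curr_frame, motion, steps=1):
    """Advect the current rainfall field forward by the estimated
    motion vector (periodic boundaries, a toy simplification)."""
    dy, dx = motion
    return np.roll(np.roll(curr_frame, dy * steps, axis=0), dx * steps, axis=1)

# Toy example: a rain blob that moved one cell east between frames.
prev = np.zeros((20, 20)); prev[8:12, 4:8] = 1.0
curr = np.roll(prev, 1, axis=1)
motion = estimate_motion(prev, curr)           # -> (0, 1)
forecast = extrapolate(curr, motion, steps=3)  # blob advected 3 more cells
print("estimated motion:", motion)
```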

In the physics-based method, numerical simulations are performed to imitate physical laws. However, to start a numerical simulation, the current state of the atmosphere must be given to the system as the initial value of the calculation. To create these initial values, the first step is to collect observation data. Organizations around the world have established a framework in which various types of observation data, such as ground, radar, ship, aircraft, and satellite observations, are quickly shared through a network called the Global Telecommunication System (GTS), and organizations such as the Japan Meteorological Agency (JMA) remove suspicious data from these observations through quality control.

Furthermore, since simply using the observed data as initial values wastes information, a process called data assimilation is used. Data assimilation uses the variance and covariance of the errors in physical quantities found in observed values and simulation results to spread the information in the observed data to places that are not observation points. The data assimilation process can be regarded as a type of artificial intelligence: it generates the initial values and thereby creates the conditions for starting a numerical simulation.

The physical model used in the numerical simulation represents physical laws such as Newton’s laws of motion, conservation of mass, the equation of state for gases, conservation of energy, and phase changes. Because these physical laws hold with a high degree of accuracy, they are generally considered to provide more accurate results than statistically constructed models.

However, since physical models are not systems of equations that can be solved with paper and pencil, the partial derivatives must be replaced by basic arithmetic on differences in the spatial and temporal directions, and the calculations must be performed numerically and approximately. In other words, the equations are discretized in the space and time directions to create a huge system of simultaneous equations. This is conceptually written as follows.

\[\mathbf{x}_t=M(\mathbf{x}_0)\]

Here, the subscripts represent time, and M is the time evolution operator including nonlinearities based on physical laws. x contains all the discretized temperatures and wind speeds at each location, and is a very long vector called the state variable vector. Here, x0 is the initial value and xt is the result of the prediction by the physical model.
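To make the discretization and the time-evolution operator M concrete, here is a minimal sketch using the 1D linear advection equation as a stand-in for the atmospheric equations (the grid sizes and constants are illustrative). Replacing the spatial partial derivative with an upwind difference turns the PDE into repeated basic arithmetic on a state vector, and each time step is one application of M.

```python
import numpy as np

# 1D linear advection du/dt + c du/dx = 0, discretized with a
# first-order upwind scheme: the spatial derivative becomes a
# difference of neighboring grid values, and time stepping is
# repeated arithmetic on the discretized state vector.
nx, c, dx, dt = 100, 1.0, 1.0, 0.5           # CFL = c*dt/dx = 0.5 (stable)
x = np.arange(nx) * dx
u = np.exp(-0.05 * (x - 20.0) ** 2)          # initial state x_0

def step(u):
    # (u[i] - u[i-1]) / dx approximates du/dx for c > 0 (periodic boundary)
    return u - c * dt / dx * (u - np.roll(u, 1))

for _ in range(80):                          # x_t = M(x_0): apply M repeatedly
    u = step(u)
print("peak has moved to x =", x[np.argmax(u)])  # advected by c * 80 * dt = 40
```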

The figure below shows an image of the physical model divided into a grid. The JMA provides daily weather forecasts using the global model GSM, which predicts the atmospheric conditions of the entire planet, and the regional model MSM, which covers the area around Japan with a finer grid.

Since the atmosphere is horizontally continuous, the global model performs its calculation by covering the entire earth with a horizontal grid spacing of about 20 km and dividing the vertical direction, from the surface up to about 80 km, into 100 layers. The number of grid points is about 130 million, and four physical quantities are placed at each grid point, so the number of variables to be forecast, that is, the length of the state variable vector x, is about 500 million.

Although the physical laws hold with a high degree of accuracy, the model is not perfect because a coarse spatial discretization is applied: phenomena on scales finer than the grid spacing cannot be reproduced explicitly. For example, individual clouds are several hundred meters in size, so the GSM and MSM cannot resolve every single cloud. Moreover, since there is a deadline for issuing weather forecasts, the number of grid points cannot easily be increased.

Furthermore, modeling realistic phase changes is very difficult (for example, in reality water does not immediately turn to ice the moment the temperature drops below 0°C), so bold approximations are used. To compensate for these imperfections, it is important to reduce the grid spacing and improve the representation of the laws of physics by introducing more advanced computers, but we are still a long way from representing the real world.

The forecast values obtained by numerical simulation therefore deviate slightly from the real world due to the aforementioned lack of resolution and imperfections in the physical model. Statistical corrections are therefore applied to the forecast values using neural networks and Kalman filters trained on past experience. This process is called guidance. Finally, the forecaster examines the results of the guidance, makes the final decision, and announces it to the public with an explanatory statement.
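The following is a minimal sketch of a guidance-style correction, assuming the simplest possible setting: a single scalar forecast bias tracked with a scalar Kalman filter. The numbers are synthetic and the single-predictor setup is illustrative only; operational guidance uses many predictors and more elaborate schemes.

```python
import numpy as np

# Track the systematic forecast bias b with a scalar Kalman filter
# and subtract it from new forecasts (a toy stand-in for guidance).
rng = np.random.default_rng(0)
true_bias, q, r = 1.5, 0.01, 0.5   # model bias, process/observation noise
b_est, p = 0.0, 1.0                # initial bias estimate and its variance

for day in range(200):
    raw_error = true_bias + rng.normal(0, np.sqrt(r))  # forecast minus observation
    p += q                                  # predict: the bias may drift
    k = p / (p + r)                         # Kalman gain
    b_est += k * (raw_error - b_est)        # update the bias estimate
    p *= 1 - k

print(f"estimated bias: {b_est:.2f} (true bias {true_bias})")
# A corrected forecast is then: raw forecast - b_est.
```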

In summary, a weather forecast is delivered through the following steps: (a) obtaining observation data, (b) generating initial values by making the most effective use of the observation data through data assimilation, (c) producing a forecast by giving the initial values to a physical model, and (d) correcting the results obtained in (c) using guidance and having the forecaster confirm them.

Data assimilation

The process of data assimilation is described in detail here. In order to produce a good forecast using a physical model, it is necessary to use the obtained observation data to estimate a plausible current state and give this as an initial value for the forecast.

This is because weather forecasting is viewed as an initial value problem for partial differential equations. The initial values are generated using observation data, but instead of using only the information in the observations, they are generated by combining the observations with the results of the physical model through a mechanism called data assimilation. The reason is that the number of observations is insufficient and the observations themselves contain errors.

If the observed data were sufficient to recover all the state variables, it would be possible to determine the initial values by the method of least squares or the like. However, in the global model, the number of available observations is on the order of several million, against the huge number of state variables (approximately 500 million) mentioned above. Attempting to estimate the state variables directly from the observation data alone would therefore yield only a coarse approximation.

In particular, dense observations are impossible over the sea, in the upper air, and around developing phenomena, where data are mostly limited to remote sensing by satellites and other means. To start the numerical calculation, however, initial values must be given for all physical quantities at all grid points. Therefore, forecast values issued in the past are first used as the first guess (prior information); appropriate adjustments are then made near the observation points, and the covariance between physical quantities is used at some distance from the observation points. Furthermore, if the observed data are highly uncertain, the estimate should stay closer to the past forecast value than to the observed value. A data assimilation system is a platform that handles all of this in an integrated manner.

As an example, let us assume that only the central pressure of a typhoon has been obtained as an observed value.

If the observed central pressure is lower than the previously forecast value, the data assimilation system modifies the initial values to lower the central pressure. However, as one can easily imagine, when the central pressure is low, the surrounding wind speed should also be strong. Therefore, when a correction is made to lower the central pressure, a correction is also made at the same time to increase the surrounding wind speed.

Mathematically speaking, the error covariance between central pressure and wind speed is negative, which means that the wind speed can also be corrected from an observation of the central pressure. Many people know that there is a negative correlation between central pressure and wind speed in typhoons, but in practice the relationships between many different physical quantities are used. The heart of a data assimilation system is having a computer calculate this.
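A minimal sketch of this idea, assuming a two-variable state of central pressure and wind speed with an invented negative error covariance: observing only the pressure corrects the wind as well, via the standard covariance-weighted (Kalman-type) update.

```python
import numpy as np

# State vector: [central pressure (hPa), wind speed (m/s)].
# A negative error covariance means "pressure too high" tends to
# coincide with "wind too weak", so observing one corrects both.
x_fg = np.array([960.0, 35.0])            # first guess
B = np.array([[25.0, -4.0],
              [-4.0,  9.0]])              # forecast error covariance (invented)
H = np.array([[1.0, 0.0]])                # only the pressure is observed
R = np.array([[4.0]])                     # observation error variance
y = np.array([950.0])                     # observed central pressure

K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)   # covariance-weighted gain
x_opt = x_fg + (K @ (y - H @ x_fg)).ravel()
print(x_opt)   # pressure pulled toward 950; wind raised via the covariance
```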

From here, data assimilation is explained mathematically. Assuming that observation data are collected at time t, the goal is to obtain an optimal estimate of xt as the initial value for a weather forecast whose initial time is t. First, we describe the time evolution of the error using the discretized time evolution equation introduced earlier.

\[\Delta\mathbf{x}_t=M(\mathbf{x}_0+\Delta\mathbf{x}_0)-M(\mathbf{x}_0)\approx\mathbf{M}\Delta\mathbf{x}_0\]

where Δx is the (unknown) error defined as the difference from the true value, and M=∂M/∂x is the Jacobian matrix of the physical model M. In the last step, the first-order approximation of the Taylor expansion is used, assuming that the terms of second order and higher in Δx0 are sufficiently small compared to the first-order term. The matrix representing the covariance of the errors, and its time evolution, can then be written as follows.

\begin{eqnarray}\mathbf{B}_0&=&<\Delta\mathbf{x}_0\Delta\mathbf{x}_0^T>\\\mathbf{B}_t&=&<\Delta\mathbf{x}_t\Delta\mathbf{x}_t^T>=<\mathbf{M}\Delta\mathbf{x}_0\Delta\mathbf{x}_0^T\mathbf{M}^T>=\mathbf{MB}_0\mathbf{M}^T\end{eqnarray}

< > denotes the expected value, the superscript T denotes the transpose, and B0 is the error covariance of the state variable. The introduction of these equations implicitly assumes that Δx is stochastically generated and that we can at best estimate only its covariance B0. The equations also represent the fact that the error covariance of the state variable changes from moment to moment, reflecting the characteristics of the physical model.
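The propagation Bt=MB0MT can be illustrated with a toy 2×2 Jacobian (the matrix values are invented): even if the initial errors are uncorrelated, the dynamics create cross-covariances, which is exactly what data assimilation exploits.

```python
import numpy as np

# Propagate the error covariance through a (linearized) model:
# B_t = M B_0 M^T.  M here is a toy 2x2 Jacobian coupling two physical
# quantities; a real model has ~5e8 variables, which is why B_t is
# never formed explicitly in practice.
M = np.array([[1.0, 0.5],
              [0.2, 0.9]])          # Jacobian dM/dx (invented values)
B0 = np.diag([1.0, 1.0])            # initially uncorrelated errors

Bt = M @ B0 @ M.T
print(Bt)
# The off-diagonal entries are now nonzero: the dynamics have created
# correlations between the two variables.
```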

4D variational method

Calculating the time evolution of the forecast error covariance is important for accurately reconstructing and forecasting conditions over the ocean and in the upper air, where observation data are sparse, as we saw earlier in the relationship between central pressure and wind speed. However, a problem arises here. With approximately 500 million state variables, the number of elements in the forecast error covariances B0 and Bt would be approximately 2.5×10^17 (the square of 5×10^8). Even a supercomputer with excellent computational performance cannot handle the covariance of such a large system explicitly. Therefore, the JMA has implemented an algorithm called the four-dimensional variational method as its data assimilation method to make the calculation tractable.

For simplicity, let us assume that the time evolution of the error is linear and that the observed data are available only at time t. In this case, the 4-D variational method defines an evaluation function with the initial value vector of the state variable as the independent variable and solves the optimization problem as follows.

\begin{eqnarray}J(\mathbf{x}_0)&=&\frac{1}{2}(\mathbf{x}_0-\mathbf{x}_{0,fg})^T\mathbf{B}_0^{-1}(\mathbf{x}_0-\mathbf{x}_{0,fg})\\& &+\frac{1}{2}(\mathbf{H}_tM(\mathbf{x}_0)-\mathbf{y}_t)^T\mathbf{R}_t^{-1}(\mathbf{H}_tM(\mathbf{x}_0)-\mathbf{y}_t)\end{eqnarray}

Here, x0,fg is the state variable vector based on forecasts made in the past and is called the first guess. yt is the vector of observed values at time t, Rt is the error covariance matrix of the observations, and Ht is a matrix representing the interpolation from the grid points of the physical model to the points where the observations were obtained.

In this equation, the second term is the main term and quantifies the difference between the observed values and the initial values of the state variables used in the numerical model. The first term represents a constraint that the estimate should not deviate too far from the predictions made in the past; from a Bayesian statistical point of view, this is equivalent to giving a prior distribution. From a meteorological point of view, it means that no matter how well the state variables fit the observed data, a constraint is applied so that, for example, the ground temperature does not reach 100°C at a point where no observations are available.

The gradient of this evaluation function can be calculated using M=∂M/∂x0, as in the following equation.

\[\frac{\partial J(\mathbf{x}_0)}{\partial \mathbf{x}_0}=\mathbf{B}_0^{-1}(\mathbf{x}_0-\mathbf{x}_{0,fg})+\mathbf{M}^T\mathbf{H}_t^T\mathbf{R}_t^{-1}(\mathbf{H}_tM(\mathbf{x}_0)-\mathbf{y}_t)\]

In other words, by solving this optimization problem iteratively using the transposed matrix MT, the initial values of the state variables are determined in a computation time several dozen times that of the forecast computation itself. Although this appears to require the inverse of a huge matrix, in practice the explicit inversion can be avoided, either by taking v0, defined by the variable transformation \(\mathbf{x}_0=\mathbf{B}_0^{1/2}\mathbf{v}_0\), as the independent variable, or by using singular values and singular vectors to replace the inverse with a generalized inverse matrix.
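A minimal sketch of this variational formulation, assuming a toy linear model matrix M and an observation of a single variable (all values invented): the evaluation function and its gradient follow the two equations above, and an off-the-shelf quasi-Newton optimizer finds the optimal initial state.

```python
import numpy as np
from scipy.optimize import minimize

# Toy 4D-Var: the model M is a matrix, observations exist only at time t.
M  = np.array([[1.0, 0.5], [0.2, 0.9]])   # tangent-linear model (invented)
B0 = np.diag([1.0, 1.0])                  # background error covariance
H  = np.array([[1.0, 0.0]])               # observe only the first variable
R  = np.array([[0.1]])                    # observation error covariance
x_fg = np.array([0.0, 0.0])               # first guess of the initial state
y    = np.array([1.0])                    # observation at time t

B0inv, Rinv = np.linalg.inv(B0), np.linalg.inv(R)

def J(x0):
    db = x0 - x_fg                        # background (first-guess) term
    do = H @ (M @ x0) - y                 # observation misfit at time t
    return 0.5 * db @ B0inv @ db + 0.5 * do @ Rinv @ do

def gradJ(x0):
    # Matches the gradient formula: B0^-1 (x0 - x_fg) + M^T H^T R^-1 (...)
    return B0inv @ (x0 - x_fg) + M.T @ H.T @ Rinv @ (H @ (M @ x0) - y)

res = minimize(J, x_fg, jac=gradJ, method="L-BFGS-B")
print("optimal initial state x0:", res.x)
```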

Let x0,opt be the x0 for which ∂J(x0)/∂x0=0, and let xt,opt be the optimized state variables at time t. Assuming that the errors are normally distributed, xt,opt can be written as in the following formula.

\[\mathbf{x}_{t,opt}=\mathbf{M}\mathbf{x}_{0,opt}=\mathbf{x}_{t,fg}+\mathbf{MB}_0\mathbf{M}^T\mathbf{H}_t^T(\mathbf{R}_t+\mathbf{H}_t\mathbf{MB}_0\mathbf{M}^T\mathbf{H}_t^T)^{-1}(\mathbf{y}_t-\mathbf{H}_t\mathbf{x}_{t,fg})\]

This equation shows that the 4D variational method works by computing the residual yt−Htxt,fg between the observed values and the first guess produced by the physical model, and then updating xt,fg by an amount determined by weighing the time evolution of B0, namely Bt=MB0MT, against the observation error (the size of the correction is determined by the relative magnitudes of Rt and MB0MT). The time evolution Bt=MB0MT carries the covariance between physical quantities such as central pressure and wind speed, which helps to propagate the information obtained from the observed data.
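For comparison, the same toy system as in the variational sketch can be solved directly with this closed-form update; with a linear model and Gaussian errors, it yields the same analysis as the iterative minimization.

```python
import numpy as np

# Closed-form (Kalman-type) version of the update in the equation above,
# using the same invented matrices as the variational sketch.
M  = np.array([[1.0, 0.5], [0.2, 0.9]])
B0 = np.diag([1.0, 1.0])
H  = np.array([[1.0, 0.0]])
R  = np.array([[0.1]])
x0_fg = np.array([0.0, 0.0])
y     = np.array([1.0])

xt_fg = M @ x0_fg                                # first guess at time t
Bt = M @ B0 @ M.T                                # forecast error covariance
K  = Bt @ H.T @ np.linalg.inv(R + H @ Bt @ H.T)  # weighs B_t against R
xt_opt = xt_fg + K @ (y - H @ xt_fg)
print("optimal state at time t:", xt_opt)        # matches M @ res.x above
```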

The principle of the 4D variational method is deeply rooted in statistical estimation. Whether we are aware of it or not, an optimization problem for the conditions at each grid point, such as temperature, wind speed, and humidity, is solved every day, with the equations of physics as constraints. Everyone therefore benefits, often unknowingly, from data science in the daily weather forecast.

In the field of meteorology, more and more people think that multilayer neural networks will replace the current system, but even as guidance becomes increasingly sophisticated, it is difficult for multilayer neural networks to surpass the existing data assimilation system.

This is because the current system already solves the problem of minimizing the difference from the observed data with good accuracy, using fundamental laws of physics such as Newton’s laws of motion and conservation of mass as constraints. Conversely, if one wants to maximize the benefits of a modern data science approach in meteorology, one should target phenomena that physical models are not good at, such as cloud physics and turbulence.

In the next article, we will discuss simulation and machine learning in the analysis of protein 3D structure (misfolding).
