April 21, 2016

Using GoldSim to Estimate Forecasting Parameters from Data

Posted by Ryan Roper

Recently, a customer contacted me to ask how to estimate parameters of a best-fit trend line for data stored in a Time Series element. The solution had to be sufficiently generic for multiple data sets and the fitting had to occur at the beginning of a simulation so that the parameters could be used for forecasting. The first solution I proposed used a SubModel with the optimization capability enabled. This worked well for a few data sets, but the solution needed to be scaled up to work efficiently for dozens of data sets on a single run. By the time I had implemented the approach for 10 data sets, I could see that the run time was going to be too long.

Since we had a known equation to fit to the data, I decided to implement a solution using the Gauss-Newton algorithm, an iterative nonlinear least-squares method for fitting trend lines to data (see Gauss-Newton Algorithm). This solution dramatically reduced the computation time and still gave great results. Since I think this solution could be of interest to many users in a variety of application areas, I put together a nice example model and posted it to our model library: Gauss-Newton Trend Line Fitting. In this blog post, I describe the implementation and show some results.


Implementation Details: The GoldSim implementation of the Gauss-Newton algorithm uses a Looping Container to carry out the required iterative calculations. The Looping Container is conditionally activated right at the beginning of the simulation and then deactivates after the first time step. As mentioned, the Gauss-Newton method requires that you have a specific equation to fit to the data. If you have a complex model for which you need to estimate optimal input parameters, you should use GoldSim's optimization capabilities. But when fitting a known equation, the Gauss-Newton method is more efficient.

The Gauss-Newton method relies on matrix calculations. Input data are stored in Time Series elements. A data vector is generated from the Time Series elements using the Time Series lookup capabilities. A Jacobian matrix must also be generated; it stores the negative first derivatives of the fitting function with respect to the function parameters (equivalently, the first derivatives of the residuals, data minus fitted values, with respect to those parameters). Basic matrix calculations are then used to iteratively update the parameter guesses until the estimated parameter values stop changing appreciably.
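For readers who want to see the iteration outside of GoldSim, here is a minimal Python sketch of the update described above. It is not the GoldSim model itself: the function names, the convergence tolerance, the exponential test function and the synthetic noise-free data are all assumptions made for illustration.

```python
import numpy as np

def gauss_newton(t, y, model, model_derivs, p0, tol=1e-10, max_iter=50):
    """Illustrative Gauss-Newton loop (a sketch, not the GoldSim implementation)."""
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        r = y - model(t, p)          # residual vector: data minus fitted values
        J = -model_derivs(t, p)      # Jacobian of the residuals (negative model derivatives)
        # Solve the normal equations (J^T J) step = J^T r for the parameter update.
        step = np.linalg.solve(J.T @ J, J.T @ r)
        p = p - step
        # Stop once the parameters no longer change appreciably (assumed criterion).
        if np.max(np.abs(step)) <= tol * (1.0 + np.max(np.abs(p))):
            break
    return p

# Assumed two-parameter exponential trend: y = P1 * exp(P2 * t).
def exp_model(t, p):
    return p[0] * np.exp(p[1] * t)

def exp_derivs(t, p):
    e = np.exp(p[1] * t)
    # Columns are d(model)/dP1 and d(model)/dP2.
    return np.column_stack([e, p[0] * t * e])

# Synthetic, noise-free data (hypothetical values, not the market data from the post).
t = np.arange(0.0, 26.0)                      # elapsed time in years
y = exp_model(t, np.array([100.0, 0.07]))
p_fit = gauss_newton(t, y, exp_model, exp_derivs, p0=[90.0, 0.06])
print(p_fit)
```

Like any Gauss-Newton fit, this sketch depends on a reasonable starting guess; a poor one can make the iteration diverge or the normal equations ill-conditioned.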

Input Data Options: The data used in this example are for three major U.S. financial market indexes: the NASDAQ, S&P 500 and NYSE (New York Stock Exchange), between January 1985 and April 2016. Data were downloaded from http://finance.yahoo.com/. In the dashboard, you can select which data set to use as well as the date range of data over which to fit the trend line. This way you can, for example, get an estimate of the annual percentage rate of change over 25 years of data or just over the last 5 years.

Trend line fit to NYSE data from 1990 to 2015

Trend line fit to NYSE data from 2010 to 2015

Model Fitting Options: In the model fitting options, you can select what kind of trend line to fit to the data (linear, exponential or compound interest). All of the available trend line equations have two parameters, generically called P1 and P2. In the case of a linear trend line, P1 is the y-intercept (i.e., the time-zero value) and P2 is the slope in units of 1/yr (i.e., the per-year market value increase). In the case of the exponential and compound interest equations, P1 is the time-zero starting value and P2 is the annual interest rate or percent change.
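To make the roles of P1 and P2 concrete, the three options can be sketched in Python as follows. The exact equations used in the example model are not spelled out in this post, so the functional forms below are assumptions based on the usual textbook definitions, with t taken as elapsed time in years since the start of the fitting window.

```python
import numpy as np

# Assumed functional forms (illustrative; check the example model for the exact equations).

def linear(t, p1, p2):
    # P1 = time-zero value, P2 = increase per year.
    return p1 + p2 * t

def exponential(t, p1, p2):
    # P1 = time-zero value, P2 = continuous annual growth rate.
    return p1 * np.exp(p2 * t)

def compound_interest(t, p1, p2):
    # P1 = time-zero value, P2 = annual rate compounded once per year.
    return p1 * (1.0 + p2) ** t
```

Under these assumed forms, the exponential and compound interest options describe the same family of curves and differ only in how the rate is expressed: an annually compounded rate r corresponds to a continuous rate of ln(1 + r).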

If this model is of interest to you, be sure to download it and take a look: Gauss-Newton Trend Line Fitting.
