I’m a sucker for statistical methods and machine learning, particularly anything with a cool-sounding name. When reading about Crouching Tiger Hidden Markov Models in an earlier post, I stumbled across a topic called regime detection.
In economics, latent Markov models are known as regime switching models. Regime detection comes in handy when you are trying to decide which strategy to deploy. For example, there are periods (regimes) when trend-following strategies, such as forecasting with an autoregressive integrated moving average (ARIMA) model or an exponential smoothing state space (ETS) model, work better, and there are periods when other strategies might be better. This might be useful if you are forecasting an index or rate that typically follows a trend but occasionally becomes more volatile.
For example, some time series may be particularly well behaved except during unpredictable economic downturns. The idea behind using regime switching models to identify market states is that market returns might have been drawn from two or more distinct distributions. Fortunately, we do not have to fit regimes by hand: the depmixS4 package on CRAN fits Hidden Markov Models using the expectation-maximization (EM) algorithm.
We use an economic indicator variable from the UK Building Cost Information Service (BCIS), as it provides an excellent demo of the type of variable that tends to have an upward trend until occasional market effects cause uncertainty and volatility. The tender price index (TPI) is used for many practical purposes in the construction industry, including establishing the level of individual tenders, adjustment for time, pricing, cost planning, forecasting cost trends and general comparisons. Any index that responds to market conditions is suitable for this methodology.
Here it appears we have two distinguishable states or regimes: the steady upward trend, and the more volatile mountainous peaks followed by a slight trough.
We use the package documentation from vignette("depmixS4") to get started. First we experiment on the original time series, then we will see the impact of looking at the TPI first order difference.
We can compare this to a trivial one state model, which simply returns the mean and standard deviation of the modelled variable. The two state model is slightly better, with a higher log-likelihood (that is, a smaller negative log-likelihood) and lower Akaike information criterion (AIC) and Bayesian information criterion (BIC), despite the extra degrees of freedom that come with modelling more states.
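For readers who want to see how the two criteria trade fit against complexity, here is a minimal Python sketch (the analysis in this post uses R, so this is only an illustration). The log-likelihoods and the number of observations are made-up values, not the fitted TPI results; the parameter counts assume Gaussian responses (a mean and standard deviation per state, plus initial and transition probabilities). Lower AIC and BIC are preferred.

```python
import math

def aic(log_lik, n_params):
    """Akaike information criterion: -2*logL + 2*k."""
    return -2.0 * log_lik + 2.0 * n_params

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: -2*logL + k*ln(n)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

# Hypothetical values for illustration only (NOT the fitted TPI results).
n_obs = 120
one_state = {"log_lik": -620.0, "n_params": 2}  # 1 mean + 1 sd
two_state = {"log_lik": -600.0, "n_params": 7}  # 2 means + 2 sds + 1 initial + 2 transition

for name, m in (("one state", one_state), ("two state", two_state)):
    print(name,
          "AIC:", round(aic(m["log_lik"], m["n_params"]), 1),
          "BIC:", round(bic(m["log_lik"], m["n_params"], n_obs), 1))
```

With these invented numbers the two state model wins on both criteria even after paying the penalty for five extra parameters.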
First order difference
As the time series is non-stationary, let’s take the first order difference of the TPI (each value minus its lagged value), as this often helps when modelling time series. Again we see the peaks and troughs mirrored by areas of increased volatility.
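Differencing itself is simple arithmetic. The Python sketch below is an illustration only (the post's analysis is done in R), and the index values are invented rather than real TPI figures.

```python
def difference(series, order=1):
    """Return the order-th difference of a sequence of numbers."""
    values = list(series)
    for _ in range(order):
        # Each pass replaces the series with successive changes x[t] - x[t-1].
        values = [b - a for a, b in zip(values, values[1:])]
    return values

# Made-up index values for illustration only (not real TPI figures).
index = [100, 102, 105, 109, 108, 115]
first = difference(index)      # quarter-on-quarter changes
second = difference(index, 2)  # changes in those changes
print(first)
print(second)
```

Each round of differencing shortens the series by one observation, which is worth remembering when lining the result up against dates.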
We fit a two state model, which improves the log-likelihood and lowers both information criteria.
Second order difference
From the second order difference it looks like we have two regime states which could be modelled by Gaussian distributions with different standard deviations.
We fit a regime detection model.
Which state are we in?
So our hidden Markov model explains more of the variation when fitted to the first order difference of the Tender Price Index using a two state model.
Thus if we plot our TPI first order difference and add these distributions, we can see when the TPI is in one state or the other (steady or volatile). The mean for both is above zero because the TPI is non-stationary and wanders ever upwards through time, like other inflation indices. The standard deviation for the second, volatile state is greater, reflecting the uncertainty and difficulty in forecasting TPI while in this state.
We can plot the first order difference of TPI and identify the different regimes using the Gaussian 95% intervals (mean +/- 1.96*sd). If the TPI first order difference lies outside the 95% interval for the steady state, it is more likely to have come from the volatile state.
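The interval check can be written in a few lines. This Python sketch is only an illustration: the means and standard deviations are hypothetical placeholders, not the parameters estimated from the TPI.

```python
def gaussian_interval(mean, sd, z=1.96):
    """Return (lower, upper) bounds of mean +/- z*sd."""
    return (mean - z * sd, mean + z * sd)

def is_outside(x, interval):
    """True if x falls outside the closed interval."""
    lower, upper = interval
    return x < lower or x > upper

# Hypothetical state parameters (mean, sd); NOT the fitted TPI values.
calm = gaussian_interval(1.0, 2.0)
volatile = gaussian_interval(2.0, 8.0)

for change in (0.5, 7.0, 15.0):
    print(change,
          "outside calm:", is_outside(change, calm),
          "outside volatile:", is_outside(change, volatile))
```

A change that sits outside the calm interval but inside the volatile one is a candidate for the volatile regime, although values inside both intervals remain ambiguous.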
This is quite useful for identifying when TPI is in its different states, but it is only of post hoc interest, as we can only look at it after the fact. However, the volatile years putatively associated with state 2 tend to persist for several quarters. If we enter this state, we can predict that our standard time series methods will not be useful for several quarters, until the generative model for the TPI first order difference transitions back to state 1, which it does with probability 0.522 as described in the transition matrix.
The problem is there is some overlap between states. How can we tell which state the TPI is in?
Which state when?
First we need to build a data frame that ggplot2 can plot, rather than using the zoo class yearqtr for our dates. We convert the quarterly labels to full dates, assuming each quarter starts on the first day of January, April, July or October.
This looks good and allows us to plot. The graph shows what looks like a more or less stationary process punctuated by a few spikes of extreme volatility. Have a guess as to when the most extreme spike occurs.
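The quarter-to-date mapping is easy to get wrong by one month, so here is a small Python sketch of the convention (an illustration only; the post's workflow is in R). The helper names are hypothetical, and the "YYYY Qn" label layout is an assumption based on how zoo prints yearqtr values by default.

```python
from datetime import date

def quarter_start(year, quarter):
    """First day of the given quarter: Q1 -> 1 Jan, Q2 -> 1 Apr, Q3 -> 1 Jul, Q4 -> 1 Oct."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    return date(year, 3 * (quarter - 1) + 1, 1)

def parse_quarter(label):
    """Parse a label such as '2016 Q3' (assumed format) into a date."""
    year_text, quarter_text = label.split()
    return quarter_start(int(year_text), int(quarter_text.lstrip("Qq")))

for label in ("2015 Q4", "2016 Q1", "2016 Q2", "2016 Q3"):
    print(label, "->", parse_quarter(label).isoformat())
```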
Let’s construct and fit a regime switching model and confirm, while we are at it, that the two state model is superior. It is: try adjusting the nstates argument to confirm. Which number of states gives the best log-likelihood and the lowest AIC and BIC?
Now we have an inference task where we know the mean and standard deviation of the two different states, so we can infer the probability that an observation belongs to a given state, either state 1 (calm) or state 2 (volatile).
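To build intuition for this inference, the Python sketch below applies Bayes' rule to a single observation using two Gaussian densities. It is a deliberate simplification: it ignores the transition dynamics and the neighbouring observations that a fitted Hidden Markov Model uses when it computes posterior state probabilities. The state parameters and the equal prior weights are hypothetical, not the fitted TPI values.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def state_probabilities(x, states, priors):
    """Pointwise Bayes rule: P(state | x) is proportional to prior * density."""
    weighted = [p * normal_pdf(x, mean, sd) for (mean, sd), p in zip(states, priors)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Hypothetical (mean, sd) per state and equal priors; NOT the fitted values.
states = [(1.0, 2.0),   # state 1: calm
          (2.0, 8.0)]   # state 2: volatile
priors = [0.5, 0.5]

for change in (1.0, 7.0, 15.0):
    p_calm, p_volatile = state_probabilities(change, states, priors)
    print(change, "calm:", round(p_calm, 3), "volatile:", round(p_volatile, 3))
```

Small changes favour the calm state because its density is concentrated near its mean, while large changes are almost impossible under the calm state and so are attributed to the volatile one.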
Now we have the probabilities of each state: our calm state (state 1) and our volatile state (state 2).
This tells us the current volatility of the TPI, and thus the utility and precision of forecasts that rely on standard time series methods such as ARIMA. Given that the TPI is probably in a volatile state at present, we should be cautious when using our standard forecasting strategies. This is particularly pertinent given the market volatility associated with the EU referendum.
The states prove to be quite sticky and unlikely to change as indicated by the transition matrix:
Perhaps we can use the markovchain package to run simulations and determine the most probable scenario to assist the forecast.
If we look at the tail end of the first order difference of the TPI:
We observe a TPI difference below the 99% CI for State 2. We therefore assume that the TPI difference is in State 1 to initiate our simulation, although it could be in State 2 given the previous large differences of 15 and 7.
Thus we run the simulation.
We could run this 10,000 times and build up a probability distribution of likely future TPI states, to assist forecasting with traditional time series methods that rely on historic data to predict the future.
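As a rough illustration of that idea, the Python sketch below simulates many paths of a two state Markov chain and tallies how often each state is visited at each future quarter. It is not the markovchain package, and the transition matrix is a hypothetical placeholder rather than the one fitted to the TPI; the function names are likewise invented for this example.

```python
import random

def simulate_path(transition, start, steps, rng):
    """Simulate one path of state indices from a row-stochastic transition matrix."""
    state = start
    path = []
    for _ in range(steps):
        state = rng.choices(range(len(transition)), weights=transition[state])[0]
        path.append(state)
    return path

def state_frequencies(transition, start, steps, runs, seed=0):
    """Estimate P(state at each future step) by repeated simulation."""
    rng = random.Random(seed)
    counts = [[0] * len(transition) for _ in range(steps)]
    for _ in range(runs):
        for t, s in enumerate(simulate_path(transition, start, steps, rng)):
            counts[t][s] += 1
    return [[c / runs for c in row] for row in counts]

# Hypothetical transition matrix (rows sum to 1); NOT the fitted TPI matrix.
# Index 0 is state 1 (calm), index 1 is state 2 (volatile).
transition = [[0.9, 0.1],
              [0.3, 0.7]]

# Start in state 1 (index 0) and look eight quarters ahead.
freqs = state_frequencies(transition, start=0, steps=8, runs=10_000)
for quarter, (p_calm, p_volatile) in enumerate(freqs, start=1):
    print(quarter, round(p_calm, 3), round(p_volatile, 3))
```

For a chain this small the same probabilities can be obtained exactly by multiplying the starting distribution by powers of the transition matrix; simulation becomes more useful once the simulated states drive something downstream, such as which forecasting method to trust.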
Conclusion
Hybrid models can be developed that add confidence when using traditional time series methods such as ARIMA and ETS, which expect the future to behave in a similar fashion to the past (especially the less volatile periods in an index’s history).
Prediction is very difficult, especially about the future.