Forecasting the Yen to Dollar Exchange Rate

Part 2: ARMA/ARIMA Modeling and Interpretation

Scott Okamura
9 min read · Feb 23, 2021

If you missed part 1, you can read it here.

Model

ARMA (Auto-Regressive Moving Average) as well as ARIMA (Auto-Regressive Integrated Moving Average) models are commonly used in predicting and forecasting time series data. To determine the ideal parameters p and q for our ARMA model, we can view the autocorrelation (acf) and partial autocorrelation (pacf) plots for our dataset.

from statsmodels.graphics.tsaplots import plot_pacf, plot_acf

plot_acf(df_log_mean, lags=50)
plot_pacf(df_log_mean, lags=50)
plt.show()

ACF gives us the auto-correlation of a time series with its own lagged values. At each lag it captures every source of dependence at once, whether it comes from trend, seasonality, or noise.

PACF gives the partial correlation of a series with each lag after the influence of all shorter lags has been removed. In other words, it isolates the direct contribution of each lag, stripping out the indirect correlation that the ACF carries along.
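If we would rather read numbers than eyeball charts, statsmodels also exposes the underlying values directly. A minimal sketch:

from statsmodels.tsa.stattools import acf, pacf

# First few ACF/PACF values as numbers instead of a plot
print(acf(df_log_mean, nlags=5))
print(pacf(df_log_mean, nlags=5))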

The PACF plot shows a single large spike at lag = 1, which by the usual reading suggests an AR order of p = 1. The ACF plot is a bit more difficult: it shows large spikes from lag = 1 all the way to lag = 12, so the MA order is ambiguous and we will fix q = 1. To be thorough about the p term, we will compare models built on the first 3 AR lags and see which values are actually significant. Remember that the fewer correlated features we include, the less likely we are to run into multicollinearity issues later on in the process, so we want to determine the best p term without including too many lag values.

from statsmodels.tsa.arima_model import ARMA

model_arma = ARMA(df_log_mean, (1,1))
model_arma_fit = model_arma.fit()
print(model_arma_fit.summary())

This process was repeated for ARMA(2,1) and ARMA(3,1).

The tables provided by .summary() contain quite a bit of information and can be a little overwhelming to process. The key values we will focus on are the AIC and the P>|z| column, i.e., the p-value of each coefficient.

The AIC, or Akaike Information Criterion, scores a model on its in-sample fit penalized by the number of features it used to predict future values. Simply put, the AIC tells us which model was “best” at neither over- nor under-fitting the data: the smaller the value, the “better” the model. Our ARMA(2,1) model produced the lowest AIC value, making it the “best” model of the three that were fit to our dataset.
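To make the comparison concrete, a small loop can print each candidate’s AIC side by side (a sketch; model_arma3_fit is an assumed variable name for the ARMA(3,1) fit, mirroring the model_arma2_fit used below):

# Lower AIC = better trade-off between fit and complexity.
# model_arma3_fit is an assumed name for the ARMA(3,1) fit.
candidates = {'ARMA(1,1)': model_arma_fit,
              'ARMA(2,1)': model_arma2_fit,
              'ARMA(3,1)': model_arma3_fit}
for name, fit in candidates.items():
    print(f'{name}: AIC = {fit.aic:.2f}')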

The other value of interest is the classic p-value. If we look at the P>|z| columns across all three summary tables, we can see that our ARMA(3,1) model included a non-significant ar.L3 parameter with P>|z| = 0.935 (a significant term would have P>|z| near 0). This tells us that the third lag in our auto-correlation plot contributes nothing to the model, and it makes additional lags beyond it unlikely to be significant either. This result further supports our AIC argument that the ARMA(2,1) model is the best fit of the three. By calling .forecast() on our fitted model, we can check to see if it is on the right track.

pred = model_arma2_fit.forecast(20, alpha=0.05)[0]
np.exp(pred)

Remember: When working with transformed data, always make sure to reverse-transform your results. For more information, visit https://stats.idre.ucla.edu/sas/faq/how-can-i-interpret-log-transformed-variables-in-terms-of-percent-change-in-linear-regression/.

array([1.00137157, 1.00131597, 1.00126168, 1.00120971, 1.00116077,
1.00111535, 1.00107375, 1.00103611, 1.00100244, 1.00097268,
1.00094667, 1.00092419, 1.00090501, 1.00088885, 1.00087542,
1.00086446, 1.00085567, 1.00084879, 1.00084356, 1.00083974])

Nice!…or is it? Our array of results is still at the wrong order of magnitude: 1 yen to 1 dollar would be an economic nightmare. That is because we also subtracted the rolling mean from the log-transformed dataset, so np.exp() alone does not undo the full transformation. How can we add the rolling mean back before taking the inverse log of our results? The math is a little complicated for the scope of this project. Instead, we can bypass the rolling-mean subtraction step entirely by implementing an ARIMA model.
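To see why the inversion is awkward, here is a sketch of the transform in question (the window size of 12 is my assumption here; see part 1 for the actual code):

# Sketch of the transform (window=12 is assumed, not taken from part 1):
rolling_mean = df_log.rolling(window=12).mean()
# df_log_mean = df_log - rolling_mean  <- what we fit the ARMA model on
# Historical values can be recovered because past rolling means are known:
reconstructed = np.exp(df_log_mean + rolling_mean)
# ...but inverting a *forecast* would require the rolling mean of future
# log values, which depends on the forecast itself.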

The difference between the ARIMA and ARMA model is the addition of the integrated parameter, d. The value of d sets the order of differencing: how many times consecutive values are subtracted from each other before the model is fit. This is equivalent to applying the .diff() method d times, except that we will let a pseudo-grid search find the best value rather than differencing by hand. To find a starting point, we can test different d values manually:

df_log_diff1 = df_log.diff(1).dropna()
stationarity_tests(df_log_diff1)
Output[]:
ADF Statistic: -49.569
n_lags: 0.0
p-value: 0.0
Critical Values:
  1%: -3.433
  5%: -2.863
  10%: -2.567
Probably stationary

KPSS Statistic: 0.284
p-value: 0.1
n_lags: 28
Critical Values:
  10%: 0.347
  5%: 0.463
  2.5%: 0.574
  1%: 0.739
Probably stationary

For this model, it looks like d = 1 is enough differencing to make the data stationary. We can stop testing for additional values to keep the number of parameters in our pseudo-grid search as low as possible. Now that we have our new d parameter, we can use the same p and q parameters that we found in our previous ARMA model. Our grid search will iterate through every combination of parameters to determine the best model based on their AIC.

import itertools
import warnings
import statsmodels.api as sm

p = range(0, 3)
d = range(0, 2)
q = range(0, 2)
# Generate all different combinations of p, d, and q triplets
pdq = list(itertools.product(p, d, q))
# Generate all different combinations of seasonal p, d, and q triplets
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p, d, q))]

warnings.filterwarnings("ignore")
diffyearly = set()
for n, param in enumerate(pdq):
    for param_seasonal in seasonal_pdq:
        try:
            mod = sm.tsa.statespace.SARIMAX(df_log,
                                            order=param,
                                            seasonal_order=param_seasonal,
                                            enforce_stationarity=False,
                                            enforce_invertibility=False)
            results = mod.fit()
            # Store each (order, seasonal order, AIC) result for ranking later
            diffyearly.add((param, param_seasonal, results.aic))
        except:
            continue
    print(f'{len(pdq) - (n+1)} iterations remaining...')

sorted_diffyearly = sorted(diffyearly, key=lambda x: x[2])
mod = sm.tsa.statespace.SARIMAX(df_log,
                                order=sorted_diffyearly[0][0],
                                seasonal_order=sorted_diffyearly[0][1],
                                enforce_stationarity=False,
                                enforce_invertibility=False)
results = mod.fit()
print(results.summary())

Our SARIMAX (Seasonal ARIMA with eXogenous regressors) results show that although our AIC slightly increased from our previous results, it is still close to our “best” model. We can also observe that the p-value for our lag term is significant. But why do we only have an ar.L1 parameter? The parameters we chose in our previous ARMA model were ideal for a log-transformed AND rolling-mean-subtracted dataset, not for our current, only-log-transformed set. For that set, the grid search found SARIMAX(1,0,0) to be our “best” fitting model. In case you were wondering: yes, SARIMAX(1,0,0), ARMA(1,0), and AR(1) are all the same thing.
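As a quick sanity check (a sketch, using the df_log['yen'] column from earlier code), fitting that same AR(1) through two different statsmodels interfaces should give very similar coefficients. The small differences come from different defaults: SARIMAX fits no intercept by default, while AutoReg includes a constant.

import statsmodels.api as sm
from statsmodels.tsa.ar_model import AutoReg

# The same AR(1), two ways
ar1_sarimax = sm.tsa.statespace.SARIMAX(df_log['yen'], order=(1, 0, 0)).fit()
ar1_autoreg = AutoReg(df_log['yen'], lags=1).fit()
print(ar1_sarimax.params)  # ar.L1, sigma2
print(ar1_autoreg.params)  # const, yen.L1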

So why did we even use statsmodels.SARIMAX() if our results show that a simple AR(1) model works best? Two main reasons:

  1. We found that p = 1 is the single significant parameter only after conducting our complicated and time-consuming grid search.
  2. The .ARMA() function is deprecated while .SARIMAX() has a wide variety of methods and attributes that can be used to better validate our results.
results.plot_diagnostics(figsize=(16,12));

The four plots above provide validation visuals for our model. The standardized residual plot shows no obvious seasonality over time; the residuals look mostly like noise with, at most, some short-lived trends. The lack of seasonality is further supported by the correlogram, which shows that the residuals do not correlate with lagged versions of themselves.
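If we want a number to back up the visual read, a Ljung-Box test on the residuals checks whether their autocorrelations are jointly zero (a sketch; the output format varies slightly across statsmodels versions):

from statsmodels.stats.diagnostic import acorr_ljungbox

# H0: residual autocorrelations up to lag 10 are jointly zero.
# A large p-value means no evidence of leftover autocorrelation.
print(acorr_ljungbox(results.resid, lags=[10], return_df=True))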

The remaining plots are not as nice to look at. The normal Q-Q plot shows some normality in our residuals but drifts away from the reference line at both ends, a pattern typical of datasets with more extreme values (heavier tails) than a normal distribution. This departure from normality is also reflected in the final histogram plus estimated density plot: the histogram is not symmetric, and our KDE curve does not mirror the standard normal curve N(0,1).
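A formal companion to the Q-Q plot is the Jarque-Bera test (also reported in the .summary() table), which measures how far the residuals’ skewness and kurtosis stray from those of a normal distribution. A minimal sketch:

from scipy import stats

# H0: the residuals are normally distributed.
# A tiny p-value confirms the heavy tails seen in the Q-Q plot.
jb_stat, jb_pvalue = stats.jarque_bera(results.resid)
print(f'JB statistic: {jb_stat:.2f}, p-value: {jb_pvalue:.4f}')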

From here, we could either:

  • Re-explore — Go back to the dataset and continue to explore different patterns and transformations. As the diagnostic plots suggest, there are multiple instances of non-normal residuals that can be addressed to improve model performance OR
  • Validate — We’ve made it this far with this model, we might as well see how well it can perform on validation and testing sets.

For the sake of this post, we will continue on with validating our model.

pred = results.get_prediction(start=pd.to_datetime('2020'))
pred_conf = pred.conf_int()
plt.rcParams['figure.figsize'] = 15, 6
ax = df_log['2017':].plot(label='log observed')
pred.predicted_mean.plot(ax=ax, label='One-step ahead Forecast',
                         color='r', alpha=0.9)
# Plot the range for confidence intervals
ax.fill_between(pred_conf.index,
                pred_conf.iloc[:, 0],
                pred_conf.iloc[:, 1], color='g', alpha=0.5)
ax.set_xlabel('Date')
ax.set_ylabel('Log Yen to Dollar')
plt.legend()
plt.show()
# real and predicted values
yen_pred = pred.predicted_mean
yen_true = df_log.loc['2020':, 'yen']
# reversing the log transformation
exp_mse = ((np.exp(yen_pred) - np.exp(yen_true)) **2).mean()
print(f'Mean Squared Error: {exp_mse}')
print(f'RMSE: {np.sqrt(exp_mse)}')
Output[]:
Mean Squared Error: 0.27342094541709405
RMSE: 0.5228966871353212

Using a one-step ahead forecast to predict the change in exchange rate, the model was able to produce an RMSE of 0.523. Against an exchange rate on the order of 100 yen to the dollar, an error of about half a yen suggests the one-step predictions are fairly accurate. We can see if this is reflected in a different validation measure: walk-forward validation. This method is similar to cross-validation in that it builds multiple train and validation sets in order to find the average error. However, since we are working with time series data, the values must be kept in their original order rather than shuffled around randomly as in K-fold validation.

from math import sqrt
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.arima.model import ARIMA

# evaluate an ARIMA model using walk-forward validation
# split into train and test sets
X = df_log['yen']
size = int(len(X) * 0.80)
train, test = X[0:size], X[size:len(X)]
history = [x for x in train]
predictions = list()
# walk-forward validation: refit on all data seen so far,
# forecast one step ahead, then append the true value to the history
for t in range(len(test)):
    model = ARIMA(history, order=(1,0,0))
    model_fit = model.fit()
    output = model_fit.forecast()
    yhat = output[0]
    predictions.append(yhat)
    obs = test.iloc[t]
    history.append(obs)
    #print(f'predicted={yhat}, expected={obs}')
# plot walk-forward validation
predict = pd.DataFrame(predictions, index=test.index)
plt.figure(figsize=(14,7))
plt.plot(np.exp(test), label='True Value')
plt.plot(np.exp(predict), color='red', label='Walk-Forward Validation Prediction')
plt.xlabel('Date')
plt.ylabel('JPY')
plt.legend()
plt.show()
# validate model mse
rmse = sqrt(mean_squared_error(np.exp(test), np.exp(predictions)))
print(f'Test RMSE: {round(rmse, 4)}')
print(f'Test MSE: {rmse**2}')
Test RMSE: 0.4535
Test MSE: 0.2056433307661961

The walk-forward method produced an RMSE = 0.4535 when it tried to forecast prices that the model had never seen before. This result, along with the walk-forward validation plot, looks promising. To test our model, we can use the .get_forecast() method and plot the results alongside the time series data.

forecast = results.get_forecast(steps=500)
fc_conf = forecast.conf_int()
fc_confxp = np.exp(fc_conf)
# plot future predictions
ax = (np.exp(df_log)).plot(label='observed', figsize=(17, 10))
np.exp(forecast.predicted_mean).plot(ax=ax, label='Forecast')
ax.fill_between(fc_confxp.index,
                fc_confxp.iloc[:, 0],
                fc_confxp.iloc[:, 1], color='k', alpha=0.25)
ax.set_xlabel('Date')
ax.set_ylabel('Yen')
ax.set_title('Forecast of JPY to USD, 2021 --> ')
plt.legend()
plt.show()

Interpret

This forecast plot has obvious issues:

  1. The forecast “curve” is a straight line, predicting an implausibly smooth, steady increase in the USD/JPY exchange rate over the next year.
  2. The confidence interval (grey shaded region) spans such a wide range of values that it won’t help predict a future exchange rate with any precision. This can be seen more clearly in the data frame fc_conf.
np.exp(fc_conf['2022':]).head(1)

Our model predicts that the opening exchange rate in 2022 will be somewhere between 89.643077 and 125.132536, a spread of more than 35 points. Although this technically is a prediction, I would personally not use this model until the confidence intervals are narrower and the prediction curve actually resembles real exchange rate data.
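To quantify how quickly that uncertainty grows, we can measure the interval width directly using the fc_confxp frame from the plotting code above:

# Width of the forecast interval in yen, per future date
ci_width = fc_confxp.iloc[:, 1] - fc_confxp.iloc[:, 0]
print(ci_width.head())  # narrow near the last observed date...
print(ci_width.tail())  # ...and very wide 500 steps out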

This concludes this two part, end-to-end time series forecasting project. The results were nowhere near ideal but gave me an opportunity to practice working with time series data. If you are a fellow data science beginner and have any questions or want to chat about data, science, and/or data science, find me on LinkedIn or Twitter.
