Chapter IV. How Well Do the Models Perform?

Author(s): Catherine Pattillo, Andrew Berg, Gian Milesi-Ferretti, and Eduardo Borensztein
Published Date: January 2000

Recent work claiming success in predicting crises has concentrated almost exclusively on in-sample prediction.23 There is an important danger in focusing on this type of evaluation, as it may overestimate the ability of the models to predict future crises. The implementation of each model involves using historical data to estimate the model, that is, to decide exactly how much weight to give to each of the predictive variables. The danger is that, as emphasized in Section II, different sets of crises may be fundamentally different. Models fitted over historical data may not provide much guidance for the prediction of subsequent crises. To guard against overestimating the usefulness of these crisis-prediction models, we emphasize the testing of these models “out of sample,” that is, in predicting crises that occurred after the models were formulated and estimated.

The Asian crisis is a natural testing ground for the out-of-sample performance of crisis-prediction models. The high number of crises in 1997 came largely as a surprise to most observers, suggesting a possible role for prediction models. These crises contained a variety of new elements as well as some important points of continuity with previous crises. Moreover, a number of crisis-prediction models had been elaborated prior to the Asian crisis, inspired by the Mexican crisis of 1994. Thus, this section addresses the question: if one had used these models in late 1996, how well armed would one have been to predict the Asian crisis?

To evaluate the performance of early warning systems, one study representative of each of the three types of approaches to predicting currency crises that were identified above is examined in depth. These studies are among the most well known and promising based on their success within sample. All these models were formulated prior to 1997, so that their application to the Asian crisis is truly out of sample. In addition, the results from a model currently under development in the Developing Country Studies Division of the IMF's Research Department (DCSD),24 which combines features of the other three approaches, are also presented. Out-of-sample tests of this model can be performed only in the sense of estimating it using data up to 1995 and then checking how well it fares in predicting events in 1997. This is not a “pure” out-of-sample test, however, in that the model was inevitably formulated, and, in particular, variables chosen, with the benefit of hindsight regarding the Asian crisis. The models have the following features.

  • Kaminsky, Lizondo, and Reinhart (1998) (KLR) develop an early warning system for currency crises based on a variety of monthly indicators that signal a crisis whenever they cross a certain threshold value. A variable-by-variable approach is chosen so that a surveillance system based on the method would provide assessments of which variables are “out-of-line.” The information from the separate variables is combined to produce a composite measure of the probability of crisis.

  • Frankel and Rose's (1996) (FR) probit regression model of currency crashes analyzes a broad set of potentially important variables. Motivated by the Mexican crisis, the study tests in particular the hypothesis that certain characteristics of capital inflows are associated with currency crashes. Their use of annual data permits them to look at these variables, as well as others that are available only at annual frequency.

  • Sachs, Tornell, and Velasco (1996a) (STV) restrict their attention to a cross-section of countries in 1995. They test whether the incidence and severity of crisis following the devaluation of the Mexican peso can be explained by a particular set of fundamentals, where the framework assumes that countries with weak fundamentals and low reserves were particularly vulnerable to the effects of Mexico's devaluation.25

  • The DCSD model, in the spirit of the KLR approach, uses monthly data to determine which variables contribute to the probability of a crisis occurring within the following 24 months. The concept is to take maximum advantage of the predictive power provided by the monthly indicators suggested by KLR, as well as some additional variables, by using them in an econometric framework similar to that of FR (a probit regression).

The most basic evaluation of the models is to gauge how well they perform in predicting crises in-sample, that is, when applied to the historical data that was used to formulate the model. The discussion below is based on estimation of the KLR, STV, and DCSD models on a common sample of 23 emerging market economies through April 1995 and on a sample of 41 developing countries for FR.26

Kaminsky-Lizondo-Reinhart Model

Since the KLR approach assesses the signaling properties of indicators one at a time, the effectiveness of the approach can be examined by determining the extent to which each individual indicator is useful in predicting crises. Of the 15 indicators considered (see Table 2), 8 are found to be informative, in that crises are more likely to occur when the indicator signals than when it does not. These “good” indicators are deviations of the real exchange rate from trend, the growth in the ratio of M2 to international reserves, export growth, growth of international reserves, “excess” M1 balances, growth in domestic credit as a fraction of GDP, the real interest rate, and the change in terms of trade.
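The signaling device can be sketched in a few lines. The snippet below is illustrative only, not the authors' code: it flags a signal whenever a hypothetical monthly indicator crosses a threshold set as a percentile of its own history, which is the spirit of the KLR thresholds (KLR choose each indicator's percentile to minimize its noise-to-signal ratio).

```python
# Illustrative sketch of KLR-style signaling (not the authors' code).
# An indicator "signals" when it crosses a threshold; here the threshold is a
# percentile of the indicator's own historical distribution.

def signals(series, percentile=0.90):
    """Return 1 for each observation above the chosen percentile, else 0."""
    ranked = sorted(series)
    cutoff = ranked[int(percentile * (len(ranked) - 1))]
    return [1 if x > cutoff else 0 for x in series]

# Hypothetical monthly readings of an indicator such as M2/reserves growth:
readings = [1, 2, 1, 3, 2, 1, 9, 12, 2, 1, 2, 15]
print(signals(readings))  # only the two largest readings trigger signals
```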

Table 2. Performance of Kaminsky-Lizondo-Reinhart (KLR) Indicators

Column key: (1) good signals as a share of times the indicator should be signaling¹; (2) false alarms as a share of signals²; (3) probability of crisis given a signal; (4) increase in expected probability if the indicator signals, in percentage points³.

Indicator                               (1)    (2)    (3)    (4)
Real exchange rate⁴                      26     53     47     29
M2/reserves growth rate                  26     65     35     17
International reserves growth rate       18     67     33     15
Export growth rate                       15     73     27      9
Excess M1 balances                       15     73     27      9
Real interest rate                       15     78     22      4
Domestic credit/GDP growth rate          19     78     22      4
Terms of trade growth rate               10     80     20      1
Lending rate/deposit rate                 9     85     15     -1
M2 multiplier growth rate                17     85     15     -2
Industrial production growth rate        13     82     18     -2
Import growth rate                       12     84     16     -2
Real interest differential               14     86     14     -4
Stock price index growth rate            13     87     13     -6
Bank deposit growth rate                  7     88     12     -6
Average for 8 “good” indicators⁵         18     71     29     11

Sources: Berg and Pattillo (1999a), and IMF staff calculations.

¹ A good signal is a signal that is followed by a crisis within 24 months.

² A false alarm is a signal that is not followed by a crisis within 24 months.

³ Probability of crisis given a signal less the unconditional probability of crisis in the sample.

⁴ Deviation from trend.

⁵ “Good” indicators are those for which signals are in fact associated with a higher frequency of crisis. That is, for good indicators, a signal implies a probability of crisis higher than that implied by the actual incidence of crisis over the sample.


Table 2 shows various measures of the effectiveness of these indicators. First, consider the observations that are in fact followed by a crisis within 24 months. Column (1) shows the fraction of these observations for which the indicator is signaling a crisis. Next, consider false alarms, that is, signals that are not followed by a crisis within 24 months. Column (2) shows the fraction of signals that are false alarms. A perfect indicator would score 100 percent in column (1), implying that a signal was issued every month during the 24 months prior to each crisis, and 0 in column (2), indicating that every signal issued was indeed followed by a crisis within 24 months. Clearly, some indicators are better than others. The best eight would seem to contain useful information; for these indicators, the issuance of a signal implies a probability of crisis higher than that implied by the actual incidence of crises over the sample, as shown in column (4). On average, however, these good indicators signal in only 18 percent of the months that are in fact followed by a crisis.27 A large share of the signals are false alarms: 71 percent of the times that the average good indicator signaled, the signal was not followed by a crisis within 24 months.

These results appear to be poor: signals are mostly false alarms, while most precrisis months are not signaled. However, these signals do carry substantial information about the probability of crisis. When an indicator signals, a crisis ensues more often than when there is no signal. For example, 47 percent of the time the real exchange rate signals, a crisis ensues within 24 months, as column (3) of Table 2 shows. This probability of crisis is much higher than the average frequency of crises in the sample, so that the real exchange rate signal does increase the expected probability of crisis, as shown in column (4).
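Columns (1)-(3) of Table 2 follow mechanically from these definitions. A minimal sketch, using made-up monthly series and a shortened window for readability (the text uses 24 months):

```python
# Compute Table 2-style scores for one indicator (hypothetical data).
# A month is "precrisis" if a crisis starts within the next `window` months.

def indicator_scores(signal, crisis, window=24):
    """signal, crisis: 0/1 monthly series of equal length."""
    n = len(signal)
    precrisis = [any(crisis[t + 1 : t + 1 + window]) for t in range(n)]
    good = sum(1 for s, p in zip(signal, precrisis) if s and p)
    total_signals = sum(signal)
    col1 = good / max(1, sum(precrisis))                   # share of precrisis months signaled
    col2 = (total_signals - good) / max(1, total_signals)  # false alarms as share of signals
    col3 = good / max(1, total_signals)                    # P(crisis | signal)
    return col1, col2, col3

# Toy example with a 3-month window: a single crisis hits in month 4 (0-indexed),
# and the indicator signals in months 1, 2, and 6.
print(indicator_scores([0, 1, 1, 0, 0, 0, 1, 0],
                       [0, 0, 0, 0, 1, 0, 0, 0], window=3))
```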

Clearly, it would be desirable to combine the information from the various indicators. To this end, the indicators can be aggregated into a composite index that measures the probability of crisis for each country at every point in time (see Kaminsky, 1998). Table 2 showed that some indicators are much better predictors than others. Thus, each indicator is weighted by a measure of its reliability in predicting crises. The performance of this approach can be assessed systematically by looking at various goodness-of-fit measures (Box 1 explains these measures in detail). For example, a natural question to pose is whether the estimated probability of crisis is above 50 percent prior to actual crises. Table 3 shows that the predicted probability of crisis was above 50 percent in 9 percent of the months between January 1970 and April 1995 in which a crisis followed within 24 months. The model does correctly call almost all (99 percent) of the more numerous tranquil periods. One could consider a lower cutoff probability than 50 percent if missing crises is a relatively greater concern than issuing false alarms. Using a 25 percent cutoff, for example, the model predicts a crisis in 46 percent of the periods that it should.28 Of course, this improvement in the fraction of crises correctly predicted comes at the expense of a lower fraction of tranquil periods correctly called.
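The weighting rule can be sketched as follows. The signals and ratios below are made up, but the rule itself, weighting each signaling indicator by the inverse of its noise-to-signal ratio, is the one Kaminsky (1998) proposes; mapping the resulting index into a probability then uses the historical frequency of crises observed at each index level.

```python
# Composite crisis index in the spirit of Kaminsky (1998): each indicator that
# is currently signaling contributes the inverse of its noise-to-signal ratio,
# so more reliable indicators (lower noise-to-signal) get more weight.
# The signals and ratios below are hypothetical.

def composite_index(signals_now, noise_to_signal):
    return sum(s / nts for s, nts in zip(signals_now, noise_to_signal))

current_signals = [1, 0, 1]    # e.g., RER and reserves signal; exports do not
nts_ratios = [0.5, 0.8, 2.0]   # assumed per-indicator noise-to-signal ratios
print(composite_index(current_signals, nts_ratios))  # 2.5
```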

Table 3. Predictive Power of Kaminsky-Lizondo-Reinhart (KLR), Frankel-Rose (FR), and Developing Country Studies Division (DCSD) Models: In-Sample

Column key: (a) KLR weighted-sum-based probabilities (full sample)¹; (b) FR probabilities (full sample)²; (c) DCSD probabilities without short-term debt; (d) DCSD probabilities with short-term debt (1986-April 1995 sample).

                                                   (a)    (b)    (c)    (d)
Goodness-of-fit (cutoff probability of 50 percent)
  Percent of observations correctly called          83     90     84     85
  Percent of precrisis periods correctly called³     9     33      7      2
  Percent of tranquil periods correctly called⁴     99     98    100    100
  False alarms as percent of total alarms⁵          30     26     11      0
  Probability of crisis given an alarm⁶             70     74     89    100
  Probability of crisis given no alarm⁷             17     10     16     15
Goodness-of-fit (cutoff probability of 25 percent)
  Percent of observations correctly called          75     86     78     81
  Percent of precrisis periods correctly called³    46     63     48     39
  Percent of tranquil periods correctly called⁴     81     89     84     88
  False alarms as percent of total alarms⁵          65     52     63     64
  Probability of crisis given an alarm⁶             35     48     37     36
  Probability of crisis given no alarm⁷             13      6     11     11

Sources: Berg and Pattillo (1999a), and IMF staff calculations.

¹ The KLR and DCSD models are estimated using a 23-country or economy sample consisting of Argentina, Bolivia, Brazil, Chile, Colombia, India, Indonesia, Israel, Jordan, Korea, Malaysia, Mexico, Pakistan, Peru, the Philippines, South Africa, Sri Lanka, Taiwan Province of China, Thailand, Turkey, Uruguay, Venezuela, and Zimbabwe.

² The 41-country FR sample includes all of the above except for Israel, Jordan, South Africa, and Taiwan Province of China, plus the following 22 countries: Algeria, Botswana, Costa Rica, Cote d'Ivoire, the Dominican Republic, Ecuador, Egypt, El Salvador, Guatemala, Hungary, the Islamic Republic of Iran, Jamaica, Mauritius, Morocco, Oman, Panama, Portugal, Paraguay, Romania, the Syrian Arab Republic, Trinidad and Tobago, and Tunisia.

³ This is the number of precrisis periods correctly called (observations for which the estimated probability of crisis is above the cutoff probability and a crisis ensues within 24 months) as a share of total precrisis periods.

⁴ This is the number of tranquil periods correctly called (observations for which the estimated probability of crisis is below the cutoff probability and no crisis ensues within 24 months) as a share of total tranquil periods.

⁵ A false alarm is an observation with an estimated probability of crisis above the cutoff (an alarm) not followed by a crisis within 24 months.

⁶ This is the number of precrisis periods correctly called as a share of total predicted precrisis periods (observations for which the estimated probability of crisis is above the cutoff probability).

⁷ This is the number of periods where tranquility is predicted and a crisis actually ensues as a share of total predicted tranquil periods (observations for which the estimated probability of crisis is below the cutoff probability).


Table 4. Correlation of Actual and Predicted Rankings Based on Sachs-Tornell-Velasco (STV) Approach: In-Sample

                              Actual¹                    Predicted
                              Crisis severity   Rank     Crisis severity   Rank
Mexico                            791.32          1          293.57          2
Argentina                         202.09          2          322.11          1
Brazil                            197.01          3          186.05          4
Uruguay                            85.03          4           74.57          7
Philippines                        71.87          5          168.08          5
Venezuela                          51.65          6           12.89         10
Taiwan Province of China           44.00          7           69.61          8
Colombia                           42.29          8           49.25          9
South Africa                       22.32          9           -1.50         11
Zimbabwe                           15.79         10          -85.30         23
Indonesia                          13.15         11          -19.23         17
Sri Lanka                           7.37         12          -27.88         19
Pakistan                            6.77         13          -35.80         20
India                             -12.28         14          -11.48         15
Jordan                            -15.64         15          -36.76         21
Thailand                          -18.19         16           78.59          6
Turkey                            -24.92         17          -16.96         16
Malaysia                          -26.24         18           -5.88         13
Peru                              -26.86         19          -36.86         22
Korea                             -37.01         20          -26.00         18
Chile                             -56.17         21           -3.68         12
Bolivia                           -63.77         22          241.11          3
Israel                            -91.40         23          -10.37         14
Correlation²                        0.49
p-value                             0.018
R2                                  0.24

Source: IMF staff calculations.

¹ Actual crisis (November 1994-April 1995).

² Spearman rank correlation of the fitted values and the actual crisis index and its p-value. The R2 is from a regression of fitted values on actual values.


Box 1.Summary Measures of Model Performance

One of the difficulties in assessing the predictive success of early warning systems is that the models generally produce an estimated probability of crisis that cannot be compared with the unobservable actual probability of crisis but only with the occurrence or not of a crisis. Yet it is possible to compute a number of measures of how well model probabilities correspond to the subsequent incidence of crises (goodness-of-fit). The first step is to convert predicted probabilities of crisis into alarms or signals that a crisis will ensue within the following 24 months (assuming that 24 months is the model's horizon). A signal is defined as a predicted probability of crisis above some threshold level (the cutoff threshold). Then, each observation (a particular country in a particular month—for example, Thailand in December 1996) is categorized as to whether it is predicted to be a precrisis month and also according to whether it is an actual precrisis month.

It is a predicted precrisis observation if the predicted probability of crisis is above the threshold; otherwise it is a predicted tranquil observation. If a crisis in fact ensues within 24 months of the observation in question, it is an actual precrisis observation; otherwise it is an actual tranquil observation.

The accompanying table shows how the signals from the DCSD model discussed in Section IV and presented in Table 5 compare to actual outcomes, for the out-of-sample period May 1995 through December 1997. It uses a cutoff threshold for calling a crisis of 25 percent.

Table 5. Predictive Power of Kaminsky-Lizondo-Reinhart and Developing Country Studies Division Models: Out-of-Sample

                                                  KLR weighted-sum-       DCSD
                                                  based probabilities     probabilities
Goodness-of-fit (cutoff probability of 50 percent)
  Percent of observations correctly called                70                  74
  Percent of precrisis periods correctly called¹           0                   3
  Percent of tranquil periods correctly called²          100                 100
  False alarms as percent of total alarms³        No crisis predictions        0
  Probability of crisis given an alarm⁴                    0                 100
  Probability of crisis given no alarm⁵                   29                  27
Goodness-of-fit (cutoff probability of 25 percent)
  Percent of observations correctly called                70                  79
  Percent of precrisis periods correctly called¹          34                  73
  Percent of tranquil periods correctly called²           86                  81
  False alarms as percent of total alarms³                51                  41
  Probability of crisis given an alarm⁴                   49                  59
  Probability of crisis given no alarm⁵                   24                  11

Sources: Berg and Pattillo (1999a), and IMF staff calculations.

¹ This is the number of precrisis periods correctly called (observations for which the estimated probability of crisis is above the cutoff probability and a crisis ensues within 24 months) as a share of total precrisis periods.

² This is the number of tranquil periods correctly called (observations for which the estimated probability of crisis is below the cutoff probability and no crisis ensues within 24 months) as a share of total tranquil periods.

³ A false alarm is an observation with an estimated probability of crisis above the cutoff (an alarm) not followed by a crisis within 24 months.

⁴ This is the number of precrisis periods correctly called as a share of total predicted precrisis periods (observations for which the estimated probability of crisis is above the cutoff probability).

⁵ This is the number of periods where tranquility is predicted and a crisis actually ensues as a share of total predicted tranquil periods (observations for which the estimated probability of crisis is below the cutoff probability).


Each number represents the number of observations that satisfy the criteria listed in the rows and columns. Thus, a given observation is either followed by a crisis within 24 months or it is not, so it belongs in either the tranquil column or the precrisis column. The model either generates a probability of crisis below 25 percent or it does not, so it is counted in either the tranquil row or the precrisis row. For example, for the entire out-of-sample period and country sample, there were a total of 321 tranquil months, and for 259 of them the probability of crisis was below the 25 percent threshold. From this table the various measures of accuracy discussed in the text can be calculated:

Out-of-Sample Goodness-of-Fit: DCSD Model

Predicted        Actual tranquil   Actual precrisis   Total
Tranquil               259                32            291
Precrisis               62                88            150
Total                  321               120            441
  • The fraction of observations correctly called (79 percent) is equal to the sum of precrisis months correctly called (observations that were followed by a crisis within 24 months for which a signal was issued) (88) and tranquil periods correctly called (259) divided by the total number of observations (441).

  • The rate of false alarms as a share of signals (41 percent) is equal to the number of predicted crises not in fact followed by a crisis (62) divided by the total number of observations for which the model predicted a crisis (150).

  • The probability of a crisis given a signal (59 percent) is equal to the number of observations for which a signal was issued and a crisis ensued (88) divided by the total number of signals issued (150). (This is the same as 100 minus the rate of false alarms.)

  • The probability of a crisis given no signal (11 percent) is equal to the number of observations for which a signal was not issued and a crisis ensued (32) divided by the total number of observations during which no signal was issued (291). The difference between the probability of crisis given a signal and the probability of crisis given no signal is the increase in the risk of crisis associated with the issuance of a signal.
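The four bullet points above reduce to simple ratios over the 2x2 table, and reproducing the arithmetic confirms the figures in the text:

```python
# Goodness-of-fit arithmetic from the Box 1 table (DCSD model, out of sample,
# 25 percent cutoff). Cell counts are taken directly from the table.

no_alarm_tranquil, no_alarm_precrisis = 259, 32    # "tranquil" (no alarm) row
alarm_tranquil, alarm_precrisis = 62, 88           # "precrisis" (alarm) row

total = no_alarm_tranquil + no_alarm_precrisis + alarm_tranquil + alarm_precrisis
correct = (no_alarm_tranquil + alarm_precrisis) / total           # observations correctly called
false_alarms = alarm_tranquil / (alarm_tranquil + alarm_precrisis)
p_crisis_alarm = alarm_precrisis / (alarm_tranquil + alarm_precrisis)
p_crisis_no_alarm = no_alarm_precrisis / (no_alarm_tranquil + no_alarm_precrisis)

print(round(100 * correct), round(100 * false_alarms),
      round(100 * p_crisis_alarm), round(100 * p_crisis_no_alarm))  # 79 41 59 11
```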

Another way of looking at these goodness-of-fit statistics comes from taking the perspective of a decision maker who must choose a course of action based on these crisis predictions. An important question would be the incidence of false alarms—the fraction of times no crisis occurs when crises are predicted. When probabilities above 25 percent are said to be predicting a crisis, 65 percent of these crisis predictions precede noncrisis months. As we observed before, this high false alarm rate does not mean the estimated probabilities carry no information. Most of the time, the estimated probability of crisis is below 25 percent. Crises in fact follow these observations only 13 percent of the time. When the probability rises above 25 percent, crises are in fact more likely, ensuing 35 percent of the time.

Frankel-Rose Model

The FR model was updated through 1996 using a sample of 41 countries that is comparable to the samples used in the other studies (the original study was based on a much broader sample of developing countries). The results show that the probability of a crisis increases when domestic credit growth is high, reserves as a share of broad money are low, the real exchange rate is overvalued, the fiscal and current account deficits are high, the economy is more closed (measured by the share of exports and imports in GDP), and foreign interest rates are high. In addition, some characteristics of capital inflows seem to matter. Low shares of concessional debt and foreign direct investment as a proportion of total debt increase the probability of crisis, as do high shares of debt issued by the public sector.

The goodness-of-fit statistics show that the model performs fairly well in generating predicted probabilities of crashes above 50 percent when a crash actually occurs (column 2 of Table 3). The model correctly predicts one-third of the crashes in the sample, with only a 26 percent incidence of false alarms. Using a threshold of 25 percent, correct predictions increase to 63 percent of the crashes and false alarms to 52 percent. Thus, the FR model performs somewhat better than the KLR framework in predicting crises within sample.

Sachs-Tornell-Velasco Model

STV argue that a key feature of the 1995 crises was that the attacks only hit hard at already vulnerable countries. Thus, countries with overvalued exchange rates and weak banking systems29 were subject to more severe attacks, but only if they had low reserves relative to monetary liabilities (so that they could not easily accommodate capital outflows) and weak fundamentals (so that fighting the attack with higher interest rates would be too costly).

Using the 23-country common sample (still for the 1995 crises), the estimated STV model fits the data only moderately well. The main hypotheses receive mixed support: a depreciated real exchange rate lowers the severity of a crisis only for countries with low reserves and weak fundamentals, but the effect of lending booms on such countries is insignificant.30

Since the model does not predict the discrete event of whether a country has a crisis or not, but rather the level of an index of exchange market and reserve pressure, it is not possible to assess how well the model fits the data in terms of a proportion of crises correctly called, as done for the other studies. The STV framework predicts which countries should face the greatest pressure on the crisis index during a period of global financial turbulence such as the Mexican crisis.31 This suggests evaluating the performance of the model by comparing rankings of countries based on the predicted and actual crisis indices, as shown in Table 4. One would expect the predicted rankings to match up relatively well with the actual rankings, since these are in-sample predictions, that is, predictions for the period that the model was designed to explain. Indeed, the table shows that there is a positive and significant correlation between the actual and predicted crisis indices. However, less than one-quarter of the variation in actual rankings is explained by the predicted rankings.
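The rank comparison in Table 4 is a standard Spearman correlation. The stdlib-only sketch below applies the classic rank-difference formula (valid when there are no ties, as with the Table 4 indices) to the first five countries of Table 4 for illustration; the reported 0.49 uses all 23 countries.

```python
# Spearman rank correlation via rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)),
# valid in the absence of ties. Data: first five rows of Table 4.

def ranks(xs):
    # Rank 1 = largest (most severe) index, matching Table 4's convention.
    order = sorted(range(len(xs)), key=lambda i: xs[i], reverse=True)
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

actual = [791.32, 202.09, 197.01, 85.03, 71.87]      # Mexico ... Philippines
predicted = [293.57, 322.11, 186.05, 74.57, 168.08]
print(spearman(actual, predicted))  # 0.8
```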

Developing Country Studies Division Model

The DCSD model uses monthly data to determine which variables contribute to the probability of a crisis occurring within the following 24 months. The probability of this event is estimated in a probit regression model. This has two advantages: the model can aggregate predictive variables more satisfactorily into a composite probability, taking account of correlations among different variables; and it is easy to test for the statistical significance of individual variables. In addition, it is possible to allow the risk of a crisis to increase linearly with the predictor variables.32
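In a probit, the crisis probability is the standard normal CDF of a linear index of the predictors. A minimal sketch follows; the coefficients are invented for illustration, while the actual DCSD coefficients are estimated by maximum likelihood over the monthly panel.

```python
# Probit link: P(crisis within 24 months) = Phi(a + x'b), where Phi is the
# standard normal CDF. The coefficients below are hypothetical.

from math import erf, sqrt

def norm_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def crisis_probability(x, beta, intercept):
    """x: predictor values (e.g., RER deviation from trend, current account,
    reserve growth, export growth, short-term debt/reserves)."""
    z = intercept + sum(b * v for b, v in zip(beta, x))
    return norm_cdf(z)

# A country with stretched fundamentals gets a higher fitted probability:
print(round(crisis_probability([2.0, 1.5], [0.4, 0.3], -1.0), 2))  # 0.6
```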

The model is obtained by including the 15 KLR variables (where sufficient data were available) plus three additional variables, then simplifying by dropping insignificant variables.33 The additional variables are the level of M2/reserves, the current account to GDP ratio, and the ratio of short-term debt to reserves. The results indicate that the more significant variables are the real exchange rate relative to trend, the current account deficit, reserve growth, export growth, and the ratio of short-term debt to reserves.

The DCSD model excluding short-term debt performs about as well in-sample as the KLR model, as shown in Table 3. Because data on short-term debt are available only from 1986, the model that includes this variable is estimated over a shorter period. The in-sample performance of this specification is somewhat worse than that of the model excluding short-term debt. Using a 25 percent cutoff, the model including short-term debt to reserves predicts a crisis 39 percent of the times that it should, while 64 percent of alarms are false.

Out-of-Sample Performance: 1997

For an early warning system to be a useful tool, it would have to provide informative signals out of sample, namely, beyond the time period for which the model itself was estimated. An interesting (although by no means complete) test of the out-of-sample performance of the models reviewed here is to check whether they produced signals of impending trouble ahead of the crises of 1997 using only the data available before the crisis. Because the maximum prediction horizon of KLR and DCSD is two years, these models were estimated using only data through April 1995 to forecast crisis probabilities for 1997 and to compare those forecasts with the outcomes. FR was estimated using annual data through 1996, and STV through April 1995.

The out-of-sample performance of the models can be evaluated in two ways. First, a natural assessment of the model performance is to check whether the models predicted high probabilities of crisis (above say, 50 or 25 percent) in the periods preceding actual crises. This goodness-of-fit test evaluates the success in predicting the timing of crises. Second, given the rather unpredictable nature of the timing of global turbulence and contagion, the models can be put to the test of predicting the relative severity of crises (or, more precisely, measures of balance of payments pressure) faced by different developing countries. That is, one would judge the success of the models by how they anticipated—again using only previously available information—the relative intensity of balance of payments pressures suffered by different countries. The performance of the models is then assessed by comparing a ranking of countries according to the value of their crisis index as predicted by the models with their ranking using actual data for 1997.

The first type of test, evaluating whether the models predicted probabilities of crisis accurately, was applied to the KLR and DCSD models only because it was not possible to produce meaningful goodness-of-fit measures of this kind for the FR and STV models, for different reasons. According to the FR definition, there are no actual crises in 1997, so there are no crises to predict. This odd result illustrates the fact that the use of annual data does not work well for the crisis variable in 1997. Because the largest depreciations happened toward the end of the year, none of the Asian countries is identified as a crisis country in 1997 within this framework.34 The problem with the STV framework is that it does not predict the timing of discrete crisis events, but rather predicts which countries should face the greatest pressure as measured by a crisis index during a period of global financial turbulence. Thus, this model cannot be subjected to the first test because it is not possible to extract from it a prediction of the probability of crisis, although it is ideally designed for ranking countries by their level of risk, as in the second test.

The out-of-sample performance of the KLR model is less successful than the in-sample performance, though also more selective, as Table 5 shows. With a 25 percent probability cutoff, the KLR-based predicted probabilities correctly signal 34 percent of the crisis observations (as opposed to almost one-half within sample), while the incidence of false alarms falls to 51 percent.35 From the perspective of a decision maker attempting to interpret the signals coming from the model, the model continues to provide predictions of some value, even out of sample. For observations in which the predicted probability of an approaching crisis was below 25 percent, crises actually followed 24 percent of the time. When the predicted probability of crisis was above 25 percent, however, crises ensued 49 percent of the time.

The DCSD model performs much better out of sample. Again, with the 25 percent cutoff probability, it correctly predicts a crisis in 73 percent of the observations that are actually followed by crises. Less than half the time that a crisis was predicted, no crisis ensued within 24 months. The contribution of the model to the analysis of the external risk faced by the countries can be appreciated in the following way. The predicted probability of crisis was below 25 percent during most of the out-of-sample period under consideration; for these observations, crises actually occurred only 11 percent of the time. When the predicted probability of crisis was above 25 percent, however, crises ensued 59 percent of the time.

This good performance is illustrated by examining all the countries that experienced a crisis in 1997. Except for the Philippines, the predicted probabilities were above 25 percent for most of the 24 months preceding the first month of the crisis. Looking at countries that did not experience a crisis shows that false alarms were a bigger problem for some countries than for others. For Argentina, the probabilities were above 25 percent in only one of the 20 months from May 1995 to December 1996, compared with 11 of those months for Peru.
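The cutoff-based evaluation applied to the KLR and DCSD models above can be sketched in a few lines of code. The probabilities, crisis indicators, and the helper name `evaluate_signals` below are purely illustrative assumptions, not output of or code from any of the models discussed.

```python
# Illustrative sketch of the cutoff-based evaluation described in the text:
# an observation is an "alarm" when the predicted probability of a crisis
# within the horizon exceeds a cutoff (25 percent here). All numbers are
# made-up placeholders, not actual KLR or DCSD output.

def evaluate_signals(probabilities, crisis_followed, cutoff=0.25):
    """Summarize how alarms relate to subsequent crises.

    probabilities   -- predicted probability of crisis for each observation
    crisis_followed -- True if a crisis in fact occurred within the horizon
    """
    alarms = [p >= cutoff for p in probabilities]

    crises = sum(crisis_followed)
    hits = sum(a and c for a, c in zip(alarms, crisis_followed))
    false_alarms = sum(a and not c for a, c in zip(alarms, crisis_followed))
    quiet = [c for a, c in zip(alarms, crisis_followed) if not a]

    return {
        # share of pre-crisis observations correctly signaled
        "crises_called": hits / crises if crises else 0.0,
        # share of alarms not followed by a crisis
        "false_alarm_rate": false_alarms / sum(alarms) if any(alarms) else 0.0,
        # crisis frequency among observations with no alarm
        "crisis_rate_below_cutoff": sum(quiet) / len(quiet) if quiet else 0.0,
    }

# Hypothetical example: six observations, three followed by crises.
probs = [0.40, 0.10, 0.30, 0.05, 0.60, 0.20]
crisis = [True, False, True, False, False, True]
stats = evaluate_signals(probs, crisis)
```

With these placeholder inputs, two of the three pre-crisis observations are signaled, one of the three alarms is false, and one in three quiet observations is nevertheless followed by a crisis, the same three summary rates quoted in the text for the actual models.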

The second test focuses on the success of the models in identifying which countries would be vulnerable in a period of global financial turmoil such as 1997. The question here is whether the models assign higher predicted probabilities of crisis to those countries that had the biggest crises (as defined by each model).36 This can be addressed by comparing how closely the predicted ranking resembles the actual one, as shown in Table 6. An additional benefit of the ranking comparison is that it provides a unified method to evaluate the forecasting performance of all four models. Clearly, a model forecasts successfully if countries that have the highest predicted probabilities of crisis are those that also display the highest values in the severity of crisis index. Thus, the table displays the correlation between the actual and predicted rankings, as well as the proportion of the variance in the actual rankings that is explained by the predicted rankings.

Table 6. Correlation of Actual and Predicted Rankings for 1997

[Country-by-country rankings: for each model (KLR, DCSD, FR, and STV), countries are ranked by the actual crisis index and by the predicted probability of crisis.1 Countries covered, in order of the actual KLR crisis index: Korea, Thailand, Indonesia, Malaysia, Zimbabwe, Philippines, Taiwan Province of China, Colombia, India, Brazil, Turkey, Venezuela, Pakistan, South Africa, Jordan, Sri Lanka, Chile, Bolivia, Argentina, Mexico, Peru, Uruguay, and Israel.]

                 KLR      DCSD     FR       STV
Correlation2     0.52     0.53     0.12     0.23
p-value          0.011    0.011    0.694    0.295
R2               0.28     0.29     0.02     0.05

Sources: Berg and Pattillo (1999a), and IMF staff calculations.

1Average of 1996 estimates of probabilities of crisis in 1997.

2Spearman rank correlation of the fitted values and the actual crisis index and its p-value. The R2 is from a regression of fitted values on actual values.
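The correlation statistic reported in Table 6 is a Spearman rank correlation between the actual and predicted country rankings. For rankings without ties this statistic has a simple closed form, which the sketch below implements in plain Python; the five-country rankings are hypothetical and serve only to show the computation, not to reproduce any model's results.

```python
# Spearman rank correlation for two rankings without ties:
#   rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)),
# where d_i is the difference between an entity's two ranks.

def spearman_rho(actual_ranks, predicted_ranks):
    n = len(actual_ranks)
    d2 = sum((a - p) ** 2 for a, p in zip(actual_ranks, predicted_ranks))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical rankings of five countries (1 = most severe actual crisis,
# 1 = highest predicted probability of crisis); not model output.
actual = [1, 2, 3, 4, 5]
predicted = [2, 1, 4, 3, 5]
rho = spearman_rho(actual, predicted)
```

A rho of 1 would mean a model ranked the countries exactly as the crisis index did, and -1 would mean it ranked them exactly backward; the hypothetical rankings above yield 0.8. In practice one would also report a p-value, as Table 6 does.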

The rankings generated from the predictions of the four models are all positively correlated with the actual rankings based on developments in 1997. Yet the correlations are not very high, ranging from 12 percent to 53 percent. The two models based on monthly indicators (KLR and DCSD) do somewhat better on this test, showing higher correlations and statistical significance. Some of the models attach fairly high risk to the Asian economies and to Brazil, which also experienced large reserve losses in 1997. It should be noted that the “actual” crisis rankings are based on the definition of crisis applied by each model and thus are not mutually consistent.

Summary of Effectiveness

This section has examined how well four empirical models work in predicting currency crises. The results indicate that, when an estimated probability of 25 percent or higher is taken as a prediction of a crisis, the best pure out-of-sample model correctly predicts roughly one-half of the crises in sample and one-third out of sample. False alarms are always common: more than half the time that these models predict an approaching crisis, no crisis occurs. Despite the high incidence of false alarms, a prediction of crisis by a model does reflect a situation of increased risk: periods in which the model calls a crisis are substantially more likely to be followed by a crisis than periods in which it does not, both in sample and—to a lesser extent—out of sample.

The DCSD model performs substantially better out of sample. When this model indicates a probability of crisis above 25 percent, a crisis is in fact looming most (59 percent) of the time. And when the predicted probability of crisis is below 25 percent, a crisis in fact ensues only 11 percent of the time. Figure 3 displays the out-of-sample probabilities of crisis based on the DCSD model for a selected group of eight countries: five Asian and three Latin American. The figure shows a relatively high probability of crisis during the period preceding the crises in Korea, Indonesia, Malaysia, and Thailand. The risk of crisis in the Philippines is somewhat lower but still close to the cutoff threshold. Of the Latin American countries, none of which suffered crises in 1998, only Brazil shows a relatively high probability of crisis during this period.

Figure 3. 24-Month-Ahead Crisis Probabilities for Selected Countries1

Source: IMF staff calculations.

1Based on the DCSD model. The solid vertical lines represent crisis dates. Shaded areas denote the 24 months prior to crises.

It is perhaps not surprising that the DCSD model, which was formulated after the 1997 crises, performs better than the others out of sample. First, some knowledge about the 1997 crises was used to formulate the model. In particular, the inclusion of short-term external debt as a predictive variable was in part inspired by events in 1997. However, this factor should not be exaggerated. Most important, out-of-sample performance was not used as a factor in specifying the model. Another reason for the superior performance of the DCSD model is simply that as a latecomer it has benefited from a number of methodological innovations inspired by the previous models and other research, as described in Berg and Pattillo (1999b). Ultimately, only time will tell whether newer models continue to perform well in predicting future crises.

While timing seems quite difficult to predict, some of the models do better in predicting the relative severity of crisis for different countries in 1997. This suggests the models may be more useful in identifying which countries are more vulnerable in a period of international financial turmoil than in predicting the timing of currency crises. This would still be a valuable contribution of an early warning system because it could help focus attention on the countries that need policy adjustments before a crisis develops. Furthermore, comparing the relative risk faced by different countries, which may be very diverse in geographical and economic terms, is not easy to do without applying systematic quantitative techniques.

Independently of the value of the models as precise predictors of crises, the analysis of their findings provides some insight into which variables are the most important determinants of crises. All approaches show that the probability of a currency crisis increases when the real exchange rate is overvalued relative to trend and when domestic credit growth and the ratio of M2 to reserves are high. Large current account deficits and reserve losses increase the probability of crisis in all three of the methods that include these variables. High ratios of short-term debt to reserves are also found to increase the probability of crisis in the specifications that use this variable. Some evidence is also found for the importance of other variables, such as export growth, the size of the government budget deficit, and the share of foreign direct investment in external debt. Surprisingly, output growth was not found to be a significant predictor of crises when tested. The evidence on interest rates was mixed: high domestic real interest rates provide informative signals of impending crises, while the differential between foreign and domestic real interest rates does not; yet in one specification high foreign interest rates do increase the probability of a currency crash.
