Non-Farm Payroll (NFP) Part II – Forecasting

Following on from my previous post, I will attempt to find the least inaccurate estimate of the following month's NFP. Yesterday's NFP release (Apr 17) came in at +98k, although further revisions may still be applied to the April NFP result. [Update: it was revised down to +50k.]

Using data from FRED, I applied ridge regression to the following predictors (taking the most recent available months):

  • last month's NFP change (MoM) for each individual sector, using the revised figures for last month
  • ADP NFP (MoM), released 2 days earlier
  • Core CPI (YoY), 1 month back
  • Manufacturing Sales (YoY), 1 month back
  • Core Retail Sales (YoY), 1 month back
  • JOLTS Openings (YoY), 2 months back
  • JOLTS Hires (YoY), 2 months back
  • Leading Index (seasonally adjusted), 1 month back
  • Chicago Fed National Activity Diffusion Index (YoY), 1 month back
  • Housing Starts (YoY), 1 month back
  • 4-Week Average Initial Claims (YoY), converted from weekly to monthly
  • Monthly dummy variables (Jan to Dec)
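The feature construction above can be sketched in pandas. This is a minimal sketch on random stand-in series rather than the actual FRED data; the series names, the `yoy` helper and the exact lag alignment are my assumptions about how the list translates into columns.

```python
import numpy as np
import pandas as pd


def yoy(s: pd.Series) -> pd.Series:
    """Year-on-year % change of a monthly series (hypothetical helper)."""
    return s / s.shift(12) - 1


rng = np.random.default_rng(0)
months = pd.date_range("2012-01-01", periods=60, freq="MS")

# Hypothetical monthly stand-ins for FRED series (random numbers, not real data).
core_cpi = pd.Series(np.linspace(230.0, 255.0, 60), index=months)
jolts_openings = pd.Series(rng.uniform(4000, 6000, 60), index=months)

# Hypothetical weekly initial claims: 4-week moving average, then averaged per month.
weeks = pd.date_range("2012-01-07", periods=260, freq="W-SAT")
claims = pd.Series(rng.uniform(250, 350, 260), index=weeks)
claims_monthly = claims.rolling(4).mean().resample("MS").mean().reindex(months)

features = pd.DataFrame(
    {
        "core_cpi_yoy_lag1": yoy(core_cpi).shift(1),              # 1 month back
        "jolts_openings_yoy_lag2": yoy(jolts_openings).shift(2),  # 2 months back
        "claims_4wk_yoy": yoy(claims_monthly),                    # weekly -> monthly
    },
    index=months,
)

# One dummy column per calendar month (m_1 = Jan ... m_12 = Dec).
month_dummies = pd.get_dummies(months.month, prefix="m", dtype=float)
month_dummies.index = months

# Drop the leading rows lost to the YoY calculation and the lags.
features = features.join(month_dummies).dropna()
print(features.shape)
```

With an intercept in the model, all 12 month dummies together are perfectly collinear; plain least squares would need one of them dropped, but the ridge penalty keeps the solution well defined.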

That gives a total of 34 predictors, and I assume no serious overfitting given there are roughly 5 observations per predictor. Most notable is the fall in mining and logging hiring in 2016 and its sharp rebound since Trump took office.


To see how ADP NFP has been performing, as the most recent release with the same measurement objective, we check the tracking error, ADP NFP minus NFP. Some differences between the two are:

  • The ADP estimate includes only private non-farm payrolls, while the BLS estimate includes both private and government non-farm payrolls.
  • ADP releases just one estimate of non-farm payroll additions, while the BLS releases an initial figure that is revised twice to include results from companies that sent their responses late. The first BLS estimate includes results from ~70% of the survey sample, while the second and third estimates add a further 20% and 1% to 2% of survey responses, respectively.
  • The ADP National Employment Report (ADP-NER) is released two days before the BLS non-farm payrolls report.
  • Although the Automatic Data Processing (ADP) database and survey methodology differ from the sample used to compute the BLS non-farm payrolls report, ADP has estimated the correlation between the private payroll additions reported in the ADP-NER and those in the final BLS Employment Situation report at ~0.96.


Tracking error, ADP NFP minus NFP (thousands), summary statistics:

count    179.000000
mean      -2.175626
std       93.732430
min     -390.276000
25%      -65.393000
50%       -0.209000
75%       60.474500
max      255.778000

x[['ADP NFP','NFP']].corr()

          ADP NFP      NFP
ADP NFP   1.00000  0.91187
NFP       0.91187  1.00000

  • errors are within ±94k (one standard deviation) about 66% of the time
  • correlation is high, at 0.912
  • no seasonality was found in the tracking error (TE)
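A minimal sketch of these tracking-error checks, assuming a DataFrame `x` with 'ADP NFP' and 'NFP' columns in thousands, as in the `corr()` call above. The data here is random and only mimics the shape of the real series, so the printed numbers are not the post's results.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 179 months of "NFP" and an "ADP NFP" that tracks it with noise.
rng = np.random.default_rng(1)
idx = pd.date_range("2002-01-01", periods=179, freq="MS")
nfp = rng.normal(150, 220, 179)
x = pd.DataFrame({"NFP": nfp, "ADP NFP": nfp + rng.normal(0, 90, 179)}, index=idx)

te = x["ADP NFP"] - x["NFP"]           # tracking error, thousands
print(te.describe())
corr = x[["ADP NFP", "NFP"]].corr()
print(corr)

# Share of months in which |TE| lies within one standard deviation.
within_1sd = (te.abs() <= te.std()).mean()
print(f"within +/-{te.std():.0f}k: {within_1sd:.0%} of the time")

# Quick seasonality check: average TE for each calendar month.
te_by_month = te.groupby(te.index.month).mean()
print(te_by_month)
```

For roughly normal errors, about two-thirds of observations fall within one standard deviation, which is how a standard deviation of ~94k becomes a ±94k band that holds roughly 66% of the time.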

Using cross-validation on the ridge regression to choose the optimal* alpha, which penalises unstable β_i coefficients, the chosen alpha turns out to be around 1/3 power.

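import numpy as np
from sklearn.linear_model import RidgeCV
alphas = 10**np.linspace(10, -2, 500) * 0.5  # 500 candidate penalties, from 5e9 down to 5e-3
ridgecv = RidgeCV(alphas=alphas, scoring='neg_mean_squared_error').fit(X_train, y_train)  # X_train is min-max scaled beforehand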

The data was min-max scaled before the regression; otherwise, small-scale predictors would be given large coefficients, making each predictor's influence on NFP unreliable and hard to interpret. Note that the Jan 2017 NFP release reports on Dec 2016 net hires. In general, hiring picks up from Jan to May, while layoffs dominate from Jun to Sep. Each sector's previous-month NFP has little influence on the following month's NFP, given their small coefficients; the same goes for housing starts.
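A self-contained sketch of the scaling and alpha search, on a random design matrix with 34 predictors on very different scales (all data and names here are hypothetical). Besides interpretability, scaling matters for ridge specifically: the penalty acts on raw coefficient sizes, so unscaled predictors would be shrunk unevenly.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.preprocessing import MinMaxScaler

# Hypothetical design matrix: 179 months x 34 predictors on very different scales.
rng = np.random.default_rng(2)
X = rng.normal(size=(179, 34)) * rng.uniform(0.01, 1000, size=34)
true_beta = rng.normal(size=34) / np.abs(X).mean(axis=0)
y = X @ true_beta * 50 + rng.normal(0, 30, 179)

# Scale every predictor (and the target) to [0, 1] before fitting.
x_scaler, y_scaler = MinMaxScaler(), MinMaxScaler()
X_s = x_scaler.fit_transform(X)
y_s = y_scaler.fit_transform(y.reshape(-1, 1)).ravel()

# Same alpha grid as above; with cv=None, RidgeCV uses efficient leave-one-out CV.
alphas = 10 ** np.linspace(10, -2, 500) * 0.5
ridgecv = RidgeCV(alphas=alphas, scoring="neg_mean_squared_error").fit(X_s, y_s)
print(ridgecv.alpha_)     # penalty selected by cross-validation
print(ridgecv.coef_[:5])  # coefficients on the scaled predictors
```

Here the scalers are fitted on the full sample for brevity; in a real backtest they should be fitted on the training window only, so that future ranges do not leak into the model.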


The fit looks pretty good, but we need to examine the errors more closely, as individual months can still be well off. Remember to reverse the min-max scaling to get predictions back on their original scale. The tracking error is within ±97k about 66% of the time, which is relatively large; we aim to reduce this.
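Reversing the scaling and measuring the tracking-error band can be sketched as follows. The predictions are simulated stand-ins for `ridgecv.predict(...)` output on a min-max scaled target, so the printed band is not the post's ±97k.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical actual NFP changes (thousands) and min-max scaled predictions.
rng = np.random.default_rng(3)
y = rng.normal(180, 100, 179)
y_scaler = MinMaxScaler().fit(y.reshape(-1, 1))
y_s = y_scaler.transform(y.reshape(-1, 1)).ravel()
pred_s = y_s + rng.normal(0, 0.05, 179)  # stand-in for ridgecv.predict(X_s)

# Undo the min-max scaling to get predictions back in thousands of jobs.
pred = y_scaler.inverse_transform(pred_s.reshape(-1, 1)).ravel()

te = pred - y                            # tracking error: forecast minus actual
band = te.std()
share_within = np.mean(np.abs(te) <= band)
print(f"TE within +/-{band:.0f}k {share_within:.0%} of the time")
```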


A closer look at recent forecasts against actual (revised) NFP figures.


Looking at the tracking error (forecast minus actual), the residuals are not pure white noise: there is some structure left, particularly at a 4-month lag in both the ACF and PACF, suggesting an ARMA(4,4). This was not obvious from the TE-by-month plot 😦



Fitting an ARMA(4,4) showed that the TE has some trend and is not entirely stationary. Hence an ARIMA(4,1,4) was used instead.


To reverse the first-order differencing, the fitted values from the ARIMA model have to be cumulatively summed, starting from the initial value TE[0]. However, this turned out to be futile: modelling the regression errors with ARIMA led to larger TE deviations, albeit with less noise. That is the opposite of what makes a forecast useful, since we want to catch big surprises rather than reduce small noise that the market would accept anyway.
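Reversing the first-order differencing amounts to a cumulative sum anchored at TE[0]. A minimal numpy sketch on made-up numbers:

```python
import numpy as np

# Hypothetical tracking-error series (thousands).
te = np.array([12.0, -30.0, 45.0, 5.0, -20.0, 60.0])

# First-order differences: the series an ARMA on diff(TE) would model.
d_te = np.diff(te)

# Undo the differencing: start from TE[0] and cumulatively add the differences.
restored = np.concatenate(([te[0]], te[0] + np.cumsum(d_te)))
print(restored)  # identical to te
```

If an ARMA model is fitted to diff(TE) directly, replacing `d_te` with its fitted values gives the fitted TE back in levels.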

In conclusion:

  • there might be other relevant predictors we have omitted
  • Non-seasonal sector hiring can add up to a lot: in Jul and Aug 2017, it was actually the non-seasonal components that caused a large negative tracking error
  • It's hard because NFP figures are overall net changes subject to revision, and small noise in a few components can be large relative to the total NFP change, even though a ±90k deviation is tiny compared with total non-farm employment of 140.6 million. A good grasp of each sector's business conditions is required, not just the macro picture.



