Bock references a 1998 paper written by Schmidt and Hunter as the scientific backing for Google’s interview practices, specifically the use of “work sample tests” and “structured interviews”.
In 2016 Schmidt wrote an updated paper integrating data from 20 additional years of research and improved analysis methods:
According to Schmidt’s analysis, the clear “winner” in standalone ability to predict job performance is the “General Mental Ability” (GMA) test, such as the O*NET Ability Profiler, the Slosson Intelligence Test and the Wonderlic Cognitive Ability Test. On average, GMA tests have a validity of .65, meaning their scores correlate .65 with later job performance (written as 65% in this post). That is 14 percentage points higher than in the ’98 data, unseating “work-sample tests” (’98: 54%, ’16: 33%). The average only tells part of the story, as more refined analysis suggests a significant difference in predictive ability depending on job type: 74% for professional and managerial jobs, and 39% for unskilled jobs.
Interestingly, no organization I’ve ever worked for or heard of seems to be using GMA tests. One reason might be that the consistency and precision of the method, coupled with the large sample sizes, make it easier to prove that these tests introduce both gender and racial bias. This seems unfortunate, since none of the other evaluation methods are bias-free; their bias is just harder to measure. Being able to measure bias precisely allows us to correct for it: post hoc in the short term, and through better test design in the long term.
Next up are employment interviews (58%), both structured and unstructured, where “structured interviews” refers to interviews in which both the questions and the criteria for evaluating answers are consistent across candidates. The MSA and PSQ questions I discussed here are a good example of structured interview questions. While the two interview types don’t seem to differ in predictive power, unstructured interviews are certainly more bias-prone. The list goes down from there, all the way to graphology and age, which have little to no predictive power.
Since GMA seems to be the best measure for making hiring decisions, Schmidt looks at all other measures relative to it, asking the following question:
When used in a properly weighted combination with a GMA measure, how much will each of these measures increase predictive validity for job performance over the .65 that can be obtained by using only GMA?
In this case, the focus shifts from each measure’s standalone predictive ability to how much it adds on top of GMA, which also depends on its covariance with GMA (smaller covariance = better).
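To make the covariance point concrete, here is a minimal sketch using the standard formula for the multiple correlation of two optimally weighted predictors. The .65 GMA validity comes from the quote above; the second measure’s validity (.50) and its correlations with GMA (0 and .5) are hypothetical values chosen for illustration, not figures from Schmidt’s paper, and the function name is my own.

```python
import math


def combined_validity(r_gma: float, r_other: float, r_between: float) -> float:
    """Multiple correlation R between job performance and two predictors
    combined with optimal (regression) weights.

    r_gma     -- validity of the GMA test (correlation with performance)
    r_other   -- validity of the second measure
    r_between -- correlation between the two measures themselves
    """
    if not -1.0 < r_between < 1.0:
        raise ValueError("r_between must be strictly between -1 and 1")
    r_squared = (
        r_gma**2 + r_other**2 - 2 * r_gma * r_other * r_between
    ) / (1 - r_between**2)
    return math.sqrt(r_squared)


R_GMA = 0.65    # GMA validity quoted from Schmidt's paper
R_OTHER = 0.50  # hypothetical validity of a second measure

# Same second measure, two hypothetical levels of overlap with GMA.
for r_between in (0.0, 0.5):
    total = combined_validity(R_GMA, R_OTHER, r_between)
    print(
        f"overlap with GMA = {r_between:.1f}: "
        f"combined validity = {total:.2f}, gain over GMA = {total - R_GMA:.2f}"
    )
```

With these made-up inputs, a second measure that is uncorrelated with GMA lifts the combined validity to about .82, while the same measure correlated .5 with GMA only gets it to about .68. The standalone validity is identical in both cases; only the overlap with GMA changes.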
The more extensive summary table is shown below, but the bottom line is this:
Overall, the two combinations with the highest multivariate validity and utility for predicting job performance were GMA plus an integrity test (mean validity of .78) and GMA plus a structured interview (mean validity of .76)
So where does all of this leave us? In my opinion, the pendulum in recruiting may have swung too far from the quantitative assessment pole to the qualitative assessment pole. It seems like we’d get much better outcomes from our recruiting efforts if GMA and integrity assessments replaced some of our structured interviews, while we continue to work diligently to remove bias from our recruiting efforts, regardless of the assessment methods we use.