When One Hiring Algorithm Screens Everyone: What a New Study Reveals

Researchers from Northeastern University and Stanford University have published a study examining what happens when a single vendor's screening algorithm stands between millions of job seekers and every position they apply for — and the results matter for anyone interested in how AI makes decisions about hiring.
The paper, "Algorithmic Monocultures in Hiring", is authored by Rishi Bommasani, Sarah Bana, Kathleen A. Creel, Dan Jurafsky, and Percy Liang. It analyzes data from 3 million job applicants who submitted 4 million applications, all screened by algorithms from the same vendor — a configuration that sounds niche but actually describes how many large recruiting pipelines operate right now.
What the Study Examined
The researchers focus on a concept called algorithmic monoculture: what happens when a single AI model, or nearly identical models from one vendor, makes important decisions about millions of people all at once. The term comes from agriculture, where planting one crop variety everywhere maximizes short-term efficiency but creates fragility — if that crop fails, everything fails at once.
In hiring, the comparison is apt. When human recruiters each apply their own judgment, their mistakes tend to be different — one person's blind spot gets caught by another person's strength, and the overall system has some wiggle room. When one algorithm screens every application at a company (or across many companies using the same vendor), that diversity disappears. The algorithm's biases become built into the system itself, rather than being isolated problems.
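A toy simulation makes the contrast concrete. It is purely illustrative and uses a made-up model of screening. Each applicant gets a score in [0, 1), and a job rejects anyone whose score falls below a fixed cutoff, so every job on its own rejects the same share of applicants. The only thing that varies is whether all jobs reuse one score per applicant (a monoculture) or each job draws its own (independent screeners). The function name and parameters are hypothetical.

```python
import random


def all_rejection_rate(n_applicants, n_jobs, reject_prob, shared_model, seed=0):
    """Return the fraction of applicants rejected by every job (toy model).

    Hypothetical setup: a job rejects an applicant when the score it uses
    falls below reject_prob, so each job on its own rejects a reject_prob
    share of applicants. With shared_model=True every job reuses one score
    per applicant (a monoculture); otherwise each job draws a fresh,
    independent score (diverse screeners).
    """
    rng = random.Random(seed)
    rejected_everywhere = 0
    for _ in range(n_applicants):
        shared_score = rng.random() if shared_model else None
        # Assume rejection everywhere until any single job accepts.
        rejected_all = True
        for _ in range(n_jobs):
            score = shared_score if shared_model else rng.random()
            if score >= reject_prob:
                rejected_all = False
                break
        if rejected_all:
            rejected_everywhere += 1
    return rejected_everywhere / n_applicants


if __name__ == "__main__":
    for shared in (True, False):
        rate = all_rejection_rate(20_000, n_jobs=10, reject_prob=0.5, shared_model=shared)
        label = "one shared model" if shared else "independent screeners"
        print(f"{label}: {rate:.4f} rejected by all 10 jobs")
```

With a 50% per-job rejection rate and 10 applications, the shared-score setting rejects roughly half of applicants everywhere. Independent screeners reject only about 0.5**10, roughly 0.1%, everywhere. Each job looks the same in isolation; only the correlation between jobs differs.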
The researchers had access to exactly this kind of data: one vendor's algorithm applied uniformly to all 4 million applications. This gave them a rare opportunity to measure both what happened to individual applicants and what happened across the entire population.
How They Measured Correlated Failure
One of the study's more interesting technical contributions is how the researchers used the algorithm's consistency as a measurement tool. These hiring algorithms produce the same decision every time they see the same input; in technical terms, they are deterministic. That let the researchers calculate what would have happened if each applicant had applied to every job, not just the ones they actually applied to. This counterfactual reconstruction revealed patterns that would otherwise be visible only if every applicant had applied everywhere, which never happens in a real job market.
This matters because traditional audits can only measure outcomes that actually happened. By using the algorithm's consistency, the team essentially filled in the missing pieces and created a complete picture from partial data.
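The counterfactual step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's pipeline. The screener is a made-up rule that accepts an applicant only if they meet every minimum a job lists, and the names `toy_screener` and `full_outcome_matrix` are hypothetical. The property that matters is the one the researchers relied on: the screener is a deterministic function of its inputs. Running it on an applicant-job pair that never occurred therefore yields the decision the system would have made.

```python
from typing import Callable, Dict, Tuple

Features = Dict[str, float]
# Hypothetical interface: a screener maps (applicant features, job requirements)
# to an accept (True) / reject (False) decision.
Screener = Callable[[Features, Features], bool]


def toy_screener(applicant: Features, job: Features) -> bool:
    """Made-up deterministic rule: accept only if every listed minimum is met."""
    return all(applicant.get(skill, 0.0) >= minimum for skill, minimum in job.items())


def full_outcome_matrix(
    applicants: Dict[str, Features],
    jobs: Dict[str, Features],
    screener: Screener,
) -> Dict[Tuple[str, str], bool]:
    """Decide every applicant-job pair, including pairs that never happened.

    This is only meaningful because the screener is deterministic: the same
    input always reproduces the same decision.
    """
    return {
        (a_id, j_id): screener(a_feats, j_reqs)
        for a_id, a_feats in applicants.items()
        for j_id, j_reqs in jobs.items()
    }


applicants = {"A": {"python": 3.0, "sql": 1.0}, "B": {"python": 1.0, "sql": 4.0}}
jobs = {"data_eng": {"python": 2.0, "sql": 2.0}, "analyst": {"sql": 3.0}, "dev": {"python": 2.0}}

# Observed data: both applicants applied only to data_eng, and both were rejected.
observed = {("A", "data_eng"), ("B", "data_eng")}
matrix = full_outcome_matrix(applicants, jobs, toy_screener)
counterfactual_accepts = sorted(pair for pair, ok in matrix.items() if ok and pair not in observed)
print(counterfactual_accepts)  # [('A', 'dev'), ('B', 'analyst')]
```

In the observed data, both applicants were rejected from the only job they tried. The reconstructed matrix shows that each would have passed a different job's screen, which is exactly the kind of information a conventional audit of real outcomes cannot see.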
The 4% Problem and Correlated Rejections
One key finding concerns what the researchers call homogeneous outcomes. Among applicants who submitted to 10 or more positions, 4% were rejected by all of them. On the surface, a 4% all-rejection rate might not sound remarkable, since some candidates genuinely are not a fit for the roles they pursue. The issue is that this rate is higher than chance would predict, given how often the algorithm rejects people for individual roles. If each decision were independent, the probability of being rejected by every one of an applicant's applications would be the product of the individual rejection rates, a number that shrinks quickly as the number of applications grows.
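The chance baseline is simple arithmetic. Suppose, hypothetically, that each of 10 jobs rejects 60% of applicants and the decisions are independent. The probability of being rejected by all 10 is then 0.6**10, about 0.6%. The sketch below compares an observed all-rejection rate against that baseline. Its function names and toy inputs are hypothetical and are not the study's data. A ratio above 1 signals rejections that cluster on the same people more than independence allows.

```python
from math import prod
from typing import List, Sequence, Tuple


def expected_all_reject_if_independent(reject_rates: Sequence[float]) -> float:
    """Probability of rejection by every application if decisions were independent."""
    return prod(reject_rates)


def homogenization_ratio(applicants: List[Tuple[Sequence[float], bool]]) -> float:
    """Observed all-rejection rate divided by the rate independence predicts.

    Each entry is (per-application rejection rates, rejected_by_all) for one
    applicant. Values above 1 mean rejections cluster on the same people.
    """
    n = len(applicants)
    observed = sum(1 for _, rejected_by_all in applicants if rejected_by_all) / n
    expected = sum(expected_all_reject_if_independent(r) for r, _ in applicants) / n
    if expected == 0:
        raise ValueError("independence predicts no all-rejections; ratio undefined")
    return observed / expected


print(round(expected_all_reject_if_independent([0.6] * 10), 4))  # 0.006

# Four toy applicants, each facing two jobs that reject 50% of applicants:
# independence predicts 1 in 4 rejected by both, but here 2 of 4 were.
toy = [([0.5, 0.5], True), ([0.5, 0.5], True), ([0.5, 0.5], False), ([0.5, 0.5], False)]
print(homogenization_ratio(toy))  # 2.0
```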
That excess means the algorithm's decisions across jobs are not independent. An applicant the model disfavors tends to be rejected consistently, even for jobs with different requirements, so whatever the model gets wrong about that person carries into every application. From the applicant's perspective, the resulting string of rejections looks like ordinary bad luck, but it is actually the same automated decision repeated.
This is the core risk that researchers worry about with monocultures. The failure mode is not random noise; it is systematic error that gets replicated everywhere at once.
Racial Disparities in Outcomes
The study also examines the results against U.S. anti-discrimination standards — specifically, whether the algorithm produces meaningfully lower approval rates for different racial groups.
The findings are concrete. Of the applications submitted by Asian applicants, 14.74% went to jobs where the algorithm's outcomes showed adverse impact against that group under U.S. legal standards. For Black applicants, the figure was 25.87%, roughly one in four applications. That is not an edge case buried in the data. It is a substantial share of all applications, and because the data come from real applications screened by a system in use, these disparities were observed in actual hiring rather than projected.
The broader context matters here. Adverse impact under U.S. employment law is a statistical measure of unequal outcomes, not a judgment about whether anyone intended discrimination. What the study measures is outcome disparity at scale. That distinction is important for legal purposes, but it does not change the fact that the disparity exists and affects real applicants.
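The article does not name the exact legal test, so the sketch below assumes the most common U.S. rule of thumb, the EEOC's four-fifths rule. Under that rule, a group whose selection rate is below 80% of the highest group's rate shows evidence of adverse impact. Using that assumption, the code computes the share of one group's applications that went to jobs flagged for adverse impact against that group, which is one plausible reading of the percentages above. The function names and toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Outcome = Tuple[str, bool]  # (applicant's group, whether the screener selected them)


def adverse_impact_groups(outcomes: List[Outcome], threshold: float = 0.8) -> Set[str]:
    """Groups flagged under the four-fifths rule of thumb for one job.

    A group is flagged when its selection rate is below `threshold` times the
    highest group's selection rate.
    """
    applied: Dict[str, int] = defaultdict(int)
    selected: Dict[str, int] = defaultdict(int)
    for group, was_selected in outcomes:
        applied[group] += 1
        selected[group] += int(was_selected)
    rates = {g: selected[g] / applied[g] for g in applied}
    best = max(rates.values(), default=0.0)
    return {g for g, rate in rates.items() if rate < threshold * best}


def share_of_applications_flagged(jobs: Dict[str, List[Outcome]], group: str) -> float:
    """Fraction of `group`'s applications that went to jobs flagged against `group`."""
    total = flagged = 0
    for outcomes in jobs.values():
        n = sum(1 for g, _ in outcomes if g == group)
        total += n
        if n and group in adverse_impact_groups(outcomes):
            flagged += n
    return flagged / total if total else 0.0


jobs = {
    # Group X selected at 5/10 = 0.5, group Y at 2/10 = 0.2 < 0.8 * 0.5: Y is flagged.
    "job_1": [("X", True)] * 5 + [("X", False)] * 5 + [("Y", True)] * 2 + [("Y", False)] * 8,
    # Both groups selected at 4/10 = 0.4: nothing is flagged.
    "job_2": [("X", True)] * 4 + [("X", False)] * 6 + [("Y", True)] * 4 + [("Y", False)] * 6,
}
print(share_of_applications_flagged(jobs, "Y"))  # 0.5
```

As the paragraph above notes, a flag here is a statistical signal about outcomes, not a finding of intent.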
Why This Pattern Should Sound Familiar
We have seen something similar before, in a different form. When a handful of tech companies consolidated control over web search, social media feeds, and advertising during the 2010s, the range of outcomes available to publishers and users narrowed. Publishers could no longer rely on varied gatekeepers; one company's algorithm became everyone's gate. The consequences (fewer diverse news sources, less visibility for small publishers, narrower information diets) took years to become obvious because they were spread across the entire system rather than appearing as isolated incidents.
The hiring monoculture problem has the same structural pattern: one decision-making system consolidates power, and the correlated outcomes only become visible when you look at population-level data.
The difference is that employment decisions are governed by well-established anti-discrimination law in ways that content distribution is not. That creates a clearer regulatory framework here — and more concrete legal risk for employers who have outsourced all their screening to a single dominant vendor.
What This Means in Practice
For organizations building or buying AI hiring systems, this study highlights several real pressure points.
First, vendor concentration is a risk in its own right, not only for business continuity but also for bias and legal exposure. If a large portion of the job market is evaluated by the same underlying model, disparity risk is not contained within a single company's use of the system; it is spread across every organization using it at once.
Second, the very feature that makes these systems auditable — the fact that they produce consistent, predictable outputs — is also what makes their errors so hard to escape. A human recruiter with biases can be retrained, reassigned, or corrected by colleagues. A biased algorithm produces the same error, exactly, every time it sees the same input. Nothing changes until the model is rebuilt or replaced.
Third, regulators are moving toward requiring audits of employment algorithms. The European Union's AI Act classifies AI systems used in recruitment as high-risk, and some U.S. jurisdictions, such as New York City, already require bias audits of automated hiring tools. These rules do not prescribe the researchers' counterfactual method. Even so, organizations that lack the infrastructure to test their screening systems for population-level disparities should treat this paper as a preview of what regulators will ask for.
The study does not propose specific fixes; it diagnoses the problem. But the diagnosis rests on a large, real dataset: three million applicants, four million applications, one vendor's algorithm. This is not a hypothetical risk scenario. It describes a system that was already screening real job seekers at scale.
The remedies proposed more broadly in the research literature are not technically difficult: using multiple vendors, combining different algorithms, continuously checking for disparate impact, and having humans review edge cases. The research community has understood these approaches for years. What has been missing is the organizational will and business incentive to implement them. Studies like this one make the problem harder to leave unaddressed.
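Two of those remedies, combining several algorithms and routing borderline cases to people, fit in a few lines. This is a hypothetical sketch, not a recommendation of specific thresholds. It assumes each vendor returns a score in [0, 1] and averages the scores so that no single model decides alone. Anything within a chosen margin of the cutoff goes to a human reviewer. The cutoff and margin values are placeholders.

```python
from statistics import mean
from typing import Sequence


def triage(vendor_scores: Sequence[float], cutoff: float = 0.5, review_margin: float = 0.1) -> str:
    """Hypothetical decision rule over several vendors' scores in [0, 1].

    Averages the scores, advances or rejects clear cases, and sends anything
    within review_margin of the cutoff to a human reviewer.
    """
    combined = mean(vendor_scores)
    if abs(combined - cutoff) <= review_margin:
        return "human_review"
    return "advance" if combined > cutoff else "reject"


for scores in ([0.9, 0.8], [0.45, 0.6], [0.1, 0.2]):
    print(scores, "->", triage(scores))
```

Averaging only helps if the vendors' errors are not themselves correlated. Two models trained on similar data can still fail on the same applicants, which is the monoculture problem in another guise.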


