Human Judgment is the Artificial Intelligence Edge

Published July 3, 2026
@ 6:00 am

By Jonathan Aberman

The Department of Defense (DoD) is moving fast on artificial intelligence (AI). Very fast.

With $13.4 billion requested for AI and autonomous systems in fiscal year 2026 and close to $30 billion requested in 2027 for infrastructure and AI technology, deployment targets are climbing across every branch and combatant command. AI momentum in DoD is real and the intent is serious.

But momentum toward what, exactly?

Putting aside questions of lethality in AI-driven approaches to kinetic and cyberwarfare, I am struck by how one-sided the approach seems to be on the matter of human and AI hybridization: the integration of human judgment and expertise with AI capabilities to enhance decision-making and performance.

Most of what I hear centers around scale: how many tools are deployed, how many workflows are automated, how many use cases have cleared the acquisition hurdle. Those are procurement metrics, not operational ones. And in a domain where the cost of a flawed decision is measured in lives and strategic credibility, the difference matters enormously.

Adoption isn’t the same thing as advantage. That distinction is one the DoD cannot afford to blur.

The big problem with artificial intelligence

The problem is that AI systems don’t automatically improve human judgment. They improve the speed and consistency of outputs. That’s valuable in the right context, but dangerous in the wrong one.

The Government Accountability Office (GAO) has noted that even the best-performing AI systems can complete only about 30 percent of complex tasks autonomously without error. In a logistics workflow, the remaining 70 percent is an inconvenience. At the operational level of war, where decisions compress into minutes and the environment defies clean categorization, that same 70 percent lands on a commander or warfighter who may not have the context or bandwidth to catch the error before it shapes a consequential choice.

The still largely unacknowledged problem is the inherent limitation of GenAI, which generates outputs based on patterns in existing data rather than true understanding, reasoning or independent judgment.

Even emerging models that claim to create novelty are grounded in this same architectural limitation. The ability to acknowledge the unimagined, or to shape the contextual understanding needed to react to it, continues to require original thinking that only humans provide.

Moreover, with humans still the ultimate determinants of the value of any outcome, human choices are not something that a predictive model can anticipate. Nor should it.

The operational contexts DoD cares about most, such as near-peer competition, gray zone conflict and multi-domain crisis response, are precisely the environments where the right answer does not exist in the training data.

Human experience counts

The value of experienced military judgment is not that it’s faster than AI. It’s that it can recognize when the current situation doesn’t fit the model, hold contradictory information in tension, and still commit to a course of action.

The same is true in the massive operational backbone of the defense establishment. The predictability of GenAI’s sameness will generate consistency and efficiency in many situations. Still, there remain many choices, large and small, that have significant effects on our national security.

The overarching point is that the U.S. military must devote greater attention to human-machine teaming. And that means more than asking humans to see when AI is wrong.

Frankly, AI development has reached the point where these systems’ knowledge is so comprehensive that an AI-derived response can look right to many human observers.

Analysts will defer to AI-generated assessments because they look authoritative and arrive fast. Commanders will receive decision packages shaped by the AI’s framing of the problem.

The illusion of competence

Anyone who has watched AI embed into organizational workflows has seen this pattern. The tool becomes the default. Questioning it feels like friction. Speed and confidence converge in ways that look like competence until something goes wrong.

This has significant and lasting implications for the successful use of GenAI. The human observer must be trained properly to receive and manage the output. This is not a governance problem alone. You can limit output access to get it to the “right” people or define rules for use. But the military must determine what cognitive attributes the right person needs to have.

This is where the lack of metrics is most telling.

Cognitive capability in the defense establishment must be measured and protected. The real risk of rapid AI adoption isn’t one dramatic failure. It’s the gradual replacement of the analytical judgment that makes military leaders effective.

The DoD should be asking not just how many operators are using AI, but whether their capabilities for human insight match the roles they fill and the AI output they’re expected to use.