Partnering on AI Standards in GEOINT
 
        
The GEOINT community has increasingly turned to artificial intelligence (AI) to manage the overwhelming volume of spatial data and to enhance analytic capabilities. This "data glut" presents a serious challenge—one that AI technologies are well suited to address, though not without introducing new risks.
This discussion provides an initial look at the standards ecosystem shaping how the GEOINT community uses AI to support defense missions. These missions carry broad societal implications, especially since much of the analytic work occurs without public scrutiny. In this context, reliance on untrusted AI systems creates risk. Biased models, altered data, and misleading outputs can lead to serious real-world consequences, underscoring the need for rigorous, mission-relevant standards. While implementing standards comes with costs, those costs must be weighed against the potential benefits that AI standards can bring to GEOINT.
AI Standards
AI standards fall into several interrelated categories. Terminology and concept standards provide shared definitions and conceptual foundations. Data standards ensure consistency, quality, privacy, and interoperability. Model and algorithm standards guide training, evaluation, documentation, and benchmarking for quality and robustness. System and lifecycle standards address development and deployment processes. Ethics and governance standards promote transparency and accountability, while security and safety standards help defend against threats and ensure reliability. Performance and benchmarking standards measure effectiveness, and interoperability standards enable integration across platforms. Finally, education and training standards emphasize the essential role of skilled personnel in sustaining effective AI. Together, these standards form the foundation for trustworthy, effective, and ethical AI. The success of this ecosystem of standards depends on collaborative effort across government, academia, and industry. 
AI Applications
Just as there are many types of standards, AI in GEOINT brings with it a multitude of applications. Examples include computer vision that gives drones the ability to "see" and navigate a flight path, generative AI applied to analysis of competing hypotheses, and various geospatial methods such as those contained in Esri's GeoAI toolbox (Esri, n.d.). The GeoAI toolbox contains tools to train and use models that perform classification and regression on feature and tabular datasets, as well as to classify, transform, and extract information from unstructured text using natural language processing. Each of these GEOINT applications introduces its own nuances for standards. For instance, geospatial data involves not only locational accuracy but also attributional accuracy and geopolitical context. While general AI standards may address ethical use or model robustness, they may not consider whether a model uses correct borders when answering questions about countries or territories. These specifics matter in GEOINT, where operational consequences may hinge on such details. Hence, geospatial standards must be distinctly addressed in our standards framework.
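To make the kind of workflow these tools support concrete, the sketch below is a minimal, purely illustrative example of classification on a tabular geospatial dataset. It uses scikit-learn with invented attribute fields and synthetic data rather than Esri's actual GeoAI tools, so the field names, data, and model choice are assumptions, not a reference implementation.

```python
# Illustrative sketch only: classify land-use categories from tabular geospatial
# attributes. Fields and data are invented; Esri's GeoAI tools wrap comparable
# train/predict steps behind geoprocessing tools in ArcGIS Pro.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(42)

# Stand-in attribute table: e.g., elevation, slope, NDVI, distance to road
X = rng.normal(size=(500, 4))
y = rng.integers(0, 3, size=500)  # synthetic land-use class labels 0..2

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Per-class precision/recall is exactly the kind of output that model and
# benchmarking standards would require to be documented and comparable.
print(classification_report(y_test, model.predict(X_test)))
```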

Operational AI
These application-specific nuances raise a key question: what defines mission success for GEOINT systems that incorporate AI? What is the "Gold Standard" for evaluating AI's performance when GEOINT functions as a seamless, fully integrated capability in support of missions? Stated another way, what is the benchmark against which system outcomes are judged in an operational setting? Some suggest the goal should be a level of trust and performance that matches human experts (Gonzalez et al., 2021). Here, the "Gold Standard" metric might be thought of as a "trust-value." Just as a higher statistical t-value provides stronger evidence against a null hypothesis, a higher trust-value would provide stronger evidence that an AI-generated output is on par with what a human analyst would produce in a similar operational setting. However, achieving this benchmark is neither straightforward nor always what we truly want.
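As a purely hypothetical illustration of how a trust-value might be operationalized, one proxy is chance-corrected agreement between AI and expert judgments on the same tasks. The labels, task framing, and choice of Cohen's kappa below are assumptions made for the sketch, not an established GEOINT metric.

```python
# Hypothetical sketch: a proxy "trust-value" as agreement between AI and expert
# labels on the same tasks. Labels and scenario are invented for illustration.
from sklearn.metrics import cohen_kappa_score

# Categorical judgments (e.g., facility type) on the same 12 image chips
expert_labels = ["airfield", "port", "depot", "airfield", "port", "depot",
                 "airfield", "depot", "port", "airfield", "depot", "port"]
ai_labels     = ["airfield", "port", "depot", "airfield", "depot", "depot",
                 "airfield", "depot", "port", "port", "depot", "port"]

# Cohen's kappa corrects raw agreement for chance; higher values indicate the
# AI's judgments track the expert's more closely.
kappa = cohen_kappa_score(expert_labels, ai_labels)
print(f"trust-value (kappa proxy): {kappa:.2f}")
```

In practice, any such metric would also have to contend with the uncertainty and provisional ground truth discussed next.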
AI models are trained on labeled data assumed to reflect ground truth. Yet, as Lowenthal (2008) reminds us, intelligence work is not about truth—it is about forming the best possible approximation of reality, what he calls “proximate reality.” This means intelligence analysis often operates in gray areas, where conclusions must be drawn under uncertainty, and verification is elusive.
History offers powerful illustrations. The fall of the Berlin Wall in 1989 and the collapse of the Soviet Union were both surprises to the intelligence community, particularly the CIA, which had devoted decades to monitoring the Eastern Bloc. Analysts underestimated internal dissent and overestimated the regime’s grip on power. The lesson is that even with good data and extensive expertise, analytic judgments can fail to predict important events.
Still, failure doesn’t always stem from missing data or misinterpretation. Analytic judgment often depends on perceptions—or biases—shaped by experience. Hoffman et al. (2010) illustrate this with a terrain analysis example in which an expert, given just two minutes to view an aerial image, predicted the presence of harmful bacteria—not through direct observation, but by integrating cues from vegetation, soil, and bedrock. What seemed like inference was, in fact, perceptual judgment, shaped by deep familiarity with environmental patterns. In GEOINT, experts routinely make judgments based on incomplete and ambiguous data through such perceptual learning—and they continuously adapt, re-learn, and even redefine what constitutes ground truth as data types evolve.
The bottom line is that as AI systems continue to advance, we must remain grounded in the mission-oriented and inherently uncertain nature of geospatial analysis. What qualifies as expert “truth” is often provisional—shaped as much by perception and context as by data. Despite this ambiguity, GEOINT’s use of AI must still yield reliable and actionable outcomes. These results may not always attain the level of trust placed in human experts. Still, regardless of how we define the Gold Standard, reaching it will require more than just technical benchmarks—it demands a shared understanding of what constitutes value and mission success.
Collaboration on AI Standards
Ultimately, the standards community must collaborate with operators to accelerate innovation and define metrics that go beyond technical benchmarks to capture how effectively AI supports achieving a decision advantage. Developing these GEOINT-specific standards requires strong, broad partnerships. What other partnerships make sense? Could other intelligence disciplines contribute different standards while aligning on shared principles?
Collaboration across the intelligence community—and with geospatial-intelligence partners in government, industry, and academia globally—can help surface unseen biases and build more comprehensive, inclusive standards. It is no longer enough to say that parts of the system meet technical standards. We need a meaningful metric that assesses how well AI helps achieve mission goals in the real-world context of GEOINT.
Note from the Author
This work reflects the author's original ideas and analysis. Assistance from ChatGPT (OpenAI, 2025) was used to support editing. All interpretations and conclusions are the author's own. 
References
Bowen, S. A. (2024). "If it can be done, it will be done": AI ethical standards and a dual role for public relations. Public Relations Review, 50, 102513. https://doi.org/10.1016/j.pubrev.2024.102513
Esri. (n.d.). An overview of the GeoAI toolbox. ArcGIS Pro. https://pro.arcgis.com/en/pro-app/latest/tool-reference/geoai/an-overview-of-the-geoai-toolbox.htm
Gonzalez, D., Lingel, S., Flanagan, S., Geist, E., Han, B., & Heath, R. (2021). Artificial intelligence and the military: Real opportunities and enduring challenges (RAND Report No. RRA464-1). RAND Corporation. https://www.rand.org/pubs/research_reports/RRA464-1.html
Hoffman, R. R., Feltovich, P. J., Fiore, S. M., Klein, G., Missildine, W., & DiBello, L. (2010). Accelerated proficiency and facilitated retention: Recommendations based on an integration of research and findings from a working meeting (AFRL-RH-AZ-TR-2011-0001). Air Force Research Laboratory, Human Effectiveness Directorate. Florida Institute for Human and Machine Cognition.
Ish, D., Ettinger, J., & Ferris, C. (2021). Evaluating the effectiveness of artificial intelligence systems in intelligence analysis (Report No. RRA464-1). RAND Corporation. https://www.rand.org/pubs/research_reports/RRA464-1.html
Lea, A. S. (2024). Pyrite standards: Medical uncertainty, ground truth, and AI model evaluation in historical perspective. Journal of General Internal Medicine. https://doi.org/10.1007/s11606-024-08844-1
Lowenthal, M. M. (2008). Towards a reasonable standard for analysis: How right, how often on which issues? Intelligence and National Security, 23(3), 303–315.
McMahon, G. (2023). Artificial intelligence and analytic tradecraft: Challenges and opportunities for the U.S. intelligence community. Belfer Center for Science and International Affairs, Harvard Kennedy School. https://www.belfercenter.org/publication/artificial-intelligence-and-analytic-tradecraft
National Geospatial-Intelligence Agency. (n.d.). GEOINT artificial intelligence. https://www.nga.mil/news/GEOINT_Artificial_Intelligence_.html
Office of the Director of National Intelligence. (2020). Artificial intelligence ethics framework for the intelligence community. https://www.intelligence.gov/ai/ai-ethics-framework
OpenAI. (2025). ChatGPT (June 2025 version) [Large language model]. https://chat.openai.com/ 
U.K. Cabinet Office. (2023). Human-centred ways of working with AI in intelligence analysis. Government Office for Science. https://www.gov.uk/government/publications/human-centred-ways-of-working-with-ai-in-intelligence-analysis
U.S. Government Accountability Office. (2023). Guide to AI for the Intelligence Community. https://www.gao.gov/assets/ai-intelligence-community-guide.pdf
U.S. Joint Military Intelligence College. (n.d.). Failures of the “intelligence/operations communities”. Defense Intelligence Agency.
Wirtz, B. W., Weyerer, J. C., & Geyer, C. (2024). AI governance: A new architecture for regulating artificial intelligence in organizations. Government Information Quarterly, 41(1), 101860. https://doi.org/10.1016/j.giq.2024.101860
What do you think? Share your thoughts and join in the discussion around data centricity and literacy in GEOINT at the 22nd annual DGI conference in London from Feb 23-25, 2026.