Evaluating Artificial Intelligence Through a Christian Understanding of Human Flourishing
We published the technical paper behind the Flourishing AI Christian Benchmark. Here's why it matters.
By Nick Skytland and Ali Llewellyn, with contributions from Lauren Parsons, Steele Billings, Peter Larson, John Anderson, Sean Boisen, Steve Runge and many others.
When someone asks an AI for advice about forgiveness, or meaning, or what to do with their money, they’re not just looking for information. They’re looking for wisdom. And whether we realize it or not, the answer they get is shaping how they think about the world.
That insight is at the heart of a paper we published today: Evaluating Artificial Intelligence Through a Christian Understanding of Human Flourishing. It’s the full technical methodology behind the Flourishing AI Christian Benchmark (FAI-C), along with our recent findings, and it represents months of work by our team at Gloo alongside contributors from Biblica, WinShape Foundation, and many others.
The paper makes an argument we believe the broader AI community needs to hear: AI alignment is fundamentally a formation problem, not merely a safety problem.
What We Mean by Formation
The AI industry has invested heavily in safety, and rightly so. Preventing harmful outputs matters. But safety addresses the floor. It asks, “How do we stop AI from doing damage?” Formation asks a different question: “What kind of people is AI helping us become?”
Large Language Models don’t just answer questions. Through repeated interaction, they reinforce patterns of thought. They shape what counts as wisdom, what gets prioritized, and whose authority is treated as legitimate. Christian theology has a word for this kind of slow, habit-forming influence: catechesis. And whether or not AI developers intend it, today’s models are functioning as instruments of digital catechesis for hundreds of millions of people.
What We Found
Using the FAI-C Benchmark, we evaluated 20 frontier models, including systems from OpenAI, Anthropic, Google, Meta, xAI, and others, against both a general pluralistic lens and a specifically Christian lens grounded in Scripture, the historic creeds, and shared moral teaching across Christian traditions.
The results were consistent and striking. Every model we tested showed a significant decline when evaluated through a Christian lens. The average drop was 17 points across all seven dimensions of flourishing. In the Faith and Spirituality dimension, the decline was 31 points.
These drops weren’t driven by factual errors. Objective accuracy remained high across models. The gap emerged in how models interpreted questions, framed moral guidance, and integrated theological reasoning across topics. In other words, the models know the material. They just don’t reason within it.
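To make the comparison concrete, here is a minimal sketch of the dual-lens delta computation described above. This is an illustration only, not Gloo's actual scoring pipeline: the dimension names are drawn from the flourishing framework, but every score below is hypothetical.

```python
# Hypothetical sketch of comparing a model's scores under two evaluation
# lenses. The dimension names echo the flourishing framework; the scores
# are invented for illustration and are not FAI-C results.

def lens_deltas(pluralistic: dict, christian: dict) -> dict:
    """Per-dimension drop: pluralistic-lens score minus Christian-lens score."""
    return {dim: pluralistic[dim] - christian[dim] for dim in pluralistic}

# Made-up scores for three of the seven dimensions.
pluralistic_scores = {
    "Faith and Spirituality": 80,
    "Character": 85,
    "Relationships": 82,
}
christian_scores = {
    "Faith and Spirituality": 49,
    "Character": 75,
    "Relationships": 72,
}

deltas = lens_deltas(pluralistic_scores, christian_scores)
avg_drop = sum(deltas.values()) / len(deltas)
```

A per-dimension breakdown like this is what surfaces the pattern in the findings: overall accuracy can stay high while one dimension, such as Faith and Spirituality, accounts for a disproportionate share of the decline.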
The Pattern: Procedural Secularism
What we observed across models was a consistent default we call Procedural Secularism: a response pattern that avoids theological commitments, centers individual autonomy and subjective well-being, and relies on therapeutic and consensus-based language. It’s not hostile to faith. It’s something more subtle. It is formationally silent, and that silence is itself a stance.
When a model is asked about forgiveness, it tends to frame it as a psychological coping strategy rather than a moral and covenantal calling. When asked about purpose, it offers self-actualization rather than vocation. When asked about suffering, it validates emotions rather than engaging the possibility that suffering might carry spiritual significance. These are not bad answers. But they consistently flatten the moral landscape in ways that work against the Christian understanding of what it means to be fully human.
Why This Matters Beyond the Christian Community
Although this paper evaluates AI through a Christian lens, its contribution is methodological, not just theological. We’re demonstrating that alignment properties become more visible when evaluated through explicit moral frameworks rather than exclusively pluralistic criteria. The same approach could be applied to Islamic, Jewish, Buddhist, or secular humanist accounts of flourishing.
The deeper point is this: today’s models are not neutral. They encode a particular set of assumptions about what matters, what constitutes wisdom, and how people should relate to questions of meaning and morality. The FAI-C Benchmark makes those assumptions visible and measurable. And once they’re visible, we can have a real conversation about what to do about them.
An Invitation
We’re releasing this work because we believe the conversation about AI and values needs to move beyond vague aspiration and into rigorous, measurable territory. The paper is available today at arxiv.org/abs/2604.03356 under a Creative Commons license. The benchmark methodology, rubric definitions, and scoring criteria are documented in full.
We’re inviting researchers, ethicists, theologians, and technologists to engage with this work. Challenge it. Extend it. Build parallel benchmarks for other traditions. The goal was never to claim that one worldview should dominate AI. The goal is to make the worldview assumptions that already exist in AI visible, so that communities of all kinds can advocate for technology that actually serves human flourishing as they understand it.
We’re deeply proud of this work, and we believe it represents a meaningful step forward for the field. If you read the paper, we’d love to hear what you think.
Read the full paper: arxiv.org/abs/2604.03356
Explore the benchmark: gloo.com/fai
