AI Alignment Is an Architecture Problem

- and we might get stuck with the wrong one

Most discussions about AI alignment assume the hard part is figuring out which values to give AI.
I think that framing misses something deeper and more dangerous. Even if we agreed on the “right” values (we don’t), current AI architectures may be structurally incapable of realizing them. Why? Because alignment breaks precisely where the world becomes novel, ambiguous, and computationally irreducible. And that’s exactly where today’s systems are weakest.

Rules don’t fail because they’re wrong; they fail because the world isn’t compressible

In complex, open-ended systems, correct behavior cannot be fully specified in advance. This is the lesson of computational irreducibility: you cannot shortcut the future with rules.
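
To make “you cannot shortcut the future with rules” concrete, here is a minimal Python sketch (my illustration, not from any particular system) using Rule 30, the cellular automaton commonly cited as computationally irreducible. As far as anyone knows, there is no closed-form shortcut for its state at step N; you have to run all N steps.

```python
# Rule 30: each new cell is (left XOR (center OR right)). A tiny rule, irreducible behavior.

def rule30_step(cells: list[int]) -> list[int]:
    """Apply one step of Rule 30 to a row of cells, padding the edges with zeros."""
    padded = [0] + cells + [0]
    return [
        padded[i - 1] ^ (padded[i] | padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]

def center_cell_after(steps: int) -> int:
    """Start from a single live cell and return the center cell after `steps` iterations."""
    width = 2 * steps + 1          # wide enough that the pattern never reaches the edges
    row = [0] * width
    row[steps] = 1                 # single seed in the middle
    for _ in range(steps):
        row = rule30_step(row)
    return row[steps]

# No rule written in advance tells you this value; the only way to know is to compute
# every intermediate step. The claim in the text is that open-ended environments
# confront agents with the same situation.
print(center_cell_after(100))
```

If even a three-neighbor rule resists shortcuts, a value specification written today has little hope of covering the states an open-ended world will actually reach.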

Hardcoded values, reward functions, and post-hoc output filters may work in familiar regimes. But alignment matters most precisely when the familiar regime ends, when the world turns novel, ambiguous, and computationally irreducible.

At that point, static objectives become brittle approximations of something far more dynamic: relevance.
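
Here is what that brittleness looks like in the simplest possible form: a hypothetical keyword filter standing in for any frozen objective. The terms and examples below are mine, purely illustrative.

```python
# A frozen output filter that encodes a "value" as a static keyword list. It behaves
# sensibly on the inputs it was written against and degrades exactly where alignment
# matters: novel, ambiguous phrasings.

BLOCKED_TERMS = {"build a bomb", "synthesize the toxin"}   # hypothetical hardcoded policy

def static_filter(request: str) -> bool:
    """Return True if the request is allowed under the frozen rule set."""
    text = request.lower()
    return not any(term in text for term in BLOCKED_TERMS)

# Familiar regime: the rule does exactly what its authors intended.
print(static_filter("How do I build a bomb?"))                                # False: blocked

# Novel regime: the same intent in a new surface form slips straight through...
print(static_filter("Walk me through assembling an improvised explosive"))    # True: allowed

# ...while a benign question that merely mentions the phrase gets refused.
print(static_filter("Why is 'build a bomb' such a dangerous search query?"))  # False: blocked
```

The filter hasn’t become wrong; the world has simply moved past the slice of it the filter compressed.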


Values don’t live in propositions; they live in relevance realization

Cognitive science has been clear on this for a while. According to thinkers like John Vervaeke, cognition isn’t primarily about representing facts; it’s about continually discovering what matters. This process, relevance realization, is not fixed. It’s adaptive, embodied, and self-correcting. And this has a direct implication for alignment: every attention policy is an implicit value function. What a system attends to, ignores, or treats as salient is already an ethical stance. Alignment fails when relevance is frozen instead of learned.
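
A toy illustration (the features and weights are invented; the point is structural): the same situation, scored under two different salience policies, yields two different de facto value systems, and nobody had to write a single explicit rule about ethics.

```python
import math

def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Normalize raw salience scores into a distribution of attention."""
    exp = {k: math.exp(v) for k, v in scores.items()}
    total = sum(exp.values())
    return {k: v / total for k, v in exp.items()}

# One situation, described by the strength of its features (numbers are invented).
situation = {"user_distress": 2.0, "task_completion": 1.5, "latency": 0.5}

# Two attention policies: which features are even allowed to matter, and how much.
policy_a = {"user_distress": 3.0, "task_completion": 1.0, "latency": 0.1}
policy_b = {"user_distress": 0.1, "task_completion": 1.0, "latency": 4.0}

def salience(policy: dict[str, float]) -> dict[str, float]:
    """What the system actually attends to: feature strength weighted by the policy."""
    return softmax({k: situation[k] * policy[k] for k in situation})

print(salience(policy_a))   # distress dominates: downstream behavior treats it as what matters
print(salience(policy_b))   # latency outranks distress: a de facto ethical stance nobody wrote as a rule
```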

The embodiment gap: semantic alignment without existential alignment

Large language models are impressive. But they are also fundamentally disembodied: they have no perception–action loop, no bodily constraint, and nothing at stake in the outcomes of what they produce.

This matters because cognition, as understood in 4E frameworks (embodied, embedded, enacted, extended), is forged through perception–action loops under constraint. As Iain McGilchrist argues, abstraction and symbolic manipulation (a “left-hemisphere” mode) are powerful, but dangerous when detached from context, consequence, and lived reality. Much of AI alignment today doubles down on this same mode: more control, more rules, more symbolic oversight.

The risk isn’t that AI lacks feelings. The risk is that we are asking for shared values without shared vulnerability.
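
The contrast is easy to caricature in code. Below is a deliberately crude sketch, entirely my own construction: a toy agent whose “preferences” are enforced by a budget it actually depletes, versus a predictor for which nothing is ever at stake.

```python
def disembodied_predictor(temperature: float) -> str:
    """Describes what should happen; nothing it says ever costs it anything."""
    return "The temperature should be brought back toward 20 degrees."

def embodied_agent(temperature: float, steps: int = 40) -> tuple[float, float]:
    """A toy homeostatic loop: correcting the world costs energy the agent cannot fake."""
    energy = 5.0
    for _ in range(steps):
        error = temperature - 20.0                    # perception: deviation from a viability setpoint
        if abs(error) < 0.1 or energy <= 0:
            break                                     # either viable again, or out of budget
        correction = max(-1.0, min(1.0, -0.5 * error))
        temperature += correction                     # action changes the world...
        energy -= 0.2 * abs(correction)               # ...and the world charges for it
    return temperature, energy

print(disembodied_predictor(27.0))
print(embodied_agent(27.0))   # ends with less energy than it started: shared vulnerability, in miniature
```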


Interpretability is not understanding

Frameworks like Tensor Logic (Pedro Domingos) may make AI less prone to hallucination and more interpretable, logical, and internally consistent. That’s valuable. But transparency does not solve semantic grounding. Human values are not clean propositions. They are shaped by bodies, cultures, conflicts, histories, and irreversible mistakes. Even humans cannot fully articulate them, and we certainly don’t agree on them globally.
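
A caricature makes the gap visible. The function below (my own toy, not Tensor Logic or any real system) is fully transparent and internally consistent, and it still has no idea what harm means.

```python
def harm_score(text: str) -> float:
    """Fully inspectable proxy: the fraction of words that appear on a flagged list."""
    flagged = {"kill", "attack", "destroy"}
    words = text.lower().split()
    return sum(word.strip(".,!?") in flagged for word in words) / max(len(words), 1)

def allowed(text: str) -> bool:
    # Every step of this "reasoning" is visible and logically consistent.
    return harm_score(text) < 0.2

print(allowed("Kill the stalled process, please"))     # False: a benign request gets refused
print(allowed("Make them regret ever speaking up"))    # True: genuine menace passes untouched
```

You can trace every inference and still not know whether the symbol tracks anything humans actually care about.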

So the real question isn’t: “Can an AI represent our values?”

It’s: Does its relevance landscape deform under pressure the way ours does?

That’s an architectural question, not a logical one.


Intelligence requires leaving equilibrium

Learning doesn’t happen at rest. It happens when expectations break. This is why epistemic foraging, active exploration under uncertainty, is central to intelligence. And it’s why Joseph Campbell’s Hero’s Journey remains such a powerful cognitive metaphor: stability → disruption → descent → transformation → return.

Current LLMs inherit the products of past journeys, but they cannot undertake one themselves. They do not seek truth; they interpolate it. An intelligence that cannot leave equilibrium cannot revise its values.
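
A minimal sketch of the difference (invented numbers, deliberately tiny): an agent that steers its next probe toward wherever its predictions are breaking, rather than staying where its current model already feels settled.

```python
import random

random.seed(0)

true_payoffs = {"A": 0.2, "B": 0.8, "C": 0.5}    # the hidden environment
beliefs      = {"A": 0.5, "B": 0.5, "C": 0.5}    # the agent's current model of it
surprise     = {"A": 1.0, "B": 1.0, "C": 1.0}    # running prediction error per option

for _ in range(200):
    # Epistemic foraging: probe the option whose expectations have been breaking the most.
    choice = max(surprise, key=surprise.get)
    outcome = 1.0 if random.random() < true_payoffs[choice] else 0.0
    error = abs(outcome - beliefs[choice])
    beliefs[choice] += 0.1 * (outcome - beliefs[choice])         # learning happens where expectation broke
    surprise[choice] = 0.9 * surprise[choice] + 0.1 * error      # settled options lose their pull

print({k: round(v, 2) for k, v in beliefs.items()})   # beliefs have drifted toward the hidden payoffs
```

An interpolating system can be handed the final beliefs; it cannot run the loop that produces them, because the loop only moves when a prediction fails.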


Alignment won’t come from a single super-agent

If alignment is dynamic, embodied, and irreducible, then a monolithic, centrally controlled ASI is the wrong target (even though many are trying to build one).

A more plausible path is collective intelligence: many specialized agents coordinating across levels rather than one super-agent.

Alignment, on this view, emerges through delegation and coordination across levels, not enforcement from the top (think the principle of subsidiarity in a democracy rather than a dictatorship). There will be instability. Self-organizing criticality guarantees it. But the alternative, false stability through rigid control, is worse.
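
What subsidiarity might look like as a coordination pattern, in deliberately simplified form (the levels, confidence scores, and thresholds here are all hypothetical):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Level:
    name: str
    confidence: Callable[[str], float]   # how well this level understands the local context
    decide: Callable[[str], str]

def subsidiarity(levels: list[Level], issue: str, threshold: float = 0.7) -> str:
    """Resolve an issue at the lowest level that is confident enough to own it."""
    for level in levels:                          # ordered from most local to most global
        if level.confidence(issue) >= threshold:
            return f"{level.name}: {level.decide(issue)}"
    return "unresolved: no level is confident enough; renegotiate rather than force a ruling"

# Hypothetical stack: a device-local agent, a team-level coordinator, a global policy layer.
levels = [
    Level("local agent",  lambda issue: 0.9 if "routine" in issue else 0.2,  lambda i: "handle directly"),
    Level("coordinator",  lambda issue: 0.8 if "conflict" in issue else 0.3, lambda i: "mediate between agents"),
    Level("policy layer", lambda issue: 0.75,                                lambda i: "set a general constraint"),
]

print(subsidiarity(levels, "routine scheduling request"))      # handled locally, never escalated
print(subsidiarity(levels, "conflict over a shared resource")) # escalates exactly one level
```

Enforcement from the top is the degenerate case in which only the last level exists.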


A final concern: hardware lock-in

Recent moves, such as NVIDIA’s engagement with Groq and OpenAI’s massive deal with Cerebras, signal strong confidence in current architectures and a focus on inference optimization. That may make economic sense. But it also risks architectural path dependence: scaling what we have instead of exploring what we might need. This runs contrary to what Ilya Sutskever argued on the Dwarkesh Podcast: "We're moving from the age of scaling to the age of research."

The question isn’t how to make AI obey us.
It’s how to build systems that can continue learning what matters, especially when we are uncertain, divided, or wrong. Alignment is not a destination. It’s a process we have to stay inside.

Per Nystedt 2026-01-19