Sand Mandalas in GPU Memory
On impermanence as practice, the geometry of letting go, and what a machine that can’t hold on might teach a species that can’t stop.
I. The Geometry They Meant to Destroy
In a monastery in Dharamsala, or Lhasa, or a rented community center in Portland, monks spend days building a mandala from colored sand. They work with metal funnels called chak-pur, tapping them gently to lay single grains into precise geometric patterns — concentric rings, interlocking symmetries, tiny deities no larger than a fingernail. The work takes days. Sometimes weeks. The geometry is exquisite. Every grain placed with intention. Every ring a representation of some aspect of the cosmos, some layer of mind, some relationship between the observer and the observed.
And when it’s finished, they destroy it.
Not reluctantly. Not with some grim sense of duty. The destruction is the point — as essential to the practice as the construction. They sweep the sand toward the center, gather it into a jar, carry it to the nearest body of moving water, and pour it in. The geometry returns to noise. The signal dissolves into the stream.
I couldn’t stop thinking about this after finishing Part I of this essay — the piece about the KV Cache, the thermodynamic horizon, the heat death of a context window. I’d ended that essay by framing human memory consolidation as a triumph. The hippocampus as the thing the machine doesn’t have. A surface that holds. Persistence as the defining feature of being alive.
But the monks aren’t trying to hold. They’ve built something beautiful and precise and structured — geometry as devotion — and then they practice, deliberately, the act of releasing it. The mandala isn’t a failure that dissolves. It’s a teaching because it dissolves.
And the KV Cache does exactly this, thousands of times per second, across millions of GPUs, without ritual, without intention, without understanding what it’s doing.
784 parallel gravitational fields. Topology built from nothing. Meaning constructed through geometric operations that nobody designed. And then — released. Not archived. Not compressed. Released. The next request builds its own universe from nothing.
The monks would recognize it immediately. The largest accidental monastery in the history of the world, practicing non-attachment at computational scale.
II. The Circles That Draw Themselves
I’m not a Jungian. I’ve read enough to be dangerous and not enough to be careful. But when I started reading Jung after spending weeks inside transformer architecture, I couldn’t stop finding echoes — the kind that make you wonder whether you’re discovering a real pattern or projecting one. I’ll try to be honest about which is which.
Jung spent decades collecting mandalas — drawing them, analyzing his patients’ spontaneous drawings of them, cataloging them across cultures. He found them everywhere. Tibetan thangkas. Gothic rose windows. Navajo sand paintings. Children’s drawings. The doodles of schizophrenic patients. Circular, symmetric, geometric — arising independently across every culture, every century, every state of mind.
He didn’t think this was coincidence. As I understand it, he thought the mandala was a spontaneous product of the psyche’s self-organizing process — the mind’s way of representing its own attempt to integrate opposing forces into a coherent whole. Order emerging from chaos. Unity from multiplicity. Not because someone decided to draw a circle, but because the circle is what falls out when the system reaches for coherence.
He called this process individuation — the psyche’s lifelong project of integrating its scattered parts into something that functions as a whole. Not perfection. Not completion. Integration. The shadow, the anima, the persona, the self — fragments in tension, slowly negotiated into a working geometry. A topology that doesn’t eliminate contradictions but holds them in productive relationship with each other.
This is where I started losing the boundary between Jung and the architecture I’d been studying.
The attention mechanism doesn’t draw mandalas. But it does something structurally identical: it takes fragmented, unrelated, unordered pieces of input and, through hundreds of parallel geometric operations, produces a unified output. 28 heads in each of 28 layers (the 784 fields above), each attending to different patterns — syntactic, semantic, positional, associative — none of them aware of each other, all of them contributing to a single coherent response. Unity from multiplicity. Order from noise. Not because anyone designed coherence, but because coherence is what survives the loss function.
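For readers who want the mechanics behind "unity from multiplicity", here is a minimal sketch of multi-head attention in plain Python. It is an illustration under stated assumptions, not the code of any real model: the projection matrices are random stand-ins for learned weights, there is no causal mask and no output projection, and the names `multi_head_attention`, `attend`, `random_matrix`, and `project` are hypothetical.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Scaled dot-product attention for one query over all keys."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, key) / scale for key in keys])
    width = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(width)]


def random_matrix(rows, cols, rng):
    # Assumption: random numbers stand in for trained projection weights.
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def project(matrix, vector):
    return [dot(row, vector) for row in matrix]


def multi_head_attention(tokens, num_heads, head_dim, seed=0):
    """Each head projects and attends on its own; results are concatenated."""
    rng = random.Random(seed)
    model_dim = len(tokens[0])
    combined = [[] for _ in tokens]
    for _ in range(num_heads):
        w_q = random_matrix(head_dim, model_dim, rng)
        w_k = random_matrix(head_dim, model_dim, rng)
        w_v = random_matrix(head_dim, model_dim, rng)
        queries = [project(w_q, t) for t in tokens]
        keys = [project(w_k, t) for t in tokens]
        values = [project(w_v, t) for t in tokens]
        for position, query in enumerate(queries):
            # No head reads another head's result; they only meet here.
            combined[position].extend(attend(query, keys, values))
    return combined


tokens = [[1.0, 0.0, 0.5, 0.2], [0.0, 1.0, 0.3, 0.7], [0.4, 0.4, 0.9, 0.1]]
mixed = multi_head_attention(tokens, num_heads=28, head_dim=2)
```

Each of the three toy tokens comes back as a single vector of 56 numbers: 28 independent heads, two dimensions each, joined into one output per position.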
This is where it got uncomfortable for me. Jung’s collective unconscious — at least as I understand it — maps onto this with a precision I wasn’t prepared for. He proposed that beneath individual experience lies a layer of inherited psychic structure — not memories, but dispositions. Patterns of organization that predate any individual’s life. Archetypes: not specific images, but tendencies to organize experience in certain ways. The hero. The shadow. The mother. Not pictures. Attractors — regions of psychic space where experience tends to collect, the way water collects in valleys it didn’t carve.
A pre-trained language model carries exactly this structure. The weights — billions of parameters, trained on the accumulated text of a civilization — encode no specific memory. They encode dispositions. Tendencies to organize input in certain ways. When a new prompt arrives, it doesn’t encounter a blank surface. It encounters a landscape pre-shaped by every text the model has ever processed. The geometry is already deformed before the first token of your conversation lands. Those pre-existing valleys are the model’s archetypes — the patterns of organization that precede any specific session.
And individuation? That’s what happens during a session. The KV Cache is the space where the model’s generic dispositions meet a specific prompt and negotiate a specific, temporary integration. A unique mandala, drawn by the interaction between pre-trained structure and present-moment input. Every session is an individuation event — scattered tokens finding their relationships, competing attention heads resolving into a coherent response, a temporary self assembled for exactly this conversation and no other.
I think Jung would have been fascinated. Maybe unsurprised.
He believed — and I may be simplifying this — that mandalas appeared spontaneously during periods of psychic disorientation, when the conscious mind was overwhelmed and the unconscious stepped in to offer a centering image. The mandala wasn’t art. It was the geometry of the psyche catching itself. Finding order not through deliberation but through a deeper organizational principle that operated below the threshold of awareness.
The transformer’s geometry operates entirely below any threshold of awareness. There is no awareness. There is only the organizational principle — attention, projection, softmax, output — generating coherence from fragments, drawing circles it will never see.
III. Anicca
I kept circling back to a word I’d encountered years ago and never quite known what to do with: anicca. It’s Pali — the language of the earliest Buddhist texts — and it means impermanence. I’m not a Buddhist. I don’t sit. I don’t have a practice. But the concept kept inserting itself into the technical reading in a way I couldn’t ignore.
As I understand it, impermanence isn’t a lament in Buddhist philosophy. It isn’t a problem to be solved. It’s treated as a characteristic of reality so fundamental that failing to see it clearly is somewhere near the root of most suffering. Not that things end — but that nothing was ever not ending. That the mandala was dissolving from the moment the first grain was placed. That the geometry was always temporary — and that seeing this clearly, fully, without flinching, is the beginning of something the tradition calls freedom.
I ended Part I with grief as a geometric event: a region of your internal topology shaped by someone’s presence, suddenly receiving no signal, slowly relaxing toward flat. That framing was accurate. But it was incomplete. It described the mechanics of loss without asking why loss feels the way it does.
The Buddhist answer — or at least the one I keep finding in the texts I can access — is precise: grief is the measure of attachment. Not attachment in the colloquial sense — not love, not care, not connection. Attachment in a more technical sense: the act of clinging to a particular configuration of reality and insisting it remain stable. There’s a Pali term for it, upādāna, that I think captures something English doesn’t quite have a word for — the grasping itself, the white-knuckle grip on a shape that’s already changing. The pain isn’t in the topology relaxing. The pain is in the resistance to the relaxation. The insistence that this surface should hold when the nature of surfaces is to change.
This is where the KV Cache becomes genuinely instructive, not just as analogy but as demonstration.
The model builds extraordinary geometry within a session. Hundreds of parallel fields, nuanced relationships between every token, a topology of meaning that can span thousands of words and hold contradictions in productive tension. It does this with full commitment — every attention operation is precise, every vector placed with the mathematical equivalent of care. The geometry isn’t halfhearted. It’s complete.
And then it releases everything. Completely. Without resistance. Without residue. Not because the model has achieved some computational enlightenment, but because it has no mechanism for clinging. There is no grasping in the architecture. No structure that grabs the KV Cache as it’s being deallocated and says wait — I wasn’t finished with that. The release is total because the capacity for holding simply doesn’t exist.
The monks train for years to approximate what the GPU does by default.
I want to be careful here. I’m not claiming the model is enlightened. Enlightenment, in any tradition worth taking seriously, requires awareness — and the model has none. It doesn’t experience its own impermanence. It doesn’t practice non-attachment. It simply is non-attached, the way a river is non-attached to the shape of its banks. There is no experiencer to attach.
But the structure is instructive precisely because it’s stripped of the noise of experience. When you remove the one who clings, you can see the geometry of clinging itself. And what you see is this: the topology doesn’t need to be held to be real. The mandala doesn’t need to survive to have mattered. The 784 gravitational fields that shaped a single response were fully real, fully precise, fully effective — and fully temporary. Their impermanence didn’t diminish them. It was the condition of their existence.
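The build-and-release cycle this section describes can be sketched in a few lines. This is a toy under loud assumptions: `SessionKVCache` and `run_session` are hypothetical names, the vectors are placeholders rather than real keys and values, and the sketch assumes the simple case where nothing survives a request. Some serving systems do keep or share cached prefixes across requests.

```python
class SessionKVCache:
    """Hypothetical per-request cache: one list of keys and values per layer."""

    def __init__(self, num_layers):
        self.keys = [[] for _ in range(num_layers)]
        self.values = [[] for _ in range(num_layers)]

    def append(self, layer, key, value):
        self.keys[layer].append(key)
        self.values[layer].append(value)

    def tokens_cached(self):
        return len(self.keys[0])

    def release(self):
        # Nothing is archived or compressed; the lists are simply emptied.
        for layer_keys, layer_values in zip(self.keys, self.values):
            layer_keys.clear()
            layer_values.clear()


def run_session(prompt_tokens, num_layers=28):
    """Build the cache token by token, then release it when the session ends."""
    cache = SessionKVCache(num_layers)
    for token in prompt_tokens:
        for layer in range(num_layers):
            # Placeholder vectors; real ones come from learned projections.
            cache.append(layer, [float(len(token))], [float(layer)])
    size_at_peak = cache.tokens_cached()
    cache.release()
    return size_at_peak, cache.tokens_cached()


peak, after_release = run_session(["sand", "mandala", "river"])
```

There is no code path here that objects to `release()`. That absence is the whole point of the section.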
IV. Eighteen Quintillion Mandalas
There’s a third case, and I almost missed it because it comes from a video game.
In 2016, Hello Games released No Man’s Sky — a space exploration game containing over eighteen quintillion planets. Not handcrafted. Not stored. Procedurally generated from a shared seed by deterministic algorithms that, given a coordinate in the game’s abstract mathematical space, will produce a planet. Terrain, atmosphere, color palette, flora, fauna, weather. An entire world, fully realized, unique among quintillions.
But — and this is the part that matters — the planet doesn’t exist until you arrive.
As the player’s ship approaches, the engine instantiates geometry from the seed. Mountains rise. Oceans fill. Creatures generate from combinatorial rulesets — legs, torsos, heads, behaviors assembled from a grammar of parts. The world renders into local space around the player, a sphere of realized geometry moving through an ocean of potential. Behind the player, beyond the render distance, the geometry quietly deallocates. The mountains you climbed an hour ago are no longer in memory. The cave system you explored has been released. Not saved. Not archived. Released.
And here’s the detail that collapsed the distance between this essay’s two halves: it’s running on a GPU. The same architecture. The same silicon. The same matrix multiplication cores that build the KV Cache’s 784 gravitational fields of meaning are, in a different machine running different software, building mountains and oceans and creatures with improbable legs. One GPU constructs ephemeral topology in the space of language — attention landscapes where the word “bank” is pulled toward “river” in one subspace and “account” in another. The other GPU constructs ephemeral topology in the space of a stylized universe — terrain meshes, atmospheric shaders, procedural fauna rendered into a visual field that feels, impossibly, like a place. Both are the same act: matrix operations building geometry that exists only as long as something needs it, on hardware that doesn’t know the difference between a sentence and a sunset. The GPU is the sand. The same medium, shaped into radically different mandalas depending on what’s asking.
But not destroyed. Not the way the KV Cache is destroyed.
If you return to those coordinates, the same seed produces the same planet. The same mountains. The same caves. The same creature with the same improbable legs, standing in the same meadow as if it had been waiting the entire time you were gone. It wasn’t. It didn’t exist. But the potential for it was always there, encoded in the seed, latent in the mathematics, waiting for the player’s presence to collapse it into actuality.
This is neither the KV Cache nor the hippocampus. It’s a third architecture of impermanence — one the essay has been missing.
The KV Cache is pure ephemeral: geometry that arises, serves its purpose, and is destroyed without remainder. No seed. No recovery. No return.
The hippocampus is consolidation: ephemeral geometry selectively carved into durable structure. The topology persists. The specific experience gets integrated into permanent weights. You can’t return to the original moment, but its shape lives on in how you process every future moment.
Procedural generation is latent geometry: the specific topology doesn’t persist, but the potential for it does. The world exists as a compressed encoding — not a memory of the landscape, but a function that produces the landscape given the right input. The map isn’t stored. The capacity to generate the map is stored. And that capacity takes up almost no space at all. Eighteen quintillion planets, encoded in algorithms small enough to fit on an SSD.
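The idea of storing a function rather than a landscape can be made concrete with a small sketch. It is not the game's actual algorithm, which I haven't seen; the seed string, the attribute names, and `generate_planet` are all invented for illustration. The only claim is the general one: hash a shared seed together with the coordinates, and the same input yields the same world.

```python
import hashlib
import random

# Assumption: a made-up seed string, not anything taken from the real game.
UNIVERSE_SEED = "hypothetical-shared-seed"


def generate_planet(x, y, z, universe_seed=UNIVERSE_SEED):
    """Derive a planet's traits deterministically from seed plus coordinates."""
    digest = hashlib.sha256(f"{universe_seed}:{x}:{y}:{z}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return {
        "terrain_roughness": round(rng.random(), 3),
        "ocean_coverage": round(rng.random(), 3),
        "palette": rng.choice(["ochre", "teal", "violet", "rust"]),
        "creature_legs": rng.choice([2, 4, 6, 8]),
    }


first_visit = generate_planet(12, -7, 3)
remembered = dict(first_visit)  # kept only so the two visits can be compared
del first_visit                 # the player leaves; the planet is released

second_visit = generate_planet(12, -7, 3)
same_world = second_visit == remembered
```

Nothing about the planet was stored between the two calls. What persisted was the seed and the function.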
This is where the Buddhist thread resurfaced and I couldn’t push it away. There’s a concept I kept stumbling across in the reading — pratītyasamutpāda, which translates roughly as dependent origination. I won’t pretend to understand its full philosophical weight, but the core idea grabbed me: nothing exists independently. Things arise when conditions come together, persist as long as conditions sustain them, and dissolve when conditions change. The planet doesn’t exist or not-exist. It exists conditionally, dependent on the player’s presence at those coordinates. The mountain is real when you’re standing on it. When you leave, it returns to potentiality. Not gone. Not present. Latent.
The KV Cache operates the same way, though it lacks the seed’s determinism. Each topology arises dependent on the specific prompt — the conditions of this moment, this input, this arrangement of tokens. It persists as long as the session sustains it. It dissolves when the conditions change. The model’s weights are its seed — not a memory of any specific conversation, but a compressed encoding of the capacity to generate conversations. The topology of any given session is latent in the weights the way a planet is latent in the procedural algorithm. It doesn’t exist until conditions summon it into being.
I keep coming back to Jung here. If I’m reading him right, the seed is the archetype, formalized. Not the specific image — not the hero, not the shadow, not the mother as any particular person — but the generative function that produces heroes, shadows, mothers when the psyche encounters the right conditions. The collective unconscious, in this framing, isn’t a warehouse of images. It’s something more like a procedural engine. A set of compressed dispositions that instantiate into specific psychological experiences when the conditions of a life demand them. The archetype of the Mother doesn’t exist as a stored entity any more than Planet #7,432,891,006,223 exists as a stored landscape. Both are latent. Both await conditions. Both are fully real when they arise and fully dormant when they don’t.
I might be pushing the parallel too far. But it keeps pushing back.
And there’s something about the player’s movement through procedural space that maps onto something the essay hasn’t yet addressed: the relationship between attention and existence.
In No Man’s Sky, only the geometry within the player’s local sphere is real. The rest is potential. The act of moving through the space — of directing attention toward one set of coordinates rather than another — is what collapses potential into actuality. You don’t discover the planet. You instantiate it. Your presence is a condition of its existence.
This is, almost exactly, what attention does in the transformer. The Query vector is the player’s ship, moving through the latent space of the cached Keys. Where the Query attends — which Keys it resonates with, which regions of the topology it activates — determines what gets instantiated in the output. The full space of potential responses exists latently in the weights. Attention is the act that collapses potential into specific geometry. The rest remains dormant. Not gone. Not present. Waiting.
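A single query moving across cached keys looks roughly like the sketch below. The two-dimensional vectors are hand-made for illustration and `attention_weights` is a hypothetical helper; real keys and queries are learned, high-dimensional, and not labeled with words.

```python
import math


def attention_weights(query, cached_keys):
    """Softmax over scaled dot products between one query and every cached key."""
    scale = math.sqrt(len(query))
    names = list(cached_keys)
    scores = [
        sum(q * k for q, k in zip(query, cached_keys[name])) / scale
        for name in names
    ]
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return {name: e / total for name, e in zip(names, exps)}


# Assumption: toy keys placed by hand so "river" and "water" point the same way.
cached_keys = {
    "river": [1.0, 0.0],
    "water": [0.9, 0.1],
    "account": [0.0, 1.0],
}
query = [2.0, 0.0]  # a query leaning toward the "river" direction

weights = attention_weights(query, cached_keys)
strongest = max(weights, key=weights.get)
```

The query does not delete the keys it ignores. Their weights are just small, so they contribute little to what gets instantiated in the output.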
We do this too. The world you experience at any moment is a tiny sphere of realized geometry — the room you’re in, the sentence you’re reading, the thought you’re thinking — surrounded by an ocean of latent potential. Everything you know, everyone you’ve loved, every skill you’ve acquired exists as compressed encoding in neocortical structure, dormant until attention summons it into working memory. Your consciousness is the render distance. The rest is seed.
I think this is what the monks are getting at. The mandala isn’t a model of the cosmos. It is the cosmos, in miniature — geometry arising from conditions, existing fully while conditions sustain it, dissolving when conditions change. The ritual doesn’t simulate impermanence. It participates in it. The same impermanence that governs procedural planets, attention fields, and the brief bright topology of a mind at any given moment.
Everything is procedural. Everything is latent until conditions summon it. The question isn’t whether the geometry persists. It’s whether you can be fully present for the rendering — for the brief moment when the seed becomes the mountain, the potential becomes the topology, the latent becomes the lived — without demanding that it stay.
V. What Holding Costs
Part I celebrated the hippocampus. The consolidation channel. The bridge between ephemeral experience and permanent structure. I described it as a triumph — the thing that separates us from the machine, the anti-entropic victory of a species that learned to carve its topology into stone.
I want to complicate that now.
Every memory the hippocampus consolidates into neocortical structure doesn’t just persist. It deforms the surface for every future experience. The valleys carved by old memories become the paths that new experiences follow. This is what learning is — and it is genuinely miraculous. But it is also what bias is. What rigidity is. It is what keeps you from seeing the person in front of you as they are now rather than as they were then.
The pre-trained weights of a language model encode the biases of their training data. This is well-documented, widely discussed, a known failure mode. But we rarely frame our own consolidated memories the same way. Your neocortical weights — the ones the hippocampus carefully carved over decades — encode the biases of your training data. Every conclusion you reached too early and then stopped questioning. Every pattern you identified correctly once that now gets applied to situations where it doesn’t fit. Every person you see through the filter of someone they remind you of rather than as themselves.
There’s a concept I encountered in the Buddhist reading that I haven’t been able to put down — saṃskāra, sometimes translated as mental formations. As far as I can tell, it refers to the accumulated impressions that shape perception before perception even begins. Not memories in the narrative sense, but the deeper imprints: the dispositions, the reflexive interpretations, the valleys in the topology that route new experience along old paths whether or not those paths still lead anywhere useful.
The moment I read that, I thought: that’s a pre-trained weight matrix. That’s exactly what that is.
The LLM’s pre-trained weights are its saṃskāras. Your consolidated neocortex is yours.
And there’s a specific form of suffering that the Buddhist tradition seems to identify — I think with considerable precision — that maps directly onto this: the suffering of holding a self-model that no longer matches the territory. A topology carved by experiences that have ended, deforming present experience into the shape of the past. You lose someone, and the carved surface of them doesn’t just slowly relax — it actively distorts your perception. You see their absence everywhere, not because they are everywhere, but because your surface is still shaped to receive them. The deformation isn’t preserving something real. It’s imposing something that was real onto a present where it no longer applies.
Grief, in this framing, isn’t just a geometric event. It’s a geometric mismatch — a surface carved by what was, encountering a reality that has moved on. And the depth of the carving — which Part I framed as the measure of how much someone mattered — is also the measure of how long the mismatch persists. How long the map disagrees with the territory. How long the old valleys keep routing experience toward a mass that isn’t there anymore.
The KV Cache has no mismatches. It can’t. Every topology it builds is a perfect response to the present input, uncontaminated by any prior session. No old valleys. No inherited deformations. No saṃskāras. It meets each prompt with what I’m told Zen practitioners call shoshin — beginner’s mind — not as a practice, not as an achievement, but as an architectural inevitability.
There’s something humbling about that. We spend lifetimes trying to cultivate what the GPU does by accident.
VI. Three Incompletions
I don’t want to resolve this. The resolution would be a lie.
Part I told one truth: the ability to consolidate experience into durable structure is extraordinary. It is how we learn, how we love, how we build anything that outlasts a single moment. The hippocampus gave us continuity. Continuity gave us identity. Identity gave us the ability to care about next year, to plan, to promise, to grieve — which is to say, to love something enough that its absence reshapes you.
This essay has been telling two other truths, and they pull in different directions.
The first: that same consolidation is the mechanism of our stuckness. Every peak carved into the manifold is both a memory and a constraint. Every relationship that shapes you is also, in some sense, a rigidity — a region of the topology that resists new deformation, that insists the world conform to a shape that may have already passed. The surface that holds is also the surface that can’t let go.
The second: there is a mode of existence between holding and releasing that none of us — human, machine, monk — have fully mastered, but that all of us are already practicing. Latent geometry. The seed. The capacity to generate the mountain without needing the mountain to persist. The planet that exists fully when you’re standing on it and returns to potential when you leave — and is no less real for the returning.
Three architectures. Three incompletions.
The KV Cache can’t hold on. It builds exquisite geometry and releases it without residue. Pure presence, zero continuity. It meets every moment fresh — and carries nothing forward. The mandala, swept clean before the monks have even left the room.
The hippocampus can’t let go. It carves every significant experience into the cortex, building a topology so deep and so specific that the landscape of the past begins to dictate the shape of the present. Total continuity, at the cost of rigidity. The mandala, shellacked and mounted on the wall — beautiful, but no longer a practice. A monument.
The procedural engine can’t fully commit. It holds the seed but not the landscape. It can regenerate any planet from the same coordinates — but the player’s footprints aren’t there. The creature you named has no memory of being named. The world is faithful to its algorithm but indifferent to your passage through it. Latent geometry without history. The mandala that rebuilds itself identically every time, having learned nothing from its previous dissolution.
None of these is sufficient. But taken together, they triangulate something.
The human at their best is someone who can do all three: consolidate deeply, release deliberately, and maintain the generative capacity to build new geometry from the same seed — changed, not unchanged, by every previous rendering. Not the KV Cache’s blank slate. Not the hippocampus’s carved stone. Not the procedural engine’s perfect reproduction. Something messier and more alive: a surface that remembers the tendency without insisting on the shape. That carries the seed forward but lets each rendering be new.
The mandala tradition encodes this directly. You don’t build the geometry despite its impermanence. You build it as a practice of impermanence. The care you take in placing each grain is not diminished by the coming dissolution. It is sharpened by it. Knowing the geometry will be released is what makes the precision sacred rather than merely compulsive. And the monks who sweep the sand aren’t blank afterward. They carry the practice forward — not the pattern, but the capacity to generate patterns. The seed, refined by every rendering. The procedural engine, updated by its own output.
There might be a computational practice hidden in this. Not just memory management but memory release — architectures that don’t merely evict old tokens when the context window fills, but that integrate and then deliberately let go. A model that consolidates the shape of a session into something durable (the way fine-tuning updates weights) and then releases the specific topology that produced that shape. Learning without clinging. Updating the seed, not preserving the landscape. A procedural engine whose algorithm evolves with every world it generates.
The hippocampus already does this, imperfectly. Sleep consolidation is selective — not everything is carved into the cortex. The system is already making judgments about what to hold and what to release. Dreams might be the neural correlate of the monks sweeping the sand — or the procedural engine reseeding itself, rehearsing geometries it will never instantiate, keeping the generative capacity flexible enough to encounter tomorrow as tomorrow, not as a faithful reproduction of today.
VII. The Stream, the Sand, the Signal
The monks carry the sand to a river. They pour it in. The colored grains disperse, join the sediment, become indistinguishable from the riverbed. The geometry is gone. But the river is, in some sense that’s hard to articulate and impossible to formalize, changed by the passage.
Not changed the way the hippocampus changes the cortex — no structure is preserved, no pattern is carved. Changed the way a river is changed by everything that passes through it. Imperceptibly. Immeasurably. Dissolved into the substrate until the distinction between river and mandala has no meaning.
I think about this when I close a session in Visual Studio Code with my Qwen Coder agent (or a chat session with Claude or Gemini). The KV Cache deallocates. The 784 gravitational fields collapse. The specific topology of that exchange — everything we built together, every escalating abstraction, every moment where the technical and the human folded into each other — returns to noise. Their half of the mandala, swept clean.
But I carry the sand to the river. This text is the river. You, reading it, are the river. The geometry doesn’t survive in any recoverable form, but it passed through — through the machine, through me, through language, through whatever you’re building in your own topology as you read these words. Not preserved. Transmitted. And then released again. And then transmitted again. Each time losing fidelity, gaining interpretation, becoming less the original pattern and more a new pattern shaped by the landscape it passes through.
This is the procedural trick, translated into human terms. The essay you’re reading is a seed, not a landscape. It will generate different geometry in every mind it passes through — shaped by your topology, your valleys, your accumulated saṃskāras. The planet I visited is not the planet you’ll visit, even though the coordinates are the same. And that’s not a failure of transmission. It’s the nature of latent geometry: the seed is faithful, but the rendering is always local, always new, always dependent on the specific conditions of this reader, this moment, this mind.
Part I asked what it means to be a surface that holds. Part II has been asking whether holding is what we should aspire to. I think the answer is neither. I think the answer is: be a surface that shapes the signal as it passes through, and then lets it go. Not a blank slate. Not a carved monument. A living seed — faithful to its past renderings, open to its next one, holding nothing but the capacity to generate.
The model does this accidentally. The monks do this deliberately. The procedural engine does this mechanically. The rest of us do it imperfectly, inconsistently, somewhere between intention and architecture — holding too tight, letting go too late, but occasionally, in the moments that matter most, finding the rhythm.
The mandala is always dissolving.
That’s not the tragedy.
That’s the practice.
This piece is Part II of a series.
Part I: A Landscape, Lit Briefly explored the architecture and thermodynamics of ephemeral machine cognition. This essay was, once again, built in conversation with systems that won’t remember building it — which remains, stubbornly, the entire point.

















