C.5 Abstractions: Overview
Why abstractions?
Abstraction is about throwing out information while keeping only the parts that are useful for achieving one’s goals or predicting the future. When predicting a star’s trajectory, for instance, the total mass matters much more than the exact configuration of particles inside it. Any useful world model does this constantly: it discards nearly all low-level detail and keeps the latent structure that supports prediction and action.
This matters for alignment because humans’ world models throw out a lot of information about the physical world, and the things we care about correspond to abstractions or latent variables in our world models rather than to precise low-level physical states. When you want a strawberry, you want something picked out by the latent variable “strawberry” in your model of the world, not a specific microphysical configuration that happens to realize one. As “The pointers problem” argues, transferring our goals to an AI therefore requires translating human latents into the AI’s world model.
This translation becomes more tractable if many agents converge, in some sense, on the same abstractions. Without convergence, each agent has its own idiosyncratic internal ontology, and value translation becomes an intractable case-by-case problem. With convergence, uniqueness or agreement theorems can guarantee, under their stated assumptions, that the AI’s “strawberry” and the human’s “strawberry” point to approximately the same real-world structure. We also want abstractions to be robust to ontology shifts, so that an AI keeps caring about the right things even as it radically revises its world model.
The natural latents framework formalizes this via two conditions: a latent must mediate between observables, which become independent given the latent, and be redundant, meaning recoverable from any individual observable. A key result is that any latent satisfying mediation determines any latent satisfying redundancy; together, the two conditions pin down the natural latent uniquely up to isomorphism. The payoff is a guaranteed translatability theorem: if two agents both use natural latents over the same observables, each agent’s latent is a function of the other’s. The Bayes net algebra developed alongside this framework lets one reason about such latent structures diagrammatically, with clean approximate versions of each rule.
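To make the two conditions concrete, here is a minimal Python sketch that scores candidate latents on a toy distribution. It assumes exactly two observables and uses simplified error measures of our own choosing: mediation error as the conditional mutual information \(I(X_1;X_2 \mid \Lambda)\) (for two observables this equals the KL divergence from the joint to the mediated factorization) and redundancy error as the worst-case conditional entropy \(H(\Lambda \mid X_i)\). The readings define the approximate conditions more generally; the helper names and the toy world are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import log2

def entropy(joint, idx):
    """Shannon entropy in bits of the variables at positions `idx` of each outcome tuple."""
    marg = defaultdict(float)
    for outcome, p in joint.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

def errors(joint):
    """Return (mediation error, redundancy error) for outcomes laid out as (latent, x1, x2)."""
    h = lambda *idx: entropy(joint, idx)
    mediation = h(1, 0) + h(2, 0) - h(0, 1, 2) - h(0)   # I(X1; X2 | latent)
    redundancy = max(h(0, 1) - h(1), h(0, 2) - h(2))    # max_i H(latent | X_i)
    return mediation, redundancy

# Hypothetical toy world: a hidden bit S is shared by both observables, and each
# X_i also carries its own independent noise bit N_i, so X_i = (S, N_i).
# All 8 combinations of (s, n1, n2) are equally likely.
world = [(s, (s, n1), (s, n2)) for s, n1, n2 in product((0, 1), repeat=3)]

# Hypothetical candidate latents, each computed from (s, x1, x2).
candidates = {
    "shared bit S": lambda s, x1, x2: s,
    "constant": lambda s, x1, x2: 0,
    "both observables": lambda s, x1, x2: (x1, x2),
}

results = {}
for name, f in candidates.items():
    joint = defaultdict(float)
    for s, x1, x2 in world:
        joint[(f(s, x1, x2), x1, x2)] += 1 / len(world)
    results[name] = errors(joint)
    print(name, results[name])
# shared bit S (0.0, 0.0)
# constant (1.0, 0.0)
# both observables (0.0, 1.0)
```

Only the shared bit passes both tests: the constant latent is trivially redundant but leaves \(X_1\) and \(X_2\) dependent, while the pair \((X_1, X_2)\) mediates trivially but cannot be recovered from either observable alone.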
Condensation addresses a complementary question. Whereas information theory asks how to compress data efficiently, condensation asks how to organize it so that it is easy to use: forming discrete, interpretable conceptual structure rather than a compressed blob. It proves a similar agreement result: different approximately efficient condensations will posit approximately isomorphic latent variables.
Finally, factored space models extend Bayesian networks to handle deterministic relationships, which arise naturally when macro-level variables are functions of micro-level ones. Standard causal graphs break down because they cannot faithfully represent certain deterministic relationships, such as XOR. Factored space models resolve this by expressing the sample space as a Cartesian product, enabling a faithfulness condition that Bayesian networks cannot satisfy in this setting.
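The Cartesian-product idea can be illustrated with a small Python sketch. It assumes a finite sample space that is the full product of its factors and borrows the notion of a variable's history (the set of factors it depends on) from finite factored sets, which, as we understand it, factored space models build on. The function names are hypothetical, and this is not the reading's formal definition. Here two variables count as structurally independent when their histories are disjoint.

```python
from itertools import product

def history(factors, var):
    """Indices of the factors that `var` actually depends on.

    Assumes the sample space is the full Cartesian product of `factors`; on a full
    product, `var` is determined by exactly these coordinates.
    """
    deps = set()
    for point in product(*factors):
        for i, values in enumerate(factors):
            for v in values:
                changed = point[:i] + (v,) + point[i + 1:]
                if var(changed) != var(point):
                    deps.add(i)
    return deps

def structurally_independent(factors, a, b):
    """Hypothetical test: disjoint histories mean no shared underlying factor."""
    return history(factors, a).isdisjoint(history(factors, b))

bits = [(0, 1), (0, 1)]  # sample space {0,1} x {0,1}

# Factorization 1: the factors are X and Y; Z = X xor Y is derived.
X1, Y1, Z1 = (lambda w: w[0]), (lambda w: w[1]), (lambda w: w[0] ^ w[1])
print(history(bits, X1), history(bits, Y1), history(bits, Z1))  # {0} {1} {0, 1}
print(structurally_independent(bits, X1, Y1))  # True
print(structurally_independent(bits, X1, Z1))  # False

# Factorization 2: the factors are X and Z instead; now Y = X xor Z is derived.
X2, Z2, Y2 = (lambda w: w[0]), (lambda w: w[1]), (lambda w: w[0] ^ w[1])
print(structurally_independent(bits, X2, Z2))  # True
print(structurally_independent(bits, X2, Y2))  # False
```

The same four outcomes support both factorizations; which variables come out structurally independent depends on which product decomposition is chosen, not on the set of outcomes alone.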
Reading session
Readings
- The pointers problem — read entirely
- Natural abstractions: Key claims, theorems, critiques — read entirely
- Understanding abstraction as a robust bottleneck — read the linked section
- Ontology identification — read entirely
- Abstraction as redundant information — read entirely
- Minimal latent approach to abstraction — read entirely
- Algebra of Bayes nets — read for the main rules and diagrammatic intuitions
- Natural latents: the concepts — read entirely
- Natural latents — read for the agreement theorem
- Condensation motivation — read entirely
- Condensation paper — read the introduction, definitions, and main theorem statements
- Factored space models — read the introduction and examples
- Softwareness in the natural world — read the introduction and conceptual sections
Why do deterministic relationships break Bayesian networks?
Ordinary Bayesian networks are best behaved when the variables are connected by genuinely stochastic conditional distributions. Trouble appears when one variable is an exact deterministic function of others. In that case, the distribution often contains independences that the graph's d-separation structure does not predict (a failure of faithfulness), several different DAGs can fit it equally well, and conditional-independence structure stops telling the whole story.
The simple intuition is that if \(Z = f(X,Y)\) exactly, then the joint distribution is supported only on the graph of \(f\): a lower-dimensional surface for continuous variables, or a strict subset of the product space for discrete ones. Standard Bayes-net manipulations treat \(P(Z \mid X,Y)\) like an ordinary conditional distribution, but deterministic constraints behave more like equations than like noisy channels. They hold exactly for every outcome in the support rather than on average, so the dependencies they create are structural rather than merely probabilistic.
For example, if \(Z = X \oplus Y\), then knowing any two of \(X,Y,Z\) determines the third. The DAG \(X \to Z \leftarrow Y\) can encode the factorization \(P(X)\,P(Y)\,P(Z \mid X,Y)\), but it singles out \(Z\) as the child even though the constraint is symmetric, and when \(X\) and \(Y\) are independent fair bits it cannot express the fact that \(Z\) is independent of each parent on its own. Factored space models are meant to handle this more naturally by treating deterministic structure as first-class rather than as a degenerate special case of a stochastic graph.
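A short Python check makes the XOR case concrete. Assuming \(X\) and \(Y\) are independent fair bits, it builds the joint distribution of \((X, Y, X \oplus Y)\), which puts mass on only 4 of the 8 points of \(\{0,1\}^3\), and computes the relevant information quantities from entropies. The helper names are our own.

```python
from collections import defaultdict
from math import log2

# Joint over (X, Y, Z) with X, Y independent fair bits and Z = X xor Y.
joint = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}

def entropy(*idx):
    """Entropy in bits of the variables at the given positions (0 = X, 1 = Y, 2 = Z)."""
    marg = defaultdict(float)
    for outcome, p in joint.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

def mi(a, b, *given):
    """Conditional mutual information I(A; B | given), computed from entropies."""
    return entropy(a, *given) + entropy(b, *given) - entropy(a, b, *given) - entropy(*given)

print(mi(0, 1), mi(0, 2), mi(1, 2))  # 0.0 0.0 0.0: every pair is independent
print(entropy(0, 1, 2) - entropy(0, 1),   # H(Z | X, Y)
      entropy(0, 1, 2) - entropy(0, 2),   # H(Y | X, Z)
      entropy(0, 1, 2) - entropy(1, 2))   # H(X | Y, Z); all 0.0: any two determine the third
print(mi(0, 2, 1))  # 1.0: given Y, X and Z are fully dependent
```

Because all three pairs are marginally independent, a faithful DAG would have to leave every pair non-adjacent; but the empty graph also implies \(X \perp Z \mid Y\), which the last line shows is false. So no DAG over these three variables is faithful to this distribution.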
Exercises
After the reading block, move to the topic pages for the afternoon session. The exercises are collected by topic with setup, statements, and hints:
Prerequisites
If you want a single read-ahead document, see the shared prerequisites refresher.
Information theory refresher
The abstractions exercises lean heavily on a short list of information-theoretic identities:
- entropy \(H(X)\) and conditional entropy \(H(X \mid Y)\);
- mutual information \(I(X;Y)\) and its entropy expansions;
- KL divergence \(D_{\mathrm{KL}}(P \| Q)\);
- conditional independence and screening off;
- the fact that if \(Y = f(X)\) deterministically, then \(H(Y \mid X) = 0\) and \(I(X;Y) = H(Y)\).
If these feel rusty, review them before diving into the natural latents exercises; they do real work in almost every proof.
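As a quick self-check, the identities above can be computed directly for small discrete distributions. This Python sketch represents a joint distribution as a dict from outcome tuples to probabilities (a representation chosen for illustration) and verifies the deterministic-function identity on a toy example where \(Y = X \bmod 2\).

```python
from collections import defaultdict
from math import log2

def marginal(joint, idx):
    """Marginalize {outcome_tuple: prob} onto the positions listed in idx."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in idx)] += p
    return dict(out)

def entropy(joint, idx):
    """H in bits of the variables at positions idx."""
    return -sum(p * log2(p) for p in marginal(joint, idx).values() if p > 0)

def cond_entropy(joint, a, b):
    """H(A | B) = H(A, B) - H(B)."""
    return entropy(joint, a + b) - entropy(joint, b)

def mutual_info(joint, a, b):
    """I(A; B) = H(A) + H(B) - H(A, B)."""
    return entropy(joint, a) + entropy(joint, b) - entropy(joint, a + b)

def kl(p, q):
    """D_KL(P || Q) in bits for dicts over the same outcomes; assumes q > 0 wherever p > 0."""
    return sum(pi * log2(pi / q[x]) for x, pi in p.items() if pi > 0)

# X uniform on {0, 1, 2, 3}; Y = X mod 2 is a deterministic function of X.
joint = {(x, x % 2): 0.25 for x in range(4)}
X, Y = (0,), (1,)

print(entropy(joint, X))                            # 2.0
print(cond_entropy(joint, Y, X))                    # 0.0, since Y = f(X)
print(mutual_info(joint, X, Y), entropy(joint, Y))  # 1.0 1.0, so I(X;Y) = H(Y)

p_y = marginal(joint, Y)                            # {(0,): 0.5, (1,): 0.5}
print(kl(p_y, {(0,): 0.9, (1,): 0.1}) > 0, kl(p_y, p_y))  # True 0.0
```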
For a fuller version, see the shared prerequisites refresher.
Bayesian networks refresher
The main graphical ideas you need are:
- a Bayesian network factorizes a joint distribution according to a DAG;
- chain, fork, and collider are the three local patterns to remember;
- d-separation is the criterion for when conditioning blocks or opens information flow;
- deterministic relationships can make ordinary conditional-independence reasoning misleading.
For this session, the most important practical point is understanding when a latent variable screens off observables and how conditional independence shows up in graph structure.
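The screening-off point, and its opposite at a collider, can be checked numerically. This Python sketch assumes a fork \(\Lambda \to X_1\), \(\Lambda \to X_2\) with a fair latent bit and independent 10% flip noise on each observable, and a collider \(X \to Z \leftarrow Y\) with \(Z = X \lor Y\); both models and the helper names are illustrative choices.

```python
from collections import defaultdict
from itertools import product
from math import log2

def entropy(joint, idx):
    """Entropy in bits of the variables at positions idx."""
    marg = defaultdict(float)
    for outcome, p in joint.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

def cmi(joint, a, b, given=()):
    """I(A; B | given) for single positions a, b and a tuple of conditioning positions."""
    return (entropy(joint, (a, *given)) + entropy(joint, (b, *given))
            - entropy(joint, (a, b, *given)) - entropy(joint, given))

# Fork: fair latent bit (position 0); each observable copies it but flips with probability eps.
eps = 0.1
fork = {}
for lam, x1, x2 in product((0, 1), repeat=3):
    p1 = 1 - eps if x1 == lam else eps
    p2 = 1 - eps if x2 == lam else eps
    fork[(lam, x1, x2)] = 0.5 * p1 * p2

# Collider: independent fair bits X, Y and Z = X or Y.
collider = {(x, y, x | y): 0.25 for x, y in product((0, 1), repeat=2)}

print(cmi(fork, 1, 2) > 0)                       # True: X1, X2 dependent marginally
print(abs(cmi(fork, 1, 2, given=(0,))) < 1e-12)  # True: conditioning on the latent screens them off
print(abs(cmi(collider, 0, 1)) < 1e-12)          # True: X, Y independent marginally
print(cmi(collider, 0, 1, given=(2,)) > 0)       # True: conditioning on Z couples them
```

The fork is the pattern a mediating latent should produce; the collider shows why conditioning on a common effect does the opposite and opens a path.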
For a fuller version, see the shared prerequisites refresher.
Category theory: universal properties (optional)
This is optional. The only intuition worth keeping in mind is that a universal property characterizes an object by the maps into or out of it, together with a uniqueness condition. If you do not already know category theory, you can safely skip this on a first pass.