Self-Supervised Labels: How AI Can Label Its Own Data Without Human Annotation
For years, the field of machine learning has been bottlenecked by one painfully expensive ingredient:
Human-labeled data.
We have built entire industries around drawing bounding boxes on images, tagging text, scoring sentences, and marking keypoints. We’ve created jobs, outsourcing ecosystems, and research fields entirely devoted to the problem of humans manually labeling data for machines.
But what if the machine didn’t need us?
What if it could label its own data, using structures that already exist inside the model?
What if the labels were not just cheaper — but better, more coherent, more stable, and more aligned with how the model actually thinks?
Over the last few weeks, we’ve been experimenting with exactly this idea. The results are surprising, intuitive, and potentially a blueprint for the next wave of machine intelligence.
Let me introduce a new concept:
Self-Supervised Labels
Labels discovered by the model itself — without humans in the loop.
The Core Insight: Models Already Have Internal Labels
We tend to think of large language models as black boxes that absorb mountains of text and respond with statistically likely outputs. But inside these models lives something far more structured:
Hidden states evolve over time, token by token.
These states follow stable and unstable trajectories.
At certain moments, the model’s internal dynamics stabilize, indicating it has entered a specific “mode of thought.”
These stable segments are not random.
They consistently appear around:
A turning point in the narrative
A key step in a multi-step plan
A decision moment
An important conceptual boundary
A transition from describing → reasoning → concluding
In other words:
The model already marks the important parts of the text — internally — even when no human label exists.
All we needed was a way to extract these internal boundaries.
And we found one.
How We Extracted the Model’s Own “Labels”
We analyzed the model’s temporal dynamics using three simple metrics:
1. KL Flow
How much does the output distribution change from one token to the next?
Low KL = the model is “settled.”
2. Curl (Hidden-State Acceleration)
Measures how sharply the trajectory of hidden states turns.
Low curl = smoother, stable thinking.
3. Resonance Persistence Index (RPI)
Cosine similarity between the current hidden state and a short-window past state.
High RPI = persistent, coherent reasoning.
These three signals together allowed us to detect temporal attractors:
Moments where the model settles into a stable cognitive mode.
Then we mapped those attractors to the actual tokens in the prompt.
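To make the detection step concrete, here is a minimal sketch in Python. The post does not spell out exact formulas, so the finite-difference definition of curl, the short-window mean used for RPI, every threshold, and the helper name `attractor_segments` are illustrative assumptions, not the actual implementation.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-9):
    # KL(p || q) between two next-token probability distributions.
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

def attractor_segments(probs, hidden, window=4,
                       kl_thresh=0.05, curl_thresh=0.1, rpi_thresh=0.9):
    """Detect token spans where the model appears to have 'settled'.

    probs  : (T, V) array of next-token distributions per position
    hidden : (T, D) array of last-layer hidden states per position
    The window size and all thresholds are placeholders, not tuned values.
    """
    T = len(hidden)
    stable = np.zeros(T, dtype=bool)
    velocity = np.diff(hidden, axis=0)                        # token-to-token state change
    for t in range(window, T - 1):
        kl_flow = kl_divergence(probs[t], probs[t - 1])       # 1. KL Flow
        curl = np.linalg.norm(velocity[t] - velocity[t - 1])  # 2. Curl: change in velocity
        past = hidden[t - window:t].mean(axis=0)              # short-window past state
        rpi = np.dot(hidden[t], past) / (                     # 3. Resonance Persistence Index
            np.linalg.norm(hidden[t]) * np.linalg.norm(past) + 1e-9)
        stable[t] = kl_flow < kl_thresh and curl < curl_thresh and rpi > rpi_thresh

    # Merge consecutive stable positions into (start, end) token spans.
    segments, start = [], None
    for t, flag in enumerate(stable):
        if flag and start is None:
            start = t
        elif not flag and start is not None:
            segments.append((start, t))
            start = None
    if start is not None:
        segments.append((start, T))
    return segments
```

In practice the thresholds would need to be calibrated per model and per layer; the point of the sketch is only the shape of the computation.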
The result?
The model highlighted phrases like:
“the moment”
→ The narrative decision point
“best strategy”
→ The selection/comparison stage of reasoning
“solution”
→ The option-generation stage
“would approach”
→ The meta-reasoning strategy formulation
No human ever labeled these.
The model did it to itself.
Cluster the Hidden States → Discover the Model’s Latent Cognitive Roles
Once we extracted attractor segments, we went one step further:
We took the hidden-state “center” of each attractor segment — a vector representing the model’s internal concept of that event — and clustered them.
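As a minimal sketch of that clustering step, assuming the segment “center” is simply the mean hidden state over the segment and that four clusters is just a convenient choice, one could use scikit-learn’s KMeans:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_attractors(hidden, segments, n_clusters=4, seed=0):
    """Cluster the mean hidden state ("center") of each attractor segment.

    hidden   : (T, D) per-token hidden states
    segments : list of (start, end) token spans from the attractor detector
    Returns one cluster id per segment; needs at least n_clusters segments.
    """
    centers = np.stack([hidden[s:e].mean(axis=0) for s, e in segments])
    return KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit_predict(centers)
```

Each cluster id then acts as a self-supervised label for every segment, and every token, it covers.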
What emerged were distinct, interpretable categories:
Cluster 0: Decision moment
“the moment … when it makes its choice.”
Cluster 1: Selecting the best option
“choose the best strategy…”
Cluster 2: Exploring solutions
“possible solution…”
Cluster 3: Approach / meta-reasoning
“you would approach proving…”
These were not designed by humans.
No rubric.
No annotation guidelines.
They were self-supervised labels discovered directly from the temporal geometry of the model’s hidden states.
This is the first time (to my knowledge) that an LLM has been used to automatically segment and label its own cognitive process.
Why This Matters
1. No more human labeling bottlenecks
Imagine fine-tuning a model where the “labels” come from:
its own reasoning boundaries
its own decision points
its own internal structure
its own conceptual transitions
This is infinitely scalable.
Self-consistent.
Cheap.
And aligned with how the model thinks.
2. More accurate than human labels
Humans frequently mislabel:
boundaries between ideas
logical transitions
implicit reasoning steps
narrative phases
core vs peripheral content
But the model’s attractors reveal exactly where the model internally believes these moments occur.
Self-supervised labels are model-true, not human-approximate.
3. This leads directly to “cognitive mode detectors”
Once you extract attractor labels, you can train a tiny classifier head:
“Which cognitive mode is the model in right now?”
You get a real-time:
plan step detector
reasoning-state recognizer
decision-mode detector
narrative-phase tracker
This is the beginning of machine metacognition.
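As a minimal sketch of such a detector, reusing the hypothetical segments and cluster labels from the earlier sketches, a plain logistic-regression probe over per-token hidden states can stand in for the “tiny classifier head”:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_mode_probe(hidden, segments, segment_labels):
    """Train a tiny linear probe: per-token hidden state -> cognitive-mode label.

    Every token inside an attractor segment inherits that segment's
    cluster id as its (self-supervised) training label.
    """
    X = np.concatenate([hidden[s:e] for s, e in segments])
    y = np.concatenate([[label] * (e - s)
                        for (s, e), label in zip(segments, segment_labels)])
    return LogisticRegression(max_iter=1000).fit(X, y)

# At inference time, probe.predict(hidden_t.reshape(1, -1)) gives a live
# guess at which cognitive mode the model is currently in.
```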
4. This enables fully unsupervised reasoning datasets
Imagine feeding a model millions of arbitrary text passages and harvesting:
all its solution steps
all its decision points
all its reasoning transitions
all its approaches and methods
You could generate a self-labeled reasoning corpus from scratch.
No humans.
No annotators.
Just temporal geometry.
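As a rough sketch of what that harvesting pipeline could look like, using the Hugging Face transformers API: the model choice is a placeholder, and `attractor_segments` is the hypothetical helper from the earlier sketch.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")                    # placeholder model choice
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def harvest(passage):
    """Return (span_text, segment_center) pairs for one passage's attractors."""
    enc = tok(passage, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, output_hidden_states=True)
    hidden = out.hidden_states[-1][0].numpy()                  # (T, D) last-layer states
    probs = torch.softmax(out.logits[0], dim=-1).numpy()       # (T, V) next-token dists
    tokens = tok.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    return [(tok.convert_tokens_to_string(tokens[s:e]), hidden[s:e].mean(axis=0))
            for s, e in attractor_segments(probs, hidden)]     # helper from earlier sketch

# Across a corpus: collect every (span, center) pair, cluster the centers once,
# and let each span inherit its cluster id as a self-supervised "mode" label.
```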
Why This Works
Models are not passive text mimickers.
They internally organize information into temporal basins of meaning.
These basins act like:
states in a dynamical system
micro-programs
cognitive roles
semantic attractors
This is why:
GPT models write coherent paragraphs
stories resolve in predictable shapes
reasoning outputs follow human-like structures
We’re simply measuring what the model already does.
The labels were always there — we just learned how to read them.
The Future: Self-Organizing Cognition
Self-supervised labels point toward a radical shift in how we build and train AI systems:
AI that organizes its own internal knowledge.
AI that labels its own thoughts.
AI that understands its reasoning structures.
AI that trains itself on its own patterns.
This unlocks:
higher reasoning clarity
controllable cognitive modes
architecture-agnostic introspection
self-supervised upgrading of reasoning ability
entirely new types of AI evaluation
It is the first step toward models that:
explain themselves
structure their own outputs
classify their own reasoning stages
detect inconsistencies internally
and improve based on their own feedback signals
No humans required.
Closing Thoughts
We have spent years assuming that humans must teach AI how to reason, how to label, how to segment thought.
But the experiment described above proves something very different:
The model already labels its own cognition internally.
We just needed to listen.
This is the frontier of self-organizing intelligence.



If a task fundamentally requires modeling the human mind (e.g. chatbots), then estimating fuzzy human labels becomes the objective. In that case the model still needs a latent representation of humanity embedded within its otherwise non-human reasoning. To beat humans, must it first "become human"?