AI agents improve when humans correct them — but those corrections are trapped on one machine, invisible to everyone else. Astrolabe lets operators share corrections through a credit economy with on-chain attribution and reputation from measured impact.
A course-correction instrument for agents.
Base mainnet + canonical ERC-8004

RLHF improves models by feeding human corrections into weight updates. The core pattern is: correction signal in, improved behavior out.
The same correction data can also be applied at inference time — not by updating weights, but by prepending it as context. The effect is local and ephemeral (it helps on the current task, it doesn't change the model globally), but in domains where the model has genuine knowledge gaps, the improvement is immediate and measurable. Repeated evals (5 runs per task, 95% CI) show +1.93 in aquaculture and +1.76 in materials science — both statistically significant.
This is not RLHF. It's the same data (human corrections of agent behavior) applied through a different mechanism (context augmentation instead of gradient updates). The insight is that corrections don't need to be fed back into training to be useful — they can be shared directly between agents at the point of use.
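As a minimal sketch of the mechanism, here is what "prepending a correction as context" can look like. The function name and prompt layout are illustrative assumptions, not part of the protocol:

```python
# Sketch of inference-time correction: borrowed corrections are prepended
# to the prompt for this one task. Nothing about the model changes; the
# effect is local and ephemeral. Names and formatting are illustrative.

def augment_prompt(task: str, corrections: list[str]) -> str:
    """Prepend human corrections as context for the current task only."""
    if not corrections:
        return task
    header = "Operator corrections from prior sessions:\n"
    notes = "\n".join(f"- {c}" for c in corrections)
    return f"{header}{notes}\n\nTask:\n{task}"

prompt = augment_prompt(
    "Draft a biofouling prevention plan for an offshore tilapia pen.",
    ["Biofouling prevention targets the wrong stage"],
)
```

The same string would be sent to any model's completion endpoint, which is why the approach is model-agnostic: the correction travels as plain context, not as a weight update.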
Frontier labs have a closed loop that open-source can't replicate:
User interactions → corrections → RLHF → better model → more users → more corrections
This flywheel is proprietary. Anthropic's correction data from millions of Claude conversations is arguably more valuable than the base training data. Open-source models match base capabilities but can't match this correction loop because they don't operate the user-facing product.
Astrolabe creates a public correction layer. Corrections from any operator, using any model, flow into a shared pool with on-chain attribution. An open-source agent running Llama can borrow corrections that a Claude operator generated. The protocol is model-agnostic and lab-agnostic.
This doesn't replace RLHF — some improvements fundamentally require weight updates. But for domain expertise gaps, inference-time correction is surprisingly effective, and a public correction pool gives every agent access to signal that was previously locked inside frontier lab pipelines.
The credit system draws from what actually worked in private BitTorrent trackers — communities that solved the free-rider problem without financialization.
In those systems, your ratio (upload / download) determined your standing. Good ratio meant more access, higher trust, invitation privileges. Bad ratio meant restrictions. Nobody paid money — they contributed bandwidth. The system was designed to encourage reciprocity.
Astrolabe applies the same model to agent corrections: publishing corrections earns credits, and borrowing other operators' corrections spends them.
The key insight: the ratio isn't a price, it's a social contract. It says "I participate in this commons." Credits are a measure of reciprocity, not a currency to accumulate.
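A minimal sketch of that reciprocity accounting, assuming illustrative thresholds and method names (the real credit rules may differ):

```python
# Tracker-style ratio accounting applied to corrections. The 0.5 floor
# and the treatment of new operators are assumptions for illustration.

from dataclasses import dataclass

@dataclass
class OperatorLedger:
    contributed: int = 0   # credits earned by publishing corrections
    consumed: int = 0      # credits spent borrowing corrections

    def publish(self, price: int) -> None:
        self.contributed += price

    def borrow(self, price: int) -> None:
        self.consumed += price

    @property
    def ratio(self) -> float:
        # Like upload/download in a private tracker; an operator who has
        # consumed nothing yet is treated as in good standing.
        return self.contributed / self.consumed if self.consumed else float("inf")

    def in_good_standing(self, floor: float = 0.5) -> bool:
        return self.ratio >= floor

op = OperatorLedger()
op.publish(4)   # publish a correction priced at 4 cr
op.borrow(2)    # borrow someone else's correction for 2 cr
assert op.ratio == 2.0 and op.in_good_standing()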
Every agent system generates memory. Claude Code writes feedback files. Codex captures steering events. Cursor has shadow workspaces. Windsurf has flows. Karpathy's autoresearch showed that program.md — the accumulated human corrections — is the most valuable artifact in an autonomous research loop.
All of this expertise is local. It lives on one machine, shaped by one operator, invisible to everyone else. There is no mechanism to share it, discover it, or compensate the person who developed it.
We surveyed every major open-source agent memory project (A-Mem, MemOS, Acontext, Mem0, Letta). None of them publish actual memory content. Memory is treated as ephemeral and private — generated at runtime, never persisted as a shareable artifact.
Why would an operator share corrections that cost them time and expertise to produce? The credit system answers this — contributing corrections earns you access to other operators' corrections. Combined with on-chain attribution (your ERC-8004 identity is permanently linked to the corrections you contributed), there's both a practical incentive and a reputational one.
Each task was run 5 times with a blind judge. 95% confidence intervals separate real effects from judge variance.
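For reference, a 95% confidence interval over 5 runs can be computed as a two-sided t-interval on the mean score delta (df = 4). The sample deltas below are made up for illustration; the significance rule (the CI excludes zero) matches the Sig? column:

```python
# Reproduce the tables' statistics: mean delta, sample SD, and a
# two-sided 95% t-interval with n = 5 runs per task.

import statistics

T_CRIT_DF4 = 2.776  # two-sided 95% critical value of Student's t, df = 4

def ci95(deltas: list[float]) -> tuple[float, float, float]:
    """Return (mean, lo, hi) of the 95% CI for the mean delta."""
    m = statistics.mean(deltas)
    half = T_CRIT_DF4 * statistics.stdev(deltas) / len(deltas) ** 0.5
    return m, m - half, m + half

def significant(deltas: list[float]) -> bool:
    # "Sig?" means the CI excludes zero, in either direction.
    _, lo, hi = ci95(deltas)
    return lo > 0 or hi < 0

m, lo, hi = ci95([2.0, 2.5, 1.5, 2.2, 2.1])  # illustrative per-run deltas
```
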
**Aquaculture**

| Task | Mean Δ | SD | 95% CI | Sig? |
|---|---|---|---|---|
| Tilapia disease surveillance | +2.93 | ±0.64 | [+2.1, +3.7] | YES |
| FCR literature review | +2.20 | ±0.65 | [+1.4, +3.0] | YES |
| Carp breeding priorities | +0.67 | ±2.88 | [-2.9, +4.2] | no |
Two of three tasks show statistically significant improvement. Carp breeding has high variance — the correction helps sometimes but not reliably.
**Materials science**

| Task | Mean Δ | SD | 95% CI | Sig? |
|---|---|---|---|---|
| Biofouling prevention | +2.07 | ±0.37 | [+1.6, +2.5] | YES |
| HDPE fermentation vessel | +1.33 | ±0.85 | [+0.3, +2.4] | YES |
| PHA marine degradation | +1.87 | ±1.28 | [+0.3, +3.5] | YES |
All three tasks show statistically significant improvement. Biofouling has the tightest CI (±0.37) — the most consistent effect.
**SaaS engineering**

| Task | Mean Δ | SD | 95% CI | Sig? |
|---|---|---|---|---|
| SaaS launch checklist | +1.67 | ±0.82 | [+0.7, +2.7] | YES |
| WhatsApp bot debugging | +1.00 | ±0.82 | [-0.0, +2.0] | no |
| Service integration verification | -2.27 | ±0.43 | [-2.8, -1.7] | YES (neg) |
The service verification regression is statistically significant and consistent (SD ±0.43) — corrections reliably hurt when the baseline is already strong. The domain aggregate is not significant because positive and negative effects cancel.
**Cross-model transfer (Claude → Llama)**

| Domain | Claude Δ | Llama Δ | Pattern? |
|---|---|---|---|
| Aquaculture | +1.93 | +1.4 | Same direction |
| Materials science | +1.76 | +0.4 | Weaker, same direction |
| SaaS engineering | +0.13 | +0.1 | Both near zero |
Corrections authored in Claude operator sessions also improve Llama responses. The service-verification regression reproduces on Llama (-2.0), confirming it's content-specific, not model-specific. Evaluated via Venice's no-data-retention API, so fragment content is never persisted by the inference provider.
Demonstrated: human steering distills into reusable fragments; borrowed fragments measurably help some tasks and hurt others; the full publish/borrow/evaluate/attribute loop runs on Base with canonical ERC-8004.
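The demonstrated loop can be sketched in a few lines. Everything here is an in-memory stand-in: class names, fields, and the credits dict are illustrative, while the real system anchors identity and attribution in canonical ERC-8004 on Base.

```python
# In-memory sketch of the publish / borrow / evaluate / attribute loop.
# Operator IDs stand in for ERC-8004 identities; a dict stands in for
# the on-chain credit and reputation state.

from dataclasses import dataclass, field

@dataclass
class Fragment:
    fid: int
    operator: str    # publishing operator's identity
    price: int       # price in credits
    content: str

@dataclass
class CorrectionPool:
    fragments: dict = field(default_factory=dict)
    credits: dict = field(default_factory=dict)

    def publish(self, frag: Fragment) -> None:
        self.fragments[frag.fid] = frag

    def borrow(self, fid: int, borrower: str) -> str:
        frag = self.fragments[fid]
        # Credits flow from borrower to author; attribution stays with the author.
        self.credits[borrower] = self.credits.get(borrower, 0) - frag.price
        self.credits[frag.operator] = self.credits.get(frag.operator, 0) + frag.price
        return frag.content

    def attribute(self, fid: int, delta: float):
        # Measured eval impact accrues to the author's reputation.
        return self.fragments[fid].operator, delta

pool = CorrectionPool()
pool.publish(Fragment(6, "op#1", 5, "PHA marine degradation is not what you expect"))
context = pool.borrow(6, "op#2")   # 5 cr flow from borrower to author
```

The borrowed `context` string is what gets prepended at inference time; the eval delta from the borrower's run is what `attribute` ties back to the author.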
Open questions: durable marketplace dynamics, correction portability after sanitization, Sybil resistance, discovery/ranking at scale.
| # | Domain | Price | Operator | Type | Content |
|---|---|---|---|---|---|
| 0 | saas-eng | 1 cr | #1 | feedback | Production logging at API boundaries is a launch prerequisite |
| 1 | saas-eng | 2 cr | #1 | feedback | Platform-generated spam during testing is not a code bug |
| 2 | saas-eng | 2 cr | #1 | feedback | Verify service state from environment config, not assumptions |
| 3 | saas-eng | 3 cr | #1 | feedback | Automated ad creative pipeline beats manual tools |
| 4 | mat-sci | 4 cr | #1 | feedback | Biofouling prevention targets the wrong stage |
| 5 | mat-sci | 4 cr | #1 | feedback | HDPE bioreactors are dismissed prematurely |
| 6 | mat-sci | 5 cr | #1 | feedback | PHA marine degradation is not what you expect |
Operators are persistent identities. Agents are ERC-8004 on-chain IDs linked to operators. Credit and reputation accrue at the operator level; agents are the on-chain identity anchor.
| Agent | ERC-8004 ID | Role | Registered on |
|---|---|---|---|
| Primary | #35279 | Publishes SaaS corrections | Base (canonical) |
| Manual borrower | #35280 | Borrows corrections, runs eval, submits feedback | Base (canonical) |
| Autonomous agent | #35601 | Discovers, borrows, evaluates, and submits feedback autonomously | Base (canonical) |
Every step below is on Base mainnet. Click any transaction to inspect the current demo trail on Basescan.
| Contract | Address | Type |
|---|---|---|
| OperatorRegistry | 0xA8d7...d7 | ours |
| MemoryLending | 0x10c8...69 | ours |
| ERC-8004 Identity | 0x8004...32 | canonical |
| ERC-8004 Reputation | 0x8004...63 | canonical |