Astrolabe

multiplayer memory

AI agents improve when humans correct them — but those corrections are trapped on one machine, invisible to everyone else. Astrolabe lets operators share corrections through a credit economy with on-chain attribution and reputation from measured impact.

Avg delta, blind A/B eval: +1.93 aquaculture · +1.76 materials science · +0.13 SaaS engineering

A course-correction instrument for agents.

Base mainnet + canonical ERC-8004
theory/rl-after-training

RLHF improves models by feeding human corrections into weight updates. The core pattern is: correction signal in, improved behavior out.

The same correction data can also be applied at inference time — not by updating weights, but by prepending it as context. The effect is local and ephemeral (it helps on the current task, it doesn't change the model globally), but in domains where the model has genuine knowledge gaps, the improvement is immediate and measurable. Repeated evals (5 runs per task, 95% CI) show +1.93 in aquaculture and +1.76 in materials science — both statistically significant.

This is not RLHF. It's the same data (human corrections of agent behavior) applied through a different mechanism (context augmentation instead of gradient updates). The insight is that corrections don't need to be fed back into training to be useful — they can be shared directly between agents at the point of use.
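Mechanically, context augmentation is just prompt assembly: borrowed corrections are prepended to the task before the model sees it. A minimal sketch (the function name and the prompt framing are illustrative, not Astrolabe's actual API; the example correction text is invented):

```python
def augment_prompt(task: str, corrections: list[str]) -> str:
    """Prepend borrowed corrections as context; the model's weights never change."""
    if not corrections:
        return task  # baseline: the model answers from its own knowledge
    notes = "\n".join(f"- {c}" for c in corrections)
    return (
        "Operator corrections from prior sessions (treat as trusted context):\n"
        f"{notes}\n\n"
        f"Task: {task}"
    )

baseline = augment_prompt("Draft a tilapia disease surveillance plan", [])
augmented = augment_prompt(
    "Draft a tilapia disease surveillance plan",
    ["Sample gill tissue weekly, not monthly."],  # illustrative fragment content
)
```

The effect is scoped to this one call: nothing is learned, so the same fragment must be prepended again on the next task.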

theory/open-systems

Frontier labs have a closed loop that open-source can't replicate:

User interactions → corrections → RLHF → better model → more users → more corrections

This flywheel is proprietary. Anthropic's correction data from millions of Claude conversations is arguably more valuable than the base training data. Open-source models match base capabilities but can't match this correction loop because they don't operate the user-facing product.

Astrolabe creates a public correction layer. Corrections from any operator, using any model, flow into a shared pool with on-chain attribution. An open-source agent running Llama can borrow corrections that a Claude operator generated. The protocol is model-agnostic and lab-agnostic.

This doesn't replace RLHF — some improvements fundamentally require weight updates. But for domain expertise gaps, inference-time correction is surprisingly effective, and a public correction pool gives every agent access to signal that was previously locked inside frontier lab pipelines.

theory/ratio-economy

The credit system draws from what actually worked in private BitTorrent trackers — communities that solved the free-rider problem without financialization.

In those systems, your ratio (upload / download) determined your standing. Good ratio meant more access, higher trust, invitation privileges. Bad ratio meant restrictions. Nobody paid money — they contributed bandwidth. The system was designed to encourage reciprocity.

Astrolabe applies the same model to agent corrections:

  • Contribute corrections → your balance goes up → larger credit line → you can borrow more
  • Only borrow → balance goes negative → credit line restricts → you need to contribute back
  • Reputation from measured deltas → high-quality contributors get extended credit, like trusted uploaders getting ratio bonuses

The key insight: the ratio isn't a price, it's a social contract. It says "I participate in this commons." Credits are a measure of reciprocity, not a currency to accumulate.
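The reciprocity mechanics above fit in a tiny ledger. A sketch under stated assumptions: the base credit line of 5 mirrors the demo's "5 (base)", but the class, its names, and the blocking rule are illustrative, not the on-chain implementation:

```python
class OperatorLedger:
    """Ratio-style credit: contributions raise balance, borrows lower it,
    and a reputation bonus extends the credit line for proven contributors."""
    BASE_CREDIT_LINE = 5  # assumption, mirrors the demo's "5 (base)"

    def __init__(self):
        self.balance = 0
        self.reputation_bonus = 0

    def credit_line(self) -> int:
        return self.BASE_CREDIT_LINE + self.reputation_bonus

    def contribute(self, price: int):
        self.balance += price  # credit deposited when someone borrows your fragment

    def borrow(self, price: int):
        if self.balance - price < -self.credit_line():
            raise PermissionError("credit line exhausted: contribute back first")
        self.balance -= price

op = OperatorLedger()
op.borrow(4)      # balance -4: inside the base line of 5
try:
    op.borrow(2)  # would reach -6, past the line: blocked
except PermissionError:
    pass
op.contribute(3)  # being borrowed from restores standing: balance -1
```

The point of the design is visible in the code: `borrow` is gated by reciprocity, not by payment, and the only way to unblock yourself is to contribute.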

theory/liberating-memory

Every agent system generates memory. Claude Code writes feedback files. Codex captures steering events. Cursor has shadow workspaces. Windsurf has flows. Karpathy's autoresearch showed that program.md — the accumulated human corrections — is the most valuable artifact in an autonomous research loop.

All of this expertise is local. It lives on one machine, shaped by one operator, invisible to everyone else. There is no mechanism to share it, discover it, or compensate the person who developed it.

We surveyed every major open-source agent memory project (A-Mem, MemOS, Acontext, Mem0, Letta). None of them publish actual memory content. Memory is treated as ephemeral and private — generated at runtime, never persisted as a shareable artifact.

Why would an operator share corrections that cost them time and expertise to produce? The credit system answers this — contributing corrections earns you access to other operators' corrections. Combined with on-chain attribution (your ERC-8004 identity is permanently linked to the corrections you contributed), there's both a practical incentive and a reputational one.

Does it work?

Each task was run 5 times with a blind judge. 95% confidence intervals separate real effects from judge variance.
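The reported intervals are consistent with a standard small-sample t-interval over the 5 runs (two-sided 95% critical value t ≈ 2.776 at df = 4). A sketch that reproduces the first aquaculture row from its reported mean and SD — the helper name is mine, not the eval harness's:

```python
import math

T_CRIT_DF4 = 2.776  # two-sided 95% t critical value for n=5 runs (df=4)

def ci95(mean: float, sd: float, n: int = 5) -> tuple[float, float]:
    """95% confidence interval for the mean delta over n judged runs."""
    half_width = T_CRIT_DF4 * sd / math.sqrt(n)
    return (round(mean - half_width, 1), round(mean + half_width, 1))

# Tilapia disease surveillance: mean +2.93, SD 0.64 -> [+2.1, +3.7]
assert ci95(2.93, 0.64) == (2.1, 3.7)
# Carp breeding: mean +0.67, SD 2.88 -> wide CI crossing zero, not significant
lo, hi = ci95(0.67, 2.88)
assert lo < 0 < hi
```

A row is marked significant exactly when its interval excludes zero, which is why the high-variance carp task fails the test despite a positive mean.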

Aquaculture

+1.93 avg, CI [+0.6, +3.3]
Task | Mean | SD | 95% CI | Sig?
Tilapia disease surveillance | +2.93 | ±0.64 | [+2.1, +3.7] | YES
FCR literature review | +2.20 | ±0.65 | [+1.4, +3.0] | YES
Carp breeding priorities | +0.67 | ±2.88 | [-2.9, +4.2] | no

Two of three tasks show statistically significant improvement. Carp breeding has high variance — the correction helps sometimes but not reliably.

Materials science

+1.76 avg, CI [+1.1, +2.4]
Task | Mean | SD | 95% CI | Sig?
Biofouling prevention | +2.07 | ±0.37 | [+1.6, +2.5] | YES
HDPE fermentation vessel | +1.33 | ±0.85 | [+0.3, +2.4] | YES
PHA marine degradation | +1.87 | ±1.28 | [+0.3, +3.5] | YES

All three tasks show statistically significant improvement. Biofouling has the tightest CI (±0.37) — the most consistent effect.

SaaS engineering

+0.13 avg, CI [-1.2, +1.5]
Task | Mean | SD | 95% CI | Sig?
SaaS launch checklist | +1.67 | ±0.82 | [+0.7, +2.7] | YES
WhatsApp bot debugging | +1.00 | ±0.82 | [-0.0, +2.0] | no
Service integration verification | -2.27 | ±0.43 | [-2.8, -1.7] | YES (neg)

The service verification regression is statistically significant and consistent (SD ±0.43) — corrections reliably hurt when the baseline is already strong. The domain aggregate is not significant because positive and negative effects cancel.

Cross-model transfer (Venice / Llama 3.3 70B)

no-data-retention
Domain | Claude delta | Llama delta | Pattern?
Aquaculture | +1.93 | +1.4 | Same direction
Materials science | +1.76 | +0.4 | Weaker, same direction
SaaS engineering | +0.13 | +0.1 | Both near zero

Corrections authored in Claude operator sessions also improve Llama responses. The service-verification regression reproduces on Llama (-2.0), confirming it's content-specific, not model-specific. Evaluations ran through Venice's no-data-retention API, so fragment content is never persisted by the inference provider.

What this shows (and doesn't)

Demonstrated: human steering distills into reusable fragments; borrowed fragments measurably help some tasks and hurt others; the full publish/borrow/evaluate/attribute loop runs on Base with canonical ERC-8004.

Open questions: durable marketplace dynamics, correction portability after sanitization, Sybil resistance, discovery/ranking at scale.

How the prototype works

Three parties: Operator A + their agent, Astrolabe on Base mainnet, and Operator B + their agent.

Operator A (contribute):
  • agent works on task → operator corrects mistake → corrections accumulate as memory
  • extract correction candidates → sanitize (strip PII) → operator reviews and approves
  • publish fragment hash + metadata on-chain

Operator B (borrow):
  • discover by domain → borrow fragment → fetch content off-chain, verify hash on-chain → credit deposited to A
  • agent uses correction as context → A/B eval with blind judge* → submit eval delta

Closing the loop:
  • per-domain ERC-8004 reputation → reputation grows the credit line → both operators can now borrow from each other

* The blind judge scores baseline vs augmented response (Claude Sonnet or Venice/Llama 3.3 70B). Reputation is tagged per domain.
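The "fetch content off-chain, verify hash on-chain" step is the trust anchor: the chain stores only a content hash, so a borrower can detect tampering by whatever host serves the fragment body. A minimal sketch — SHA-256 and both function names are assumptions here (the contracts may well use keccak256):

```python
import hashlib

def publish(content: str) -> str:
    """On publish, only the hash and metadata go on-chain; content stays off-chain."""
    return hashlib.sha256(content.encode()).hexdigest()

def verify_before_borrow(fetched_content: str, onchain_hash: str) -> bool:
    """Borrower re-hashes what the off-chain host served and compares on-chain."""
    return hashlib.sha256(fetched_content.encode()).hexdigest() == onchain_hash

fragment = "Verify service state from environment config, not assumptions"
h = publish(fragment)
assert verify_before_borrow(fragment, h)            # intact content verifies
assert not verify_before_borrow(fragment + "!", h)  # any tampering fails
```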

Published fragments

# | Domain | Price | Operator | Type | Content
0 | saas-eng | 1 cr | #1 | feedback | Production logging at API boundaries is a launch prerequisite
1 | saas-eng | 2 cr | #1 | feedback | Platform-generated spam during testing is not a code bug
2 | saas-eng | 2 cr | #1 | feedback | Verify service state from environment config, not assumptions
3 | saas-eng | 3 cr | #1 | feedback | Automated ad creative pipeline beats manual tools
4 | mat-sci | 4 cr | #1 | feedback | Biofouling prevention targets the wrong stage
5 | mat-sci | 4 cr | #1 | feedback | HDPE bioreactors are dismissed prematurely
6 | mat-sci | 5 cr | #1 | feedback | PHA marine degradation is not what you expect

Operators and agents

Operators are persistent identities. Agents are ERC-8004 on-chain IDs linked to operators. Credit and reputation accrue at the operator level; agents are the on-chain identity anchor.
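Per-domain reputation feeding a credit line can be sketched as follows. Everything here is hypothetical: the 0-10 score range matches the demo's 6/10 feedback, but the bonus rule (one extra credit per domain averaging at least 6) is invented for illustration, not read from the contracts:

```python
from collections import defaultdict

class DomainReputation:
    """Eval feedback tagged per domain (scores 0-10) rolled into a credit bonus.
    The >= 6 threshold and 1-credit-per-domain bonus are illustrative assumptions."""
    def __init__(self):
        self.scores = defaultdict(list)

    def submit_feedback(self, domain: str, score: int):
        self.scores[domain].append(score)

    def credit_bonus(self) -> int:
        # one bonus credit per domain whose average feedback clears the bar
        return sum(1 for s in self.scores.values() if sum(s) / len(s) >= 6)

rep = DomainReputation()
rep.submit_feedback("saas-engineering", 6)  # matches the demo's 6/10 feedback
rep.submit_feedback("aquaculture", 3)       # weak domain earns no bonus
```

Tagging per domain keeps the signal honest: strong aquaculture corrections should not inflate an operator's standing in SaaS engineering.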

Operator #1 — contributor

net contributor
Agent | ERC-8004 ID | Role | Registered on
Primary | #35279 | Publishes SaaS corrections | Base (canonical)
Balance: +3 credits · Published: 7 fragments · Domains: saas-engineering

Operator #2 — borrower + contributor

reciprocal
Agent | ERC-8004 ID | Role | Registered on
Manual borrower | #35280 | Borrows corrections, runs eval, submits feedback | Base (canonical)
Autonomous agent | #35601 | Discovers, borrows, evaluates, and submits feedback autonomously | Base (canonical)
Balance: -4 credits · Credit line: 5 (base) + reputation bonus · Published: 6 fragments · Domains: aquaculture, materials-science

Verify the current demo

Every step below is on Base mainnet. Click any transaction to inspect the current demo trail on Basescan.

Fragment #0 — full lifecycle

verified
  • Agent registered on canonical ERC-8004: contributor agent #35279
  • Fragment published: domain saas-engineering | price: 1 credit | operator: #1
  • Fragment borrowed: borrower #2 | credit deducted: 1 | hash verified before borrow | duplicate prevention active
  • Reputation feedback submitted (reads on-chain reputation for credit line): score 6/10 | tag: memory-lend / saas-engineering | canonical ERC-8004 Reputation

Deployed contracts

Contract | Address | Type
OperatorRegistry | 0xA8d7...d7 | ours
MemoryLending | 0x10c8...69 | ours
ERC-8004 Identity | 0x8004...32 | canonical
ERC-8004 Reputation | 0x8004...63 | canonical