{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"required": ["id", "entry", "nodes", "edges", "evals"],
"properties": {
"nodes": {
"items": { "required": ["id", "type"] }
},
"edges": {
"items": { "prefixItems": [
{ "type": "string" }, { "type": "string" }
]}
}
}
}
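The schema fragment above can be enforced without any schema library. A minimal hand-rolled validator, checking only the constraints actually shown (required top-level keys, node `id`/`type` fields, two-string edge pairs); this is an illustrative sketch, not FrankenBrain's actual loader:

```python
# Minimal validator for the brain-graph schema fragment above.
# Checks only what the fragment states; a real loader would do more.

REQUIRED_TOP_LEVEL = ["id", "entry", "nodes", "edges", "evals"]

def validate_brain(brain: dict) -> list[str]:
    """Return human-readable validation errors (empty list = valid)."""
    errors = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in brain:
            errors.append(f"missing required key: {key}")
    for i, node in enumerate(brain.get("nodes", [])):
        if isinstance(node, dict):  # node objects must carry id and type
            for field in ("id", "type"):
                if field not in node:
                    errors.append(f"nodes[{i}] missing {field}")
    for i, edge in enumerate(brain.get("edges", [])):
        # edges are [source, target] pairs of node ids
        if not (isinstance(edge, (list, tuple)) and len(edge) == 2
                and all(isinstance(e, str) for e in edge)):
            errors.append(f"edges[{i}] is not a [source, target] string pair")
    return errors
```

Feeding it a well-formed brain returns an empty list; a brain missing `entry` or with a one-element edge returns one error per violation.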
Executable brain graphs
Create & evolve self-improving brain graphs.
Compose functions, scripts, model calls, evals, training jobs, mutators, and sub-brains into systems that run, measure, mutate, and improve.
Experiment loop
Executable graphs, from primitive functions to recursive brains.
A brain can contain tiny functions, scripts, APIs, model calls, datasets, scorers, verifiers, train steps, distillers, mutators, agents, and whole sub-brains.
Example graph
Model and infrastructure agnostic
Integrate with your existing tooling.
FrankenBrain plugs into the shape of modern frontier-lab workflows: JAX/PyTorch/TensorFlow code, accelerator-backed runs, eval harnesses, traces, artifacts, workflow orchestration, experiment tracking, inference stacks, and internal or open-source runners. Pull models, APIs, datasets, code, checkpoints, and artifacts from places like OpenAI, Anthropic, Gemini, Hugging Face, GitHub, Kaggle, S3/GCS/R2, W&B, MLflow, or internal stores.
Toy Demo
Load a small sample graph, simulate a measured run, suggest a mutation, compare variants, and inspect the exported artifact.
Experiments as ordinary artifacts.
A credible brain is more than a diagram. Its goal, graph, trace, metrics, diff, lineage, checkpoints, and distillation data should be inspectable as ordinary artifacts.
{
"id": "compression-loop.v3-b",
"goal": "lossless text compression",
"entry": "planner",
"nodes": ["planner", "candidate", "exact_decode", "score"],
"edges": [["planner", "candidate"], ["candidate", "exact_decode"]],
"evals": ["exact_decode"],
"archive": "runs/compression-loop/run_0042"
}
{"step":1,"node":"planner","tokens":812}
{"step":2,"node":"candidate","file":"codec.py"}
{"step":3,"node":"exact_decode","passed":true}
{"step":4,"node":"score","bytes":18741}
{"step":5,"node":"archive","variant":"v3-b"}
{
"run_id": "run_0042",
"command": "franken run brain.json",
"started_at": "2026-05-02T14:31:08Z",
"seed": 1842,
"fixture_set": "canterbury+synthetic-v1",
"git_sha": "9c3a1f7",
"artifacts": ["trace", "eval", "diff", "lineage"]
}
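A run record like the one above pins everything a rerun needs: the command, the seed, the fixture set, and the code revision. A small helper for checking that two run records share the same pinned configuration (field names follow the sample record; treat the helper itself as a sketch):

```python
# Fields that must match for two runs to be comparable, per the sample record.
PINNED_FIELDS = ("command", "seed", "fixture_set", "git_sha")

def same_configuration(a: dict, b: dict) -> bool:
    """True when both run records pin identical configuration."""
    return all(a.get(f) == b.get(f) for f in PINNED_FIELDS)

original = {"run_id": "run_0042", "command": "franken run brain.json",
            "seed": 1842, "fixture_set": "canterbury+synthetic-v1",
            "git_sha": "9c3a1f7"}
rerun = dict(original, run_id="run_0043")  # same config, new run id

assert same_configuration(original, rerun)
```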
--- compression-loop.v3
+++ compression-loop.v3-b
@@ mutation
- planner.temperature: 0.70
+ planner.temperature: 0.35
- candidate.strategy: "dictionary-v2"
+ candidate.strategy: "bpe-hybrid"
+ exact_decode.cases: 64
+ score.penalty.runtime_ms: 3000
- v1 (baseline): 21,904 bytes
- v2 (dictionary-v2): 19,338 bytes
- v3-b (bpe-hybrid): 18,741 bytes
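The comparison above as data: under a "minimize compressed bytes" goal, selecting the winner is a one-liner (numbers copied from the list):

```python
# (variant, strategy, compressed bytes) rows from the comparison above.
variants = [
    ("v1",   "baseline",      21_904),
    ("v2",   "dictionary-v2", 19_338),
    ("v3-b", "bpe-hybrid",    18_741),
]

# The goal is minimization, so the best variant has the fewest bytes.
best = min(variants, key=lambda v: v[2])
assert best[0] == "v3-b"
```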
Composable building blocks.
Use tiny primitives when you need control, full sub-brains when you want reuse, and training or mutation nodes when the graph should improve its own parts.
Primitive to Recursive Nodes
A node can be a tiny function, shell command, parser, scorer, API call, model call, eval, training job, distiller, mutator, agent, or an entire brain packaged as one reusable node.
Fast Graph Setup
Start from a blank brain, template, or JSON file. Add primitive code, model, prompt, tool, memory, dataset, eval, verifier, training, mutator, API, agent, or sub-brain nodes without burying the experiment in glue code.
$ franken init research-brain
$ franken add node planner --model gpt
$ franken run brain.json
Goal-Driven Runs
Define what the graph is trying to improve: compressed size, exact decode, runtime, accuracy, cost, robustness, task difficulty, or any custom metric you can measure.
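One way to express such a goal in code: a hard verifier gate plus a scalar objective with soft penalties. The runtime budget of 3000 ms echoes the example diff; the scoring convention itself is an illustrative assumption, not FrankenBrain's actual one:

```python
def score_variant(compressed_bytes: int, exact_decode: bool,
                  runtime_ms: int, runtime_budget_ms: int = 3000) -> float:
    """Lower is better. Failing the required verifier disqualifies outright."""
    if not exact_decode:
        return float("inf")  # hard gate: no score without exact decode
    # Soft penalty: one point per millisecond over the runtime budget.
    penalty = max(0, runtime_ms - runtime_budget_ms)
    return compressed_bytes + penalty

assert score_variant(18_741, exact_decode=True, runtime_ms=2_500) == 18_741
assert score_variant(17_000, exact_decode=False, runtime_ms=100) == float("inf")
```

The shape matters more than the formula: required verifiers gate, soft metrics trade off, and any variant can be ranked by one number.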
Manual or Meta Mutation
Edit the graph yourself, suggest directions to a meta-brain, or let mutation nodes generate, run, score, and select new variants under your constraints.
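The propose, run, score, select loop fits in a few lines. Here the `mutate` and `evaluate` functions are toy stand-ins (real mutation nodes would edit the graph and real evals would execute it), but the keep-best selection under a budget is the loop described above:

```python
import random

def mutate(params: dict, rng: random.Random) -> dict:
    """Toy mutation: nudge a single parameter within constraints."""
    out = dict(params)
    out["temperature"] = round(
        max(0.0, out["temperature"] + rng.uniform(-0.2, 0.2)), 2)
    return out

def evaluate(params: dict) -> float:
    """Toy objective standing in for a real measured run (lower is better)."""
    return 20_000 + params["temperature"] * 1_000

def search(baseline: dict, budget: int, seed: int = 0) -> tuple[dict, float]:
    """Propose -> evaluate -> keep-best, for `budget` candidate variants."""
    rng = random.Random(seed)
    best, best_score = baseline, evaluate(baseline)
    for _ in range(budget):
        candidate = mutate(best, rng)
        score = evaluate(candidate)
        if score < best_score:  # keep-best selection
            best, best_score = candidate, score
    return best, best_score

best, score = search({"temperature": 0.70}, budget=20)
assert score <= evaluate({"temperature": 0.70})
```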
Training and Distillation
Nodes can be tuned, specialized, swapped, cached, distilled, or trained during an experiment. Successful traces can become datasets for cheaper or sharper specialist models.
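Turning successful traces into training data can be as simple as filtering on the verifier and re-emitting JSONL. The record fields here (`prompt`, `output`, `passed`) are assumptions for illustration:

```python
import json

# Hypothetical run records: keep only the ones whose verifier passed.
runs = [
    {"prompt": "compress chunk A", "output": "plan: bpe-hybrid", "passed": True},
    {"prompt": "compress chunk B", "output": "plan: broken",     "passed": False},
]

# Emit (input, target) pairs as JSONL for a distillation dataset.
dataset = [
    json.dumps({"input": r["prompt"], "target": r["output"]})
    for r in runs
    if r["passed"]
]

assert len(dataset) == 1
```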
Graph as a Node
Collapse any working graph into a reusable sub-brain. Bigger graphs can call it, train around it, replace it, mutate it, or embed an improvement loop inside another system.
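The packaging step is conceptually small: once a sub-graph behaves like a function from inputs to outputs, a bigger graph can call it like any primitive. A minimal sketch of that idea (the wrapper and the toy sub-brain are both hypothetical):

```python
from typing import Callable

def as_node(graph_runner: Callable[[dict], dict],
            name: str) -> Callable[[dict], dict]:
    """Wrap a whole graph runner so it looks like a single named node."""
    def node(inputs: dict) -> dict:
        result = graph_runner(inputs)
        return {"node": name, **result}
    return node

# Toy sub-brain: the whole compression loop collapsed to one callable.
def compression_loop(inputs: dict) -> dict:
    return {"bytes": len(inputs["text"]) // 2}

compress = as_node(compression_loop, "compression-loop.v3-b")
assert compress({"text": "x" * 100}) == {"node": "compression-loop.v3-b",
                                         "bytes": 50}
```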
Metrics Beside the Architecture
Evals, judges, exactness checks, benchmarks, adversarial generators, regression suites, and score models are nodes in the graph. The measuring system stays next to the system being improved.
CLI / SDK First
The visual editor is for convenience. The same brain graphs should run headlessly through a CLI, Python SDK, API, local workers, Docker, Slurm, Ray, or your own runner.
For researchers, builders, and hobbyists.
FrankenBrain can sit on top of serious research stacks, but it should also let curious builders frankenbrain weird executable systems together and see what actually happens.
Keep the tooling you already use.
Connect existing models, evals, runners, datasets, checkpoints, logs, and artifact stores without forcing the experiment into a new framework.
Let a goal drive iteration.
For a compression benchmark, define exact decode, compressed size, runtime, and regression metrics, then let the graph propose, test, score, archive, and mutate variants.
Frankenbrain ideas together.
Start with primitive functions, scripts, model calls, tools, memory, and small evals. Build by hand, ask for mutation ideas, or let a loop keep trying while you inspect results.
Get started
Start with a narrow, measurable loop. Define the goal, wire the runnable graph, run it locally, inspect the trace, then let mutations compete against the baseline.
$ franken init compression-loop --template hutter-search
$ franken add metric compressed-bytes --minimize
$ franken add verifier exact-decode --required
$ franken run brain.json --runner local --trace runs/001.jsonl
$ franken compare runs/baseline runs/001
$ franken mutate brain.json --agent meta --budget 20 --keep-best
$ franken export-distill runs/best --format jsonl