Executable brain graphs

Create & evolve self-improving brain graphs.

Compose functions, scripts, model calls, evals, training jobs, mutators, and sub-brains into systems that run, measure, mutate, and improve.

Experiment loop

Define goal
Run trace
Measure metrics
Improve variant
Track lineage

Executable graphs, from primitive functions to recursive brains.

A brain can contain tiny functions, scripts, APIs, model calls, datasets, scorers, verifiers, train steps, distillers, mutators, agents, and whole sub-brains.

Example graph

Input
Router
Planner
Model A
Tools
Memory
Model B
Verifier
Eval
Dataset
Mutator
Archive

Model and infrastructure agnostic

Integrate with your existing tooling.

FrankenBrain plugs into the shape of modern frontier-lab workflows: JAX/PyTorch/TensorFlow code, accelerator-backed runs, eval harnesses, traces, artifacts, workflow orchestration, experiment tracking, inference stacks, and internal or open-source runners. Pull models, APIs, datasets, code, checkpoints, and artifacts from places like OpenAI, Anthropic, Gemini, Hugging Face, GitHub, Kaggle, S3/GCS/R2, W&B, MLflow, or internal stores.

JAX · PyTorch · TensorFlow · TPUs · GPUs · XLA · Kubernetes · containers · distributed training · inference optimization · eval harnesses · experiment tracking · workflow orchestration · artifact stores · trace logs · lineage · OpenAI · Anthropic · Gemini · Hugging Face · GitHub · Kaggle · S3/GCS/R2 · Ray · Slurm · vLLM · TGI · W&B · MLflow · DVC · checkpoints · distillation data

Toy Demo

Load a small sample graph, simulate a measured run, suggest a mutation, compare variants, and inspect the exported artifact.

Experiments as ordinary artifacts.

A credible brain is more than a diagram. Its goal, graph, trace, metrics, diff, lineage, checkpoints, and distillation data should be inspectable as ordinary artifacts.

brain.schema.json Schema
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "required": ["id", "entry", "nodes", "edges", "evals"],
  "properties": {
    "nodes": {
      "items": { "required": ["id", "type"] }
    },
    "edges": {
      "items": { "prefixItems": [
        { "type": "string" }, { "type": "string" }
      ]}
    }
  }
}
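As a sketch, the required-key contract above can be checked with a few lines of stdlib Python. This is a hand-rolled check for illustration; a real pipeline would use a full JSON Schema validator such as the `jsonschema` package:

```python
import json

# Required top-level keys, mirroring brain.schema.json above.
REQUIRED = ["id", "entry", "nodes", "edges", "evals"]

def check_brain(doc: dict) -> list[str]:
    """Return a list of schema problems (empty means the graph looks valid)."""
    problems = [f"missing key: {k}" for k in REQUIRED if k not in doc]
    for edge in doc.get("edges", []):
        # Each edge is a [source, target] pair of node ids (strings).
        if len(edge) != 2 or not all(isinstance(e, str) for e in edge):
            problems.append(f"bad edge: {edge!r}")
    return problems

brain = json.loads("""{
  "id": "compression-loop.v3-b",
  "entry": "planner",
  "nodes": ["planner", "candidate", "exact_decode", "score"],
  "edges": [["planner", "candidate"], ["candidate", "exact_decode"]],
  "evals": ["exact_decode"]
}""")
print(check_brain(brain))  # → []
```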
brain.json Runnable Graph
{
  "id": "compression-loop.v3-b",
  "goal": "lossless text compression",
  "entry": "planner",
  "nodes": ["planner", "candidate", "exact_decode", "score"],
  "edges": [["planner", "candidate"], ["candidate", "exact_decode"]],
  "evals": ["exact_decode"],
  "archive": "runs/compression-loop/run_0042"
}
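Because the graph is plain JSON, one way to read such a file is to build an adjacency map and walk it from `entry`. This is a hedged sketch of that traversal, not the engine's actual execution order:

```python
from collections import deque

# The entry and edges fields from brain.json above.
brain = {
    "entry": "planner",
    "edges": [["planner", "candidate"], ["candidate", "exact_decode"]],
}

def walk(brain: dict) -> list[str]:
    """Breadth-first order of reachable nodes, starting at the entry node."""
    adj: dict[str, list[str]] = {}
    for src, dst in brain["edges"]:
        adj.setdefault(src, []).append(dst)
    seen, order, queue = set(), [], deque([brain["entry"]])
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        queue.extend(adj.get(node, []))
    return order

print(walk(brain))  # → ['planner', 'candidate', 'exact_decode']
```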
run_0042.trace.jsonl Trace Output
{"step":1,"node":"planner","tokens":812}
{"step":2,"node":"candidate","file":"codec.py"}
{"step":3,"node":"exact_decode","passed":true}
{"step":4,"node":"score","bytes":18741}
{"step":5,"node":"archive","variant":"v3-b"}
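Because the trace is one JSON object per line, summarizing a run takes a few lines of stdlib Python. This sketch recomputes the headline numbers from the trace above:

```python
import json

trace_jsonl = """\
{"step":1,"node":"planner","tokens":812}
{"step":2,"node":"candidate","file":"codec.py"}
{"step":3,"node":"exact_decode","passed":true}
{"step":4,"node":"score","bytes":18741}
{"step":5,"node":"archive","variant":"v3-b"}"""

# Parse each line, then pull out the fields the evals care about.
events = [json.loads(line) for line in trace_jsonl.splitlines()]
passed = all(e["passed"] for e in events if "passed" in e)
size = next(e["bytes"] for e in events if "bytes" in e)
print(len(events), passed, size)  # → 5 True 18741
```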
eval/exact_decode Eval Score
Exact decode 100%
Compressed bytes 18,741
Runtime 2.8s
Regression 0 fail
run_0042.manifest.json Run Manifest
{
  "run_id": "run_0042",
  "command": "franken run brain.json",
  "started_at": "2026-05-02T14:31:08Z",
  "seed": 1842,
  "fixture_set": "canterbury+synthetic-v1",
  "git_sha": "9c3a1f7",
  "artifacts": ["trace", "eval", "diff", "lineage"]
}
variant.diff Mutation Diff
--- compression-loop.v3
+++ compression-loop.v3-b
@@ mutation
- planner.temperature: 0.70
+ planner.temperature: 0.35
- candidate.strategy: "dictionary-v2"
+ candidate.strategy: "bpe-hybrid"
+ exact_decode.cases: 64
+ score.penalty.runtime_ms: 3000
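Conceptually, the diff above is a set of parameter overrides, so applying a mutation is a dict overlay. A sketch using the keys from the diff (the `apply_mutation` helper is illustrative, not the product API):

```python
# Baseline parameters of compression-loop.v3, taken from the diff.
base = {
    "planner.temperature": 0.70,
    "candidate.strategy": "dictionary-v2",
}

# Overrides that produce compression-loop.v3-b.
mutation = {
    "planner.temperature": 0.35,          # lower sampling temperature
    "candidate.strategy": "bpe-hybrid",   # swap compression strategy
    "exact_decode.cases": 64,             # new verifier setting
    "score.penalty.runtime_ms": 3000,     # new score penalty
}

def apply_mutation(base: dict, mutation: dict) -> dict:
    """Overlay mutated parameters on the baseline variant's config."""
    return {**base, **mutation}

variant = apply_mutation(base, mutation)
print(variant["planner.temperature"])  # → 0.35
```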
lineage.json Lineage View
  1. v1 (baseline): 21,904 bytes
  2. v2 (dictionary-v2): 19,338 bytes
  3. v3-b (bpe-hybrid): 18,741 bytes
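Selecting the surviving variant from a lineage like this is just a minimization over the goal metric. A sketch, with the lineage rewritten as records:

```python
# The lineage above, as plain records: variant, strategy, compressed bytes.
lineage = [
    {"variant": "v1", "strategy": "baseline", "bytes": 21904},
    {"variant": "v2", "strategy": "dictionary-v2", "bytes": 19338},
    {"variant": "v3-b", "strategy": "bpe-hybrid", "bytes": 18741},
]

# The goal metric is compressed size, so the best variant minimizes bytes.
best = min(lineage, key=lambda v: v["bytes"])
print(best["variant"])  # → v3-b
```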

Composable building blocks.

Use tiny primitives when you need control, full sub-brains when you want reuse, and training or mutation nodes when the graph should improve its own parts.

Primitive to Recursive Nodes

A node can be a tiny function, shell command, parser, scorer, API call, model call, eval, training job, distiller, mutator, agent, or an entire brain packaged as one reusable node.

Script · Function · Model · Eval · Train · Sub-brain

Fast Graph Setup

Start from a blank brain, template, or JSON file. Add primitive code, model, prompt, tool, memory, dataset, eval, verifier, training, mutator, API, agent, or sub-brain nodes without burying the experiment in glue code.

$ franken init research-brain
$ franken add node planner --model gpt
$ franken run brain.json

Goal-Driven Runs

Define what the graph is trying to improve: compressed size, exact decode, runtime, accuracy, cost, robustness, task difficulty, or any custom metric you can measure.

Manual or Meta Mutation

Edit the graph yourself, suggest directions to a meta-brain, or let mutation nodes generate, run, score, and select new variants under your constraints.
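In skeleton form, a mutation node's propose-score-select loop looks like this. Everything here is a stand-in: `propose` and `score` are toy functions, not real nodes, and the budget mirrors the `--budget 20` flag shown later:

```python
import random

random.seed(1842)  # deterministic, mirroring the seed in the run manifest

def propose(temperature: float) -> float:
    """Stand-in mutator: jitter one parameter of the current variant."""
    return max(0.0, min(1.0, temperature + random.uniform(-0.2, 0.2)))

def score(temperature: float) -> float:
    """Stand-in eval: pretend lower temperature compresses better here."""
    return 20000 + 5000 * temperature

best_t, best_score = 0.70, score(0.70)
for _ in range(20):                      # mutation budget of 20 variants
    t = propose(best_t)
    s = score(t)
    if s < best_score:                   # keep-best: minimize compressed bytes
        best_t, best_score = t, s

print(round(best_t, 2), round(best_score))
```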

Training and Distillation

Nodes can be tuned, specialized, swapped, cached, distilled, or trained during an experiment. Successful traces can become datasets for cheaper or sharper specialist models.
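"Successful traces become datasets" can be as simple as filtering verified runs into JSONL training records. A sketch of that export step, with made-up run records for illustration:

```python
import json

# Hypothetical run records: only verified runs should become training data.
runs = [
    {"variant": "v3-b", "passed": True,  "prompt": "plan codec", "output": "bpe-hybrid"},
    {"variant": "v3-c", "passed": False, "prompt": "plan codec", "output": "lz-naive"},
]

# Keep only passing runs; each becomes one supervised training example.
records = [
    json.dumps({"prompt": r["prompt"], "completion": r["output"]})
    for r in runs if r["passed"]
]
print(len(records))  # → 1
```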

Graph as a Node

Collapse any working graph into a reusable sub-brain. Bigger graphs can call it, train around it, replace it, mutate it, or embed an improvement loop inside another system.
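"Graph as a node" means a whole graph exposes the same call interface as a primitive. A minimal sketch of that recursion, using plain Python callables as stand-in nodes:

```python
from typing import Callable

Node = Callable[[str], str]

def make_subbrain(nodes: list[Node]) -> Node:
    """Package a pipeline of nodes as one node: same signature, reusable."""
    def run(x: str) -> str:
        for node in nodes:
            x = node(x)
        return x
    return run

plan: Node = lambda s: s + " -> planned"
verify: Node = lambda s: s + " -> verified"

sub = make_subbrain([plan, verify])     # a graph, collapsed to one node
outer = make_subbrain([sub, sub])       # and embedded in a bigger graph
print(outer("input"))
```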

Metrics Beside the Architecture

Evals, judges, exactness checks, benchmarks, adversarial generators, regression suites, and score models are nodes in the graph. The measuring system stays next to the system being improved.

Verifier · Judge · Dataset · Score · Trace · Lineage

CLI / SDK First

The visual editor is for convenience. The same brain graphs should run headlessly through a CLI, Python SDK, API, local workers, Docker, Slurm, Ray, or your own runner.

For researchers, builders, and hobbyists.

FrankenBrain can sit on top of serious research stacks, but it should also let curious builders frankenbrain weird executable systems together and see what actually happens.

Research stacks

Keep the tooling you already use.

Connect existing models, evals, runners, datasets, checkpoints, logs, and artifact stores without forcing the experiment into a new framework.

Search loops

Let a goal drive iteration.

For a compression benchmark, define exact decode, compressed size, runtime, and regression metrics, then let the graph propose, test, score, archive, and mutate variants.

Hobbyists and builders

Frankenbrain ideas together.

Start with primitive functions, scripts, model calls, tools, memory, and small evals. Build by hand, ask for mutation ideas, or let a loop keep trying while you inspect results.

Get started

Start with a narrow, measurable loop. Define the goal, wire the runnable graph, run it locally, inspect the trace, then let mutations compete against the baseline.

$ franken init compression-loop --template hutter-search
$ franken add metric compressed-bytes --minimize
$ franken add verifier exact-decode --required
$ franken run brain.json --runner local --trace runs/001.jsonl
$ franken compare runs/baseline runs/001
$ franken mutate brain.json --agent meta --budget 20 --keep-best
$ franken export-distill runs/best --format jsonl