Wrap without rewriting
Keep your code, launchers, trainer scripts, benchmark runners, and run commands. FrankenBrain adds a programmable control plane around them.
Graph-native autoresearch workbench
Import a working context, expose the objective, run variants, compare evidence, and promote winners without losing traces, artifacts, or lineage.
A brain can contain code, model calls, tools, data, evals, controllers, artifacts, services, workflows, and reusable sub-brains.
Works beside your stack
FrankenBrain sits beside existing research infrastructure. It turns scattered code, data, evals, services, runners, and logs into an executable view with enough structure to inspect, compare, improve, and preserve evidence.
Executable context import
Import repos, folders, scripts, evals, configs, services, datasets, and run commands into a reviewable capsule. Public material stays outcome-level; private runbooks and recipes stay gated.
Make the objective, inputs, metrics, constraints, artifacts, and promotion gate explicit so every candidate is judged the same way.
Start locally, then move proven work to configured runners as each target is connected and verified.
$ franken import-workflow ./research-stack
$ franken run brain.json
$ franken evidence runs/latest
$ franken compare runs/baseline runs/candidate
The result is an executable, editable brain that can be opened, evaluated, compared, and reused inside larger brains.
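An explicit objective and promotion gate of the kind described above might look like the following minimal Python sketch. It is not FrankenBrain's API: the `Objective` class, the `passes_gate` function, the field names, and the sample numbers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Objective:
    """Hypothetical objective spec; not a FrankenBrain type."""
    metric: str             # name of the measured metric
    direction: str          # "maximize" or "minimize"
    min_improvement: float  # smallest gain over the baseline that counts
    max_runtime_ms: int     # constraint every candidate must respect

def passes_gate(obj, baseline, candidate):
    """Return True if the candidate beats the baseline within the constraint."""
    if candidate["runtime_ms"] > obj.max_runtime_ms:
        return False
    delta = candidate[obj.metric] - baseline[obj.metric]
    if obj.direction == "minimize":
        delta = -delta  # a lower value is an improvement
    return delta >= obj.min_improvement

# Illustrative values only; they are not measurements.
objective = Objective("score", "maximize", min_improvement=0.01, max_runtime_ms=3000)
baseline = {"score": 0.70, "runtime_ms": 2100}
candidate = {"score": 0.75, "runtime_ms": 2400}
print(passes_gate(objective, baseline, candidate))
```

Because the same `Objective` is applied to every candidate, two variants can only differ in their measured results, not in how they are judged.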
Run an interactive proof sketch: inputs on the left, an executable brain graph in the middle, eval output on the right, and trace/artifact updates as the graph fires.
Reusable strategy programs, typed AI control, and replayable evidence cards make the loop usable by people and AI workers through the same brain package.
Select and reuse autoresearch programs such as a discrete optimizer, evidence-grounded council, obstruction hunter, scale ladder, and auto-autoresearch loop. The selected stack is recorded in the evidence bucket of the brain package.
The workbench operations are exposed as typed AI-worker controls: inspect, patch, validate, run, compare, promote, and replay brains.
Public summaries should show what was measured, the fixed budget, the promotion boundary, and the replay shape while keeping private run ids, recipes, benchmark packs, and target details gated by default.
Target class: imported runnable system
Metric: explicit objective and direction
Budget: fixed variants / cost / runtime
Boundary: exploratory, private, or benchmark-comparable
Evidence: graph, run, artifacts, lineage, replay command
A credible brain carries its goal, graph, trace, metrics, diff, lineage, checkpoints, runner proof, and training data as inspectable artifacts.
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "required": ["id", "entry", "nodes", "edges", "evals"],
  "properties": {
    "nodes": {
      "items": { "required": ["id", "type"] }
    },
    "edges": {
      "items": { "prefixItems": [
        { "type": "string" }, { "type": "string" }
      ]}
    }
  }
}
{
  "id": "candidate-system.v3-b",
  "goal": "improve task score under fixed constraints",
  "entry": "planner",
  "nodes": ["planner", "candidate", "verifier", "scorer"],
  "edges": [["planner", "candidate"], ["candidate", "verifier"]],
  "evals": ["verifier"],
  "archive": "runs/task-agent/run_0042"
}
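A brain file like the one above can be sanity-checked before it is run. The sketch below uses only the Python standard library; the required keys come from the schema sample, while the cross-reference checks (entry, edge endpoints, and evals must name known nodes) are an assumption about what a well-formed brain means, not a documented FrankenBrain rule. The function name `check_brain` is hypothetical.

```python
import json

# Keys taken from the "required" list in the schema sample.
REQUIRED_KEYS = ["id", "entry", "nodes", "edges", "evals"]

def check_brain(brain):
    """Return a list of problems; an empty list means every check passed."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in brain]
    if problems:
        return problems
    # Accept both plain string ids and objects carrying an "id" field.
    node_ids = {n["id"] if isinstance(n, dict) else n for n in brain["nodes"]}
    # Assumed rule: the entry point must be one of the declared nodes.
    if brain["entry"] not in node_ids:
        problems.append(f"entry is not a node: {brain['entry']}")
    # Assumed rule: every edge is a pair of declared node ids.
    for edge in brain["edges"]:
        if len(edge) != 2:
            problems.append(f"edge is not a pair: {edge}")
            continue
        for end in edge:
            if end not in node_ids:
                problems.append(f"edge references unknown node: {end}")
    # Assumed rule: every eval names a declared node.
    for name in brain["evals"]:
        if name not in node_ids:
            problems.append(f"eval is not a node: {name}")
    return problems

brain = json.loads("""
{
  "id": "candidate-system.v3-b",
  "entry": "planner",
  "nodes": ["planner", "candidate", "verifier", "scorer"],
  "edges": [["planner", "candidate"], ["candidate", "verifier"]],
  "evals": ["verifier"]
}
""")
print(check_brain(brain))
```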
{"step":1,"node":"planner","event":"plan"}
{"step":2,"node":"candidate","file":"policy.py"}
{"step":3,"node":"verifier","passed":true}
{"step":4,"node":"scorer","score":0.742}
{"step":5,"node":"archive","variant":"v3-b"}
{
  "run_id": "run_0042",
  "command": "franken run brain.json",
  "started_at": "2026-05-02T14:31:08Z",
  "seed": 1842,
  "fixture_set": "holdout-suite-v1",
  "git_sha": "9c3a1f7",
  "artifacts": ["trace", "eval", "diff", "lineage"]
}
--- candidate-system.v3
+++ candidate-system.v3-b
@@ variant
- planner.temperature: 0.70
+ planner.temperature: 0.35
- candidate.strategy: "single-pass"
+ candidate.strategy: "verify-then-revise"
+ eval_cases: 64
+ score.penalty.runtime_ms: 3000
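A variant diff of this kind can be produced by comparing two flat settings maps. The Python sketch below assumes each variant is represented as a dictionary of dotted keys, which is an illustrative choice and not necessarily how FrankenBrain stores variants; `diff_config` is a hypothetical helper, and the two dictionaries simply restate the values from the diff above.

```python
def diff_config(old, new):
    """Return (sign, key, value) tuples describing how `new` differs from `old`."""
    changes = []
    for key in sorted(old.keys() | new.keys()):
        if key not in new:
            changes.append(("-", key, old[key]))   # setting removed
        elif key not in old:
            changes.append(("+", key, new[key]))   # setting added
        elif old[key] != new[key]:
            changes.append(("-", key, old[key]))   # setting changed:
            changes.append(("+", key, new[key]))   # old value, then new value
    return changes

v3 = {
    "planner.temperature": 0.70,
    "candidate.strategy": "single-pass",
}
v3_b = {
    "planner.temperature": 0.35,
    "candidate.strategy": "verify-then-revise",
    "eval_cases": 64,
    "score.penalty.runtime_ms": 3000,
}

for sign, key, value in diff_config(v3, v3_b):
    print(f"{sign} {key}: {value!r}")
```

Sorting the keys keeps the output stable across runs, so two diffs of the same pair of variants can be compared byte for byte.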
Use tiny primitives when you need control, full sub-brains when you want reuse, and controlled improvement nodes when the graph should revise its own parts.
A node can be a function, service, model call, eval, data source, training job, controller, reusable sub-brain, or another workbench-controlled system.
Start from a repo, folder, script, template, or JSON file. Add models, tools, data, evals, controllers, APIs, services, and sub-brains without burying the experiment in glue code.
$ franken init research-brain
$ franken add node planner --runtime local-or-remote
$ franken run brain.json
Define what the graph is trying to improve: accuracy, runtime, cost, robustness, task difficulty, or any custom metric you can measure.
Edit the graph yourself or let controlled improvement nodes propose, run, score, and select variants under explicit constraints.
Winning traces can become training and review artifacts. Promote reusable systems or model artifacts only after measurement gates pass.
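The propose, run, score, and select cycle described above can be sketched as a small bounded loop. This is an illustrative Python sketch under stated assumptions, not FrankenBrain's controller: the `improve`, `propose`, and `score` functions are hypothetical, the budget is a simple cap on evaluated variants, and the toy scoring function stands in for a real eval.

```python
def improve(baseline, propose, score, budget):
    """Score at most `budget` proposed variants and keep the best one seen."""
    best, best_score = baseline, score(baseline)
    history = [(baseline, best_score)]       # every evaluated config is kept
    for variant in propose(baseline)[:budget]:
        variant_score = score(variant)
        history.append((variant, variant_score))
        if variant_score > best_score:       # select only strict improvements
            best, best_score = variant, variant_score
    return best, best_score, history

def propose(config):
    """Hypothetical proposer: try a few fixed temperatures."""
    return [{**config, "temperature": t} for t in (0.2, 0.35, 0.5, 0.9)]

def score(config):
    """Toy objective that peaks at temperature 0.35; stands in for a real eval."""
    return 1.0 - abs(config["temperature"] - 0.35)

best, best_score, history = improve({"temperature": 0.7}, propose, score, budget=3)
print(best, best_score, len(history))
```

With a budget of 3, the fourth proposal is never evaluated, which is the point of a fixed budget: the comparison stays bounded, and the full history remains available as lineage for the selected variant.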
A node can be a function, job, remote service, simulator, workflow, reusable sub-brain, or another running FrankenBrain workbench. Collapse, call, compare, or zoom into it.
Every run can keep metric cards, candidate rankings, raw details, artifacts, diffs, and promotion actions next to the system being improved.
The visual editor is for convenience. The same brains should run headlessly through a CLI, SDK, API, local workers, and configured runners with explicit proof status.
FrankenBrain can sit on top of serious research stacks while making executable systems easier to inspect, measure, compare, and reuse.
Bring existing code, evals, runners, datasets, checkpoints, logs, and artifact stores into one reviewable experiment view.
Define the measurement surface and promotion gate; then compare bounded variants and keep the lineage behind the result.
Turn successful traces into reusable training or review artifacts, then promote only after held-out measurement passes.
Private technical demo
FrankenBrain is available by technical demo for labs, research teams, and serious AI systems builders. To request one, share your stack, your goal, and the kind of conversation you want.