ARI — Autonomous Research Infrastructure

Philosophy

Research should be for everyone.

"The gap between 'I have an idea' and 'I have a result'
should be measured in hours — not months."

Today, running experiments, reviewing prior work, writing papers, and verifying findings each require separate expertise and enormous time. ARI removes those barriers.

Whether you are a student with a laptop and a free local AI, or a researcher with access to a supercomputer and the latest cloud models — ARI works the same way. You describe the goal. ARI does the rest.

Computation is only the beginning. ARI is architecturally designed to grow beyond software — into robotics, sensors, and laboratory equipment. This is not yet implemented, but the plugin architecture exists precisely for this purpose.

Inspired By · Prior Work

Standing on the shoulders of giants.

ARI was built on lessons from these pioneering works.

🧪

The AI Scientist v2

Sakana AI's fully autonomous scientific research system — from idea generation to peer-review-ready papers. A foundational reference for ARI's end-to-end pipeline design.

arxiv.org/abs/2504.08066 →

⚡

HPC-AutoResearch

An LLM-driven framework for automated HPC performance optimization. Demonstrated autonomous compiler flag and thread-count tuning on real supercomputer workloads — a direct predecessor to ARI's search engine.

researchgate.net/publication/403797672 →

🧬

VirSci

A multi-agent scientific deliberation system. Multiple AI personas with different research backgrounds debate a hypothesis, producing richer and more diverse research ideas than single-agent generation. Integrated into ARI's idea generation stage.

arxiv.org/abs/2410.09403 →

📋

PaperBench

OpenAI's benchmark for evaluating whether AI agents can reproduce frontier ML papers from scratch. Each paper is decomposed into a fine-grained TaskNode rubric scored by an LLM judge ("SimpleJudge"). v0.7.0 vendors PaperBench under ari-skill-paper-re/vendor/paperbench as the deterministic core of ARI's reproducibility check (ORS Phase 2). v0.7.2 adds paper-audit mode: the same rubric machinery inverted to audit whether a paper itself describes enough to be reproducible — venue-conditioned templates (sc / neurips / nature), multimodal figure inspection, and a paper_audit prompt patch that breaks the structural ceiling on Result Analysis leaves. v0.8.0 ships the 3-stage bridge contract (rollout_submission / reproduce_submission / judge_submission) honouring vendor protocol with container_image end-to-end plumbing, fail-loud sandbox/GPU preconditions, salvage retries, executed-submission tarballs, code_only Stage 1↔3 consistency, agent.env injection, host-FS source-removal guard, and env-truth guardrails (probe-before-scaffold, language-choice counter-prime, host-truthful ADDITIONAL NOTES with auto-detected binaries / GPU / network / Phase-2 isolation).

arxiv.org/abs/2504.01848 →

🧠

MemGPT

Hierarchical memory + tool-mediated paging that lets an LLM agent operate as if it had unbounded context. The MemGPT paper became Letta, which v0.6.0 adopted as ARI's memory backend — replacing the v0.5.x JSONL store with ancestor-scoped archival memory and a portable per-checkpoint snapshot.

arxiv.org/abs/2310.08560 →

How It Works

One file. Full pipeline.

Write a short description of what you want to optimize. ARI takes it from there — automatically, without any human intervention.

📝

Describe

Write your research goal in plain Markdown. Even 3 lines is enough.

🔍

Survey

ARI searches arXiv and Semantic Scholar, then VirSci multi-agent deliberation — multiple AI personas with different research backgrounds — debates the results to produce diverse, well-grounded ideas.

⚗️

Experiment

Best-First Tree Search (BFTS) with five node types — DRAFT, IMPROVE, DEBUG, ABLATION, VALIDATION — explores the hypothesis space in parallel. The LLM selects which frontier node to expand next based on peer-review scores, not heuristics. Failed nodes generate debug children, not retries. All generated code is captured per node.

📄

Write

A complete academic paper with figures and citations is generated automatically.

🔬

Analyze

ARI traverses the full experiment tree — root, improvements, ablations, and validation runs — and uses an LLM to extract methodology, setup, and key findings directly from raw artifacts. Nothing is hardcoded.

✅

Verify

A separate AI agent independently reconstructs and re-runs the experiment from the paper text alone — no original code shared.

🖥️

Web Dashboard — Real-time Control

A 10-page React/TypeScript SPA dashboard (built with Vite) with an Overleaf-like LaTeX editor, React Flow visual workflow editor (BFTS / Paper / Reproduce phase toggles per skill), D3 experiment tree, VLM figure review loop, rubric-driven paper review with ensemble + Area Chair meta-review and few-shot example manager, container runtime management, Letta-backed memory admin, recursive sub-experiments, and a 4-step experiment wizard — all from your browser. Component-based architecture with separate modules for each page. All output is isolated per project.

▶ See the live walkthrough in the Demo section ↑

🏠 Home Dashboard 📋 Experiments 📡 Monitor + D3 Tree 🌳 Tree + Code Viewer 📊 Results + LaTeX Editor ✨ Experiment Wizard 💡 VirSci Ideas 🔄 React Flow Workflow ⚙️ Settings + Container 🔬 Sub-Experiments

        ari viz ./checkpoints/my_run/ --port 8765
      

Universal

Works everywhere. For everyone.

ARI is not built for one environment or one type of user. It scales across five dimensions.

💻

Any Computer

Environment profiles (laptop / hpc / cloud) auto-detect your scheduler (SLURM, PBS, LSF, SGE, Kubernetes) and configure parallelism, memory, and container runtime automatically.

Laptop → Supercomputer

🧠

Any AI Model

Free local models (Ollama) or commercial APIs (Claude, GPT, Gemini) via litellm. Per-experiment model selection in the dashboard wizard — Ollama models support free-form name entry.

Free Local → Cloud API

📋

Any Expertise Level

Write 3 lines as a beginner, or 200 lines with precise technical controls as an expert.

Novice → Expert

🌍

Any Domain

Add new capabilities as plug-in modules. No changes to the core system needed.

Software → Physical World

📐

Any Format

Papers can target arXiv, NeurIPS, ICPP, SC, ISC, ACM, or any custom venue template.

Preprint → Top Conference

Vision

Beyond software.

The current version automates digital computation experiments. Physical world integration is on the roadmap — not yet implemented.

✅ Available Now

⚙️ Computational Research

Compiler tuning, algorithm benchmarking, ML hyperparameters, systems performance.

🔜 Coming Soon

🦾 Robotics

Robot arm trajectories, motion planning parameters, control system tuning via ROS2.

📅 Planned

🧪 Laboratory Automation (roadmap)

Liquid handlers, plate readers, reaction conditions — planned, not yet implemented.

🌟 Vision

🌍 Any Physical System (roadmap)

If it has a goal and a parameter space, ARI can explore it. The same system, infinite domains.

Quick Start

Running in minutes.

# 1. Clone the repository
git clone https://github.com/kotama7/ARI
cd ARI

# 2. Install everything
bash setup.sh

# 3. Choose your AI model
# Free option (runs offline):
ollama pull qwen3:8b

# Or use Claude / OpenAI:
export ARI_BACKEND=claude
export ANTHROPIC_API_KEY=sk-...

# 4. Run your first experiment
ari run experiment.md

# 5. Launch the web dashboard
ari viz ./checkpoints/my_run/  # → http://localhost:8765

# Other useful commands
ari projects          # list all runs
ari resume ./ckpt/    # resume interrupted run
ari settings          # view/modify config

The simplest experiment file looks like this:

# experiment.md — free-form text is fine
# Headings are optional; ARI reads any format.

Maximize performance of matrix
multiplication on this machine.

# Or use structured Markdown (optional):
## Research Goal
Maximize the target metric for
my experiment on this machine.

## Evaluation Metric
Primary score

## Constraints
- Describe your environment here

ARI will survey related papers, run experiments, generate figures, write a complete paper, and verify reproducibility independently — automatically. A unique LLM-generated title is assigned to each experiment project.

Full Setup Guide 📚 PaperBench Quickstart (v0.8.0) 📄 Sample Paper (PDF) View on GitHub →

Community

Built together.

ARI is open source and welcomes contributions of all kinds — new skills, pipeline ideas, bug reports, and more.

🔌

Add a Skill

Have a domain ARI doesn't cover yet? Wrap it as an MCP skill and submit a PR. Any benchmark, any tool.

🔄

Propose a Flow

Have an idea for a new research pipeline or workflow step? Open a discussion and shape ARI's direction.

🐛

Report & Fix

Found a bug or a rough edge? Issues and PRs are always welcome. No contribution is too small.

View on GitHub → Join the Discussion → 💬 Discord →

Research, automated end-to-end.

See ARI in action.

📄 Read the sample paper inline