Autonomous Research Infrastructure

Research, automated end-to-end.

Describe your goal in plain text. ARI runs the experiments, writes the paper, and verifies the results — entirely on its own. Currently supports digital computation experiments (HPC, ML, systems).


See ARI in action.

Watch a live walkthrough of the ARI web dashboard, then download the sample paper that ARI generated end-to-end — a 10-page Stratum-Roofline CSR-SpMM study on Fujitsu A64FX/SVE-512.



Research should be for everyone.

"The gap between 'I have an idea' and 'I have a result'
should be measured in hours — not months."

Today, running experiments, reviewing prior work, writing papers, and verifying findings each require separate expertise and enormous time. ARI removes those barriers.

Whether you are a student with a laptop and a free local AI, or a researcher with access to a supercomputer and the latest cloud models — ARI works the same way. You describe the goal. ARI does the rest.

Computation is only the beginning. ARI is architecturally designed to grow beyond software — into robotics, sensors, and laboratory equipment. This is not yet implemented, but the plugin architecture exists precisely for this purpose.

Standing on the shoulders of giants.

ARI was built on lessons from these pioneering works.

🧪

The AI Scientist v2

Sakana AI's fully autonomous scientific research system — from idea generation to peer-review-ready papers. A foundational reference for ARI's end-to-end pipeline design.

arxiv.org/abs/2504.08066 →

HPC-AutoResearch

An LLM-driven framework for automated HPC performance optimization. Demonstrated autonomous compiler flag and thread-count tuning on real supercomputer workloads — a direct predecessor to ARI's search engine.

researchgate.net/publication/403797672 →
🧬

VirSci

A multi-agent scientific deliberation system. Multiple AI personas with different research backgrounds debate a hypothesis, producing richer and more diverse research ideas than single-agent generation. Integrated into ARI's idea generation stage.

arxiv.org/abs/2410.09403 →
📋

PaperBench

OpenAI's benchmark for evaluating whether AI agents can reproduce frontier ML papers from scratch. Each paper is decomposed into a fine-grained TaskNode rubric scored by an LLM judge ("SimpleJudge").

v0.7.0 vendors PaperBench under ari-skill-paper-re/vendor/paperbench as the deterministic core of ARI's reproducibility check (ORS Phase 2).

v0.7.2 adds paper-audit mode: the same rubric machinery inverted to audit whether a paper itself describes enough to be reproducible. It includes venue-conditioned templates (sc / neurips / nature), multimodal figure inspection, and a paper_audit prompt patch that breaks the structural ceiling on Result Analysis leaves.

v0.8.0 ships the 3-stage bridge contract (rollout_submission / reproduce_submission / judge_submission) honouring vendor protocol with:
- container_image end-to-end plumbing
- fail-loud sandbox/GPU preconditions
- salvage retries
- executed-submission tarballs
- code_only Stage 1↔3 consistency
- agent.env injection
- host-FS source-removal guard
- env-truth guardrails (probe-before-scaffold, language-choice counter-prime, host-truthful ADDITIONAL NOTES with auto-detected binaries / GPU / network / Phase-2 isolation)

arxiv.org/abs/2504.01848 →
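
To make the rubric idea concrete, here is a minimal sketch of how a hierarchical rubric can be scored: leaves carry a judge's score, and each parent takes the weighted average of its children. The RubricNode class, its field names, and the example requirements are illustrative assumptions, not the vendored PaperBench API or ARI's code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RubricNode:
    """Hypothetical rubric node; not the PaperBench TaskNode class."""
    requirement: str
    weight: float = 1.0
    score: Optional[float] = None  # filled in by a judge, leaves only
    children: List["RubricNode"] = field(default_factory=list)


def aggregate(node: RubricNode) -> float:
    """Return the leaf score, or the weighted mean of the children's scores."""
    if not node.children:
        if node.score is None:
            raise ValueError(f"leaf not judged: {node.requirement}")
        return node.score
    total_weight = sum(child.weight for child in node.children)
    weighted = sum(child.weight * aggregate(child) for child in node.children)
    return weighted / total_weight


# Example rubric: one judged leaf plus a sub-tree with two judged leaves.
rubric = RubricNode("Reproduce the paper", children=[
    RubricNode("Code runs end to end", weight=2.0, score=1.0),
    RubricNode("Results match", weight=1.0, children=[
        RubricNode("Main table reproduced", score=1.0),
        RubricNode("Ablation reproduced", score=0.0),
    ]),
])

print(round(aggregate(rubric), 3))  # (2*1.0 + 1*0.5) / 3 -> 0.833
```

Scoring only the leaves keeps each judgment small and checkable, while the weights let a rubric author say which requirements matter most.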
🧠

MemGPT

Hierarchical memory + tool-mediated paging that lets an LLM agent operate as if it had unbounded context. The MemGPT project later became Letta, which ARI adopted in v0.6.0 as its memory backend, replacing the v0.5.x JSONL store with ancestor-scoped archival memory and a portable per-checkpoint snapshot.

arxiv.org/abs/2310.08560 →
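
One way to read "ancestor-scoped" is that a node in the experiment tree can recall only what it or its ancestors wrote, never what a sibling branch wrote. The sketch below illustrates that reading with a plain in-memory store. The class, its methods, and the node names are hypothetical; it does not use the Letta API and may differ from ARI's actual behaviour.

```python
from typing import Dict, List, Optional


class AncestorScopedMemory:
    """Toy archival store where recall is limited to a node's own lineage."""

    def __init__(self) -> None:
        self._parent: Dict[str, Optional[str]] = {}
        self._entries: Dict[str, List[str]] = {}

    def add_node(self, node_id: str, parent_id: Optional[str] = None) -> None:
        if parent_id is not None and parent_id not in self._parent:
            raise KeyError(f"unknown parent: {parent_id}")
        self._parent[node_id] = parent_id
        self._entries[node_id] = []

    def write(self, node_id: str, text: str) -> None:
        self._entries[node_id].append(text)

    def lineage(self, node_id: str) -> List[str]:
        """Return the node followed by its ancestors up to the root."""
        chain: List[str] = []
        current: Optional[str] = node_id
        while current is not None:
            chain.append(current)
            current = self._parent[current]
        return chain

    def recall(self, node_id: str, keyword: str = "") -> List[str]:
        """Search entries written by the node or its ancestors only."""
        return [
            text
            for owner in self.lineage(node_id)
            for text in self._entries[owner]
            if keyword.lower() in text.lower()
        ]


memory = AncestorScopedMemory()
memory.add_node("root")
memory.add_node("improve_1", "root")
memory.add_node("improve_2", "root")
memory.write("root", "baseline uses -O2")
memory.write("improve_1", "-O3 helped")
memory.write("improve_2", "tiling hurt")

print(memory.recall("improve_1"))            # ['-O3 helped', 'baseline uses -O2']
print(memory.recall("improve_1", "tiling"))  # []
```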

One file. Full pipeline.

Write a short description of what you want to optimize. ARI takes it from there — automatically, without any human intervention.

📝

Describe

Write your research goal in plain Markdown. Even 3 lines is enough.

🔍

Survey

ARI searches arXiv and Semantic Scholar. VirSci multi-agent deliberation then has multiple AI personas with different research backgrounds debate the results to produce diverse, well-grounded ideas.

⚗️

Experiment

Best-First Tree Search (BFTS) with five node types — DRAFT, IMPROVE, DEBUG, ABLATION, VALIDATION — explores the hypothesis space in parallel. The LLM selects which frontier node to expand next based on peer-review scores, not heuristics. Failed nodes generate debug children, not retries. All generated code is captured per node.
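
The search loop can be sketched in a few lines. This is a simplified illustration, not ARI's implementation: a priority queue ordered by a numeric score stands in for the LLM's choice of frontier node, expansion is sequential rather than parallel, and the Node fields and the toy expand function are assumptions made for the example.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)
class Node:
    node_id: int
    kind: str      # DRAFT, IMPROVE, DEBUG, ABLATION, or VALIDATION
    score: float   # stand-in for a peer-review score (higher is better)
    failed: bool = False
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)


def best_first_search(root: Node,
                      expand: Callable[[Node], List[Node]],
                      budget: int) -> List[Node]:
    """Expand the best-scoring frontier node until the budget is spent."""
    tie = itertools.count()  # tie-breaker so nodes are never compared directly
    frontier = [(-root.score, next(tie), root)]
    visited: List[Node] = []
    while frontier and len(visited) < budget:
        _, _, node = heapq.heappop(frontier)
        visited.append(node)
        for child in expand(node):
            child.parent = node
            node.children.append(child)
            heapq.heappush(frontier, (-child.score, next(tie), child))
    return visited


def make_toy_expand() -> Callable[[Node], List[Node]]:
    """Toy expansion rule: failed nodes get a DEBUG child, drafts get improvements."""
    ids = itertools.count(1)

    def expand(node: Node) -> List[Node]:
        if node.failed:
            # A failure spawns a DEBUG child instead of re-running the same node.
            return [Node(next(ids), "DEBUG", node.score)]
        if node.kind == "DRAFT":
            return [
                Node(next(ids), "IMPROVE", node.score + 0.2),
                Node(next(ids), "IMPROVE", node.score + 0.1, failed=True),
            ]
        return []

    return expand


visited = best_first_search(Node(0, "DRAFT", 0.5), make_toy_expand(), budget=4)
print([(n.node_id, n.kind) for n in visited])
# [(0, 'DRAFT'), (1, 'IMPROVE'), (2, 'IMPROVE'), (3, 'DEBUG')]
```

Because every node keeps a link to its parent, the full tree stays available afterwards for the kind of traversal the Analyze stage describes.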

📄

Write

A complete academic paper with figures and citations is generated automatically.

🔬

Analyze

ARI traverses the full experiment tree — root, improvements, ablations, and validation runs — and uses an LLM to extract methodology, setup, and key findings directly from raw artifacts. Nothing is hardcoded.

Verify

A separate AI agent independently reconstructs and re-runs the experiment from the paper text alone — no original code shared.

🖥️

Web Dashboard — Real-time Control

A 10-page React/TypeScript SPA dashboard (built with Vite) that you operate entirely from your browser. It includes:
- an Overleaf-like LaTeX editor
- a React Flow visual workflow editor (BFTS / Paper / Reproduce phase toggles per skill)
- a D3 experiment tree
- a VLM figure review loop
- rubric-driven paper review with ensemble + Area Chair meta-review and a few-shot example manager
- container runtime management
- Letta-backed memory admin
- recursive sub-experiments
- a 4-step experiment wizard

The architecture is component-based, with a separate module for each page. All output is isolated per project.

See the live walkthrough in the Demo section ↑

🏠 Home Dashboard · 📋 Experiments · 📡 Monitor + D3 Tree · 🌳 Tree + Code Viewer · 📊 Results + LaTeX Editor · ✨ Experiment Wizard · 💡 VirSci Ideas · 🔄 React Flow Workflow · ⚙️ Settings + Container · 🔬 Sub-Experiments
ari viz ./checkpoints/my_run/ --port 8765

Works everywhere. For everyone.

ARI is not built for one environment or one type of user. It scales across five dimensions.

💻

Any Computer

Environment profiles (laptop / hpc / cloud) auto-detect your scheduler (SLURM, PBS, LSF, SGE, Kubernetes) and configure parallelism, memory, and container runtime automatically.

Laptop → Supercomputer
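
Scheduler auto-detection of this kind is often done by probing for each scheduler's command-line tools. The sketch below shows that general technique only. The probe order, the mapping from scheduler to profile, and the function names are assumptions for illustration, not ARI's actual detection logic.

```python
import shutil
from typing import Callable, Optional

# Scheduler name -> a command that usually exists when it is installed.
# SGE is probed via qconf before PBS, because both provide a qsub command.
PROBES = [
    ("slurm", "sbatch"),
    ("lsf", "bsub"),
    ("sge", "qconf"),
    ("pbs", "qsub"),
    ("kubernetes", "kubectl"),
]


def detect_scheduler(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first scheduler whose probe command is on PATH, else None."""
    for name, command in PROBES:
        if which(command):
            return name
    return None


def pick_profile(scheduler: Optional[str]) -> str:
    """Map a detected scheduler to one of the three profile names."""
    if scheduler is None:
        return "laptop"
    if scheduler == "kubernetes":
        return "cloud"
    return "hpc"


scheduler = detect_scheduler()
print(scheduler, pick_profile(scheduler))
```

Passing the lookup function in as a parameter makes the detection easy to exercise on a machine with no scheduler installed.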
🧠

Any AI Model

Free local models (Ollama) or commercial APIs (Claude, GPT, Gemini) via litellm. Per-experiment model selection in the dashboard wizard — Ollama models support free-form name entry.

Free Local → Cloud API
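
litellm addresses models with a "provider/model" string, so switching between a local and a cloud model can come down to changing a prefix. The sketch below builds such a string from the ARI_BACKEND variable used in the quickstart. The backend names other than claude, the default of ollama, and the resolve_model helper are assumptions, not ARI's actual configuration logic.

```python
import os
from typing import Mapping, Optional

# Backend name -> litellm provider prefix. Only "claude" appears in the
# quickstart; the other backend names are assumed for illustration.
PROVIDER_PREFIX = {
    "ollama": "ollama",
    "claude": "anthropic",
    "openai": "openai",
    "gemini": "gemini",
}


def resolve_model(model_name: str,
                  env: Optional[Mapping[str, str]] = None) -> str:
    """Build a litellm-style model string from ARI_BACKEND and a model name."""
    env = os.environ if env is None else env
    backend = env.get("ARI_BACKEND", "ollama")  # assumed default
    if backend not in PROVIDER_PREFIX:
        raise ValueError(f"unknown backend: {backend!r}")
    return f"{PROVIDER_PREFIX[backend]}/{model_name}"


print(resolve_model("qwen3:8b", env={}))  # ollama/qwen3:8b
```

The resulting string can then be passed as the model argument of litellm.completion(model=..., messages=[...]), with the matching API key set in the environment for cloud providers.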
📋

Any Expertise Level

Write 3 lines as a beginner, or 200 lines with precise technical controls as an expert.

Novice → Expert
🌍

Any Domain

Add new capabilities as plug-in modules. No changes to the core system needed.

Software → Physical World
📐

Any Format

Papers can target arXiv, NeurIPS, ICPP, SC, ISC, ACM, or any custom venue template.

Preprint → Top Conference

Beyond software.

The current version automates digital computation experiments. Physical world integration is on the roadmap — not yet implemented.

✅ Available Now

⚙️ Computational Research

Compiler tuning, algorithm benchmarking, ML hyperparameters, systems performance.

🔜 Coming Soon

🦾 Robotics

Robot arm trajectories, motion planning parameters, control system tuning via ROS2.

📅 Planned

🧪 Laboratory Automation (roadmap)

Liquid handlers, plate readers, reaction conditions — planned, not yet implemented.

🌟 Vision

🌍 Any Physical System (roadmap)

If it has a goal and a parameter space, ARI can explore it. The same system, infinite domains.

Running in minutes.

# 1. Clone the repository
git clone https://github.com/kotama7/ARI
cd ARI

# 2. Install everything
bash setup.sh

# 3. Choose your AI model
# Free option (runs offline):
ollama pull qwen3:8b

# Or use Claude / OpenAI:
export ARI_BACKEND=claude
export ANTHROPIC_API_KEY=sk-...

# 4. Run your first experiment
ari run experiment.md

# 5. Launch the web dashboard
ari viz ./checkpoints/my_run/  # → http://localhost:8765

# Other useful commands
ari projects          # list all runs
ari resume ./ckpt/    # resume interrupted run
ari settings          # view/modify config

The simplest experiment file looks like this:

# experiment.md — free-form text is fine
# Headings are optional; ARI reads any format.

Maximize performance of matrix
multiplication on this machine.

# Or use structured Markdown (optional):
## Research Goal
Maximize the target metric for
my experiment on this machine.

## Evaluation Metric
Primary score

## Constraints
- Describe your environment here

ARI will survey related papers, run experiments, generate figures, write a complete paper, and verify reproducibility independently — automatically. A unique LLM-generated title is assigned to each experiment project.

Full Setup Guide 📚 PaperBench Quickstart (v0.8.0) 📄 Sample Paper (PDF) View on GitHub →

Built together.

ARI is open source and welcomes contributions of all kinds — new skills, pipeline ideas, bug reports, and more.

🔌

Add a Skill

Have a domain ARI doesn't cover yet? Wrap it as an MCP skill and submit a PR. Any benchmark, any tool.

🔄

Propose a Flow

Have an idea for a new research pipeline or workflow step? Open a discussion and shape ARI's direction.

🐛

Report & Fix

Found a bug or a rough edge? Issues and PRs are always welcome. No contribution is too small.

View on GitHub → Join the Discussion → 💬 Discord →