# ARI Documentation
Technical reference for developers and researchers.
## Overview
ARI automates the full research cycle: hypothesis generation → experiment execution → paper writing → reproducibility verification. The system is built on three layers. The LLM-driven skills are idea-skill, transform-skill, plot-skill, paper-skill, paper-re-skill, and web-skill (partial).

## Installation
- Clone the repository

  ```bash
  git clone https://github.com/kotama7/ARI && cd ari
  ```

- Run setup

  ```bash
  bash setup.sh
  ```

- Install LaTeX

  ```bash
  # With conda (no sudo needed)
  conda install -c conda-forge texlive-core
  # Or system package
  sudo apt install texlive-full   # Debian/Ubuntu
  sudo dnf install texlive        # RHEL/CentOS
  ```

- Set LLM backend — see LLM Configuration
## First Run

```bash
# Minimal experiment file
cat > experiment.md << 'EOF'
## Research Goal
Maximize the target metric for my experiment on this machine.
<!-- metric_keyword: score -->
EOF

# Run
ari run experiment.md

# With custom config
ari run experiment.md --config ari-core/config/workflow.yaml

# On a SLURM cluster
sbatch your_pipeline_job.sh
```
Output files appear in the checkpoint directory:
| File | Description |
|---|---|
| `nodes_tree.json` | BFTS search tree (all explored configurations) |
| `science_data.json` | Science-facing data (no internal terms) |
| `related_refs.json` | arXiv references |
| `full_paper.tex` / `.pdf` | Generated paper |
| `figures_manifest.json` | Generated figure paths and captions |
| `review_report.json` | Automated review score and feedback |
| `reproducibility_report.json` | Reproducibility verification result |
## Architecture

### BFTS — Best-First Tree Search

ARI uses Best-First Tree Search to explore the hypothesis space. The LLM selects the most promising node to expand next, guided by real measurement data. Controlled via `ari-core/config/default.yaml`:
```yaml
bfts:
  max_total_nodes: 50      # maximum nodes to explore
  max_depth: 5             # tree depth limit
  max_parallel_nodes: 4    # concurrent experiments
  score_threshold: 0.3     # minimum score to expand
```
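As an illustration only, a best-first loop under these limits could be sketched as follows. The `score` and `propose_children` callables are hypothetical stand-ins for the experiment evaluation and LLM-driven node expansion; this is not ARI's implementation.

```python
import heapq
import itertools

def best_first_search(root, score, propose_children,
                      max_total_nodes=50, max_depth=5, score_threshold=0.3):
    """Explore nodes in descending score order, mirroring the bfts limits."""
    counter = itertools.count()  # tie-breaker so equal scores pop in FIFO order
    frontier = [(-score(root), next(counter), root, 0)]
    explored = []
    while frontier and len(explored) < max_total_nodes:
        neg_score, _, node, depth = heapq.heappop(frontier)
        explored.append(node)
        if -neg_score < score_threshold or depth >= max_depth:
            continue  # too weak or too deep to expand further
        for child in propose_children(node):
            heapq.heappush(frontier,
                           (-score(child), next(counter), child, depth + 1))
    return explored
```

`max_parallel_nodes` would additionally cap how many popped nodes run concurrently, which the sequential sketch above omits.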
### ReAct Loop
Each node runs: Reason → Act (tool call) → Observe → Reason... until a JSON result is produced. The agent automatically polls async HPC jobs without consuming step budget.
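The loop can be pictured with a minimal sketch (hypothetical `llm_reason` and `call_tool` helpers; the real agent also polls async HPC jobs outside this step budget):

```python
import json

def react_loop(task, llm_reason, call_tool, max_steps=20):
    """Reason -> Act -> Observe until the model emits a JSON result."""
    history = [task]
    for _ in range(max_steps):
        thought = llm_reason(history)      # Reason: decide the next action
        try:
            return json.loads(thought)     # done: model produced a JSON result
        except json.JSONDecodeError:
            pass
        observation = call_tool(thought)   # Act: execute the tool call
        history.append(observation)        # Observe: feed the result back
    raise RuntimeError("no JSON result within step budget")
```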
### Post-BFTS Pipeline

After BFTS completes, `workflow.yaml` drives a sequential pipeline. Stages are idempotent — re-runs skip already-completed stages.
## Experiment Files

Experiment files are Markdown. No code changes needed — all domain knowledge lives here.

### Minimal (3 lines)

```markdown
## Research Goal
Maximize the target metric for my experiment on this machine.
<!-- metric_keyword: score -->
```
### Full Reference

````markdown
# Experiment Title

## Research Goal
Describe the optimization objective in plain language.
The LLM reads this to generate hypotheses.

## Required Workflow
1. Call `survey` to find related literature
2. Call `slurm_submit` with a SLURM script
3. Call `job_status` to wait for completion
4. Call `run_bash` to read the output file
5. Return JSON with measured values

## Hardware Limits
- Max CPUs: 64
- Compiler: gcc only

## SLURM Script Template
```bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --cpus-per-task=32
#SBATCH --time=00:30:00
python run_experiment.py
```

## Rules
- HARD LIMIT: never exceed 64 threads
- Always use absolute paths in slurm_submit

<!-- metric_keyword: score -->
<!-- min_expected_metric: 100 -->
````
| Section | Required | Purpose |
|---|---|---|
| `## Research Goal` | ✔ | Drives LLM hypothesis generation |
| `## Required Workflow` | | Sets tool execution sequence |
| `## Hardware Limits` | | Hard constraints injected at every step |
| `## SLURM Script Template` | | Starting point for LLM modifications |
| `## Rules` | | Agent constraints and invariants |
| `<!-- metric_keyword -->` | ✔ | Metric name for extraction |
| `<!-- min_expected_metric -->` | | Minimum acceptable value |
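One way the `<!-- key: value -->` directives shown above could be pulled out of an experiment file is a small regex pass; a sketch, assuming only the simple single-line comment form:

```python
import re

def parse_directives(markdown_text):
    """Extract <!-- key: value --> directives from an experiment file."""
    pattern = re.compile(r"<!--\s*(\w+)\s*:\s*(.+?)\s*-->")
    return dict(pattern.findall(markdown_text))
```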
## workflow.yaml
The single configuration file for the post-BFTS pipeline. Adding or reordering stages requires only YAML changes — no code changes.
```yaml
version: '1'
slurm_partition: your_partition
author_name: "Your Name or Organization"

skills:
  - name: paper-skill
    path: '{{ari_root}}/ari-skill-paper'
    description: LaTeX paper writing

pipeline:
  - stage: transform_data          # NEW: strips BFTS internals
    skill: transform-skill
    tool: nodes_to_science_data
    inputs:
      nodes_json_path: '{{ckpt}}/nodes_tree.json'
    outputs:
      file: '{{ckpt}}/science_data.json'
    skip_if_exists: '{{ckpt}}/science_data.json'

  - stage: generate_figures
    skill: plot-skill
    tool: generate_figures_llm
    depends_on: [transform_data]
    inputs:
      science_data_path: '{{stages.transform_data.outputs.file}}'
      output_dir: '{{ckpt}}'
    outputs:
      file: '{{ckpt}}/figures_manifest.json'

  - stage: write_paper
    skill: paper-skill
    tool: write_paper_iterative
    depends_on: [generate_figures, search_related_work]
    inputs:
      refs_json: '{{stages.search_related_work.outputs.file}}'
      figures_manifest_json: '{{ckpt}}/figures_manifest.json'
      venue: arxiv
    outputs:
      file: '{{ckpt}}/full_paper.tex'
```
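Ordering stages from their `depends_on` lists is a topological sort; a minimal sketch (the `search_related_work` stage referenced above is assumed to exist elsewhere in the pipeline):

```python
def order_stages(stages):
    """Return stage names so every stage follows its depends_on entries."""
    deps = {s["stage"]: set(s.get("depends_on", [])) for s in stages}
    ordered = []
    while deps:
        # stages whose dependencies have all been scheduled already
        ready = sorted(name for name, d in deps.items() if d <= set(ordered))
        if not ready:
            raise ValueError("cycle or missing dependency in pipeline")
        for name in ready:
            ordered.append(name)
            del deps[name]
    return ordered
```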
### Template Variables
| Variable | Value |
|---|---|
| `{{ckpt}}` | Checkpoint directory (absolute path) |
| `{{ari_root}}` | ARI project root |
| `{{paper_context}}` | Science-facing experiment summary |
| `{{stages.NAME.outputs.file}}` | Primary output of stage NAME |
| `{{author_name}}` | `author_name` value from workflow.yaml |
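Resolving these placeholders is a plain string substitution over a nested context; a sketch, assuming only the flat `{{name}}` and dotted `{{stages.NAME.outputs.file}}` forms listed above:

```python
import re

def render(template, context):
    """Replace {{dotted.name}} placeholders with values from a nested dict."""
    def lookup(match):
        value = context
        for part in match.group(1).split("."):
            value = value[part]  # walk one level per dotted segment
        return str(value)
    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, template)
```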
## Skills Overview
| Skill | Tools | Type |
|---|---|---|
| ari-skill-hpc | slurm_submit, job_status, job_cancel, run_bash, singularity_* | Deterministic |
| ari-skill-evaluator | make_metric_spec | Deterministic △ |
| ari-skill-idea | survey, generate_ideas | LLM ✷ |
| ari-skill-web | web_search, fetch_url, search_arxiv, search_semantic_scholar, collect_references_iterative | Partial LLM △ |
| ari-skill-memory | add_memory, search_memory, get_node_memory, clear_node_memory | Deterministic |
| ari-skill-transform | nodes_to_science_data | LLM ✷ |
| ari-skill-plot | generate_figures, generate_figures_llm | LLM ✷ |
| ari-skill-paper | write_paper_iterative, review_compiled_paper, generate_section, ... | LLM ✷ |
| ari-skill-paper-re | reproduce_from_paper, extract_metric_from_output | LLM ✷ |
| ari-skill-coding | write_code, run_code, run_bash | Deterministic |
| ari-skill-benchmark | analyze_results, plot, statistical_test | Deterministic |
| ari-skill-review | parse_review, generate_rebuttal, check_rebuttal | LLM ✷ |
| ari-skill-vlm | review_figure, review_table | LLM ✷ |
| ari-skill-orchestrator | run_experiment, get_status, list_runs, get_paper | Deterministic |

✷ LLM-using tools are explicitly annotated. △ = LLM in some tools only. 14 skills total (9 default, 5 additional).
## Adding a Skill
- Create the skill directory

  ```
  ari-skill-yourskill/
  ├── src/server.py
  ├── tests/test_server.py
  └── pyproject.toml
  ```

- Implement the server

  ```python
  from mcp.server.fastmcp import FastMCP

  mcp = FastMCP("yourskill")

  @mcp.tool()
  def your_tool(param: str) -> dict:
      """Clear description for the LLM."""
      result = pure_computation(param)  # no LLM calls here
      return {"result": result}

  if __name__ == "__main__":
      mcp.run()
  ```

- Register in `workflow.yaml`

  ```yaml
  skills:
    - name: yourskill
      path: '{{ari_root}}/ari-skill-yourskill'
  ```

- Add a pipeline stage

  ```yaml
  pipeline:
    - stage: your_stage
      skill: yourskill
      tool: your_tool
      inputs:
        param: '{{paper_context}}'
      outputs:
        file: '{{ckpt}}/your_output.json'
  ```
## LLM Configuration
```bash
# OpenAI
export ARI_LLM_MODEL=openai/gpt-4o
export OPENAI_API_KEY=sk-...

# Anthropic
export ARI_LLM_MODEL=anthropic/claude-sonnet-4-5
export ANTHROPIC_API_KEY=sk-ant-...

# Local Ollama (free, no API key)
export ARI_LLM_MODEL=qwen3:32b
export LLM_API_BASE=http://127.0.0.1:11434

# Any OpenAI-compatible API (vLLM, LM Studio, etc.)
export ARI_LLM_MODEL=your-model-name
export LLM_API_BASE=http://your-server:8000/v1
```
Note: model names must include the provider prefix, e.g. `openai/gpt-5.2`, not just `gpt-5.2`.

## HPC / Execution Backend
ARI uses a pluggable executor model. Set `ARI_EXECUTOR` to match your environment — no code changes needed.

```bash
# Environment variables
export ARI_EXECUTOR=slurm                  # local | slurm | pbs | lsf
export ARI_SLURM_PARTITION=your_partition  # SLURM only
```

```yaml
# workflow.yaml (reproducibility stage)
inputs:
  cpus: 64
  timeout_minutes: 15
  tolerance_pct: 5.0
```
For experiments (BFTS), configure in `default.yaml`:

```yaml
hpc:
  mode: slurm              # or "local" for laptop
  scheduler: slurm
  max_nodes: 4
  max_walltime: "04:00:00"
```
To run without a cluster, set `ARI_EXECUTOR=local`. ARI will execute experiments as local subprocesses.
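A pluggable executor of this kind could be sketched as a registry keyed on `ARI_EXECUTOR` (the class names and `submit` method here are hypothetical, not ARI's actual API):

```python
import os
import subprocess

class LocalExecutor:
    """Run the experiment script as a local subprocess."""
    def submit(self, script):
        return subprocess.run(["bash", script], capture_output=True, text=True)

class SlurmExecutor:
    """Hand the script to the cluster scheduler via sbatch."""
    def submit(self, script):
        return subprocess.run(["sbatch", script], capture_output=True, text=True)

EXECUTORS = {"local": LocalExecutor, "slurm": SlurmExecutor}

def get_executor():
    """Pick an executor from the ARI_EXECUTOR environment variable."""
    name = os.environ.get("ARI_EXECUTOR", "local")
    return EXECUTORS[name]()
```

New backends (e.g. pbs, lsf) would only need another entry in the registry, which is what makes the model pluggable.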
## Experiment Monitor (`ari viz`)
ARI ships a real-time experiment tree visualiser. It shows every BFTS node, its status, metrics, and the full tool-call trace — all in a browser.
### Starting the monitor

```bash
ari viz --checkpoint <ckpt_dir> --port 9878
```

Open http://localhost:9878 in any browser. The dashboard polls `/state` and reconnects over WebSocket automatically.
### Node detail panel

Click any node circle to open the four-tab detail panel:

- Overview — ID, status, type, execution time, parent, metrics, evaluation summary.
- Trace — Ordered list of every MCP tool call the agent made (name · step · result snippet). Fetched live from `/memory/<node_id>`.
- Code — Generated source files stored in node artifacts.
- Output — SLURM stdout / benchmark results stored in node artifacts.
### Status indicators

- Green circle — success
- Red circle — failed
- Blue circle — running
- Grey circle — pending
### Architecture

The viz server (`ari/viz/server.py`) is an asyncio HTTP + WebSocket handler built on the standard library, with no external dependencies beyond the `websockets` package already installed by ARI. The dashboard is a React/TypeScript SPA built with Vite (`ari/viz/frontend/`), with modular components for each page (Home, Experiments, Monitor, Tree, Results, Wizard, Idea, Workflow, Settings). The production build is output to `ari/viz/static/dist/` and served by the Python server.
## Agent Memory

ARI maintains a persistent memory store (`~/.ari/memory_store.jsonl`) shared across all experiments. Each entry contains:

- `node_id` — which node wrote the entry
- `ancestor_ids` — the full parent chain, used for scoped recall
- `text` — human-readable content (used for keyword search)
- `metadata` — tool name, step number, timestamps
When a node completes successfully, ARI automatically writes a RESULT SUMMARY entry containing the node label and all metric values. Child nodes can then retrieve parent results via `search_memory` using plain metric keywords.
The Trace tab in the Experiment Monitor reads these entries live via `GET /memory/<node_id>`.