ARI Documentation

Technical reference for developers and researchers.

Overview

ARI automates the full research cycle: hypothesis generation → experiment execution → paper writing → reproducibility verification. The system is built on three layers:

```
┌────────────────────────────────────────────┐
│            experiment.md / CLI             │
└──────────────────┬─────────────────────────┘
                   │
┌──────────────────▼─────────────────────────┐
│                 ari-core                   │
│     BFTS Engine → ReAct Loop → Pipeline    │
└──────────────────┬─────────────────────────┘
                   │ MCP protocol
      ┌────────────┼───────────────┐
      │            │               │
   Skills        Skills          Skills
(deterministic) (deterministic) (LLM-annotated)
  hpc, web,    idea, memory,   paper, plot,
  evaluator    transform       paper-re
```
Key principle: MCP skills are deterministic where possible. LLM-using tools are explicitly annotated. Default skills using LLM: idea-skill, transform-skill, plot-skill, paper-skill, paper-re-skill, web-skill (partial).

Installation

  1. Clone the repository
    git clone https://github.com/kotama7/ARI && cd ari
  2. Run setup
    bash setup.sh
  3. Install LaTeX
    # With conda (no sudo needed)
    conda install -c conda-forge texlive-core
    
    # Or system package
    sudo apt install texlive-full        # Debian/Ubuntu
    sudo dnf install texlive             # RHEL/CentOS
  4. Set the LLM backend (see LLM Configuration)

First Run

# Minimal experiment file
cat > experiment.md << 'EOF'
## Research Goal
Maximize the target metric for my experiment on this machine.
<!-- metric_keyword: score -->
EOF

# Run
ari run experiment.md

# With custom config
ari run experiment.md --config ari-core/config/workflow.yaml

# On a SLURM cluster
sbatch your_pipeline_job.sh

Want to see what ARI produces? Download the sample paper (PDF) generated by an actual ARI run — figures, citations, and reproducibility verification included.

Output files appear in the checkpoint directory:
| File | Description |
| --- | --- |
| `nodes_tree.json` | BFTS search tree (all explored configurations) |
| `science_data.json` | Science-facing data (no internal terms) |
| `related_refs.json` | arXiv references |
| `figures_manifest.json` | Generated figure paths and captions |
| `full_paper.tex` / `.pdf` | Generated paper |
| `review_report.json` | Automated review score and feedback |
| `reproducibility_report.json` | Reproducibility verification result |

Architecture

BFTS — Best-First Tree Search

ARI uses Best-First Tree Search to explore the hypothesis space. The LLM selects the most promising node to expand next, guided by real measurement data. Controlled via ari-core/config/default.yaml:

bfts:
  max_total_nodes: 50      # maximum nodes to explore
  max_depth: 5             # tree depth limit
  max_parallel_nodes: 4    # concurrent experiments
  score_threshold: 0.3     # minimum score to expand
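The selection policy can be sketched as a generic best-first loop over a priority queue. This is an illustrative stand-in, not ARI's implementation: in ARI the `score` comes from the LLM guided by real measurement data, whereas here `expand` and `score` are plain callables you supply.

```python
import heapq
import itertools

def best_first_search(root, expand, score, max_total_nodes=50,
                      max_depth=5, score_threshold=0.3):
    """Generic best-first tree search mirroring the bfts: config keys.

    expand(node) -> list of child nodes; score(node) -> float (higher is
    better). Both are stand-ins for ARI's LLM-driven internals.
    """
    counter = itertools.count()  # tie-breaker so equal scores pop FIFO
    frontier = [(-score(root), next(counter), root, 0)]
    explored = []
    while frontier and len(explored) < max_total_nodes:
        neg_score, _, node, depth = heapq.heappop(frontier)
        explored.append(node)
        # Below score_threshold or at max_depth: record the node but don't expand it.
        if -neg_score < score_threshold or depth >= max_depth:
            continue
        for child in expand(node):
            heapq.heappush(frontier, (-score(child), next(counter), child, depth + 1))
    return explored
```

`max_parallel_nodes` has no analogue here; in ARI the frontier is expanded by several concurrent experiments at once.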

ReAct Loop

Each node runs: Reason → Act (tool call) → Observe → Reason... until a JSON result is produced. The agent automatically polls async HPC jobs without consuming step budget.
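That control flow can be sketched as follows. The function names (`llm_step`, `call_tool`) and the action dict shape are hypothetical, and the real loop additionally polls async HPC jobs outside the step budget:

```python
import json

def react_loop(llm_step, call_tool, max_steps=20):
    """Reason -> Act -> Observe until the model emits a JSON result.

    llm_step(history) returns either {"tool": name, "args": {...}} or a
    terminal {"result": "<json string>"}; call_tool(name, args) returns an
    observation. Both callables are stand-ins for ARI's internals.
    """
    history = []
    for _ in range(max_steps):
        action = llm_step(history)                       # Reason
        if "result" in action:
            return json.loads(action["result"])          # terminal JSON result
        observation = call_tool(action["tool"], action["args"])  # Act
        history.append({"action": action, "observation": observation})  # Observe
    raise RuntimeError("step budget exhausted without a JSON result")
```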

Post-BFTS Pipeline

After BFTS completes, workflow.yaml drives a sequential pipeline. Stages are idempotent — re-runs skip already-completed stages.
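The skip logic can be illustrated with a small runner, assuming stages carry a `skip_if_exists` path as in workflow.yaml (a sketch, not ARI's actual pipeline driver):

```python
import os

def run_pipeline(stages, run_stage):
    """Run stages in order, skipping any whose skip_if_exists path is present.

    stages: list of dicts with a 'stage' name and an optional 'skip_if_exists'
    path; run_stage(stage) performs the work. Re-running is a no-op for
    stages whose output file already exists.
    """
    executed = []
    for stage in stages:
        marker = stage.get("skip_if_exists")
        if marker and os.path.exists(marker):
            continue  # completed on a previous run
        run_stage(stage)
        executed.append(stage["stage"])
    return executed
```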

Experiment Files

Experiment files are Markdown. No code changes needed — all domain knowledge lives here.

Minimal (3 lines)

## Research Goal
Maximize the target metric for my experiment on this machine.
<!-- metric_keyword: score -->

Full Reference

# Experiment Title

## Research Goal
Describe the optimization objective in plain language.
The LLM reads this to generate hypotheses.

## Required Workflow
1. Call `survey` to find related literature
2. Call `slurm_submit` with a SLURM script
3. Call `job_status` to wait for completion
4. Call `run_bash` to read the output file
5. Return JSON with measured values

## Hardware Limits
- Max CPUs: 64
- Compiler: gcc only

## SLURM Script Template
```bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --cpus-per-task=32
#SBATCH --time=00:30:00
python run_experiment.py
```

## Rules
- HARD LIMIT: never exceed 64 threads
- Always use absolute paths in slurm_submit

<!-- metric_keyword: score -->
<!-- min_expected_metric: 100 -->
| Section | Purpose |
| --- | --- |
| `## Research Goal` | Drives LLM hypothesis generation |
| `## Required Workflow` | Sets tool execution sequence |
| `## Hardware Limits` | Hard constraints injected at every step |
| `## SLURM Script Template` | Starting point for LLM modifications |
| `## Rules` | Agent constraints and invariants |
| `<!-- metric_keyword -->` | Metric name for extraction |
| `<!-- min_expected_metric -->` | Minimum acceptable value |
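The HTML-comment directives can be pulled out of an experiment file with a simple regex. This is an illustrative sketch; ARI's actual parser may differ:

```python
import re

def parse_directives(markdown_text):
    """Extract <!-- key: value --> directives from an experiment file."""
    pattern = r"<!--\s*([a-zA-Z_]+)\s*:\s*(.+?)\s*-->"
    return {key: value for key, value in re.findall(pattern, markdown_text)}
```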

workflow.yaml

The single configuration file for the post-BFTS pipeline. Adding or reordering stages requires only YAML changes — no code changes.

version: '1'
slurm_partition: your_partition
author_name: "Your Name or Organization"

skills:
  - name: paper-skill
    path: '{{ari_root}}/ari-skill-paper'
    description: LaTeX paper writing

pipeline:
  - stage: transform_data           # NEW: strips BFTS internals
    skill: transform-skill
    tool: nodes_to_science_data
    inputs:
      nodes_json_path: '{{ckpt}}/nodes_tree.json'
    outputs:
      file: '{{ckpt}}/science_data.json'
    skip_if_exists: '{{ckpt}}/science_data.json'

  - stage: generate_figures
    skill: plot-skill
    tool: generate_figures_llm
    depends_on: [transform_data]
    inputs:
      science_data_path: '{{stages.transform_data.outputs.file}}'
      output_dir: '{{ckpt}}'
    outputs:
      file: '{{ckpt}}/figures_manifest.json'

  - stage: write_paper
    skill: paper-skill
    tool: write_paper_iterative
    depends_on: [generate_figures, search_related_work]
    inputs:
      refs_json: '{{stages.search_related_work.outputs.file}}'
      figures_manifest_json: '{{ckpt}}/figures_manifest.json'
      venue: arxiv
    outputs:
      file: '{{ckpt}}/full_paper.tex'

Template Variables

| Variable | Value |
| --- | --- |
| `{{ckpt}}` | Checkpoint directory (absolute path) |
| `{{ari_root}}` | ARI project root |
| `{{paper_context}}` | Science-facing experiment summary |
| `{{stages.NAME.outputs.file}}` | Primary output of stage NAME |
| `{{author_name}}` | Top-level field from workflow.yaml |
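Resolution of these placeholders can be sketched as a dotted-path lookup into a nested context dict (illustrative only; ARI's resolver may behave differently, e.g. for missing keys):

```python
import re

def render_template(value, context):
    """Replace {{dotted.path}} placeholders using a nested context dict.

    Sketch of how a value like '{{stages.transform_data.outputs.file}}'
    could be resolved against the pipeline's stage outputs.
    """
    def lookup(match):
        node = context
        for part in match.group(1).split("."):
            node = node[part]  # KeyError here means an undefined variable
        return str(node)
    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, value)
```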

Skills Overview

| Skill | Tools | Type |
| --- | --- | --- |
| ari-skill-hpc | slurm_submit, job_status, job_cancel, run_bash, singularity_* | Deterministic |
| ari-skill-evaluator | make_metric_spec | Deterministic △ |
| ari-skill-idea | survey, generate_ideas | LLM ✷ |
| ari-skill-web | web_search, fetch_url, search_arxiv, search_semantic_scholar, collect_references_iterative | Partial LLM △ |
| ari-skill-memory | add_memory, search_memory, get_node_memory, clear_node_memory | Deterministic |
| ari-skill-transform | nodes_to_science_data | LLM ✷ |
| ari-skill-plot | generate_figures, generate_figures_llm | LLM ✷ |
| ari-skill-paper | write_paper_iterative, review_compiled_paper, generate_section, ... | LLM ✷ |
| ari-skill-paper-re | reproduce_from_paper, extract_metric_from_output | LLM ✷ |
| ari-skill-coding | write_code, run_code, run_bash | Deterministic |
| ari-skill-benchmark | analyze_results, plot, statistical_test | Deterministic |
| ari-skill-review | parse_review, generate_rebuttal, check_rebuttal | LLM ✷ |
| ari-skill-vlm | review_figure, review_table | LLM ✷ |
| ari-skill-orchestrator | run_experiment, get_status, list_runs, get_paper | Deterministic |

✷ LLM-using tools are explicitly annotated. △ = LLM in some tools only. 14 skills total (9 default, 5 additional).

Adding a Skill

  1. Create the skill directory
    ari-skill-yourskill/
    ├── src/server.py
    ├── tests/test_server.py
    └── pyproject.toml
  2. Implement the server
    from mcp.server.fastmcp import FastMCP
    mcp = FastMCP("yourskill")
    
    @mcp.tool()
    def your_tool(param: str) -> dict:
        """Clear description for the LLM."""
        result = pure_computation(param)   # no LLM calls here
        return {"result": result}
    
    if __name__ == "__main__":
        mcp.run()
  3. Register in workflow.yaml
    skills:
      - name: yourskill
        path: '{{ari_root}}/ari-skill-yourskill'
  4. Add a pipeline stage
    pipeline:
      - stage: your_stage
        skill: yourskill
        tool: your_tool
        inputs:
          param: '{{paper_context}}'
        outputs:
          file: '{{ckpt}}/your_output.json'

LLM Configuration

# OpenAI
export ARI_LLM_MODEL=openai/gpt-4o
export OPENAI_API_KEY=sk-...

# Anthropic
export ARI_LLM_MODEL=anthropic/claude-sonnet-4-5
export ANTHROPIC_API_KEY=sk-ant-...

# Local Ollama (free, no API key)
export ARI_LLM_MODEL=qwen3:32b
export LLM_API_BASE=http://127.0.0.1:11434

# Any OpenAI-compatible API (vLLM, LM Studio, etc.)
export ARI_LLM_MODEL=your-model-name
export LLM_API_BASE=http://your-server:8000/v1
Note: New models not in litellm's known list require an explicit provider prefix: openai/gpt-5.2, not just gpt-5.2.

HPC / Execution Backend

ARI uses a pluggable executor model. Set ARI_EXECUTOR to match your environment — no code changes needed.

# Environment variables
export ARI_EXECUTOR=slurm    # local | slurm | pbs | lsf
export ARI_SLURM_PARTITION=your_partition  # SLURM only

# workflow.yaml (reproducibility stage)
inputs:
  cpus: 64
  timeout_minutes: 15
  tolerance_pct: 5.0

For experiments (BFTS), configure in default.yaml:

hpc:
  mode: slurm          # or "local" for laptop
  scheduler: slurm
  max_nodes: 4
  max_walltime: "04:00:00"

To run without a cluster, set ARI_EXECUTOR=local. ARI will execute experiments as local subprocesses.
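The pluggable-executor idea maps naturally onto a dispatch table keyed by `ARI_EXECUTOR`. The sketch below is an assumption about the shape of that dispatch, not ari-core's actual interface (which is richer than a single command string), and the PBS/LSF commands are the schedulers' standard entry points:

```python
import os

def build_submit_command(script_path, executor=None):
    """Return the job-submit command for the configured backend.

    Falls back to the ARI_EXECUTOR environment variable, defaulting to
    'local' (plain subprocess execution, no cluster needed).
    """
    executor = executor or os.environ.get("ARI_EXECUTOR", "local")
    commands = {
        "local": ["bash", script_path],   # run as a local subprocess
        "slurm": ["sbatch", script_path],
        "pbs":   ["qsub", script_path],
        "lsf":   ["bsub", script_path],   # bsub also accepts scripts on stdin
    }
    try:
        return commands[executor]
    except KeyError:
        raise ValueError(f"unknown executor: {executor!r}")
```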

Experiment Monitor (ari viz)

ARI ships a real-time experiment tree visualiser. It shows every BFTS node, its status, metrics, and the full tool-call trace — all in a browser.

(Video: live walkthrough of the ARI dashboard.)

Starting the monitor

ari viz --checkpoint <ckpt_dir> --port 9878

Open http://localhost:9878 in any browser. The dashboard polls /state and reconnects over WebSocket automatically.

Node detail panel

Click any node circle to open the four-tab detail panel for that node.

Status indicators

The footer shows total node count, per-status counts, and the best metric value seen so far.

Architecture

The viz server (ari/viz/server.py) is a pure-stdlib asyncio HTTP + WebSocket handler — no external dependencies beyond the websockets package already installed by ARI. The dashboard is a React/TypeScript SPA built with Vite (ari/viz/frontend/), with modular components for each page (Home, Experiments, Monitor, Tree, Results, Wizard, Idea, Workflow, Settings). The production build is output to ari/viz/static/dist/ and served by the Python server.

Agent Memory

ARI maintains a persistent memory store (~/.ari/memory_store.jsonl) shared across all experiments.

When a node completes successfully, ARI automatically writes a RESULT SUMMARY entry containing the node label and all metric values. Child nodes can then retrieve parent results via search_memory using plain metric keywords.
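The append-and-search pattern over a JSONL store can be sketched as below. The entry fields (`node_id`, `text`) and the substring-matching search are assumptions for illustration; the real add_memory / search_memory tools live in ari-skill-memory:

```python
import json
from pathlib import Path

def add_memory(store_path, node_id, text):
    """Append one memory entry as a single JSON line."""
    entry = {"node_id": node_id, "text": text}
    with open(store_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

def search_memory(store_path, keyword):
    """Return entries whose text contains the keyword (case-insensitive)."""
    if not Path(store_path).exists():
        return []
    with open(store_path, encoding="utf-8") as f:
        entries = [json.loads(line) for line in f if line.strip()]
    return [e for e in entries if keyword.lower() in e["text"].lower()]
```

A child node searching for a parent's RESULT SUMMARY by metric keyword would simply call `search_memory(path, "score")`.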

The Trace tab in the Experiment Monitor reads these entries live via GET /memory/<node_id>.