Manage experiments, artifacts, and metadata from your terminal with diffuse.
The CLI mirrors the web interface and provides scriptable access to all DiffUSE Hub functionality.
Install uv first: `brew install uv`

Install the DiffUSE CLI with a single command. Choose HTTPS or SSH based on your setup:
HTTPS (gh CLI):

uv tool install git+https://github.com/diff-use/webapp.git#subdirectory=cli

Uses GitHub's HTTPS protocol. Requires the GitHub CLI (`gh auth login`) for authentication to private repositories.

SSH:

uv tool install git+ssh://git@github.com/diff-use/webapp.git#subdirectory=cli

Uses SSH protocol. Requires a valid SSH key configured in your GitHub account.
# Show help (both -h and --help work for all commands)
diffuse -h
diffuse list -h
uv tool upgrade diffuse-cli
The CLI automatically checks for updates once per day and notifies you when a new version is available.
If you installed over SSH (`git+ssh://...`) with a password-protected key, uv may appear stuck on "Resolving dependencies..." because the password prompt is hidden. Add `-v` for verbose mode to see the prompt. If you see a GitHub host authenticity warning, run: `ssh-keyscan github.com >> ~/.ssh/known_hosts`

Authenticate with GitHub to access private experiments and perform write operations:
| Command | Description |
|---|---|
| `diffuse auth login` | Login via GitHub device flow |
| `diffuse auth status` | Check authentication and API health |
| `diffuse auth logout` | Clear stored credentials |
$ diffuse auth login
============================================================
Visit: https://github.com/login/device
Enter code: ABCD-1234
============================================================
Waiting for authorization (expires in 900s)...
✓ Authentication successful!
# Get your token from ~/.diffuse/config.json
TOKEN=$(jq -r '.github_token' ~/.diffuse/config.json)
# Check authentication status
curl -H "Authorization: Bearer $TOKEN" \
https://hub.diffuse.science/api/whoami
Manage personal API keys for programmatic access and CI/automation workflows:
| Command | Description |
|---|---|
| `diffuse keys list` | List your personal API keys |
| `diffuse keys create NAME` | Create a new personal API key (`--description` to add a purpose, `--save` to store locally, `--expires-in-days N` for auto-expiry) |
| `diffuse keys revoke KEY_ID` | Revoke an API key |
| `diffuse keys set KEY` | Store an API key in local config |
| `diffuse keys clear` | Remove stored API key from local config |
# Create a key and save it to ~/.diffuse/config.json
$ diffuse keys create "ci-runner" --save
✓ API key created: dfk_xxxxxxxxxxxxxxxxxxxx
✓ Key saved to ~/.diffuse/config.json
# Use the DIFFUSE_API_KEY environment variable for CI/automation
export DIFFUSE_API_KEY=dfk_xxxxxxxxxxxxxxxxxxxx
diffuse list
Credential precedence: `DIFFUSE_API_KEY` env var > `api_key` in config > `github_token` in config. Set `DIFFUSE_API_KEY` in CI to avoid committing credentials.
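The same precedence can be mimicked in a wrapper script when calling the API directly. A minimal sketch — the `resolve_token` and `json_field` helpers (and the `DIFFUSE_CONFIG` override) are hypothetical conveniences, not part of the CLI, and the sed-based JSON parsing assumes simple, quote-free values (use jq for anything richer):

```shell
# Hypothetical helper mirroring the documented precedence:
# DIFFUSE_API_KEY env var > api_key in config > github_token in config.

json_field() {
  # Crude single-line JSON field extraction; jq is the more robust choice.
  cfg="${DIFFUSE_CONFIG:-$HOME/.diffuse/config.json}"
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$cfg" 2>/dev/null
}

resolve_token() {
  if [ -n "$DIFFUSE_API_KEY" ]; then
    printf '%s\n' "$DIFFUSE_API_KEY"
  elif [ -n "$(json_field api_key)" ]; then
    json_field api_key
  else
    json_field github_token
  fi
}
```

Usage would then look like `curl -H "Authorization: Bearer $(resolve_token)" ...`.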
API_KEY=dfk_xxxxxxxxxxxxxxxxxxxx
# List personal keys
curl -H "Authorization: Bearer $API_KEY" \
https://hub.diffuse.science/api/api-keys/personal
# Create a personal key
curl -X POST -H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{"name": "ci-runner"}' \
https://hub.diffuse.science/api/api-keys/personal
# Revoke a key
curl -X DELETE -H "Authorization: Bearer $API_KEY" \
https://hub.diffuse.science/api/api-keys/personal/{key_id}
Create, view, edit, and manage experiments from the command line:
| Command | Description |
|---|---|
| `diffuse list` | List all experiments (tree format by default) |
| `diffuse view <id>` | View experiment details |
| `diffuse types` | List available experiment types |
| `diffuse create` | Create a new experiment |
| `diffuse edit <id>` | Edit an experiment |
| `diffuse delete <id>` | Delete an experiment |
| `diffuse publish <id>` | Publish to public catalog |
| `diffuse unpublish <id>` | Unpublish from catalog |
# List all experiments (tree format, interactive pagination)
diffuse list
# Query syntax (field:value, AND/OR/NOT, comparisons, wildcards)
diffuse list -q "tag:crystal AND type:mx"
diffuse list -q "author:smith OR author:jones"
diffuse list -q "created:>2024-01-01 AND public:true"
diffuse list -q "beamline:*" # All experiments with beamline set
# Text search (searches title, summary, tags)
diffuse list -q "lysozyme"
# Output formats: tree (default), table, json, yaml
diffuse list -f table
diffuse list -f json
# Pagination: use --page/-p to get a specific page
diffuse list --page 1 --page-size 10
diffuse list -p 2 -n 20
# Sort experiments
diffuse list -s recent
diffuse list -s title
# Combine query and sorting
diffuse list -q "tag:processed AND type:mx" -s title
curl https://hub.diffuse.science/api/experiments
# Interactive prompt
diffuse create
# With flags
diffuse create \
--title "ResNet Training Run" \
--summary "Training ResNet-50 on ImageNet" \
--tags "vision,resnet,imagenet"
# With experiment type (see 'diffuse types' for available types)
diffuse create \
-t "My Protocol" \
--type protocol
# With markdown file
diffuse create \
--title "Experiment Log" \
--markdown ./experiment.md
curl -X POST https://hub.diffuse.science/api/experiments \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"title": "ResNet Training Run",
"summary": "Training ResNet-50 on ImageNet",
"tags": ["vision", "resnet", "imagenet"]
}'
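In scripts, the new experiment's ID can be pulled out of the JSON response. `jq -r '.id'` is the robust option; for minimal images without jq, a sed fallback works. The `extract_id` helper below is illustrative, not part of the CLI, and assumes the `id` field is a quoted string as in the examples above:

```shell
# Illustrative sed-based fallback for reading "id" from a JSON response.
extract_id() {
  sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example with a response shaped like the API output above:
printf '{"id": "EXP-124", "title": "ResNet Training Run"}\n' | extract_id
# → EXP-124
```

Piping the curl command's output into `extract_id` captures the ID for later `upload`/`publish` steps.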
# View experiment details (tree format by default)
diffuse view EXP-123
# Output formats: tree (default), table, json, yaml
diffuse view EXP-123 -f table
diffuse view EXP-123 -f json
curl https://hub.diffuse.science/api/experiments/EXP-123
# Update title or summary
diffuse edit EXP-123 --title "Updated Title"
diffuse edit EXP-123 --summary "New summary"
# Update tags
diffuse edit EXP-123 --tags "new,tags,here"
# Open markdown in $EDITOR
diffuse edit EXP-123 --editor
# Update from markdown file
diffuse edit EXP-123 --markdown ./updated.md
curl -X PATCH https://hub.diffuse.science/api/experiments/EXP-123 \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"title": "Updated Title",
"summary": "New summary"
}'
# Publish experiment
diffuse publish EXP-123
# Unpublish experiment
diffuse unpublish EXP-123
# Publish
curl -X POST https://hub.diffuse.science/api/experiments/EXP-123/publish \
-H "Authorization: Bearer $TOKEN"
# Unpublish
curl -X POST https://hub.diffuse.science/api/experiments/EXP-123/unpublish \
-H "Authorization: Bearer $TOKEN"
Find experiments by text, tags, metadata fields, and more. All filtering happens server-side for efficiency.
# Field:value syntax
diffuse list -q "tag:crystal"
diffuse list -q "type:mx"
diffuse list -q "author:smith"
# Text search (searches title, summary, tags, ID)
diffuse list -q "lysozyme"
diffuse list -q '"protein structure"' # Quoted phrase
# AND - all conditions must match
diffuse list -q "tag:crystal AND type:mx"
# OR - either condition matches
diffuse list -q "tag:xray OR tag:neutron"
# NOT - negate condition
diffuse list -q "NOT public:false"
# Grouping with parentheses
diffuse list -q "(tag:xray OR tag:neutron) AND type:mx"
# Date comparisons
diffuse list -q "created:>2024-01-01"
diffuse list -q "updated:>=2024-06-01"
diffuse list -q "created:>2024-01-01 AND created:<2024-12-31"
# Numeric metadata comparisons
diffuse list -q "resolution:>1.5"
diffuse list -q "epochs:>=100"
# Prefix wildcard
diffuse list -q "tag:cryst*"
diffuse list -q "title:Lysoz*"
# Match any value (field exists)
diffuse list -q "beamline:*"
# Sort by most recent (default)
diffuse list -s recent
# Sort by oldest first
diffuse list -s oldest
# Sort alphabetically by title
diffuse list -s title
# Complex query with sorting
diffuse list -q "(tag:crystal OR tag:protein) AND created:>2024-01-01" -s title
# Paginated results
diffuse list -q "type:mx" --page 2 --page-size 10
The API endpoint `GET /api/experiments` supports the unified `q` parameter with full query syntax:

| Parameter | Description | Example |
|---|---|---|
| `q` | Query string with search syntax | `?q=tag:crystal AND type:mx` |
| `search` | Text search (legacy) | `?search=resnet` |
| `tag` | Filter by tag (legacy) | `?tag=vision` |
| `sort` | Sort field | `?sort=title` |
| `order` | Sort direction | `?order=asc` |
| `limit` | Max results (1-200) | `?limit=20` |
| `offset` | Skip N results | `?offset=20` |
# Field:value syntax
curl "https://hub.diffuse.science/api/experiments?q=tag:crystal"
# Boolean operators (URL-encode spaces)
curl "https://hub.diffuse.science/api/experiments?q=tag:crystal%20AND%20type:mx"
# Comparison operators
curl "https://hub.diffuse.science/api/experiments?q=created:%3E2024-01-01"
# Text search
curl "https://hub.diffuse.science/api/experiments?q=lysozyme"
# Sort by title A-Z
curl "https://hub.diffuse.science/api/experiments?sort=title&order=asc"
# Page 2 with 10 results per page
curl "https://hub.diffuse.science/api/experiments?limit=10&offset=10"
# Query + sort + pagination
curl "https://hub.diffuse.science/api/experiments?q=tag:crystal%20AND%20public:true&sort=updated_at&order=desc&limit=20" \
-H "Authorization: Bearer $TOKEN"
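Query operators must be URL-encoded when calling the API directly. A small helper covering just the characters this syntax uses (spaces, `>`, `<`) — a sketch for convenience; a full percent-encoder (e.g. jq's `@uri`) is safer for arbitrary input:

```shell
# Encode the handful of characters the query syntax uses.
encode_query() {
  printf '%s' "$1" | sed -e 's/ /%20/g' -e 's/>/%3E/g' -e 's/</%3C/g'
}

# Usage:
# curl "https://hub.diffuse.science/api/experiments?q=$(encode_query 'created:>2024-01-01 AND type:mx')"
```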
# Create experiment with tags
diffuse create --title "My Experiment" --tags "vision,resnet,imagenet"
# Update tags on existing experiment
diffuse edit EXP-123 --tags "new,tags,here"
# API: create with tags
curl -X POST https://hub.diffuse.science/api/experiments \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"title": "My Experiment", "tags": ["vision", "resnet"]}'
# API: update tags
curl -X PATCH https://hub.diffuse.science/api/experiments/EXP-123 \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"tags": ["new", "tags"]}'
# Set individual metadata field
diffuse metadata set EXP-123 learning_rate 0.001
# Apply metadata from JSON file
echo '{"learning_rate": "0.001", "batch_size": "32"}' > metadata.json
diffuse metadata apply EXP-123 -f metadata.json
# Get metadata
diffuse metadata get EXP-123
# API: set metadata
curl -X PATCH https://hub.diffuse.science/api/experiments/EXP-123/metadata \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"metadata": {"learning_rate": "0.001", "batch_size": "32"}}'
# Get metadata facets - shows all field values and counts
curl https://hub.diffuse.science/api/metadata/facets
# Response example:
# [
# {
# "field_key": "dataset",
# "field_display_name": "Dataset",
# "values": {"mnist": 5, "cifar10": 3, "imagenet": 2}
# },
# {
# "field_key": "optimizer",
# "field_display_name": "Optimizer",
# "values": {"adam": 8, "sgd": 4}
# }
# ]
Link experiments together to track dependencies, data reuse, and related work:
Two relationship types are supported:

- `uses`: Indicates one experiment uses data or results from another (e.g., "Experiment A uses the dataset from Experiment B")
- `relates_to`: General association between experiments (e.g., related analyses, follow-up studies)

| Command | Description |
|---|---|
| `diffuse relationships list <id>` | List all relationships for an experiment |
| `diffuse relationships add <from> <to>` | Create a relationship between experiments |
| `diffuse relationships remove <from> <to>` | Remove a relationship between experiments |
# List all relationships for an experiment
diffuse relationships list EXP-5
# Output formats: tree (default), table, json, yaml
diffuse relationships list EXP-5 -f table
diffuse relationships list EXP-5 -f json
curl https://hub.diffuse.science/api/experiments/EXP-5/relationships \
-H "Authorization: Bearer $TOKEN"
# Create a "uses" relationship (EXP-10 uses data from EXP-5)
diffuse relationships add EXP-10 EXP-5 --type uses
# Create a "relates_to" relationship (general association)
diffuse relationships add EXP-10 EXP-7 --type relates_to
# Default type is "uses" if not specified
diffuse relationships add EXP-10 EXP-5
curl -X POST https://hub.diffuse.science/api/experiments/EXP-10/relationships \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"from_experiment_id": "EXP-10",
"to_experiment_id": "EXP-5",
"relationship_type": "uses"
}'
# Remove a "uses" relationship
diffuse relationships remove EXP-10 EXP-5 --type uses
# Remove a "relates_to" relationship
diffuse relationships remove EXP-10 EXP-7 --type relates_to
curl -X DELETE "https://hub.diffuse.science/api/experiments/EXP-10/relationships?from_experiment_id=EXP-10&to_experiment_id=EXP-5&relationship_type=uses" \
-H "Authorization: Bearer $TOKEN"
Upload artifacts and download experiment data:
| Command | Description |
|---|---|
| `diffuse upload <id> <paths...>` | Upload files or directories to experiment |
| `diffuse download <id> <name>` | Download an artifact |
# Upload a single file
diffuse upload EXP-123 ./model.pth
# Upload a directory (recursive, preserves structure)
diffuse upload EXP-123 ./checkpoints/
# Upload multiple files via glob pattern
diffuse upload EXP-123 *.log
diffuse upload EXP-123 results_*.csv
# Upload specific files
diffuse upload EXP-123 model.pth config.yaml metrics.json
# Mix files and directories
diffuse upload EXP-123 ./data/ summary.txt
# Control upload parallelism (default: 32 concurrent streams)
diffuse upload EXP-123 large_file.bin -c 16
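The uploader avoids re-hashing unchanged files by keeping a checksum cache (`~/.diffuse/checksum_cache.json`). The idea can be sketched in plain shell — this is an illustration of the technique under a toy plain-text cache format, not the CLI's actual implementation:

```shell
# Recompute a file's checksum only when its size+mtime fingerprint changes.
# Toy cache format, one line per file: "<fingerprint> <path> <sha256>"

cached_sha256() {
  path=$1
  # GNU stat first, BSD stat as fallback
  fp=$(stat -c '%s-%Y' "$path" 2>/dev/null || stat -f '%z-%m' "$path")
  hit=$(grep -F "$fp $path " "$CACHE" 2>/dev/null | head -n1)
  if [ -n "$hit" ]; then
    printf '%s\n' "${hit##* }"           # cache hit: reuse stored digest
  else
    sum=$(sha256sum "$path" | cut -d' ' -f1)
    printf '%s %s %s\n' "$fp" "$path" "$sum" >> "$CACHE"
    printf '%s\n' "$sum"
  fi
}
```

On a second upload of an unchanged file, the fingerprint matches and the stored digest is reused instead of re-reading the whole file.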
Upload behavior notes:

- A checksum cache (`~/.diffuse/checksum_cache.json`) avoids recomputing checksums for unchanged files
- Use `--resume` to continue interrupted uploads
- Parallelism is configurable (`-c`)
- Directory uploads automatically skip `__pycache__`, `.git`, `.venv`, `node_modules`, `*.pyc`, and other common non-essential files.

# 1. Create upload session
curl -X POST https://hub.diffuse.science/api/uploads/session \
-H "Content-Type: application/json" \
-d '{
"experiment_id": "EXP-123",
"filename": "model.pth",
"content_type": "application/octet-stream",
"size_bytes": 102400,
"metadata_document": {}
}'
# 2. Upload chunks to presigned URLs (from response above)
# Upload each part to its presigned URL
curl -X PUT "$PRESIGNED_URL_PART_1" \
--data-binary @model.pth.part1 \
-H "Content-Type: application/octet-stream"
# Capture ETag from response headers for finalization
# ETag: "abc123..."
# 3. Finalize upload
curl -X POST https://hub.diffuse.science/api/uploads/sess_xyz/finalize \
-H "Content-Type: application/json" \
-d '{
"session_id": "sess_xyz",
"parts": [
{"part_number": 1, "etag": "abc123"},
{"part_number": 2, "etag": "def456"}
],
"metadata_document": {}
}'
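The CLI drives this chunked flow for you; when scripting against the raw API, the file must be cut into parts first. A sketch using coreutils — the `split_for_upload` helper and the 5 MiB default part size are assumptions for illustration; the session response governs real part sizing:

```shell
# Cut a local file into numbered parts for the session's presigned URLs.
split_for_upload() {
  file=$1
  chunk=${2:-$((5 * 1024 * 1024))}   # assumed default: 5 MiB parts
  split -b "$chunk" -d "$file" "$file.part"
  ls "$file".part*
}

# Each emitted part then goes to its presigned URL, e.g.:
# curl -X PUT "$PRESIGNED_URL_PART_1" --data-binary @model.pth.part00 \
#      -H "Content-Type: application/octet-stream"
```

Capture the ETag response header for each part; the finalize call above needs the full part-number/ETag list.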
# Download artifact
diffuse download EXP-123 model.pth
# Save to specific location
diffuse download EXP-123 model.pth --output ./downloads/model.pth
# Get presigned download URL
curl https://hub.diffuse.science/api/experiments/EXP-123/artifacts/model.pth/download \
-H "Authorization: Bearer $TOKEN"
# Download file using the presigned URL from response
curl -o model.pth "$PRESIGNED_DOWNLOAD_URL"
Manage artifacts across experiments with connect/disconnect operations:
| Command | Description |
|---|---|
| `diffuse artifacts list` | List all artifacts (global view) |
| `diffuse artifacts list --experiment EXP-5` | List artifacts connected to experiment |
| `diffuse artifacts info <artifact-id>` | Show artifact details and connections |
| `diffuse artifacts connect EXP-5 <id> [<id>...]` | Connect one or more artifacts to experiment |
| `diffuse artifacts disconnect EXP-5 <id> [<id>...]` | Disconnect one or more artifacts from experiment |
| `diffuse artifacts delete <artifact-id>` | Delete artifact (fails if connected) |
| `diffuse artifacts buckets` | List configured external buckets (OSN, S3) |
| `diffuse artifacts browse <bucket>` | Browse contents of an external bucket |
| `diffuse artifacts connect EXP-1 <keys> --bucket <name>` | Connect external artifacts to an experiment |
# List all artifacts (tree format by default)
diffuse artifacts list
# Output formats: tree (default), table, json, yaml
diffuse artifacts list -f table
# Filter by experiment
diffuse artifacts list --experiment EXP-5
# Show orphaned artifacts (no connections)
diffuse artifacts list --orphaned
# List all artifacts
curl https://hub.diffuse.science/api/artifacts \
-H "Authorization: Bearer $TOKEN"
# Filter by experiment
curl "https://hub.diffuse.science/api/artifacts?experiment_id=EXP-5" \
-H "Authorization: Bearer $TOKEN"
# Show artifact details including all connected experiments
diffuse artifacts info <artifact-id>
# Example output:
# Artifact: model.pth
# Type: internal
# Size: 245.3 MB
# Status: ready
# Connected to 3 experiments:
# - EXP-5: ./baseline-model.pth
# - EXP-7: ./improved-model.pth
# - EXP-12: ./final-model.pth
curl https://hub.diffuse.science/api/artifacts/<artifact-id> \
-H "Authorization: Bearer $TOKEN"
# Connect existing artifact to experiment
diffuse artifacts connect EXP-7 <artifact-id>
# Connect multiple artifacts at once
diffuse artifacts connect EXP-7 artifact-1 artifact-2 artifact-3
# Disconnect artifact from experiment
diffuse artifacts disconnect EXP-7 <artifact-id>
# Disconnect multiple artifacts at once
diffuse artifacts disconnect EXP-7 artifact-1 artifact-2
# Example workflow: Upload to Exp A, connect to Exp B
diffuse upload EXP-5 ./model.pth # Returns artifact-id: abc123
diffuse artifacts connect EXP-7 abc123
# Connect single artifact
curl -X POST https://hub.diffuse.science/api/experiments/EXP-7/artifacts/connect \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"artifact_id": "abc123"}'
# Batch connect multiple artifacts
curl -X POST https://hub.diffuse.science/api/artifacts/batch-connect \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"experiment_id": "EXP-7",
"artifact_ids": ["artifact-1", "artifact-2", "artifact-3"]
}'
# Disconnect artifact
curl -X DELETE https://hub.diffuse.science/api/experiments/EXP-7/artifacts/abc123 \
-H "Authorization: Bearer $TOKEN"
# Delete artifact (only works if disconnected from all experiments)
diffuse artifacts delete <artifact-id>
# If still connected, you'll see:
# Error: Cannot delete artifact. Still connected to 2 experiment(s):
# - EXP-5
# - EXP-7
# Use 'diffuse artifacts disconnect' first.
# Admin force delete (even if connected)
diffuse artifacts delete <artifact-id> --force
# Admin force delete
curl -X DELETE https://hub.diffuse.science/api/artifacts/<artifact-id>/force-delete \
-H "Authorization: Bearer $TOKEN"
Collections provide logical grouping for artifacts, enabling organization, sharing, and versioning across multiple experiments. Collections support an M:M:M relationship: Experiments ↔ Collections ↔ Artifacts.
Collections are identified in `COL-N` format (e.g., `COL-1`, `COL-42`) for easy citation and reference.

| Command | Description |
|---|---|
| `diffuse collections list` | List all collections |
| `diffuse collections create "Name"` | Create a new collection |
| `diffuse collections info COL-5` | Show collection details |
| `diffuse collections update COL-5` | Update collection metadata |
| `diffuse collections delete COL-5` | Delete a collection |
| `diffuse collections add COL-5 <artifact-ids>` | Add artifacts to collection |
| `diffuse collections remove COL-5 <artifact-ids>` | Remove artifacts from collection |
| `diffuse collections connect COL-5 EXP-10` | Connect collection to experiment |
| `diffuse collections disconnect COL-5 EXP-10` | Disconnect from experiment |
| `diffuse collections snapshot COL-5` | Create immutable snapshot |
| `diffuse collections snapshots COL-5` | List collection snapshots |
| `diffuse collections add COL-5 <keys> --bucket <name>` | Add external artifacts to a collection |
# List all collections (tree format)
diffuse collections list
# Table format
diffuse collections list -f table
# Search by name
diffuse collections list --search "training"
# Filter by tag
diffuse collections list --tag ml --tag 2024
# Include snapshots
diffuse collections list --include-snapshots
# JSON output for scripting
diffuse collections list -f json
# List collections
curl https://hub.diffuse.science/api/collections \
-H "Authorization: Bearer $TOKEN"
# With search and tag filters
curl "https://hub.diffuse.science/api/collections?search=training&tag=ml" \
-H "Authorization: Bearer $TOKEN"
# Create a new collection
diffuse collections create "Training Data v2" \
--description "Updated dataset with augmentations" \
--tag ml --tag 2024
# Update collection metadata
diffuse collections update COL-5 --name "Training Data v2.1"
diffuse collections update COL-5 --description "New description"
diffuse collections update COL-5 --add-tag production
diffuse collections update COL-5 --remove-tag draft
# Delete (blocked if connected to experiments)
diffuse collections delete COL-5
# Force delete (admin only)
diffuse collections delete COL-5 --force
# Create collection
curl -X POST https://hub.diffuse.science/api/collections \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "Training Data v2",
"description": "Updated dataset with augmentations",
"tags": ["ml", "2024"]
}'
# Update collection
curl -X PATCH https://hub.diffuse.science/api/collections/COL-5 \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "Training Data v2.1"}'
# Add artifacts to a collection
diffuse collections add COL-5 <artifact-id>
# Add multiple artifacts at once
diffuse collections add COL-5 artifact-1 artifact-2 artifact-3
# Add with custom path (how it appears in collection)
diffuse collections add COL-5 artifact-1 --path "./custom/path.csv"
# Remove artifact from collection
diffuse collections remove COL-5 <artifact-id>
# Remove multiple artifacts
diffuse collections remove COL-5 artifact-1 artifact-2
# Add artifact
curl -X POST https://hub.diffuse.science/api/collections/COL-5/artifacts \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"artifact_id": "abc123", "artifact_path": "./data.csv"}'
# Batch add
curl -X POST https://hub.diffuse.science/api/collections/COL-5/artifacts/batch \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"artifact_ids": ["artifact-1", "artifact-2"]}'
# Remove artifact
curl -X DELETE https://hub.diffuse.science/api/collections/COL-5/artifacts/abc123 \
-H "Authorization: Bearer $TOKEN"
# Connect a collection to an experiment
diffuse collections connect COL-5 EXP-10
# Connect with an alias (context-specific name)
diffuse collections connect COL-5 EXP-10 --alias "Primary Training Data"
# Disconnect collection from experiment
diffuse collections disconnect COL-5 EXP-10
# Example workflow: Multiple experiments sharing one collection
diffuse collections connect COL-5 EXP-10 # Base experiment
diffuse collections connect COL-5 EXP-11 # Ablation study
diffuse collections connect COL-5 EXP-12 # Extended training
# Connect collection to experiment
curl -X POST https://hub.diffuse.science/api/experiments/EXP-10/collections/connect \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"collection_id": "COL-5", "alias": "Primary Training Data"}'
# Disconnect
curl -X DELETE https://hub.diffuse.science/api/experiments/EXP-10/collections/COL-5 \
-H "Authorization: Bearer $TOKEN"
# Create an immutable snapshot
diffuse collections snapshot COL-5 --note "Pre-publication freeze"
# List all snapshots of a collection
diffuse collections snapshots COL-5
# Example workflow: Publication versioning
diffuse collections create "ImageNet Training v1"
diffuse collections add COL-1 artifact-1 artifact-2 artifact-3
# ... later, before publishing ...
diffuse collections snapshot COL-1 --note "Paper submission v1"
# ... after revisions ...
diffuse collections add COL-1 artifact-4
diffuse collections snapshot COL-1 --note "Camera ready version"
# Create snapshot
curl -X POST https://hub.diffuse.science/api/collections/COL-5/snapshots \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"note": "Pre-publication freeze"}'
# List snapshots
curl https://hub.diffuse.science/api/collections/COL-5/snapshots \
-H "Authorization: Bearer $TOKEN"
Browse and attach files from external S3-compatible buckets (OSN, AWS S3, MinIO, etc.) to experiments and collections:
| Command | Description |
|---|---|
| `diffuse artifacts buckets` | List all configured external buckets |
| `diffuse artifacts browse <bucket>` | Browse files in an external bucket |
| `diffuse artifacts connect <exp> <keys...> --bucket <name>` | Connect external files to an experiment |
| `diffuse collections add <col> <keys...> --bucket <name>` | Add external files to a collection |
# List all configured external buckets
diffuse artifacts buckets
# Table output (default):
# Name Type Bucket Public Endpoint
# osn-diffuse-chess osn diffuse-chess-data yes https://sdsc.osn.xsede.org
# aws-shared-datasets s3 shared-datasets no —
# JSON output for scripting
diffuse artifacts buckets -f json
# Browse the root of a bucket
diffuse artifacts browse osn-diffuse-chess
# Browse a specific prefix (directory)
diffuse artifacts browse osn-diffuse-chess -p data/2024/
# Force refresh (bypass server-side cache)
diffuse artifacts browse osn-diffuse-chess -p data/ --refresh
# JSON output for scripting
diffuse artifacts browse osn-diffuse-chess -f json
# List external buckets
curl https://hub.diffuse.science/api/external-buckets \
-H "Authorization: Bearer $TOKEN"
# Browse bucket contents
curl "https://hub.diffuse.science/api/external-buckets/osn-diffuse-chess?prefix=data/2024/" \
-H "Authorization: Bearer $TOKEN"
# Force refresh
curl "https://hub.diffuse.science/api/external-buckets/osn-diffuse-chess?prefix=data/&refresh=true" \
-H "Authorization: Bearer $TOKEN"
# Connect specific files to an experiment
diffuse artifacts connect EXP-1 data/scan001.h5 data/scan002.h5 \
--bucket osn-diffuse-chess
# Connect all files under a prefix (recursive)
diffuse artifacts connect EXP-1 \
--bucket osn-diffuse-chess --prefix data/2024/
# Skip confirmation prompt
diffuse artifacts connect EXP-1 data/scan001.h5 \
--bucket osn-diffuse-chess -y
# Add external files to a collection
diffuse collections add COL-1 data/scan001.h5 data/scan002.h5 \
--bucket osn-diffuse-chess
# Add all files under a prefix to a collection
diffuse collections add COL-1 \
--bucket osn-diffuse-chess --prefix data/2024/november/
Use `--prefix` to recursively attach entire directories. The CLI resolves all files under the prefix via the browse API before attaching.
# Attach external artifacts to an experiment
curl -X POST https://hub.diffuse.science/api/external-artifacts/batch-attach \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"bucket_name": "osn-diffuse-chess",
"artifacts": [
{"s3_key": "data/scan001.h5"},
{"s3_key": "data/scan002.h5"}
]
}'
# Attach external artifacts to a collection
curl -X POST https://hub.diffuse.science/api/collections/COL-1/external-artifacts/batch-attach \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"bucket_name": "osn-diffuse-chess",
"artifacts": [
{"s3_key": "data/scan001.h5"},
{"s3_key": "data/scan002.h5"}
]
}'
# 1. See what buckets are available
diffuse artifacts buckets
# 2. Browse to find the files you need
diffuse artifacts browse osn-diffuse-chess
diffuse artifacts browse osn-diffuse-chess -p experiments/run42/
# 3. Connect specific files to your experiment
diffuse artifacts connect EXP-15 \
experiments/run42/results.h5 experiments/run42/config.yaml \
--bucket osn-diffuse-chess
# 4. Or connect an entire directory at once
diffuse artifacts connect EXP-15 \
--bucket osn-diffuse-chess --prefix experiments/run42/
# 5. Create a collection for a shared dataset
diffuse collections create "Chess OCR Training Set"
diffuse collections add COL-3 \
--bucket osn-diffuse-chess --prefix training/ocr/v2/
# 6. Connect that collection to multiple experiments
diffuse collections connect COL-3 EXP-15
diffuse collections connect COL-3 EXP-16
View and manage experiment metadata fields:
| Command | Description |
|---|---|
| `diffuse metadata get <id>` | Get experiment metadata |
| `diffuse metadata set <id> <key> <value>` | Set a metadata field |
| `diffuse metadata apply <id> -f <file>` | Apply metadata from file |
| `diffuse metadata fields` | List available field definitions |
# Set individual field
diffuse metadata set EXP-123 learning_rate 0.001
# Get all metadata
diffuse metadata get EXP-123
# Apply from JSON file
echo '{"learning_rate": 0.001, "batch_size": 32}' > metadata.json
diffuse metadata apply EXP-123 -f metadata.json
# Set metadata
curl -X PATCH https://hub.diffuse.science/api/experiments/EXP-123/metadata \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"metadata": {
"learning_rate": 0.001,
"batch_size": 32
}
}'
# Get metadata
curl https://hub.diffuse.science/api/experiments/EXP-123/metadata
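For automated runs it can help to build the metadata document programmatically before applying it. A minimal sketch — the `make_metadata` helper is hypothetical, emits every value as a JSON string, and assumes keys and values contain no quotes (use jq for anything richer):

```shell
# Build a flat JSON object from KEY=VALUE arguments.
make_metadata() {
  out="{"; sep=""
  for kv in "$@"; do
    k=${kv%%=*}; v=${kv#*=}
    out="$out$sep\"$k\": \"$v\""; sep=", "
  done
  printf '%s}' "$out"
}

# Usage:
# make_metadata learning_rate=0.001 batch_size=32 > metadata.json
# diffuse metadata apply EXP-123 -f metadata.json
```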
# View last 20 activities (tree format by default)
diffuse activity
# Limit results
diffuse activity -n 50
# Output formats: tree (default), table, json, yaml
diffuse activity -f table
diffuse activity -f json
curl -H "Authorization: Bearer $TOKEN" \
https://hub.diffuse.science/api/activity
Manage CLI configuration stored in ~/.diffuse/config.json:
| Command | Description |
|---|---|
| `diffuse config view` | View current configuration |
| `diffuse config set <key> <value>` | Set configuration value |
| `diffuse config reset` | Reset configuration |
| `diffuse config env` | Show current environment |
| `diffuse config env --list` | List available environments |
| `diffuse config env <name>` | Switch environment (local/staging/prod) |

The CLI targets the production environment (`https://app.diffuse.science`) by default. Use `diffuse config env --list` to see all environments.

# Show available environments
diffuse config env --list
# Switch to staging
diffuse config env staging
# Switch to local development
diffuse config env local
# Switch back to production
diffuse config env prod
# Check current environment
diffuse config env
The server to talk to is resolved in this order (highest precedence first):

1. `--server` flag on each command
2. `DIFFUSE_API_URL` environment variable (full URL)
3. `DIFFUSE_ENV` environment variable (local/staging/prod)
4. Environment stored in the config file (`~/.diffuse/config.json`)
5. Production default (`https://app.diffuse.science`)

# View current config
diffuse config view
# Use per-command override
diffuse list --server https://custom.example.com
# Use environment variable (temporary)
export DIFFUSE_API_URL=https://custom.example.com
diffuse list
# Check authentication status (shows current API and GitHub Client ID)
diffuse auth status
Credentials are stored in `~/.diffuse/config.json` with restricted permissions (0600). Never commit this file to version control.

Define metadata fields that can be associated with experiment types:
| Command | Description |
|---|---|
| `diffuse fields list [--include-inactive]` | List all field definitions |
| `diffuse fields create -k KEY -n NAME` | Create a new metadata field |
| `diffuse fields update FIELD_ID` | Update a field's properties |
| `diffuse fields delete FIELD_ID` | Delete a metadata field |
# List all field definitions
diffuse fields list
# Include inactive fields
diffuse fields list --include-inactive
# Create a numeric field with unit
diffuse fields create \
-k resolution \
-n "Resolution" \
--type NUMBER \
--unit "angstrom" \
--required
# Create an enum field with allowed values
diffuse fields create \
-k status \
-n "Status" \
--type ENUM \
--allowed-values "pending,running,completed,failed"
# Create a field with a default value
diffuse fields create \
-k priority \
-n "Priority" \
--type NUMBER \
--default "5"
# Update a field name
diffuse fields update resolution --name "Crystal Resolution"
# Deactivate a field
diffuse fields update resolution --inactive
# Delete a field (with confirmation)
diffuse fields delete resolution
# Delete without confirmation prompt
diffuse fields delete resolution -y
Manage experiment types and their associated metadata fields:
| Command | Description |
|---|---|
| `diffuse types list [--include-inactive]` | List all experiment types |
| `diffuse types view TYPE_ID` | View type details with associated fields |
| `diffuse types create -n NAME -d DISPLAY_NAME` | Create a new experiment type |
| `diffuse types update TYPE_ID` | Update a type's properties |
| `diffuse types delete TYPE_ID` | Delete an experiment type |
| `diffuse types set-fields TYPE_ID FIELD_IDS...` | Associate metadata fields with a type |
# List all experiment types
diffuse types list
# Include inactive types
diffuse types list --include-inactive
# View type with associated fields
diffuse types view protocol
# Create a new experiment type
diffuse types create \
-n crystallography \
-d "Crystallography Experiment" \
--description "Standard MX data collection"
# Create a type with a template file
diffuse types create \
-n cryo-em \
-d "Cryo-EM Experiment" \
--template ./template.md
# Update a type's display name
diffuse types update crystallography --display-name "X-ray Crystallography"
# Update a type's template
diffuse types update crystallography --template ./new-template.md
# Deactivate a type
diffuse types update crystallography --inactive
# Associate metadata fields with a type
diffuse types set-fields crystallography resolution beamline wavelength
# Delete a type (with confirmation)
diffuse types delete crystallography
# Delete without confirmation prompt
diffuse types delete crystallography -y
The CLI is designed for scripting and CI/CD integration:
#!/bin/bash
set -euo pipefail
# Create experiment and upload results
EXP_ID=$(diffuse create \
--title "Automated Run $(date +%Y%m%d)" \
--tags "automation,ci" \
--format json | jq -r '.id')
echo "Created experiment: $EXP_ID"
# Upload entire checkpoints directory (recursive)
diffuse upload "$EXP_ID" ./checkpoints/
# Or upload individual files
diffuse upload "$EXP_ID" ./model.pth
diffuse upload "$EXP_ID" ./metrics.json
# Set metadata
diffuse metadata set "$EXP_ID" commit_sha "$GITHUB_SHA"
diffuse metadata set "$EXP_ID" build_number "$BUILD_NUMBER"
# Publish only on success: set -e aborts the script if any step above failed
diffuse publish "$EXP_ID"
name: Train and Track
on: [push]
jobs:
train:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Install uv
run: curl -LsSf https://astral.sh/uv/install.sh | sh
- name: Install DiffUSE CLI
run: |
git clone https://github.com/diff-use/webapp.git /tmp/diffuse
cd /tmp/diffuse/cli
uv tool install .
- name: Authenticate
run: |
mkdir -p ~/.diffuse
echo '{"github_token": "${{ secrets.DIFFUSE_TOKEN }}"}' > ~/.diffuse/config.json
- name: Create experiment
run: |
EXP_ID=$(diffuse create --title "Run ${{ github.run_number }}" --format json | jq -r '.id')
echo "EXP_ID=$EXP_ID" >> $GITHUB_ENV
- name: Train model
run: python train.py
- name: Upload results
run: diffuse upload ${{ env.EXP_ID }} ./model.pth
Manage governance tasks for data quality and compliance.
# List all tasks
diffuse governance list
# Filter by type (duplicate_review, missing_metadata, validation_failure, enrichment_required)
diffuse governance list --type missing_metadata
# Filter by status (open, in_progress, resolved, dismissed)
diffuse governance list --status open
# Filter by experiment
diffuse governance list --experiment EXP-1
# Show only tasks assigned to me
diffuse governance list --mine
# Table output format
diffuse governance list -f table
# View task details
diffuse governance view <task-id>
# JSON output
diffuse governance view <task-id> -f json
# Create a task (required: --title, --type)
diffuse governance create \
--title "Missing sample metadata" \
--type missing_metadata
# Create with all options
diffuse governance create \
--title "Review duplicate files" \
--type duplicate_review \
--priority high \
--description "Multiple artifacts with same checksum" \
--experiment EXP-5
# Task types: duplicate_review, missing_metadata, validation_failure, enrichment_required
# Priorities: low, medium, high, critical
# Update title
diffuse governance update <task-id> --title "New title"
# Update status to in_progress
diffuse governance update <task-id> --status in_progress
# Update priority
diffuse governance update <task-id> --priority high
# Resolve task
diffuse governance resolve <task-id>
# Resolve with notes
diffuse governance resolve <task-id> --notes "Deleted duplicate artifacts"
# Dismiss task with reason (required)
diffuse governance dismiss <task-id> --reason "False positive"
Hard delete removes the task permanently.
# Delete task (requires confirmation)
diffuse governance delete <task-id>
# Skip confirmation
diffuse governance delete <task-id> -y
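When creating governance tasks from scripts, it can save round trips to validate payloads client-side against the allowed values listed above. A hedged sketch; the `validate_task` helper is illustrative, not part of the CLI or API:

```python
# Allowed values as documented for `diffuse governance` above
TASK_TYPES = {"duplicate_review", "missing_metadata",
              "validation_failure", "enrichment_required"}
PRIORITIES = {"low", "medium", "high", "critical"}

def validate_task(task: dict) -> list:
    """Return a list of validation errors for a governance task payload."""
    errors = []
    if not task.get("title"):
        errors.append("title is required")
    if task.get("type") not in TASK_TYPES:
        errors.append("type must be one of: " + ", ".join(sorted(TASK_TYPES)))
    if "priority" in task and task["priority"] not in PRIORITIES:
        errors.append("priority must be one of: " + ", ".join(sorted(PRIORITIES)))
    return errors

task = {"title": "Review duplicate files",
        "type": "duplicate_review",
        "priority": "high"}
print(validate_task(task))  # → [] (valid payload)
```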
Run GPU-accelerated experiment workflows on the Voltage Park K8s cluster. Each workflow is a Docker-packaged scientific pipeline with named presets and configurable parameters.
Sampleworks — Protein structure prediction (Boltz2, Protenix, RF3). 2 GPUs.
WaterFlow — Protein water structure prediction and training. 1 GPU.
# List all presets (sampleworks + waterflow)
diffuse run presets
# Filter by workflow type
diffuse run presets -t sampleworks
diffuse run presets -t waterflow
# All presets (unified endpoint)
curl https://hub.diffuse.science/api/run/presets \
-H "Authorization: Bearer $TOKEN"
# Filter by type
curl "https://hub.diffuse.science/api/run/presets?experiment_type=waterflow" \
-H "Authorization: Bearer $TOKEN"
# Type-specific endpoints
curl https://hub.diffuse.science/api/sampleworks/presets \
-H "Authorization: Bearer $TOKEN"
curl https://hub.diffuse.science/api/waterflow/presets \
-H "Authorization: Bearer $TOKEN"
| Preset | Model | Methods | GPUs |
|---|---|---|---|
| `boltz2-xrd` | Boltz2 | X-Ray Diffraction | 2 |
| `boltz2-md` | Boltz2 | Molecular Dynamics | 2 |
| `protenix` | Protenix | — | 2 |
| `rf3` | RoseTTAFold3 | — | 2 |
# Run default preset (boltz2-xrd) on auto-resolved machine
diffuse run sampleworks
# Run a specific preset
diffuse run sampleworks protenix
# Target a specific VP machine
diffuse run sampleworks boltz2-xrd --machine sampleworks
# Run all presets in sequence
diffuse run sampleworks all
# Wait for job completion
diffuse run sampleworks rf3 --wait
# Provide PDB input
diffuse run sampleworks boltz2-xrd --input-file /path/to/structure.pdb
# Type-specific endpoint
curl -X POST https://hub.diffuse.science/api/sampleworks/run \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"preset": "boltz2-xrd", "machine": "sampleworks"}'
# Unified endpoint
curl -X POST https://hub.diffuse.science/api/run/submit \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"experiment_type": "sampleworks",
"preset": "boltz2-xrd",
"machine": "sampleworks"
}'
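The same unified submit call can be made from Python without shelling out to curl. A minimal stdlib sketch that only constructs the request (sending it requires a valid token; `build_submit_request` is an illustrative helper, not part of the CLI):

```python
import json
import urllib.request

def build_submit_request(token, experiment_type, preset, machine):
    """Construct (but do not send) a POST to the unified /api/run/submit endpoint."""
    payload = {"experiment_type": experiment_type,
               "preset": preset,
               "machine": machine}
    return urllib.request.Request(
        "https://hub.diffuse.science/api/run/submit",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_submit_request("TOKEN", "sampleworks", "boltz2-xrd", "sampleworks")
print(req.method, req.full_url)
# Send with: urllib.request.urlopen(req)
```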
| Preset | Command | Description | GPUs |
|---|---|---|---|
| `train-gvp` | `train` | Train with GVP encoder (default) | 1 |
| `train-esm` | `train` | Train with ESM encoder | 1 |
| `train-slae` | `train` | Train with SLAE encoder | 1 |
| `inference` | `inference` | Run inference with trained checkpoint | 1 |
| `generate-esm` | `generate-esm` | Generate ESM3 embeddings for PDB files | 1 |
# Run default preset (train-gvp)
diffuse run waterflow
# Run a specific preset
diffuse run waterflow train-esm
# Target a specific machine
diffuse run waterflow train-gvp --machine vratin-1
# Override training parameters
diffuse run waterflow train-gvp --epochs 100 --batch-size 8 --lr 5e-5
# Custom run name for checkpoints
diffuse run waterflow train-gvp -n my_experiment
# Wait for completion
diffuse run waterflow train-gvp --wait
# Run inference with a trained checkpoint
diffuse run waterflow inference --run-dir /data/checkpoints/my_run
# Generate ESM embeddings
diffuse run waterflow generate-esm --split-file /data/splits/train.txt
# With Weights & Biases logging
diffuse run waterflow train-esm --wandb-key $WANDB_KEY --wandb-project my-project
# Type-specific endpoint
curl -X POST https://hub.diffuse.science/api/waterflow/run \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"preset": "train-gvp",
"machine": "vratin-1",
"epochs": 100,
"batch_size": 8,
"lr": 5e-5
}'
# Unified endpoint
curl -X POST https://hub.diffuse.science/api/run/submit \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"experiment_type": "waterflow",
"preset": "train-gvp",
"machine": "vratin-1",
"parameters": {
"epochs": 100,
"batch_size": 8,
"lr": 5e-5
}
}'
# Inference with trained checkpoint
curl -X POST https://hub.diffuse.science/api/waterflow/run \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"preset": "inference",
"machine": "vratin-1",
"run_dir": "/data/checkpoints/my_run"
}'
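Note that kebab-case CLI flags map onto snake_case API parameters (`--batch-size` becomes `batch_size` in the JSON body above). A small sketch of that translation for scripts that build payloads from flag dictionaries; the helper is illustrative:

```python
def flags_to_parameters(flags):
    """Translate kebab-case CLI flag names into snake_case API parameter names."""
    # strip leading dashes, then replace remaining dashes with underscores
    return {k.lstrip("-").replace("-", "_"): v for k, v in flags.items()}

params = flags_to_parameters({"--epochs": 100, "--batch-size": 8, "--lr": 5e-5})
print(params)  # → {'epochs': 100, 'batch_size': 8, 'lr': 5e-05}
```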
# List all K8s jobs
diffuse run status
# Check a specific job
diffuse run status waterflow-train-gvp
# Target a specific machine
diffuse run status --machine vratin-1
diffuse run status waterflow-train-gvp --machine vratin-1
# Filter by label
diffuse run status -l app=sampleworks
# JSON output
diffuse run status -f json
# List all jobs on a machine
curl "https://hub.diffuse.science/api/k8s/jobs?machine=vratin-1" \
-H "Authorization: Bearer $TOKEN"
# Get a specific job
curl "https://hub.diffuse.science/api/k8s/jobs/waterflow-train-gvp?machine=vratin-1" \
-H "Authorization: Bearer $TOKEN"
# Filter by label
curl "https://hub.diffuse.science/api/k8s/jobs?label_selector=app%3Dsampleworks&machine=sampleworks" \
-H "Authorization: Bearer $TOKEN"
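The `label_selector` value must be URL-encoded, which is why `app=sampleworks` appears as `app%3Dsampleworks` in the curl example above. Python's standard library produces the same query string:

```python
from urllib.parse import urlencode

# urlencode percent-escapes the "=" inside the selector value
query = urlencode({"label_selector": "app=sampleworks", "machine": "sampleworks"})
url = f"https://hub.diffuse.science/api/k8s/jobs?{query}"
print(url)
# → https://hub.diffuse.science/api/k8s/jobs?label_selector=app%3Dsampleworks&machine=sampleworks
```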
# Fetch logs from a job
diffuse run logs waterflow-train-gvp
# Target a specific machine
diffuse run logs waterflow-train-gvp --machine vratin-1
# Control number of tail lines
diffuse run logs waterflow-train-gvp --tail 500
curl "https://hub.diffuse.science/api/k8s/jobs/waterflow-train-gvp/logs?machine=vratin-1&tail=200" \
-H "Authorization: Bearer $TOKEN"
| Flag | Type | Description | Applies to |
|---|---|---|---|
| `--wait`, `-w` | bool | Wait for K8s job to complete | All |
| `--machine`, `-m` | text | VP machine name to target | All |
| `--epochs` | int | Override epoch count (default: 200) | train-* |
| `--batch-size`, `-b` | int | Override batch size (default: 4) | train-* |
| `--lr` | float | Override learning rate (default: 1e-4) | train-* |
| `--encoder` | text | Override encoder type (gvp, esm, slae) | train-* |
| `--run-name`, `-n` | text | Custom run name for checkpoints | train-* |
| `--run-dir` | text | Training run directory (required) | inference |
| `--num-steps` | int | Override integration steps (default: 100) | inference |
| `--split-file` | text | Override split file for ESM generation | generate-esm |
| `--wandb-key` | text | W&B API key (omit to disable) | All |
| `--wandb-project` | text | W&B project name | All |