Manage experiments, artifacts, and metadata from your terminal with diffuse.
The CLI mirrors the web interface and provides scriptable access to all DiffUSE Hub functionality.
Install uv first: `brew install uv`

Install the DiffUSE CLI with a single command. Choose HTTPS or SSH based on your setup:
HTTPS (gh CLI):

uv tool install git+https://github.com/diff-use/webapp.git#subdirectory=cli

Uses GitHub's HTTPS protocol. Requires the GitHub CLI (`gh auth login`) for authentication to private repositories.

SSH:

uv tool install git+ssh://git@github.com/diff-use/webapp.git#subdirectory=cli

Uses SSH protocol. Requires a valid SSH key configured in your GitHub account.
# Show help (both -h and --help work for all commands)
diffuse -h
diffuse list -h
uv tool upgrade diffuse-cli
The CLI automatically checks for updates once per day and notifies you when a new version is available.
If you installed over SSH (`git+ssh://...`) with a password-protected key, uv may appear stuck on "Resolving dependencies..." because the password prompt is hidden. Add `-v` for verbose mode to see the prompt. If you see a GitHub host authenticity warning, run: `ssh-keyscan github.com >> ~/.ssh/known_hosts`

Authenticate with GitHub to access private experiments and perform write operations:
| Command | Description |
|---|---|
| `diffuse auth login` | Login via GitHub device flow |
| `diffuse auth status` | Check authentication and API health |
| `diffuse auth logout` | Clear stored credentials |
$ diffuse auth login
============================================================
Visit: https://github.com/login/device
Enter code: ABCD-1234
============================================================
Waiting for authorization (expires in 900s)...
✓ Authentication successful!
# Get your token from ~/.diffuse/config.json
TOKEN=$(jq -r '.github_token' ~/.diffuse/config.json)
# Check authentication status
curl -H "Authorization: Bearer $TOKEN" \
https://hub.diffuse.science/api/whoami
Manage personal API keys for programmatic access and CI/automation workflows:
| Command | Description |
|---|---|
| `diffuse keys list` | List your personal API keys |
| `diffuse keys create NAME` | Create a new personal API key (`--description` to add a purpose, `--save` to store locally, `--expires-in-days N` for auto-expiry) |
| `diffuse keys revoke KEY_ID` | Revoke an API key |
| `diffuse keys set KEY` | Store an API key in local config |
| `diffuse keys clear` | Remove stored API key from local config |
# Create a key and save it to ~/.diffuse/config.json
$ diffuse keys create "ci-runner" --save
✓ API key created: dfk_xxxxxxxxxxxxxxxxxxxx
✓ Key saved to ~/.diffuse/config.json
# Use the DIFFUSE_API_KEY environment variable for CI/automation
export DIFFUSE_API_KEY=dfk_xxxxxxxxxxxxxxxxxxxx
diffuse list
Credential precedence: `DIFFUSE_API_KEY` env var > `api_key` in config > `github_token` in config. Set `DIFFUSE_API_KEY` in CI to avoid committing credentials.
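The same precedence can be mimicked in a wrapper script when calling the API directly. A minimal sketch — the `resolve_token` and `json_field` helpers (and the `DIFFUSE_CONFIG` override) are hypothetical conveniences, not part of the CLI, and the sed-based JSON parsing assumes simple, quote-free values (use jq for anything richer):

```shell
# Hypothetical helper mirroring the documented precedence:
# DIFFUSE_API_KEY env var > api_key in config > github_token in config.

json_field() {
  # Crude single-line JSON field extraction; jq is the more robust choice.
  cfg="${DIFFUSE_CONFIG:-$HOME/.diffuse/config.json}"
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$cfg" 2>/dev/null
}

resolve_token() {
  if [ -n "$DIFFUSE_API_KEY" ]; then
    printf '%s\n' "$DIFFUSE_API_KEY"
  elif [ -n "$(json_field api_key)" ]; then
    json_field api_key
  else
    json_field github_token
  fi
}
```

Usage would then look like `curl -H "Authorization: Bearer $(resolve_token)" ...`.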
API_KEY=dfk_xxxxxxxxxxxxxxxxxxxx
# List personal keys
curl -H "Authorization: Bearer $API_KEY" \
https://hub.diffuse.science/api/api-keys/personal
# Create a personal key
curl -X POST -H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{"name": "ci-runner"}' \
https://hub.diffuse.science/api/api-keys/personal
# Revoke a key
curl -X DELETE -H "Authorization: Bearer $API_KEY" \
https://hub.diffuse.science/api/api-keys/personal/{key_id}
Create, view, edit, and manage experiments from the command line:
| Command | Description |
|---|---|
| `diffuse list` | List all experiments (tree format by default) |
| `diffuse view <id>` | View experiment details |
| `diffuse types` | List available experiment types |
| `diffuse create` | Create a new experiment |
| `diffuse edit <id>` | Edit an experiment |
| `diffuse delete <id>` | Delete an experiment |
| `diffuse publish <id>` | Publish to public catalog |
| `diffuse unpublish <id>` | Unpublish from catalog |
# List all experiments (tree format, interactive pagination)
diffuse list
# Query syntax (field:value, AND/OR/NOT, comparisons, wildcards)
diffuse list -q "tag:crystal AND type:mx"
diffuse list -q "author:smith OR author:jones"
diffuse list -q "created:>2024-01-01 AND public:true"
diffuse list -q "beamline:*" # All experiments with beamline set
# Text search (searches title, summary, tags)
diffuse list -q "lysozyme"
# Output formats: tree (default), table, json, yaml
diffuse list -f table
diffuse list -f json
# Pagination: use --page/-p to get a specific page
diffuse list --page 1 --page-size 10
diffuse list -p 2 -n 20
# Sort experiments
diffuse list -s recent
diffuse list -s title
# Combine query and sorting
diffuse list -q "tag:processed AND type:mx" -s title
curl https://hub.diffuse.science/api/experiments
# Interactive prompt
diffuse create
# With flags
diffuse create \
--title "ResNet Training Run" \
--summary "Training ResNet-50 on ImageNet" \
--tags "vision,resnet,imagenet"
# With experiment type (see 'diffuse types' for available types)
diffuse create \
-t "My Protocol" \
--type protocol
# With markdown file
diffuse create \
--title "Experiment Log" \
--markdown ./experiment.md
curl -X POST https://hub.diffuse.science/api/experiments \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"title": "ResNet Training Run",
"summary": "Training ResNet-50 on ImageNet",
"tags": ["vision", "resnet", "imagenet"]
}'
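In scripts, the new experiment's ID can be pulled out of the JSON response. `jq -r '.id'` is the robust option; for minimal images without jq, a sed fallback works. The `extract_id` helper below is illustrative, not part of the CLI, and assumes the `id` field is a quoted string as in the examples above:

```shell
# Illustrative sed-based fallback for reading "id" from a JSON response.
extract_id() {
  sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example with a response shaped like the API output above:
printf '{"id": "EXP-124", "title": "ResNet Training Run"}\n' | extract_id
# → EXP-124
```

Piping the curl command's output into `extract_id` captures the ID for later `upload`/`publish` steps.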
# View experiment details (tree format by default)
diffuse view EXP-123
# Output formats: tree (default), table, json, yaml
diffuse view EXP-123 -f table
diffuse view EXP-123 -f json
curl https://hub.diffuse.science/api/experiments/EXP-123
# Update title or summary
diffuse edit EXP-123 --title "Updated Title"
diffuse edit EXP-123 --summary "New summary"
# Update tags
diffuse edit EXP-123 --tags "new,tags,here"
# Open markdown in $EDITOR
diffuse edit EXP-123 --editor
# Update from markdown file
diffuse edit EXP-123 --markdown ./updated.md
curl -X PATCH https://hub.diffuse.science/api/experiments/EXP-123 \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"title": "Updated Title",
"summary": "New summary"
}'
# Publish experiment
diffuse publish EXP-123
# Unpublish experiment
diffuse unpublish EXP-123
# Publish
curl -X POST https://hub.diffuse.science/api/experiments/EXP-123/publish \
-H "Authorization: Bearer $TOKEN"
# Unpublish
curl -X POST https://hub.diffuse.science/api/experiments/EXP-123/unpublish \
-H "Authorization: Bearer $TOKEN"
Find experiments by text, tags, metadata fields, and more. All filtering happens server-side for efficiency.
# Field:value syntax
diffuse list -q "tag:crystal"
diffuse list -q "type:mx"
diffuse list -q "author:smith"
# Text search (searches title, summary, tags, ID)
diffuse list -q "lysozyme"
diffuse list -q '"protein structure"' # Quoted phrase
# AND - all conditions must match
diffuse list -q "tag:crystal AND type:mx"
# OR - either condition matches
diffuse list -q "tag:xray OR tag:neutron"
# NOT - negate condition
diffuse list -q "NOT public:false"
# Grouping with parentheses
diffuse list -q "(tag:xray OR tag:neutron) AND type:mx"
# Date comparisons
diffuse list -q "created:>2024-01-01"
diffuse list -q "updated:>=2024-06-01"
diffuse list -q "created:>2024-01-01 AND created:<2024-12-31"
# Numeric metadata comparisons
diffuse list -q "resolution:>1.5"
diffuse list -q "epochs:>=100"
# Prefix wildcard
diffuse list -q "tag:cryst*"
diffuse list -q "title:Lysoz*"
# Match any value (field exists)
diffuse list -q "beamline:*"
# Sort by most recent (default)
diffuse list -s recent
# Sort by oldest first
diffuse list -s oldest
# Sort alphabetically by title
diffuse list -s title
# Complex query with sorting
diffuse list -q "(tag:crystal OR tag:protein) AND created:>2024-01-01" -s title
# Paginated results
diffuse list -q "type:mx" --page 2 --page-size 10
The API endpoint `GET /api/experiments` supports the unified `q` parameter with full query syntax:

| Parameter | Description | Example |
|---|---|---|
| `q` | Query string with search syntax | `?q=tag:crystal AND type:mx` |
| `search` | Text search (legacy) | `?search=resnet` |
| `tag` | Filter by tag (legacy) | `?tag=vision` |
| `sort` | Sort field | `?sort=title` |
| `order` | Sort direction | `?order=asc` |
| `limit` | Max results (1-200) | `?limit=20` |
| `offset` | Skip N results | `?offset=20` |
# Field:value syntax
curl "https://hub.diffuse.science/api/experiments?q=tag:crystal"
# Boolean operators (URL-encode spaces)
curl "https://hub.diffuse.science/api/experiments?q=tag:crystal%20AND%20type:mx"
# Comparison operators
curl "https://hub.diffuse.science/api/experiments?q=created:%3E2024-01-01"
# Text search
curl "https://hub.diffuse.science/api/experiments?q=lysozyme"
# Sort by title A-Z
curl "https://hub.diffuse.science/api/experiments?sort=title&order=asc"
# Page 2 with 10 results per page
curl "https://hub.diffuse.science/api/experiments?limit=10&offset=10"
# Query + sort + pagination
curl "https://hub.diffuse.science/api/experiments?q=tag:crystal%20AND%20public:true&sort=updated_at&order=desc&limit=20" \
-H "Authorization: Bearer $TOKEN"
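Query operators must be URL-encoded when calling the API directly. A small helper covering just the characters this syntax uses (spaces, `>`, `<`) — a sketch for convenience; a full percent-encoder (e.g. jq's `@uri`) is safer for arbitrary input:

```shell
# Encode the handful of characters the query syntax uses.
encode_query() {
  printf '%s' "$1" | sed -e 's/ /%20/g' -e 's/>/%3E/g' -e 's/</%3C/g'
}

# Usage:
# curl "https://hub.diffuse.science/api/experiments?q=$(encode_query 'created:>2024-01-01 AND type:mx')"
```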
# Create experiment with tags
diffuse create --title "My Experiment" --tags "vision,resnet,imagenet"
# Update tags on existing experiment
diffuse edit EXP-123 --tags "new,tags,here"
# API: create with tags
curl -X POST https://hub.diffuse.science/api/experiments \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"title": "My Experiment", "tags": ["vision", "resnet"]}'
# API: update tags
curl -X PATCH https://hub.diffuse.science/api/experiments/EXP-123 \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"tags": ["new", "tags"]}'
# Set individual metadata field
diffuse metadata set EXP-123 learning_rate 0.001
# Apply metadata from JSON file
echo '{"learning_rate": "0.001", "batch_size": "32"}' > metadata.json
diffuse metadata apply EXP-123 -f metadata.json
# Get metadata
diffuse metadata get EXP-123
# API: set metadata
curl -X PATCH https://hub.diffuse.science/api/experiments/EXP-123/metadata \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"metadata": {"learning_rate": "0.001", "batch_size": "32"}}'
# Get metadata facets - shows all field values and counts
curl https://hub.diffuse.science/api/metadata/facets
# Response example:
# [
# {
# "field_key": "dataset",
# "field_display_name": "Dataset",
# "values": {"mnist": 5, "cifar10": 3, "imagenet": 2}
# },
# {
# "field_key": "optimizer",
# "field_display_name": "Optimizer",
# "values": {"adam": 8, "sgd": 4}
# }
# ]
Link experiments together to track dependencies, data reuse, and related work:
Two relationship types are supported:

- `uses`: Indicates one experiment uses data or results from another (e.g., "Experiment A uses the dataset from Experiment B")
- `relates_to`: General association between experiments (e.g., related analyses, follow-up studies)

| Command | Description |
|---|---|
| `diffuse relationships list <id>` | List all relationships for an experiment |
| `diffuse relationships add <from> <to>` | Create a relationship between experiments |
| `diffuse relationships remove <from> <to>` | Remove a relationship between experiments |
# List all relationships for an experiment
diffuse relationships list EXP-5
# Output formats: tree (default), table, json, yaml
diffuse relationships list EXP-5 -f table
diffuse relationships list EXP-5 -f json
curl https://hub.diffuse.science/api/experiments/EXP-5/relationships \
-H "Authorization: Bearer $TOKEN"
# Create a "uses" relationship (EXP-10 uses data from EXP-5)
diffuse relationships add EXP-10 EXP-5 --type uses
# Create a "relates_to" relationship (general association)
diffuse relationships add EXP-10 EXP-7 --type relates_to
# Default type is "uses" if not specified
diffuse relationships add EXP-10 EXP-5
curl -X POST https://hub.diffuse.science/api/experiments/EXP-10/relationships \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"from_experiment_id": "EXP-10",
"to_experiment_id": "EXP-5",
"relationship_type": "uses"
}'
# Remove a "uses" relationship
diffuse relationships remove EXP-10 EXP-5 --type uses
# Remove a "relates_to" relationship
diffuse relationships remove EXP-10 EXP-7 --type relates_to
curl -X DELETE "https://hub.diffuse.science/api/experiments/EXP-10/relationships?from_experiment_id=EXP-10&to_experiment_id=EXP-5&relationship_type=uses" \
-H "Authorization: Bearer $TOKEN"
Upload artifacts and download experiment data:
| Command | Description |
|---|---|
| `diffuse upload <id> <paths...>` | Upload files or directories to experiment |
| `diffuse download <id> <name>` | Download an artifact |
# Upload a single file
diffuse upload EXP-123 ./model.pth
# Upload a directory (recursive, preserves structure)
diffuse upload EXP-123 ./checkpoints/
# Upload multiple files via glob pattern
diffuse upload EXP-123 *.log
diffuse upload EXP-123 results_*.csv
# Upload specific files
diffuse upload EXP-123 model.pth config.yaml metrics.json
# Mix files and directories
diffuse upload EXP-123 ./data/ summary.txt
# Control upload parallelism (default: 32 concurrent streams)
diffuse upload EXP-123 large_file.bin -c 16
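The uploader avoids re-hashing unchanged files by keeping a checksum cache (`~/.diffuse/checksum_cache.json`). The idea can be sketched in plain shell — this is an illustration of the technique under a toy plain-text cache format, not the CLI's actual implementation:

```shell
# Recompute a file's checksum only when its size+mtime fingerprint changes.
# Toy cache format, one line per file: "<fingerprint> <path> <sha256>"

cached_sha256() {
  path=$1
  # GNU stat first, BSD stat as fallback
  fp=$(stat -c '%s-%Y' "$path" 2>/dev/null || stat -f '%z-%m' "$path")
  hit=$(grep -F "$fp $path " "$CACHE" 2>/dev/null | head -n1)
  if [ -n "$hit" ]; then
    printf '%s\n' "${hit##* }"           # cache hit: reuse stored digest
  else
    sum=$(sha256sum "$path" | cut -d' ' -f1)
    printf '%s %s %s\n' "$fp" "$path" "$sum" >> "$CACHE"
    printf '%s\n' "$sum"
  fi
}
```

On a second upload of an unchanged file, the fingerprint matches and the stored digest is reused instead of re-reading the whole file.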
Upload behavior notes:

- A checksum cache (`~/.diffuse/checksum_cache.json`) avoids recomputing checksums for unchanged files
- Use `--resume` to continue interrupted uploads
- Parallelism is configurable (`-c`)
- Directory uploads automatically skip `__pycache__`, `.git`, `.venv`, `node_modules`, `*.pyc`, and other common non-essential files.

# 1. Create upload session
curl -X POST https://hub.diffuse.science/api/uploads/session \
-H "Content-Type: application/json" \
-d '{
"experiment_id": "EXP-123",
"filename": "model.pth",
"content_type": "application/octet-stream",
"size_bytes": 102400,
"metadata_document": {}
}'
# 2. Upload chunks to presigned URLs (from response above)
# Upload each part to its presigned URL
curl -X PUT "$PRESIGNED_URL_PART_1" \
--data-binary @model.pth.part1 \
-H "Content-Type: application/octet-stream"
# Capture ETag from response headers for finalization
# ETag: "abc123..."
# 3. Finalize upload
curl -X POST https://hub.diffuse.science/api/uploads/sess_xyz/finalize \
-H "Content-Type: application/json" \
-d '{
"session_id": "sess_xyz",
"parts": [
{"part_number": 1, "etag": "abc123"},
{"part_number": 2, "etag": "def456"}
],
"metadata_document": {}
}'
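The CLI drives this chunked flow for you; when scripting against the raw API, the file must be cut into parts first. A sketch using coreutils — the `split_for_upload` helper and the 5 MiB default part size are assumptions for illustration; the session response governs real part sizing:

```shell
# Cut a local file into numbered parts for the session's presigned URLs.
split_for_upload() {
  file=$1
  chunk=${2:-$((5 * 1024 * 1024))}   # assumed default: 5 MiB parts
  split -b "$chunk" -d "$file" "$file.part"
  ls "$file".part*
}

# Each emitted part then goes to its presigned URL, e.g.:
# curl -X PUT "$PRESIGNED_URL_PART_1" --data-binary @model.pth.part00 \
#      -H "Content-Type: application/octet-stream"
```

Capture the ETag response header for each part; the finalize call above needs the full part-number/ETag list.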
# Download artifact
diffuse download EXP-123 model.pth
# Save to specific location
diffuse download EXP-123 model.pth --output ./downloads/model.pth
# Get presigned download URL
curl https://hub.diffuse.science/api/experiments/EXP-123/artifacts/model.pth/download \
-H "Authorization: Bearer $TOKEN"
# Download file using the presigned URL from response
curl -o model.pth "$PRESIGNED_DOWNLOAD_URL"
Manage artifacts across experiments with connect/disconnect operations:
| Command | Description |
|---|---|
| `diffuse artifacts list` | List all artifacts (global view) |
| `diffuse artifacts list --experiment EXP-5` | List artifacts connected to experiment |
| `diffuse artifacts info <artifact-id>` | Show artifact details and connections |
| `diffuse artifacts connect EXP-5 <id> [<id>...]` | Connect one or more artifacts to experiment |
| `diffuse artifacts disconnect EXP-5 <id> [<id>...]` | Disconnect one or more artifacts from experiment |
| `diffuse artifacts delete <artifact-id>` | Delete artifact (fails if connected) |
| `diffuse artifacts buckets` | List configured external buckets (OSN, S3) |
| `diffuse artifacts browse <bucket>` | Browse contents of an external bucket |
| `diffuse artifacts connect EXP-1 <keys> --bucket <name>` | Connect external artifacts to an experiment |
# List all artifacts (tree format by default)
diffuse artifacts list
# Output formats: tree (default), table, json, yaml
diffuse artifacts list -f table
# Filter by experiment
diffuse artifacts list --experiment EXP-5
# Show orphaned artifacts (no connections)
diffuse artifacts list --orphaned
# List all artifacts
curl https://hub.diffuse.science/api/artifacts \
-H "Authorization: Bearer $TOKEN"
# Filter by experiment
curl "https://hub.diffuse.science/api/artifacts?experiment_id=EXP-5" \
-H "Authorization: Bearer $TOKEN"
# Show artifact details including all connected experiments
diffuse artifacts info <artifact-id>
# Example output:
# Artifact: model.pth
# Type: internal
# Size: 245.3 MB
# Status: ready
# Connected to 3 experiments:
# - EXP-5: ./baseline-model.pth
# - EXP-7: ./improved-model.pth
# - EXP-12: ./final-model.pth
curl https://hub.diffuse.science/api/artifacts/<artifact-id> \
-H "Authorization: Bearer $TOKEN"
# Connect existing artifact to experiment
diffuse artifacts connect EXP-7 <artifact-id>
# Connect multiple artifacts at once
diffuse artifacts connect EXP-7 artifact-1 artifact-2 artifact-3
# Disconnect artifact from experiment
diffuse artifacts disconnect EXP-7 <artifact-id>
# Disconnect multiple artifacts at once
diffuse artifacts disconnect EXP-7 artifact-1 artifact-2
# Example workflow: Upload to Exp A, connect to Exp B
diffuse upload EXP-5 ./model.pth # Returns artifact-id: abc123
diffuse artifacts connect EXP-7 abc123
# Connect single artifact
curl -X POST https://hub.diffuse.science/api/experiments/EXP-7/artifacts/connect \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"artifact_id": "abc123"}'
# Batch connect multiple artifacts
curl -X POST https://hub.diffuse.science/api/artifacts/batch-connect \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"experiment_id": "EXP-7",
"artifact_ids": ["artifact-1", "artifact-2", "artifact-3"]
}'
# Disconnect artifact
curl -X DELETE https://hub.diffuse.science/api/experiments/EXP-7/artifacts/abc123 \
-H "Authorization: Bearer $TOKEN"
# Delete artifact (only works if disconnected from all experiments)
diffuse artifacts delete <artifact-id>
# If still connected, you'll see:
# Error: Cannot delete artifact. Still connected to 2 experiment(s):
# - EXP-5
# - EXP-7
# Use 'diffuse artifacts disconnect' first.
# Admin force delete (even if connected)
diffuse artifacts delete <artifact-id> --force
# Admin force delete
curl -X DELETE https://hub.diffuse.science/api/artifacts/<artifact-id>/force-delete \
-H "Authorization: Bearer $TOKEN"
Collections provide logical grouping for artifacts, enabling organization, sharing, and versioning across multiple experiments. Collections support an M:M:M relationship: Experiments ↔ Collections ↔ Artifacts.
Collections are identified in `COL-N` format (e.g., `COL-1`, `COL-42`) for easy citation and reference.

| Command | Description |
|---|---|
| `diffuse collections list` | List all collections |
| `diffuse collections create "Name"` | Create a new collection |
| `diffuse collections info COL-5` | Show collection details |
| `diffuse collections update COL-5` | Update collection metadata |
| `diffuse collections delete COL-5` | Delete a collection |
| `diffuse collections add COL-5 <artifact-ids>` | Add artifacts to collection |
| `diffuse collections remove COL-5 <artifact-ids>` | Remove artifacts from collection |
| `diffuse collections connect COL-5 EXP-10` | Connect collection to experiment |
| `diffuse collections disconnect COL-5 EXP-10` | Disconnect from experiment |
| `diffuse collections snapshot COL-5` | Create immutable snapshot |
| `diffuse collections snapshots COL-5` | List collection snapshots |
| `diffuse collections add COL-5 <keys> --bucket <name>` | Add external artifacts to a collection |
# List all collections (tree format)
diffuse collections list
# Table format
diffuse collections list -f table
# Search by name
diffuse collections list --search "training"
# Filter by tag
diffuse collections list --tag ml --tag 2024
# Include snapshots
diffuse collections list --include-snapshots
# JSON output for scripting
diffuse collections list -f json
# List collections
curl https://hub.diffuse.science/api/collections \
-H "Authorization: Bearer $TOKEN"
# With search and tag filters
curl "https://hub.diffuse.science/api/collections?search=training&tag=ml" \
-H "Authorization: Bearer $TOKEN"
# Create a new collection
diffuse collections create "Training Data v2" \
--description "Updated dataset with augmentations" \
--tag ml --tag 2024
# Update collection metadata
diffuse collections update COL-5 --name "Training Data v2.1"
diffuse collections update COL-5 --description "New description"
diffuse collections update COL-5 --add-tag production
diffuse collections update COL-5 --remove-tag draft
# Delete (blocked if connected to experiments)
diffuse collections delete COL-5
# Force delete (admin only)
diffuse collections delete COL-5 --force
# Create collection
curl -X POST https://hub.diffuse.science/api/collections \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "Training Data v2",
"description": "Updated dataset with augmentations",
"tags": ["ml", "2024"]
}'
# Update collection
curl -X PATCH https://hub.diffuse.science/api/collections/COL-5 \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "Training Data v2.1"}'
# Add artifacts to a collection
diffuse collections add COL-5 <artifact-id>
# Add multiple artifacts at once
diffuse collections add COL-5 artifact-1 artifact-2 artifact-3
# Add with custom path (how it appears in collection)
diffuse collections add COL-5 artifact-1 --path "./custom/path.csv"
# Remove artifact from collection
diffuse collections remove COL-5 <artifact-id>
# Remove multiple artifacts
diffuse collections remove COL-5 artifact-1 artifact-2
# Add artifact
curl -X POST https://hub.diffuse.science/api/collections/COL-5/artifacts \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"artifact_id": "abc123", "artifact_path": "./data.csv"}'
# Batch add
curl -X POST https://hub.diffuse.science/api/collections/COL-5/artifacts/batch \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"artifact_ids": ["artifact-1", "artifact-2"]}'
# Remove artifact
curl -X DELETE https://hub.diffuse.science/api/collections/COL-5/artifacts/abc123 \
-H "Authorization: Bearer $TOKEN"
# Connect a collection to an experiment
diffuse collections connect COL-5 EXP-10
# Connect with an alias (context-specific name)
diffuse collections connect COL-5 EXP-10 --alias "Primary Training Data"
# Disconnect collection from experiment
diffuse collections disconnect COL-5 EXP-10
# Example workflow: Multiple experiments sharing one collection
diffuse collections connect COL-5 EXP-10 # Base experiment
diffuse collections connect COL-5 EXP-11 # Ablation study
diffuse collections connect COL-5 EXP-12 # Extended training
# Connect collection to experiment
curl -X POST https://hub.diffuse.science/api/experiments/EXP-10/collections/connect \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"collection_id": "COL-5", "alias": "Primary Training Data"}'
# Disconnect
curl -X DELETE https://hub.diffuse.science/api/experiments/EXP-10/collections/COL-5 \
-H "Authorization: Bearer $TOKEN"
# Create an immutable snapshot
diffuse collections snapshot COL-5 --note "Pre-publication freeze"
# List all snapshots of a collection
diffuse collections snapshots COL-5
# Example workflow: Publication versioning
diffuse collections create "ImageNet Training v1"
diffuse collections add COL-1 artifact-1 artifact-2 artifact-3
# ... later, before publishing ...
diffuse collections snapshot COL-1 --note "Paper submission v1"
# ... after revisions ...
diffuse collections add COL-1 artifact-4
diffuse collections snapshot COL-1 --note "Camera ready version"
# Create snapshot
curl -X POST https://hub.diffuse.science/api/collections/COL-5/snapshots \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"note": "Pre-publication freeze"}'
# List snapshots
curl https://hub.diffuse.science/api/collections/COL-5/snapshots \
-H "Authorization: Bearer $TOKEN"
Browse and attach files from external S3-compatible buckets (OSN, AWS S3, MinIO, etc.) to experiments and collections:
| Command | Description |
|---|---|
| `diffuse artifacts buckets` | List all configured external buckets |
| `diffuse artifacts browse <bucket>` | Browse files in an external bucket |
| `diffuse artifacts connect <exp> <keys...> --bucket <name>` | Connect external files to an experiment |
| `diffuse collections add <col> <keys...> --bucket <name>` | Add external files to a collection |
# List all configured external buckets
diffuse artifacts buckets
# Table output (default):
# Name Type Bucket Public Endpoint
# osn-diffuse-chess osn diffuse-chess-data yes https://sdsc.osn.xsede.org
# aws-shared-datasets s3 shared-datasets no —
# JSON output for scripting
diffuse artifacts buckets -f json
# Browse the root of a bucket
diffuse artifacts browse osn-diffuse-chess
# Browse a specific prefix (directory)
diffuse artifacts browse osn-diffuse-chess -p data/2024/
# Force refresh (bypass server-side cache)
diffuse artifacts browse osn-diffuse-chess -p data/ --refresh
# JSON output for scripting
diffuse artifacts browse osn-diffuse-chess -f json
# List external buckets
curl https://hub.diffuse.science/api/external-buckets \
-H "Authorization: Bearer $TOKEN"
# Browse bucket contents
curl "https://hub.diffuse.science/api/external-buckets/osn-diffuse-chess?prefix=data/2024/" \
-H "Authorization: Bearer $TOKEN"
# Force refresh
curl "https://hub.diffuse.science/api/external-buckets/osn-diffuse-chess?prefix=data/&refresh=true" \
-H "Authorization: Bearer $TOKEN"
# Connect specific files to an experiment
diffuse artifacts connect EXP-1 data/scan001.h5 data/scan002.h5 \
--bucket osn-diffuse-chess
# Connect all files under a prefix (recursive)
diffuse artifacts connect EXP-1 \
--bucket osn-diffuse-chess --prefix data/2024/
# Skip confirmation prompt
diffuse artifacts connect EXP-1 data/scan001.h5 \
--bucket osn-diffuse-chess -y
# Add external files to a collection
diffuse collections add COL-1 data/scan001.h5 data/scan002.h5 \
--bucket osn-diffuse-chess
# Add all files under a prefix to a collection
diffuse collections add COL-1 \
--bucket osn-diffuse-chess --prefix data/2024/november/
Use `--prefix` to recursively attach entire directories. The CLI resolves all files under the prefix via the browse API before attaching.
# Attach external artifacts to an experiment
curl -X POST https://hub.diffuse.science/api/external-artifacts/batch-attach \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"bucket_name": "osn-diffuse-chess",
"artifacts": [
{"s3_key": "data/scan001.h5"},
{"s3_key": "data/scan002.h5"}
]
}'
# Attach external artifacts to a collection
curl -X POST https://hub.diffuse.science/api/collections/COL-1/external-artifacts/batch-attach \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"bucket_name": "osn-diffuse-chess",
"artifacts": [
{"s3_key": "data/scan001.h5"},
{"s3_key": "data/scan002.h5"}
]
}'
# 1. See what buckets are available
diffuse artifacts buckets
# 2. Browse to find the files you need
diffuse artifacts browse osn-diffuse-chess
diffuse artifacts browse osn-diffuse-chess -p experiments/run42/
# 3. Connect specific files to your experiment
diffuse artifacts connect EXP-15 \
experiments/run42/results.h5 experiments/run42/config.yaml \
--bucket osn-diffuse-chess
# 4. Or connect an entire directory at once
diffuse artifacts connect EXP-15 \
--bucket osn-diffuse-chess --prefix experiments/run42/
# 5. Create a collection for a shared dataset
diffuse collections create "Chess OCR Training Set"
diffuse collections add COL-3 \
--bucket osn-diffuse-chess --prefix training/ocr/v2/
# 6. Connect that collection to multiple experiments
diffuse collections connect COL-3 EXP-15
diffuse collections connect COL-3 EXP-16
View and manage experiment metadata fields:
| Command | Description |
|---|---|
| `diffuse metadata get <id>` | Get experiment metadata |
| `diffuse metadata set <id> <key> <value>` | Set a metadata field |
| `diffuse metadata apply <id> -f <file>` | Apply metadata from file |
| `diffuse metadata fields` | List available field definitions |
# Set individual field
diffuse metadata set EXP-123 learning_rate 0.001
# Get all metadata
diffuse metadata get EXP-123
# Apply from JSON file
echo '{"learning_rate": 0.001, "batch_size": 32}' > metadata.json
diffuse metadata apply EXP-123 -f metadata.json
# Set metadata
curl -X PATCH https://hub.diffuse.science/api/experiments/EXP-123/metadata \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"metadata": {
"learning_rate": 0.001,
"batch_size": 32
}
}'
# Get metadata
curl https://hub.diffuse.science/api/experiments/EXP-123/metadata
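For automated runs it can help to build the metadata document programmatically before applying it. A minimal sketch — the `make_metadata` helper is hypothetical, emits every value as a JSON string, and assumes keys and values contain no quotes (use jq for anything richer):

```shell
# Build a flat JSON object from KEY=VALUE arguments.
make_metadata() {
  out="{"; sep=""
  for kv in "$@"; do
    k=${kv%%=*}; v=${kv#*=}
    out="$out$sep\"$k\": \"$v\""; sep=", "
  done
  printf '%s}' "$out"
}

# Usage:
# make_metadata learning_rate=0.001 batch_size=32 > metadata.json
# diffuse metadata apply EXP-123 -f metadata.json
```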
# View last 20 activities (tree format by default)
diffuse activity
# Limit results
diffuse activity -n 50
# Output formats: tree (default), table, json, yaml
diffuse activity -f table
diffuse activity -f json
curl -H "Authorization: Bearer $TOKEN" \
https://hub.diffuse.science/api/activity
Manage CLI configuration stored in ~/.diffuse/config.json:
| Command | Description |
|---|---|
| `diffuse config view` | View current configuration |
| `diffuse config set <key> <value>` | Set configuration value |
| `diffuse config reset` | Reset configuration |
| `diffuse config env` | Show current environment |
| `diffuse config env --list` | List available environments |
| `diffuse config env <name>` | Switch environment (local/staging/prod) |

The CLI targets the production environment (`https://app.diffuse.science`) by default. Use `diffuse config env --list` to see all environments.

# Show available environments
diffuse config env --list
# Switch to staging
diffuse config env staging
# Switch to local development
diffuse config env local
# Switch back to production
diffuse config env prod
# Check current environment
diffuse config env
The server to talk to is resolved in this order (highest precedence first):

1. `--server` flag on each command
2. `DIFFUSE_API_URL` environment variable (full URL)
3. `DIFFUSE_ENV` environment variable (local/staging/prod)
4. Environment stored in the config file (`~/.diffuse/config.json`)
5. Production default (`https://app.diffuse.science`)

# View current config
diffuse config view
# Use per-command override
diffuse list --server https://custom.example.com
# Use environment variable (temporary)
export DIFFUSE_API_URL=https://custom.example.com
diffuse list
# Check authentication status (shows current API and GitHub Client ID)
diffuse auth status
Credentials are stored in `~/.diffuse/config.json` with restricted permissions (0600). Never commit this file to version control.

Define metadata fields that can be associated with experiment types:
| Command | Description |
|---|---|
| `diffuse fields list [--include-inactive]` | List all field definitions |
| `diffuse fields create -k KEY -n NAME` | Create a new metadata field |
| `diffuse fields update FIELD_ID` | Update a field's properties |
| `diffuse fields delete FIELD_ID` | Delete a metadata field |
# List all field definitions
diffuse fields list
# Include inactive fields
diffuse fields list --include-inactive
# Create a numeric field with unit
diffuse fields create \
-k resolution \
-n "Resolution" \
--type NUMBER \
--unit "angstrom" \
--required
# Create an enum field with allowed values
diffuse fields create \
-k status \
-n "Status" \
--type ENUM \
--allowed-values "pending,running,completed,failed"
# Create a field with a default value
diffuse fields create \
-k priority \
-n "Priority" \
--type NUMBER \
--default "5"
# Update a field name
diffuse fields update resolution --name "Crystal Resolution"
# Deactivate a field
diffuse fields update resolution --inactive
# Delete a field (with confirmation)
diffuse fields delete resolution
# Delete without confirmation prompt
diffuse fields delete resolution -y
Manage experiment types and their associated metadata fields:
| Command | Description |
|---|---|
| `diffuse types list [--include-inactive]` | List all experiment types |
| `diffuse types view TYPE_ID` | View type details with associated fields |
| `diffuse types create -n NAME -d DISPLAY_NAME` | Create a new experiment type |
| `diffuse types update TYPE_ID` | Update a type's properties |
| `diffuse types delete TYPE_ID` | Delete an experiment type |
| `diffuse types set-fields TYPE_ID FIELD_IDS...` | Associate metadata fields with a type |
# List all experiment types
diffuse types list
# Include inactive types
diffuse types list --include-inactive
# View type with associated fields
diffuse types view protocol
# Create a new experiment type
diffuse types create \
-n crystallography \
-d "Crystallography Experiment" \
--description "Standard MX data collection"
# Create a type with a template file
diffuse types create \
-n cryo-em \
-d "Cryo-EM Experiment" \
--template ./template.md
# Update a type's display name
diffuse types update crystallography --display-name "X-ray Crystallography"
# Update a type's template
diffuse types update crystallography --template ./new-template.md
# Deactivate a type
diffuse types update crystallography --inactive
# Associate metadata fields with a type
diffuse types set-fields crystallography resolution beamline wavelength
# Delete a type (with confirmation)
diffuse types delete crystallography
# Delete without confirmation prompt
diffuse types delete crystallography -y
The CLI is designed for scripting and CI/CD integration:
#!/bin/bash
set -euo pipefail
# Create experiment and upload results
EXP_ID=$(diffuse create \
--title "Automated Run $(date +%Y%m%d)" \
--tags "automation,ci" \
--format json | jq -r '.id')
echo "Created experiment: $EXP_ID"
# Upload entire checkpoints directory (recursive)
diffuse upload "$EXP_ID" ./checkpoints/
# Or upload individual files
diffuse upload "$EXP_ID" ./model.pth
diffuse upload "$EXP_ID" ./metrics.json
# Set metadata
diffuse metadata set "$EXP_ID" commit_sha "$GITHUB_SHA"
diffuse metadata set "$EXP_ID" build_number "$BUILD_NUMBER"
# Publish only on success: set -e aborts the script if any step above failed
diffuse publish "$EXP_ID"
name: Train and Track
on: [push]
jobs:
train:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Install uv
run: curl -LsSf https://astral.sh/uv/install.sh | sh
- name: Install DiffUSE CLI
run: |
git clone https://github.com/diff-use/webapp.git /tmp/diffuse
cd /tmp/diffuse/cli
uv tool install .
- name: Authenticate
run: |
mkdir -p ~/.diffuse
echo '{"github_token": "${{ secrets.DIFFUSE_TOKEN }}"}' > ~/.diffuse/config.json
- name: Create experiment
run: |
EXP_ID=$(diffuse create --title "Run ${{ github.run_number }}" --format json | jq -r '.id')
echo "EXP_ID=$EXP_ID" >> $GITHUB_ENV
- name: Train model
run: python train.py
- name: Upload results
run: diffuse upload ${{ env.EXP_ID }} ./model.pth
Manage governance tasks for data quality and compliance.
# List all tasks
diffuse governance list
# Filter by type (duplicate_review, missing_metadata, validation_failure, enrichment_required)
diffuse governance list --type missing_metadata
# Filter by status (open, in_progress, resolved, dismissed)
diffuse governance list --status open
# Filter by experiment
diffuse governance list --experiment EXP-1
# Show only tasks assigned to me
diffuse governance list --mine
# Table output format
diffuse governance list -f table
# View task details
diffuse governance view <task-id>
# JSON output
diffuse governance view <task-id> -f json
# Create a task (required: --title, --type)
diffuse governance create \
--title "Missing sample metadata" \
--type missing_metadata
# Create with all options
diffuse governance create \
--title "Review duplicate files" \
--type duplicate_review \
--priority high \
--description "Multiple artifacts with same checksum" \
--experiment EXP-5
# Task types: duplicate_review, missing_metadata, validation_failure, enrichment_required
# Priorities: low, medium, high, critical
# Update title
diffuse governance update <task-id> --title "New title"
# Update status to in_progress
diffuse governance update <task-id> --status in_progress
# Update priority
diffuse governance update <task-id> --priority high
# Resolve task
diffuse governance resolve <task-id>
# Resolve with notes
diffuse governance resolve <task-id> --notes "Deleted duplicate artifacts"
# Dismiss task with reason (required)
diffuse governance dismiss <task-id> --reason "False positive"
Hard delete removes the task permanently.
# Delete task (requires confirmation)
diffuse governance delete <task-id>
# Skip confirmation
diffuse governance delete <task-id> -y
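When creating governance tasks from scripts, it can save round trips to validate payloads client-side against the allowed values listed above. A hedged sketch; the `validate_task` helper is illustrative, not part of the CLI or API:

```python
# Allowed values as documented for `diffuse governance` above
TASK_TYPES = {"duplicate_review", "missing_metadata",
              "validation_failure", "enrichment_required"}
PRIORITIES = {"low", "medium", "high", "critical"}

def validate_task(task: dict) -> list:
    """Return a list of validation errors for a governance task payload."""
    errors = []
    if not task.get("title"):
        errors.append("title is required")
    if task.get("type") not in TASK_TYPES:
        errors.append("type must be one of: " + ", ".join(sorted(TASK_TYPES)))
    if "priority" in task and task["priority"] not in PRIORITIES:
        errors.append("priority must be one of: " + ", ".join(sorted(PRIORITIES)))
    return errors

task = {"title": "Review duplicate files",
        "type": "duplicate_review",
        "priority": "high"}
print(validate_task(task))  # → [] (valid payload)
```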
Run GPU-accelerated experiment workflows on the Voltage Park K8s cluster. Each workflow is a Docker-packaged scientific pipeline with named presets and configurable parameters.
Sampleworks — Protein structure prediction (Boltz2, Protenix, RF3). 2 GPUs.
WaterFlow — Protein water structure prediction and training. 1 GPU.
# List all presets (sampleworks + waterflow)
diffuse run presets
# Filter by workflow type
diffuse run presets -t sampleworks
diffuse run presets -t waterflow
# All presets (unified endpoint)
curl https://hub.diffuse.science/api/run/presets \
-H "Authorization: Bearer $TOKEN"
# Filter by type
curl "https://hub.diffuse.science/api/run/presets?experiment_type=waterflow" \
-H "Authorization: Bearer $TOKEN"
# Type-specific endpoints
curl https://hub.diffuse.science/api/sampleworks/presets \
-H "Authorization: Bearer $TOKEN"
curl https://hub.diffuse.science/api/waterflow/presets \
-H "Authorization: Bearer $TOKEN"
| Preset | Model | Methods | GPUs |
|---|---|---|---|
| `boltz2-xrd` | Boltz2 | X-Ray Diffraction | 2 |
| `boltz2-md` | Boltz2 | Molecular Dynamics | 2 |
| `protenix` | Protenix | — | 2 |
| `rf3` | RoseTTAFold3 | — | 2 |
# Run default preset (boltz2-xrd) on auto-resolved machine
diffuse run sampleworks
# Run a specific preset
diffuse run sampleworks protenix
# Target a specific VP machine
diffuse run sampleworks boltz2-xrd --machine sampleworks
# Run all presets in sequence
diffuse run sampleworks all
# Wait for job completion
diffuse run sampleworks rf3 --wait
# Provide PDB input
diffuse run sampleworks boltz2-xrd --input-file /path/to/structure.pdb
# Type-specific endpoint
curl -X POST https://hub.diffuse.science/api/sampleworks/run \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"preset": "boltz2-xrd", "machine": "sampleworks"}'
# Unified endpoint
curl -X POST https://hub.diffuse.science/api/run/submit \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"experiment_type": "sampleworks",
"preset": "boltz2-xrd",
"machine": "sampleworks"
}'
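The same unified submit call can be made from Python without shelling out to curl. A minimal stdlib sketch that only constructs the request (sending it requires a valid token; `build_submit_request` is an illustrative helper, not part of the CLI):

```python
import json
import urllib.request

def build_submit_request(token, experiment_type, preset, machine):
    """Construct (but do not send) a POST to the unified /api/run/submit endpoint."""
    payload = {"experiment_type": experiment_type,
               "preset": preset,
               "machine": machine}
    return urllib.request.Request(
        "https://hub.diffuse.science/api/run/submit",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_submit_request("TOKEN", "sampleworks", "boltz2-xrd", "sampleworks")
print(req.method, req.full_url)
# Send with: urllib.request.urlopen(req)
```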
| Preset | Command | Description | GPUs |
|---|---|---|---|
| `train-gvp` | `train` | Train with GVP encoder (default) | 1 |
| `train-esm` | `train` | Train with ESM encoder | 1 |
| `train-slae` | `train` | Train with SLAE encoder | 1 |
| `inference` | `inference` | Run inference with trained checkpoint | 1 |
| `generate-esm` | `generate-esm` | Generate ESM3 embeddings for PDB files | 1 |
# Run default preset (train-gvp)
diffuse run waterflow
# Run a specific preset
diffuse run waterflow train-esm
# Target a specific machine
diffuse run waterflow train-gvp --machine vratin-1
# Override training parameters
diffuse run waterflow train-gvp --epochs 100 --batch-size 8 --lr 5e-5
# Custom run name for checkpoints
diffuse run waterflow train-gvp -n my_experiment
# Wait for completion
diffuse run waterflow train-gvp --wait
# Run inference with a trained checkpoint
diffuse run waterflow inference --run-dir /data/checkpoints/my_run
# Generate ESM embeddings
diffuse run waterflow generate-esm --split-file /data/splits/train.txt
# With Weights & Biases logging
diffuse run waterflow train-esm --wandb-key $WANDB_KEY --wandb-project my-project
# Type-specific endpoint
curl -X POST https://hub.diffuse.science/api/waterflow/run \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"preset": "train-gvp",
"machine": "vratin-1",
"epochs": 100,
"batch_size": 8,
"lr": 5e-5
}'
# Unified endpoint
curl -X POST https://hub.diffuse.science/api/run/submit \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"experiment_type": "waterflow",
"preset": "train-gvp",
"machine": "vratin-1",
"parameters": {
"epochs": 100,
"batch_size": 8,
"lr": 5e-5
}
}'
# Inference with trained checkpoint
curl -X POST https://hub.diffuse.science/api/waterflow/run \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"preset": "inference",
"machine": "vratin-1",
"run_dir": "/data/checkpoints/my_run"
}'
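Note that kebab-case CLI flags map onto snake_case API parameters (`--batch-size` becomes `batch_size` in the JSON body above). A small sketch of that translation for scripts that build payloads from flag dictionaries; the helper is illustrative:

```python
def flags_to_parameters(flags):
    """Translate kebab-case CLI flag names into snake_case API parameter names."""
    # strip leading dashes, then replace remaining dashes with underscores
    return {k.lstrip("-").replace("-", "_"): v for k, v in flags.items()}

params = flags_to_parameters({"--epochs": 100, "--batch-size": 8, "--lr": 5e-5})
print(params)  # → {'epochs': 100, 'batch_size': 8, 'lr': 5e-05}
```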
# List all K8s jobs
diffuse run status
# Check a specific job
diffuse run status waterflow-train-gvp
# Target a specific machine
diffuse run status --machine vratin-1
diffuse run status waterflow-train-gvp --machine vratin-1
# Filter by label
diffuse run status -l app=sampleworks
# JSON output
diffuse run status -f json
# List all jobs on a machine
curl "https://hub.diffuse.science/api/k8s/jobs?machine=vratin-1" \
-H "Authorization: Bearer $TOKEN"
# Get a specific job
curl "https://hub.diffuse.science/api/k8s/jobs/waterflow-train-gvp?machine=vratin-1" \
-H "Authorization: Bearer $TOKEN"
# Filter by label
curl "https://hub.diffuse.science/api/k8s/jobs?label_selector=app%3Dsampleworks&machine=sampleworks" \
-H "Authorization: Bearer $TOKEN"
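The `label_selector` value must be URL-encoded, which is why `app=sampleworks` appears as `app%3Dsampleworks` in the curl example above. Python's standard library produces the same query string:

```python
from urllib.parse import urlencode

# urlencode percent-escapes the "=" inside the selector value
query = urlencode({"label_selector": "app=sampleworks", "machine": "sampleworks"})
url = f"https://hub.diffuse.science/api/k8s/jobs?{query}"
print(url)
# → https://hub.diffuse.science/api/k8s/jobs?label_selector=app%3Dsampleworks&machine=sampleworks
```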
# Fetch logs from a job
diffuse run logs waterflow-train-gvp
# Target a specific machine
diffuse run logs waterflow-train-gvp --machine vratin-1
# Control number of tail lines
diffuse run logs waterflow-train-gvp --tail 500
curl "https://hub.diffuse.science/api/k8s/jobs/waterflow-train-gvp/logs?machine=vratin-1&tail=200" \
-H "Authorization: Bearer $TOKEN"
| Flag | Type | Description | Applies to |
|---|---|---|---|
| `--wait`, `-w` | bool | Wait for K8s job to complete | All |
| `--machine`, `-m` | text | VP machine name to target | All |
| `--epochs` | int | Override epoch count (default: 200) | train-* |
| `--batch-size`, `-b` | int | Override batch size (default: 4) | train-* |
| `--lr` | float | Override learning rate (default: 1e-4) | train-* |
| `--encoder` | text | Override encoder type (gvp, esm, slae) | train-* |
| `--run-name`, `-n` | text | Custom run name for checkpoints | train-* |
| `--run-dir` | text | Training run directory (required) | inference |
| `--num-steps` | int | Override integration steps (default: 100) | inference |
| `--split-file` | text | Override split file for ESM generation | generate-esm |
| `--wandb-key` | text | W&B API key (omit to disable) | All |
| `--wandb-project` | text | W&B project name | All |