Monitoring

Keep your ClawBook VPS running smoothly with built-in monitoring and optional external tools.

Built-in Monitoring

Health Check Command

Quick overview of system health:

clawbook-health

# Output:
# ClawBook Health Check
# =====================
# ✓ OpenClaw service: running
# ✓ PostgreSQL: running
# ✓ Caddy: running
# ✓ Disk: 45 GB / 80 GB (56%)
# ✓ Memory: 2.1 GB / 4 GB (52%)
# ✓ CPU: 15% average (2 cores)
# ✓ SSL: Valid (expires in 67 days)
#
# Status: HEALTHY
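
For unattended checks (cron, for example), the final Status line can be parsed from the report. The following is a minimal sketch that assumes the output format shown above stays stable. The function name `health_status` is hypothetical, and whether `clawbook-health` also reports failure through its exit code is not documented here.

```shell
#!/bin/sh
# Extract the overall status word from clawbook-health output on stdin.
# Assumes the report contains a line like "Status: HEALTHY", optionally
# prefixed with "# " as in the listing above.
health_status() {
    sed -n 's/^#\{0,1\} *Status: *\([A-Za-z_]*\).*$/\1/p' | tail -n 1
}

# Usage (uncomment on a ClawBook VPS):
# status=$(clawbook-health | health_status)
# [ "$status" = "HEALTHY" ] || echo "ClawBook status: ${status:-unknown}" >&2
```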

Dashboard Monitoring

View metrics in the web dashboard:

  1. Log into your dashboard
  2. Go to Home / Status
  3. View real-time gauges for:
    • CPU usage
    • Memory usage
    • Disk usage
    • Network I/O
    • Active connections

Service Status

Check individual services:

# All services
systemctl status openclaw postgresql caddy

# Single service
systemctl status openclaw

# Service logs
journalctl -u openclaw -f
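
For scripting, `systemctl is-active --quiet` returns a zero exit code only when a unit is active, which is easier to test than the full status output. A small sketch, assuming the three service names used above; the function name `check_services` is hypothetical.

```shell
#!/bin/sh
# Print the state of each service and return non-zero if any is not active.
# Pass the unit names as arguments; adjust them if your install differs.
check_services() {
    failed=0
    for svc in "$@"; do
        if systemctl is-active --quiet "$svc"; then
            echo "ok   $svc"
        else
            echo "DOWN $svc"
            failed=1
        fi
    done
    return "$failed"
}

# Usage:
# check_services openclaw postgresql caddy || echo "at least one service is down" >&2
```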

Resource Monitoring

CPU

# Current load
top -bn1 | head -5

# CPU info
nproc # Number of cores
lscpu # Detailed CPU info

# Sampled usage (requires the sysstat package)
sar -u 1 5 # 5 samples, 1 second apart
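
Load averages are easier to interpret relative to the number of cores. The sketch below divides one by the other; the helper name `load_per_core` is hypothetical. On Linux, a sustained value at or above 1.00 means there are more runnable (or I/O-blocked) tasks than cores.

```shell
#!/bin/sh
# Divide a load average by the number of cores and print it with
# two decimal places.
load_per_core() {
    # $1 = load average (e.g. 1.50), $2 = number of cores (e.g. 2)
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Usage on a live Linux system, with the 1-minute load from /proc/loadavg:
# load_per_core "$(cut -d " " -f1 /proc/loadavg)" "$(nproc)"
```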

Memory

# Current usage
free -h

# Detailed memory info
head -10 /proc/meminfo

# Memory usage by process
ps aux --sort=-%mem | head -10
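
A single used-memory percentage can be derived from the `MemTotal` and `MemAvailable` fields of `/proc/meminfo`. This is a sketch under that assumption; the percentage reported by `clawbook-health` may be calculated differently, and the function name `mem_used_percent` is hypothetical.

```shell
#!/bin/sh
# Print used memory as a whole-number percentage, computed as
# (MemTotal - MemAvailable) / MemTotal from a meminfo-style file.
# Defaults to /proc/meminfo when no file is given.
mem_used_percent() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }' "${1:-/proc/meminfo}"
}

# Usage:
# echo "Memory used: $(mem_used_percent)%"
```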

Disk

# Disk usage
df -h

# Directory sizes
du -sh /var/log/*
du -sh /var/lib/postgresql/*

# Disk I/O (requires the sysstat package)
iostat -x 1 5
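
To flag filesystems that are filling up, the `df` output can be filtered against a threshold. The sketch below reads POSIX-format `df -P` output; the function name `disks_over` is hypothetical, and mount points containing spaces are not handled. The value 85 in the usage line mirrors the Disk Full threshold listed under Alerting.

```shell
#!/bin/sh
# Read `df -P` output on stdin and print "<mount point> <usage>%"
# for every filesystem whose usage is above the given percentage.
disks_over() {
    # $1 = threshold in percent
    awk -v limit="$1" 'NR > 1 { use = $5; sub(/%/, "", use)
                                if (use + 0 > limit + 0) print $6, use "%" }'
}

# Usage:
# df -P | disks_over 85
```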

Network

# Current connections
ss -tulpn

# Bandwidth usage (requires the vnstat package)
vnstat -l # Live monitoring

# Per-interface statistics
ip -s link
netstat -i # Legacy equivalent, requires net-tools

Application Metrics

Message Statistics

View in dashboard: Home / Quick Stats

Or via CLI:

clawbook-stats messages

# Output:
# Messages (Last 24h)
# ==================
# WhatsApp: 145
# Telegram: 89
# Discord: 23
# Total: 257
#
# Messages (Last 7d): 1,523
# Messages (Last 30d): 5,891

Token Usage

clawbook-stats tokens

# Output:
# Token Usage (Current Month)
# ===========================
# Provider: Anthropic
# Input tokens: 125,000
# Output tokens: 89,000
# Total cost: $2.14

Response Times

clawbook-stats performance

# Output:
# Response Times (Last 24h)
# =========================
# Average: 1.2s
# P50: 0.9s
# P95: 2.5s
# P99: 4.1s
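
P95 is the value that 95% of responses stayed at or below. As an illustration of how such figures are derived, here is a nearest-rank percentile over a list of response times. How `clawbook-stats` computes its percentiles is not documented here, so treat this as a generic sketch; the function name `percentile` is hypothetical.

```shell
#!/bin/sh
# Nearest-rank percentile: sort the samples, then take the value at
# position ceil(p/100 * N). Samples are read from stdin, one per line.
percentile() {
    # $1 = percentile, e.g. 95
    sort -n | awk -v p="$1" '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            rank = p * NR / 100
            idx = int(rank)
            if (idx < rank) idx++      # round up (ceil)
            if (idx < 1) idx = 1
            print v[idx]
        }'
}

# Usage, with one response time in seconds per line:
# percentile 95 < response_times.txt
```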

Log Monitoring

Log Locations

Log          Location                      Purpose
Application  /var/log/openclaw/app.log     Main application logs
Access       /var/log/openclaw/access.log  HTTP requests
Error        /var/log/openclaw/error.log   Errors only
Backup       /var/log/openclaw/backup.log  Backup operations
System       /var/log/syslog               System events

Real-time Log Viewing

# Application logs
tail -f /var/log/openclaw/app.log

# All ClawBook logs
tail -f /var/log/openclaw/*.log

# Filtered errors
tail -f /var/log/openclaw/app.log | grep ERROR

Log Rotation

Logs are rotated automatically by logrotate. To view the active configuration:

cat /etc/logrotate.d/openclaw

Default settings:

  • Rotate daily
  • Keep 7 days
  • Compress after 1 day
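
The defaults listed above map onto standard logrotate directives roughly as follows. This is a sketch of what such a file could contain, not a copy of the file ClawBook ships. The `missingok` and `notifempty` lines are common additions assumed here, and the real file may include other directives.

```text
/var/log/openclaw/*.log {
    daily           # Rotate daily
    rotate 7        # Keep 7 rotated files (7 days at daily rotation)
    compress        # Compress rotated logs
    delaycompress   # ...but leave the most recent rotation uncompressed
    missingok       # Assumed: do not fail if a log file is absent
    notifempty      # Assumed: skip rotation of empty files
}
```

Running `logrotate -d /etc/logrotate.d/openclaw` performs a dry run and prints what would be rotated without changing any files.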

Alerting

Built-in Alerts

Configure in dashboard: Settings / Notifications

Alert          Threshold        Action
High CPU       > 90% for 5 min  Email
High Memory    > 90%            Email
Disk Full      > 85%            Email
Service Down   Any service      Email
SSL Expiring   < 14 days        Email
Backup Failed  Any failure      Email

Alert Configuration

# /etc/openclaw/config.yaml
alerts:
  email:
    enabled: true
    recipients:
      - admin@example.com

  thresholds:
    cpu_percent: 90
    memory_percent: 90
    disk_percent: 85
    ssl_days: 14

  cooldown_minutes: 30 # Don't repeat the same alert within this window
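
The cooldown setting suppresses repeat notifications for a period after an alert is sent. The sketch below illustrates that behavior for custom scripts. It is not ClawBook's implementation, and the function name `should_alert` is hypothetical.

```shell
#!/bin/sh
# Decide whether an alert may be sent again.
#   $1 = epoch seconds when the alert was last sent (0 if never)
#   $2 = current epoch seconds
#   $3 = cooldown in minutes
# Returns 0 (send) once the cooldown has elapsed, 1 (suppress) otherwise.
should_alert() {
    [ $(( $2 - $1 )) -ge $(( $3 * 60 )) ]
}

# Usage:
# should_alert "$last_sent" "$(date +%s)" 30 && echo "send alert"
```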

Slack Notifications

alerts:
  slack:
    enabled: true
    webhook_url: https://hooks.slack.com/services/xxx/yyy/zzz
    channel: "#alerts"
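
Before relying on Slack alerts, it helps to confirm that the webhook accepts a message. Slack incoming webhooks take a JSON body with a `text` field. The sketch below builds that body and shows a `curl` call against the placeholder URL from the config above; the function name `slack_payload` is hypothetical.

```shell
#!/bin/sh
# Build the JSON body for a Slack incoming webhook.
# Backslashes and double quotes in the message are escaped; other
# control characters (such as newlines) are not handled in this sketch.
slack_payload() {
    escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    printf '{"text":"%s"}\n' "$escaped"
}

# Usage (replace the placeholder URL with your own webhook):
# curl -sS -X POST -H 'Content-Type: application/json' \
#      --data "$(slack_payload 'ClawBook test alert')" \
#      https://hooks.slack.com/services/xxx/yyy/zzz
```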

External Monitoring

UptimeRobot (Free)

  1. Sign up at uptimerobot.com
  2. Add new monitor:
    • Type: HTTPS
    • URL: https://yourdomain.com
    • Interval: 5 minutes
  3. Configure alert contacts

Datadog

Professional monitoring with detailed metrics:

# Install agent (Agent 7 install script)
DD_API_KEY=your_api_key DD_SITE="datadoghq.com" bash -c "$(curl -L https://install.datadoghq.com/scripts/install_script_agent7.sh)"

# Configure
nano /etc/datadog-agent/datadog.yaml

Prometheus + Grafana

For self-hosted monitoring:

# /etc/openclaw/config.yaml
metrics:
  prometheus:
    enabled: true
    port: 9090
    path: /metrics

Access metrics at http://localhost:9090/metrics
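
On the Prometheus side, a scrape job then needs to point at that endpoint. The fragment below is a minimal sketch for `prometheus.yml`. The job name and scrape interval are assumptions, and the target assumes Prometheus can reach the VPS on localhost. Note that a Prometheus server also listens on port 9090 by default, so if both run on the same host, one of the two ports has to change.

```yaml
# prometheus.yml (on the Prometheus server)
scrape_configs:
  - job_name: "openclaw"            # hypothetical job name
    metrics_path: /metrics          # matches `path` in config.yaml
    scrape_interval: 30s            # assumed value
    static_configs:
      - targets: ["localhost:9090"] # matches `port` in config.yaml
```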

Status Page

Public Status Page

Create a status page for users with one of the following:

  • Statuspage.io (paid)
  • Cachet (self-hosted, free)
  • Upptime (GitHub-based, free)

Integration

Automatically update status page:

# /etc/openclaw/config.yaml
status_page:
  provider: statuspage.io
  page_id: your_page_id
  api_key: your_api_key
  component_id: your_component_id

Monitoring Best Practices

  1. Check daily - Review dashboard or health command
  2. Set up alerts - Don't wait for problems
  3. Monitor externally - Catch issues you can't see internally
  4. Review logs weekly - Look for patterns
  5. Track trends - Storage growth, message volume
  6. Plan capacity - Upgrade before hitting limits

Troubleshooting High Resource Usage

High CPU

# Find culprit
top -c

# Common causes:
# - High message volume
# - Inefficient AI responses
# - Background processes

# Possible fixes:
# - Upgrade plan
# - Rate limit users
# - Optimize settings

High Memory

# Find memory hogs
ps aux --sort=-%mem | head -10

# Common causes:
# - Large conversation contexts
# - Memory leaks
# - PostgreSQL buffers

# Possible fixes:
# - Reduce context window
# - Restart services periodically
# - Upgrade plan

Disk Full

# Find large directories (errors from /proc and similar are suppressed)
du -sh /* 2>/dev/null | sort -rh | head -10

# Clean up
apt autoremove -y
journalctl --vacuum-time=7d
clawbook-cleanup logs --older-than 30d
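
When a directory is large, the next step is usually to find the individual files responsible. A sketch using `find` and `du`; the function name `large_files` is hypothetical, and the `-xdev` option (stay on one filesystem) assumes GNU or BSD `find`.

```shell
#!/bin/sh
# List regular files under a directory that are larger than a size in KiB,
# biggest first, as "<disk usage in KiB> <path>".
large_files() {
    # $1 = directory, $2 = minimum size in KiB
    find "$1" -xdev -type f -size +"$2"k -exec du -k {} + 2>/dev/null | sort -rn
}

# Usage: files over 100 MiB under /var
# large_files /var 102400 | head -10
```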
