Health Checks
This guide describes using health checks for monitoring and orchestration.
Overview
VibeMQ provides HTTP endpoints for server health checks. These are used by orchestrators (Kubernetes, Docker Swarm) to determine service state.
Enabling Health Checks
using VibeMQ.Health;
var broker = BrokerBuilder.Create()
    .UsePort(2925)                      // broker TCP port
    .ConfigureHealthChecks(options => {
        options.Enabled = true;         // start the health check HTTP server
        options.Port = 2926;            // HTTP port for /health/ and /metrics/
    })
    .Build();
Configuration Parameters
| Parameter | Default | Description |
|---|---|---|
| Enabled | true | Enable health check server |
| Port | 2926 | HTTP port for health checks |
Endpoints
GET /health/
Server health check.
Request:
curl http://localhost:2926/health/
Response (healthy):
{
  "is_healthy": true,
  "status": "healthy",
  "active_connections": 15,
  "queue_count": 5,
  "in_flight_messages": 42,
  "total_messages_published": 125000,
  "total_messages_delivered": 124850,
  "memory_usage_mb": 256,
  "timestamp": "2026-02-18T10:30:00Z"
}
Status code: 200 OK
Response (unhealthy):
{
  "is_healthy": false,
  "status": "unhealthy",
  "active_connections": 0,
  "queue_count": 0,
  "in_flight_messages": 0,
  "total_messages_published": 0,
  "total_messages_delivered": 0,
  "memory_usage_mb": 512,
  "timestamp": "2026-02-18T10:30:00Z"
}
Status code: 503 Service Unavailable
Health criteria:
healthy - memory usage < 90%
unhealthy - memory usage >= 90%
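The threshold rule above can be sketched as a small function. The source does not say what the percentage is measured against, so the `limit_bytes` argument (for example, a container memory limit) is an assumption, and `classify_health` is a hypothetical name used only for illustration.

```python
def classify_health(used_bytes: int, limit_bytes: int, threshold: float = 0.90) -> str:
    """Return "healthy" below the threshold and "unhealthy" at or above it."""
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    usage = used_bytes / limit_bytes
    # The boundary value itself (exactly 90%) counts as unhealthy.
    return "unhealthy" if usage >= threshold else "healthy"


# 256 MiB used against an assumed 1 GiB limit is 25% usage.
print(classify_health(256 * 1024**2, 1024**3))
```

The comparison is inclusive on the unhealthy side, matching the `>= 90%` criterion.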
GET /metrics/
Get detailed metrics.
Request:
curl http://localhost:2926/metrics/
Response:
{
  "total_messages_published": 125000,
  "total_messages_delivered": 124850,
  "total_messages_acknowledged": 124800,
  "total_retries": 150,
  "total_dead_lettered": 50,
  "total_errors": 5,
  "total_connections_accepted": 500,
  "total_connections_rejected": 10,
  "active_connections": 15,
  "active_queues": 5,
  "in_flight_messages": 42,
  "memory_usage_bytes": 268435456,
  "average_delivery_latency_ms": 2.5,
  "timestamp": "2026-02-18T10:30:00Z",
  "uptime": "02:15:30.5000000"
}
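A consumer of this response will often want derived values rather than raw counters. The sketch below computes a delivery ratio and parses `uptime`, assuming the field uses the .NET `TimeSpan` text form `[d.]hh:mm:ss[.fffffff]` (an inference from the sample value, not something the source states). `parse_uptime` and `delivery_ratio` are hypothetical helper names.

```python
from datetime import timedelta
from typing import Optional


def parse_uptime(value: str) -> timedelta:
    """Parse a string such as "02:15:30.5000000" or "1.02:15:30"."""
    days = 0
    hours, _, rest = value.partition(":")
    # An optional day count precedes the hours as "d.hh".
    if "." in hours:
        day_part, hours = hours.split(".", 1)
        days = int(day_part)
    minutes, _, seconds = rest.partition(":")
    return timedelta(days=days, hours=int(hours),
                     minutes=int(minutes), seconds=float(seconds))


def delivery_ratio(metrics: dict) -> Optional[float]:
    """Delivered / published, or None when nothing has been published yet."""
    published = metrics["total_messages_published"]
    if published == 0:
        return None
    return metrics["total_messages_delivered"] / published


# Values copied from the sample response above.
sample = {
    "total_messages_published": 125000,
    "total_messages_delivered": 124850,
    "uptime": "02:15:30.5000000",
}
print(delivery_ratio(sample))
print(parse_uptime(sample["uptime"]).total_seconds())
```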
Using with Orchestrators
Kubernetes
Deployment with health checks:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vibemq
spec:
  replicas: 3
  selector:
    matchLabels:
      app: vibemq
  template:
    metadata:
      labels:
        app: vibemq
    spec:
      containers:
        - name: vibemq
          image: vibemq-server:latest
          ports:
            - containerPort: 2925
              name: tcp
            - containerPort: 2926
              name: http
          livenessProbe:
            httpGet:
              path: /health/
              port: 2926
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health/
              port: 2926
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "500m"
Probe descriptions:
livenessProbe — determines if container should be restarted
readinessProbe — determines if container is ready to accept traffic
Parameters:
initialDelaySeconds - delay before the first check
periodSeconds - interval between checks
timeoutSeconds - time after which a single check counts as failed
failureThreshold - number of consecutive failed checks before the probe is considered failed
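Kubernetes acts on a probe only after `failureThreshold` failures in a row; a single success resets the count. The sketch below models that counting rule on a list of check results. `first_failure_index` is a hypothetical helper written for illustration and is not part of Kubernetes or VibeMQ.

```python
from typing import Iterable, Optional


def first_failure_index(results: Iterable[bool], failure_threshold: int = 3) -> Optional[int]:
    """Return the index of the check at which the probe is declared failed.

    `results` holds one boolean per check (True = success). Returns None
    if the threshold of consecutive failures is never reached.
    """
    consecutive = 0
    for index, ok in enumerate(results):
        # A success resets the streak; a failure extends it.
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= failure_threshold:
            return index
    return None


# Two failures, a recovery, then three failures in a row.
history = [True, False, False, True, False, False, False]
print(first_failure_index(history))
```

With `periodSeconds: 10` and `failureThreshold: 3`, as in the liveness probe above, a persistent failure is acted on roughly 30 seconds after it starts, not counting the per-check timeout.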
Docker Compose
version: '3.8'
services:
  vibemq:
    image: vibemq-server:latest
    ports:
      - "2925:2925"
      - "2926:2926"
    environment:
      - VIBEMQ__PORT=2925
      - VIBEMQ__AUTHTOKEN=my-secret-token
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:2926/health/"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 10s
    restart: unless-stopped
Azure Container Instances
apiVersion: 2019-12-01
kind: ContainerGroup
metadata:
  name: vibemq
spec:
  containers:
    - name: vibemq
      image: vibemq-server:latest
      ports:
        - port: 2925
          protocol: TCP
        - port: 2926
          protocol: TCP
      livenessProbe:
        httpGet:
          path: /health/
          port: 2926
        initialDelaySeconds: 10
        periodSeconds: 10
      readinessProbe:
        httpGet:
          path: /health/
          port: 2926
        initialDelaySeconds: 5
        periodSeconds: 5
AWS ECS
Task definition:
{
  "family": "vibemq",
  "containerDefinitions": [
    {
      "name": "vibemq",
      "image": "vibemq-server:latest",
      "portMappings": [
        {
          "containerPort": 2925,
          "hostPort": 2925,
          "protocol": "tcp"
        },
        {
          "containerPort": 2926,
          "hostPort": 2926,
          "protocol": "tcp"
        }
      ],
      "healthCheck": {
        "command": [
          "CMD-SHELL",
          "curl -f http://localhost:2926/health/ || exit 1"
        ],
        "interval": 10,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 10
      }
    }
  ]
}
Monitoring Health Checks
Manual Check
# Health check
curl http://localhost:2926/health/
# Detailed metrics
curl http://localhost:2926/metrics/
# With formatting
curl -s http://localhost:2926/metrics/ | jq .
PowerShell:
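# Health check
Invoke-RestMethod -Uri http://localhost:2926/health/
# Check with error handling
try {
    $response = Invoke-RestMethod -Uri http://localhost:2926/health/
    if ($response.status -eq "healthy") {
        Write-Host "✓ Server healthy" -ForegroundColor Green
    } else {
        Write-Host "✗ Server unhealthy" -ForegroundColor Red
    }
} catch {
    # Invoke-RestMethod throws on non-2xx responses, so a 503 is handled here
    $statusCode = [int]$_.Exception.Response.StatusCode
    if ($statusCode -eq 503) {
        Write-Host "✗ Server unhealthy (HTTP 503)" -ForegroundColor Red
    } else {
        Write-Host "✗ Connection error: $_" -ForegroundColor Red
    }
}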
Automatic Monitoring
Bash script:
#!/bin/bash
HEALTH_URL="http://localhost:2926/health/"
METRICS_URL="http://localhost:2926/metrics/"
# Health check: curl appends the HTTP status code on its own line after the body
response=$(curl -s -w "\n%{http_code}" "$HEALTH_URL")
# Drop the last line to get the body (sed '$d' also works where head -n -1 is unavailable)
body=$(echo "$response" | sed '$d')
status_code=$(echo "$response" | tail -n 1)
if [ "$status_code" -eq 200 ]; then
    echo "✓ VibeMQ healthy (HTTP $status_code)"
    echo "$body" | jq .
elif [ "$status_code" -eq 503 ]; then
    echo "✗ VibeMQ unhealthy (HTTP $status_code)"
    echo "$body" | jq .
    exit 1
else
    # curl reports status 000 when no connection could be made
    echo "✗ Connection error (HTTP $status_code)"
    exit 1
fi
# Metrics check
metrics=$(curl -s "$METRICS_URL")
echo "Metrics:"
echo "$metrics" | jq .
Python script:
import sys
import requests
HEALTH_URL = "http://localhost:2926/health/"
METRICS_URL = "http://localhost:2926/metrics/"
def check_health():
    """Return True only when /health/ answers 200 OK."""
    try:
        response = requests.get(HEALTH_URL, timeout=5)
        if response.status_code == 200:
            data = response.json()
            print("✓ VibeMQ healthy")
            print(f"  Status: {data['status']}")
            print(f"  Connections: {data['active_connections']}")
            print(f"  Queues: {data['queue_count']}")
            print(f"  Memory: {data['memory_usage_mb']} MB")
            return True
        elif response.status_code == 503:
            print("✗ VibeMQ unhealthy")
            return False
        else:
            print(f"✗ Unexpected status: HTTP {response.status_code}")
    except requests.exceptions.RequestException as e:
        print(f"✗ Connection error: {e}")
        return False
    return False
def get_metrics():
    """Print a short summary of the /metrics/ response."""
    try:
        response = requests.get(METRICS_URL, timeout=5)
        if response.status_code == 200:
            data = response.json()
            print("\nMetrics:")
            print(f"  Published: {data['total_messages_published']}")
            print(f"  Delivered: {data['total_messages_delivered']}")
            print(f"  Acknowledged: {data['total_messages_acknowledged']}")
            print(f"  Errors: {data['total_errors']}")
            print(f"  Latency: {data['average_delivery_latency_ms']:.2f} ms")
    except Exception as e:
        print(f"  Error getting metrics: {e}")
if __name__ == "__main__":
    if check_health():
        get_metrics()
        sys.exit(0)
    else:
        sys.exit(1)
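A one-shot check is often not enough during startup or deployment, when the broker may need a few seconds before it reports healthy. The sketch below retries a check function a fixed number of times. `wait_until_healthy` and its parameters are hypothetical names; the check is passed in as a callable so the helper stays independent of any HTTP library.

```python
import time
from typing import Callable


def wait_until_healthy(check: Callable[[], bool], attempts: int = 10,
                       delay: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call check() up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        # No need to wait after the final attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Demo with a stand-in check that succeeds on the third call.
outcomes = iter([False, False, True])
print(wait_until_healthy(lambda: next(outcomes), attempts=5, delay=0.0))
```

In practice the `check` argument could be the `check_health` function from the script above, for example `wait_until_healthy(check_health, attempts=15, delay=2.0)`.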
Integration with Monitoring Systems
Prometheus
prometheus.yml:
scrape_configs:
  - job_name: 'vibemq'
    static_configs:
      - targets: ['vibemq:2926']
    metrics_path: '/metrics/'
    scrape_interval: 15s
    scrape_timeout: 10s
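Prometheus scrapes its own text exposition format, in which each sample is a line of the form `name value`. If the endpoint returns the JSON document shown under GET /metrics/, a small adapter would be needed in between. The sketch below shows one way to do that translation; the `vibemq_` prefix, the resulting metric names, and the `to_prometheus_text` helper are assumptions for illustration, not names confirmed by VibeMQ.

```python
import json


def to_prometheus_text(metrics: dict, prefix: str = "vibemq_") -> str:
    """Render numeric fields as Prometheus text-format samples, one per line."""
    lines = []
    for key, value in metrics.items():
        # Skip strings such as "timestamp" and "uptime"; bool is a
        # subclass of int, so it is excluded explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            lines.append(f"{prefix}{key} {value}\n")
    return "".join(lines)


# A shortened copy of the sample /metrics/ response.
payload = json.loads(
    '{"total_errors": 5, "active_connections": 15, '
    '"timestamp": "2026-02-18T10:30:00Z"}'
)
print(to_prometheus_text(payload), end="")
```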
Grafana
Import a dashboard to visualize VibeMQ metrics.
Main panels:
Health status (health check)
Active connections
Queue count
Memory usage
Message throughput
Delivery latency
Datadog
Agent configuration:
instances:
  - vibemq_url: http://vibemq:2926/metrics/
    tags:
      - "service:vibemq"
      - "env:production"
New Relic
Use the Prometheus endpoint for integration:
integrations:
  - name: prometheus
    metric_types:
      - vibemq_messages_published_total
      - vibemq_messages_delivered_total
      - vibemq_active_connections
    urls:
      - http://vibemq:2926/metrics/
Troubleshooting
Health check not responding
Problem: curl: (7) Failed to connect to localhost port 2926
Causes:
Health check disabled
Wrong port
Firewall blocking
Solution:
.ConfigureHealthChecks(options => {
    options.Enabled = true;
    options.Port = 2926; // Check port
})
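To narrow down which of the causes applies, it helps to see how the TCP connection fails. A refused connection usually means nothing is listening on the port (health checks disabled or a wrong port), while a timeout often points to a firewall silently dropping packets. The sketch below uses only the standard library; `port_status` is a hypothetical helper name.

```python
import socket


def port_status(host: str, port: int, timeout: float = 3.0) -> str:
    """Return "open", "refused", "timeout", or "error: ..." for a TCP port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but no process is listening on this port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except OSError as exc:
        # Anything else, for example a host name that does not resolve.
        return f"error: {exc}"


print(port_status("localhost", 2926, timeout=1.0))
```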
Returns 503
Problem: Health check returns 503 Service Unavailable
Cause: Critical memory usage (>= 90%)
Solution:
Increase memory limit
Reduce queue sizes
Optimize memory usage
.ConfigureQueues(options => {
    options.MaxQueueSize = 5000; // Reduce size
})
Health check timeout
Problem: curl: (28) Operation timed out
Causes:
Server overloaded
Probe timeout set too low
Solution: Increase the timeout in the orchestrator configuration:
livenessProbe:
  timeoutSeconds: 10 # Increase timeout
Next Steps
Monitoring — monitoring and metrics
Troubleshooting — troubleshooting
Configuration — configuration