Zero-Downtime Deployment for Node.js
Deploy updates without interrupting service using PM2 reload, graceful shutdown, and rolling restarts.
What is Zero-Downtime Deployment?
Zero-downtime deployment means updating your application without any service interruption. Users can continue using your service while the update is happening in the background. This is critical for production applications where downtime directly impacts revenue and user satisfaction.
Key benefit: Deploy fixes, features, and updates to production without scheduling maintenance windows or apologizing to users.
PM2 Reload vs Restart
PM2 Restart (Hard)
Stops all processes immediately and starts new ones. Active connections are dropped.
pm2 restart app.js
Results in brief downtime and lost connections. Not recommended for production.
PM2 Reload (Graceful)
Starts new processes while old ones handle existing requests, then shuts down gracefully.
pm2 reload app.js
Zero downtime. New requests go to new processes while old requests finish cleanly.
Cluster Mode is Required
Zero-downtime deployment with PM2 reload requires cluster mode with at least 2 instances. Here's why:
- With 2+ instances, PM2 restarts one while the other continues serving requests
- Load balancer automatically routes traffic to healthy instances
- Each instance restarts sequentially, never all at once
# Start app in cluster mode with 4 instances
pm2 start app.js -i 4 --name "myapp"
# Or use max CPUs available
pm2 start app.js -i max --name "myapp"
# List instances
pm2 list
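The sequential restart described above can be pictured with a small simulation. This is only a sketch of the ordering (start the replacement, wait until it is ready, then retire the old worker); it does not call PM2 and is not PM2's actual implementation. The names `rollingReload`, `start`, and `stop` are hypothetical.

```javascript
// Simulation only: models the order of operations during a rolling reload.
// It does not talk to PM2; `start` and `stop` are hypothetical async hooks.
async function rollingReload(workers, start, stop) {
  const replaced = [];
  for (const oldWorker of workers) {
    const newWorker = await start(oldWorker.id); // wait until the new worker is ready
    await stop(oldWorker); // only then retire the old one
    replaced.push(newWorker);
  }
  return replaced;
}

// Demo with in-memory fake workers that records the lowest online count seen.
async function demo() {
  const online = new Set();
  let minOnline = Infinity;
  const track = () => { minOnline = Math.min(minOnline, online.size); };

  const oldWorkers = [0, 1, 2, 3].map((id) => ({ id, version: 1 }));
  oldWorkers.forEach((w) => online.add(w));

  const start = async (id) => {
    const w = { id, version: 2 };
    online.add(w);
    track();
    return w;
  };
  const stop = async (w) => {
    online.delete(w);
    track();
  };

  const newWorkers = await rollingReload(oldWorkers, start, stop);
  return { minOnline, versions: newWorkers.map((w) => w.version) };
}

const demoResult = demo();
demoResult.then((r) => console.log(r));
```

Because each replacement comes online before its predecessor stops, the number of online workers in this model never drops below the original four.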
Graceful Shutdown Handling
1. Handle SIGINT and SIGTERM Signals
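When PM2 initiates a reload, it sends SIGINT to the old process by default, and follows up with SIGKILL if the process has not exited before kill_timeout elapses. Other process managers and container runtimes typically send SIGTERM. Your app should listen for both signals and shut down gracefully.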
const http = require('http');
const app = require('./app');
let server = http.createServer(app);
server.listen(3000);
// Shared graceful shutdown handler, so both signals get the same force-exit timer
function shutdown(signal) {
  console.log(`${signal} received: starting graceful shutdown`);
  // Stop accepting new connections; in-flight requests are allowed to finish
  server.close(() => {
    console.log('HTTP server closed');
    process.exit(0);
  });
  // Force exit after 30 seconds if connections have not closed
  setTimeout(() => {
    console.error('Could not close connections in time');
    process.exit(1);
  }, 30000).unref();
}
// Handle both signals a process manager may send
process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));
2. Configure Graceful Shutdown in PM2
Set proper timeouts in the PM2 ecosystem config file to allow graceful shutdown.
// ecosystem.config.js
module.exports = {
  apps: [{
    name: 'myapp',
    script: './app.js',
    instances: 4,
    exec_mode: 'cluster',
    // Graceful shutdown settings
    kill_timeout: 30000, // Wait 30s for graceful shutdown
    wait_ready: true, // Wait for app to signal readiness
    listen_timeout: 3000, // Timeout for app to start listening
    max_memory_restart: '500M',
    // Environment
    env: {
      NODE_ENV: 'production'
    }
  }]
};
// Start with ecosystem config
// pm2 start ecosystem.config.js
3. Signal Ready Status to PM2
With wait_ready enabled, PM2 waits for your app to send a 'ready' message before treating the new process as online. Send that message once the server is listening.
const http = require('http');
const app = require('./app');
let server = http.createServer(app);
server.listen(3000, () => {
  console.log('Server listening on port 3000');
  // Signal PM2 that app is ready
  if (process.send) {
    process.send('ready');
  }
});
Deployment Script
Create a simple bash script to automate the zero-downtime deployment process.
#!/bin/bash
set -e
APP_NAME="myapp"
echo "Starting zero-downtime deployment..."
# 1. Pull latest code
echo "Pulling latest code..."
git pull origin main
# 2. Install dependencies (including devDependencies, which the test step usually needs)
echo "Installing dependencies..."
npm ci
# 3. Run tests (optional)
echo "Running tests..."
npm test
# 4. Gracefully reload with PM2
echo "Reloading application..."
pm2 reload "$APP_NAME"
# 5. Wait for new instances to be ready
sleep 5
# 6. Verify health
echo "Verifying application health..."
curl -f http://localhost:3000/health || { echo "Health check failed"; exit 1; }
echo "Zero-downtime deployment completed successfully!"
Save as deploy.sh and run: chmod +x deploy.sh && ./deploy.sh
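The fixed sleep 5 in the script is a guess at how long the new instances need. One alternative is to poll the health endpoint until it answers with 200 or a retry budget runs out. The following is a minimal sketch using only Node's built-in http module. The function names `getStatus` and `waitForHealthy` and the default values are assumptions for illustration, not part of PM2 or of the script above.

```javascript
const http = require('node:http');

// Resolve with the HTTP status code of one GET request; reject on
// connection errors and timeouts.
function getStatus(url, timeoutMs) {
  return new Promise((resolve, reject) => {
    const req = http.get(url, { timeout: timeoutMs }, (res) => {
      res.resume(); // discard the body so the socket is released
      resolve(res.statusCode);
    });
    req.on('timeout', () => req.destroy(new Error('request timed out')));
    req.on('error', reject);
  });
}

// Poll `url` until it returns 200 or `attempts` is exhausted.
// Resolves with the number of attempts used; rejects if never healthy.
async function waitForHealthy(url, { attempts = 10, delayMs = 1000, timeoutMs = 2000 } = {}) {
  for (let i = 1; i <= attempts; i++) {
    try {
      if ((await getStatus(url, timeoutMs)) === 200) return i;
    } catch (err) {
      // Connection refused or timed out: treat as "not ready yet".
    }
    if (i < attempts) await new Promise((r) => setTimeout(r, delayMs));
  }
  throw new Error(`${url} not healthy after ${attempts} attempts`);
}

// Possible use from a deploy step (hypothetical file name wait-healthy.js):
// waitForHealthy('http://localhost:3000/health')
//   .then(() => process.exit(0))
//   .catch((err) => { console.error(err.message); process.exit(1); });
```

A deploy script could then run this file with node after pm2 reload in place of the sleep and curl steps, since a non-zero exit code stops the script under set -e.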
Health Check Endpoint
Implement a health check endpoint that load balancers and deployment scripts can verify.
app.get('/health', (req, res) => {
  // Check database connection
  const dbHealthy = checkDatabaseConnection();
  // Check Redis connection
  const cacheHealthy = checkCacheConnection();
  const healthy = dbHealthy && cacheHealthy;
  const statusCode = healthy ? 200 : 503;
  res.status(statusCode).json({
    status: healthy ? 'ok' : 'unhealthy',
    uptime: process.uptime(),
    timestamp: new Date().toISOString(),
    checks: {
      database: dbHealthy,
      cache: cacheHealthy
    }
  });
});
// Deep health check for detailed diagnostics
app.get('/health/deep', async (req, res) => {
  const checks = {
    database: await checkDatabaseLatency(),
    cache: await checkCacheLatency(),
    disk: checkDiskSpace(),
    memory: checkMemoryUsage()
  };
  const healthy = Object.values(checks).every(c => c.ok);
  res.status(healthy ? 200 : 503).json({
    status: healthy ? 'ok' : 'unhealthy',
    checks
  });
});
Testing Zero-Downtime Deployment
1. Monitor Requests During Reload
# Terminal 1: Watch PM2 logs
pm2 logs myapp
# Terminal 2: Send continuous requests
while true; do
  curl http://localhost:3000/api/data -w "Status: %{http_code}\n"
  sleep 1
done
# Terminal 3: Trigger reload
pm2 reload myapp
You should see no failed requests (HTTP 5xx errors) during the reload operation.
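The shell loop prints each status code but leaves the counting to you. As a rough alternative, the sketch below tallies outcomes, including dropped connections that never produce a status code. The function names `probe` and `monitor` and the tally categories are assumptions chosen for this example.

```javascript
const http = require('node:http');

// Send one GET request; resolve with the status code, or 0 on a connection error.
function probe(url) {
  return new Promise((resolve) => {
    http.get(url, (res) => {
      res.resume(); // discard the body
      resolve(res.statusCode);
    }).on('error', () => resolve(0));
  });
}

// Send `count` sequential requests, `intervalMs` apart, and tally the outcomes.
async function monitor(url, count, intervalMs) {
  const tally = { ok: 0, serverErrors: 0, connectionErrors: 0, other: 0 };
  for (let i = 0; i < count; i++) {
    const status = await probe(url);
    if (status === 0) tally.connectionErrors += 1;
    else if (status >= 500) tally.serverErrors += 1;
    else if (status >= 200 && status < 400) tally.ok += 1;
    else tally.other += 1;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  return tally;
}

// Possible use while running `pm2 reload myapp` in another terminal:
// monitor('http://localhost:3000/api/data', 60, 500).then(console.log);
```

A clean reload should leave both serverErrors and connectionErrors at zero.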
2. Verify Graceful Shutdown
# Start app in cluster mode
pm2 start app.js -i 2
# Check active connections in another terminal
watch 'lsof -i :3000'
# Trigger reload and watch connections close gracefully
pm2 reload app.js
Connections should migrate from old processes to new ones, then old processes should exit.
3. Load Test During Deployment
# Use Apache Bench for load testing
ab -n 10000 -c 100 http://localhost:3000/api/data
# Or use wrk for more realistic load testing
wrk -t12 -c400 -d30s http://localhost:3000/api/data
# During test, trigger reload in another terminal
pm2 reload myapp
# Results should show 0% error rate throughout
Common Pitfalls to Avoid
Not handling SIGINT and SIGTERM signals
Without signal handlers, the old process exits as soon as it is signaled, dropping in-flight connections. Always implement graceful shutdown.
Running with only 1 instance
PM2 reload requires multiple instances. With 1 instance, reload still causes downtime. Use at least 2 instances.
Leaving active connections hanging
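PM2 sends SIGKILL once kill_timeout elapses, and the default of 1600 ms can cut off requests that are still in flight. Set a timeout long enough for requests to finish (20-30 seconds is a reasonable starting point), and keep a force-exit timer in the app so a stuck process cannot hang.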
Not draining database connections
When shutting down, close database connections properly to avoid "too many connections" errors on restart.
Ignoring long-running requests
Long requests (file uploads, batch processing) may be cut off if they outlast the graceful shutdown timeout. Set timeouts that fit your use case.
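The last two pitfalls can be handled in a single shutdown routine: stop accepting connections, let in-flight requests finish, then release database connections, with a timer as a safety net. The sketch below assumes a hypothetical `pool` object with an async `end()` method; check your database client's documentation for its actual shutdown call. The `createShutdown` name and its options are likewise illustrative.

```javascript
// Build a shutdown function for an http.Server.
// `pool` is a hypothetical database pool exposing an async end() method.
// `exit` is injectable so the routine can be exercised without ending the process.
function createShutdown(server, pool, { timeoutMs = 30000, exit = (code) => process.exit(code) } = {}) {
  let shuttingDown = false;
  return function shutdown(signal) {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    console.log(`${signal} received: draining`);

    // Safety net: force exit if draining takes too long.
    const timer = setTimeout(() => exit(1), timeoutMs);
    timer.unref();

    // Stop accepting connections; the callback runs once existing ones have closed.
    server.close(async () => {
      try {
        await pool.end(); // release database connections only after requests are done
        clearTimeout(timer);
        exit(0);
      } catch (err) {
        console.error('Error while closing pool', err);
        exit(1);
      }
    });

    // Idle keep-alive sockets can otherwise delay server.close().
    // closeIdleConnections() is available on Node 18.2 and later.
    if (typeof server.closeIdleConnections === 'function') {
      server.closeIdleConnections();
    }
  };
}

// Usage, assuming `server` and `pool` already exist:
// const shutdown = createShutdown(server, pool);
// process.on('SIGINT', () => shutdown('SIGINT'));
// process.on('SIGTERM', () => shutdown('SIGTERM'));
```

Keep timeoutMs at or below PM2's kill_timeout so the app's own timer fires before PM2 sends SIGKILL.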
How DeployWise Handles Zero-Downtime Deploys
DeployWise automates the entire zero-downtime deployment process. When you push to GitHub, our platform automatically:
- Pulls your latest code and installs dependencies
- Runs your test suite to catch issues early
- Triggers PM2 reload with proper graceful shutdown
- Verifies health checks on all instances
- Rolls back automatically if anything fails