Intermediate
30 mins

Monitoring

Learn how to implement monitoring for your applications. This guide covers metrics collection, alerting, dashboards, log management, best practices, and common issues.

Prerequisites

  • Basic understanding of metrics and monitoring
  • Familiarity with logging concepts
  • Experience with data visualization
  • Knowledge of alerting systems

Monitoring Overview

[Figure: Monitoring workflow and components]

1. Configure Metrics Collection

Set up metrics collection for your application:

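// Initialize metrics collector.
// The get*() helpers are application-specific and must be supplied by you.
const metrics = {
  // Recorded samples, keyed by metric name (flush or cap this in production)
  samples: new Map(),

  // Record a single observation; used by the middleware below
  record(name, value) {
    if (!this.samples.has(name)) this.samples.set(name, []);
    this.samples.get(name).push({ value, timestamp: Date.now() });
  },

  // Gather all metric groups into one snapshot
  async collect() {
    return {
      system: await this.collectSystemMetrics(),
      application: await this.collectAppMetrics(),
      business: await this.collectBusinessMetrics()
    };
  },

  async collectSystemMetrics() {
    return {
      cpu: await getCPUUsage(),
      memory: await getMemoryUsage(),
      disk: await getDiskUsage(),
      network: await getNetworkMetrics()
    };
  },

  async collectAppMetrics() {
    return {
      requestRate: await getRequestRate(),
      responseTime: await getResponseTime(),
      errorRate: await getErrorRate(),
      activeUsers: await getActiveUsers()
    };
  },

  // Business metrics depend on your domain; getTransactionCount() is a
  // hypothetical helper used here only as a placeholder
  async collectBusinessMetrics() {
    return {
      transactions: await getTransactionCount()
    };
  }
};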

// Implement metrics middleware
app.use(async (req, res, next) => {
  const start = Date.now();
  
  res.on('finish', () => {
    const duration = Date.now() - start;
    metrics.record('response_time', duration);
    metrics.record('status_code', res.statusCode);
  });
  
  next();
});
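
The collector relies on helpers such as getCPUUsage() and getMemoryUsage() that this guide does not define. The following is a minimal sketch of how two of them might look on Node.js using only the built-in os module. The function bodies, the cpuTotals() and cpuPercentBetween() helper names, and the 100 ms sampling window are assumptions for illustration, not part of the original code.

```javascript
const os = require('node:os');

// Sum idle and total CPU time across all cores from an os.cpus() snapshot.
function cpuTotals(cpus) {
  let idle = 0;
  let total = 0;
  for (const cpu of cpus) {
    for (const ms of Object.values(cpu.times)) total += ms;
    idle += cpu.times.idle;
  }
  return { idle, total };
}

// Percentage of non-idle CPU time between two snapshots.
function cpuPercentBetween(before, after) {
  const idle = after.idle - before.idle;
  const total = after.total - before.total;
  return total === 0 ? 0 : (1 - idle / total) * 100;
}

// Hypothetical getCPUUsage(): take two snapshots, sampleMs apart.
async function getCPUUsage(sampleMs = 100) {
  const before = cpuTotals(os.cpus());
  await new Promise(resolve => setTimeout(resolve, sampleMs));
  return cpuPercentBetween(before, cpuTotals(os.cpus()));
}

// Hypothetical getMemoryUsage(): percentage of physical memory in use.
async function getMemoryUsage() {
  return (1 - os.freemem() / os.totalmem()) * 100;
}
```

Both functions return a percentage, which matches the 0 to 100 scale used by the alert thresholds in the next step. Disk and network figures usually need a platform-specific tool or library and are left out of this sketch.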

2. Set Up Alerting

Configure alerts for important metrics:

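// Alert configuration
const alerts = {
  // Thresholds are percentages. duration states how long a breach should
  // persist before alerting; check() below does not enforce it yet.
  thresholds: {
    cpu: {
      warning: 70,
      critical: 90,
      duration: '5m'
    },
    memory: {
      warning: 80,
      critical: 95,
      duration: '5m'
    },
    errorRate: {
      warning: 1,
      critical: 5,
      duration: '1m'
    }
  },

  // Assumption: current values come from the collector defined in step 1,
  // flattened into a single { name: value } object
  async getCurrentMetrics() {
    return {
      ...(await metrics.collectSystemMetrics()),
      ...(await metrics.collectAppMetrics())
    };
  },

  async check() {
    const current = await this.getCurrentMetrics();

    for (const [metric, value] of Object.entries(current)) {
      const threshold = this.thresholds[metric];
      if (!threshold) continue; // no alert configured for this metric

      if (value >= threshold.critical) {
        await this.sendAlert('critical', metric, value);
      } else if (value >= threshold.warning) {
        await this.sendAlert('warning', metric, value);
      }
    }
  },

  async sendAlert(level, metric, value) {
    const alert = {
      level,
      metric,
      value,
      timestamp: new Date(),
      message: `${metric} is at ${value}% (${level})`
    };

    // Send to notification channels. sendEmail, sendSlack and sendPagerDuty
    // are integration-specific methods that you need to implement.
    await Promise.all([
      this.sendEmail(alert),
      this.sendSlack(alert),
      this.sendPagerDuty(alert)
    ]);
  }
};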
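
Each threshold declares a duration, but the check shown in this step fires on a single reading. One way to honor the duration is to remember when a metric first crossed its limit and alert only once the breach has lasted long enough. The sketch below assumes the duration has already been converted to milliseconds (so '5m' becomes 300000); createSustainedChecker and isSustained are hypothetical names.

```javascript
// Tracks when each key first crossed its limit and reports a breach only
// after it has persisted for durationMs.
function createSustainedChecker() {
  const breachStart = new Map(); // key -> timestamp of the first breaching reading

  return function isSustained(key, value, limit, durationMs, now = Date.now()) {
    if (value < limit) {
      breachStart.delete(key); // recovered, so reset the timer
      return false;
    }
    if (!breachStart.has(key)) breachStart.set(key, now);
    return now - breachStart.get(key) >= durationMs;
  };
}

// Example: CPU stays above the critical limit of 90 for five minutes.
const isSustained = createSustainedChecker();
const fiveMinutes = 5 * 60 * 1000;
isSustained('cpu:critical', 95, 90, fiveMinutes, 0);           // false, breach just started
isSustained('cpu:critical', 96, 90, fiveMinutes, fiveMinutes); // true, sustained for 5 minutes
```

Using a key such as 'cpu:critical' rather than the bare metric name keeps the warning and critical timers separate.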

3. Dashboard Setup

Create monitoring dashboards:

// Dashboard configuration
// Metric names must match the names under which samples are stored.
const dashboard = {
  panels: [
    {
      title: 'System Health',
      metrics: ['cpu', 'memory', 'disk'],
      type: 'line',
      timeRange: '24h'
    },
    {
      title: 'Application Performance',
      metrics: ['response_time', 'error_rate'],
      type: 'line',
      timeRange: '24h'
    },
    {
      title: 'Business Metrics',
      metrics: ['active_users', 'transactions'],
      type: 'line',
      timeRange: '24h'
    }
  ],

  // Load the data for every panel in parallel, then render.
  // generateHTML() is assumed to be implemented elsewhere.
  async render() {
    const data = await Promise.all(
      this.panels.map(async panel => ({
        ...panel,
        data: await this.getMetricsData(panel)
      }))
    );

    return this.generateHTML(data);
  },

  // Fetch samples for a panel's time window. parseTimeRange() is assumed to
  // convert a string such as '24h' into milliseconds, and db to be a
  // query-builder handle for the metrics store.
  async getMetricsData(panel) {
    const end = new Date();
    const start = new Date(end.getTime() - this.parseTimeRange(panel.timeRange));

    return db.metrics
      .where('name', 'in', panel.metrics)
      .whereBetween('timestamp', [start, end])
      .orderBy('timestamp', 'asc');
  }
};
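
The dashboard calls parseTimeRange() to turn a value such as '24h' into a time window, but the guide does not show that function. A minimal sketch, assuming ranges are written as a whole number followed by s, m, h or d; the accepted units and the error behavior are assumptions.

```javascript
// Milliseconds per supported unit suffix.
const UNIT_MS = {
  s: 1000,
  m: 60 * 1000,
  h: 60 * 60 * 1000,
  d: 24 * 60 * 60 * 1000
};

// Convert a range string such as '24h', '15m' or '7d' into milliseconds.
function parseTimeRange(range) {
  const match = /^(\d+)([smhd])$/.exec(String(range).trim());
  if (!match) throw new Error(`Unsupported time range: ${range}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

console.log(parseTimeRange('24h')); // 86400000
```

The function could be attached to the dashboard object as its parseTimeRange method. Rejecting unknown formats early avoids silently querying an invalid date range.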

4. Log Management

Implement comprehensive logging:

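// Logging configuration
// Assumes CommonJS and an already configured Elasticsearch client named `elastic`.
const fs = require('fs/promises');

const logger = {
  // Higher number means higher severity
  levels: {
    debug: 0,
    info: 1,
    warn: 2,
    error: 3
  },

  // Each transport receives entries at or above its own level
  transports: [
    {
      type: 'console',
      level: 'debug'
    },
    {
      type: 'file',
      level: 'info',
      filename: 'app.log'
    },
    {
      type: 'elasticsearch',
      level: 'info',
      index: 'app-logs'
    }
  ],

  async log(level, message, meta = {}) {
    // Spread meta first so it cannot overwrite the core fields
    const entry = {
      ...meta,
      timestamp: new Date(),
      level,
      message
    };

    await Promise.all(
      this.transports
        .filter(t => this.levels[level] >= this.levels[t.level])
        .map(t => this.writeToTransport(t, entry))
    );
  },

  async writeToTransport(transport, entry) {
    switch (transport.type) {
      case 'console':
        console[entry.level](entry);
        break;
      case 'file':
        // One JSON object per line
        await fs.appendFile(transport.filename, JSON.stringify(entry) + '\n');
        break;
      case 'elasticsearch':
        // `body` follows the 7.x client; newer client versions may expect
        // `document` instead, so check the version you use
        await elastic.index({
          index: transport.index,
          body: entry
        });
        break;
    }
  }
};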
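
To make the level filtering in log() concrete, the sketch below isolates the comparison into a small function and shows which transports receive a debug entry versus an error entry. selectTransports is a hypothetical helper; the levels and transports mirror the configuration in this step.

```javascript
const levels = { debug: 0, info: 1, warn: 2, error: 3 };

// Return the transports whose minimum level is at or below the entry's level.
function selectTransports(transports, level) {
  if (!Object.prototype.hasOwnProperty.call(levels, level)) {
    throw new Error(`Unknown log level: ${level}`);
  }
  return transports.filter(t => levels[level] >= levels[t.level]);
}

const transports = [
  { type: 'console', level: 'debug' },
  { type: 'file', level: 'info' },
  { type: 'elasticsearch', level: 'info' }
];

console.log(selectTransports(transports, 'debug').map(t => t.type));
// [ 'console' ]
console.log(selectTransports(transports, 'error').map(t => t.type));
// [ 'console', 'file', 'elasticsearch' ]
```

Debug entries stay on the console, while anything at info or above also reaches the file and the search index.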

Best Practices

Metrics Collection

Best practices for collecting metrics:

  • Define key metrics
  • Use appropriate intervals
  • Implement aggregation
  • Monitor trends
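
As an illustration of intervals and aggregation, the following sketch groups raw samples into fixed-size time buckets and summarizes each one. The sample shape ({ timestamp, value }, with the timestamp in milliseconds) and the aggregate name are assumptions.

```javascript
// Group raw samples into fixed-size time buckets and summarize each bucket.
// samples: [{ timestamp: <ms since epoch>, value: <number> }]
function aggregate(samples, intervalMs) {
  const buckets = new Map(); // bucket start time -> values in that bucket

  for (const { timestamp, value } of samples) {
    const start = Math.floor(timestamp / intervalMs) * intervalMs;
    if (!buckets.has(start)) buckets.set(start, []);
    buckets.get(start).push(value);
  }

  return [...buckets.entries()]
    .sort(([a], [b]) => a - b)
    .map(([start, values]) => ({
      start,
      count: values.length,
      avg: values.reduce((sum, v) => sum + v, 0) / values.length,
      max: Math.max(...values)
    }));
}

// Three response-time samples rolled up into one-minute buckets.
const rollup = aggregate(
  [
    { timestamp: 0, value: 10 },
    { timestamp: 30000, value: 20 },
    { timestamp: 60000, value: 40 }
  ],
  60000
);
// rollup[0] is { start: 0, count: 2, avg: 15, max: 20 }
```

Storing rollups like these instead of every raw sample keeps long-range trend queries cheap, at the cost of losing detail inside each bucket.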

Alerting

Effective alert management:

  • Set meaningful thresholds
  • Avoid alert fatigue
  • Define escalation paths
  • Document procedures
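
One common way to reduce alert fatigue is a cooldown: once an alert for a given metric and level has gone out, repeats are suppressed for a while. A minimal sketch, assuming a ten-minute window; createAlertLimiter and shouldSend are hypothetical names.

```javascript
// Suppress repeats of the same metric/level pair inside a cooldown window.
function createAlertLimiter(cooldownMs) {
  const lastSent = new Map(); // "metric:level" -> time the last alert went out

  return function shouldSend(metric, level, now = Date.now()) {
    const key = `${metric}:${level}`;
    const previous = lastSent.get(key);
    if (previous !== undefined && now - previous < cooldownMs) return false;
    lastSent.set(key, now);
    return true;
  };
}

const shouldSend = createAlertLimiter(10 * 60 * 1000);
shouldSend('cpu', 'warning', 0);     // true, first alert is delivered
shouldSend('cpu', 'warning', 1000);  // false, still inside the cooldown
shouldSend('cpu', 'critical', 1000); // true, escalation uses a separate key
```

Keying on both metric and level means an escalation from warning to critical is never swallowed by the cooldown of the earlier warning.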

Visualization

Dashboard best practices:

  • Clear data presentation
  • Relevant timeframes
  • Custom views
  • Real-time updates
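
Clear presentation over long timeframes usually means plotting fewer, coarser points. The sketch below picks a resolution for a panel so that it never renders more than a fixed number of points; the candidate steps, the 500-point default, and the chooseStep name are all assumptions.

```javascript
// Candidate resolutions, smallest first: 10s, 1m, 5m, 15m, 1h, 6h, 1d.
const STEPS_MS = [10e3, 60e3, 300e3, 900e3, 3600e3, 21600e3, 86400e3];

// Pick the finest step that keeps a panel at or below maxPoints data points.
// Falls back to the coarsest step when even that is not enough.
function chooseStep(rangeMs, maxPoints = 500) {
  const step = STEPS_MS.find(s => rangeMs / s <= maxPoints);
  return step ?? STEPS_MS[STEPS_MS.length - 1];
}

console.log(chooseStep(24 * 60 * 60 * 1000)); // 300000 (five-minute points for a 24h panel)
```

A 24-hour panel at five-minute resolution has 288 points, which is usually enough for a line chart to show trends without overplotting.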

Common Issues

Metrics Collection

Common collection issues:

  • High cardinality
  • Data gaps
  • Collection delays
  • Resource overhead
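
High cardinality often comes from recording raw URLs, where every user or order ID creates a new series. A minimal sketch of one mitigation, normalizing paths before using them as a metric label; the placeholder names and the normalizePath function are assumptions.

```javascript
// Replace variable path segments with placeholders so that
// /users/42 and /users/97 are recorded under the same label.
function normalizePath(path) {
  const uuid = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

  return path
    .split('?')[0] // drop the query string
    .split('/')
    .map(segment => {
      if (/^\d+$/.test(segment)) return ':id';
      if (uuid.test(segment)) return ':uuid';
      return segment;
    })
    .join('/');
}

console.log(normalizePath('/users/42/orders/1001?page=2'));
// /users/:id/orders/:id
```

If your framework exposes the matched route template, using that directly is generally more reliable than pattern matching on the raw path.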

Alert Management

Alert-related challenges:

  • False positives
  • Missing alerts
  • Alert storms
  • Poor signal-to-noise
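
During an alert storm, many alerts fire within seconds of each other. One mitigation is to collect them briefly and send a single summary per severity level. The sketch below shows only the summarizing step; the alert shape follows the sendAlert example in this guide, and summarizeAlerts is a hypothetical name.

```javascript
// Collapse a burst of alerts into one summary per level.
// alerts: [{ level, metric, value }]
function summarizeAlerts(alerts) {
  const byLevel = new Map(); // level -> set of affected metric names

  for (const alert of alerts) {
    if (!byLevel.has(alert.level)) byLevel.set(alert.level, new Set());
    byLevel.get(alert.level).add(alert.metric);
  }

  return [...byLevel].map(([level, metrics]) => ({
    level,
    count: metrics.size,
    message: `${metrics.size} metric(s) at ${level}: ${[...metrics].join(', ')}`
  }));
}

const summary = summarizeAlerts([
  { level: 'critical', metric: 'cpu', value: 97 },
  { level: 'critical', metric: 'memory', value: 96 },
  { level: 'warning', metric: 'errorRate', value: 2 },
  { level: 'critical', metric: 'cpu', value: 98 }
]);
// summary[0].message is "2 metric(s) at critical: cpu, memory"
```

Four raw alerts become two notifications, and duplicate readings for the same metric are counted once.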

Next Steps

Now that you understand monitoring, explore these related topics: