Observability

How the Smartsapp backend is instrumented for tracing, metrics, and logging.


Architecture Overview

┌─────────────────────────────────────────────┐
│              Spring Boot App                │
│                                             │
│  ┌──────────┐  ┌──────────┐  ┌───────────┐ │
│  │ Micrometer│  │ SLF4J +  │  │  Spring   │ │
│  │ Tracing   │  │ Logback  │  │ Actuator  │ │
│  │ (OTel     │  │ (JSON)   │  │ /metrics  │ │
│  │  bridge)  │  │          │  │ /health   │ │
│  └─────┬─────┘  └─────┬────┘  └─────┬─────┘ │
│        │              │              │       │
└────────┼──────────────┼──────────────┼───────┘
         │              │              │
         ▼              ▼              ▼
   OTLP Exporter   Log Collector   Prometheus
   (Tempo/Jaeger)  (Loki/ELK)     Scraper

Components

1. Distributed Tracing

Stack: Micrometer Tracing + OpenTelemetry Bridge

Dependencies:

- io.micrometer:micrometer-tracing-bridge-otel — bridges Spring's Micrometer API to OTel
- io.opentelemetry:opentelemetry-exporter-otlp — exports spans via the OTLP protocol

What's auto-instrumented:

- Incoming HTTP requests (Spring MVC)
- Outgoing HTTP requests (RestTemplate, WebClient)
- JDBC queries
- Kafka producer/consumer
- Spring Data JPA repositories
- Thread context propagation (virtual threads currently disabled — Kafka client 3.8 pins carrier threads; re-enable after upgrading to Spring Kafka 4.0+)
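Beyond the auto-instrumented paths, custom spans can be added with the Micrometer Observation API. A minimal sketch, assuming the `ObservationRegistry` bean that Spring Boot auto-configures; the service class, span name, and key values here are illustrative, not the actual backend code:

```java
import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationRegistry;
import org.springframework.stereotype.Service;

@Service
public class EnrollmentService {

    private final ObservationRegistry registry;

    public EnrollmentService(ObservationRegistry registry) {
        this.registry = registry;
    }

    public void enrollStudent(String studentId) {
        // Creates a span named "school.enroll" that joins the current trace;
        // exceptions thrown inside observe() are recorded on the span automatically.
        Observation.createNotStarted("school.enroll", registry)
                .lowCardinalityKeyValue("school.operation", "enroll")
                .observe(() -> {
                    // business logic runs inside the span
                });
    }
}
```

Because the span is created through the same Micrometer API the auto-instrumentation uses, it inherits the active trace context and appears nested under the incoming HTTP request span.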

Configuration (application.yml):

management.tracing:
  sampling:
    probability: 1.0    # Sample 100% of requests (reduce in production)
  enabled: true

Connecting to a trace backend:

To send traces to Tempo, Jaeger, or any OTLP-compatible backend:

management:
  otlp:
    tracing:
      endpoint: http://tempo.smartsapp-dev.svc.cluster.local:4318/v1/traces

Trace context in logs:

The OTel bridge automatically injects traceId and spanId into SLF4J MDC. In JSON log output, these appear as top-level fields, enabling log-to-trace correlation.

Trace ID in HTTP responses:

The TraceIdFilter exposes the OTel trace ID as the X-Trace-Id response header on every HTTP response. Clients can use this to correlate a request with backend logs, spans, and downstream Kafka processing — all share the same trace ID.
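The filter itself can be small. A hedged sketch of what TraceIdFilter might look like — the real class lives in the backend source; this version reads the current span from the Micrometer `Tracer`:

```java
import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

import java.io.IOException;

@Component
public class TraceIdFilter extends OncePerRequestFilter {

    private final Tracer tracer;

    public TraceIdFilter(Tracer tracer) {
        this.tracer = tracer;
    }

    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response,
                                    FilterChain chain) throws ServletException, IOException {
        Span span = tracer.currentSpan();
        if (span != null) {
            // Expose the OTel trace ID so clients can correlate with logs and spans
            response.setHeader("X-Trace-Id", span.context().traceId());
        }
        chain.doFilter(request, response);
    }
}
```

Setting the header before `chain.doFilter()` ensures it is present even when the downstream handler commits the response early.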


2. Structured Logging

Stack: SLF4J + Logback + logstash-logback-encoder

Config: backend/src/main/resources/logback-spring.xml

See LOGGING.md for the full logging strategy.

JSON output fields (non-dev profiles):

{
  "@timestamp": "2026-03-20T01:08:17.738Z",
  "@version": "1",
  "message": "School created: id=abc-123",
  "logger_name": "c.s.s.m.c.school.services.SchoolService",
  "level": "INFO",
  "thread_name": "http-nio-8080-exec-1",
  "requestId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "userId": "5237",
  "schoolId": "6",
  "userName": "[email protected]",
  "traceId": "64f2b4a8e3d1...",
  "spanId": "a1b2c3d4..."
}

Connecting to a log backend:

Logs go to stdout. In Kubernetes, a log collector (Loki, Fluentd, Filebeat) picks them up from container stdout automatically. No application-level config needed — the JSON format is parser-friendly.
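If the collector is Promtail feeding Loki, the JSON fields can be parsed in its pipeline. A sketch under assumptions — Promtail with Kubernetes pod discovery, and only `level` promoted to a label (keeping traceId out of labels avoids high-cardinality label sets):

```yaml
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    pipeline_stages:
      # Parse the JSON line emitted by logstash-logback-encoder
      - json:
          expressions:
            level: level
            traceId: traceId
      # Promote level to a Loki label; traceId stays a parsed field, queryable via filters
      - labels:
          level:
```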


3. Metrics & Health

Stack: Spring Boot Actuator + Micrometer

Exposed endpoints:

| Endpoint | Description |
|----------|-------------|
| GET /actuator/health | Application health status |
| GET /actuator/info | Application info (git, build) |
| GET /actuator/metrics | List of available metric names |
| GET /actuator/metrics/{name} | Specific metric value |
| GET /actuator/prometheus | Prometheus-format metrics scrape endpoint |

Configuration (application.yml):

management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
  endpoint:
    health:
      show-details: when-authorized
  metrics:
    tags:
      application: smartsapp-system

Connecting to Prometheus:

Add a scrape target in your Prometheus config:

scrape_configs:
  - job_name: smartsapp-system
    metrics_path: /actuator/prometheus
    static_configs:
      - targets: ['system-app:8080']

Or use Kubernetes ServiceMonitor if running the Prometheus Operator.
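For the operator route, a minimal ServiceMonitor sketch — the namespace, app label, and port name are assumptions and must match the actual Kubernetes Service:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: smartsapp-system
  namespace: smartsapp-dev      # assumed namespace
spec:
  selector:
    matchLabels:
      app: system-app           # assumed Service label
  endpoints:
    - port: http                # assumed port name on the Service
      path: /actuator/prometheus
      interval: 30s
```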

Key metrics available out of the box:

- http.server.requests — request count, duration, status codes
- jvm.memory.used — JVM heap/non-heap usage
- jvm.threads.live — active thread count
- hikaricp.connections.active — DB connection pool
- kafka.producer.record.send.total — Kafka publish count
- system.cpu.usage — CPU utilisation
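Once scraped, Micrometer's dot-separated names appear in Prometheus with underscores (http.server.requests becomes the http_server_requests_seconds_* family). A few example queries built on the metrics above; note the _bucket series only exist if management.metrics.distribution.percentiles-histogram.http.server.requests is set to true:

```promql
# p95 request latency per URI over the last 5 minutes
histogram_quantile(0.95,
  sum(rate(http_server_requests_seconds_bucket[5m])) by (le, uri))

# 5xx error rate across all endpoints
sum(rate(http_server_requests_seconds_count{status=~"5.."}[5m]))

# JVM heap in use
sum(jvm_memory_used_bytes{area="heap"})
```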


Infrastructure Stubs

Pre-created config files (currently empty, ready to populate):

| File | Purpose |
|------|---------|
| infrastructure/core_system_dependencies/system_primitives/observability/prometheus.yml | Prometheus scrape config |
| infrastructure/core_system_dependencies/system_primitives/observability/grafana.yml | Grafana datasources + dashboards |
| infrastructure/core_system_dependencies/system_primitives/observability/loki.yml | Loki log aggregation config |
| infrastructure/core_system_dependencies/system_primitives/observability/tempo.yml | Tempo trace backend config |
| infrastructure/core_system_dependencies/system_primitives/observability/opentelemetry.yml | OTel Collector config (if using a collector) |
| infrastructure/operational_utilities/error_tracking/sentry.yml | Sentry error tracking config |

Environment-Specific Behaviour

| Environment | Traces | Logs | Metrics |
|-------------|--------|------|---------|
| Local dev | Generated (no export) | Console, human-readable | /actuator/metrics |
| Dev cluster | Export to Tempo (when configured) | JSON to stdout, collected by Loki | Scraped by Prometheus |
| Production | Export to Tempo, sample at 10% | JSON to stdout, collected by Loki | Scraped by Prometheus |

To reduce production trace volume, set:

management.tracing.sampling.probability: 0.1
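In a multi-document application.yml this is typically scoped to the production profile so dev keeps full sampling. A sketch, assuming the profile is named prod:

```yaml
---
spring:
  config:
    activate:
      on-profile: prod
management:
  tracing:
    sampling:
      probability: 0.1    # sample 10% of requests in production
```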


Auth-Free Endpoints

The following paths are excluded from JWT authentication (configured in JwtAuthFilter.shouldNotFilter()):

  • /actuator/**
  • /swagger-ui/**
  • /swagger-ui.html
  • /v3/api-docs/**
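A hedged sketch of how shouldNotFilter() might express this list — the real implementation lives in JwtAuthFilter; matching the ** patterns with Spring's AntPathMatcher is one common approach:

```java
import jakarta.servlet.http.HttpServletRequest;
import org.springframework.util.AntPathMatcher;
import org.springframework.web.filter.OncePerRequestFilter;

import java.util.List;

public class JwtAuthFilter extends OncePerRequestFilter {

    private static final List<String> PUBLIC_PATHS = List.of(
            "/actuator/**",
            "/swagger-ui/**",
            "/swagger-ui.html",
            "/v3/api-docs/**");

    private final AntPathMatcher matcher = new AntPathMatcher();

    @Override
    protected boolean shouldNotFilter(HttpServletRequest request) {
        String path = request.getRequestURI();
        // Skip JWT validation for any path matching a public pattern
        return PUBLIC_PATHS.stream().anyMatch(p -> matcher.match(p, path));
    }

    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    jakarta.servlet.http.HttpServletResponse response,
                                    jakarta.servlet.FilterChain chain)
            throws jakarta.servlet.ServletException, java.io.IOException {
        // ... JWT validation for all other paths ...
        chain.doFilter(request, response);
    }
}
```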