Observability
How the Smartsapp backend is instrumented for tracing, metrics, and logging.
Architecture Overview
┌──────────────────────────────────────────────────┐
│                 Spring Boot App                  │
│                                                  │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  │
│  │ Micrometer │  │ SLF4J +    │  │ Spring     │  │
│  │ Tracing    │  │ Logback    │  │ Actuator   │  │
│  │ (OTel      │  │ (JSON)     │  │ /metrics   │  │
│  │ bridge)    │  │            │  │ /health    │  │
│  └─────┬──────┘  └─────┬──────┘  └─────┬──────┘  │
│        │               │               │         │
└────────┼───────────────┼───────────────┼─────────┘
         │               │               │
         ▼               ▼               ▼
   OTLP Exporter   Log Collector    Prometheus
  (Tempo/Jaeger)    (Loki/ELK)        Scraper
Components
1. Distributed Tracing
Stack: Micrometer Tracing + OpenTelemetry Bridge
Dependencies:
- io.micrometer:micrometer-tracing-bridge-otel — bridges Spring's Micrometer API to OTel
- io.opentelemetry:opentelemetry-exporter-otlp — exports spans via OTLP protocol
What's auto-instrumented:
- Incoming HTTP requests (Spring MVC)
- Outgoing HTTP requests (RestTemplate, WebClient)
- JDBC queries
- Kafka producer/consumer
- Spring Data JPA repositories
- Thread context propagation (virtual threads are currently disabled because Kafka client 3.8 pins carrier threads; re-enable after upgrading to Spring Kafka 4.0+)
Configuration (application.yml):
management.tracing:
  sampling:
    probability: 1.0 # Sample 100% of requests (reduce in production)
  enabled: true
Connecting to a trace backend:
To send traces to Tempo, Jaeger, or any OTLP-compatible backend:
management:
  otlp:
    tracing:
      endpoint: http://tempo.smartsapp-dev.svc.cluster.local:4318/v1/traces
Trace context in logs:
The OTel bridge automatically injects traceId and spanId into SLF4J MDC. In JSON log output, these appear as top-level fields, enabling log-to-trace correlation.
Trace ID in HTTP responses:
The TraceIdFilter exposes the OTel trace ID as the X-Trace-Id response header on every HTTP response. Clients can use this to correlate a request with backend logs, spans, and downstream Kafka processing — all share the same trace ID.
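On the client side, picking up that header could look like the following minimal sketch. It uses only the JDK HTTP client; the class and method names are hypothetical and not part of the Smartsapp codebase, and the only detail taken from this page is the X-Trace-Id header name.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpHeaders;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;

/** Hypothetical client-side helper; not part of the Smartsapp codebase. */
final class TraceIdClientSketch {

    /** Header name as documented for TraceIdFilter. */
    static final String TRACE_HEADER = "X-Trace-Id";

    /** Returns the trace ID carried by the response headers, if present and non-blank. */
    static Optional<String> traceIdOf(HttpHeaders headers) {
        return headers.firstValue(TRACE_HEADER).filter(value -> !value.isBlank());
    }

    /** Sends a GET request, discards the body, and returns the trace ID from the response. */
    static Optional<String> fetchTraceId(HttpClient client, URI uri)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<Void> response =
                client.send(request, HttpResponse.BodyHandlers.discarding());
        return traceIdOf(response.headers());
    }
}
```

A caller might log the returned value next to its own error report, for example after `TraceIdClientSketch.fetchTraceId(HttpClient.newHttpClient(), URI.create("http://localhost:8080/actuator/health"))` (the localhost URL is an assumption for a local run), so that support can search backend logs and spans for the same ID.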
2. Structured Logging
Stack: SLF4J + Logback + logstash-logback-encoder
Config: backend/src/main/resources/logback-spring.xml
See LOGGING.md for the full logging strategy.
JSON output fields (non-dev profiles):
{
  "@timestamp": "2026-03-20T01:08:17.738Z",
  "@version": "1",
  "message": "School created: id=abc-123",
  "logger_name": "c.s.s.m.c.school.services.SchoolService",
  "level": "INFO",
  "thread_name": "http-nio-8080-exec-1",
  "requestId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "userId": "5237",
  "schoolId": "6",
  "userName": "[email protected]",
  "traceId": "64f2b4a8e3d1...",
  "spanId": "a1b2c3d4..."
}
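Because traceId is a top-level field, log lines belonging to one request can be picked out with very little code. The sketch below is illustrative only: it assumes one JSON object per line, uses a regular expression rather than a real JSON parser, and the class name is hypothetical. In practice the log backend's query language would normally do this job.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper for ad-hoc log inspection; not part of the Smartsapp codebase. */
final class LogTraceFilterSketch {

    // Matches the top-level "traceId":"<value>" pair; a shortcut, not a full JSON parse.
    private static final Pattern TRACE_ID =
            Pattern.compile("\"traceId\"\\s*:\\s*\"([^\"]*)\"");

    /** Extracts the traceId field from one JSON log line, if the line has one. */
    static Optional<String> traceIdOf(String jsonLine) {
        Matcher matcher = TRACE_ID.matcher(jsonLine);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    /** Keeps only the lines whose traceId equals the given value. */
    static List<String> linesForTrace(List<String> jsonLines, String traceId) {
        return jsonLines.stream()
                .filter(line -> traceIdOf(line).map(traceId::equals).orElse(false))
                .collect(Collectors.toList());
    }
}
```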
Connecting to a log backend:
Logs go to stdout. In Kubernetes, a log collector (Loki, Fluentd, Filebeat) picks them up from container stdout automatically. No application-level config needed — the JSON format is parser-friendly.
3. Metrics & Health
Stack: Spring Boot Actuator + Micrometer
Exposed endpoints:
| Endpoint | Description |
|---|---|
| GET /actuator/health | Application health status |
| GET /actuator/info | Application info (git, build) |
| GET /actuator/metrics | List of available metric names |
| GET /actuator/metrics/{name} | Specific metric value |
| GET /actuator/prometheus | Prometheus-format metrics scrape endpoint |
Configuration (application.yml):
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
  endpoint:
    health:
      show-details: when-authorized
  metrics:
    tags:
      application: smartsapp-system
Connecting to Prometheus:
Add a scrape target in your Prometheus config:
scrape_configs:
  - job_name: smartsapp-system
    metrics_path: /actuator/prometheus
    static_configs:
      - targets: ['system-app:8080']
Or use a Kubernetes ServiceMonitor if running the Prometheus Operator.
Key metrics available out of the box:
- http.server.requests — request count, duration, status codes
- jvm.memory.used — JVM heap/non-heap usage
- jvm.threads.live — active thread count
- hikaricp.connections.active — DB connection pool
- kafka.producer.record.send.total — Kafka publish count
- system.cpu.usage — CPU utilisation
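The names above are Micrometer's dotted names. In the /actuator/prometheus output the dots become underscores, and a suffix that depends on the meter type and unit is usually appended (for example a byte or seconds unit, or a counter suffix). The sketch below shows one way to find the samples for a given Micrometer name in a scrape body. It matches on the converted prefix only and makes no claim about exact suffixes; the class name is hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper for inspecting a Prometheus scrape body; not part of the Smartsapp codebase. */
final class PrometheusNameSketch {

    /** Converts a dotted Micrometer name to the underscore form used as the Prometheus name prefix. */
    static String toPrometheusPrefix(String micrometerName) {
        return micrometerName.replace('.', '_');
    }

    /** Returns the sample lines (comments skipped) whose metric name starts with the converted prefix. */
    static List<String> samplesFor(String scrapeBody, String micrometerName) {
        String prefix = toPrometheusPrefix(micrometerName);
        return scrapeBody.lines()
                .filter(line -> !line.startsWith("#")) // skip HELP and TYPE comment lines
                .filter(line -> line.startsWith(prefix))
                .collect(Collectors.toList());
    }
}
```

Prefix matching is deliberately loose: it also returns samples of any other metric that happens to share the prefix, which is acceptable for a quick manual check but not for alerting rules.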
Infrastructure Stubs
Pre-created config files (currently empty, ready to populate):
| File | Purpose |
|---|---|
| infrastructure/core_system_dependencies/system_primitives/observability/prometheus.yml | Prometheus scrape config |
| infrastructure/core_system_dependencies/system_primitives/observability/grafana.yml | Grafana datasources + dashboards |
| infrastructure/core_system_dependencies/system_primitives/observability/loki.yml | Loki log aggregation config |
| infrastructure/core_system_dependencies/system_primitives/observability/tempo.yml | Tempo trace backend config |
| infrastructure/core_system_dependencies/system_primitives/observability/opentelemetry.yml | OTel Collector config (if using a collector) |
| infrastructure/operational_utilities/error_tracking/sentry.yml | Sentry error tracking config |
Environment-Specific Behaviour
| Environment | Traces | Logs | Metrics |
|---|---|---|---|
| Local dev | Generated (no export) | Console, human-readable | /actuator/metrics |
| Dev cluster | Export to Tempo (when configured) | JSON to stdout, collected by Loki | Scraped by Prometheus |
| Production | Export to Tempo, sample at 10% | JSON to stdout, collected by Loki | Scraped by Prometheus |
To reduce production trace volume, set:
management.tracing.sampling.probability: 0.1
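To make the effect of that setting concrete, here is a simplified sketch of probability sampling keyed on the trace ID. It is an illustration of the general technique, not the OpenTelemetry SDK's actual sampler, and it assumes a root span with no sampling decision inherited from an upstream caller. The class name is hypothetical. Because the decision is derived from the trace ID rather than from a random draw per span, the same trace ID always produces the same answer.

```java
/** Simplified illustration of ratio sampling; not the OpenTelemetry SDK implementation. */
final class TraceSamplingSketch {

    private final double probability;

    TraceSamplingSketch(double probability) {
        if (!(probability >= 0.0 && probability <= 1.0)) {
            throw new IllegalArgumentException("probability must be in [0.0, 1.0]");
        }
        this.probability = probability;
    }

    /** Decides whether to keep a trace, given its hex trace ID (at least 16 hex characters). */
    boolean shouldSample(String traceIdHex) {
        if (traceIdHex.length() < 16) {
            throw new IllegalArgumentException("trace ID must have at least 16 hex characters");
        }
        if (probability >= 1.0) {
            return true;   // keep everything
        }
        if (probability <= 0.0) {
            return false;  // keep nothing
        }
        // Use the low 64 bits of the ID and clear the sign bit, giving a value in [0, 2^63).
        String low = traceIdHex.substring(traceIdHex.length() - 16);
        long bits = Long.parseUnsignedLong(low, 16) & Long.MAX_VALUE;
        // Keep the trace when the value falls in the lowest `probability` share of that range.
        return bits < (long) (probability * Long.MAX_VALUE);
    }
}
```

With a probability of 0.1, roughly one trace in ten is kept, so a rare failing request may have no exported spans even though its trace ID still appears in the logs.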
Auth-Free Endpoints
The following paths are excluded from JWT authentication (configured in JwtAuthFilter.shouldNotFilter()):
- /actuator/**
- /swagger-ui/**
- /swagger-ui.html
- /v3/api-docs/**
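As a rough illustration of which request paths those four patterns cover, the sketch below implements a simplified matcher in plain Java. It is not the code in JwtAuthFilter.shouldNotFilter(), which is not shown on this page and may well rely on Spring's own path matching. The class name and the matching rules here are assumptions: a trailing /** is treated as "this path or anything beneath it", and every other pattern must match exactly.

```java
import java.util.List;

/** Simplified stand-in for the exclusion check; the real JwtAuthFilter logic may differ. */
final class AuthFreePathsSketch {

    /** The four excluded patterns listed in this section. */
    static final List<String> PATTERNS = List.of(
            "/actuator/**",
            "/swagger-ui/**",
            "/swagger-ui.html",
            "/v3/api-docs/**");

    /** Returns true when the request path falls under one of the excluded patterns. */
    static boolean isAuthFree(String path) {
        for (String pattern : PATTERNS) {
            if (pattern.endsWith("/**")) {
                // Treat "/base/**" as "/base" itself plus everything below it.
                String base = pattern.substring(0, pattern.length() - 3);
                if (path.equals(base) || path.startsWith(base + "/")) {
                    return true;
                }
            } else if (path.equals(pattern)) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions /actuator/health and /v3/api-docs/swagger-config skip JWT authentication, while a path such as /actuatorx or any regular API route does not.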