Debugging Guide

How to trace customer issues through the system. You never need to ask the customer for technical IDs — start from who they are and when it happened.


Step 1: Identify the user

Get the customer's name, email, or school from the support ticket. Look up their userId and schoolId:

-- By email (staff/admin)
SELECT user_id, school_id, user_name FROM users WHERE email = '<customer-email>';

-- By student name (parent reported issue)
SELECT s.id AS student_id, g.user_id AS parent_user_id, g.school_id
FROM students s
JOIN guardians g ON g.family_id = s.family_id
WHERE s.first_name = 'Kwame' AND s.last_name = 'Mensah';
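If you run the email lookup from a script rather than a SQL console, a parameterized query avoids pasting ticket text straight into the SQL string. The following is a minimal sketch, assuming a `users` table with the columns used in the query above; the in-memory SQLite database, the sample row, and the `find_user_by_email` helper are hypothetical and exist only so the example runs on its own. The `?` placeholder is SQLite's style, and your production driver may use a different one.

```python
import sqlite3

def find_user_by_email(conn, email):
    """Return (user_id, school_id, user_name) for an email, or None if absent."""
    # The ? placeholder keeps the ticket text out of the SQL string itself.
    return conn.execute(
        "SELECT user_id, school_id, user_name FROM users WHERE email = ?",
        (email,),
    ).fetchone()

# Throwaway in-memory table with one hypothetical row, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (user_id INTEGER, school_id INTEGER, user_name TEXT, email TEXT)"
)
conn.execute(
    "INSERT INTO users VALUES (5237, 6, 'staff.user', 'staff.user@example.test')"
)

print(find_user_by_email(conn, "staff.user@example.test"))  # (5237, 6, 'staff.user')
```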

Step 2: Search logs by user + time

Every authenticated request logs userId and schoolId in structured JSON. Use your log aggregator (Loki, ELK, CloudWatch) to search:

userId="5237" AND (level="WARN" OR level="ERROR")

Narrow by time window based on when the customer says the issue occurred. Look for:

  • WARN — business rule violations (deadline passed, already cancelled, duplicate check-in)
  • ERROR — unexpected failures (DB connection, null pointer, serialization)

Each log line also contains requestId and traceId — note these for the next steps.
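When you only have an exported log file rather than the aggregator UI, the same search can be approximated offline. This is a rough sketch, assuming one JSON object per line; the `timestamp` field name and its ISO 8601 format are assumptions (only `userId`, `level`, and the other MDC fields are documented here), and the sample lines are made up. Note that the two levels are grouped, so the user filter applies to both WARN and ERROR.

```python
import json
from datetime import datetime

def matches(entry, user_id, start, end, levels=("WARN", "ERROR")):
    """True when the entry belongs to user_id, has a wanted level, and is in the window."""
    ts = datetime.fromisoformat(entry["timestamp"])  # assumed field name and format
    return (
        entry.get("userId") == user_id
        and entry.get("level") in levels  # equivalent to (WARN OR ERROR), grouped
        and start <= ts <= end
    )

def search(lines, user_id, start, end):
    """Parse JSON log lines and keep the ones that match the filter."""
    entries = (json.loads(line) for line in lines if line.strip())
    return [e for e in entries if matches(e, user_id, start, end)]

# Hypothetical log lines for illustration.
sample = [
    '{"timestamp": "2024-03-01T12:05:00+00:00", "level": "WARN", "userId": "5237", "message": "Ordering deadline has passed"}',
    '{"timestamp": "2024-03-01T12:06:00+00:00", "level": "ERROR", "userId": "9999", "message": "DB connection failed"}',
    '{"timestamp": "2024-03-01T12:07:00+00:00", "level": "INFO", "userId": "5237", "message": "Order page viewed"}',
]
start = datetime.fromisoformat("2024-03-01T12:00:00+00:00")
end = datetime.fromisoformat("2024-03-01T13:00:00+00:00")

for hit in search(sample, "5237", start, end):
    print(hit["level"], hit["message"])
```

Only the first sample line is returned: the ERROR line belongs to another user, and the INFO line is below the level filter.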

Step 3: Follow a specific request

Once you find a suspicious log line, use its requestId to see everything that happened in that request:

requestId="a1b2c3d4-e5f6-7890-abcd-ef1234567890"

This shows the full request lifecycle: auth, service calls, DB queries, Kafka events, and the response.
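Offline, the same view can be rebuilt by collecting every line that shares the requestId and ordering the result by time. A small sketch under the same assumptions as before: one JSON object per line and a `timestamp` field (assumed name) in a uniform ISO 8601 format, so that string sorting matches chronological order. The sample messages are hypothetical.

```python
import json

def request_timeline(lines, request_id):
    """Return all entries for one requestId, oldest first."""
    entries = [json.loads(line) for line in lines if line.strip()]
    same_request = [e for e in entries if e.get("requestId") == request_id]
    # Uniformly formatted ISO 8601 timestamps sort correctly as plain strings.
    return sorted(same_request, key=lambda e: e["timestamp"])

REQUEST_ID = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"

# Hypothetical, deliberately out-of-order log lines.
sample = [
    '{"timestamp": "2024-03-01T12:05:00.300+00:00", "requestId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890", "message": "response sent"}',
    '{"timestamp": "2024-03-01T12:05:00.100+00:00", "requestId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890", "message": "auth ok"}',
    '{"timestamp": "2024-03-01T12:05:00.200+00:00", "requestId": "some-other-request", "message": "unrelated"}',
]

for entry in request_timeline(sample, REQUEST_ID):
    print(entry["timestamp"], entry["message"])
```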

Step 4: View the distributed trace

Use the traceId from the log line in your tracing UI (Grafana Tempo, Jaeger):

traceId: 64f2b4a8e3d1c9f0a5b7d2e4f6a8c0d1

The trace shows:

  • Full call graph with timing per span
  • Downstream Kafka event processing
  • DB query durations
  • Where time was spent or where errors occurred
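To see where time was actually spent, it can help to subtract each span's child durations from its own duration (its "self time"). The sketch below uses a simplified, hypothetical span shape (`span_id`, `parent_id`, `name`, `duration_ms`), not the real export format of Tempo or Jaeger, and it assumes that child spans run one after another and that span names are unique within the trace.

```python
from collections import defaultdict

def self_times(spans):
    """Map span name -> own duration minus the total duration of direct children."""
    child_total = defaultdict(float)
    for span in spans:
        if span["parent_id"] is not None:
            child_total[span["parent_id"]] += span["duration_ms"]
    return {
        span["name"]: span["duration_ms"] - child_total[span["span_id"]]
        for span in spans
    }

# Hypothetical trace: one request span with a DB query and a Kafka publish under it.
spans = [
    {"span_id": "s1", "parent_id": None, "name": "http.request", "duration_ms": 500},
    {"span_id": "s2", "parent_id": "s1", "name": "db.query", "duration_ms": 300},
    {"span_id": "s3", "parent_id": "s1", "name": "kafka.publish", "duration_ms": 50},
]

for name, ms in sorted(self_times(spans).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {ms} ms")
```

In this made-up trace the DB query dominates (300 ms), while the request span itself accounts for only 150 ms.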

Step 5: Check audit logs

If the issue involves data that looks wrong ("my order disappeared", "menu settings changed"), query the audit trail:

# What happened to a specific order?
GET /api/platform/audit-logs?entityType=CanteenOrder&entityId=<order-uuid>

# What did a specific user change?
GET /api/platform/audit-logs?actorId=5237

# All menu changes in the school
GET /api/platform/audit-logs?entityType=Menu

Audit logs capture CREATE, UPDATE, and DELETE with before/after state snapshots, so you can see exactly what changed and who did it.
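The queries above can be assembled from a script, and the before/after snapshots compared field by field. This is a sketch only: the path and the `entityType`, `entityId`, and `actorId` parameters come from the examples above, but the base URL, the shape of the snapshots, and the status values are hypothetical.

```python
from urllib.parse import urlencode

AUDIT_PATH = "/api/platform/audit-logs"

def audit_url(base_url, **filters):
    """Build an audit-log query URL, skipping filters that are None."""
    params = {k: v for k, v in filters.items() if v is not None}
    return f"{base_url}{AUDIT_PATH}?{urlencode(params)}"

def changed_fields(before, after):
    """Return {field: (old, new)} for every field that differs between snapshots."""
    keys = sorted(set(before) | set(after))
    return {
        k: (before.get(k), after.get(k))
        for k in keys
        if before.get(k) != after.get(k)
    }

# Hypothetical host and order id.
print(audit_url("https://api.example.test", entityType="CanteenOrder", entityId="abc-123"))

# Hypothetical snapshots of one order before and after an UPDATE.
print(changed_fields(
    {"status": "PLACED", "quantity": 2},
    {"status": "CANCELLED", "quantity": 2},
))
```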

Step 6: Check metrics

If the issue might be performance-related ("app was slow", "request timed out"), check metrics for the time window:

| Metric | What it tells you |
| --- | --- |
| http.server.requests | Request latency and error rates by endpoint |
| hikaricp.connections.active | DB connection pool exhaustion |
| jvm.memory.used | Memory pressure |
| kafka.producer.record.send.total | Kafka publish failures |

Access via Prometheus (/actuator/prometheus) or your Grafana dashboards.
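When reading the raw /actuator/prometheus output, keep in mind that the dotted metric names usually appear with underscores, and that timers such as http.server.requests are typically exposed as several series (for example `http_server_requests_seconds_count`). The following sketch sums all samples of one series from a scrape; it assumes the plain text exposition format without trailing timestamps, and the sample scrape text is invented for illustration.

```python
def prometheus_name(dotted):
    """Convert a dotted metric name to its underscore form."""
    return dotted.replace(".", "_")

def sum_samples(exposition_text, metric):
    """Sum the values of every sample line whose series name equals metric."""
    total = 0.0
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        series, _, value = line.rpartition(" ")  # the value is the last token
        name = series.split("{", 1)[0]           # drop the label set, if any
        if name == metric:
            total += float(value)
    return total

# Hypothetical scrape output.
scrape = """\
# HELP hikaricp_connections_active Active connections
# TYPE hikaricp_connections_active gauge
hikaricp_connections_active{pool="HikariPool-1"} 10.0
http_server_requests_seconds_count{method="GET",status="200",uri="/api/platform/audit-logs"} 42.0
http_server_requests_seconds_count{method="GET",status="500",uri="/api/platform/audit-logs"} 3.0
"""

print(sum_samples(scrape, prometheus_name("hikaricp.connections.active")))
print(sum_samples(scrape, "http_server_requests_seconds_count"))
```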


Common scenarios

"My order failed"

  1. Find parent's userId from their email/phone
  2. Search logs: userId="<id>" AND "order" AND (level="WARN" OR level="ERROR")
  3. Common causes:
       • "Ordering deadline has passed": parent tried to order after cutoff
       • "Insufficient stock": item ran out
       • "Menu is informational only": menu not configured for ordering

"Menu not showing"

  1. Find the student's studentId and parent's userId
  2. Search logs: userId="<id>" AND "menu" AND level="WARN"
  3. Check: is the menu published? Does the child's campus/class match the menu's target audience?
  4. Query audit logs: GET /api/platform/audit-logs?entityType=Menu to see if someone unpublished it

"Order shows wrong status"

  1. Find the order ID from the student/parent's recent orders
  2. Query audit logs: GET /api/platform/audit-logs?entityType=CanteenOrder&entityId=<order-id>
  3. The before/after states show every status transition with timestamps and who triggered it
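Once the audit entries for an order are fetched, the status history can be read off by comparing the `status` field in each before/after pair. A sketch, assuming each entry carries `timestamp`, `actorId`, `before`, and `after` fields; these names, the entry shape, and the status values are hypothetical and should be adjusted to the actual API response.

```python
def status_transitions(entries):
    """Return (timestamp, actorId, old_status, new_status) rows, oldest first."""
    rows = []
    for entry in sorted(entries, key=lambda e: e["timestamp"]):
        # CREATE has no before-state and DELETE has no after-state.
        old = (entry.get("before") or {}).get("status")
        new = (entry.get("after") or {}).get("status")
        if old != new:
            rows.append((entry["timestamp"], entry["actorId"], old, new))
    return rows

# Hypothetical audit entries for one order, deliberately out of order.
entries = [
    {"timestamp": "2024-03-01T12:30:00+00:00", "actorId": 41,
     "before": {"status": "PLACED"}, "after": {"status": "SERVED"}},
    {"timestamp": "2024-03-01T08:00:00+00:00", "actorId": 5237,
     "before": None, "after": {"status": "PLACED"}},
]

for ts, actor, old, new in status_transitions(entries):
    print(f"{ts} actor={actor}: {old} -> {new}")
```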

"Student marked absent but was present"

  1. Find studentId
  2. Query audit logs: GET /api/platform/audit-logs?entityType=AttendanceRecord&entityId=<record-id>
  3. Check actorId — was it manual (staff) or automatic (auto-serve setting)?

MDC fields reference

Every JSON log line includes these fields for filtering:

| Field | Source | Example |
| --- | --- | --- |
| requestId | X-Request-Id header (auto-generated by SDK) | a1b2c3d4-e5f6-... |
| traceId | OpenTelemetry (auto-injected) | 64f2b4a8e3d1... |
| spanId | OpenTelemetry (auto-injected) | a1b2c3d4... |
| userId | JWT user_id claim | 5237 |
| schoolId | JWT school_id claim | 6 |
| userName | JWT user_name claim | <user-email> |

Tools

| Tool | URL | Purpose |
| --- | --- | --- |
| Log aggregator | (configure per environment) | Search structured JSON logs by MDC fields |
| Grafana Tempo | (configure per environment) | Distributed trace viewer (search by traceId) |
| Prometheus / Grafana | /actuator/prometheus | Metrics and dashboards |
| Audit logs API | GET /api/platform/audit-logs | Entity change history |
| Swagger UI | /swagger-ui.html | Interactive API explorer |