# Debugging Guide
How to trace customer issues through the system. You never need to ask the customer for technical IDs — start from who they are and when it happened.
## Step 1: Identify the user
Get the customer's name, email, or school from the support ticket, then look up their `userId` and `schoolId`:
```sql
-- By email (staff/admin)
SELECT user_id, school_id, user_name FROM users WHERE email = '<customer-email>';

-- By student name (parent reported issue)
SELECT s.id AS student_id, g.user_id AS parent_user_id, g.school_id
FROM students s
JOIN guardians g ON g.family_id = s.family_id
WHERE s.first_name = 'Kwame' AND s.last_name = 'Mensah';
```
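If you run these lookups from a support script rather than a SQL console, bind the email or name as query parameters instead of pasting ticket text into the SQL string. The following is a minimal sketch built on a Python DB-API connection. It uses `sqlite3` and its `?` placeholders so that it runs standalone; a PostgreSQL driver such as psycopg uses `%s` instead. The function names are hypothetical. Both functions return every match, because a student name can belong to more than one child.

```python
import sqlite3
from typing import Any


def find_user_by_email(conn: sqlite3.Connection, email: str) -> list[tuple[Any, ...]]:
    """Staff/admin lookup: returns (user_id, school_id, user_name) rows."""
    cur = conn.execute(
        "SELECT user_id, school_id, user_name FROM users WHERE email = ?",
        (email,),  # bound parameter, never formatted into the SQL text
    )
    return cur.fetchall()


def find_parents_by_student_name(
    conn: sqlite3.Connection, first_name: str, last_name: str
) -> list[tuple[Any, ...]]:
    """Parent lookup: returns (student_id, parent_user_id, school_id) rows.

    Names are not unique, so expect several rows and confirm the right
    family against the ticket details.
    """
    cur = conn.execute(
        """
        SELECT s.id AS student_id, g.user_id AS parent_user_id, g.school_id
        FROM students s
        JOIN guardians g ON g.family_id = s.family_id
        WHERE s.first_name = ? AND s.last_name = ?
        """,
        (first_name, last_name),
    )
    return cur.fetchall()
```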
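## Step 2: Search logs by user + time
Every authenticated request logs `userId` and `schoolId` in structured JSON. Use your log aggregator (Loki, ELK, CloudWatch) to search:
```
userId="5237" AND (level="WARN" OR level="ERROR")
```
Keep the parentheses. In most query languages AND binds more tightly than OR, so without them the query also returns every ERROR line from every user.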
Narrow by time window based on when the customer says the issue occurred. Look for:
- WARN — business rule violations (deadline passed, already cancelled, duplicate check-in)
- ERROR — unexpected failures (DB connection, null pointer, serialization)
Each log line also contains requestId and traceId — note these for the next steps.
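If you export matching lines from the aggregator, or are reading raw container logs, you can apply the same filter in a script. The following is a minimal sketch. It assumes one JSON object per line with `userId`, `level`, and an ISO-8601 `@timestamp` field; `@timestamp` is the Logstash-style name, and your encoder may use a different one. `userId` may be logged as a string or a number, so the sketch compares it as a string.

```python
import json
from datetime import datetime, timezone
from typing import Iterable


def _parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 timestamp; values without an offset are assumed UTC."""
    when = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return when if when.tzinfo else when.replace(tzinfo=timezone.utc)


def find_problem_lines(
    lines: Iterable[str],
    user_id: str,
    start: datetime,
    end: datetime,
    levels: frozenset = frozenset({"WARN", "ERROR"}),
) -> list[dict]:
    """Keep entries for user_id whose level is in `levels` and whose
    @timestamp lies in [start, end]. start/end must be timezone-aware.
    Lines that are not JSON objects (plain text, stack traces) are skipped."""
    matches = []
    for raw in lines:
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict) or "@timestamp" not in entry:
            continue
        # Same logic as: userId="..." AND (level="WARN" OR level="ERROR")
        if str(entry.get("userId")) != user_id or entry.get("level") not in levels:
            continue
        if start <= _parse_ts(entry["@timestamp"]) <= end:
            matches.append(entry)
    return matches
```

Each match's `requestId` and `traceId` are the starting points for Steps 3 and 4.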
## Step 3: Follow a specific request
Once you find a suspicious log line, use its `requestId` to see everything that happened in that request:
```
requestId="a1b2c3d4-e5f6-7890-abcd-ef1234567890"
```
This shows the full request lifecycle: auth, service calls, DB queries, Kafka events, and the response.
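With a batch of exported lines, reconstructing one request comes down to a filter and a chronological sort. The following is a minimal sketch. It assumes one JSON object per line with `requestId`, `level`, `message`, and an ISO-8601 `@timestamp`; the `@timestamp` and `message` field names are assumptions about your log encoder.

```python
import json
from datetime import datetime, timezone
from typing import Iterable


def _parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 timestamp; values without an offset are assumed UTC."""
    when = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return when if when.tzinfo else when.replace(tzinfo=timezone.utc)


def request_timeline(lines: Iterable[str], request_id: str) -> list[str]:
    """Return one 'HH:MM:SS.mmm LEVEL message' string per log line of the
    request, oldest first."""
    entries = []
    for raw in lines:
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip plain-text lines
        if isinstance(entry, dict) and entry.get("requestId") == request_id and "@timestamp" in entry:
            entries.append((_parse_ts(entry["@timestamp"]), entry))
    entries.sort(key=lambda pair: pair[0])  # chronological; ties keep input order
    return [
        f"{when.strftime('%H:%M:%S.%f')[:-3]} {entry.get('level', '?'):<5} {entry.get('message', '')}"
        for when, entry in entries
    ]
```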
## Step 4: View the distributed trace
Use the `traceId` from the log line in your tracing UI (Grafana Tempo, Jaeger):
```
traceId: 64f2b4a8e3d1c9f0a5b7d2e4f6a8c0d1
```
The trace shows:
- Full call graph with timing per span
- Downstream Kafka event processing
- DB query durations
- Where time was spent or where errors occurred
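Trace IDs are easy to mangle when they are copied out of a log line or a ticket. The following is a minimal sketch that checks the value has the W3C Trace Context shape (32 lowercase hex characters, not all zeros) before building a lookup URL. The `/api/traces/<traceID>` path is Grafana Tempo's query-trace-by-ID HTTP endpoint. The base URL is a placeholder you supply per environment, and the function names are hypothetical. In the Grafana or Jaeger UI you would paste the ID into the trace search instead.

```python
import re

# W3C Trace Context: a trace-id is 32 lowercase hex chars and must not be all zeros.
_TRACE_ID = re.compile(r"[0-9a-f]{32}")


def normalize_trace_id(raw: str) -> str:
    """Strip whitespace and an optional 'traceId:' prefix, lowercase, then validate."""
    value = raw.strip()
    if value.lower().startswith("traceid:"):
        value = value[len("traceid:"):].strip()
    value = value.lower()
    if not _TRACE_ID.fullmatch(value) or value == "0" * 32:
        raise ValueError(f"not a valid W3C trace ID: {raw!r}")
    return value


def tempo_trace_url(base_url: str, raw_trace_id: str) -> str:
    """Build a Tempo trace-by-ID URL, e.g. for curl or a script."""
    return f"{base_url.rstrip('/')}/api/traces/{normalize_trace_id(raw_trace_id)}"
```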
## Step 5: Check audit logs
If the issue involves data that looks wrong ("my order disappeared", "menu settings changed"), query the audit trail:
```
# What happened to a specific order?
GET /api/platform/audit-logs?entityType=CanteenOrder&entityId=<order-uuid>

# What did a specific user change?
GET /api/platform/audit-logs?actorId=5237

# All menu changes in the school
GET /api/platform/audit-logs?entityType=Menu
```
Audit logs capture CREATE, UPDATE, and DELETE with before/after state snapshots, so you can see exactly what changed and who did it.
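When scripting against the audit trail, it helps to build the query URL programmatically and read each entry's before/after snapshots as a field-by-field diff. The following is a minimal sketch. It assumes the endpoint accepts the `entityType`, `entityId`, and `actorId` parameters shown above and that snapshots arrive as JSON objects. The function names are hypothetical, and the response shape (for example, whether the snapshots live under `before`/`after` keys) should be checked in the Swagger UI.

```python
from typing import Optional
from urllib.parse import urlencode

AUDIT_LOGS_PATH = "/api/platform/audit-logs"


def audit_logs_url(base_url: str, **filters: Optional[str]) -> str:
    """Build an audit-log query URL; filters set to None are left out."""
    params = {key: value for key, value in filters.items() if value is not None}
    query = f"?{urlencode(params)}" if params else ""
    return f"{base_url.rstrip('/')}{AUDIT_LOGS_PATH}{query}"


def snapshot_diff(before: Optional[dict], after: Optional[dict]) -> dict:
    """Return {field: (old, new)} for every field whose value differs.

    CREATE entries have no 'before' and DELETE entries no 'after'; both are
    treated as empty snapshots. A field missing on one side counts as None.
    """
    before, after = before or {}, after or {}
    changed = {}
    for key in sorted(before.keys() | after.keys()):
        old, new = before.get(key), after.get(key)
        if old != new:
            changed[key] = (old, new)
    return changed
```

For example, `audit_logs_url(base, entityType="CanteenOrder", entityId=order_id)` reproduces the first query above.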
## Step 6: Check metrics
If the issue might be performance-related ("app was slow", "request timed out"), check metrics for the time window:

| Metric | What it tells you |
|---|---|
| `http.server.requests` | Request latency and error rates by endpoint |
| `hikaricp.connections.active` | DB connection pool exhaustion (active connections sitting at the pool maximum) |
| `jvm.memory.used` | Memory pressure |
| `kafka.producer.record.send.total` | Kafka publish volume; compare with `kafka.producer.record.error.total` to spot publish failures |
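Access metrics via Prometheus (scraped from `/actuator/prometheus`) or your Grafana dashboards. In PromQL, dots become underscores and timers gain a unit suffix. For example, `http.server.requests` appears as `http_server_requests_seconds_count`, `http_server_requests_seconds_sum`, and `http_server_requests_seconds_max`, and `hikaricp.connections.active` appears as `hikaricp_connections_active`.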
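For a specific incident window, querying Prometheus directly can be quicker than scrolling a dashboard. The following is a minimal sketch that builds a Prometheus HTTP API `/api/v1/query_range` URL for the customer's time window. It assumes Micrometer's Prometheus naming and the default `uri` and `status` tags on `http.server.requests`, plus the `area` tag on JVM memory. The Prometheus base URL, the 5-minute rate window, and the query names in `QUERIES` are illustrative placeholders.

```python
from datetime import datetime
from urllib.parse import urlencode

# Illustrative PromQL for the metrics in the table (Micrometer Prometheus names).
QUERIES = {
    "5xx_rate_by_uri": 'sum by (uri) (rate(http_server_requests_seconds_count{status=~"5.."}[5m]))',
    "avg_latency_by_uri": (
        "sum by (uri) (rate(http_server_requests_seconds_sum[5m]))"
        " / sum by (uri) (rate(http_server_requests_seconds_count[5m]))"
    ),
    "db_connections_active": "hikaricp_connections_active",
    "jvm_memory_used_by_area": "sum by (area) (jvm_memory_used_bytes)",
}


def query_range_url(
    base_url: str, promql: str, start: datetime, end: datetime, step_s: int = 30
) -> str:
    """Build a Prometheus /api/v1/query_range URL covering [start, end].

    Prefer timezone-aware datetimes: datetime.timestamp() treats naive
    values as local time.
    """
    if end <= start:
        raise ValueError("end must be after start")
    params = {
        "query": promql,
        "start": f"{start.timestamp():.0f}",  # Unix seconds
        "end": f"{end.timestamp():.0f}",
        "step": f"{step_s}s",
    }
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"
```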
## Common scenarios

### "My order failed"
- Find the parent's `userId` from their email or phone.
- Search logs: `userId="<id>" AND "order" AND (level="WARN" OR level="ERROR")`
- Common causes:
  - `"Ordering deadline has passed"`: the parent tried to order after the cutoff.
  - `"Insufficient stock"`: the item ran out.
  - `"Menu is informational only"`: the menu is not configured for ordering.

### "Menu is not showing for my child"
- Find the student's `studentId` and the parent's `userId`.
- Search logs: `userId="<id>" AND "menu" AND level="WARN"`
- Check: is the menu published? Does the child's campus/class match the menu's target audience?
- Query audit logs with `GET /api/platform/audit-logs?entityType=Menu` to see if someone unpublished it.

### "Order shows wrong status"
- Find the order ID from the student's or parent's recent orders.
- Query audit logs: `GET /api/platform/audit-logs?entityType=CanteenOrder&entityId=<order-id>`
- The before/after states show every status transition, with timestamps and who triggered it.

### "Student marked absent but was present"
- Find the `studentId`, then the ID of that student's attendance record for the day in question.
- Query audit logs: `GET /api/platform/audit-logs?entityType=AttendanceRecord&entityId=<record-id>`
- Check `actorId`: was the change manual (staff) or automatic (auto-serve setting)?
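The order-status and attendance scenarios both come down to one question about an entity's audit history: who changed a field, and when? The following is a minimal sketch that turns a list of audit entries into a readable history for one field. It assumes each entry is a dict with an ISO-8601 `timestamp`, an `actorId`, and `before`/`after` snapshot objects. Those key names, the function name, and the `status` field in the usage line are assumptions about the response shape, so confirm them in the Swagger UI.

```python
from datetime import datetime, timezone


def _parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 timestamp; values without an offset are assumed UTC."""
    when = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return when if when.tzinfo else when.replace(tzinfo=timezone.utc)


def field_history(entries: list[dict], field: str) -> list[str]:
    """Return 'timestamp actorId: old -> new' for each entry that changed
    `field`, oldest first. A missing before/after snapshot (CREATE/DELETE)
    counts as empty, so the field shows as None on that side."""
    changes = []
    for entry in sorted(entries, key=lambda e: _parse_ts(e["timestamp"])):
        old = (entry.get("before") or {}).get(field)
        new = (entry.get("after") or {}).get(field)
        if old != new:
            changes.append(f"{entry['timestamp']} {entry.get('actorId', '?')}: {old!r} -> {new!r}")
    return changes
```

For an order, `field_history(entries, "status")` lists each status transition with the actor who made it. For an attendance record, the `actorId` on each change is what tells a manual edit from an automatic one.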
## MDC fields reference
Every JSON log line includes these MDC (Mapped Diagnostic Context) fields for filtering:

| Field | Source | Example |
|---|---|---|
| `requestId` | `X-Request-Id` header (auto-generated by SDK) | `a1b2c3d4-e5f6-...` |
| `traceId` | OpenTelemetry (auto-injected) | `64f2b4a8e3d1...` |
| `spanId` | OpenTelemetry (auto-injected) | `a1b2c3d4...` |
| `userId` | JWT `user_id` claim | `5237` |
| `schoolId` | JWT `school_id` claim | `6` |
| `userName` | JWT `user_name` claim | The user's email address |
## Tools

| Tool | URL | Purpose |
|---|---|---|
| Log aggregator | (configure per environment) | Search structured JSON logs by MDC fields |
| Grafana Tempo | (configure per environment) | Distributed trace viewer (search by `traceId`) |
| Prometheus / Grafana | `/actuator/prometheus` | Metrics and dashboards |
| Audit logs API | `GET /api/platform/audit-logs` | Entity change history |
| Swagger UI | `/swagger-ui.html` | Interactive API explorer |