Database Migration Management Guide¶
How the Smartsapp backend manages its Postgres schema. Read this before changing any @Entity.
TL;DR¶
- Flyway owns the schema in every environment except
test. Changes ship as.sqlfiles inbackend/src/main/resources/db/migration/. - Hibernate runs in
validatemode. It refuses to start if entities and DB disagree, so migrations must land before or with the code that needs them. - Tests still use
create-dropviaapplication-test.yml(Flyway disabled there) so Testcontainers spins up clean schemas from entities each run. - Never change the schema outside a migration. No more
ddl-auto=update, no more "just run this SQL on prod" in Slack.
Why we did this¶
Between March and April 2026 we had four silent schema-drift incidents on the dev DB. Every one had the same shape: ddl-auto=update added new columns/tables but never dropped legacy ones or adjusted CHECK constraints when enums were renamed. Each drift detonated as a 500 on the next insert, usually in production-adjacent environments:
| Incident | What drifted | Damage |
|---|---|---|
stock_available moved from Item to MenuSchedule |
Legacy column left NOT NULL |
Parent ordering broken until column dropped |
maxItemQuantityPerOrder → maxQuantityPerDay |
Legacy column left in place | Inserts unaffected, data duplication risk |
menuVisibilityDays appeared in DB but not in entity |
Column had NOT NULL constraint with no mapping |
POST /api/canteen/menus returned 500 |
PaymentType enum renamed |
DB CHECK still enforced old values | POST /api/canteen/menus returned 500 again |
The common cause: ddl-auto=update is a local-dev convenience, not a migration tool. Spring Boot's own docs warn against it outside dev. Specifically it:
- Never drops legacy columns or tables
- Never relaxes
NOT NULLorCHECKconstraints when they become stale - Never renames anything (a rename becomes "add new + leave old sitting there")
- Gives you zero audit trail of how prod's schema got to whatever state it's in
Flyway + ddl-auto=validate fixes this: every change is a reviewable, ordered, idempotent .sql file, applied once per environment, tracked in the flyway_schema_history table.
How it works end-to-end¶
DEV TIME RUNTIME (app startup)
┌───────────┐ ./gradlew ┌──────────┐ ┌──────────┐ startup ┌──────────┐
│ Entity │──atlasDiff──▶│ Atlas │ │ Flyway │─────────▶│ Hibernate│
│ change │ │ diff │ │ migrate │ │ validate │
└───────────┘ └──────────┘ └──────────┘ └──────────┘
│ │ │
▼ ▼ ▼
generates V<n>__*.sql applies new migrations compares entities
from entity diff in version order to DB. FAILS if
they disagree.
PRE-COMMIT / CI
┌───────────┐
│ Atlas │ replays ALL migrations against a throwaway Docker Postgres.
│ validate │ Catches broken SQL, missing migrations, column conflicts.
└───────────┘ Fails the commit / PR before bad migrations reach deploy.
The three layers¶
- Atlas (dev-time + pre-commit + CI). Generates migration SQL from entity diffs (
./gradlew atlasDiff) and validates the migration directory by replaying it against a fresh Postgres (./gradlew atlasValidate). Atlas never touches the live database — it uses a throwaway Docker container. The pre-commit hook runsatlasValidatewhen migration or entity files are staged; CI runs it on every PR. - Flyway (runtime). Reads every
V<n>__*.sqlfile indb/migration/, checksflyway_schema_history, and applies any new ones in version order inside a transaction. Runs on pod startup, before the app accepts traffic. - Hibernate (runtime). In
validatemode it compares every@Entity/@Columnagainst the live DB. Any mismatch (missing column, wrong nullability, wrong type) fails startup. This is the last-resort safety net — if you somehow bypassed both Atlas and Flyway, Hibernate still catches the drift.
If any layer fails, the outcome is the same: the app doesn't start, the failure is loud, and the old pod keeps serving traffic (K8s rolling update with maxUnavailable: 0).
When to create a migration¶
Any time the database schema needs to change. Specifically:
| Trigger | Example | Migration needed? |
|---|---|---|
Add a new field to an @Entity |
private String allergenCode on Item |
Yes — ALTER TABLE ... ADD COLUMN |
Remove a field from an @Entity |
Delete stockAvailable from Item |
Yes — ALTER TABLE ... DROP COLUMN |
Rename a field on an @Entity |
maxItemQuantityPerOrder → maxQuantityPerDay |
Yes — multi-release dance (see Common tasks) |
| Change a field's type or nullability | int → Integer (nullable), @NotNull added |
Yes — ALTER COLUMN ... SET/DROP NOT NULL |
Add a new @Entity (new table) |
New FeedingListEntry entity |
Yes — CREATE TABLE |
Delete an @Entity |
Remove LegacyMenuItem |
Yes — DROP TABLE IF EXISTS |
| Add/change an enum value used in a CHECK | New PaymentType value |
Yes — rewrite the CHECK constraint |
| Add an index for query performance | Index on (student_id, scheduled_date) |
Yes — CREATE INDEX |
| Change only Java logic (no schema impact) | New service method, new DTO field | No |
| Change only test code | New test class | No |
| Add a field to a DTO (not an entity) | New field on CreateItemRequest |
No — DTOs are not tables |
The rule of thumb: if your PR changes anything under entities/, you almost certainly need a migration. If Hibernate validate will see a mismatch between what @Entity says and what the DB has, you need a migration. If you're unsure, run ./gradlew bootRun against your local DB — a failure on startup tells you immediately.
The workflow¶
-
Write the entity change first. Add/rename/remove the field, update the constructor, getters, setters.
-
Generate the migration with Atlas. From the
backend/directory:Atlas compares your./gradlew atlasDiff -Pname=add_my_column@Entityannotations against the cumulative state of all existing migrations and generates aV<next>__add_my_column.sqlfile indb/migration/. Always review the generated SQL — Atlas handles additive changes perfectly but may need hand-editing for renames, backfills, or constraint changes (see Common tasks).
If you prefer to write the SQL by hand, that works too — create V<next>__<description>.sql manually. Atlas is a helper, not a requirement.
-
Validate locally. Run the migration through Atlas's replay check:
This replays all migrations (V1 through your new one) against a throwaway Docker Postgres. If it passes, the SQL is valid and conflict-free. Requires Docker/Podman running../gradlew atlasValidate -
Boot the app locally. Run
./gradlew bootRun— Flyway applies the migration against your local DB, Hibernate validates the schema, the app starts. If it fails, fix the SQL and retry. -
Run the test suite.
./gradlew test— tests usecreate-drop(no Flyway), so they validate the entity independently. -
Update the mapper/DTO/controller if the entity change is user-facing.
-
Update the CHANGELOG under
[Unreleased]. -
Commit. The pre-commit hook automatically runs
atlasValidatewhen migration or entity files are staged. If validation fails, the commit is aborted with a diagnostic message. -
Open the PR. CI runs
atlasValidateagain (can't be bypassed with--no-verify). Migration + entity + DTO/controller changes all ship together.
Available Gradle tasks¶
| Task | What it does | When to use |
|---|---|---|
./gradlew atlasDiff -Pname=<description> |
Generate migration SQL from entity diff | After changing an @Entity |
./gradlew atlasValidate |
Replay all migrations against fresh Postgres | Before committing (also runs automatically in pre-commit + CI) |
./gradlew atlasHash |
Regenerate atlas.sum checksum file |
After manually editing a migration file |
./gradlew atlasLint -Platest=1 |
Check latest migration for destructive ops | Requires atlas login (free Atlas Pro account) |
Atlas auto-downloads its binary on first use to build/tools/atlas — no brew install or manual setup needed. Requires Docker/Podman running for the throwaway Postgres container.
Writing a migration¶
Naming¶
V<version>__<snake_case_description>.sql
Vis literal.<version>is a monotonically increasing integer (V2,V3,V4… don't reuse or skip).__is two underscores. Flyway parses the filename; don't use single underscore after the version.<description>is human-readable, lowercase, snake_case. Examples:add_menu_visibility_days,rename_select_and_pay_to_order_and_pay,drop_legacy_canteen_menu_items_table.
Bad: V3_fix.sql, V3__fix stuff.sql, V03__fix.sql, v3__fix.sql
Good: V3__add_menu_visibility_days_column.sql
File structure¶
-- V3__add_menu_visibility_days_column.sql
--
-- Why: parents asked to see a "what's coming up" window per menu. The Menu
-- entity now carries a visibilityDays int (1–30, default 30).
-- Ticket: SMRT-1234
-- 1. Add the column with a temporary default so existing rows get a value.
ALTER TABLE canteen_menus ADD COLUMN menu_visibility_days INT NOT NULL DEFAULT 30;
-- 2. Drop the default now that every row has one. New rows are supplied by the
-- entity (the @Column has no default in Java), matching the backfill.
ALTER TABLE canteen_menus ALTER COLUMN menu_visibility_days DROP DEFAULT;
-- 3. Enforce the 1–30 range at the DB boundary too, as belt-and-braces alongside
-- the @Min(1) @Max(30) on the DTO.
ALTER TABLE canteen_menus ADD CONSTRAINT canteen_menus_visibility_days_range
CHECK (menu_visibility_days BETWEEN 1 AND 30);
Lead with a short Why comment — future you will thank present you when a migration that ran 18 months ago starts looking mysterious. One file per logical change; don't bundle unrelated ALTERs.
Golden rules¶
-
Migrations are forward-only and immutable. Once a migration has been merged to
mainand applied anywhere (dev, CI, staging, prod), never edit it. Flyway hashes every applied migration and refuses to start if a hash changes. If you need to correct a mistake, write a new migration (V4__fix_V3_typo.sql). This is non-negotiable. -
Write migrations to work on a mixed fleet. Dev, staging, and prod may be at different versions and in different states (especially during the transition from
ddl-auto=update). Use: DROP ... IF EXISTSCREATE TABLE ... IF NOT EXISTSUPDATE ... WHERE <condition that is safe if already done>DO $$ BEGIN IF NOT EXISTS (SELECT ...) THEN ... END IF; END $$;for conditional DDL-
Postgres guarantees
ALTER COLUMN ... DROP NOT NULLis idempotent;DROP DEFAULTis too. -
Migration + entity change go in the same PR. If you add a column to the entity but not to the DB,
validatecrashes on everyone's next startup. If you add a column to the DB but not to the entity, it sits there drifting (the bad old days). Land both together. -
Never touch data the entity doesn't own in the same migration as a schema change. Split DDL and DML into separate migrations when both are needed. DDL migrations are cheap to retry, DML migrations are not.
-
Validate every migration locally before pushing.
The pre-commit hook runs# Preferred: replay all migrations against a throwaway Postgres (catches conflicts with earlier migrations) ./gradlew atlasValidate # Then boot the app to confirm Flyway + Hibernate are happy ./gradlew bootRunatlasValidateautomatically when migration or entity files are staged — but running it manually first gives faster feedback than waiting for the commit to fail. -
Coordinate data-destroying migrations. Any
DROP COLUMN,DROP TABLE, or narrowingALTER COLUMN ... SET NOT NULLis a ratchet. Get a second review on the PR. For prod, schedule during a known quiet window. -
Prefer additive over destructive. Renaming a column is a two-step dance: add the new, backfill, make the code read both, drop the old in a later release. Avoid rewriting existing data unless you absolutely must.
Environment-specific behaviour¶
Local dev (application.yml)¶
spring:
jpa:
hibernate:
ddl-auto: validate
flyway:
enabled: true
baseline-on-migrate: true
baseline-version: "1"
flyway_schema_history table, creates it, marks the schema as baselined at V1 (without actually running V1's SQL), then runs V2+.
- Subsequent startups: Flyway runs only migrations newer than what's in the history table.
- If a migration fails, the app doesn't start and the history row is marked success=false. Fix the SQL, run ./gradlew bootRun again, and Flyway retries that row.
Tests (application-test.yml)¶
spring:
jpa:
hibernate:
ddl-auto: create-drop
flyway:
enabled: false
@Entity annotations. No Flyway in tests. This is on purpose: tests validate that entities-in-Java work, not that entities + migration SQL work.
- If a migration introduces a production-only feature (e.g., a partial index, a generated column) that affects behaviour, write a test against Testcontainers + a one-off Flyway call in the test setup — don't rely on create-drop.
Staging and production¶
- Same
application.ymlas local dev (via env vars for DB URL/credentials). - Migrations run on pod startup, before the app accepts traffic.
- Health probes (
/actuator/health) fail-fast if migration fails, so the K8s rolling update keeps the old pod alive.
Rollback strategy¶
Flyway Community Edition has no native rollback. Migrations are forward-only. This is a deliberate Flyway design choice, not a limitation we can configure away. When something goes wrong, you fix forward — write a new migration that reverses or corrects the bad one.
Why not just revert the commit?¶
Reverting the Java code (entity + controller) is fine — the old code redeploys with the old entity definition. But the migration has already run against the database. The column/table/constraint change is in the DB. The old entity code will now fail Hibernate validate because the DB has the new schema but the old code doesn't expect it.
So a code revert must be accompanied by a new forward migration that undoes the schema change. You cannot simply "un-apply" a Flyway migration.
Rollback recipes¶
Scenario 1: You added a nullable column and want to undo it.
The safest rollback — dropping a nullable column that no existing code reads.
-- V<next>__rollback_v6_drop_stock_available.sql
ALTER TABLE canteen_menu_schedules DROP COLUMN IF EXISTS stock_available;
Revert the entity change in the same PR. Hibernate validate passes because both the column and the field are gone.
Scenario 2: You added a NOT NULL column with a default and want to undo it.
-- V<next>__rollback_v7_drop_visibility_days.sql
ALTER TABLE canteen_menus DROP COLUMN IF EXISTS menu_visibility_days;
Same pattern — drop the column, revert the entity. The DROP COLUMN removes the NOT NULL constraint with it.
Scenario 3: You renamed a column (multi-release dance) and want to abort mid-way.
If you're in the "both columns exist, code reads both" phase: write a migration that drops the new column and update the code to read only the old one.
If you've already dropped the old column in a later release: the data is gone. You must recreate the old column and backfill from the new one, then drop the new one. This is why the multi-release dance exists — it keeps a rollback path open at every step.
Scenario 4: You changed data (UPDATE) and want to undo it.
This is the hardest case. DML changes (updates, deletes) cannot be reversed unless you took a backup or logged the old values. Always take a backup of affected rows before running DML migrations in prod:
-- V<next>__rename_payment_type_values.sql
-- Backup existing values for rollback
CREATE TABLE IF NOT EXISTS _rollback_payment_type_backup AS
SELECT id, payment_type FROM canteen_menus WHERE payment_type = 'SELECT_AND_PAY';
-- Perform the rename
UPDATE canteen_menus SET payment_type = 'ORDER_AND_PAY_ITEM_PRICING'
WHERE payment_type = 'SELECT_AND_PAY';
Rollback migration:
-- V<next>__rollback_payment_type_rename.sql
UPDATE canteen_menus m
SET payment_type = b.payment_type
FROM _rollback_payment_type_backup b
WHERE m.id = b.id;
DROP TABLE IF EXISTS _rollback_payment_type_backup;
Rollback decision tree¶
Migration went bad after deploy?
├── Is the change DDL-only (ADD/DROP COLUMN, ADD/DROP CONSTRAINT)?
│ ├── Yes → Write a forward migration to reverse the DDL.
│ │ Revert the entity code. Deploy both together.
│ └── No (data was changed) → Do you have a backup table?
│ ├── Yes → Write a forward migration to restore from the backup.
│ └── No → Restore from DB backup or accept the data change.
│ Write a migration that corrects the values if possible.
└── Did the deploy itself fail (pod didn't come up)?
└── K8s rolling update kept the old pod alive.
Fix forward: write a corrective migration, push, redeploy.
The old pod continues serving traffic until the new one is healthy.
Prevention > rollback¶
The best rollback strategy is not needing one:
- Run every migration against a dev DB locally before pushing. ./gradlew bootRun catches schema mismatches immediately.
- Use IF EXISTS / IF NOT EXISTS in DDL so migrations are idempotent on re-run.
- Split DDL and DML into separate migrations. If the DDL succeeds but DML fails, you can fix just the DML.
- Land additive changes first (add column), deploy, verify, then land destructive changes (drop old column) in a later PR. This keeps a rollback path open between deploys.
- Back up affected rows before any DML migration in prod. A CREATE TABLE _rollback_* AS SELECT ... is cheap insurance.
Common tasks¶
Add a new column¶
-- V4__add_item_allergen_code.sql
ALTER TABLE canteen_items ADD COLUMN allergen_code VARCHAR(32);
@Column(name = "allergen_code")
private String allergenCode;
Add a NOT NULL column to a non-empty table¶
Two-step — the DB needs a value for existing rows before you can enforce NOT NULL:
-- V4__add_item_currency.sql
ALTER TABLE canteen_items ADD COLUMN currency VARCHAR(3);
UPDATE canteen_items SET currency = 'GHS' WHERE currency IS NULL;
ALTER TABLE canteen_items ALTER COLUMN currency SET NOT NULL;
Rename a column (safe, zero-downtime)¶
This is a multi-release dance. Don't try to do it in one migration.
Release N: add the new column alongside the old, backfill, write code that reads both and writes to both.
ALTER TABLE canteen_items ADD COLUMN max_quantity_per_day INT;
UPDATE canteen_items SET max_quantity_per_day = max_item_quantity_per_order;
Release N+1: flip readers and writers to the new column only.
Release N+2: drop the old column.
ALTER TABLE canteen_items DROP COLUMN max_item_quantity_per_order;
Rename a value in an enum-backed column¶
The column is VARCHAR with a CHECK constraint. You need to update both.
-- V5__rename_payment_type_values.sql
-- Update data first so the new constraint doesn't reject existing rows.
UPDATE canteen_menus SET payment_type = 'ORDER_AND_PAY_ITEM_PRICING'
WHERE payment_type = 'SELECT_AND_PAY';
-- Swap the constraint.
ALTER TABLE canteen_menus DROP CONSTRAINT IF EXISTS canteen_menus_payment_type_check;
ALTER TABLE canteen_menus ADD CONSTRAINT canteen_menus_payment_type_check
CHECK (payment_type IS NULL OR payment_type IN (
'ORDER_AND_PAY_ITEM_PRICING',
'ORDER_ONLY_NO_PAYMENT',
'ORDER_AND_PAY_TICKET_PRICING'
));
PaymentType.java in the same PR.
Drop a legacy table¶
-- V6__drop_legacy_canteen_menu_items.sql
DROP TABLE IF EXISTS canteen_menu_item_variants;
DROP TABLE IF EXISTS canteen_menu_items;
SELECT COUNT(*) FROM canteen_menu_items;
Add an index¶
-- V7__add_index_canteen_orders_student_date.sql
CREATE INDEX CONCURRENTLY idx_canteen_orders_student_date
ON canteen_orders (student_id, scheduled_date);
CREATE INDEX CONCURRENTLY cannot run inside a transaction. Mark the migration as non-transactional by putting CREATE INDEX CONCURRENTLY in its own file with no other statements, and name the file with a trailing _concurrent and configure Flyway's executeInTransaction: false — or simpler, drop CONCURRENTLY and accept a brief table lock during dev/staging deploys. For prod with real traffic, do the CONCURRENTLY dance.
Troubleshooting¶
"Migration checksum mismatch"¶
You (or someone) edited an already-applied migration. Flyway refuses to start. Options:
1. Revert the edit. Get the file back to what it was when it was applied. Bounce the app.
2. Repair the checksum (only if you're 100% sure the new SQL is equivalent and no DB needs rerunning): ./gradlew flywayRepair — but this should basically never be needed. Prefer option 1.
3. Never run flywayRepair in production. Fix forward with a new migration.
"Validation failed: detected applied migration not resolved locally"¶
You deleted a migration file that had already been applied somewhere. Restore the file from git history. Never delete applied migrations.
"Schema validation failed" (Hibernate validate)¶
The entity says one thing, the DB says another. Read the stack trace — it tells you which table/column. Fix by either: - Adjusting the entity to match the DB (if the DB is correct) - Writing a new migration to adjust the DB (if the entity is correct)
Do not flip ddl-auto back to update to "make it go away". That's how we got into the mess in the first place.
Flyway tries to run a migration that already ran manually¶
Your dev DB has a migration that was applied by hand, but flyway_schema_history doesn't know about it. Two fixes:
1. Undo the manual change in the DB and let Flyway run the migration normally.
2. Mark the migration as applied by inserting a row in flyway_schema_history with success=true and a matching checksum. This is an escape hatch for dev only — never do this in staging or prod.
A migration hangs¶
It's probably blocked on a lock. Common cause: a long-running transaction (often from a psql session you forgot about) is holding a row lock that your ALTER TABLE needs. Kill the offending session:
SELECT pid, state, query FROM pg_stat_activity WHERE state != 'idle';
SELECT pg_terminate_backend(<pid>);
The Flyway history table¶
Flyway tracks what's been applied in flyway_schema_history:
SELECT installed_rank, version, description, success, installed_on
FROM flyway_schema_history
ORDER BY installed_rank;
installed_rank: order the migration was applied (not necessarily the same as version — repairs can reorder)version: theV<n>from the filenamesuccess:trueif the migration committed;falseif it failed mid-wayinstalled_on: timestamp
If success = false on any row, Flyway will retry that migration on the next startup. If retries keep failing, you need to either fix the SQL or manually clean up partial work and delete the failed row.
When NOT to use a Flyway migration¶
- Seed data that should only exist in dev/staging, not prod. Use the existing
DataSeederclass, gated on a Spring profile. - One-off data cleanup you want to run manually with a human watching. Run it from psql, record it in the ops runbook, and write a forward migration later if it turns out to be repeatable.
- Read-only query helpers (views) that exist only in a monitoring environment. Keep those in the monitoring env's config, not app migrations.
Checklist for every schema PR¶
- [ ] New migration file follows
V<n>__<description>.sqlnaming - [ ] Version number is strictly greater than the highest on
main - [ ] Migration is idempotent on mixed environments (uses
IF EXISTS/IF NOT EXISTSwhere needed) - [ ] Matching
@Entitychanges are in the same PR - [ ]
./gradlew atlasValidatepasses (replays all migrations against fresh Postgres) - [ ] Tests pass locally (
./gradlew test) - [ ] Backend starts cleanly locally with the new migration applied (
./gradlew bootRun) - [ ] If the migration touches more than one table or does data backfill, you've run it against a realistic-sized dev DB and it completes in a reasonable time
- [ ] If the migration drops data, a second reviewer has signed off
- [ ] The Why comment at the top of the
.sqlfile names the ticket / incident / PRD section
Tooling summary¶
| Tool | Role | When it runs | What it catches |
|---|---|---|---|
Atlas atlasDiff |
Generates migration SQL from entity diffs | Developer runs manually after entity change | "You forgot to write a migration" |
Atlas atlasValidate |
Replays all migrations against fresh Postgres | Pre-commit hook + CI pipeline | Broken SQL, column conflicts, replay failures |
Atlas atlasHash |
Regenerates atlas.sum checksum file |
After manually editing a migration | Checksum mismatches |
| Flyway | Applies migrations at app startup | Runtime (pod startup) | Applies pending migrations in order |
Hibernate validate |
Compares entities to live DB | Runtime (after Flyway) | Entity-vs-DB drift that somehow bypassed everything above |
Atlas is dev-time only — it never touches the live database. It auto-downloads its binary on first ./gradlew atlasDiff or atlasValidate (no brew install needed). Requires Docker/Podman for the throwaway Postgres container.
Configuration lives in atlas.hcl at the project root. Gradle tasks are defined in backend/build.gradle.kts under the atlas group.
Further reading¶
- Atlas documentation — especially "Versioned Migrations" and "Migration Linting"
- Flyway documentation — especially the "Migrations" and "Callbacks" sections
- Spring Boot and Flyway integration
- Postgres ALTER TABLE documentation — especially the locking notes
- Project history:
CHANGELOG.mdunder Changed / BREAKING entries documents every drift incident that led to this guide