Skip to content

Database Migration Management Guide

How the Smartsapp backend manages its Postgres schema. Read this before changing any @Entity.


TL;DR

  • Flyway owns the schema in every environment except test. Changes ship as .sql files in backend/src/main/resources/db/migration/.
  • Hibernate runs in validate mode. It refuses to start if entities and DB disagree, so migrations must land before or with the code that needs them.
  • Tests still use create-drop via application-test.yml (Flyway disabled there) so Testcontainers spins up clean schemas from entities each run.
  • Never change the schema outside a migration. No more ddl-auto=update, no more "just run this SQL on prod" in Slack.

Why we did this

Between March and April 2026 we had four silent schema-drift incidents on the dev DB. Every one had the same shape: ddl-auto=update added new columns/tables but never dropped legacy ones or adjusted CHECK constraints when enums were renamed. Each drift detonated as a 500 on the next insert, usually in production-adjacent environments:

Incident What drifted Damage
stock_available moved from Item to MenuSchedule Legacy column left NOT NULL Parent ordering broken until column dropped
maxItemQuantityPerOrdermaxQuantityPerDay Legacy column left in place Inserts unaffected, data duplication risk
menuVisibilityDays appeared in DB but not in entity Column had NOT NULL constraint with no mapping POST /api/canteen/menus returned 500
PaymentType enum renamed DB CHECK still enforced old values POST /api/canteen/menus returned 500 again

The common cause: ddl-auto=update is a local-dev convenience, not a migration tool. Spring Boot's own docs warn against it outside dev. Specifically it:

  • Never drops legacy columns or tables
  • Never relaxes NOT NULL or CHECK constraints when they become stale
  • Never renames anything (a rename becomes "add new + leave old sitting there")
  • Gives you zero audit trail of how prod's schema got to whatever state it's in

Flyway + ddl-auto=validate fixes this: every change is a reviewable, ordered, idempotent .sql file, applied once per environment, tracked in the flyway_schema_history table.


How it works end-to-end

  DEV TIME                                           RUNTIME (app startup)
┌───────────┐  ./gradlew   ┌──────────┐           ┌──────────┐  startup  ┌──────────┐
│  Entity   │──atlasDiff──▶│  Atlas   │           │  Flyway  │─────────▶│ Hibernate│
│  change   │              │  diff    │           │  migrate │          │ validate │
└───────────┘              └──────────┘           └──────────┘          └──────────┘
                                │                        │                    │
                                ▼                        ▼                    ▼
                      generates V<n>__*.sql    applies new migrations   compares entities
                      from entity diff         in version order         to DB. FAILS if
                                                                        they disagree.
  PRE-COMMIT / CI
┌───────────┐
│  Atlas    │  replays ALL migrations against a throwaway Docker Postgres.
│  validate │  Catches broken SQL, missing migrations, column conflicts.
└───────────┘  Fails the commit / PR before bad migrations reach deploy.

The three layers

  1. Atlas (dev-time + pre-commit + CI). Generates migration SQL from entity diffs (./gradlew atlasDiff) and validates the migration directory by replaying it against a fresh Postgres (./gradlew atlasValidate). Atlas never touches the live database — it uses a throwaway Docker container. The pre-commit hook runs atlasValidate when migration or entity files are staged; CI runs it on every PR.
  2. Flyway (runtime). Reads every V<n>__*.sql file in db/migration/, checks flyway_schema_history, and applies any new ones in version order inside a transaction. Runs on pod startup, before the app accepts traffic.
  3. Hibernate (runtime). In validate mode it compares every @Entity / @Column against the live DB. Any mismatch (missing column, wrong nullability, wrong type) fails startup. This is the last-resort safety net — if you somehow bypassed both Atlas and Flyway, Hibernate still catches the drift.

If any layer fails, the outcome is the same: the app doesn't start, the failure is loud, and the old pod keeps serving traffic (K8s rolling update with maxUnavailable: 0).


When to create a migration

Any time the database schema needs to change. Specifically:

Trigger Example Migration needed?
Add a new field to an @Entity private String allergenCode on Item YesALTER TABLE ... ADD COLUMN
Remove a field from an @Entity Delete stockAvailable from Item YesALTER TABLE ... DROP COLUMN
Rename a field on an @Entity maxItemQuantityPerOrdermaxQuantityPerDay Yes — multi-release dance (see Common tasks)
Change a field's type or nullability intInteger (nullable), @NotNull added YesALTER COLUMN ... SET/DROP NOT NULL
Add a new @Entity (new table) New FeedingListEntry entity YesCREATE TABLE
Delete an @Entity Remove LegacyMenuItem YesDROP TABLE IF EXISTS
Add/change an enum value used in a CHECK New PaymentType value Yes — rewrite the CHECK constraint
Add an index for query performance Index on (student_id, scheduled_date) YesCREATE INDEX
Change only Java logic (no schema impact) New service method, new DTO field No
Change only test code New test class No
Add a field to a DTO (not an entity) New field on CreateItemRequest No — DTOs are not tables

The rule of thumb: if your PR changes anything under entities/, you almost certainly need a migration. If Hibernate validate will see a mismatch between what @Entity says and what the DB has, you need a migration. If you're unsure, run ./gradlew bootRun against your local DB — a failure on startup tells you immediately.

The workflow

  1. Write the entity change first. Add/rename/remove the field, update the constructor, getters, setters.

  2. Generate the migration with Atlas. From the backend/ directory:

    ./gradlew atlasDiff -Pname=add_my_column
    
    Atlas compares your @Entity annotations against the cumulative state of all existing migrations and generates a V<next>__add_my_column.sql file in db/migration/. Always review the generated SQL — Atlas handles additive changes perfectly but may need hand-editing for renames, backfills, or constraint changes (see Common tasks).

If you prefer to write the SQL by hand, that works too — create V<next>__<description>.sql manually. Atlas is a helper, not a requirement.

  1. Validate locally. Run the migration through Atlas's replay check:

    ./gradlew atlasValidate
    
    This replays all migrations (V1 through your new one) against a throwaway Docker Postgres. If it passes, the SQL is valid and conflict-free. Requires Docker/Podman running.

  2. Boot the app locally. Run ./gradlew bootRun — Flyway applies the migration against your local DB, Hibernate validates the schema, the app starts. If it fails, fix the SQL and retry.

  3. Run the test suite. ./gradlew test — tests use create-drop (no Flyway), so they validate the entity independently.

  4. Update the mapper/DTO/controller if the entity change is user-facing.

  5. Update the CHANGELOG under [Unreleased].

  6. Commit. The pre-commit hook automatically runs atlasValidate when migration or entity files are staged. If validation fails, the commit is aborted with a diagnostic message.

  7. Open the PR. CI runs atlasValidate again (can't be bypassed with --no-verify). Migration + entity + DTO/controller changes all ship together.

Available Gradle tasks

Task What it does When to use
./gradlew atlasDiff -Pname=<description> Generate migration SQL from entity diff After changing an @Entity
./gradlew atlasValidate Replay all migrations against fresh Postgres Before committing (also runs automatically in pre-commit + CI)
./gradlew atlasHash Regenerate atlas.sum checksum file After manually editing a migration file
./gradlew atlasLint -Platest=1 Check latest migration for destructive ops Requires atlas login (free Atlas Pro account)

Atlas auto-downloads its binary on first use to build/tools/atlas — no brew install or manual setup needed. Requires Docker/Podman running for the throwaway Postgres container.


Writing a migration

Naming

V<version>__<snake_case_description>.sql
  • V is literal.
  • <version> is a monotonically increasing integer (V2, V3, V4… don't reuse or skip).
  • __ is two underscores. Flyway parses the filename; don't use single underscore after the version.
  • <description> is human-readable, lowercase, snake_case. Examples: add_menu_visibility_days, rename_select_and_pay_to_order_and_pay, drop_legacy_canteen_menu_items_table.

Bad: V3_fix.sql, V3__fix stuff.sql, V03__fix.sql, v3__fix.sql Good: V3__add_menu_visibility_days_column.sql

File structure

-- V3__add_menu_visibility_days_column.sql
--
-- Why: parents asked to see a "what's coming up" window per menu. The Menu
--      entity now carries a visibilityDays int (1–30, default 30).
-- Ticket: SMRT-1234

-- 1. Add the column with a temporary default so existing rows get a value.
ALTER TABLE canteen_menus ADD COLUMN menu_visibility_days INT NOT NULL DEFAULT 30;

-- 2. Drop the default now that every row has one. New rows are supplied by the
--    entity (the @Column has no default in Java), matching the backfill.
ALTER TABLE canteen_menus ALTER COLUMN menu_visibility_days DROP DEFAULT;

-- 3. Enforce the 1–30 range at the DB boundary too, as belt-and-braces alongside
--    the @Min(1) @Max(30) on the DTO.
ALTER TABLE canteen_menus ADD CONSTRAINT canteen_menus_visibility_days_range
    CHECK (menu_visibility_days BETWEEN 1 AND 30);

Lead with a short Why comment — future you will thank present you when a migration that ran 18 months ago starts looking mysterious. One file per logical change; don't bundle unrelated ALTERs.

Golden rules

  1. Migrations are forward-only and immutable. Once a migration has been merged to main and applied anywhere (dev, CI, staging, prod), never edit it. Flyway hashes every applied migration and refuses to start if a hash changes. If you need to correct a mistake, write a new migration (V4__fix_V3_typo.sql). This is non-negotiable.

  2. Write migrations to work on a mixed fleet. Dev, staging, and prod may be at different versions and in different states (especially during the transition from ddl-auto=update). Use:

  3. DROP ... IF EXISTS
  4. CREATE TABLE ... IF NOT EXISTS
  5. UPDATE ... WHERE <condition that is safe if already done>
  6. DO $$ BEGIN IF NOT EXISTS (SELECT ...) THEN ... END IF; END $$; for conditional DDL
  7. Postgres guarantees ALTER COLUMN ... DROP NOT NULL is idempotent; DROP DEFAULT is too.

  8. Migration + entity change go in the same PR. If you add a column to the entity but not to the DB, validate crashes on everyone's next startup. If you add a column to the DB but not to the entity, it sits there drifting (the bad old days). Land both together.

  9. Never touch data the entity doesn't own in the same migration as a schema change. Split DDL and DML into separate migrations when both are needed. DDL migrations are cheap to retry, DML migrations are not.

  10. Validate every migration locally before pushing.

    # Preferred: replay all migrations against a throwaway Postgres (catches conflicts with earlier migrations)
    ./gradlew atlasValidate
    
    # Then boot the app to confirm Flyway + Hibernate are happy
    ./gradlew bootRun
    
    The pre-commit hook runs atlasValidate automatically when migration or entity files are staged — but running it manually first gives faster feedback than waiting for the commit to fail.

  11. Coordinate data-destroying migrations. Any DROP COLUMN, DROP TABLE, or narrowing ALTER COLUMN ... SET NOT NULL is a ratchet. Get a second review on the PR. For prod, schedule during a known quiet window.

  12. Prefer additive over destructive. Renaming a column is a two-step dance: add the new, backfill, make the code read both, drop the old in a later release. Avoid rewriting existing data unless you absolutely must.


Environment-specific behaviour

Local dev (application.yml)

spring:
  jpa:
    hibernate:
      ddl-auto: validate
  flyway:
    enabled: true
    baseline-on-migrate: true
    baseline-version: "1"
- Flyway runs against your local Postgres on app startup. - First startup after the Flyway cutover: Flyway notices there's no flyway_schema_history table, creates it, marks the schema as baselined at V1 (without actually running V1's SQL), then runs V2+. - Subsequent startups: Flyway runs only migrations newer than what's in the history table. - If a migration fails, the app doesn't start and the history row is marked success=false. Fix the SQL, run ./gradlew bootRun again, and Flyway retries that row.

Tests (application-test.yml)

spring:
  jpa:
    hibernate:
      ddl-auto: create-drop
  flyway:
    enabled: false
- Testcontainers spawns a fresh Postgres per test class. - Hibernate builds the schema directly from @Entity annotations. No Flyway in tests. This is on purpose: tests validate that entities-in-Java work, not that entities + migration SQL work. - If a migration introduces a production-only feature (e.g., a partial index, a generated column) that affects behaviour, write a test against Testcontainers + a one-off Flyway call in the test setup — don't rely on create-drop.

Staging and production

  • Same application.yml as local dev (via env vars for DB URL/credentials).
  • Migrations run on pod startup, before the app accepts traffic.
  • Health probes (/actuator/health) fail-fast if migration fails, so the K8s rolling update keeps the old pod alive.

Rollback strategy

Flyway Community Edition has no native rollback. Migrations are forward-only. This is a deliberate Flyway design choice, not a limitation we can configure away. When something goes wrong, you fix forward — write a new migration that reverses or corrects the bad one.

Why not just revert the commit?

Reverting the Java code (entity + controller) is fine — the old code redeploys with the old entity definition. But the migration has already run against the database. The column/table/constraint change is in the DB. The old entity code will now fail Hibernate validate because the DB has the new schema but the old code doesn't expect it.

So a code revert must be accompanied by a new forward migration that undoes the schema change. You cannot simply "un-apply" a Flyway migration.

Rollback recipes

Scenario 1: You added a nullable column and want to undo it.

The safest rollback — dropping a nullable column that no existing code reads.

-- V<next>__rollback_v6_drop_stock_available.sql
ALTER TABLE canteen_menu_schedules DROP COLUMN IF EXISTS stock_available;

Revert the entity change in the same PR. Hibernate validate passes because both the column and the field are gone.

Scenario 2: You added a NOT NULL column with a default and want to undo it.

-- V<next>__rollback_v7_drop_visibility_days.sql
ALTER TABLE canteen_menus DROP COLUMN IF EXISTS menu_visibility_days;

Same pattern — drop the column, revert the entity. The DROP COLUMN removes the NOT NULL constraint with it.

Scenario 3: You renamed a column (multi-release dance) and want to abort mid-way.

If you're in the "both columns exist, code reads both" phase: write a migration that drops the new column and update the code to read only the old one.

If you've already dropped the old column in a later release: the data is gone. You must recreate the old column and backfill from the new one, then drop the new one. This is why the multi-release dance exists — it keeps a rollback path open at every step.

Scenario 4: You changed data (UPDATE) and want to undo it.

This is the hardest case. DML changes (updates, deletes) cannot be reversed unless you took a backup or logged the old values. Always take a backup of affected rows before running DML migrations in prod:

-- V<next>__rename_payment_type_values.sql
-- Backup existing values for rollback
CREATE TABLE IF NOT EXISTS _rollback_payment_type_backup AS
  SELECT id, payment_type FROM canteen_menus WHERE payment_type = 'SELECT_AND_PAY';

-- Perform the rename
UPDATE canteen_menus SET payment_type = 'ORDER_AND_PAY_ITEM_PRICING'
  WHERE payment_type = 'SELECT_AND_PAY';

Rollback migration:

-- V<next>__rollback_payment_type_rename.sql
UPDATE canteen_menus m
  SET payment_type = b.payment_type
  FROM _rollback_payment_type_backup b
  WHERE m.id = b.id;

DROP TABLE IF EXISTS _rollback_payment_type_backup;

Rollback decision tree

Migration went bad after deploy?
├── Is the change DDL-only (ADD/DROP COLUMN, ADD/DROP CONSTRAINT)?
│   ├── Yes → Write a forward migration to reverse the DDL.
│   │         Revert the entity code. Deploy both together.
│   └── No (data was changed) → Do you have a backup table?
│       ├── Yes → Write a forward migration to restore from the backup.
│       └── No → Restore from DB backup or accept the data change.
│                 Write a migration that corrects the values if possible.
└── Did the deploy itself fail (pod didn't come up)?
    └── K8s rolling update kept the old pod alive.
        Fix forward: write a corrective migration, push, redeploy.
        The old pod continues serving traffic until the new one is healthy.

Prevention > rollback

The best rollback strategy is not needing one: - Run every migration against a dev DB locally before pushing. ./gradlew bootRun catches schema mismatches immediately. - Use IF EXISTS / IF NOT EXISTS in DDL so migrations are idempotent on re-run. - Split DDL and DML into separate migrations. If the DDL succeeds but DML fails, you can fix just the DML. - Land additive changes first (add column), deploy, verify, then land destructive changes (drop old column) in a later PR. This keeps a rollback path open between deploys. - Back up affected rows before any DML migration in prod. A CREATE TABLE _rollback_* AS SELECT ... is cheap insurance.


Common tasks

Add a new column

-- V4__add_item_allergen_code.sql
ALTER TABLE canteen_items ADD COLUMN allergen_code VARCHAR(32);
Then add the field to the entity in the same PR:
@Column(name = "allergen_code")
private String allergenCode;

Add a NOT NULL column to a non-empty table

Two-step — the DB needs a value for existing rows before you can enforce NOT NULL:

-- V4__add_item_currency.sql
ALTER TABLE canteen_items ADD COLUMN currency VARCHAR(3);
UPDATE canteen_items SET currency = 'GHS' WHERE currency IS NULL;
ALTER TABLE canteen_items ALTER COLUMN currency SET NOT NULL;

Rename a column (safe, zero-downtime)

This is a multi-release dance. Don't try to do it in one migration.

Release N: add the new column alongside the old, backfill, write code that reads both and writes to both.

ALTER TABLE canteen_items ADD COLUMN max_quantity_per_day INT;
UPDATE canteen_items SET max_quantity_per_day = max_item_quantity_per_order;

Release N+1: flip readers and writers to the new column only.

Release N+2: drop the old column.

ALTER TABLE canteen_items DROP COLUMN max_item_quantity_per_order;

Rename a value in an enum-backed column

The column is VARCHAR with a CHECK constraint. You need to update both.

-- V5__rename_payment_type_values.sql
-- Update data first so the new constraint doesn't reject existing rows.
UPDATE canteen_menus SET payment_type = 'ORDER_AND_PAY_ITEM_PRICING'
 WHERE payment_type = 'SELECT_AND_PAY';

-- Swap the constraint.
ALTER TABLE canteen_menus DROP CONSTRAINT IF EXISTS canteen_menus_payment_type_check;
ALTER TABLE canteen_menus ADD CONSTRAINT canteen_menus_payment_type_check
    CHECK (payment_type IS NULL OR payment_type IN (
        'ORDER_AND_PAY_ITEM_PRICING',
        'ORDER_ONLY_NO_PAYMENT',
        'ORDER_AND_PAY_TICKET_PRICING'
    ));
Then update PaymentType.java in the same PR.

Drop a legacy table

-- V6__drop_legacy_canteen_menu_items.sql
DROP TABLE IF EXISTS canteen_menu_item_variants;
DROP TABLE IF EXISTS canteen_menu_items;
Confirm the table has zero rows first:
SELECT COUNT(*) FROM canteen_menu_items;
If it has rows, the data needs a home — don't just drop it.

Add an index

-- V7__add_index_canteen_orders_student_date.sql
CREATE INDEX CONCURRENTLY idx_canteen_orders_student_date
    ON canteen_orders (student_id, scheduled_date);
Important: Flyway runs each migration inside a transaction by default, but CREATE INDEX CONCURRENTLY cannot run inside a transaction. Mark the migration as non-transactional by putting CREATE INDEX CONCURRENTLY in its own file with no other statements, and name the file with a trailing _concurrent and configure Flyway's executeInTransaction: false — or simpler, drop CONCURRENTLY and accept a brief table lock during dev/staging deploys. For prod with real traffic, do the CONCURRENTLY dance.


Troubleshooting

"Migration checksum mismatch"

You (or someone) edited an already-applied migration. Flyway refuses to start. Options: 1. Revert the edit. Get the file back to what it was when it was applied. Bounce the app. 2. Repair the checksum (only if you're 100% sure the new SQL is equivalent and no DB needs rerunning): ./gradlew flywayRepair — but this should basically never be needed. Prefer option 1. 3. Never run flywayRepair in production. Fix forward with a new migration.

"Validation failed: detected applied migration not resolved locally"

You deleted a migration file that had already been applied somewhere. Restore the file from git history. Never delete applied migrations.

"Schema validation failed" (Hibernate validate)

The entity says one thing, the DB says another. Read the stack trace — it tells you which table/column. Fix by either: - Adjusting the entity to match the DB (if the DB is correct) - Writing a new migration to adjust the DB (if the entity is correct)

Do not flip ddl-auto back to update to "make it go away". That's how we got into the mess in the first place.

Flyway tries to run a migration that already ran manually

Your dev DB has a migration that was applied by hand, but flyway_schema_history doesn't know about it. Two fixes: 1. Undo the manual change in the DB and let Flyway run the migration normally. 2. Mark the migration as applied by inserting a row in flyway_schema_history with success=true and a matching checksum. This is an escape hatch for dev only — never do this in staging or prod.

A migration hangs

It's probably blocked on a lock. Common cause: a long-running transaction (often from a psql session you forgot about) is holding a row lock that your ALTER TABLE needs. Kill the offending session:

SELECT pid, state, query FROM pg_stat_activity WHERE state != 'idle';
SELECT pg_terminate_backend(<pid>);


The Flyway history table

Flyway tracks what's been applied in flyway_schema_history:

SELECT installed_rank, version, description, success, installed_on
FROM flyway_schema_history
ORDER BY installed_rank;
  • installed_rank: order the migration was applied (not necessarily the same as version — repairs can reorder)
  • version: the V<n> from the filename
  • success: true if the migration committed; false if it failed mid-way
  • installed_on: timestamp

If success = false on any row, Flyway will retry that migration on the next startup. If retries keep failing, you need to either fix the SQL or manually clean up partial work and delete the failed row.


When NOT to use a Flyway migration

  • Seed data that should only exist in dev/staging, not prod. Use the existing DataSeeder class, gated on a Spring profile.
  • One-off data cleanup you want to run manually with a human watching. Run it from psql, record it in the ops runbook, and write a forward migration later if it turns out to be repeatable.
  • Read-only query helpers (views) that exist only in a monitoring environment. Keep those in the monitoring env's config, not app migrations.

Checklist for every schema PR

  • [ ] New migration file follows V<n>__<description>.sql naming
  • [ ] Version number is strictly greater than the highest on main
  • [ ] Migration is idempotent on mixed environments (uses IF EXISTS / IF NOT EXISTS where needed)
  • [ ] Matching @Entity changes are in the same PR
  • [ ] ./gradlew atlasValidate passes (replays all migrations against fresh Postgres)
  • [ ] Tests pass locally (./gradlew test)
  • [ ] Backend starts cleanly locally with the new migration applied (./gradlew bootRun)
  • [ ] If the migration touches more than one table or does data backfill, you've run it against a realistic-sized dev DB and it completes in a reasonable time
  • [ ] If the migration drops data, a second reviewer has signed off
  • [ ] The Why comment at the top of the .sql file names the ticket / incident / PRD section

Tooling summary

Tool Role When it runs What it catches
Atlas atlasDiff Generates migration SQL from entity diffs Developer runs manually after entity change "You forgot to write a migration"
Atlas atlasValidate Replays all migrations against fresh Postgres Pre-commit hook + CI pipeline Broken SQL, column conflicts, replay failures
Atlas atlasHash Regenerates atlas.sum checksum file After manually editing a migration Checksum mismatches
Flyway Applies migrations at app startup Runtime (pod startup) Applies pending migrations in order
Hibernate validate Compares entities to live DB Runtime (after Flyway) Entity-vs-DB drift that somehow bypassed everything above

Atlas is dev-time only — it never touches the live database. It auto-downloads its binary on first ./gradlew atlasDiff or atlasValidate (no brew install needed). Requires Docker/Podman for the throwaway Postgres container.

Configuration lives in atlas.hcl at the project root. Gradle tasks are defined in backend/build.gradle.kts under the atlas group.


Further reading