
CI/CD — Smartsapp

This document defines how code moves from a developer's machine to production.

Smartsapp follows:

  • Trunk-based development
  • Immutable container images
  • Version creation during staging promotion
  • Manual promotion to production
  • Zero-downtime deployments

1. Platform

Component             Technology
CI/CD Orchestration   Devtron
Container Runtime     Kubernetes (RKE2)
Cluster Management    Rancher
Image Builds          Multi-stage Containerfiles
Container Registry    OCI-compatible registry

Each deployable application has its own Devtron pipeline (CI + CD).


2. Branching Strategy

All engineers integrate into a single branch: main.

Principles

  • Feature branches are short-lived.
  • main is always deployable.
  • Incomplete features are hidden behind feature flags.
  • Code is integrated continuously.
  • Every commit to main must build and pass tests.

3. Image Build Strategy

Every commit to main produces a container image.

Continuous Integration Images

When a commit is pushed to main:

  1. CI builds the container image.
  2. The image is tagged with the commit SHA.

Example:

sha-f4e8a12

This image is:

  • Automatically deployed to dev
  • Used for integration validation
  • Not considered a release

SHA images are temporary validation artifacts.
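As a minimal sketch, the tagging step might derive the image tag from the commit SHA like this (the `sha_tag` helper is illustrative, not the actual pipeline code; only the `sha-` prefix convention comes from this document):

```shell
#!/bin/sh
# Illustrative only: derive the temporary validation tag from a commit SHA.
# The sha- prefix matches the convention shown above (e.g. sha-f4e8a12).
sha_tag() {
  commit="$1"
  printf 'sha-%s\n' "$(printf '%s' "$commit" | cut -c1-7)"
}

sha_tag "f4e8a12c9b0d4e6f8a1b2c3d4e5f6a7b8c9d0e1f"   # -> sha-f4e8a12
```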


4. Versioning and Releases

A semantic version is created when promoting from dev to staging.

Developers do not manually create Git tags.

Version Creation Process

When promoting a build to staging:

  1. The system identifies the latest Git tag.
  2. The next semantic version is calculated (typically a PATCH increment).
  3. A new Git tag (e.g., v1.14.0) is created automatically.
  4. The tag is pushed to the repository.
  5. The existing SHA image is retagged with the semantic version (1.14.0).
  6. Release notes are generated automatically from commit history.
  7. The versioned image is deployed to staging.

No rebuild occurs.

The same image is later promoted to production.

Version Increment Policy

By default, each staging promotion increments PATCH:

1.14.0
1.14.1
1.14.2

This may be extended in the future to support automated major/minor bumps.
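The patch-increment calculation in step 2 can be sketched as follows (a minimal sketch assuming tags follow the vMAJOR.MINOR.PATCH form shown above; the `next_patch` helper name is illustrative):

```shell
#!/bin/sh
# Illustrative: compute the next PATCH version from the latest Git tag.
# Assumes tags look like v1.14.0; in the real pipeline the input would
# typically come from `git describe --tags --abbrev=0`.
next_patch() {
  tag="${1#v}"            # strip leading "v" -> 1.14.0
  major="${tag%%.*}"      # 1
  rest="${tag#*.}"        # 14.0
  minor="${rest%%.*}"     # 14
  patch="${rest#*.}"      # 0
  printf 'v%s.%s.%s\n' "$major" "$minor" "$((patch + 1))"
}

next_patch v1.14.0   # -> v1.14.1
```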


5. Release Notes

Release notes are generated automatically from the Git history between the previous version tag and the new version tag.

Example command:

git log <previous-tag>..<new-tag> --pretty=format:"- %s"

Release notes are:

  • Stored in CHANGELOG.md at the repository root
  • Updated automatically during staging promotion
  • Available for auditing

CHANGELOG.md is the authoritative record of releases.
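The generation step might look like the following sketch (the function names and CHANGELOG section layout are assumptions; only the `git log` command itself comes from the example above):

```shell
#!/bin/sh
# Illustrative: collect commit subjects between two tags and prepend
# them to CHANGELOG.md as a new release section.
release_notes() {
  prev="$1" next="$2"
  git log "${prev}..${next}" --pretty=format:"- %s"
}

prepend_changelog() {
  version="$1" notes="$2"
  { printf '## %s\n\n%s\n\n' "$version" "$notes"; cat CHANGELOG.md 2>/dev/null; } > CHANGELOG.tmp
  mv CHANGELOG.tmp CHANGELOG.md
}
```

Prepending keeps the newest release at the top of CHANGELOG.md, which suits its role as the authoritative release record.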


6. Environments

dev → staging → production

Environment   Purpose                 Replicas
dev           Continuous validation   1
staging       Release verification    1
production    Live traffic            3

Rules

  • Dev updates automatically on every commit.
  • Staging promotion creates a version.
  • Production only runs semantic version images.
  • Promotion to production requires manual approval.

7. Immutable Promotion Model

Images are built once.

Promotion moves the same image through environments:

Commit → sha-abc123 → dev
Promote → 1.14.0 → staging
Promote → 1.14.0 → production

There is no rebuild between environments.

Production runs the exact image tested in staging.
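One way to retag without rebuilding is a registry-side copy. The sketch below uses a DRY_RUN guard and `crane cp`; the tool choice and registry path are assumptions, not the documented pipeline:

```shell
#!/bin/sh
# Illustrative: promote an existing SHA image by adding a semantic
# version tag in the registry. With DRY_RUN=1 (the default here) the
# command is only printed, not executed.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$@"; else "$@"; fi; }

promote_image() {
  registry="$1" app="$2" sha="$3" version="$4"
  # crane cp copies the image registry-to-registry; no rebuild occurs.
  run crane cp "${registry}/${app}:${sha}" "${registry}/${app}:${version}"
}

promote_image registry.example.com backend sha-f4e8a12 1.14.0
```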


8. Monorepo Deployment Model

The Smartsapp monorepo contains multiple deployable applications.

Each application:

  • Is registered separately in Devtron
  • Points to the same repository
  • Uses a path filter to trigger builds only when relevant files change

Backend — System API

Repo Path         backend/
Containerfile     backend/Containerfile
Build             Gradle 8.12 + JDK 21
Runtime           Eclipse Temurin 21 JRE
Deploy Target     Kubernetes
Health Endpoint   /actuator/health

Frontend — Admin Portal

Repo Path         ui-clients/frontend-web-apps/apps/admin-portal/
Containerfile     Multi-stage (Node → Nginx)
Deploy Target     Kubernetes
Health Endpoint   /

Mobile Applications

Repo Paths     ui-clients/mobile-hybrid-apps/apps/...
Output         APK / IPA
Distribution   App Stores

Mobile applications are built separately and are not deployed via Kubernetes.


9. Pipeline Structure

Bitbucket Pipelines runs on every push to main.

Pipeline Steps (main branch)

  1. Run Test Suite + Admin Portal Vitests + Admin Portal E2E Playwright Smoke (parallel) — the Gradle build and tests with Testcontainers (Postgres + Kafka) run alongside the admin-portal Vitest suite with its coverage delta gate and the Playwright @smoke browser tier (see docs/AI_CONTEXT.md > "CI gates" for the full gate model).
  2. Build & Push Images (parallel) — Backend, System Documentation, Admin Portal, Parent Portal, Coverage Dashboard.
  3. Update Infrastructure — kubectl apply for K8s manifests (probes, resources, config).
  4. Deploy (parallel) — Backend, System Documentation, Admin Portal, Parent Portal, Coverage Dashboard via kubectl set image.
  5. Post-deploy (parallel) — Smoke Test Backend (Newman/Postman), Deploy System Directory.
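The deploy step (4) can be sketched as follows (the Deployment and container names are illustrative; the DRY_RUN guard only prints the command instead of running it):

```shell
#!/bin/sh
# Illustrative: update a Deployment's container image in place, which
# triggers the rolling update described in Section 10.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$@"; else "$@"; fi; }

deploy() {
  app="$1" image="$2"
  run kubectl set image "deployment/${app}" "${app}=${image}"
}

deploy backend registry.example.com/backend:sha-f4e8a12
```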

Pipeline Steps (pull requests)

  1. Run Test Suite + Admin Portal Vitests + Admin Portal E2E Playwright Smoke (parallel) — same backend and frontend gates as main. PR cannot merge with a failing pipeline (requires "Successful builds" merge check in Bitbucket repo settings).

Smoke tests verify: health endpoints, OpenAPI docs, and key module APIs (school, canteen, sample). The collection lives at e2e/postman/smoke-tests.json.

Staging

  • Manual promotion.
  • Triggers automatic version creation.
  • Deploys semantic version image.

Production

  • Manual promotion.
  • Deploys same semantic version image.
  • No rebuild.

10. Deployment Strategy

The backend uses RollingUpdate with maxUnavailable: 0 and maxSurge: 1 for zero-downtime releases.
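In Deployment manifest terms, this corresponds to a strategy stanza like the following (a standard Kubernetes fragment restating the values above, not a copy of the actual manifest):

```yaml
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0   # never remove a serving pod before its replacement is ready
    maxSurge: 1         # allow one extra pod during the rollout
```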

How It Works

  1. New pod is created with the updated image.
  2. New pod must pass startup and readiness probes.
  3. Once ready, the old pod is terminated.
  4. Traffic shifts seamlessly — at least one healthy pod is always serving.

Health Probes

readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 10

livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 120
  periodSeconds: 15
  timeoutSeconds: 10
  failureThreshold: 5

startupProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 20
  periodSeconds: 10
  timeoutSeconds: 10
  failureThreshold: 30

  • Startup gives the JVM up to ~5 minutes to boot and settle Kafka consumers before liveness kicks in.
  • Liveness checks only livenessState — Kafka/Redis outages do not restart the pod.
  • Readiness checks only readinessState — gates traffic to healthy pods.

11. Rollback

If the green deployment fails (pods crash or never become ready), the Service selector remains on blue. No traffic is affected.

If an issue is detected after switchover:

  1. Revert the Service selector to the blue pods.
  2. Traffic returns to the previous version instantly.
  3. No rebuild or redeployment is needed.

Blue pods are retained for a configurable period after switchover to enable this fast rollback.

For older versions, use Devtron:

  1. Open the application in Devtron.
  2. Go to Deployment History.
  3. Select a previous version.
  4. Click Rollback.

12. Hotfix Protocol

Use this protocol only when production is down or critically degraded and the fix is small, well-understood, and time-sensitive.

When to use

  • Production outage or critical degradation
  • The fix is isolated (1-3 files) and the root cause is confirmed
  • Waiting for the full test suite (~10 min) is not acceptable

Who can authorize

A hotfix requires verbal or written approval from the tech lead or the on-call engineer before merging.

Steps

  1. Branch and fix. Create a short-lived branch from main with the fix.

  2. Commit with [hotfix] tag.

    fix(auth): patch session token crash [hotfix]
    

    Use --no-verify to skip the local pre-commit test hook:

    git commit --no-verify -m "fix(auth): patch session token crash [hotfix]"
    
  3. Merge to main. The CI pipeline detects [hotfix] in the commit message and skips tests — it only builds the JAR and container image.

  4. Verify on dev. Confirm the fix works on the dev environment.

  5. Fast-track promote. Promote through staging to production using the normal promotion flow.

  6. Mandatory follow-up (within 24 hours). Push a normal commit (without [hotfix]) that runs the full test suite against the hotfix code. This ensures test coverage is restored.
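The `[hotfix]` detection in step 3 might be sketched as follows (the function name and pipeline wiring are assumptions; only the marker convention comes from this document):

```shell
#!/bin/sh
# Illustrative: decide whether the CI run should skip the test stage
# based on the [hotfix] marker in the commit message.
is_hotfix() {
  case "$1" in
    *"[hotfix]"*) return 0 ;;
    *)            return 1 ;;
  esac
}

if is_hotfix "fix(auth): patch session token crash [hotfix]"; then
  echo "skip tests: build JAR and image only"
fi
```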

Audit trail

All hotfix commits are searchable:

git log --grep='\[hotfix\]'

Rollback

If the hotfix makes things worse, use the existing rollback procedure (Section 11) — the blue-green deployment model allows instant traffic reversion.


References