Context: These workflows power this Gatsby site (Netlify), Car-Match (GitHub Pages frontend + Render backend), CheeseMath (GitHub Pages), and AWS labs. No enterprise-scale systems—just my projects.
AI assist: ChatGPT helped me structure the write-up; the YAML snippets come straight from the repos.
Status: Everything described here runs today. Some AWS deploys still live in lab accounts, and I call that out below.

Reality snapshot

  • Static apps: Gatsby/React/Vite sites build via Yarn/Bun, lint/test, then deploy to Netlify or GitHub Pages.
  • APIs: Express apps build Docker images, push to GHCR/ECR, and deploy to Render or EKS labs.
  • Serverless: SAM/Serverless Framework packages functions, assumes AWS roles via OIDC, and deploys to sandbox accounts.
  • Observability: Slack webhooks for success/failure, /healthz smoke tests, version tags (vYYYY.MM.DD.N). No on-call, but I check logs daily.

Baseline workflow (static site example)

name: ci-static
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  build-test-deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write # for OIDC when targeting AWS
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npm run lint
      - run: npm run test --if-present
      - run: npm run build
      - name: Deploy to Netlify
        if: github.ref == 'refs/heads/main'
        uses: netlify/actions/cli@master
        with:
          args: deploy --dir=public --prod
        env:
          NETLIFY_AUTH_TOKEN: ${{ secrets.NETLIFY_AUTH_TOKEN }}
          NETLIFY_SITE_ID: ${{ secrets.NETLIFY_SITE_ID }}
  • Why it works: Immutable installs (npm ci), lint/tests enforced before builds, and deployments happen only on main. Pull requests stop after the build so reviewers see artifacts fast.

AWS deployments (labs + future hosting)

deploy-aws:
  needs: build-test-deploy
  if: github.ref == 'refs/heads/main'
  runs-on: ubuntu-latest
  permissions:
    id-token: write
    contents: read
  steps:
    - uses: actions/checkout@v4
    - uses: aws-actions/configure-aws-credentials@v4
      with:
        role-to-assume: arn:aws:iam::123456789012:role/github-actions
        role-session-name: github-actions
        aws-region: us-east-1
    - run: sam build
    - run: sam deploy --no-confirm-changeset --stack-name media-pipeline --region us-east-1
  • OIDC means I don’t store long-lived AWS keys in GitHub.
  • Stacks deploy to AWS Academy / sandbox accounts; README files state that clearly so nobody assumes it’s production.
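For reference, the trust relationship behind that role-to-assume looks roughly like the CloudFormation below. This is a sketch, not the exact resource from my stacks; the repo path is a placeholder:

GitHubActionsRole:
  Type: AWS::IAM::Role
  Properties:
    RoleName: github-actions
    AssumeRolePolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: Allow
          Action: sts:AssumeRoleWithWebIdentity
          Principal:
            # GitHub's OIDC provider, registered once per AWS account
            Federated: !Sub arn:aws:iam::${AWS::AccountId}:oidc-provider/token.actions.githubusercontent.com
          Condition:
            StringEquals:
              token.actions.githubusercontent.com:aud: sts.amazonaws.com
            StringLike:
              # placeholder repo; scoping to main keeps feature branches from assuming the role
              token.actions.githubusercontent.com:sub: repo:my-user/my-repo:ref:refs/heads/main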

Containers & Render deploys

  • Build + scan images (docker build, docker scan, Trivy).
  • Push to GitHub Container Registry.
  • Trigger Render via webhook (free tier for Car-Match backend). README warns about 5-minute cold starts.
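Stitched together, that pipeline is roughly the job below. The action versions and the RENDER_DEPLOY_HOOK secret name are my assumptions here; Render generates the hook URL (key included) in the service dashboard:

deploy-api:
  runs-on: ubuntu-latest
  permissions:
    contents: read
    packages: write # needed to push to GHCR
  steps:
    - uses: actions/checkout@v4
    - uses: docker/login-action@v3
      with:
        registry: ghcr.io
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}
    # GHCR image names must be lowercase; github.repository works if the repo name already is
    - run: docker build -t ghcr.io/${{ github.repository }}:${{ github.sha }} .
    - name: Scan image
      uses: aquasecurity/trivy-action@master
      with:
        image-ref: ghcr.io/${{ github.repository }}:${{ github.sha }}
        severity: CRITICAL,HIGH
        exit-code: "1" # fail the job on findings instead of just logging them
    - run: docker push ghcr.io/${{ github.repository }}:${{ github.sha }}
    - name: Trigger Render deploy
      if: github.ref == 'refs/heads/main'
      run: curl -fsS -X POST "${{ secrets.RENDER_DEPLOY_HOOK }}"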

Observability & rollback habits

  • Smoke tests (scripts/smoke.sh, sketched below) hit /healthz plus a basic user flow after each deploy. Failures trigger scripts/rollback.sh to redeploy the previous artifact.
  • Slack notifications (success/failure) include commit SHA, author, and deploy URL.
  • Tags like v2025.10.15.1 map to Netlify deploy IDs and Render releases, making it easy to diff.
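For context, scripts/smoke.sh is short. A minimal sketch, assuming the workflow passes in the deploy URL and the site serves /healthz and version.txt:

#!/usr/bin/env bash
# scripts/smoke.sh (sketch) -- post-deploy health check
set -euo pipefail

BASE_URL="${1:?usage: smoke.sh <base-url>}"

# /healthz must answer 200 within a few retries (Render may still be waking up)
for attempt in 1 2 3 4 5; do
  if curl -fsS --max-time 10 "${BASE_URL}/healthz" > /dev/null; then
    break
  fi
  [ "$attempt" -eq 5 ] && { echo "healthz never came up" >&2; exit 1; }
  sleep 15
done

# print the deployed version so the Slack message can include it
curl -fsS "${BASE_URL}/version.txt"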

Governance & cost guardrails

  • Branch protections: lint/test/build must pass before merging.
  • Secrets live in GitHub environments (dev/stage/prod) with required reviewers.
  • Nightly npm audit + pip-audit workflows log findings; high-severity issues block merging until resolved (trimmed example after this list).
  • Cost dashboards (Netlify, Render, Mongo Atlas, AWS Budgets) alert via email when spend crosses thresholds—even if it’s only a few dollars.
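The nightly audit job is a small scheduled workflow. A trimmed sketch (the cron time and audit level are my own choices):

name: nightly-audit
on:
  schedule:
    - cron: "0 6 * * *" # daily at 06:00 UTC
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      # exits non-zero on high/critical findings, so the run shows up red
      - run: npm audit --audit-level=high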

Real-world incidents and fixes

  • Broken Netlify build (missing sharp deps): Added NETLIFY_USE_YARN + NODE_OPTIONS=--max_old_space_size=4096 and documented a manual retry script.
  • Render cold starts causing 500s: Added a synthetic warm-up curl in a scheduled workflow (sketch below) and surfaced a “server is waking up” banner on the frontend to stay honest.
  • Stale cache on GitHub Pages: A rogue service worker served old assets. I added a “clear SW” checklist to deploy notes and a version.txt endpoint to verify freshness.
  • ECR permission error: OIDC role was missing ecr:GetAuthorizationToken; fixed the policy and wrote a README section on least privilege.
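The warm-up job from the cold-start fix is about as small as a workflow gets. A sketch, assuming the backend URL lives in a repository variable:

name: warm-render
on:
  schedule:
    - cron: "*/10 * * * *" # every 10 minutes; free-tier Render sleeps after inactivity
jobs:
  ping:
    runs-on: ubuntu-latest
    steps:
      - run: curl -fsS --max-time 30 "${{ vars.BACKEND_URL }}/healthz"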

Security checks baked into CI

  • Dependency scanning with npm audit/pip-audit; Trivy for images.
  • Secrets scanning via gitleaks on PRs (workflow sketch after this list).
  • OIDC everywhere possible—no long-lived AWS keys.
  • Deployment steps bail on unsigned commits or dirty trees in release branches.
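The gitleaks check is a one-step job on pull requests. A sketch using the official action:

name: secret-scan
on: pull_request
jobs:
  gitleaks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # scan full history, not just the merge commit
      - uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}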

Release cadence and communication

  • PR stage: Build + lint + unit tests. A PR comment shows artifact size and a Lighthouse summary when applicable.
  • Main merges: Auto-deploy to preview/stage. Slack gets the URL plus /healthz result.
  • Production: Manual approval with a short checklist (migration status, smoke tests passed, roll-forward plan in place); the environment gate is shown after this list.
  • Post-release: If metrics degrade or smoke fails, run rollback script and post a short retro in the repo.
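The manual approval gate is just a GitHub environment with required reviewers; the deploy job opts in with an environment key. A sketch, assuming an environment named production is configured:

deploy-prod:
  runs-on: ubuntu-latest
  # the job pauses here until a required reviewer approves
  environment:
    name: production
    url: https://example.com # surfaces on the deployment in the GitHub UI
  steps:
    - run: echo "production deploy steps run only after approval"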

Playbooks I keep on hand

  • “Deploy is broken” flow: Check runner logs → rerun with debug → verify secrets/env → isolate failing step.
  • “AWS creds failing” flow: Confirm OIDC role trust policy, session duration, and GitHub environment.
  • “Static assets stale” flow: Purge CDN, unregister service worker, check version.txt, then redeploy if needed.

How I’d scale this for a team

  • Split workflows by concern (lint/test/build vs deploy) with required approvals.
  • Add ephemeral preview environments per PR (Netlify deploy previews + seeded test data).
  • Introduce policy-as-code (Checkov or OPA) for Terraform/CloudFormation before apply (sketch after this list).
  • Rotate secrets automatically and surface drift via nightly checks.
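Policy-as-code can start as a single CI step that runs before any apply. A minimal Checkov sketch, assuming templates live under infra/:

policy-check:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - run: pip install checkov
    # scans Terraform/CloudFormation under infra/ and fails on policy violations
    - run: checkov -d infra/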

To-do list

  • Add Playwright-based synthetic tests that run automatically after deploys.
  • Publish my workflow templates so classmates can fork them instead of copying snippets.
  • Experiment with Checkly/Upptime for external monitoring.
  • Migrate more workloads to the AWS adapter (S3/CloudFront) once I clean up the sharp install process.
