Context: These workflows power this Gatsby site (Netlify), Car-Match (GitHub Pages frontend + Render backend), CheeseMath (GitHub Pages), and AWS labs. No enterprise-scale systems—just my projects.
AI assist: ChatGPT helped me structure the write-up; the YAML snippets come straight from the repos.
Status: Everything described here runs today. Some AWS deploys still live in lab accounts, and I call that out below.

Reality snapshot

  • Static apps: Gatsby/React/Vite sites build via Yarn/Bun, lint/test, then deploy to Netlify or GitHub Pages.
  • APIs: Express apps build Docker images, push to GHCR/ECR, and deploy to Render or EKS labs.
  • Serverless: SAM/Serverless Framework packages functions, assumes AWS roles via OIDC, and deploys to sandbox accounts.
  • Observability: Slack webhooks for success/failure, /healthz smoke tests, version tags (vYYYY.MM.DD.N). No on-call, but I check logs daily.

Baseline workflow (static site example)

name: ci-static
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  build-test-deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write # for OIDC when targeting AWS
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npm run lint
      - run: npm run test --if-present
      - run: npm run build
      - name: Deploy to Netlify
        if: github.ref == 'refs/heads/main'
        uses: netlify/actions/cli@master
        with:
          args: deploy --dir=public --prod
        env:
          NETLIFY_AUTH_TOKEN: ${{ secrets.NETLIFY_AUTH_TOKEN }}
          NETLIFY_SITE_ID: ${{ secrets.NETLIFY_SITE_ID }}
  • Why it works: Immutable installs (npm ci), lint/tests enforced before builds, and deployments happen only on main. Pull requests stop after the build so reviewers see artifacts fast.

AWS deployments (labs + future hosting)

deploy-aws:
  needs: build-test-deploy
  if: github.ref == 'refs/heads/main'
  runs-on: ubuntu-latest
  permissions:
    id-token: write
    contents: read
  steps:
    - uses: actions/checkout@v4
    - uses: aws-actions/configure-aws-credentials@v4
      with:
        role-to-assume: arn:aws:iam::123456789012:role/github-actions
        role-session-name: github-actions
        aws-region: us-east-1
    - run: sam build
    - run: sam deploy --no-confirm-changeset --stack-name media-pipeline --region us-east-1
  • OIDC means I don’t store long-lived AWS keys in GitHub.
  • Stacks deploy to AWS Academy / sandbox accounts; README files state that clearly so nobody assumes it’s production.
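For reference, the trust relationship behind that role-to-assume looks roughly like the CloudFormation below. This is a sketch, not the exact resource from my stacks; the repo path is a placeholder:

GitHubActionsRole:
  Type: AWS::IAM::Role
  Properties:
    RoleName: github-actions
    AssumeRolePolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: Allow
          Action: sts:AssumeRoleWithWebIdentity
          Principal:
            # GitHub's OIDC provider, registered once per AWS account
            Federated: !Sub arn:aws:iam::${AWS::AccountId}:oidc-provider/token.actions.githubusercontent.com
          Condition:
            StringEquals:
              token.actions.githubusercontent.com:aud: sts.amazonaws.com
            StringLike:
              # placeholder repo; scoping to main keeps feature branches from assuming the role
              token.actions.githubusercontent.com:sub: repo:my-user/my-repo:ref:refs/heads/main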

Containers & Render deploys

  • Build + scan images (docker build, docker scan, Trivy).
  • Push to GitHub Container Registry.
  • Trigger Render via webhook (free tier for Car-Match backend). README warns about 5-minute cold starts.
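Stitched together, that pipeline is roughly the job below. The action versions and the RENDER_DEPLOY_HOOK secret name are my assumptions here; Render generates the hook URL (key included) in the service dashboard:

deploy-api:
  runs-on: ubuntu-latest
  permissions:
    contents: read
    packages: write # needed to push to GHCR
  steps:
    - uses: actions/checkout@v4
    - uses: docker/login-action@v3
      with:
        registry: ghcr.io
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}
    # GHCR image names must be lowercase; github.repository works if the repo name already is
    - run: docker build -t ghcr.io/${{ github.repository }}:${{ github.sha }} .
    - name: Scan image
      uses: aquasecurity/trivy-action@master
      with:
        image-ref: ghcr.io/${{ github.repository }}:${{ github.sha }}
        severity: CRITICAL,HIGH
        exit-code: "1" # fail the job on findings instead of just logging them
    - run: docker push ghcr.io/${{ github.repository }}:${{ github.sha }}
    - name: Trigger Render deploy
      if: github.ref == 'refs/heads/main'
      run: curl -fsS -X POST "${{ secrets.RENDER_DEPLOY_HOOK }}"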

Observability & rollback habits

  • Smoke tests (scripts/smoke.sh, sketched below) hit /healthz plus a basic user flow after each deploy. Failures trigger scripts/rollback.sh to redeploy the previous artifact.
  • Slack notifications (success/failure) include commit SHA, author, and deploy URL.
  • Tags like v2025.10.15.1 map to Netlify deploy IDs and Render releases, making it easy to diff.
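For context, scripts/smoke.sh is short. A minimal sketch, assuming the workflow passes in the deploy URL and the site serves /healthz and version.txt:

#!/usr/bin/env bash
# scripts/smoke.sh (sketch) -- post-deploy health check
set -euo pipefail

BASE_URL="${1:?usage: smoke.sh <base-url>}"

# /healthz must answer 200 within a few retries (Render may still be waking up)
for attempt in 1 2 3 4 5; do
  if curl -fsS --max-time 10 "${BASE_URL}/healthz" > /dev/null; then
    break
  fi
  [ "$attempt" -eq 5 ] && { echo "healthz never came up" >&2; exit 1; }
  sleep 15
done

# print the deployed version so the Slack message can include it
curl -fsS "${BASE_URL}/version.txt"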

Governance & cost guardrails

  • Branch protections: lint/test/build must pass before merging.
  • Secrets live in GitHub environments (dev/stage/prod) with required reviewers.
  • Nightly npm audit + pip-audit workflows log findings; high-severity issues block merging until resolved (trimmed example after this list).
  • Cost dashboards (Netlify, Render, Mongo Atlas, AWS Budgets) alert via email when spend crosses thresholds—even if it’s only a few dollars.
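The nightly audit job is a small scheduled workflow. A trimmed sketch (the cron time and audit level are my own choices):

name: nightly-audit
on:
  schedule:
    - cron: "0 6 * * *" # daily at 06:00 UTC
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      # exits non-zero on high/critical findings, so the run shows up red
      - run: npm audit --audit-level=high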

Real-world incidents and fixes

  • Broken Netlify build (missing sharp deps): Added NETLIFY_USE_YARN + NODE_OPTIONS=--max_old_space_size=4096 and documented a manual retry script.
  • Render cold starts causing 500s: Added a synthetic warm-up curl in a scheduled workflow (sketch below) and surfaced a “server is waking up” banner on the frontend to stay honest.
  • Stale cache on GitHub Pages: A rogue service worker served old assets. I added a “clear SW” checklist to deploy notes and a version.txt endpoint to verify freshness.
  • ECR permission error: OIDC role was missing ecr:GetAuthorizationToken; fixed the policy and wrote a README section on least privilege.
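The warm-up job from the cold-start fix is about as small as a workflow gets. A sketch, assuming the backend URL lives in a repository variable:

name: warm-render
on:
  schedule:
    - cron: "*/10 * * * *" # every 10 minutes; free-tier Render sleeps after inactivity
jobs:
  ping:
    runs-on: ubuntu-latest
    steps:
      - run: curl -fsS --max-time 30 "${{ vars.BACKEND_URL }}/healthz"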

Security checks baked into CI

  • Dependency scanning with npm audit/pip-audit; Trivy for images.
  • Secrets scanning via gitleaks on PRs (workflow sketch after this list).
  • OIDC everywhere possible—no long-lived AWS keys.
  • Deployment steps bail on unsigned commits or dirty trees in release branches.
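The gitleaks check is a one-step job on pull requests. A sketch using the official action:

name: secret-scan
on: pull_request
jobs:
  gitleaks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # scan full history, not just the merge commit
      - uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}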

Release cadence and communication

  • PR stage: Build + lint + unit tests. A PR comment shows artifact size and a Lighthouse summary when applicable.
  • Main merges: Auto-deploy to preview/stage. Slack gets the URL plus /healthz result.
  • Production: Manual approval with a short checklist (migration status, smoke tests passed, roll-forward plan in place); the environment gate is shown after this list.
  • Post-release: If metrics degrade or smoke fails, run rollback script and post a short retro in the repo.
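The manual approval gate is just a GitHub environment with required reviewers; the deploy job opts in with an environment key. A sketch, assuming an environment named production is configured:

deploy-prod:
  runs-on: ubuntu-latest
  # the job pauses here until a required reviewer approves
  environment:
    name: production
    url: https://example.com # surfaces on the deployment in the GitHub UI
  steps:
    - run: echo "production deploy steps run only after approval"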

Playbooks I keep on hand

  • “Deploy is broken” flow: Check runner logs → rerun with debug → verify secrets/env → isolate failing step.
  • “AWS creds failing” flow: Confirm OIDC role trust policy, session duration, and GitHub environment.
  • “Static assets stale” flow: Purge CDN, unregister service worker, check version.txt, then redeploy if needed.

How I’d scale this for a team

  • Split workflows by concern (lint/test/build vs deploy) with required approvals.
  • Add ephemeral preview environments per PR (Netlify deploy previews + seeded test data).
  • Introduce policy-as-code (Checkov or OPA) for Terraform/CloudFormation before apply (sketch after this list).
  • Rotate secrets automatically and surface drift via nightly checks.
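Policy-as-code can start as a single CI step that runs before any apply. A minimal Checkov sketch, assuming templates live under infra/:

policy-check:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - run: pip install checkov
    # scans Terraform/CloudFormation under infra/ and fails on policy violations
    - run: checkov -d infra/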

To-do list

  • Add Playwright-based synthetic tests that run automatically after deploys.
  • Publish my workflow templates so classmates can fork them instead of copying snippets.
  • Experiment with Checkly/Upptime for external monitoring.
  • Migrate more workloads to the AWS adapter (S3/CloudFront) once I clean up the sharp install process.
