Context: Summer 2025 AWS Cloud Support Associate internship in Seattle (Oscar building). My cohort lived inside labs, mock tickets, and certification prep—no direct customer production support.
AI assist: ChatGPT + Amazon Q Business helped summarize service docs and draft troubleshooting checklists; I edited everything before submitting to mentors.
Status: Honest reflection so recruiters see exactly what I touched (and what remains on the “practice only” list).

Reality snapshot

  • 12-week program split between mornings (Cloud Practitioner/SAA coursework, architecture reviews) and afternoons (ticket simulations, hands-on labs, runbook writing).
  • I rotated through EC2, S3, IAM, networking, and observability labs; each lab ended with a quiz + short retro shared with a senior engineer.
  • Capstone was a media metadata pipeline built entirely in AWS sandbox accounts: S3 → Lambda (FFmpeg) → DynamoDB → API Gateway + static dashboard. No external users relied on it.
  • Tracked lab completion, quiz scores, ticket MTTR in simulations, and budget alerts; shipped weekly retros (“what broke / what I fixed / what I still don’t know”) to mentors.

Table of contents

  • Reality snapshot
  • Weekly structure
  • Troubleshooting drills I ran
  • Capstone: media metadata pipeline (lab-only)
  • Tooling & automations I leaned on
  • Proof & artifacts
  • Gaps & next steps
  • Interview stories I reuse

Weekly structure

  • Weeks 1–2 (Orientation, Cloud Practitioner refresh): daily lab reports, IAM policy walk-through, “how to escalate” checklist.
  • Weeks 3–4 (Linux + networking deep dive): troubleshoot EC2 boot loops, build VPC peering diagrams, script CloudWatch log exports.
  • Weeks 5–6 (Storage & security): S3 bucket policy labs, KMS envelope encryption exercises, Bedrock prompt-logging prototype.
  • Weeks 7–8 (Observability + automation): CloudWatch dashboard for a mock SaaS, Cost Explorer alarms, npm audit playbooks.
  • Weeks 9–10 (Capstone build): S3→Lambda→DynamoDB metadata pipeline, runbook, health checks, IaC template.
  • Week 11 (Support simulations): pager-style ticket drills, on-call shadowing, Amazon Leadership Principles reviews.
  • Week 12 (Presentations + retros): capstone demo, personal growth plan, peer feedback write-up.

Troubleshooting drills I ran

  • EC2 boot loops: Collected console output, diffed failed user-data scripts, rebuilt launch templates. Lesson: user-data must be idempotent, and CloudWatch agent config drifts silently.
  • VPC reachability: Reachability Analyzer + traceroute inside bastions; caught mismatched CIDRs and missing return routes during peering labs.
  • S3 “Access Denied” mazes: IAM Policy Simulator + CloudTrail to isolate the missing Principal/Condition in cross-account bucket policies; wrote a step-by-step playbook (policy-simulator sketch after this list).
  • Cost spikes: rapid-response checklist of Budgets alert → Cost Explorer by service → orphaned EBS + idle NAT → snapshot/terminate → tag everything.
  • CloudWatch log flooding: Temporary retention policy, metric filters for error bursts, and alarms to detect runaway debug logs.
  • Bedrock prompt logging: Prototyped logging wrapper to capture prompts/outputs for audit; documented privacy/legal considerations before any production use.
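
For the S3 drill above, a minimal sketch of the policy-simulator step (TypeScript, AWS SDK v3). The role ARN, bucket ARN, and action are placeholders, and the full decision tree lives in the repo playbook:

import { IAMClient, SimulatePrincipalPolicyCommand } from "@aws-sdk/client-iam";

const iam = new IAMClient({});

// Ask IAM how it evaluates the failing call before touching any policy.
// For cross-account cases, also pass the bucket policy via the ResourcePolicy parameter.
export async function whyDenied(roleArn: string, bucketArn: string): Promise<void> {
  const result = await iam.send(new SimulatePrincipalPolicyCommand({
    PolicySourceArn: roleArn,            // principal that saw "Access Denied"
    ActionNames: ["s3:GetObject"],
    ResourceArns: [`${bucketArn}/*`],
  }));

  for (const r of result.EvaluationResults ?? []) {
    // "implicitDeny" usually means a missing Allow; "explicitDeny" points at a Deny statement.
    console.log(r.EvalActionName, r.EvalDecision);
  }
}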

Capstone: media metadata pipeline (lab-only)

Architecture

  • Input: Files land in media-ingest-bucket.
  • Processing: A Node.js 20 Lambda pulls the object, shells out to FFmpeg to extract metadata, and pushes a compact JSON doc to DynamoDB (handler sketch after the template below).
  • API: API Gateway exposes read-only endpoints so a static dashboard (S3 + CloudFront) can query the table.
  • Observability: CloudWatch logs, metrics, and alarms track Lambda duration, DynamoDB throttle counts, and FFmpeg exits. Budgets/Cost Explorer alerts guard the lab account.
Resources:
  MediaBucket:
    Type: AWS::S3::Bucket
    Properties:
      NotificationConfiguration:
        LambdaConfigurations:
          - Event: s3:ObjectCreated:*
            Function: !GetAtt MetadataLambda.Arn
  MetadataLambda:
    Type: AWS::Lambda::Function
    Properties:
      Runtime: nodejs20.x
      Handler: index.handler
      Code:
        S3Bucket: !Ref ArtifactBucket
        S3Key: lambda.zip
      Environment:
        Variables:
          TABLE_NAME: !Ref MetadataTable
  MetadataTable:
    Type: AWS::DynamoDB::Table
    Properties:
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: FileKey
          AttributeType: S
      KeySchema:
        - AttributeName: FileKey
          KeyType: HASH
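
The template excerpt above leaves out the Lambda execution role and the S3→Lambda invoke permission (AWS::Lambda::Permission) that a deployable stack also needs. The handler itself looked roughly like this; a sketch rather than the exact lab code, and the /opt/bin/ffprobe path assumes the public FFmpeg layer unpacks its binaries there:

import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand } from "@aws-sdk/lib-dynamodb";
import { execFile } from "node:child_process";
import { promisify } from "node:util";
import { createWriteStream } from "node:fs";
import { pipeline } from "node:stream/promises";
import type { Readable } from "node:stream";
import type { S3Event } from "aws-lambda";

const run = promisify(execFile);
const s3 = new S3Client({});
const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

export const handler = async (event: S3Event): Promise<void> => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));
    const localPath = `/tmp/${key.split("/").pop()}`;

    // Pull the uploaded object into Lambda's /tmp scratch space.
    const obj = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
    await pipeline(obj.Body as Readable, createWriteStream(localPath));

    // Shell out to ffprobe (bundled with the FFmpeg layer) for container/stream metadata as JSON.
    const { stdout } = await run("/opt/bin/ffprobe", [
      "-v", "quiet", "-print_format", "json", "-show_format", "-show_streams", localPath,
    ]);
    const meta = JSON.parse(stdout);

    // Write a compact record keyed the same way as the DynamoDB table above.
    await ddb.send(new PutCommand({
      TableName: process.env.TABLE_NAME,
      Item: {
        FileKey: key,
        durationSec: Number(meta.format?.duration ?? 0),
        sizeBytes: Number(meta.format?.size ?? 0),
        codec: meta.streams?.[0]?.codec_name ?? "unknown",
        processedAt: new Date().toISOString(),
      },
    }));
  }
};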

What worked

  • Lambda stayed under 2 GB memory/30 s duration even when FFmpeg processed 250 MB sample files.
  • DynamoDB recorded ~300 sample rows with zero throttling thanks to on-demand mode.
  • CloudWatch dashboard (latency, invocations, FFmpeg exit codes, DynamoDB consumed RCUs) made it easy to talk through the design review.
  • A Step Functions “stretch goal” doc lists how I’d fan out enrichment jobs if this ever handled more than demo traffic.

What still needs work

  • Replace API key auth with Cognito + IAM authorizers (on the backlog).
  • Integration tests exist locally but CI/CD only runs lint + unit tests. Need to script sam validate, deploy to a staging stack, and capture screenshots automatically.
  • FFmpeg binary came from a public layer; I owe the team a security review and pinning strategy before recommending it anywhere else.
  • Add signed URLs and lifecycle rules on the ingest bucket so lab data ages out automatically (presigned-URL sketch after this list).
  • Improve cold-starts: explore container-based Lambda vs. slimmer FFmpeg layer; measure impact and document trade-offs.
  • Load test with varied media types and longer runs; current numbers are from small samples.
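
For the signed-URL and lifecycle items above, a minimal sketch of what that backlog work could look like (TypeScript, AWS SDK v3; bucket name, prefix, and expiry are assumptions, not the lab's settings):

import {
  S3Client,
  PutObjectCommand,
  PutBucketLifecycleConfigurationCommand,
} from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const s3 = new S3Client({});

// Hand the dashboard a short-lived upload URL instead of broad bucket permissions.
export async function getUploadUrl(key: string): Promise<string> {
  return getSignedUrl(
    s3,
    new PutObjectCommand({ Bucket: "media-ingest-bucket", Key: key }),
    { expiresIn: 900 }, // 15-minute window
  );
}

// Expire lab objects automatically so sandbox data ages out on its own.
export async function addLifecycleRule(): Promise<void> {
  await s3.send(new PutBucketLifecycleConfigurationCommand({
    Bucket: "media-ingest-bucket",
    LifecycleConfiguration: {
      Rules: [
        {
          ID: "expire-lab-media",
          Status: "Enabled",
          Filter: { Prefix: "" },        // whole bucket
          Expiration: { Days: 30 },      // assumed retention window
        },
      ],
    },
  }));
}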

Tooling & automations I leaned on

  • Docs-as-code: Every lab ended with a markdown runbook + diagram (Mermaid + Excalidraw). These live in notes/aws-internship/ and were reviewed by mentors weekly.
  • Cost visibility: Budgets (email + Slack) triggered at 10% and 25% of the sandbox allowance, mostly to prove I could wire alerts.
  • Security workflows: npm audit in CI (frontend + backend), an OWASP ZAP baseline workflow for the Render-deployed intern app, and Bedrock prompt-logging experiments with Amazon Q.
  • AI helpers: Amazon Q Business answered “where does this service log?” while ChatGPT helped translate dense docs into playbooks. Every AI-assisted snippet is annotated in the repo so it’s obvious what I edited.
  • Observability kits: Reusable CloudWatch dashboards for EC2/Lambda/RDS; alarm templates for errors, latency p90/p99, and throttles (example after this list).
  • Runbook template: Intro → Symptoms → Timeline → Logs/metrics links → Fix → Prevent → Open questions. Kept consistency across labs.
  • Retro cadence: Weekly self-review to mentors: “what worked, what broke, what I still don’t understand,” with ticket IDs and lab links.
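
One alarm from that template as a concrete example (TypeScript, AWS SDK v3); the function name, threshold, and SNS topic ARN are placeholders rather than the lab's real values:

import { CloudWatchClient, PutMetricAlarmCommand } from "@aws-sdk/client-cloudwatch";

const cw = new CloudWatchClient({});

// p99 Lambda duration alarm; the same shape covers errors and throttles by swapping MetricName.
export async function putP99DurationAlarm(functionName: string): Promise<void> {
  await cw.send(new PutMetricAlarmCommand({
    AlarmName: `${functionName}-duration-p99`,
    Namespace: "AWS/Lambda",
    MetricName: "Duration",
    Dimensions: [{ Name: "FunctionName", Value: functionName }],
    ExtendedStatistic: "p99",          // percentile stats use ExtendedStatistic, not Statistic
    Period: 300,
    EvaluationPeriods: 3,
    Threshold: 25000,                  // ms, safely under a 30 s timeout
    ComparisonOperator: "GreaterThanThreshold",
    TreatMissingData: "notBreaching",
    AlarmActions: ["arn:aws:sns:us-west-2:123456789012:lab-alerts"], // placeholder topic
  }));
}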

Proof & artifacts

  • Lab tracker: https://github.com/BradleyMatera/aws-internship-journal (private; screenshots/redacted excerpts available).
  • Dashboards: dashboards/cloudwatch-dashboard.json plus PNG exports in the repo.
  • Runbooks: See notes/runbooks/*.md—each links to the relevant log groups, budgets, or config files.
  • Capstone deck: PDF stored under presentations/capstone-media-pipeline.pdf with the architecture diagram, metrics, and TODO table.
  • Quizzes/assessments: Scores + notes per module; gaps highlighted (SCPs, advanced VPC patterns).
  • Ticket drills: Stored transcripts/timelines from mock pager escalations; anonymized for interview use.

Gaps & next steps

  • Earn Developer Associate and re-run the labs with IaC-first deployments.
  • Pair with a real AWS Support engineer on a shadow shift to see how customer tickets differ from our simulations.
  • Harden IAM knowledge (resource-level permissions, SCPs, org design) beyond what the internship covered.
  • Turn the FFmpeg Lambda into a public tutorial once I replace the binary layer and add full test coverage.
  • Build a “cost kill switch” pattern (Budgets → SNS → Lambda to tag/stop idle resources) to prove automated cleanup (sketch after this list).
  • Add synthetics against the capstone API (API Gateway → Lambda → DynamoDB) and publish results with dashboards.
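
The cost kill switch above could look roughly like this as a Budgets → SNS-triggered Lambda (a sketch under an assumed tag convention, not a tested implementation):

import type { SNSEvent } from "aws-lambda";
import { EC2Client, DescribeInstancesCommand, StopInstancesCommand } from "@aws-sdk/client-ec2";

const ec2 = new EC2Client({});

// Fires when the Budgets alert publishes to SNS: stop running instances tagged as lab-only.
export const handler = async (event: SNSEvent): Promise<void> => {
  console.log("Budget alert received:", event.Records[0]?.Sns.Subject);

  const described = await ec2.send(new DescribeInstancesCommand({
    Filters: [
      { Name: "tag:environment", Values: ["lab"] },           // assumed tag convention
      { Name: "instance-state-name", Values: ["running"] },
    ],
  }));

  const instanceIds = (described.Reservations ?? [])
    .flatMap((r) => r.Instances ?? [])
    .map((i) => i.InstanceId)
    .filter((id): id is string => Boolean(id));

  if (instanceIds.length > 0) {
    await ec2.send(new StopInstancesCommand({ InstanceIds: instanceIds }));
  }
};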

Interview stories I reuse

  • EC2 boot-loop fix: Found bad user-data and broken CloudWatch agent; rewrote idempotently, added alarms—shows calm debugging under pressure.
  • S3 Access Denied: Used Policy Simulator + CloudTrail to prove missing Principal condition; fixed cross-account access—demonstrates systematic troubleshooting.
  • Cost spike response: Identified idle NAT + orphaned EBS, tagged/terminated, set budgets/alerts—ownership + cost awareness.
  • Capstone cold-starts: Measured FFmpeg cold-start impact, proposed container-based Lambda + signed URLs—trade-off thinking and next steps.
