How I Built This
A technical deep-dive into the infrastructure and CI/CD pipeline powering this portfolio
Architecture Overview
Cost-Optimized Design: A single t4g.medium EC2 instance (ARM Graviton, 4 GB RAM, ~$24/mo) runs k3s instead of managed EKS. An Elastic IP, a Route53 zone, the S3 state backend, monthly EBS snapshots, and Packer-baked AMI storage bring the total to ~$31/mo, well under the $40/mo budget alert and ~60% cheaper than EKS at this scale.
Technology Stack
Next.js
React framework for the website
Docker
Multi-stage build → distroless runtime, multi-arch (amd64 + arm64)
k3s
Lightweight Kubernetes on a single EC2
Envoy Gateway
Gateway API implementation handling TLS termination and routing
cert-manager
Let's Encrypt certificates via the gatewayHTTPRoute HTTP-01 solver
ExternalDNS
Writes Route53 records from HTTPRoute hostnames and Gateway annotations
ArgoCD
App-of-Apps continuous deployment from Git
Terraform
AWS infrastructure with S3 state + native locking
Packer
Pre-bakes k3s + helm into a versioned AMI; ~60s cold-start
GitHub Actions
OIDC-authenticated builds; multi-arch image + AMI pipelines
AWS SSM
Session Manager: no SSH, no inbound 22/6443
AWS Graviton
t4g.medium (ARM) — ~20% cheaper than equivalent x86
AWS DLM
Native EBS snapshot lifecycle — monthly × 3 retention
Route53
Public zone; Squarespace delegates via NS records
Infrastructure as Code
The entire AWS infrastructure is defined in Terraform, organized into reusable modules:
terraform/
├── tf-modules/
│   ├── aws-vpc/              # VPC, public subnet, IGW, route table
│   ├── aws-k3s/              # EC2 t4g.medium, EIP, SG, IAM role w/ SSM,
│   │                         # user_data.sh runtime bootstrap
│   └── aws-dns/              # Route53 public hosted zone for fuhriman.org
├── packer/
│   ├── k3s-portfolio.pkr.hcl # AL2023 arm64 + k3s + helm + ssm-agent
│   └── scripts/              # Provisioner scripts
├── .github/workflows/
│   └── build-ami.yml         # OIDC-auth Packer builds + 3-AMI retention
├── docs/plans/               # Architecture design + manual-steps docs
├── main.tf                   # Module composition + IAM policies
├── backend.tf                # S3 + native use_lockfile (no DynamoDB)
├── budget.tf                 # $40/mo AWS budget alert
├── dlm.tf                    # Monthly EBS snapshots × 3 retention
├── oidc.tf                   # GitHub Actions OIDC trust + Packer IAM role
├── providers.tf              # AWS provider ~> 6.31 with default_tags
└── variables.tf              # Configuration with validation blocks
VPC Module
Simple VPC (10.0.0.0/16) with a single public subnet in one AZ. No NAT Gateway needed — everything runs in the public subnet.
k3s Module
Single t4g.medium (4 GB, ARM Graviton) launched from a Packer-baked AMI with k3s, helm, and the SSM Agent pre-installed. The runtime user_data.sh is only ~55 lines: fetch the public IP from IMDSv2, wire k3s's --tls-san, install ArgoCD, hand off to App-of-Apps. Cold-start to argocd-server Running: ~60 seconds.
DNS Module
A single Route53 public hosted zone for fuhriman.org. Squarespace is the registrar only — NS records delegate to Route53. ExternalDNS in-cluster manages records automatically based on HTTPRoute hostnames.
Zero-Trust Admin Access
There's no SSH server reachable from the internet. There's no public kube-apiserver. Admin happens entirely through AWS Systems Manager Session Manager.
Security Group
Inbound: only 80 and 443. No 22 (SSH), no 6443 (k8s API), no NodePort 30443. The EC2 instance has no aws_key_pair resource at all.
Interactive Shell
aws ssm start-session --target $INSTANCE_ID. IAM-authenticated, CloudTrail-audited. The EC2 instance role has AmazonSSMManagedInstanceCore attached.
kubectl via SSM Tunnel
aws ssm start-session --document-name AWS-StartPortForwardingSession forwards localhost:6443 over SSM to the API server on the instance. Local kubectl then runs against a kubeconfig that points at localhost. No public k8s API needed.
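The kubeconfig side of this tunnel might look like the sketch below, assuming the SSM port forward is already listening on local port 6443. The cluster, user, and context names are hypothetical, and the credential fields are placeholders to be copied from the kubeconfig that k3s generates on the node.

```yaml
# Hypothetical kubeconfig for use while the SSM port forward is active.
apiVersion: v1
kind: Config
clusters:
  - name: k3s-portfolio            # hypothetical name
    cluster:
      # The tunnel exposes the API server on the local machine only.
      server: https://127.0.0.1:6443
      certificate-authority-data: <base64 CA bundle from the node>
users:
  - name: k3s-admin                # hypothetical name
    user:
      client-certificate-data: <base64 client cert from the node>
      client-key-data: <base64 client key from the node>
contexts:
  - name: k3s-portfolio
    context:
      cluster: k3s-portfolio
      user: k3s-admin
current-context: k3s-portfolio
```

The kubeconfig that k3s writes on the node also targets https://127.0.0.1:6443, so a copy of that file should work largely unchanged once the tunnel is up.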
IMDSv2 Enforced
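Instance metadata requires http_tokens=required with http_put_response_hop_limit=2. Requiring a session token blunts SSRF-style attacks, which typically cannot issue the PUT request that IMDSv2 uses to mint a token, so a compromised pod cannot be used as a simple proxy to IMDS.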
Routing with Gateway API
The cluster doesn't use Ingress resources at all. Routing is handled by the Kubernetes Gateway API (GA since its v1.0 release), implemented by Envoy Gateway. ExternalDNS reads HTTPRoute resources and publishes Route53 records automatically; cert-manager issues Let's Encrypt certs via the gatewayHTTPRoute HTTP-01 solver.
GatewayClass + Gateway
Single GatewayClass named envoy, controlled by Envoy Gateway. One shared Gateway named public in envoy-gateway-system with HTTP :80 and HTTPS :443 listeners that terminate TLS using a multi-SAN cert.
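As a rough illustration, the GatewayClass and shared Gateway described here could be declared as follows. The resource names envoy and public and the namespace come from the text; the listener names, the TLS Secret name, and the allowedRoutes policy are assumptions made for the sketch.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: envoy
spec:
  # Controller name published by Envoy Gateway.
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: public
  namespace: envoy-gateway-system
spec:
  gatewayClassName: envoy
  listeners:
    - name: http                     # hypothetical listener name
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All                  # assumption: routes live in other namespaces
    - name: https                    # hypothetical listener name
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: fuhriman-org-tls   # hypothetical Secret name
      allowedRoutes:
        namespaces:
          from: All
```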
HTTPRoute per Service
The website chart declares fuhriman.org + www.fuhriman.org as HTTPRoute hostnames attaching to the public Gateway. The ArgoCD chart adds argocd.fuhriman.org the same way. ExternalDNS picks both up.
Multi-SAN Let's Encrypt Cert
One Certificate resource covers all three hostnames. cert-manager issues via HTTP-01, creating a temporary HTTPRoute through the public Gateway for the ACME challenge. Auto-renews 30 days before expiry. The chain runs through Let's Encrypt's R13 intermediate.
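A minimal sketch of the issuer and certificate pair this describes, assuming a ClusterIssuer is used and that the Certificate lives in the Gateway's namespace. The issuer name, account-key Secret, Certificate name, and TLS Secret name are all hypothetical; the hostnames, the Gateway reference, and the 30-day renewal window come from the text.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod             # hypothetical name
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-account-key  # hypothetical name
    solvers:
      - http01:
          gatewayHTTPRoute:
            # cert-manager attaches its temporary challenge route here.
            parentRefs:
              - name: public
                namespace: envoy-gateway-system
                kind: Gateway
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: fuhriman-org                 # hypothetical name
  namespace: envoy-gateway-system
spec:
  secretName: fuhriman-org-tls       # hypothetical; must match the HTTPS listener's certificateRefs
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-prod
  dnsNames:
    - fuhriman.org
    - www.fuhriman.org
    - argocd.fuhriman.org
  renewBefore: 720h                  # 30 days before expiry
```

Gateway API support has to be enabled in cert-manager for the gatewayHTTPRoute solver to take effect; how that is switched on depends on the cert-manager version in use.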
klipper-lb + EIP Override
k3s ships klipper-lb as its default Service LoadBalancer, which advertises the node's private IP. That is not the address ExternalDNS should publish to Route53. The fix is one annotation on the Gateway: external-dns.alpha.kubernetes.io/target: 52.37.95.130 (the Elastic IP). One line, no extra LoadBalancer controller.
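In manifest form, the override is a single annotation in the Gateway's metadata. The fragment below is a sketch that omits the listeners; the annotation key and IP come from the text above.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: public
  namespace: envoy-gateway-system
  annotations:
    # Publish the Elastic IP instead of the address klipper-lb reports.
    external-dns.alpha.kubernetes.io/target: 52.37.95.130
spec:
  gatewayClassName: envoy
  # listeners omitted from this fragment
```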
GitOps with ArgoCD
ArgoCD implements the GitOps pattern where Git is the single source of truth for the desired cluster state.
App of Apps Pattern
A parent Application bootstrapped by user_data.sh manages four child Applications: cert-manager, envoy-gateway, external-dns, fuhriman-website.
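A parent Application for this pattern could look roughly like the sketch below. The Application name, the repository URL, and the apps directory holding the child Application manifests are assumptions, not details confirmed by the setup described here.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: app-of-apps                  # hypothetical name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/furryman/eks-helm-charts   # assumed repository
    targetRevision: main
    path: apps                       # hypothetical directory of child Application manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd                # child Applications are created in the argocd namespace
  syncPolicy:
    automated: {}                    # sync whenever the directory changes in Git
```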
Sync Waves
cert-manager (-2) installs first (it owns the cert CRDs). envoy-gateway (-1) follows. external-dns + fuhriman-website (0) deploy together. Wave numbers guarantee dependency order.
Auto-Sync & Self-Heal
ArgoCD automatically applies Git changes and reverts any manual cluster modifications back to the declared state.
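Sync waves and self-heal are both set on the child Application itself. The sketch below shows cert-manager in wave -2 with automated sync, assuming it is installed from the upstream Jetstack Helm repository; the chart version is a placeholder and the CreateNamespace option is an assumption.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cert-manager
  namespace: argocd
  annotations:
    # Lower waves sync first; the value must be a quoted string.
    argocd.argoproj.io/sync-wave: "-2"
spec:
  project: default
  source:
    repoURL: https://charts.jetstack.io   # assumed chart source
    chart: cert-manager
    targetRevision: <chart version>       # placeholder; pin a real version
  destination:
    server: https://kubernetes.default.svc
    namespace: cert-manager
  syncPolicy:
    automated:
      prune: true      # delete resources that were removed from Git
      selfHeal: true   # revert manual changes made in the cluster
    syncOptions:
      - CreateNamespace=true
```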
CI/CD Pipeline
Every push to main triggers a fully automated build and deployment pipeline:
Quality Gates
Six jobs run in parallel: Biome 2 (lint + format), TypeScript, Vitest with a 95% coverage gate, Next.js build, Playwright smoke, and Lighthouse CI. All must pass.
Build & Push (Multi-Arch)
A multi-stage Docker build, using QEMU emulation, produces a multi-arch image (linux/amd64 + linux/arm64) on a distroless runtime. The image is pushed to Docker Hub with a timestamp tag (ga-YYYY.MM.DD-HHMM) and latest.
Scan
Trivy v0.69.3 (SHA-pinned) scans the pushed image for CRITICAL and HIGH CVEs with ignore-unfixed enabled. The pipeline fails if any fixable vulnerabilities surface.
Update
yq updates fuhriman-chart/values.yaml in eks-helm-charts with the new image tag; ArgoCD detects the commit and syncs the change to the k3s cluster.
name: Build and Deploy
on:
  push:
    branches: [main]
permissions:
  contents: read  # Least-privilege security
jobs:
  # Six parallel quality gates; all must pass before docker runs
  lint:        # biome check (lint + format)
  typecheck:   # tsc --noEmit
  test:        # vitest run --coverage (95/95/95/95 gate)
  build:       # next build (standalone output)
  e2e:         # playwright against built standalone
  lighthouse:  # perf >= 90, a11y >= 95, BP >= 95, SEO >= 95
  docker:
    needs: [lint, typecheck, test, build, e2e, lighthouse]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@11bd7190...  # SHA-pinned
      - uses: pnpm/action-setup@a7487c7e...  # corepack-pinned pnpm
      - id: tag  # exposes steps.tag.outputs.tag to later steps
        run: echo "tag=ga-$(date +'%Y.%m.%d-%H%M')" >> "$GITHUB_OUTPUT"
      - uses: docker/login-action@650006c6...
      - uses: docker/setup-qemu-action@49b3bc8e...  # arm64 emulation
      - uses: docker/setup-buildx-action@d7f5e7f5...
      - uses: docker/build-push-action@f9f3042f...
        with:
          push: true
          platforms: linux/amd64,linux/arm64  # Multi-arch for Graviton
          tags: |
            furryman/fuhriman-website:${{ steps.tag.outputs.tag }}
            furryman/fuhriman-website:latest
      # Trivy v0.69.3 (binary pinned; addresses GHSA-69fq-xp46-6x23)
      - uses: aquasecurity/trivy-action@a9c7b0f0...  # SHA-pinned
        with:
          image-ref: furryman/fuhriman-website:${{ steps.tag.outputs.tag }}
          severity: CRITICAL,HIGH
          ignore-unfixed: true
          exit-code: 1
  deploy:
    needs: docker
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@11bd7190...
        with:
          repository: furryman/eks-helm-charts
          token: ${{ secrets.GH_PAT }}  # repo scope; fires downstream workflows
      - run: yq -i '.image.tag = "..."' fuhriman-chart/values.yaml
      - run: git commit -am "Bump image" && git push  # ArgoCD picks this up
Immutable Infrastructure with Packer
The EC2 instance launches from a custom AMI built by Packer. k3s, helm, the SSM Agent, and Helm repo caches are pre-baked. user_data.sh only does what depends on the running instance's identity.
~60-Second Cold-Start
Because the heavy installs are already in the AMI, first boot is purely runtime-specific: fetch the public IP via IMDSv2 for k3s's --tls-san, install ArgoCD, hand off to App-of-Apps. Instance launch to argocd-server Running: ~60 seconds. Full convergence with all certs issued: ~3 minutes.
OIDC, Not Long-Lived Keys
The build-ami.yml workflow assumes an IAM role via GitHub OIDC. github-actions-packer has a trust policy scoped to repo:furryman/terraform:*. No AWS access keys live in GitHub Secrets.
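The OIDC handshake in a workflow like build-ami.yml comes down to an id-token permission plus a role-assumption step. The sketch below assumes a manual trigger, a placeholder AWS account ID, and an assumed region; it uses version tags for readability where the real workflow would pin commit SHAs.

```yaml
name: Build AMI
on:
  workflow_dispatch:              # assumed trigger
permissions:
  id-token: write                 # lets the job request a GitHub OIDC token
  contents: read
jobs:
  packer:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          # Placeholder account ID; the role name comes from the text above.
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-packer
          aws-region: us-west-2   # assumed region
      - uses: hashicorp/setup-packer@main
      - run: packer init packer/k3s-portfolio.pkr.hcl
      - run: packer build packer/k3s-portfolio.pkr.hcl
```

The credentials step exchanges the OIDC token for short-lived AWS credentials, which is why no access keys need to be stored as repository secrets.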
3-AMI Retention
The workflow's last step lists Packer-tagged AMIs and deregisters everything beyond the 3 most recent (snapshot cleanup included). Storage stays bounded at ~$0.30/mo for AMI snapshots, dedup'd against existing DLM snapshots.
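One way to express this retention step is a workflow step that sorts AMIs by creation date and prunes all but the newest three. This is a sketch only: the BuiltBy=packer tag is a hypothetical stand-in for whatever tag the Packer template actually applies.

```yaml
- name: Prune old AMIs
  shell: bash
  run: |
    set -euo pipefail
    # All Packer-tagged AMIs except the 3 newest, oldest first.
    old_amis=$(aws ec2 describe-images --owners self \
      --filters "Name=tag:BuiltBy,Values=packer" \
      --query 'sort_by(Images, &CreationDate)[:-3].ImageId' \
      --output text)
    for ami in $old_amis; do
      # Record the backing snapshots before the AMI record disappears.
      snaps=$(aws ec2 describe-images --image-ids "$ami" \
        --query 'Images[0].BlockDeviceMappings[].Ebs.SnapshotId' \
        --output text)
      aws ec2 deregister-image --image-id "$ami"
      for snap in $snaps; do
        aws ec2 delete-snapshot --snapshot-id "$snap"
      done
    done
```

Looking up the snapshot IDs before deregistering matters because the block device mapping is no longer queryable once the AMI is gone.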
Backups & Observability Tradeoffs
A single-node portfolio cluster doesn't need everything a production fleet does. Two deliberate calls: keep backups cheap and visible, and don't pay for observability that nothing acts on.
AWS DLM — Monthly × 3 EBS Snapshots
aws_dlm_lifecycle_policy (Data Lifecycle Manager, native AWS — no third-party scheduler) takes a snapshot of the root EBS volume on the 1st of each month at 04:00 UTC and retains the 3 most recent. Cost: pennies per month. Recovery time: a few minutes to launch a new instance from a chosen snapshot. DLM was picked over AWS Backup for cost (DLM has no per-protected-resource pricing) and over Velero/restic for simplicity (no in-cluster moving parts).
$40/mo Budget Alert
An AWS Budget watches actual spend against the cost model (~$31/mo target, $40/mo alert). If anything regresses — orphaned EIPs, runaway DLM snapshots, an instance-type drift — email lands before the AWS bill does.
No Prometheus (by choice)
A Prometheus + Grafana stack would add ~512 MiB of memory pressure to a 4 GB node and would never be acted on for a portfolio site. Chart values disable Prometheus metric emitters across envoy-gateway and external-dns. If something genuinely breaks, kubectl logs, plus CloudWatch Container Insights for EC2-level signals, are enough. This is a deliberate tradeoff, not negligence.
Kubernetes Resources
The website runs as a Deployment with associated Service and HTTPRoute resources:
Deployment
- 1 replica (sufficient for single-node cluster)
- Resource limits: 100m CPU, 128Mi memory
- Liveness and readiness probes on port 3000
- Rolling update strategy with health checks
- Multi-arch image — runs on the Graviton instance
Service
- ClusterIP type for internal access
- Port 80 → target port 3000
- Label selector for pod discovery
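Put together, the Deployment and Service bullets above correspond to manifests along these lines. This is a sketch: the resource names, the app label, and the probe path are assumptions, while the replica count, limits, and ports come from the lists.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fuhriman-website             # hypothetical name
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: fuhriman-website          # hypothetical label
  template:
    metadata:
      labels:
        app: fuhriman-website
    spec:
      containers:
        - name: website
          image: furryman/fuhriman-website:latest   # CI would normally set a ga-* tag
          ports:
            - containerPort: 3000
          resources:
            limits:
              cpu: 100m
              memory: 128Mi
          readinessProbe:
            httpGet:
              path: /                # assumed probe path
              port: 3000
          livenessProbe:
            httpGet:
              path: /                # assumed probe path
              port: 3000
---
apiVersion: v1
kind: Service
metadata:
  name: fuhriman-website             # hypothetical name
spec:
  type: ClusterIP
  selector:
    app: fuhriman-website            # must match the pod label above
  ports:
    - port: 80
      targetPort: 3000
```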
HTTPRoute (Gateway API)
- Attaches to the shared public Gateway via parentRefs
- Hostnames: fuhriman.org, www.fuhriman.org
- TLS terminates at the Gateway (not the Service)
- Path prefix / → backend Service port 80
- ExternalDNS reads the hostnames and writes Route53 A records
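An HTTPRoute matching this description might be written as below. The route name and backend Service name are assumptions; the parent Gateway, hostnames, path prefix, and port come from the list.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: fuhriman-website            # hypothetical name
spec:
  parentRefs:
    # Attach to the shared Gateway, which terminates TLS.
    - name: public
      namespace: envoy-gateway-system
  hostnames:
    - fuhriman.org
    - www.fuhriman.org
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: fuhriman-website    # assumed Service name
          port: 80
```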
Repository Structure
The project is organized across four repositories, following separation of concerns.
Key DevOps Principles
Infrastructure as Code
All infrastructure is version-controlled in Terraform, enabling reproducible deployments and peer review of changes. Variables have validation blocks; providers are pinned with ~> constraints.
GitOps
Git is the single source of truth. All changes flow through commits, providing audit trails and rollback capabilities. ArgoCD's self-heal reverts manual cluster edits automatically.
Immutable Infrastructure
Both container images and the host AMI are immutable artifacts with versioned tags. Packer rebuilds the AMI; user_data_replace_on_change=true means a new bootstrap script always lands on a fresh instance.
Declarative Configuration
Desired state is declared in YAML (HTTPRoutes, Applications, Certificates, Gateways). Kubernetes and ArgoCD continuously reconcile actual state to match.
Cost Optimization
k3s on a single t4g.medium (ARM Graviton) keeps the bill at ~$31/mo vs ~$80+ for managed EKS at equivalent scale. ARM saves ~20% over x86 at the same memory tier with no observable performance loss for this workload.
Least Privilege & Zero Trust
SSM-only admin (no SSH, no public k8s API). IMDSv2 enforced. GitHub Actions OIDC instead of long-lived keys. IAM policies scoped narrowly (ExternalDNS to one zone, Packer to specific EC2 actions).