The architecture of microservices
azatcloud — Infrastructure deployment, as actually performed
End-to-end record of bringing azatcloud Phase 3 (microservices) up from a bare
Ubuntu VPS to a live, TLS-terminated https://azat.cloud on single-node K3s.
This is the "what we really did" companion to DEPLOYMENT.md — including the two
problems we hit and how we fixed them.
- Box: DigitalOcean droplet, 8 vCPU / 16 GB / 463 GB, Ubuntu 22.04.5, public IP 165.232.108.66.
- User: `appuser` (uid 1000, in `sudo` group; passwordless sudo intentionally NOT enabled).
- Domain: `azat.cloud` (+ `www`) on Cloudflare, DNS-only (grey cloud) so HTTP-01 reaches the origin.
- Result: all 5 services + Postgres/Redis/NATS-JetStream/Jaeger running; trusted Let's Encrypt cert; admin access working.
Toolchain installed: Docker 29.5.3 · K3s v1.35.5+k3s1 · Helm v3.21.0 · cert-manager (jetstack chart) · openssl (preinstalled).
1. Architecture
1.1 Request + event topology
Internet
│ DNS: azat.cloud, www.azat.cloud ──► 165.232.108.66
▼
┌──────────────────────┐
│ Traefik (k3s) │ :80 (ACME/HTTP-01) + :443 (TLS)
│ ingressClass=traefik │ cert: azatcloud-tls (Let's Encrypt prod)
└───────────┬──────────┘
/media/* │ / (catch-all) www ── 301 ──► apex
▼ ▼
┌────────────┐ ┌────────────┐
│ media │ │ web (BFF) │ :8080 server-side rendered UI + /admin
│ :8083 │ └─────┬──────┘
└─────┬──────┘ │ Connect-go RPC (in-cluster)
│ ┌───────┴───────┐
│ ▼ ▼
│ ┌────────┐ ┌──────────┐
│ │ auth │ │ article │
│ │ :8081 │ │ :8082 │
│ └───┬────┘ └────┬─────┘
│ │ │ outbox relay (transactional outbox)
│ │ ▼
│ │ ┌──────────┐ subject: articles.published
│ │ │ NATS │◄──────────────────────────┐
│ │ │ JetStream│ │
│ │ │ :4222 │───► subscribe ──► ┌──────────────┐
│ │ └──────────┘ │ notification │ :8084
│ │ │ (email/log) │
▼ ▼ ▼ ▼ └──────────────┘
┌──────────────────────────────────────────────────┐
│ Postgres :5432 (auth_db, article_db, │ Redis :6379 (auth sessions)
│ media_db, notification_db) │ Jaeger :4317 OTLP (traces)
└──────────────────────────────────────────────────┘
- RS256 JWT, verify-only: `auth` signs with the private key; `article`/`media`/`web` verify with the public key. Both PEMs live in the `jwt-keys` secret, mounted read-only at `/keys`.
- Reliable events: `article` writes domain events to a DB outbox in the same transaction; an outbox relay publishes them to NATS JetStream; `notification` subscribes to `articles.published`.
1.2 Kubernetes objects (namespace azatcloud, 31 objects)
| Kind | Count | What |
|---|---|---|
| Deployment | 6 | auth, article, media, notification, web, jaeger |
| StatefulSet | 3 | postgres, redis, nats (each with a PVC via volumeClaimTemplate) |
| Service | 9 | 5 services + postgres + redis + nats + jaeger |
| Ingress | 1 | apex /media→media, /→web; www→web (redirected) |
| Middleware (Traefik) | 1 | www-to-apex 301 redirect |
| HorizontalPodAutoscaler | 2 | article (2–5), web (2–6) @ 70% CPU |
| PodDisruptionBudget | 2 | article, web |
| NetworkPolicy | 5 | default-deny + 4 allow rules (see §7.1) |
| PersistentVolumeClaim | 1 | media-data (5Gi) |
| ConfigMap | 1 | postgres-init (creates the 4 DBs/roles) |
| Secret | 2 | jwt-keys, azatcloud-secrets |
2. Prerequisites (DNS + access)
# DNS (at Cloudflare): A records, grey cloud (DNS only)
# azat.cloud A 165.232.108.66
# www.azat.cloud A 165.232.108.66
# Verify from the box:
getent hosts azat.cloud www.azat.cloud # both -> 165.232.108.66
# SSH deploy key so the box can pull the private GitLab repo (read-only)
ssh-keygen -t ed25519 -C "appuser@azatcloud-vps" -f ~/.ssh/id_ed25519 -N ""
cat ~/.ssh/id_ed25519.pub # add as a read-only Deploy Key in GitLab
ssh -T git@gitlab.com # expect: "Welcome to GitLab, @<user>!"
git clone git@gitlab.com:azat1hajy/azatcloud.git ~/azatcloud
cd ~/azatcloud && git checkout phase-3-microservices
⚠️ Gitignore gotcha: `.gitignore` contains an unanchored `azatcloud` line (meant for the built binary) which also matches the chart directory `deploy/helm/azatcloud/`, so the chart is invisible to git. Either copy the chart onto the box manually, or fix the rule (`azatcloud` → `/azatcloud`) and `git add deploy/helm`. See §10.
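The difference between the two patterns can be reproduced in a throwaway repository. This is a minimal sketch, assuming `git` is installed; the temp directory and the `Chart.yaml` file name are illustrative stand-ins for the real chart contents.

```shell
# Throwaway repo that mimics the layout: a chart directory named like the binary.
demo=$(mktemp -d)
git -C "$demo" init -q
mkdir -p "$demo/deploy/helm/azatcloud"
touch "$demo/deploy/helm/azatcloud/Chart.yaml"   # illustrative file name

# Unanchored pattern: matches any path component called "azatcloud".
printf 'azatcloud\n' > "$demo/.gitignore"
unanchored=$(git -C "$demo" check-ignore deploy/helm/azatcloud/Chart.yaml || true)

# Anchored pattern: matches only <repo root>/azatcloud (the built binary).
printf '/azatcloud\n' > "$demo/.gitignore"
anchored=$(git -C "$demo" check-ignore deploy/helm/azatcloud/Chart.yaml || true)

echo "unanchored rule ignores: ${unanchored:-nothing}"
echo "anchored rule ignores:   ${anchored:-nothing}"
```

Running `git check-ignore -v <path>` in the real repo should also name the `.gitignore` line responsible, which is a quick way to confirm the fix took effect.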
3. Host tooling (run by appuser; needs sudo)
# Docker — only to build the 5 images (k3s uses its own containerd)
curl -fsSL https://get.docker.com | sudo sh
sudo usermod -aG docker $USER # docker without sudo (re-login to apply)
# K3s — single-node Kubernetes; bundles Traefik, CoreDNS, metrics-server,
# local-path storage, and a NetworkPolicy controller
curl -sfL https://get.k3s.io | sh -
# kubeconfig owned by appuser (so kubectl/helm work without sudo)
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown $USER:$USER ~/.kube/config && chmod 600 ~/.kube/config
echo 'export KUBECONFIG=$HOME/.kube/config' >> ~/.bashrc # k3s' kubectl defaults to the root file otherwise
export KUBECONFIG=$HOME/.kube/config
# Helm
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# Build tool (not preinstalled on the minimal image)
sudo apt-get update -qq && sudo apt-get install -y make
# Verify
docker --version && kubectl get nodes -o wide && helm version
`kubectl get nodes` must show the node `Ready` before continuing.
4. RS256 JWT keypair
cd ~/azatcloud
make gen-keys # or, if make is unavailable, the openssl equivalent:
# mkdir -p keys
# openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out keys/jwt_private.pem
# openssl rsa -in keys/jwt_private.pem -pubout -out keys/jwt_public.pem
# chmod 644 keys/*.pem
ls -l keys/ # jwt_private.pem, jwt_public.pem (keys/ is gitignored)
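Before loading the PEMs into a secret, it may be worth confirming that the two files really belong together. The helper below is a sketch, not part of the repo: `keys_match` is a hypothetical name, and it assumes both files are in the PEM format produced by the `openssl` commands above (a public key written by a different tool could compare unequal even if it is the right key).

```shell
# keys_match PRIVATE_PEM PUBLIC_PEM
# Succeeds only if PUBLIC_PEM equals the public key derived from PRIVATE_PEM.
# Returns 2 if the private key cannot be read, 1 on mismatch.
keys_match() {
  local derived
  derived=$(openssl rsa -in "$1" -pubout 2>/dev/null) || return 2
  [ "$derived" = "$(cat "$2")" ]
}

# Usage on the box (paths from this section):
#   keys_match keys/jwt_private.pem keys/jwt_public.pem && echo "keypair OK"
```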
5. cert-manager + Let's Encrypt issuers
helm repo add jetstack https://charts.jetstack.io && helm repo update
helm install cert-manager jetstack/cert-manager -n cert-manager \
--create-namespace --set crds.enabled=true --wait
kubectl apply -f deploy/cert-manager/cluster-issuer.yaml # letsencrypt-staging + -prod (HTTP-01 via Traefik)
kubectl get clusterissuer # both READY=True
ACME / HTTP-01 flow (why §7.1's NetworkPolicy matters):
Let's Encrypt ──GET http://azat.cloud/.well-known/acme-challenge/<token>──►
Traefik :80 ──► cm-acme-http-solver pod :8089 ──► 200 <key-authz> ──► cert issued
6. Namespace + secrets
kubectl create namespace azatcloud
# (a) RS256 keypair, mounted at /keys
kubectl -n azatcloud create secret generic jwt-keys \
--from-file=keys/jwt_private.pem --from-file=keys/jwt_public.pem
# (b) App config secret (envFrom). Build it from a FILTERED .env so compose-era
# keys (DATABASE_URL, JWT_PRIVATE_KEY_PATH, POSTGRES_PASSWORD…) don't leak
# into every service via envFrom. Only the 7 keys the chart expects:
umask 077
grep -E '^(GOOGLE_CLIENT_ID|GOOGLE_CLIENT_SECRET|RESEND_API_KEY|EMAIL_FROM|ADMIN_EMAILS)=' .env > /tmp/s.env
echo 'GOOGLE_LOGIN_ENABLED=true' >> /tmp/s.env
echo 'NOTIFY_EMAIL=azat1hajy@gmail.com' >> /tmp/s.env
kubectl -n azatcloud create secret generic azatcloud-secrets --from-env-file=/tmp/s.env
rm -f /tmp/s.env
kubectl -n azatcloud get secret azatcloud-secrets # DATA=7
azatcloud-secrets keys
| Key | Purpose | Notes |
|---|---|---|
| GOOGLE_CLIENT_ID / GOOGLE_CLIENT_SECRET | Google OAuth login | redirect URI must include https://azat.cloud in Google console |
| GOOGLE_LOGIN_ENABLED | toggle Google login | true |
| RESEND_API_KEY | transactional email | empty here → email sending off until set |
| EMAIL_FROM | sender address | no-reply@azat.cloud (verify the domain in Resend to actually send) |
| ADMIN_EMAILS | bootstrap admins | see §9 — only applied at first signup |
| NOTIFY_EMAIL | admin notification recipient | azat1hajy@gmail.com |
The `.env` originally shipped `ADMIN_EMAILS=you@example.com` (placeholder), which caused the admin issue in §9.
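The filtering in step (b) can be wrapped in two small functions so the allow-list lives in one place and a leak is easy to spot. This is a sketch under assumptions: `filter_env` and `leaked_keys` are hypothetical helper names, and the allow-list is simply the seven keys listed in the table above.

```shell
# filter_env SRC DST
# Copies only the allow-listed KEY=value lines from SRC into DST.
# umask 077 applies when DST is newly created, so it is not group/world readable.
filter_env() {
  local allow='GOOGLE_CLIENT_ID|GOOGLE_CLIENT_SECRET|RESEND_API_KEY|EMAIL_FROM|ADMIN_EMAILS'
  ( umask 077 && grep -E "^(${allow})=" "$1" > "$2" )
}

# leaked_keys FILE
# Prints every key name in FILE that is NOT one of the seven expected keys.
# Empty output means the file is safe to pass to --from-env-file.
leaked_keys() {
  cut -d= -f1 "$1" \
    | grep -vxE 'GOOGLE_CLIENT_ID|GOOGLE_CLIENT_SECRET|GOOGLE_LOGIN_ENABLED|RESEND_API_KEY|EMAIL_FROM|ADMIN_EMAILS|NOTIFY_EMAIL' \
    || true
}

# Usage:
#   filter_env .env /tmp/s.env
#   leaked_keys /tmp/s.env      # expect no output
```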
7. Build images, import, deploy
cd ~/azatcloud
make images # builds auth/article/media/notification/web (multi-stage, golang:1.26)
make k3s-import # docker save | sudo k3s ctr images import (no external registry)
sudo k3s ctr images ls | grep azat1hajy/azatcloud # all 5, tag :latest
# Deploy against STAGING first (avoids burning LE prod rate limits)
helm upgrade --install azatcloud deploy/helm/azatcloud -n azatcloud \
--set ingress.clusterIssuer=letsencrypt-staging
kubectl -n azatcloud wait --for=condition=Ready pods --all --timeout=240s
kubectl -n azatcloud wait --for=condition=Ready certificate/azatcloud-tls --timeout=150s
Pods crash-loop 2–3× on first boot while Postgres initialises the per-service
DBs, then connect, run their embedded migrations (//go:embed *.sql +
postgres.Migrate in each main.go), and go Ready. This is expected.
7.1 Fix #1 — ACME challenge blocked by NetworkPolicy (502)
The first staging certificate got stuck at pending:
Reason: Waiting for HTTP-01 challenge propagation: wrong status code '502', expected '200'
Cause: the chart's default-deny-ingress (podSelector: {}) blocks all
ingress; the allow rules cover web/media/auth/article/datastores but not the
ephemeral cm-acme-http-solver pods (acme.cert-manager.io/http01-solver=true,
port 8089). Traefik → solver was denied → 502.
Fix (added to deploy/helm/azatcloud/templates/networkpolicy.yaml, then helm upgrade):
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-to-acme-solver
spec:
  podSelector:
    matchLabels:
      acme.cert-manager.io/http01-solver: "true"
  policyTypes: [Ingress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - { protocol: TCP, port: 8089 }
This rule is required for renewals too: without it, certificates silently fail to renew (cert-manager attempts renewal roughly 60 days after issuance, about 30 days before expiry). It must be synced back to the repo (see §10).
7.2 Switch to the production certificate
helm upgrade azatcloud deploy/helm/azatcloud -n azatcloud \
--set ingress.clusterIssuer=letsencrypt-prod
kubectl -n azatcloud delete secret azatcloud-tls # force fresh issuance from prod
kubectl -n azatcloud wait --for=condition=Ready certificate/azatcloud-tls --timeout=150s
8. Verification
export KUBECONFIG=$HOME/.kube/config
kubectl -n azatcloud get pods # all 1/1 Running
kubectl -n azatcloud get certificate # azatcloud-tls READY=True (issuer letsencrypt-prod)
# Public HTTPS (trusted cert + www redirect)
curl -sS -o /dev/null -w 'apex HTTP %{http_code} verify=%{ssl_verify_result}\n' https://azat.cloud/
curl -sS -o /dev/null -w 'www HTTP %{http_code} -> %{redirect_url}\n' https://www.azat.cloud/
echo | openssl s_client -connect azat.cloud:443 -servername azat.cloud 2>/dev/null \
| openssl x509 -noout -issuer -subject -dates
curl -sS https://azat.cloud/ | grep -o '<title>[^<]*</title>'
# Event pipeline (after publishing an article in /admin)
kubectl -n azatcloud logs deploy/article | grep -i 'outbox relay started'
kubectl -n azatcloud logs deploy/notification | grep 'subscribed to events'
kubectl -n azatcloud logs deploy/notification | grep 'articles.published'
Expected: apex HTTP 200 verify=0, www 301, issuer = Let's Encrypt, title renders.
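The expected status codes above can be turned into pass/fail checks. A minimal sketch follows; `expect_status` is a hypothetical helper, not something that exists in the repo, and it relies only on the same `curl -w '%{http_code}'` output used in the commands above.

```shell
# expect_status URL WANT
# Fetches URL without following redirects and compares the HTTP status code.
# Returns 0 on match, 1 on mismatch, 2 if curl itself fails (DNS, TLS, timeout).
expect_status() {
  local url=$1 want=$2 got
  got=$(curl -sS -o /dev/null -w '%{http_code}' "$url") || return 2
  if [ "$got" = "$want" ]; then
    echo "ok   $url -> $got"
  else
    echo "FAIL $url -> $got (expected $want)" >&2
    return 1
  fi
}

# Usage:
#   expect_status https://azat.cloud/     200
#   expect_status https://www.azat.cloud/ 301
```

Because curl verifies the certificate by default, a non-trusted (for example staging) certificate makes the call return 2 rather than report a status code.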
9. Admin & author roles
Roles live in auth_db.users.roles (text[]; values admin / author /
reader). rolesForEmail grants admin only at first signup for emails in
ADMIN_EMAILS, otherwise reader. There is no self-serve author signup and no
role-management UI/RPC — author is granted manually.
Fix #2 — first login created a reader, not an admin
Because ADMIN_EMAILS was the placeholder at first login, the account was
created as reader. Changing ADMIN_EMAILS does not retroactively promote
an existing account, so we fixed both:
# correct the secret for future signups, reload auth
kubectl -n azatcloud patch secret azatcloud-secrets --type merge \
-p "{\"data\":{\"ADMIN_EMAILS\":\"$(printf %s 'azat1hajy@gmail.com' | base64 -w0)\"}}"
kubectl -n azatcloud rollout restart deploy/auth
# promote the existing account directly
kubectl -n azatcloud exec postgres-0 -- psql -U postgres -d auth_db \
-c "UPDATE users SET roles='{admin}', updated_at=now() WHERE lower(email)=lower('azat1hajy@gmail.com');"
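The `patch` command works because Secret `data` values must be base64-encoded, and `printf %s` (unlike `echo`) does not append a newline that would end up inside the decoded value. The sketch below isolates that step; `secret_patch` is a hypothetical helper name, and `base64 -w0` assumes GNU coreutils as on the Ubuntu box.

```shell
# secret_patch KEY VALUE
# Prints a JSON merge patch that sets one key of a Kubernetes Secret.
# VALUE is base64-encoded without a trailing newline and without line wrapping.
# Note: KEY is inserted as-is, so it must not contain JSON special characters.
secret_patch() {
  printf '{"data":{"%s":"%s"}}' "$1" "$(printf %s "$2" | base64 -w0)"
}

# Usage:
#   kubectl -n azatcloud patch secret azatcloud-secrets --type merge \
#     -p "$(secret_patch ADMIN_EMAILS 'azat1hajy@gmail.com')"
```

If the value had been encoded with `echo`, the stored address would end in a newline and might no longer match the login email, depending on how the service trims its configuration.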
Grant admin or author to any user (manual)
# admin:
kubectl -n azatcloud exec postgres-0 -- psql -U postgres -d auth_db \
-c "UPDATE users SET roles='{admin}', updated_at=now() WHERE lower(email)=lower('<email>');"
# author only (to keep reader as well, set roles='{author,reader}' instead):
kubectl -n azatcloud exec postgres-0 -- psql -U postgres -d auth_db \
-c "UPDATE users SET roles='{author}', updated_at=now() WHERE lower(email)=lower('<email>');"
kubectl -n azatcloud exec postgres-0 -- psql -U postgres -d auth_db -c 'SELECT email, roles FROM users;'
The user must have logged in once (so the row exists) and must log out and back in afterwards, because the role is baked into the JWT at login.
Role capabilities (pkg/authz/authz.go): admin = manage all; author =
create + edit/delete own articles (counts as staff for /admin); reader = default.
10. Sync back to git (before merging phase-3 → main)
Two changes made on the box are NOT yet in the repo:
- The Helm chart is gitignored. Fix `.gitignore` (`azatcloud` → `/azatcloud`), then `git add deploy/helm`.
- The `allow-ingress-to-acme-solver` NetworkPolicy (§7.1): confirm it's in `templates/networkpolicy.yaml`.
sed -i 's#^azatcloud$#/azatcloud#' .gitignore
git add .gitignore deploy/helm deploy/infradeploy.md
git status # verify deploy/helm/azatcloud/** is staged
git commit -m "Phase 3: helm chart + infra deploy guide + ACME netpol fix"
git push
11. Day-2 operations
# Redeploy after a code change
make images && make k3s-import
kubectl -n azatcloud rollout restart deploy/<svc> # new pods use the re-imported :latest (IfNotPresent + unchanged tag, so nothing rolls on its own)
# Logs / status / scale
kubectl -n azatcloud get pods -o wide
kubectl -n azatcloud logs -f deploy/<svc>
kubectl -n azatcloud get hpa
kubectl -n azatcloud port-forward svc/jaeger 16686:16686 # traces UI -> http://localhost:16686
# Certs (auto-renew ~30 days before expiry; needs §7.1 netpol)
kubectl -n azatcloud get certificate
kubectl -n azatcloud describe certificate azatcloud-tls
# Refresh-token pruning (auth image supports a prune subcommand)
kubectl -n azatcloud run prune --rm -it --restart=Never \
--image=registry.gitlab.com/azat1hajy/azatcloud/auth:latest -- /auth-service prune
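Since a blocked solver makes renewals fail silently (§7.1), an independent expiry check can act as a safety net. This is a sketch, assuming `openssl` is available as on the box; `cert_valid_for_days` is a hypothetical helper and the 21-day threshold in the usage lines is an arbitrary example, not a value from the chart.

```shell
# cert_valid_for_days PEM_FILE DAYS
# Succeeds if the certificate in PEM_FILE is still valid DAYS days from now.
# Uses `openssl x509 -checkend`, which takes a number of seconds.
cert_valid_for_days() {
  openssl x509 -in "$1" -noout -checkend $(( $2 * 86400 )) >/dev/null
}

# Usage: fetch the live certificate the same way as in §8, then check it.
#   echo | openssl s_client -connect azat.cloud:443 -servername azat.cloud 2>/dev/null \
#     | openssl x509 > /tmp/azat.pem
#   cert_valid_for_days /tmp/azat.pem 21 || echo "renewal looks overdue"
```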
Rollback
DNS is unchanged, so rollback = stop the cluster app and restart the Phase 2
monolith (see CUTOVER.md):
kubectl -n azatcloud scale deploy --all --replicas=0 # or: helm uninstall azatcloud
sudo docker compose -f docker-compose.prod.yml up -d # monolith back
12. Outstanding hardening
- Firewall: node currently has all ports open. Restrict to 22/80/443 via a DigitalOcean Cloud Firewall (host `ufw` on k3s risks breaking pod networking).
- HTTP→HTTPS redirect: `http://azat.cloud` serves 200 instead of redirecting; add a Traefik `redirectScheme` middleware on the web entrypoint.
- Google OAuth: add `https://azat.cloud/...` as an authorized redirect URI in the Google console.
- Email: set a real `RESEND_API_KEY` and verify `azat.cloud` in Resend to enable sending.
- Secrets: DB passwords are plaintext in the chart (`auth_pw`, …) and app secrets are plain k8s Secrets; migrate to External-Secrets/Vault (`deploy/external-secrets/`) and per-DB credentials.
- Role management: add a `SetRoles` RPC + admin users page so authors/admins can be promoted from the UI instead of via `psql` (§9).
- Backups: schedule `pg_dump` of the 4 databases and snapshot the media PVC.
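For the backup item, one possible starting point is a loop over the four databases that reuses the `kubectl exec postgres-0` access pattern from §9. This is only a sketch under assumptions: `backup_all`, the `KUBECTL` override, the file naming, and the output directory in the usage line are all hypothetical, and nothing here covers the media PVC snapshot or retention.

```shell
DBS="auth_db article_db media_db notification_db"

# backup_all OUTDIR
# Dumps each database to OUTDIR/<db>-<UTC timestamp>.sql.gz.
# Stops at the first failing dump so a broken backup is not left looking complete.
# KUBECTL can be overridden (e.g. for a dry run); it defaults to kubectl.
backup_all() {
  local outdir=$1 stamp db
  stamp=$(date -u +%Y%m%dT%H%M%SZ)
  mkdir -p "$outdir" || return 1
  for db in $DBS; do
    ${KUBECTL:-kubectl} -n azatcloud exec postgres-0 -- \
      pg_dump -U postgres -d "$db" > "$outdir/${db}-${stamp}.sql" || return 1
    gzip "$outdir/${db}-${stamp}.sql" || return 1
  done
}

# Usage (directory is an example):
#   backup_all /var/backups/azatcloud
```

Writing the dump to a file first, then compressing, keeps a `pg_dump` failure visible; piping straight into `gzip` would hide its exit status unless `pipefail` is set.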