azatcloud / devops notes
The architecture of microservices

Jun 10, 2026 · 11 min read · microservice · go · k8s · traefik

azatcloud — Infrastructure deployment, as actually performed

End-to-end record of bringing azatcloud Phase 3 (microservices) up from a bare Ubuntu VPS to a live, TLS-terminated https://azat.cloud on single-node K3s. This is the "what we really did" companion to DEPLOYMENT.md — including the two problems we hit and how we fixed them.

  • Box: DigitalOcean droplet, 8 vCPU / 16 GB / 463 GB, Ubuntu 22.04.5, public IP 165.232.108.66.
  • User: appuser (uid 1000, in sudo group; passwordless sudo intentionally NOT enabled).
  • Domain: azat.cloud (+ www) on Cloudflare, DNS-only (grey cloud) so HTTP-01 reaches the origin.
  • Result: all 5 services + Postgres/Redis/NATS-JetStream/Jaeger running; trusted Let's Encrypt cert; admin access working.

Toolchain installed: Docker 29.5.3 · K3s v1.35.5+k3s1 · Helm v3.21.0 · cert-manager (jetstack chart) · openssl (preinstalled).


1. Architecture

1.1 Request + event topology

                          Internet
                             │  DNS: azat.cloud, www.azat.cloud  ──►  165.232.108.66
                             ▼
                  ┌──────────────────────┐
                  │  Traefik (k3s)        │  :80 (ACME/HTTP-01) + :443 (TLS)
                  │  ingressClass=traefik │  cert: azatcloud-tls (Let's Encrypt prod)
                  └───────────┬───────────┘
            /media/*          │ /  (catch-all)        www  ── 301 ──► apex
                ▼             ▼
        ┌────────────┐  ┌────────────┐
        │  media     │  │  web (BFF) │  :8080   server-side rendered UI + /admin
        │  :8083     │  └─────┬──────┘
        └─────┬──────┘        │ Connect-go RPC (in-cluster)
              │        ┌───────┴───────┐
              │        ▼               ▼
              │   ┌────────┐      ┌──────────┐
              │   │ auth   │      │ article  │
              │   │ :8081  │      │ :8082    │
              │   └───┬────┘      └────┬─────┘
              │       │                │ outbox relay (transactional outbox)
              │       │                ▼
              │       │           ┌──────────┐   subject: articles.published
              │       │           │  NATS    │◄──────────────────────────┐
              │       │           │ JetStream│                            │
              │       │           │ :4222    │───► subscribe ──► ┌──────────────┐
              │       │           └──────────┘                  │ notification │ :8084
              │       │                                         │ (email/log)  │
              ▼       ▼                ▼            ▼            └──────────────┘
        ┌──────────────────────────────────────────────────┐
        │  Postgres :5432   (auth_db, article_db,           │   Redis :6379 (auth sessions)
        │                    media_db, notification_db)      │   Jaeger :4317 OTLP (traces)
        └──────────────────────────────────────────────────┘
  • RS256 JWT, verify-only: auth signs with the private key; article/media/web verify with the public key. Both PEMs live in the jwt-keys secret, mounted read-only at /keys.
  • Reliable events: article writes domain events to a DB outbox in the same transaction; an outbox relay publishes them to NATS JetStream; notification subscribes to articles.published.
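
To make the sign-with-private / verify-with-public split concrete, here is a minimal sketch that uses only openssl. It is not the services' Go code: the claim names (sub, roles), the throwaway key files and the helper names are assumptions made for the demo.

```shell
set -eu
work=$(mktemp -d)

# Throwaway keypair, generated with the same openssl commands as in §4.
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out "$work/jwt_private.pem" 2>/dev/null
openssl rsa -in "$work/jwt_private.pem" -pubout -out "$work/jwt_public.pem" 2>/dev/null

# base64url helpers (JWT uses the URL-safe alphabet without padding).
b64url() { openssl base64 -A | tr '+/' '-_' | tr -d '='; }
b64url_decode() {
  s=$(tr '_-' '/+')
  case $(( ${#s} % 4 )) in
    2) s="$s==" ;;
    3) s="$s=" ;;
  esac
  printf '%s' "$s" | openssl base64 -d -A
}

header=$(printf '%s' '{"alg":"RS256","typ":"JWT"}' | b64url)
payload=$(printf '%s' '{"sub":"demo-user","roles":["reader"]}' | b64url)   # hypothetical claims

# "auth" side: RS256 = RSASSA-PKCS1-v1_5 over SHA-256, made with the private key.
sig=$(printf '%s' "$header.$payload" | openssl dgst -sha256 -sign "$work/jwt_private.pem" | b64url)
token="$header.$payload.$sig"

# "article/media/web" side: only the public key is needed to check a token.
verify_token() {
  printf '%s' "${1##*.}" | b64url_decode > "$work/sig.bin"
  printf '%s' "${1%.*}" | openssl dgst -sha256 -verify "$work/jwt_public.pem" \
    -signature "$work/sig.bin" >/dev/null 2>&1
}

# Same signature, but the payload now claims admin: verification must fail.
forged_payload=$(printf '%s' '{"sub":"demo-user","roles":["admin"]}' | b64url)
forged_token="$header.$forged_payload.$sig"

if verify_token "$token";        then genuine=ok; else genuine=rejected; fi
if verify_token "$forged_token"; then forged=ok;  else forged=rejected;  fi
echo "genuine=$genuine forged=$forged"
```

A real verifier would also check the alg header and expiry claims. The sketch covers only the signature step, which is why the verifying services never need the private key.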

1.2 Kubernetes objects (namespace azatcloud; 31 objects, plus the 2 Secrets created by hand in §6)

Kind                     Count  What
Deployment               6      auth, article, media, notification, web, jaeger
StatefulSet              3      postgres, redis, nats (each with a PVC via volumeClaimTemplate)
Service                  9      5 services + postgres + redis + nats + jaeger
Ingress                  1      apex /media→media, /→web; www→web (redirected)
Middleware (Traefik)     1      www-to-apex 301 redirect
HorizontalPodAutoscaler  2      article (2–5), web (2–6) @ 70% CPU
PodDisruptionBudget      2      article, web
NetworkPolicy            5      default-deny + 4 allow rules (see §7.1)
PersistentVolumeClaim    1      media-data (5Gi)
ConfigMap                1      postgres-init (creates the 4 DBs/roles)
Secret                   2      jwt-keys, azatcloud-secrets
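
As a rough illustration of what the HPA bounds and the 70% CPU target mean in practice, the sketch below applies the scaling rule from the Kubernetes documentation, desired = ceil(current × currentUtilization / targetUtilization), and clamps the result to the min/max from the table. The sample utilization figures are made up, and hpa_desired is a hypothetical helper name.

```shell
set -eu

# hpa_desired CURRENT_REPLICAS CURRENT_CPU_PCT TARGET_CPU_PCT MIN MAX
hpa_desired() {
  # Integer ceiling of current * utilization / target.
  want=$(( ($1 * $2 + $3 - 1) / $3 ))
  if [ "$want" -lt "$4" ]; then want=$4; fi
  if [ "$want" -gt "$5" ]; then want=$5; fi
  echo "$want"
}

article_busy=$(hpa_desired 2 140 70 2 5)   # 2 pods at 140% of requested CPU
article_idle=$(hpa_desired 2 35 70 2 5)    # 2 pods at 35%: floor is minReplicas
web_spike=$(hpa_desired 3 300 70 2 6)      # 3 pods at 300%: capped at maxReplicas
echo "article busy=$article_busy idle=$article_idle web spike=$web_spike"
```

The real controller also applies a tolerance band and a scale-down stabilization window, so observed behaviour is smoother than this arithmetic suggests.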

2. Prerequisites (DNS + access)

# DNS (at Cloudflare): A records, grey cloud (DNS only)
#   azat.cloud      A  165.232.108.66
#   www.azat.cloud  A  165.232.108.66
# Verify from the box:
getent hosts azat.cloud www.azat.cloud      # both -> 165.232.108.66
# SSH deploy key so the box can pull the private GitLab repo (read-only)
ssh-keygen -t ed25519 -C "appuser@azatcloud-vps" -f ~/.ssh/id_ed25519 -N ""
cat ~/.ssh/id_ed25519.pub        # add as a read-only Deploy Key in GitLab
ssh -T git@gitlab.com            # expect: "Welcome to GitLab, @<user>!"
git clone git@gitlab.com:azat1hajy/azatcloud.git ~/azatcloud
cd ~/azatcloud && git checkout phase-3-microservices

⚠️ Gitignore gotcha: .gitignore contains an unanchored azatcloud line (meant for the built binary) that also matches the chart directory deploy/helm/azatcloud/, so the chart is invisible to git. Either copy the chart onto the box manually, or anchor the rule (azatcloud → /azatcloud) and git add deploy/helm. See §10.
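
The difference between the unanchored and the anchored rule can be reproduced in a scratch repository. This is a sketch, not part of the original procedure: the temp repo, the empty stand-in files and the ignored_with helper are assumptions for the demo.

```shell
set -eu
repo=$(mktemp -d)
git -C "$repo" init -q
mkdir -p "$repo/deploy/helm/azatcloud"
: > "$repo/deploy/helm/azatcloud/Chart.yaml"   # stand-in for the chart
: > "$repo/azatcloud"                          # stand-in for the built binary

# Write a one-line .gitignore, then ask git which of the two paths it ignores.
ignored_with() {
  printf '%s\n' "$1" > "$repo/.gitignore"
  # check-ignore exits 1 when nothing matches, so mask the status for set -e.
  git -C "$repo" check-ignore azatcloud deploy/helm/azatcloud/Chart.yaml || true
}

unanchored=$(ignored_with 'azatcloud')    # matches at any depth
anchored=$(ignored_with '/azatcloud')     # matches only at the repo root

echo "unanchored rule ignores:"; printf '  %s\n' $unanchored
echo "anchored rule ignores:";   printf '  %s\n' $anchored
```

On the real checkout, git check-ignore -v deploy/helm/azatcloud/Chart.yaml prints the .gitignore line responsible, which is a quick way to confirm the fix took effect.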


3. Host tooling (run by appuser; needs sudo)

# Docker — only to build the 5 images (k3s uses its own containerd)
curl -fsSL https://get.docker.com | sudo sh
sudo usermod -aG docker $USER            # docker without sudo (re-login to apply)

# K3s — single-node Kubernetes; bundles Traefik, CoreDNS, metrics-server,
#       local-path storage, and a NetworkPolicy controller
curl -sfL https://get.k3s.io | sh -

# kubeconfig owned by appuser (so kubectl/helm work without sudo)
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown $USER:$USER ~/.kube/config && chmod 600 ~/.kube/config
echo 'export KUBECONFIG=$HOME/.kube/config' >> ~/.bashrc   # k3s' kubectl defaults to the root file otherwise
export KUBECONFIG=$HOME/.kube/config

# Helm
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# Build tool (not preinstalled on the minimal image)
sudo apt-get update -qq && sudo apt-get install -y make

# Verify
docker --version && kubectl get nodes -o wide && helm version

kubectl get nodes must show the node Ready before continuing.


4. RS256 JWT keypair

cd ~/azatcloud
make gen-keys            # or, if make is unavailable, the openssl equivalent:
#   mkdir -p keys
#   openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out keys/jwt_private.pem
#   openssl rsa -in keys/jwt_private.pem -pubout -out keys/jwt_public.pem
#   chmod 644 keys/*.pem
ls -l keys/              # jwt_private.pem, jwt_public.pem  (keys/ is gitignored)
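
Before the two PEMs go into the jwt-keys secret (§6), it can be worth confirming that they belong together, since a mismatched pair would make every token fail verification. A minimal sketch, assuming only openssl; keypair_matches is a hypothetical helper and the demo keys are throwaway files.

```shell
set -eu

# keypair_matches PRIVATE_PEM PUBLIC_PEM
# Re-derives the public key from the private one and compares it with the stored PEM.
keypair_matches() {
  derived=$(openssl rsa -in "$1" -pubout 2>/dev/null) || return 2
  [ "$derived" = "$(cat "$2")" ]
}

# Demo with two unrelated throwaway keypairs.
demo=$(mktemp -d)
for n in a b; do
  openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out "$demo/$n.key" 2>/dev/null
  openssl rsa -in "$demo/$n.key" -pubout -out "$demo/$n.pub" 2>/dev/null
done

if keypair_matches "$demo/a.key" "$demo/a.pub"; then same=match;  else same=mismatch;  fi
if keypair_matches "$demo/a.key" "$demo/b.pub"; then cross=match; else cross=mismatch; fi
echo "own public key: $same, foreign public key: $cross"
```

On the box the call would be keypair_matches keys/jwt_private.pem keys/jwt_public.pem.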

5. cert-manager + Let's Encrypt issuers

helm repo add jetstack https://charts.jetstack.io && helm repo update
helm install cert-manager jetstack/cert-manager -n cert-manager \
  --create-namespace --set crds.enabled=true --wait

kubectl apply -f deploy/cert-manager/cluster-issuer.yaml   # letsencrypt-staging + -prod (HTTP-01 via Traefik)
kubectl get clusterissuer                                   # both READY=True

ACME / HTTP-01 flow (why §7.1's NetworkPolicy matters):

Let's Encrypt ──GET http://azat.cloud/.well-known/acme-challenge/<token>──►
   Traefik :80 ──► cm-acme-http-solver pod :8089 ──► 200 <key-authz> ──► cert issued

6. Namespace + secrets

kubectl create namespace azatcloud

# (a) RS256 keypair, mounted at /keys
kubectl -n azatcloud create secret generic jwt-keys \
  --from-file=keys/jwt_private.pem --from-file=keys/jwt_public.pem

# (b) App config secret (envFrom). Build it from a FILTERED .env so compose-era
#     keys (DATABASE_URL, JWT_PRIVATE_KEY_PATH, POSTGRES_PASSWORD…) don't leak
#     into every service via envFrom. Only the 7 keys the chart expects:
umask 077
grep -E '^(GOOGLE_CLIENT_ID|GOOGLE_CLIENT_SECRET|RESEND_API_KEY|EMAIL_FROM|ADMIN_EMAILS)=' .env > /tmp/s.env
echo 'GOOGLE_LOGIN_ENABLED=true'        >> /tmp/s.env
echo 'NOTIFY_EMAIL=azat1hajy@gmail.com' >> /tmp/s.env
kubectl -n azatcloud create secret generic azatcloud-secrets --from-env-file=/tmp/s.env
rm -f /tmp/s.env
kubectl -n azatcloud get secret azatcloud-secrets    # DATA=7

azatcloud-secrets keys

Key                                      Purpose                       Notes
GOOGLE_CLIENT_ID / GOOGLE_CLIENT_SECRET  Google OAuth login            redirect URI must include https://azat.cloud in Google console
GOOGLE_LOGIN_ENABLED                     toggle Google login           true
RESEND_API_KEY                           transactional email           empty here → email sending off until set
EMAIL_FROM                               sender address                no-reply@azat.cloud (verify the domain in Resend to actually send)
ADMIN_EMAILS                             bootstrap admins              see §9; only applied at first signup
NOTIFY_EMAIL                             admin notification recipient

The .env originally shipped ADMIN_EMAILS=you@example.com (placeholder) — the cause of the admin issue in §9.
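
The effect of the allow-list filter in step (b) can be checked offline before anything reaches the cluster. The sketch below runs the same grep against a sample .env; every value in it is a made-up placeholder, and the file lives in a temp directory rather than in the repo.

```shell
set -eu
tmp=$(mktemp -d)

# Sample .env: the first three lines mimic compose-era keys that must not leak.
cat > "$tmp/.env" <<'EOF'
DATABASE_URL=postgres://example
JWT_PRIVATE_KEY_PATH=/keys/jwt_private.pem
POSTGRES_PASSWORD=example
GOOGLE_CLIENT_ID=example-id
GOOGLE_CLIENT_SECRET=example-secret
RESEND_API_KEY=
EMAIL_FROM=no-reply@azat.cloud
ADMIN_EMAILS=you@example.com
EOF

# Same allow-list as in step (b): five keys from .env, two appended by hand.
allow='^(GOOGLE_CLIENT_ID|GOOGLE_CLIENT_SECRET|RESEND_API_KEY|EMAIL_FROM|ADMIN_EMAILS)='
grep -E "$allow" "$tmp/.env" > "$tmp/s.env"
echo 'GOOGLE_LOGIN_ENABLED=true'      >> "$tmp/s.env"
echo 'NOTIFY_EMAIL=admin@example.com' >> "$tmp/s.env"

count=$(grep -c '' "$tmp/s.env")
leaked=$(grep -c -E '^(DATABASE_URL|JWT_PRIVATE_KEY_PATH|POSTGRES_PASSWORD)=' "$tmp/s.env" || true)
echo "keys kept: $count, compose-era keys leaked: $leaked"
cut -d= -f1 "$tmp/s.env"
```

A count other than 7 means the source .env is missing one of the expected keys, which is worth catching before kubectl create secret runs.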


7. Build images, import, deploy

cd ~/azatcloud
make images        # builds auth/article/media/notification/web (multi-stage, golang:1.26)
make k3s-import    # docker save | sudo k3s ctr images import   (no external registry)
sudo k3s ctr images ls | grep azat1hajy/azatcloud   # all 5, tag :latest

# Deploy against STAGING first (avoids burning LE prod rate limits)
helm upgrade --install azatcloud deploy/helm/azatcloud -n azatcloud \
  --set ingress.clusterIssuer=letsencrypt-staging
kubectl -n azatcloud wait --for=condition=Ready pods --all --timeout=240s
kubectl -n azatcloud wait --for=condition=Ready certificate/azatcloud-tls --timeout=150s

Pods crash-loop 2–3× on first boot while Postgres initialises the per-service DBs, then connect, run their embedded migrations (//go:embed *.sql + postgres.Migrate in each main.go), and go Ready. This is expected.

7.1 Fix #1 — ACME challenge blocked by NetworkPolicy (502)

The first staging certificate got stuck in pending:

Reason: Waiting for HTTP-01 challenge propagation: wrong status code '502', expected '200'

Cause: the chart's default-deny-ingress (podSelector: {}) blocks all ingress; the allow rules cover web/media/auth/article/datastores but not the ephemeral cm-acme-http-solver pods (acme.cert-manager.io/http01-solver=true, port 8089). Traefik → solver was denied → 502.

Fix (added to deploy/helm/azatcloud/templates/networkpolicy.yaml, then helm upgrade):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-to-acme-solver
spec:
  podSelector:
    matchLabels:
      acme.cert-manager.io/http01-solver: "true"
  policyTypes: [Ingress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - { protocol: TCP, port: 8089 }

This rule is required for renewals too: without it, renewal attempts (roughly 60 days after issuance) fail silently. It must be synced back to the repo (see §10).

7.2 Switch to the production certificate

helm upgrade azatcloud deploy/helm/azatcloud -n azatcloud \
  --set ingress.clusterIssuer=letsencrypt-prod
kubectl -n azatcloud delete secret azatcloud-tls     # force fresh issuance from prod
kubectl -n azatcloud wait --for=condition=Ready certificate/azatcloud-tls --timeout=150s

8. Verification

export KUBECONFIG=$HOME/.kube/config

kubectl -n azatcloud get pods                      # all 1/1 Running
kubectl -n azatcloud get certificate               # azatcloud-tls READY=True (issuer letsencrypt-prod)

# Public HTTPS (trusted cert + www redirect)
curl -sS -o /dev/null -w 'apex HTTP %{http_code} verify=%{ssl_verify_result}\n' https://azat.cloud/
curl -sS -o /dev/null -w 'www  HTTP %{http_code} -> %{redirect_url}\n'          https://www.azat.cloud/
echo | openssl s_client -connect azat.cloud:443 -servername azat.cloud 2>/dev/null \
  | openssl x509 -noout -issuer -subject -dates
curl -sS https://azat.cloud/ | grep -o '<title>[^<]*</title>'

# Event pipeline (after publishing an article in /admin)
kubectl -n azatcloud logs deploy/article      | grep -i 'outbox relay started'
kubectl -n azatcloud logs deploy/notification | grep 'subscribed to events'
kubectl -n azatcloud logs deploy/notification | grep 'articles.published'

Expected: apex HTTP 200 verify=0, www 301, issuer = Let's Encrypt, title renders.
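
The -dates output can also be turned into a pass/fail check, which helps catch a renewal that has silently stopped working. A minimal sketch using openssl x509 -checkend; valid_for_days is a hypothetical helper, and the demo runs against a self-signed 20-day certificate so that it needs no network access.

```shell
set -eu

# valid_for_days CERT_PEM DAYS
# Exit 0 if the certificate is still valid DAYS days from now, 1 otherwise.
valid_for_days() {
  openssl x509 -in "$1" -noout -checkend $(( $2 * 86400 )) >/dev/null
}

# Throwaway self-signed certificate valid for 20 days.
demo=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 20 -subj '/CN=demo.invalid' \
  -keyout "$demo/key.pem" -out "$demo/cert.pem" 2>/dev/null

if valid_for_days "$demo/cert.pem" 7;  then week=ok;  else week=expiring;  fi
if valid_for_days "$demo/cert.pem" 30; then month=ok; else month=expiring; fi
echo "7-day check: $week, 30-day check: $month"
```

Against the live site the certificate can come from the s_client pipeline shown above, piped into openssl x509 -noout -checkend with a threshold somewhat below the ~30-day renewal window (25 days, for example), so that a failure indicates renewal did not happen on schedule.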


9. Admin & author roles

Roles live in auth_db.users.roles (text[]; values admin / author / reader). rolesForEmail grants admin only at first signup for emails in ADMIN_EMAILS, otherwise reader. There is no self-serve author signup and no role-management UI/RPC; author is granted manually.

Fix #2 — first login created a reader, not an admin

Because ADMIN_EMAILS was the placeholder at first login, the account was created as reader. Changing ADMIN_EMAILS does not retroactively promote an existing account, so we fixed both:

# correct the secret for future signups, reload auth
kubectl -n azatcloud patch secret azatcloud-secrets --type merge \
  -p "{\"data\":{\"ADMIN_EMAILS\":\"$(printf %s 'azat1hajy@gmail.com' | base64 -w0)\"}}"
kubectl -n azatcloud rollout restart deploy/auth

# promote the existing account directly
kubectl -n azatcloud exec postgres-0 -- psql -U postgres -d auth_db \
  -c "UPDATE users SET roles='{admin}', updated_at=now() WHERE lower(email)=lower('azat1hajy@gmail.com');"
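
The patch payload is easy to get subtly wrong, because values under data must be base64 of the exact bytes. The sketch below builds the same JSON offline and shows why printf %s is used rather than echo. The address is a placeholder.

```shell
set -eu
email='admin@example.com'   # placeholder address

# Encode exactly the bytes of the address (no trailing newline).
encoded=$(printf %s "$email" | base64 | tr -d '\n')
patch=$(printf '{"data":{"ADMIN_EMAILS":"%s"}}' "$encoded")

# Round-trip: this is what the pod would see in its environment.
decoded=$(printf %s "$encoded" | base64 -d)

# echo appends a newline, which would end up inside the secret value.
with_newline=$(echo "$email" | base64 | tr -d '\n')

echo "$patch"
echo "decoded: $decoded"
```

Kubernetes Secrets also accept a stringData field with plain values, which avoids the manual encoding. The data form is shown here because it is what was actually run.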

Grant admin or author to any user (manual)

# admin:
kubectl -n azatcloud exec postgres-0 -- psql -U postgres -d auth_db \
  -c "UPDATE users SET roles='{admin}',  updated_at=now() WHERE lower(email)=lower('<email>');"
# author (to keep reader as well, use '{author,reader}' instead):
kubectl -n azatcloud exec postgres-0 -- psql -U postgres -d auth_db \
  -c "UPDATE users SET roles='{author}', updated_at=now() WHERE lower(email)=lower('<email>');"
kubectl -n azatcloud exec postgres-0 -- psql -U postgres -d auth_db -c 'SELECT email, roles FROM users;'

The user must have logged in once (so the row exists) and must log out and back in after — the role is baked into the JWT at login.
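
To confirm that a fresh login really carries the new role, the payload segment of the token can be decoded locally. This is a sketch under assumptions: the claim name roles and the dummy token are illustrative, jwt_payload is a hypothetical helper, and nothing here verifies the signature.

```shell
set -eu

# jwt_payload TOKEN
# Prints the decoded JSON payload (second dot-separated segment) of a JWT.
jwt_payload() {
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the base64 padding that base64url strips.
  case $(( ${#seg} % 4 )) in
    2) seg="$seg==" ;;
    3) seg="$seg=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

# Dummy token for the demo; the third segment is not a real signature.
to_b64url() { base64 | tr -d '=\n' | tr '/+' '_-'; }
claims='{"sub":"demo-user","roles":["reader"]}'
token="$(printf '%s' '{"alg":"RS256"}' | to_b64url).$(printf '%s' "$claims" | to_b64url).not-a-signature"

decoded=$(jwt_payload "$token")
echo "$decoded"
```

If the decoded payload of a real token still shows the old role after the UPDATE, the session predates the change and a fresh login is needed.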

Role capabilities (pkg/authz/authz.go): admin = manage all; author = create + edit/delete own articles (counts as staff for /admin); reader = default.


10. Sync back to git (before merging phase-3 → main)

Two things made on the box are NOT yet in the repo:

  1. The Helm chart is gitignored. Fix .gitignore (azatcloud → /azatcloud), then git add deploy/helm.
  2. The allow-ingress-to-acme-solver NetworkPolicy (§7.1) — confirm it's in templates/networkpolicy.yaml.
sed -i 's#^azatcloud$#/azatcloud#' .gitignore
git add .gitignore deploy/helm deploy/infradeploy.md
git status                       # verify deploy/helm/azatcloud/** is staged
git commit -m "Phase 3: helm chart + infra deploy guide + ACME netpol fix"
git push

11. Day-2 operations

# Redeploy after a code change
make images && make k3s-import
kubectl -n azatcloud rollout restart deploy/<svc>   # new pods start from the re-imported :latest (IfNotPresent + same tag, so nothing restarts on its own)

# Logs / status / scale
kubectl -n azatcloud get pods -o wide
kubectl -n azatcloud logs -f deploy/<svc>
kubectl -n azatcloud get hpa
kubectl -n azatcloud port-forward svc/jaeger 16686:16686   # traces UI -> http://localhost:16686

# Certs (auto-renew ~30 days before expiry; needs §7.1 netpol)
kubectl -n azatcloud get certificate
kubectl -n azatcloud describe certificate azatcloud-tls

# Refresh-token pruning (auth image supports a prune subcommand)
kubectl -n azatcloud run prune --rm -it --restart=Never \
  --image=registry.gitlab.com/azat1hajy/azatcloud/auth:latest -- /auth-service prune

Rollback

DNS is unchanged, so rollback = stop the cluster app and restart the Phase 2 monolith (see CUTOVER.md):

kubectl -n azatcloud scale deploy --all --replicas=0    # or: helm uninstall azatcloud
sudo docker compose -f docker-compose.prod.yml up -d     # monolith back

12. Outstanding hardening

  • Firewall — node currently has all ports open. Restrict to 22/80/443 via a DigitalOcean Cloud Firewall (host ufw on k3s risks breaking pod networking).
  • HTTP→HTTPS redirect: http://azat.cloud serves 200 instead of redirecting; add a Traefik redirectScheme middleware on the web entrypoint.
  • Google OAuth — add https://azat.cloud/... as an authorized redirect URI in the Google console.
  • Email — set a real RESEND_API_KEY and verify azat.cloud in Resend to enable sending.
  • Secrets — DB passwords are plaintext in the chart (auth_pw, …) and app secrets are plain k8s Secrets; migrate to External-Secrets/Vault (deploy/external-secrets/) and per-DB credentials.
  • Role management — add a SetRoles RPC + admin users page so authors/admins can be promoted from the UI instead of via psql (§9).
  • Backups — schedule pg_dump of the 4 databases and snapshot the media PVC.