- Definition of extra eveai_ops service to run (db) jobs

- Definition of manifests for all jobs
- Definition of manifests for all eveai services
2025-09-03 15:20:54 +02:00
parent 898bb32318
commit 2a0c92b064
34 changed files with 1345 additions and 26 deletions
--- a/Setup/phase-8-application-services.md
+++ b/Setup/phase-8-application-services.md
@@ -0,0 +1,133 @@
+# Phase 8: Application Services (Staging)
+
+This guide describes how to deploy EveAI application services to the Scaleway Kubernetes cluster, building on Phases 1–7 in cluster-install.md.
+
+## Prerequisites
+- Ingress-NGINX running with external IP
+- cert-manager installed and Certificate evie-staging-tls is READY (via HTTP ACME first, then HTTPS-only)
+- External Secrets Operator installed; Kubernetes Secret eveai-secrets exists in namespace eveai-staging
+- Verification service deployed and reachable via /verify
+- Optional: Monitoring stack running, Pushgateway deployed or reachable; PUSH_GATEWAY_HOST/PORT available to apps (via eveai-secrets)
+
+## What we deploy (structure)
+- Frontend (web) services
+  - eveai-app → exposed at /admin
+  - eveai-api → exposed at /api
+  - eveai-chat-client → exposed at /client
+- Backend worker services (internal)
+  - eveai-workers (queue: embeddings)
+  - eveai-chat-workers (queue: llm_interactions)
+  - eveai-entitlements (queue: entitlements)
+- Ops Jobs (manual DB ops)
+  - 00-env-check
+  - 02-db-bootstrap-ext
+  - 03-db-migrate-public
+  - 04-db-migrate-tenant
+  - 05-seed-or-init-data
+  - 06-verify-minimal
+
+Manifests are under:
+- scaleway/manifests/base/applications/frontend/
+- scaleway/manifests/base/applications/backend/
+- scaleway/manifests/base/applications/ops/jobs/
+- Aggregate kustomization: scaleway/manifests/base/applications/kustomization.yaml
+
+## Step 1: Validate secrets
+```bash
+kubectl get secret eveai-secrets -n eveai-staging
+kubectl get secret eveai-secrets -n eveai-staging -o jsonpath='{.data}' | jq 'keys'
+```
+Confirm that the following keys are present: DB_*, REDIS_*, OPENAI_API_KEY, MISTRAL_API_KEY, JWT_SECRET_KEY, API_ENCRYPTION_KEY, MINIO_*, PUSH_GATEWAY_HOST, PUSH_GATEWAY_PORT.
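
The key check above can be scripted rather than read by eye. The following is a minimal sketch, not part of the committed guide: the function name `missing_keys` is hypothetical, and only the fully spelled-out key names are checked, because the guide gives DB_*, REDIS_* and MINIO_* as patterns rather than exact names.

```shell
# Sketch: report which required keys are absent from a Secret.
# missing_keys is a hypothetical helper name, not something the repository provides.

# missing_keys REQUIRED...
# Reads the key names that are present on stdin (one per line) and prints
# every required name that is not among them.
missing_keys() {
  local present required
  present="$(cat)"
  for required in "$@"; do
    # -x matches the whole line, -F treats the name as a literal string
    if ! printf '%s\n' "$present" | grep -qxF -- "$required"; then
      printf '%s\n' "$required"
    fi
  done
}

# Usage against the cluster (requires kubectl and jq):
# kubectl get secret eveai-secrets -n eveai-staging -o json \
#   | jq -r '.data | keys[]' \
#   | missing_keys OPENAI_API_KEY MISTRAL_API_KEY JWT_SECRET_KEY \
#       API_ENCRYPTION_KEY PUSH_GATEWAY_HOST PUSH_GATEWAY_PORT
```

Empty output means every listed key exists; any printed name is missing from the Secret.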
+
+## Step 2: Deploy Ops Jobs (manual pre-deploy)
+Run the DB ops Jobs manually, in the order shown. Each manifest uses generateName, which kubectl apply does not support, so create the Jobs with kubectl create.
+```bash
+kubectl create -f scaleway/manifests/base/applications/ops/jobs/00-env-check-job.yaml
+kubectl wait --for=condition=complete job -n eveai-staging -l job-type=env-check --timeout=600s
+
+kubectl create -f scaleway/manifests/base/applications/ops/jobs/02-db-bootstrap-ext-job.yaml
+kubectl wait --for=condition=complete job -n eveai-staging -l job-type=db-bootstrap-ext --timeout=1800s
+
+kubectl create -f scaleway/manifests/base/applications/ops/jobs/03-db-migrate-public-job.yaml
+kubectl wait --for=condition=complete job -n eveai-staging -l job-type=db-migrate-public --timeout=1800s
+
+kubectl create -f scaleway/manifests/base/applications/ops/jobs/04-db-migrate-tenant-job.yaml
+kubectl wait --for=condition=complete job -n eveai-staging -l job-type=db-migrate-tenant --timeout=3600s
+
+kubectl create -f scaleway/manifests/base/applications/ops/jobs/05-seed-or-init-data-job.yaml
+kubectl wait --for=condition=complete job -n eveai-staging -l job-type=db-seed-or-init --timeout=1800s
+
+kubectl create -f scaleway/manifests/base/applications/ops/jobs/06-verify-minimal-job.yaml
+kubectl wait --for=condition=complete job -n eveai-staging -l job-type=db-verify-minimal --timeout=900s
+```
+View the logs of a created Job:
+```bash
+kubectl -n eveai-staging get jobs
+kubectl -n eveai-staging logs job/<created-job-name>
+```
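
Waiting by label selector also matches Jobs left over from earlier runs, and `kubectl wait --for=condition=complete` keeps waiting until the timeout when a Job fails. One possible refinement is to capture the generated name from `kubectl create -o name` and wait on that exact Job. This is a sketch only: the helper name `run_job` and the `NAMESPACE` variable are assumptions, and it presumes each manifest sets its own namespace, as the commands above imply.

```shell
# Sketch: create one Job and wait on the exact object that was created.
# run_job and NAMESPACE are hypothetical names used for illustration.
NAMESPACE="${NAMESPACE:-eveai-staging}"

# run_job MANIFEST TIMEOUT
run_job() {
  local manifest="$1" timeout="$2" created job
  # -o name prints the generated resource name, e.g. job.batch/<generated-name>
  created="$(kubectl create -f "$manifest" -o name)" || return 1
  job="job/${created#*/}"
  if kubectl wait --for=condition=complete "$job" -n "$NAMESPACE" --timeout="$timeout"; then
    echo "completed: $job"
  else
    # On failure or timeout, show recent logs before stopping the sequence
    kubectl logs -n "$NAMESPACE" "$job" --tail=50 || true
    echo "did not complete: $job" >&2
    return 1
  fi
}

# Usage, stopping at the first Job that does not complete:
# run_job scaleway/manifests/base/applications/ops/jobs/00-env-check-job.yaml 600s &&
# run_job scaleway/manifests/base/applications/ops/jobs/02-db-bootstrap-ext-job.yaml 1800s
```

Chaining the calls with `&&` keeps the ordering guarantee from the manual procedure: a later migration never starts after an earlier step has failed.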
+
+## Step 3: Deploy backend workers
+```bash
+kubectl apply -k scaleway/manifests/base/applications/backend/
+
+kubectl -n eveai-staging get deploy | grep -E 'eveai-(workers|chat-workers|entitlements)'
+# Optional: quick logs
+kubectl -n eveai-staging logs deploy/eveai-workers --tail=100 || true
+kubectl -n eveai-staging logs deploy/eveai-chat-workers --tail=100 || true
+kubectl -n eveai-staging logs deploy/eveai-entitlements --tail=100 || true
+```
+
+## Step 4: Deploy frontend services
+```bash
+kubectl apply -k scaleway/manifests/base/applications/frontend/
+
+kubectl -n eveai-staging get deploy,svc | grep -E 'eveai-(app|api|chat-client)'
+```
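
Listing the Deployments shows that they exist, not that their pods became ready. Once the backend and frontend manifests are applied, `kubectl rollout status` can block until each rollout finishes. The sketch below is illustrative: the helper name `wait_for_rollouts` and the 180s timeout are assumptions, not values taken from the manifests.

```shell
# Sketch: wait for each Deployment to finish rolling out.
# wait_for_rollouts and the 180s timeout are illustrative assumptions.
NAMESPACE="${NAMESPACE:-eveai-staging}"

# wait_for_rollouts DEPLOYMENT...
# Returns non-zero if any Deployment is not rolled out within the timeout.
wait_for_rollouts() {
  local name failed=0
  for name in "$@"; do
    if ! kubectl -n "$NAMESPACE" rollout status "deploy/$name" --timeout=180s; then
      echo "rollout not complete: $name" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
# wait_for_rollouts eveai-workers eveai-chat-workers eveai-entitlements \
#   eveai-app eveai-api eveai-chat-client
```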
+
+## Step 5: Verify Ingress routes
+The HTTPS ingress has paths enabled for /admin, /api, /client. Verify:
+```bash
+kubectl -n eveai-staging describe ingress eveai-staging-ingress
+
+curl -k https://evie-staging.askeveai.com/verify/health
+curl -k https://evie-staging.askeveai.com/admin/healthz/ready
+curl -k https://evie-staging.askeveai.com/api/healthz/ready
+curl -k https://evie-staging.askeveai.com/client/healthz/ready
+```
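
The same checks can be run as a loop that reports the HTTP status for each path and fails if any of them is not 200. This is a hedged sketch: `check_endpoints` and `BASE_URL` are hypothetical names, and treating 200 as the only healthy status is an assumption about the health endpoints. Unlike the commands above it omits `-k`, so the request also fails if the certificate does not validate.

```shell
# Sketch: probe each route and report its HTTP status.
# check_endpoints and BASE_URL are hypothetical names; 200-only is an assumption.
BASE_URL="${BASE_URL:-https://evie-staging.askeveai.com}"

# check_endpoints PATH...
# Prints "<status> <path>" per path; returns non-zero unless all are 200.
check_endpoints() {
  local path code failed=0
  for path in "$@"; do
    # -w '%{http_code}' prints only the status code; 000 means no HTTP response
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$BASE_URL$path")" || true
    echo "$code $path"
    [ "$code" = "200" ] || failed=1
  done
  return "$failed"
}

# Usage:
# check_endpoints /verify/health /admin/healthz/ready /api/healthz/ready /client/healthz/ready
```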
+
+## Resources and probes (staging defaults)
+- Web (app, api, chat-client):
+  - requests: 150m CPU, 256Mi RAM; limits: 500m CPU, 512Mi RAM; replicas: 1
+  - readiness/liveness: GET /healthz/ready
+- Workers:
+  - eveai-workers: requests: 200m CPU, 512Mi RAM; limits: 1 CPU, 1Gi RAM
+  - eveai-chat-workers: requests: 500m CPU, 1Gi RAM; limits: 2 CPU, 3Gi RAM
+  - eveai-entitlements: requests: 100m CPU, 256Mi RAM; limits: 500m CPU, 512Mi RAM
+
+## Pushgateway usage
+- Ensure PUSH_GATEWAY_HOST and PUSH_GATEWAY_PORT are provided (e.g., pushgateway.monitoring.svc.cluster.local:9091), typically via eveai-secrets or a ConfigMap.
+- Apps will continue to push business metrics; Prometheus scrapes the Pushgateway.
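
To confirm that the Pushgateway is reachable at the configured host and port, a throwaway metric can be pushed by hand from a pod inside the cluster or through a port-forward. The sketch below uses the Pushgateway's standard `/metrics/job/<job_name>` endpoint; the helper name `push_test_metric` and the metric and job name `eveai_smoke_test` are hypothetical and not used by the applications.

```shell
# Sketch: push one test sample to the Pushgateway.
# push_test_metric and eveai_smoke_test are hypothetical names for illustration.

# push_test_metric HOST PORT
push_test_metric() {
  local host="$1" port="$2"
  # The Pushgateway accepts the Prometheus text format on /metrics/job/<job_name>
  printf 'eveai_smoke_test 1\n' \
    | curl -sS --fail --data-binary @- "http://$host:$port/metrics/job/eveai_smoke_test"
}

# Usage:
# push_test_metric pushgateway.monitoring.svc.cluster.local 9091
```

The test group can be removed afterwards with an HTTP DELETE request to the same URL.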
+
+## Bunny.net WAF (TODO)
+- Configure Pull Zone for evie-staging.askeveai.com
+- Set Origin to the LoadBalancer IP with HTTPS and Host header evie-staging.askeveai.com
+- Define rate limits primarily on /api, looser on /client; enable bot filtering
+- Switch DNS (CNAME) to Bunny only after TLS issuance has completed directly against the LoadBalancer
+
+## Troubleshooting
+```bash
+kubectl get all -n eveai-staging
+kubectl get events -n eveai-staging --sort-by=.lastTimestamp
+kubectl describe ingress eveai-staging-ingress -n eveai-staging
+kubectl logs -n eveai-staging deploy/eveai-api --tail=200
+```
+
+## Rollback / Cleanup
+```bash
+# Remove frontend/backend (keeps verification and other base resources)
+kubectl delete -k scaleway/manifests/base/applications/frontend/
+kubectl delete -k scaleway/manifests/base/applications/backend/
+
+# Finished Jobs remain for history until their ttlSecondsAfterFinished expires; to delete them immediately:
+kubectl -n eveai-staging delete jobs --all
+```