# Phase 8: Application Services (Staging)

This guide describes how to deploy EveAI application services to the Scaleway Kubernetes cluster, building on Phases 1–7 in cluster-install.md.

## Prerequisites
- Ingress-NGINX running with an external IP
- cert-manager installed and Certificate evie-staging-tls is READY (issued via HTTP ACME first, then switched to HTTPS-only)
- External Secrets Operator installed; Kubernetes Secret eveai-secrets exists in namespace eveai-staging
- Verification service deployed and reachable via /verify
- Optional: monitoring stack running, with Pushgateway deployed or reachable; PUSH_GATEWAY_HOST/PORT available to apps (via eveai-secrets)

## What we deploy (structure)
- Frontend (web) services
  - eveai-app → exposed at /admin
  - eveai-api → exposed at /api
  - eveai-chat-client → exposed at /client
- Backend worker services (internal)
  - eveai-workers (queue: embeddings)
  - eveai-chat-workers (queue: llm_interactions)
  - eveai-entitlements (queue: entitlements)
- Ops Jobs (manual DB ops)
  - 00-env-check
  - 02-db-bootstrap-ext
  - 03-db-migrate-public
  - 04-db-migrate-tenant
  - 05-seed-or-init-data
  - 06-verify-minimal

Manifests are under:
- scaleway/manifests/base/applications/frontend/
- scaleway/manifests/base/applications/backend/
- scaleway/manifests/base/applications/ops/jobs/
- Aggregate kustomization (apps only): scaleway/manifests/base/applications/kustomization.yaml

Note:
- The staging Kustomize overlay deploys only the frontend and backend apps.
- Ingress remains managed manually via scaleway/manifests/base/networking/ingress-https.yaml, as described in cluster-install.md.
- Ops Jobs are not part of the overlay and should be executed manually with kubectl create -f.

## Step 1: Validate secrets
```bash
# Check that the Secret exists
kubectl get secret eveai-secrets -n eveai-staging
# List the key names only (values stay base64-encoded and are not printed)
kubectl get secret eveai-secrets -n eveai-staging -o json | jq '.data | keys'
```
Confirm that the following keys are present: DB_*, REDIS_*, OPENAI_API_KEY, MISTRAL_API_KEY, JWT_SECRET_KEY, API_ENCRYPTION_KEY, MINIO_*, PUSH_GATEWAY_HOST, PUSH_GATEWAY_PORT.

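
The manual key check above can be scripted. The following is a minimal sketch, not part of the repository: the function name missing_secret_keys is hypothetical, and it assumes that "DB_*", "REDIS_*" and "MINIO_*" mean "at least one key with that prefix". It reads key names from stdin, one per line, and reports what is absent.

```bash
# Sketch (hypothetical helper): report required keys that are absent.
# Input: key names on stdin, one per line. Returns 1 if anything is missing.
missing_secret_keys() {
  local keys exact prefix missing=0
  keys="$(cat)"
  # Keys that must match a full line exactly
  for exact in OPENAI_API_KEY MISTRAL_API_KEY JWT_SECRET_KEY API_ENCRYPTION_KEY \
               PUSH_GATEWAY_HOST PUSH_GATEWAY_PORT; do
    if ! printf '%s\n' "$keys" | grep -qx -- "$exact"; then
      echo "missing: $exact"
      missing=1
    fi
  done
  # Prefixes for which at least one key is assumed to be required
  for prefix in DB_ REDIS_ MINIO_; do
    if ! printf '%s\n' "$keys" | grep -q -- "^${prefix}"; then
      echo "missing: ${prefix}*"
      missing=1
    fi
  done
  return "$missing"
}
```

A possible invocation, assuming jq is installed: `kubectl get secret eveai-secrets -n eveai-staging -o json | jq -r '.data | keys[]' | missing_secret_keys`.
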
## Step 2: Deploy Ops Jobs (manual pre-deploy)
Run the DB ops Jobs manually, in order. Each manifest uses generateName, which kubectl apply does not support, so use kubectl create.

Notes for images:
- Ops Jobs reference the private Scaleway registry directly and set imagePullPolicy: Always.
- Ensure the docker pull secret (scaleway-registry-cred) exists; see the Private registry section.
- After pushing a new :staging image, delete any previous Job (if present) and create a new one to force a fresh Pod and image pull.

```bash
# Each step creates a Job, then blocks until it completes or the timeout expires.
kubectl create -f scaleway/manifests/base/applications/ops/jobs/00-env-check-job.yaml
kubectl wait --for=condition=complete job -n eveai-staging -l job-type=env-check --timeout=600s

kubectl create -f scaleway/manifests/base/applications/ops/jobs/02-db-bootstrap-ext-job.yaml
kubectl wait --for=condition=complete job -n eveai-staging -l job-type=db-bootstrap-ext --timeout=1800s

kubectl create -f scaleway/manifests/base/applications/ops/jobs/03-db-migrate-public-job.yaml
kubectl wait --for=condition=complete job -n eveai-staging -l job-type=db-migrate-public --timeout=1800s

kubectl create -f scaleway/manifests/base/applications/ops/jobs/04-db-migrate-tenant-job.yaml
kubectl wait --for=condition=complete job -n eveai-staging -l job-type=db-migrate-tenant --timeout=3600s

kubectl create -f scaleway/manifests/base/applications/ops/jobs/05-seed-or-init-data-job.yaml
kubectl wait --for=condition=complete job -n eveai-staging -l job-type=db-seed-or-init --timeout=1800s

kubectl create -f scaleway/manifests/base/applications/ops/jobs/06-verify-minimal-job.yaml
kubectl wait --for=condition=complete job -n eveai-staging -l job-type=db-verify-minimal --timeout=900s
```
View logs:
```bash
kubectl -n eveai-staging get jobs
kubectl -n eveai-staging logs job/<created-job-name>
```

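
Because the manifests use generateName, the Job name is only known after creation. The sketch below shows one way to capture it and wait on that exact Job instead of a label selector. The helper name run_ops_job and the NAMESPACE variable are hypothetical, and it assumes the manifest itself sets the namespace, as the commands above imply.

```bash
# Sketch (hypothetical helper): create one Ops Job, wait for that exact Job,
# and print its recent logs if it does not complete in time.
# Args: <manifest-path> <timeout>, for example 600s
run_ops_job() {
  local manifest="$1" timeout="$2" ns="${NAMESPACE:-eveai-staging}" job
  # -o name prints the generated resource name, e.g. job.batch/<generated-name>
  job="$(kubectl create -f "$manifest" -o name)" || return 1
  echo "created $job"
  if ! kubectl wait --for=condition=complete "$job" -n "$ns" --timeout="$timeout"; then
    # A failed Job never reaches "complete", so the wait ends at the timeout
    kubectl logs "$job" -n "$ns" --tail=50
    return 1
  fi
}
```

For example: `run_ops_job scaleway/manifests/base/applications/ops/jobs/00-env-check-job.yaml 600s`. Waiting on the created name avoids matching older Jobs that carry the same job-type label.
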
### Runtime environment for Ops Jobs
Each Ops Job sets the same non-secret runtime variables required by the shared bootstrap (start.sh/run.py):
- FLASK_APP=/app/scripts/run.py
- COMPONENT_NAME=eveai_ops
- PYTHONUNBUFFERED=1
- LOGLEVEL=debug (for staging)
- ROLE=web
- PORT=8080
- WORKERS=1
- WORKER_CLASS=gevent
- WORKER_CONN=100
- MAX_REQUESTS=1000
- MAX_REQUESTS_JITTER=100

Secrets (DB_*, REDIS_*, etc.) still come from `envFrom: secretRef: eveai-secrets`.

Tip: After pushing a new :staging image, delete any previous Job with the same label to force a fresh Pod and pull:
```bash
kubectl -n eveai-staging delete job -l component=ops,job-type=db-migrate-public || true
kubectl create -f scaleway/manifests/base/applications/ops/jobs/03-db-migrate-public-job.yaml
```

## Step 3: Deploy backend workers
```bash
kubectl apply -k scaleway/manifests/base/applications/backend/

# grep -E replaces the deprecated egrep
kubectl -n eveai-staging get deploy | grep -E 'eveai-(workers|chat-workers|entitlements)'
# Optional: quick logs
kubectl -n eveai-staging logs deploy/eveai-workers --tail=100 || true
kubectl -n eveai-staging logs deploy/eveai-chat-workers --tail=100 || true
kubectl -n eveai-staging logs deploy/eveai-entitlements --tail=100 || true
```

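
Listing the Deployments shows that they exist, not that their Pods became ready. As a hedged sketch, a small loop around kubectl rollout status can block until each rollout finishes. The helper name wait_for_rollouts, the NAMESPACE variable and the 300s timeout are assumptions, not values from the manifests.

```bash
# Sketch (hypothetical helper): block until each named Deployment has rolled out.
# Stops at the first Deployment that does not become ready within the timeout.
wait_for_rollouts() {
  local ns="${NAMESPACE:-eveai-staging}" name
  for name in "$@"; do
    kubectl rollout status "deployment/$name" -n "$ns" --timeout=300s || return 1
  done
}
```

For the backend workers this would be `wait_for_rollouts eveai-workers eveai-chat-workers eveai-entitlements`; the same helper works for the frontend Deployments in Step 4.
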
## Step 4: Deploy frontend services
```bash
kubectl apply -k scaleway/manifests/base/applications/frontend/

# grep -E replaces the deprecated egrep
kubectl -n eveai-staging get deploy,svc | grep -E 'eveai-(app|api|chat-client)'
```

## Step 5: Verify Ingress routes (Ingress managed separately)
Ingress is intentionally not managed by the staging Kustomize overlay. Apply or update it manually from the existing manifest, as described in cluster-install.md:
```bash
kubectl apply -f scaleway/manifests/base/networking/ingress-https.yaml
kubectl -n eveai-staging describe ingress eveai-staging-ingress
```
Then verify the routes:
```bash
# -k skips TLS certificate verification
curl -k https://evie-staging.askeveai.com/verify/health
curl -k https://evie-staging.askeveai.com/admin/healthz/ready
curl -k https://evie-staging.askeveai.com/api/healthz/ready
curl -k https://evie-staging.askeveai.com/client/healthz/ready
```

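
The four curl checks above can be folded into one pass/fail loop. This is a sketch under the assumption that a healthy endpoint answers with HTTP 200; the helper name check_routes and the 10-second limit are hypothetical.

```bash
# Sketch (hypothetical helper): request each path under a base URL and
# return non-zero if any response is not HTTP 200.
# Args: <base-url> <path>...
check_routes() {
  local base="$1" path code failed=0
  shift
  for path in "$@"; do
    # -s: silent, -k: skip TLS verification (as in the commands above),
    # -o /dev/null -w '%{http_code}': print only the status code
    code="$(curl -sk -o /dev/null -w '%{http_code}' --max-time 10 "${base}${path}")"
    echo "${code} ${path}"
    [ "$code" = "200" ] || failed=1
  done
  return "$failed"
}
```

For example: `check_routes https://evie-staging.askeveai.com /verify/health /admin/healthz/ready /api/healthz/ready /client/healthz/ready`.
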
## Resources and probes (staging defaults)
- Web (app, api, chat-client):
  - requests: 150m CPU, 256Mi RAM; limits: 500m CPU, 512Mi RAM; replicas: 1
  - readiness/liveness: GET /healthz/ready
- Workers:
  - eveai-workers: requests: 200m CPU, 512Mi RAM; limits: 1 CPU, 1Gi RAM
  - eveai-chat-workers: requests: 500m CPU, 1Gi RAM; limits: 2 CPU, 3Gi RAM
  - eveai-entitlements: requests: 100m CPU, 256Mi RAM; limits: 500m CPU, 512Mi RAM

## Pushgateway usage
- Ensure PUSH_GATEWAY_HOST and PUSH_GATEWAY_PORT are provided (e.g., pushgateway.monitoring.svc.cluster.local and 9091), typically via eveai-secrets or a ConfigMap.
- Apps will continue to push business metrics to the Pushgateway; Prometheus scrapes the Pushgateway.

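
To confirm that the configured host and port actually accept pushes, a one-off sample can be sent with curl. This is a sketch, not something the apps do: the helper names and the metric name eveai_smoke_test are hypothetical. The URL shape (/metrics/job/<job>) follows the Pushgateway's documented push API.

```bash
# Sketch (hypothetical helpers): build the push URL and send one test sample.
pushgateway_url() {
  # $1: job name, used as the grouping key in the URL path
  echo "http://${PUSH_GATEWAY_HOST}:${PUSH_GATEWAY_PORT}/metrics/job/$1"
}

push_test_metric() {
  # eveai_smoke_test is a made-up metric name used only for this check.
  # -f makes curl return non-zero on an HTTP error response.
  printf 'eveai_smoke_test 1\n' | curl -sf --data-binary @- "$(pushgateway_url "$1")"
}
```

Run `push_test_metric smoke` from somewhere that can resolve the host, for example a Pod inside the cluster, with both variables exported.
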
## Image tags strategy (staging/production channels)
- The push script creates and pushes two tags per service:
  - A versioned tag: :vX.Y.Z (e.g., :v1.2.3)
  - An environment channel tag based on ENVIRONMENT: :staging or :production
- Recommendation for staging manifests:
  - Refer to the channel tag (e.g., rg.fr-par.scw.cloud/eveai-staging/...:staging) and set imagePullPolicy: Always so new pushes are picked up the next time a Pod starts, without manifest changes.
  - Production can later use immutable version tags or digests via a production overlay.

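
The two-tag scheme can be illustrated with a small function that derives both image references. This is a sketch of the naming rule only, not the actual push script; the function name image_tags and its argument order are hypothetical.

```bash
# Sketch (hypothetical helper): print the two image references for one service.
# Args: <repository> <version X.Y.Z> <environment: staging|production>
image_tags() {
  local repo="$1" version="$2" env="$3"
  case "$env" in
    staging|production) ;;
    *) echo "unknown environment: $env" >&2; return 1 ;;
  esac
  echo "${repo}:v${version}"   # versioned tag
  echo "${repo}:${env}"        # moving channel tag
}
```

For example, `image_tags rg.fr-par.scw.cloud/eveai-staging/josakola/example-service 1.2.3 staging` prints one reference ending in :v1.2.3 and one ending in :staging (example-service is a placeholder name).
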
## Bunny.net WAF (TODO)
- Configure a Pull Zone for evie-staging.askeveai.com
- Set the Origin to the LoadBalancer IP over HTTPS, with Host header evie-staging.askeveai.com
- Define rate limits primarily on /api, looser on /client; enable bot filtering
- Only switch DNS (CNAME) to Bunny after TLS issuance has completed directly against the LoadBalancer

## Troubleshooting
```bash
kubectl get all -n eveai-staging
kubectl get events -n eveai-staging --sort-by=.lastTimestamp
kubectl describe ingress eveai-staging-ingress -n eveai-staging
kubectl logs -n eveai-staging deploy/eveai-api --tail=200
```

## Rollback / Cleanup
```bash
# Remove frontend/backend (keeps verification and other base resources)
kubectl delete -k scaleway/manifests/base/applications/frontend/
kubectl delete -k scaleway/manifests/base/applications/backend/

# Finished Jobs are kept for history until ttlSecondsAfterFinished expires; to delete them immediately:
kubectl -n eveai-staging delete jobs --all
```

## Private registry (Scaleway)
1) Create the docker pull secret via External Secrets (once):
```bash
kubectl apply -f scaleway/manifests/base/secrets/scaleway-registry-secret.yaml
kubectl -n eveai-staging get secret scaleway-registry-cred -o yaml | grep "type: kubernetes.io/dockerconfigjson"
```
2) Use the staging overlay to deploy apps with registry rewrite and imagePullSecrets:
```bash
kubectl apply -k scaleway/manifests/overlays/staging/
```
Notes:
- Base manifests keep generic images (josakola/...). The overlay rewrites them to rg.fr-par.scw.cloud/eveai-staging/josakola/...:staging and adds imagePullSecrets to all Pods.
- Staging uses imagePullPolicy: Always, so new pushes to :staging are pulled whenever a Pod is created or restarted.