# Phase 8: Application Services (Staging)

This guide describes how to deploy EveAI application services to the Scaleway Kubernetes cluster, building on Phases 1–7 in cluster-install.md.

## Prerequisites
- Ingress-NGINX running with external IP
- cert-manager installed and Certificate evie-staging-tls is READY (issued via the HTTP-01 ACME challenge first, then switched to HTTPS-only)
- External Secrets Operator installed; Kubernetes Secret eveai-secrets exists in namespace eveai-staging
- Verification service deployed and reachable via /verify
- Optional: Monitoring stack running, Pushgateway deployed or reachable; PUSH_GATEWAY_HOST/PORT available to apps (via eveai-secrets)

## What we deploy (structure)
- Frontend (web) services
  - eveai-app → exposed at /admin
  - eveai-api → exposed at /api
  - eveai-chat-client → exposed at /client
- Backend worker services (internal)
  - eveai-workers (queue: embeddings)
  - eveai-chat-workers (queue: llm_interactions)
  - eveai-entitlements (queue: entitlements)
- Ops Jobs (manual DB ops)
  - 00-env-check
  - 02-db-bootstrap-ext
  - 03-db-migrate-public
  - 04-db-migrate-tenant
  - 05-seed-or-init-data
  - 06-verify-minimal

Manifests are under:
- scaleway/manifests/base/applications/frontend/
- scaleway/manifests/base/applications/backend/
- scaleway/manifests/base/applications/ops/jobs/
- Aggregate kustomization (apps only): scaleway/manifests/base/applications/kustomization.yaml

Note:
- The staging Kustomize overlay deploys only frontend and backend apps.
- Ingress remains managed manually via scaleway/manifests/base/networking/ingress-https.yaml and your cluster-install.md guide.
- Ops Jobs are not part of the overlay and should be executed manually with kubectl create -f.

## Step 1: Validate secrets
```bash
kubectl get secret eveai-secrets -n eveai-staging
kubectl get secret eveai-secrets -n eveai-staging -o jsonpath='{.data}' | jq 'keys'
```
Confirm presence of DB_*, REDIS_*, OPENAI_API_KEY, MISTRAL_API_KEY, JWT_SECRET_KEY, API_ENCRYPTION_KEY, MINIO_*, PUSH_GATEWAY_HOST, PUSH_GATEWAY_PORT.
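
The presence check can be automated rather than read off by eye. The following is a minimal sketch, not part of the repository: `check_secret_keys` is a hypothetical helper that reads key names on stdin (one per line) and reports anything missing. It assumes that at least one key per prefix family (DB_*, REDIS_*, MINIO_*) is enough to count as present.

```bash
# Hypothetical helper: reads secret key names (one per line) on stdin and
# prints every expected key or key prefix that is absent.
# Returns non-zero if anything is missing.
check_secret_keys() {
  local keys name prefix missing=0
  keys="$(cat)"
  # Exact key names listed in this guide.
  for name in OPENAI_API_KEY MISTRAL_API_KEY JWT_SECRET_KEY API_ENCRYPTION_KEY \
              PUSH_GATEWAY_HOST PUSH_GATEWAY_PORT; do
    if ! printf '%s\n' "$keys" | grep -qxF -- "$name"; then
      echo "missing key: $name"
      missing=1
    fi
  done
  # Prefix families: at least one key must start with each prefix.
  for prefix in DB_ REDIS_ MINIO_; do
    if ! printf '%s\n' "$keys" | grep -q -- "^${prefix}"; then
      echo "missing keys with prefix: ${prefix}*"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `kubectl get secret eveai-secrets -n eveai-staging -o jsonpath='{.data}' | jq -r 'keys[]' | check_secret_keys`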

## Step 2: Deploy Ops Jobs (manual pre-deploy)
Run the DB ops Jobs manually, in order. Each manifest uses generateName, so create it with kubectl create; kubectl apply rejects manifests that rely on generateName.

Notes for images:
- Ops Jobs now reference the private Scaleway registry directly and set imagePullPolicy: Always.
- Ensure the docker pull secret (scaleway-registry-cred) exists; see the Private registry section.
- After pushing a new :staging image, delete any previous Job (if present) and create a new one to force a fresh Pod pull.

```bash
kubectl create -f scaleway/manifests/base/applications/ops/jobs/00-env-check-job.yaml
kubectl wait --for=condition=complete job -n eveai-staging -l job-type=env-check --timeout=600s

kubectl create -f scaleway/manifests/base/applications/ops/jobs/02-db-bootstrap-ext-job.yaml
kubectl wait --for=condition=complete job -n eveai-staging -l job-type=db-bootstrap-ext --timeout=1800s

kubectl create -f scaleway/manifests/base/applications/ops/jobs/03-db-migrate-public-job.yaml
kubectl wait --for=condition=complete job -n eveai-staging -l job-type=db-migrate-public --timeout=1800s

kubectl create -f scaleway/manifests/base/applications/ops/jobs/04-db-migrate-tenant-job.yaml
kubectl wait --for=condition=complete job -n eveai-staging -l job-type=db-migrate-tenant --timeout=3600s

kubectl create -f scaleway/manifests/base/applications/ops/jobs/05-seed-or-init-data-job.yaml
kubectl wait --for=condition=complete job -n eveai-staging -l job-type=db-seed-or-init --timeout=1800s

kubectl create -f scaleway/manifests/base/applications/ops/jobs/06-verify-minimal-job.yaml
kubectl wait --for=condition=complete job -n eveai-staging -l job-type=db-verify-minimal --timeout=900s
```
View logs:
```bash
kubectl -n eveai-staging get jobs
kubectl -n eveai-staging logs job/<created-job-name>
```
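
Because `kubectl wait --for=condition=complete` keeps waiting until the timeout when a Job fails, it helps to stop the sequence at the first Job that does not complete. The sketch below wraps the same create/wait pairs in a loop. `run_ops_jobs` is a hypothetical helper name; the manifests, labels and timeouts are the ones used above.

```bash
JOBS_DIR="scaleway/manifests/base/applications/ops/jobs"
NAMESPACE="eveai-staging"

# Hypothetical helper: creates each Ops Job in order and waits for it.
# Stops at the first Job that cannot be created or does not complete.
run_ops_jobs() {
  local entry manifest rest label timeout
  # One entry per Job: "<manifest>|<job-type label>|<timeout>".
  for entry in \
    "00-env-check-job.yaml|env-check|600s" \
    "02-db-bootstrap-ext-job.yaml|db-bootstrap-ext|1800s" \
    "03-db-migrate-public-job.yaml|db-migrate-public|1800s" \
    "04-db-migrate-tenant-job.yaml|db-migrate-tenant|3600s" \
    "05-seed-or-init-data-job.yaml|db-seed-or-init|1800s" \
    "06-verify-minimal-job.yaml|db-verify-minimal|900s"
  do
    # Split the entry on "|" with plain parameter expansion.
    manifest="${entry%%|*}"
    rest="${entry#*|}"
    label="${rest%%|*}"
    timeout="${rest#*|}"
    echo "==> ${manifest}"
    kubectl create -f "${JOBS_DIR}/${manifest}" || return 1
    if ! kubectl wait --for=condition=complete job -n "$NAMESPACE" \
         -l "job-type=${label}" --timeout="$timeout"; then
      echo "Job ${label} did not complete; skipping the remaining steps." >&2
      return 1
    fi
  done
}
```

Note that the label selector also matches older Jobs of the same type that still exist, so deleting previous runs first keeps the wait unambiguous.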

### Runtime environment for Ops Jobs
Each Ops Job sets the same non-secret runtime variables required by the shared bootstrap (start.sh/run.py):
- FLASK_APP=/app/scripts/run.py
- COMPONENT_NAME=eveai_ops
- PYTHONUNBUFFERED=1
- LOGLEVEL=debug (for staging)
- ROLE=web
- PORT=8080
- WORKERS=1
- WORKER_CLASS=gevent
- WORKER_CONN=100
- MAX_REQUESTS=1000
- MAX_REQUESTS_JITTER=100

Secrets (DB_*, REDIS_*, etc.) still come from `envFrom: secretRef: eveai-secrets`.
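
To confirm that a created Job actually carries these values, the rendered environment can be compared against the list above. This is a rough sketch: `check_ops_env` is a hypothetical helper that reads NAME=VALUE lines on stdin, and the usage line assumes the Job has a single container (index 0).

```bash
# Hypothetical helper: reads NAME=VALUE lines on stdin and reports every
# expected non-secret variable that is missing or has a different value.
check_ops_env() {
  local actual pair rc=0
  actual="$(cat)"
  for pair in \
    "FLASK_APP=/app/scripts/run.py" "COMPONENT_NAME=eveai_ops" \
    "PYTHONUNBUFFERED=1" "LOGLEVEL=debug" "ROLE=web" "PORT=8080" \
    "WORKERS=1" "WORKER_CLASS=gevent" "WORKER_CONN=100" \
    "MAX_REQUESTS=1000" "MAX_REQUESTS_JITTER=100"
  do
    # -x matches the whole line, -F treats the pair as a literal string.
    if ! printf '%s\n' "$actual" | grep -qxF -- "$pair"; then
      echo "missing or different: $pair"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: `kubectl -n eveai-staging get job <created-job-name> -o jsonpath='{range .spec.template.spec.containers[0].env[*]}{.name}={.value}{"\n"}{end}' | check_ops_env`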

Tip: After pushing a new :staging image, delete any previous Job with the same label to force a fresh Pod and pull:
```bash
kubectl -n eveai-staging delete job -l component=ops,job-type=db-migrate-public || true
kubectl create -f scaleway/manifests/base/applications/ops/jobs/03-db-migrate-public-job.yaml
```

## Step 3: Deploy backend workers
```bash
kubectl apply -k scaleway/manifests/base/applications/backend/

kubectl -n eveai-staging get deploy | grep -E 'eveai-(workers|chat-workers|entitlements)'
# Optional: quick logs
kubectl -n eveai-staging logs deploy/eveai-workers --tail=100 || true
kubectl -n eveai-staging logs deploy/eveai-chat-workers --tail=100 || true
kubectl -n eveai-staging logs deploy/eveai-entitlements --tail=100 || true
```
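
Listing the Deployments does not show whether the Pods became ready. One possible way to block until each rollout finishes is `kubectl rollout status`, sketched below. `wait_for_rollouts` is a hypothetical helper, and the 300s timeout in the usage line is an assumption, not a value taken from the manifests.

```bash
# Hypothetical helper: waits for each named Deployment to finish rolling out.
# Arguments: <namespace> <timeout> <deployment>...
# Checks every Deployment and returns non-zero if any rollout did not finish.
wait_for_rollouts() {
  local ns="$1" timeout="$2" name rc=0
  shift 2
  for name in "$@"; do
    if ! kubectl -n "$ns" rollout status "deploy/${name}" --timeout="$timeout"; then
      echo "rollout not finished: ${name}" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: `wait_for_rollouts eveai-staging 300s eveai-workers eveai-chat-workers eveai-entitlements`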

## Step 4: Deploy frontend services
```bash
kubectl apply -k scaleway/manifests/base/applications/frontend/

kubectl -n eveai-staging get deploy,svc | grep -E 'eveai-(app|api|chat-client)'
```

## Step 5: Verify Ingress routes (Ingress managed separately)
Ingress is intentionally not managed by the staging Kustomize overlay. Apply or update it manually from the existing manifest, following cluster-install.md:
```bash
kubectl apply -f scaleway/manifests/base/networking/ingress-https.yaml
kubectl -n eveai-staging describe ingress eveai-staging-ingress
```
Then verify the routes:
```bash
curl -k https://evie-staging.askeveai.com/verify/health
curl -k https://evie-staging.askeveai.com/admin/healthz/ready
curl -k https://evie-staging.askeveai.com/api/healthz/ready
curl -k https://evie-staging.askeveai.com/client/healthz/ready
```
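
The four requests can also be reduced to a pass/fail summary. The sketch below assumes that a healthy endpoint answers with HTTP 200; the guide does not state the expected status code, so adjust as needed. `check_routes` is a hypothetical helper name.

```bash
BASE_URL="https://evie-staging.askeveai.com"

# Hypothetical helper: requests each health route and prints ok/FAIL per path.
# Assumption: a healthy route returns HTTP 200.
check_routes() {
  local path code rc=0
  for path in /verify/health /admin/healthz/ready /api/healthz/ready /client/healthz/ready; do
    # -o /dev/null discards the body; -w prints only the status code.
    code="$(curl -k -s -o /dev/null -w '%{http_code}' "${BASE_URL}${path}")"
    if [ "$code" = "200" ]; then
      echo "ok   ${path}"
    else
      echo "FAIL ${path} (HTTP ${code})"
      rc=1
    fi
  done
  return "$rc"
}
```

Run `check_routes` after applying the Ingress; a non-zero exit status means at least one route did not return 200.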

## Resources and probes (staging defaults)
- Web (app, api, chat-client):
  - requests: 150m CPU, 256Mi RAM; limits: 500m CPU, 512Mi RAM; replicas: 1
  - readiness/liveness: GET /healthz/ready
- Workers:
  - eveai-workers: requests: 200m CPU, 512Mi RAM; limits: 1 CPU, 1Gi RAM
  - eveai-chat-workers: requests: 500m CPU, 1Gi RAM; limits: 2 CPU, 3Gi RAM
  - eveai-entitlements: requests: 100m CPU, 256Mi RAM; limits: 500m CPU, 512Mi RAM
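
To compare these defaults with what is actually deployed, the requests and limits can be printed per Deployment. This is a sketch using kubectl's custom-columns output; `show_resources` is a hypothetical helper and the column names are arbitrary.

```bash
# Hypothetical helper: prints replicas, requests and limits for every
# Deployment in the namespace (default: eveai-staging).
show_resources() {
  local ns="${1:-eveai-staging}"
  local c='.spec.template.spec.containers[*].resources'
  local cols="NAME:.metadata.name,REPLICAS:.spec.replicas"
  cols="${cols},CPU_REQ:${c}.requests.cpu,MEM_REQ:${c}.requests.memory"
  cols="${cols},CPU_LIM:${c}.limits.cpu,MEM_LIM:${c}.limits.memory"
  kubectl -n "$ns" get deploy -o "custom-columns=${cols}"
}
```

Usage: `show_resources` or `show_resources <namespace>`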

## Pushgateway usage
- Ensure PUSH_GATEWAY_HOST and PUSH_GATEWAY_PORT are provided (e.g., pushgateway.monitoring.svc.cluster.local:9091), typically via eveai-secrets or a ConfigMap.
- Apps will continue to push business metrics; Prometheus scrapes the Pushgateway.
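
A quick way to confirm that the configured host and port are reachable is to push a throwaway metric by hand. The sketch below uses the Pushgateway's HTTP push endpoint (`/metrics/job/<job>`); the metric and job name `eveai_smoke_test` are made up for illustration, and `push_smoke_metric` is a hypothetical helper. Run it from somewhere that can resolve the cluster-internal hostname, for example a Pod in the cluster.

```bash
# Hypothetical helper: pushes one sample in Prometheus text format to the
# Pushgateway configured via PUSH_GATEWAY_HOST / PUSH_GATEWAY_PORT.
push_smoke_metric() {
  if [ -z "${PUSH_GATEWAY_HOST:-}" ] || [ -z "${PUSH_GATEWAY_PORT:-}" ]; then
    echo "PUSH_GATEWAY_HOST and PUSH_GATEWAY_PORT must be set" >&2
    return 1
  fi
  # eveai_smoke_test is an illustrative metric/job name, not one the apps use.
  printf 'eveai_smoke_test 1\n' |
    curl -s --fail --data-binary @- \
      "http://${PUSH_GATEWAY_HOST}:${PUSH_GATEWAY_PORT}/metrics/job/eveai_smoke_test"
}
```

If the push succeeds, the sample should appear on the Pushgateway's own /metrics page until it is deleted.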

## Image tags strategy (staging/production channels)
- The push script now creates and pushes two tags per service:
  - A versioned tag: :vX.Y.Z (e.g., :v1.2.3)
  - An environment channel tag based on ENVIRONMENT: :staging or :production
- Recommendation for staging manifests:
  - Refer to the channel tag (e.g., rg.fr-par.scw.cloud/eveai-staging/...:staging) and set imagePullPolicy: Always so new pushes are picked up without manifest changes.
- Production can later use immutable version tags or digests via a production overlay.
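
The two-tag scheme can be illustrated with a small function that derives both references for one image. This is not the actual push script, only a sketch of the naming rule described above; `image_refs` is a hypothetical helper and the repository path is passed in by the caller.

```bash
# Hypothetical helper: prints the versioned tag and the channel tag for one
# image repository. Arguments: <repository> <X.Y.Z> <staging|production>
image_refs() {
  local repo="$1" version="$2" environment="$3"
  case "$environment" in
    staging|production) ;;
    *) echo "ENVIRONMENT must be staging or production" >&2; return 1 ;;
  esac
  echo "${repo}:v${version}"       # immutable version tag, e.g. :v1.2.3
  echo "${repo}:${environment}"    # moving channel tag, e.g. :staging
}
```

A push script could then loop over the output and run `docker tag` and `docker push` for each reference.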

## Bunny.net WAF (TODO)
- Configure Pull Zone for evie-staging.askeveai.com
- Set Origin to the LoadBalancer IP with HTTPS and Host header evie-staging.askeveai.com
- Define rate limits primarily on /api, looser on /client; enable bot filtering
- Only switch DNS (CNAME) to Bunny after TLS issuance has completed directly against the LoadBalancer
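
Before switching DNS, the origin can be tested directly by pinning the hostname to the LoadBalancer IP with curl's `--resolve` option. The sketch below deliberately omits `-k`, so the request only succeeds if the certificate served by the LoadBalancer validates. `check_origin` is a hypothetical helper, and the IP is whatever the Ingress controller reports.

```bash
# Hypothetical helper: requests /verify/health from the LoadBalancer itself,
# bypassing DNS. Argument: <LoadBalancer IP>
check_origin() {
  local lb_ip="${1:-}" host="evie-staging.askeveai.com"
  if [ -z "$lb_ip" ]; then
    echo "usage: check_origin <loadbalancer-ip>" >&2
    return 1
  fi
  # --resolve maps host:443 to the given IP for this request only, so TLS
  # (SNI and certificate validation) still uses the real hostname.
  curl -sS --fail -o /dev/null \
    --resolve "${host}:443:${lb_ip}" "https://${host}/verify/health"
}
```

Usage: `check_origin <loadbalancer-ip> && echo "origin serves a valid certificate"`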

## Troubleshooting
```bash
kubectl get all -n eveai-staging
kubectl get events -n eveai-staging --sort-by=.lastTimestamp
kubectl describe ingress eveai-staging-ingress -n eveai-staging
kubectl logs -n eveai-staging deploy/eveai-api --tail=200
```

## Rollback / Cleanup
```bash
# Remove frontend/backend (keeps verification and other base resources)
kubectl delete -k scaleway/manifests/base/applications/frontend/
kubectl delete -k scaleway/manifests/base/applications/backend/

# Finished Jobs remain until their ttlSecondsAfterFinished expires; to delete them immediately:
kubectl -n eveai-staging delete jobs --all
```


## Private registry (Scaleway)
1) Create docker pull secret via External Secrets (once):
```bash
kubectl apply -f scaleway/manifests/base/secrets/scaleway-registry-secret.yaml
kubectl -n eveai-staging get secret scaleway-registry-cred -o yaml | grep "type: kubernetes.io/dockerconfigjson"
```
2) Use the staging overlay to deploy apps with registry rewrite and imagePullSecrets:
```bash
kubectl apply -k scaleway/manifests/overlays/staging/
```
Notes:
- Base manifests keep generic images (josakola/...). The overlay rewrites them to rg.fr-par.scw.cloud/eveai-staging/josakola/...:staging and adds imagePullSecrets to all Pods.
- Staging uses imagePullPolicy: Always, so new pushes to :staging are pulled automatically.
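
If Pods report ImagePullBackOff, one thing worth checking is whether the pull secret really contains credentials for the Scaleway registry host. The following sketch decodes the secret's `.dockerconfigjson` payload and searches it for the host name; `dockerconfig_has_registry` is a hypothetical helper, and it only checks that the host appears, not that the credentials are valid.

```bash
# Hypothetical helper: reads a base64-encoded .dockerconfigjson on stdin and
# succeeds if it mentions the given registry host.
# Argument: [registry host] (default: rg.fr-par.scw.cloud)
dockerconfig_has_registry() {
  local registry="${1:-rg.fr-par.scw.cloud}"
  base64 -d | grep -qF -- "$registry"
}
```

Usage: `kubectl -n eveai-staging get secret scaleway-registry-cred -o jsonpath='{.data.\.dockerconfigjson}' | dockerconfig_has_registry && echo "registry entry present"`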