- Definition and Improvements to job-system

- Definition of k8s pods for application services
2025-09-04 11:49:19 +02:00
parent 2a0c92b064
commit af8b5f54cd
16 changed files with 352 additions and 48 deletions
--- a/Setup/phase-8-application-services.md
+++ b/Setup/phase-8-application-services.md
@@ -30,7 +30,12 @@ Manifests are under:
 - scaleway/manifests/base/applications/frontend/
 - scaleway/manifests/base/applications/backend/
 - scaleway/manifests/base/applications/ops/jobs/
- Aggregate kustomization: scaleway/manifests/base/applications/kustomization.yaml
+- Aggregate kustomization (apps only): scaleway/manifests/base/applications/kustomization.yaml
+
+Note:
+- The staging Kustomize overlay deploys only frontend and backend apps.
+- Ingress is still managed manually: apply scaleway/manifests/base/networking/ingress-https.yaml as described in the cluster-install.md guide.
+- Ops Jobs are not part of the overlay and should be executed manually with kubectl create -f.

 ## Step 1: Validate secrets
 ```bash
@@ -41,6 +46,12 @@ Confirm presence of DB_*, REDIS_*, OPENAI_API_KEY, MISTRAL_API_KEY, JWT_SECRET_K

 ## Step 2: Deploy Ops Jobs (manual pre-deploy)
 Run the DB ops scripts manually in order. Each manifest uses generateName; use kubectl create.
+
+Notes for images:
+- Ops Jobs now reference the private Scaleway registry directly and set imagePullPolicy: Always.
+- Ensure the Docker pull secret (scaleway-registry-cred) exists; see the Private registry (Scaleway) section.
+- After pushing a new :staging image, delete any previous Job (if present) and create a new one to force a fresh Pod pull.
+
 ```bash
 kubectl create -f scaleway/manifests/base/applications/ops/jobs/00-env-check-job.yaml
 kubectl wait --for=condition=complete job -n eveai-staging -l job-type=env-check --timeout=600s
@@ -66,6 +77,28 @@ kubectl -n eveai-staging get jobs
 kubectl -n eveai-staging logs job/<created-job-name>
 ```

+### Runtime environment for Ops Jobs
+Each Ops Job sets the same non-secret runtime variables required by the shared bootstrap (start.sh/run.py):
+- FLASK_APP=/app/scripts/run.py
+- COMPONENT_NAME=eveai_ops
+- PYTHONUNBUFFERED=1
+- LOGLEVEL=debug (for staging)
+- ROLE=web
+- PORT=8080
+- WORKERS=1
+- WORKER_CLASS=gevent
+- WORKER_CONN=100
+- MAX_REQUESTS=1000
+- MAX_REQUESTS_JITTER=100
+
+Secrets (DB_*, REDIS_*, etc.) still come from `envFrom: secretRef: eveai-secrets`.
+
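+As a rough sketch, a check like the following could confirm inside a Job container that the non-secret variables listed above are set. The function name check_runtime_env is hypothetical; it is not part of start.sh, run.py, or the 00-env-check Job, whose contents are not shown here.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper (an assumption, not from the repository):
+# report which of the documented runtime variables are unset or empty.
+check_runtime_env() {
+  local missing=0 name
+  for name in FLASK_APP COMPONENT_NAME PYTHONUNBUFFERED LOGLEVEL ROLE PORT \
+              WORKERS WORKER_CLASS WORKER_CONN MAX_REQUESTS MAX_REQUESTS_JITTER; do
+    # ${!name} is bash indirect expansion: the value of the variable named by $name.
+    if [ -z "${!name:-}" ]; then
+      echo "missing: ${name}" >&2
+      missing=1
+    fi
+  done
+  # Exit status 0 if every variable is set, 1 otherwise.
+  return "${missing}"
+}
+```
+
+Calling check_runtime_env before the main command would make a misconfigured Job fail early with the names of the missing variables on stderr.
+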
+Tip: After pushing a new :staging image, delete any previous Job with the same labels so that the new Job starts a fresh Pod and pulls the image again:
+```bash
+kubectl -n eveai-staging delete job -l component=ops,job-type=db-migrate-public || true
+kubectl create -f scaleway/manifests/base/applications/ops/jobs/03-db-migrate-public-job.yaml
+```
+
 ## Step 3: Deploy backend workers
 ```bash
 kubectl apply -k scaleway/manifests/base/applications/backend/
@@ -84,11 +117,14 @@ kubectl apply -k scaleway/manifests/base/applications/frontend/
 kubectl -n eveai-staging get deploy,svc | egrep 'eveai-(app|api|chat-client)'
 ```

-## Step 5: Verify Ingress routes
-The HTTPS ingress has paths enabled for /admin, /api, /client. Verify:
+## Step 5: Verify Ingress routes (Ingress managed separately)
+Ingress is intentionally not managed by the staging Kustomize overlay. Apply or update it manually from the existing manifest, following the cluster-install.md guide:
 ```bash
+kubectl apply -f scaleway/manifests/base/networking/ingress-https.yaml
 kubectl -n eveai-staging describe ingress eveai-staging-ingress
-
+```
+Then verify the routes:
+```bash
 curl -k https://evie-staging.askeveai.com/verify/health
 curl -k https://evie-staging.askeveai.com/admin/healthz/ready
 curl -k https://evie-staging.askeveai.com/api/healthz/ready
@@ -108,6 +144,16 @@ curl -k https://evie-staging.askeveai.com/client/healthz/ready
 - Ensure PUSH_GATEWAY_HOST and PUSH_GATEWAY_PORT are provided (e.g., pushgateway.monitoring.svc.cluster.local:9091), typically via eveai-secrets or a ConfigMap.
 - Apps will continue to push business metrics; Prometheus scrapes the Pushgateway.

+## Image tags strategy (staging/production channels)
+- The push script now creates and pushes two tags per service:
+  - A versioned tag: :vX.Y.Z (e.g., :v1.2.3)
+  - An environment channel tag based on ENVIRONMENT: :staging or :production
+- Recommendation for staging manifests:
+  - Reference the channel tag (e.g., rg.fr-par.scw.cloud/eveai-staging/...:staging) and set imagePullPolicy: Always so that newly pushed images are pulled the next time a Pod starts, without manifest changes.
+- Production can later use immutable version tags or digests via a production overlay.
+
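+The two-tag scheme above can be sketched as follows. This is an illustration only, not the actual push script: the function name image_refs, its arguments, and the placeholder repository path are assumptions.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical sketch: print the two image references pushed for one service.
+# Arguments: repository path (including the service name), version (X.Y.Z), environment.
+image_refs() {
+  local repo="$1" version="$2" environment="$3"
+  # Only the two channels named in this document are accepted.
+  case "${environment}" in
+    staging|production) ;;
+    *) echo "unsupported ENVIRONMENT: ${environment}" >&2; return 1 ;;
+  esac
+  echo "${repo}:v${version}"      # versioned tag, e.g. :v1.2.3
+  echo "${repo}:${environment}"   # environment channel tag, :staging or :production
+}
+
+# Example with a placeholder repository path:
+image_refs "registry.example/eveai-staging/my-service" "1.2.3" "staging"
+# prints:
+#   registry.example/eveai-staging/my-service:v1.2.3
+#   registry.example/eveai-staging/my-service:staging
+```
+
+The versioned reference stays fixed once pushed, while the channel reference moves with every push, which is why staging manifests that follow the channel tag also need imagePullPolicy: Always.
+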
 ## Bunny.net WAF (TODO)
 - Configure Pull Zone for evie-staging.askeveai.com
 - Set Origin to the LoadBalancer IP with HTTPS and Host header evie-staging.askeveai.com
@@ -131,3 +177,18 @@ kubectl delete -k scaleway/manifests/base/applications/backend/
 # Jobs are kept for history due to ttlSecondsAfterFinished; to delete immediately:
 kubectl -n eveai-staging delete jobs --all
 ```
+
+
+## Private registry (Scaleway)
+1) Create the Docker pull secret via External Secrets (once), then confirm that it has the dockerconfigjson type:
+```bash
+kubectl apply -f scaleway/manifests/base/secrets/scaleway-registry-secret.yaml
+kubectl -n eveai-staging get secret scaleway-registry-cred -o yaml | grep "type: kubernetes.io/dockerconfigjson"
+```
+2) Use the staging overlay to deploy apps with registry rewrite and imagePullSecrets:
+```bash
+kubectl apply -k scaleway/manifests/overlays/staging/
+```
+Notes:
+- Base manifests keep generic images (josakola/...). The overlay rewrites them to rg.fr-par.scw.cloud/eveai-staging/josakola/...:staging and adds imagePullSecrets to all Pods.
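+- Staging uses imagePullPolicy: Always, so the latest :staging image is pulled whenever a Pod is created. A push alone does not update running Pods; restart them (e.g., with kubectl rollout restart) to pick up a new image.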