- Definition and Improvements to job-system

- Definition of k8s pods for application services
Josako
2025-09-04 11:49:19 +02:00
parent 2a0c92b064
commit af8b5f54cd
16 changed files with 352 additions and 48 deletions


@@ -30,7 +30,12 @@ Manifests are under:
- scaleway/manifests/base/applications/frontend/
- scaleway/manifests/base/applications/backend/
- scaleway/manifests/base/applications/ops/jobs/
- Aggregate kustomization (apps only): scaleway/manifests/base/applications/kustomization.yaml
Note:
- The staging Kustomize overlay deploys only frontend and backend apps.
- Ingress remains managed manually via scaleway/manifests/base/networking/ingress-https.yaml and your cluster-install.md guide.
- Ops Jobs are not part of the overlay and should be executed manually with kubectl create -f.
## Step 1: Validate secrets
```bash
@@ -41,6 +46,12 @@ Confirm presence of DB_*, REDIS_*, OPENAI_API_KEY, MISTRAL_API_KEY, JWT_SECRET_K
## Step 2: Deploy Ops Jobs (manual pre-deploy)
Run the DB ops scripts manually, in order. Each manifest uses `generateName`, so use `kubectl create` (`kubectl apply` requires a fixed `metadata.name`).
Notes for images:
- Ops Jobs now reference the private Scaleway registry directly and set imagePullPolicy: Always.
- Ensure the Docker pull secret `scaleway-registry-cred` exists; see the Private registry section.
- After pushing a new :staging image, delete any previous Job (if present) and create a new one to force a fresh Pod pull.
```bash
kubectl create -f scaleway/manifests/base/applications/ops/jobs/00-env-check-job.yaml
kubectl wait --for=condition=complete job -n eveai-staging -l job-type=env-check --timeout=600s
@@ -66,6 +77,28 @@ kubectl -n eveai-staging get jobs
kubectl -n eveai-staging logs job/<created-job-name>
```
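The ordered run above can be scripted. The sketch below is a hypothetical helper, not part of the repository. It assumes the numeric filename prefixes (00-, 03-, ...) define the execution order and that the manifests target `eveai-staging`. Because each manifest uses `generateName`, it captures the generated name with `kubectl create -o name` instead of relying on labels:

```bash
# Sketch (hypothetical helper): create each Ops Job in filename order and
# block until it completes. Assumes numeric prefixes define the run order.
NAMESPACE="${NAMESPACE:-eveai-staging}"
JOBS_DIR="${JOBS_DIR:-scaleway/manifests/base/applications/ops/jobs}"

run_ops_jobs() {
  local manifest job
  # Pathname expansion is sorted, so 00-... runs before 03-...
  for manifest in "$JOBS_DIR"/*.yaml; do
    [[ -e "$manifest" ]] || { echo "no manifests in $JOBS_DIR" >&2; return 1; }
    # -o name prints job.batch/<generated-name>, resolving the generateName suffix.
    job=$(kubectl -n "$NAMESPACE" create -f "$manifest" -o name) || return 1
    echo "created $job"
    if ! kubectl -n "$NAMESPACE" wait --for=condition=complete "$job" --timeout=600s; then
      echo "$job did not complete; last log lines:" >&2
      kubectl -n "$NAMESPACE" logs "$job" --tail=50 >&2
      return 1
    fi
  done
}
```

Run `run_ops_jobs` from the repository root. It stops at the first Job that fails or times out and prints that Job's last log lines.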
### Runtime environment for Ops Jobs
Each Ops Job sets the same non-secret runtime variables required by the shared bootstrap (start.sh/run.py):
- FLASK_APP=/app/scripts/run.py
- COMPONENT_NAME=eveai_ops
- PYTHONUNBUFFERED=1
- LOGLEVEL=debug (for staging)
- ROLE=web
- PORT=8080
- WORKERS=1
- WORKER_CLASS=gevent
- WORKER_CONN=100
- MAX_REQUESTS=1000
- MAX_REQUESTS_JITTER=100
Secrets (DB_*, REDIS_*, etc.) still come from `envFrom` with a `secretRef` to `eveai-secrets`.
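You can confirm that a created Job actually carries these variables. The hypothetical helper below reads the env var names from the Job's Pod template with a JSONPath query. It assumes the ops container is the first container in the template, and it checks names only, not values:

```bash
# Sketch (hypothetical helper): verify the non-secret bootstrap variables
# are declared on a created Ops Job. Assumes containers[0] is the ops container.
REQUIRED_OPS_ENV=(FLASK_APP COMPONENT_NAME PYTHONUNBUFFERED LOGLEVEL ROLE PORT
  WORKERS WORKER_CLASS WORKER_CONN MAX_REQUESTS MAX_REQUESTS_JITTER)

check_ops_job_env() {
  local job="$1" names var missing=()
  # One env var name per line from the Job's Pod template.
  names=$(kubectl -n "${NAMESPACE:-eveai-staging}" get job "$job" \
    -o jsonpath='{range .spec.template.spec.containers[0].env[*]}{.name}{"\n"}{end}') || return 1
  for var in "${REQUIRED_OPS_ENV[@]}"; do
    grep -qx "$var" <<<"$names" || missing+=("$var")
  done
  if (( ${#missing[@]} )); then
    echo "missing: ${missing[*]}"
    return 1
  fi
  echo "ok"
}
```

`check_ops_job_env <created-job-name>` prints `ok` or lists the missing names.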
Tip: After pushing a new :staging image, delete any previous Job with the same labels so that a fresh Pod is created and pulls the new image:
```bash
kubectl -n eveai-staging delete job -l component=ops,job-type=db-migrate-public || true
kubectl create -f scaleway/manifests/base/applications/ops/jobs/03-db-migrate-public-job.yaml
```
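The same delete-and-recreate cycle can be extended to wait for completion and show the tail of the logs. `recreate_ops_job` is a hypothetical helper. The label keys `component` and `job-type` come from the command above:

```bash
# Sketch (hypothetical helper): re-run one Ops Job after pushing a new image.
# Usage: recreate_ops_job db-migrate-public \
#   scaleway/manifests/base/applications/ops/jobs/03-db-migrate-public-job.yaml
recreate_ops_job() {
  local job_type="$1" manifest="$2" ns="${NAMESPACE:-eveai-staging}" job
  # A selector that matches nothing only prints "No resources found";
  # real errors (e.g. RBAC) still stop the helper because there is no "|| true".
  kubectl -n "$ns" delete job -l "component=ops,job-type=${job_type}" || return 1
  job=$(kubectl -n "$ns" create -f "$manifest" -o name) || return 1
  kubectl -n "$ns" wait --for=condition=complete "$job" --timeout=600s || return 1
  kubectl -n "$ns" logs "$job" --tail=20
}
```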
## Step 3: Deploy backend workers
```bash
kubectl apply -k scaleway/manifests/base/applications/backend/
@@ -84,11 +117,14 @@ kubectl apply -k scaleway/manifests/base/applications/frontend/
kubectl -n eveai-staging get deploy,svc | egrep 'eveai-(app|api|chat-client)'
```
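Listing the Deployments does not show whether the new Pods are ready. The sketch below waits on each rollout. It assumes the Deployments are named exactly `eveai-app`, `eveai-api` and `eveai-chat-client`, which is inferred from the `egrep` pattern above:

```bash
# Sketch (hypothetical helper): wait for each app Deployment to finish rolling out.
# Deployment names are an assumption based on the egrep pattern above.
wait_for_apps() {
  local ns="${NAMESPACE:-eveai-staging}" deploy failed=0
  for deploy in eveai-app eveai-api eveai-chat-client; do
    if ! kubectl -n "$ns" rollout status "deployment/${deploy}" --timeout=300s; then
      echo "rollout of ${deploy} did not finish" >&2
      failed=1
    fi
  done
  return "$failed"
}
```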
## Step 5: Verify Ingress routes (Ingress managed separately)
Ingress is intentionally not managed by the staging Kustomize overlay. The HTTPS ingress has paths enabled for /admin, /api and /client. Apply or update it manually from the existing manifest, following your cluster-install.md guide:
```bash
kubectl apply -f scaleway/manifests/base/networking/ingress-https.yaml
kubectl -n eveai-staging describe ingress eveai-staging-ingress
```
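Before running the health checks, you can compare the declared paths with the expected prefixes. This hypothetical helper uses a prefix match, so regex-style paths such as `/admin(/|$)(.*)` also count. The Ingress name defaults to `eveai-staging-ingress`, as in the `describe` command above:

```bash
# Sketch (hypothetical helper): flag expected path prefixes missing from the Ingress.
check_ingress_paths() {
  local ns="${NAMESPACE:-eveai-staging}" ing="${1:-eveai-staging-ingress}" paths p missing=0
  # One path per line across all rules of the Ingress.
  paths=$(kubectl -n "$ns" get ingress "$ing" \
    -o jsonpath='{range .spec.rules[*].http.paths[*]}{.path}{"\n"}{end}') || return 1
  for p in /admin /api /client; do
    if ! grep -q "^${p}" <<<"$paths"; then
      echo "missing path: $p"
      missing=1
    fi
  done
  return "$missing"
}
```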
Then verify the routes:
```bash
curl -k https://evie-staging.askeveai.com/verify/health
curl -k https://evie-staging.askeveai.com/admin/healthz/ready
curl -k https://evie-staging.askeveai.com/api/healthz/ready
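# Sketch (hypothetical helper): run the four health checks in one go and fail
# unless every endpoint returns a 2xx status. --max-time 10 is an arbitrary
# choice; -k mirrors the commands above. Usage: check_health [base-url]
check_health() {
  local base="${1:-https://evie-staging.askeveai.com}" path code failed=0
  for path in /verify/health /admin/healthz/ready /api/healthz/ready /client/healthz/ready; do
    # -w prints the status code even when the connection fails (000).
    code=$(curl -k -s -o /dev/null -w '%{http_code}' --max-time 10 "${base}${path}") || true
    echo "${path} ${code}"
    [[ "$code" == 2* ]] || failed=1
  done
  return "$failed"
}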
@@ -108,6 +144,16 @@ curl -k https://evie-staging.askeveai.com/client/healthz/ready
- Ensure PUSH_GATEWAY_HOST and PUSH_GATEWAY_PORT are provided (e.g., pushgateway.monitoring.svc.cluster.local:9091), typically via eveai-secrets or a ConfigMap.
- Apps will continue to push business metrics; Prometheus scrapes the Pushgateway.
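The Pushgateway accepts pushes in the Prometheus text format on `/metrics/job/<job>`. The sketch below builds that URL from the two variables and pushes a single test sample. Run it from a Pod inside the cluster, where the `svc.cluster.local` name resolves. The metric name `eveai_smoke_test_timestamp` and the default job name are hypothetical:

```bash
# Sketch (hypothetical helpers): build the Pushgateway push URL from
# PUSH_GATEWAY_HOST/PORT and push one test sample to it.
pushgateway_url() {
  local job="$1"
  if [[ -z "${PUSH_GATEWAY_HOST:-}" || -z "${PUSH_GATEWAY_PORT:-}" ]]; then
    echo "PUSH_GATEWAY_HOST and PUSH_GATEWAY_PORT must be set" >&2
    return 1
  fi
  printf 'http://%s:%s/metrics/job/%s\n' "$PUSH_GATEWAY_HOST" "$PUSH_GATEWAY_PORT" "$job"
}

push_smoke_metric() {
  local url
  url=$(pushgateway_url "${1:-eveai_smoke_test}") || return 1
  # curl --data-binary sends a POST; the body is one text-format sample line.
  printf 'eveai_smoke_test_timestamp %s\n' "$(date +%s)" | curl -fsS --data-binary @- "$url"
}
```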
## Image tags strategy (staging/production channels)
- The push script now creates and pushes two tags per service:
  - A versioned tag: :vX.Y.Z (e.g., :v1.2.3)
  - An environment channel tag based on ENVIRONMENT: :staging or :production
- Recommendation for staging manifests:
  - Refer to the channel tag (e.g., rg.fr-par.scw.cloud/eveai-staging/...:staging) and set imagePullPolicy: Always, so that newly started Pods pull the latest push without manifest changes. Running Pods are not updated; restart the Deployment (e.g., `kubectl -n eveai-staging rollout restart deployment/<name>`) to pick up a new push.
- Production can later use immutable version tags or digests via a production overlay.
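The tagging scheme can be sketched as follows. This is an illustration under assumptions, not the push script itself: `image_tags` and `push_image` are hypothetical names, and the version check accepts only the `vX.Y.Z` form described above:

```bash
# Sketch (hypothetical helpers): derive the versioned and channel tags, then
# tag and push both for one locally built image.
image_tags() {
  local image="$1" version="$2" env="${ENVIRONMENT:-}"
  local re='^v[0-9]+\.[0-9]+\.[0-9]+$'
  if [[ ! $version =~ $re ]]; then
    echo "version must look like vX.Y.Z, got '$version'" >&2; return 1
  fi
  if [[ "$env" != staging && "$env" != production ]]; then
    echo "ENVIRONMENT must be staging or production, got '$env'" >&2; return 1
  fi
  printf '%s:%s\n%s:%s\n' "$image" "$version" "$image" "$env"
}

push_image() {
  local src="$1" image="$2" version="$3" refs ref
  refs=$(image_tags "$image" "$version") || return 1
  for ref in $refs; do
    docker tag "$src" "$ref" && docker push "$ref" || return 1
  done
}
```

For example, `ENVIRONMENT=staging push_image <local-image> rg.fr-par.scw.cloud/eveai-staging/josakola/<service> v1.2.3` would push both `:v1.2.3` and `:staging`.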
## Bunny.net WAF (TODO)
- Configure Pull Zone for evie-staging.askeveai.com
- Set Origin to the LoadBalancer IP with HTTPS and Host header evie-staging.askeveai.com
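Once the LoadBalancer IP is known, you can check the origin the way Bunny.net will reach it: HTTPS to the IP, with `evie-staging.askeveai.com` as both SNI and Host header. curl's `--resolve` option provides that pinning. `check_origin` is a hypothetical helper, and you must supply the IP yourself:

```bash
# Sketch (hypothetical helper): hit the origin by IP while presenting the
# staging host name, as a CDN pull zone would. Prints the HTTP status code.
check_origin() {
  local lb_ip="$1" path="${2:-/verify/health}" host="evie-staging.askeveai.com"
  [[ -n "$lb_ip" ]] || { echo "usage: check_origin <lb-ip> [path]" >&2; return 1; }
  # --resolve maps host:443 to the given IP, so SNI and Host stay correct.
  curl -k -sS -o /dev/null -w '%{http_code}\n' --max-time 10 \
    --resolve "${host}:443:${lb_ip}" "https://${host}${path}"
}
```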
@@ -131,3 +177,18 @@ kubectl delete -k scaleway/manifests/base/applications/backend/
# Finished Jobs are kept until their ttlSecondsAfterFinished expires; to delete them immediately:
kubectl -n eveai-staging delete jobs --all
```
## Private registry (Scaleway)
1) Create docker pull secret via External Secrets (once):
```bash
kubectl apply -f scaleway/manifests/base/secrets/scaleway-registry-secret.yaml
kubectl -n eveai-staging get secret scaleway-registry-cred -o yaml | grep "type: kubernetes.io/dockerconfigjson"
```
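External Secrets creates the Secret asynchronously, so the check above can run before the Secret exists. The polling sketch below waits for the Secret and checks its `type` with JSONPath instead of `grep`. The limit of 24 polls at 5-second intervals is arbitrary:

```bash
# Sketch (hypothetical helper): wait for the pull secret and verify its type.
wait_for_pull_secret() {
  local ns="${NAMESPACE:-eveai-staging}" name="${1:-scaleway-registry-cred}" secret_type i
  for ((i = 0; i < 24; i++)); do
    secret_type=$(kubectl -n "$ns" get secret "$name" -o jsonpath='{.type}' 2>/dev/null) || secret_type=""
    if [[ "$secret_type" == kubernetes.io/dockerconfigjson ]]; then
      echo "$name ready"; return 0
    fi
    # The Secret exists but has the wrong type: waiting will not fix that.
    [[ -n "$secret_type" ]] && { echo "$name has unexpected type $secret_type" >&2; return 1; }
    sleep "${POLL_SECONDS:-5}"
  done
  echo "$name not found after waiting" >&2
  return 1
}
```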
2) Use the staging overlay to deploy apps with registry rewrite and imagePullSecrets:
```bash
kubectl apply -k scaleway/manifests/overlays/staging/
```
Notes:
- Base manifests keep generic images (josakola/...). The overlay rewrites them to rg.fr-par.scw.cloud/eveai-staging/josakola/...:staging and adds imagePullSecrets to all Pods.
- Staging uses imagePullPolicy: Always, so every newly started Pod pulls the current :staging image; restart the Deployments after a push so running workloads pick it up.
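To confirm the rewrite before applying, you can render the overlay locally with `kubectl kustomize` (no cluster changes) and scan it for images that still point outside the private registry. The line-based parsing below is a simplification. It assumes one `image:` key per line, which is how kustomize emits it:

```bash
# Sketch (hypothetical helper): read rendered YAML on stdin and print any
# image reference that does not start with the private staging registry.
REGISTRY_PREFIX="rg.fr-par.scw.cloud/eveai-staging/"

find_unrewritten_images() {
  local line img bad=0
  local re='^[[:space:]-]*image:[[:space:]]*"?([^"[:space:]]+)'
  while IFS= read -r line || [[ -n "$line" ]]; do
    if [[ $line =~ $re ]]; then
      img="${BASH_REMATCH[1]}"
      if [[ "$img" != "$REGISTRY_PREFIX"* ]]; then
        echo "$img"
        bad=1
      fi
    fi
  done
  return "$bad"
}
```

Usage: `kubectl kustomize scaleway/manifests/overlays/staging/ | find_unrewritten_images && echo ok`. The helper prints nothing and exits 0 when every image was rewritten.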