- eveai_chat_client updated to retrieve static files from the correct (bunny.net) location when a STATIC_URL is defined.

- Defined explicit storage locations for CrewAI crew memory; the defaults failed in k8s.
- Redis connection for pub/sub in ExecutionProgressTracker adapted to support TLS-enabled connections.
Josako
2025-09-12 10:18:43 +02:00
parent a325fa5084
commit 42cb1de0fd
15 changed files with 306 additions and 50 deletions


@@ -612,6 +612,12 @@ kubectl -n eveai-staging get jobs
kubectl -n eveai-staging logs job/<created-job-name>
```
#### Creating a volume for eveai_chat_worker's CrewAI storage
```bash
kubectl apply -n eveai-staging -f scaleway/manifests/base/applications/backend/eveai-chat-workers/pvc.yaml
```
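For reference, a minimal sketch of what such a PVC manifest might contain; the access mode, size, and storage class are assumptions, and the repo's pvc.yaml is authoritative. The PVC name matches the verification commands in the addendum below:
```yaml
# Hypothetical sketch of pvc.yaml (applied with -n eveai-staging).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: eveai-chat-workers-logs   # name as used by the verification commands below
spec:
  accessModes:
    - ReadWriteOnce               # assumed: single-node read-write is sufficient
  resources:
    requests:
      storage: 10Gi               # assumed size; adjust to expected storage volume
```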
#### Application Services Deployment
Use the staging overlay to deploy apps with registry rewrite and imagePullSecrets:
```bash
@@ -861,3 +867,63 @@ curl https://evie-staging.askeveai.com/verify/
## EveAI Chat Workers: Persistent logs storage and Celery process behavior
This addendum describes how to enable persistent storage for CrewAI tuning runs under /app/logs for the eveai-chat-workers Deployment and clarifies Celery process behavior relevant to environment variables.
### Celery prefork behavior and env variables
- Pool: prefork (default). Each worker process (child) handles multiple tasks sequentially.
- Implication: any environment variable changed inside a child process persists for subsequent tasks handled by that same child, until it is changed again or the process is recycled.
- Our practice: set required env vars (e.g., CREWAI_STORAGE_DIR/CREWAI_STORAGE_PATH) immediately before initializing CrewAI and restore them immediately after. This prevents leakage to the next task in the same process.
- CELERY_MAX_TASKS_PER_CHILD: the number of tasks a child processes before being recycled. Suggested starting range for heavy LLM/RAG workloads: 200-500; 1000 is acceptable if memory growth is stable. Monitor RSS and adjust.
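As a hedged illustration, these knobs can be surfaced as container environment variables on the worker Deployment; the values below are starting points derived from the bullets above, not repo settings:
```yaml
# Hypothetical env excerpt for the eveai-chat-workers container.
env:
  - name: CELERY_CONCURRENCY
    value: "4"      # number of prefork child processes
  - name: CELERY_MAX_TASKS_PER_CHILD
    value: "500"    # recycle each child after 500 tasks; start in the 200-500 range
```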
### Create and mount a PersistentVolumeClaim for /app/logs
We persist tuning outputs under /app/logs by mounting a PVC in the worker pod.
Manifests added/updated (namespace: eveai-staging):
- scaleway/manifests/base/applications/backend/eveai-chat-workers/pvc.yaml
- scaleway/manifests/base/applications/backend/eveai-chat-workers/deployment.yaml (volume mount added)
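As a sketch of the Deployment change (the volume and container names are assumptions; the repo manifest is authoritative):
```yaml
# Hypothetical excerpt from deployment.yaml: mount the logs PVC at /app/logs.
spec:
  template:
    spec:
      containers:
        - name: eveai-chat-workers
          volumeMounts:
            - name: logs
              mountPath: /app/logs
      volumes:
        - name: logs
          persistentVolumeClaim:
            claimName: eveai-chat-workers-logs
```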
Apply with kubectl (no Kustomize required):
```bash
# Create or update the PVC for logs
kubectl apply -n eveai-staging -f scaleway/manifests/base/applications/backend/eveai-chat-workers/pvc.yaml
# Update the Deployment to mount the PVC at /app/logs
kubectl apply -n eveai-staging -f scaleway/manifests/base/applications/backend/eveai-chat-workers/deployment.yaml
```
Verify PVC is bound and the pod mounts the volume:
```bash
# Check PVC status
kubectl get pvc -n eveai-staging eveai-chat-workers-logs -o wide
# Inspect the pod to confirm the volume mount
kubectl get pods -n eveai-staging -l app=eveai-chat-workers -o name
kubectl describe pod -n eveai-staging <pod-name>
# (Optional) Exec into the pod to check permissions and path
kubectl exec -n eveai-staging -it <pod-name> -- sh -lc 'id; ls -ld /app/logs'
```
Permissions and securityContext notes:
- The container runs as a non-root user (appuser) per Dockerfile.base. Some storage classes mount volumes owned by root. If you encounter permission errors (EACCES) writing to /app/logs, use one of the options sketched below this list:
  - Option A: set a pod-level fsGroup so the mounted volume is group-writable by the container user.
  - Option B: use an initContainer to chown/chmod /app/logs on the mounted volume.
- Keep monitoring PVC usage and set alerts to avoid running out of space.
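Hedged sketches of both options; the UID/GID values are assumptions, so verify the appuser IDs in Dockerfile.base first:
```yaml
# Option A (hypothetical): pod-level fsGroup so the kubelet makes the mounted
# volume group-writable by the container user.
spec:
  template:
    spec:
      securityContext:
        fsGroup: 1000   # assumed GID of appuser
---
# Option B (hypothetical): initContainer that chowns /app/logs before the app starts.
spec:
  template:
    spec:
      initContainers:
        - name: fix-logs-permissions
          image: busybox:1.36
          command: ["sh", "-c", "chown -R 1000:1000 /app/logs"]   # assumed UID:GID of appuser
          volumeMounts:
            - name: logs
              mountPath: /app/logs
```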
Retention / cleanup recommendation:
- For a 14-day retention, create a CronJob that runs daily to remove files older than 14 days and then delete empty directories, mounting the same PVC at /app/logs. Example command:
```bash
# Delete files older than 14 days, then prune directories left empty
find /app/logs -type f -mtime +14 -print -delete
find /app/logs -type d -empty -mtime +14 -print -delete
```
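A hedged sketch of such a CronJob; the name, schedule, and image are assumptions, while the claimName matches the PVC verified above:
```yaml
# Hypothetical CronJob enforcing 14-day retention on the logs PVC.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: eveai-chat-workers-logs-cleanup
  namespace: eveai-staging
spec:
  schedule: "30 2 * * *"          # assumed: daily at 02:30
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: cleanup
              image: busybox:1.36
              command:
                - /bin/sh
                - -c
                - >-
                  find /app/logs -type f -mtime +14 -print -delete;
                  find /app/logs -type d -empty -mtime +14 -print -delete
              volumeMounts:
                - name: logs
                  mountPath: /app/logs
          volumes:
            - name: logs
              persistentVolumeClaim:
                claimName: eveai-chat-workers-logs   # PVC from the steps above
```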
Operational checks after deployment:
1) Trigger a CrewAI tuning run; verify files appear under /app/logs and remain after pod restarts.
2) Trigger a non-tuning run; verify temporary directories are created and cleaned up automatically.
3) Monitor memory while varying CELERY_CONCURRENCY and CELERY_MAX_TASKS_PER_CHILD.