# EveAI Cluster Installation Guide (Updated for Modular Kustomize Setup)

## Prerequisites

### Required Tools

```bash
# Verify that the required tools are installed
kubectl version --client
kustomize version
helm version

# Configure kubectl for the Scaleway cluster
scw k8s kubeconfig install <cluster-id>
kubectl cluster-info
```

### Scaleway Prerequisites
- Kubernetes cluster running
- Managed services configured (PostgreSQL, Redis, MinIO)
- Secrets stored in Scaleway Secret Manager:
  - `eveai-app-keys`, `eveai-mistral`, `eveai-object-storage`, `eveai-tem`
  - `eveai-openai`, `eveai-postgresql`, `eveai-redis`, `eveai-redis-certificate`
- Flexible IP address (LoadBalancer)
  - First create a load balancer with a public IP
  - Then delete the load balancer but keep the flexible IP
  - This external IP is the address that must be configured in ingress-values.yaml (see the sketch below)!
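
The exact contents of `ingress-values.yaml` are project-specific; as a minimal sketch, pinning the reserved flexible IP in the ingress-nginx chart values could look like this (the IP shown is a placeholder):

```yaml
controller:
  service:
    type: LoadBalancer
    # Pin the service to the flexible IP reserved above (placeholder value)
    loadBalancerIP: "51.159.0.10"
```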

## CDN Setup (Bunny.net - Optional)

### Configure Pull Zone
- Create Pull zone: evie-staging
- Origin: https://[LoadBalancer-IP] (note: HTTPS!) -> only known later in the process
- Host header: evie-staging.askeveai.com
- Force SSL: Enabled
- In the pull zone's Caching - General settings, make sure to disable 'Strip Response Cookies'
- Define edge rules for:
  - Redirecting the root
  - Redirecting security URLs

### Update DNS (eurodns) for CDN
- Change the A-record to a CNAME pointing to the CDN endpoint
- Or update the A-record to the CDN IP

## New Modular Deployment Process

### Phase 1: Infrastructure Foundation
Deploy core infrastructure components in the correct order:

```bash
# 1. Deploy namespaces
kubectl apply -f scaleway/manifests/base/infrastructure/00-namespaces.yaml

# 2. Add NGINX Ingress Helm repository
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update

# 3. Deploy NGINX ingress controller via Helm
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace \
  --values scaleway/manifests/base/infrastructure/ingress-values.yaml

# 4. Wait for ingress controller to be ready
kubectl wait --namespace ingress-nginx \
  --for=condition=ready pod \
  --selector=app.kubernetes.io/component=controller \
  --timeout=300s

# 5. Add cert-manager Helm repository
helm repo add jetstack https://charts.jetstack.io
helm repo update

# 6. Install cert-manager CRDs
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.15.3/cert-manager.crds.yaml

# 7. Deploy cert-manager via Helm
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --values scaleway/manifests/base/infrastructure/cert-manager-values.yaml

# 8. Wait for cert-manager to be ready
kubectl wait --namespace cert-manager \
  --for=condition=ready pod \
  --selector=app.kubernetes.io/name=cert-manager \
  --timeout=300s

# 9. Deploy cluster issuers
kubectl apply -f scaleway/manifests/base/infrastructure/03-cluster-issuers.yaml
```
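
The cluster issuers manifest is project-specific. For reference, a minimal Let's Encrypt HTTP-01 ClusterIssuer typically looks like this (issuer name and e-mail address are assumptions):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@askeveai.com  # assumption: replace with your contact address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: nginx
```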

### Phase 2: Verify Infrastructure Components

```bash
# Verify ingress controller
kubectl get pods -n ingress-nginx
kubectl get svc -n ingress-nginx

# Verify cert-manager
kubectl get pods -n cert-manager
kubectl get clusterissuers

# Check LoadBalancer external IP
kubectl get svc -n ingress-nginx ingress-nginx-controller
```

### Phase 3: Monitoring Stack (Optional but Recommended)

#### Add Prometheus Community Helm Repository

```bash
# Add Prometheus community Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Verify chart availability
helm search repo prometheus-community/kube-prometheus-stack
```

#### Create Monitoring Values File

Create `scaleway/manifests/base/monitoring/prometheus-values.yaml`:
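
The original values file is not reproduced in this guide; a minimal sketch for kube-prometheus-stack (storage sizes and retention are assumptions; the Grafana password matches the admin/admin123 login referenced below) could look like this:

```yaml
grafana:
  adminPassword: admin123
  persistence:
    enabled: true
    size: 5Gi
prometheus:
  prometheusSpec:
    retention: 15d
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 20Gi
```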

#### Deploy Monitoring Stack

```bash
# Install complete monitoring stack via Helm
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --values scaleway/manifests/base/monitoring/prometheus-values.yaml

# Install pushgateway
helm install monitoring-pushgateway prometheus-community/prometheus-pushgateway \
  -n monitoring --create-namespace \
  --set serviceMonitor.enabled=true

# Monitor deployment progress
kubectl get pods -n monitoring -w
# Wait until all pods show STATUS: Running
```

#### Verify Monitoring Deployment

```bash
# Check Helm release
helm list -n monitoring

# Verify all components are running
kubectl get all -n monitoring

# Check persistent volumes are created
kubectl get pvc -n monitoring

# Check ServiceMonitor CRDs are available (for application monitoring)
kubectl get crd | grep monitoring.coreos.com
```

#### Enable cert-manager Monitoring Integration

```bash
# Enable Prometheus monitoring in cert-manager now that ServiceMonitor CRDs exist
helm upgrade cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --set prometheus.enabled=true \
  --set prometheus.servicemonitor.enabled=true \
  --reuse-values
```

#### Access Monitoring Services

##### Grafana Dashboard
```bash
# Port forward to access Grafana
kubectl port-forward -n monitoring svc/monitoring-grafana 3000:80

# Access via browser: http://localhost:3000
# Username: admin
# Password: admin123 (from the values file)
```

##### Prometheus UI
```bash
# Port forward to access Prometheus
kubectl port-forward -n monitoring svc/monitoring-prometheus 9090:9090 &

# Access via browser: http://localhost:9090
# Check targets: http://localhost:9090/targets
```

#### Cleanup Commands (if needed)

If you need to completely remove monitoring for a fresh start:

```bash
# Uninstall Helm release
helm uninstall monitoring -n monitoring

# Remove namespace
kubectl delete namespace monitoring

# Remove any remaining cluster-wide resources
kubectl get clusterroles | grep monitoring | awk '{print $1}' | xargs -r kubectl delete clusterrole
kubectl get clusterrolebindings | grep monitoring | awk '{print $1}' | xargs -r kubectl delete clusterrolebinding
```

#### What we installed

With monitoring successfully deployed:
- Grafana provides pre-configured Kubernetes dashboards
- Prometheus collects metrics from all cluster components
- ServiceMonitor CRDs are available for application-specific metrics (see the sketch below)
- AlertManager handles alert routing and notifications
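
As an illustration of application-specific metrics, a ServiceMonitor for one of the EveAI services could look roughly like this (service name, port name, and the `release: monitoring` label are assumptions that must match your Prometheus serviceMonitorSelector):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: eveai-app
  namespace: monitoring
  labels:
    release: monitoring  # must match the Prometheus serviceMonitorSelector
spec:
  namespaceSelector:
    matchNames: ["eveai-staging"]
  selector:
    matchLabels:
      app: eveai-app
  endpoints:
    - port: http      # assumption: named service port exposing /metrics
      path: /metrics
      interval: 30s
```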

### Phase 4: Secrets

#### Step 1: Install the External Secrets Operator

```bash
# Add Helm repository
helm repo add external-secrets https://charts.external-secrets.io
helm repo update

# Install External Secrets Operator
helm install external-secrets external-secrets/external-secrets \
  --namespace external-secrets-system \
  --create-namespace

# Verify installation
kubectl get pods -n external-secrets-system

# Check that the CRDs are installed
kubectl get crd | grep external-secrets
```

#### Step 2: Create Scaleway API credentials

You need Scaleway API credentials for the operator:

```bash
# Create secret with Scaleway API credentials
kubectl create secret generic scaleway-credentials \
  --namespace eveai-staging \
  --from-literal=access-key="YOUR_SCALEWAY_ACCESS_KEY" \
  --from-literal=secret-key="YOUR_SCALEWAY_SECRET_KEY"
```

**Note:** You can obtain these credentials via:
- Scaleway Console → Project settings → API Keys
- Or via `scw iam api-key list` if you use the CLI

#### Step 3: Verify the SecretStore configuration

Verify the file `scaleway/manifests/base/secrets/clustersecretstore-scaleway.yaml`. The correct project ID must be entered there.
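
For reference, the ClusterSecretStore for the Scaleway provider has roughly this shape (the store name follows the rest of this guide; region and project ID are placeholders):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: scaleway-cluster-secret-store
spec:
  provider:
    scaleway:
      region: fr-par
      projectId: <your-project-id>
      accessKey:
        secretRef:
          name: scaleway-credentials
          namespace: eveai-staging
          key: access-key
      secretKey:
        secretRef:
          name: scaleway-credentials
          namespace: eveai-staging
          key: secret-key
```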

#### Step 4: Verify the ExternalSecret resource

Verify the file `scaleway/manifests/base/secrets/eveai-external-secrets.yaml`.

**Important:**
- The Scaleway provider requires the `key: name:secret-name` syntax
- SSL/TLS certificates cannot be fetched via `dataFrom/extract`
- Certificates must be added via the `data` section
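
A sketch of what that ExternalSecret looks like, illustrating both points (one JSON secret via `dataFrom/extract` and the certificate via `data`; the full list of remote keys lives in the actual manifest):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: eveai-external-secrets
  namespace: eveai-staging
spec:
  refreshInterval: 5m
  secretStoreRef:
    kind: ClusterSecretStore
    name: scaleway-cluster-secret-store
  target:
    name: eveai-secrets
  dataFrom:
    - extract:
        key: name:eveai-postgresql  # note the required name: prefix
  data:
    - secretKey: REDIS_CERT
      remoteRef:
        key: name:eveai-redis-certificate
```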

#### Step 5: Deploy secrets

```bash
# Deploy SecretStore
kubectl apply -f scaleway/manifests/base/secrets/clustersecretstore-scaleway.yaml

# Deploy ExternalSecret
kubectl apply -f scaleway/manifests/base/secrets/eveai-external-secrets.yaml
```

#### Step 6: Verification

```bash
# Check ExternalSecret status
kubectl get externalsecrets -n eveai-staging

# Check that the Kubernetes secret has been created
kubectl get secret eveai-secrets -n eveai-staging

# Check all keys in the secret
kubectl get secret eveai-secrets -n eveai-staging -o jsonpath='{.data}' | jq 'keys'

# Check a specific value (base64 decoded)
kubectl get secret eveai-secrets -n eveai-staging -o jsonpath='{.data.DB_HOST}' | base64 -d

# Check ExternalSecret events for troubleshooting
kubectl describe externalsecret eveai-external-secrets -n eveai-staging
```

#### Step 7: Use in deployments

You can now use these secrets in the deployments of the application services that need them (TODO):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: eveai-app
  namespace: eveai-staging
spec:
  selector:
    matchLabels:
      app: eveai-app
  template:
    metadata:
      labels:
        app: eveai-app
    spec:
      containers:
        - name: eveai-app
          envFrom:
            - secretRef:
                name: eveai-secrets  # All environment variables from a single secret
          # Your Python code can simply use environ.get('DB_HOST') etc.
```

#### Step 8: Using the Redis certificate in Python

For SSL Redis connections with the certificate:

```python
# Example in your config.py
import tempfile
import ssl
import redis
from os import environ

class StagingConfig:
    def __init__(self):
        self.REDIS_CERT_DATA = environ.get('REDIS_CERT')
        self.REDIS_BASE_URI = environ.get('REDIS_BASE_URI', 'redis://localhost:6379/0')

    def create_redis_connection(self):
        if self.REDIS_CERT_DATA:
            # Write the certificate to a temporary file
            with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.pem') as f:
                f.write(self.REDIS_CERT_DATA)
                cert_path = f.name

            # Redis connection with SSL certificate
            return redis.from_url(
                self.REDIS_BASE_URI,
                ssl_cert_reqs=ssl.CERT_REQUIRED,
                ssl_ca_certs=cert_path
            )
        else:
            return redis.from_url(self.REDIS_BASE_URI)

    # Used for session Redis
    @property
    def SESSION_REDIS(self):
        return self.create_redis_connection()
```

#### Scaleway Secret Manager requirements

For this setup, your secrets in Scaleway Secret Manager must be structured correctly:

**JSON secrets (eveai-postgresql, eveai-redis, etc.):**
```json
{
  "DB_HOST": "your-postgres-host.rdb.fr-par.scw.cloud",
  "DB_USER": "eveai_user",
  "DB_PASS": "your-password",
  "DB_NAME": "eveai_staging",
  "DB_PORT": "5432"
}
```

**SSL/TLS certificate (eveai-redis-certificate):**
```
-----BEGIN CERTIFICATE-----
MIIDGTCCAgGg...z69LXyY=
-----END CERTIFICATE-----
```

#### Benefits of this setup

- **Automatic sync**: secrets are refreshed every 5 minutes
- **No code changes**: your `environ.get()` calls keep working
- **Secure**: credentials are not in manifests, only in the cluster
- **Centralized**: all secrets live in Scaleway Secret Manager
- **Auditable**: the External Secrets Operator logs all actions
- **SSL support**: TLS certificates are handled correctly

#### File structure

```
scaleway/manifests/base/secrets/
├── clustersecretstore-scaleway.yaml
└── eveai-external-secrets.yaml
```

### Phase 5: TLS and Network setup

#### Deploy HTTP ACME ingress

To issue the certificate, an A-record must be created in the DNS zone that points directly to the load balancer IP.
Do not create the CNAME to Bunny.net yet; otherwise Bunny.net may interrupt the ACME process.

To issue the certificate we must use an HTTP ACME ingress; otherwise the certificate cannot be created.

```bash
kubectl apply -f scaleway/manifests/base/networking/ingress-http-acme.yaml
```

Check whether the certificate has been created (READY must be true):

```bash
kubectl get certificate evie-staging-tls -n eveai-staging

# or with more detail

kubectl -n eveai-staging describe certificate evie-staging-tls
```

This may take a while. As soon as the certificate has been created, you can set up the https-only ingress:

#### Apply per-prefix headers (must exist before the Ingress that references them)
```bash
kubectl apply -f scaleway/manifests/base/networking/headers-configmaps.yaml
```
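
For reference, each headers ConfigMap is a plain key/value map consumed by the proxy-set-headers annotation; the admin variant presumably looks like this:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: eveai-admin-headers
  namespace: eveai-staging
data:
  X-Forwarded-Prefix: /admin
```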

#### Apply ingresses
```bash
kubectl apply -f scaleway/manifests/base/networking/ingress-https.yaml        # only /verify
kubectl apply -f scaleway/manifests/base/networking/ingress-admin.yaml       # /admin → eveai-app-service
kubectl apply -f scaleway/manifests/base/networking/ingress-api.yaml         # /api → eveai-api-service
kubectl apply -f scaleway/manifests/base/networking/ingress-chat-client.yaml # /chat-client → eveai-chat-client-service

# Alternative: via the overlay (provided kustomization.yaml has been updated)
kubectl apply -k scaleway/manifests/overlays/staging/
```

To use Bunny.net:
- The CNAME record pointing to the Bunny.net Pull zone can now be created.
- In Bunny.net, the pull zone must point to the load balancer IP via the HTTPS protocol.

### Phase 6: Verification Service

This service can also be installed as early as Phase 5 to verify that the full network stack (via Bunny, certificate, ...) works.

```bash
# Deploy verification service
kubectl apply -k scaleway/manifests/base/applications/verification/
```

### Phase 7: Complete Staging Deployment

```bash
# Deploy everything using the staging overlay
kubectl apply -k scaleway/manifests/overlays/staging/

# Verify complete deployment
kubectl get all -n eveai-staging
kubectl get ingress -n eveai-staging
kubectl get certificates -n eveai-staging
```
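
For orientation, the staging overlay's kustomization.yaml presumably looks roughly like this (resource paths, image names, and the registry namespace are assumptions):

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: eveai-staging
resources:
  - ../../base/applications/backend/
  - ../../base/applications/frontend/
images:
  # Rewrite images to the Scaleway registry
  - name: eveai-api
    newName: rg.fr-par.scw.cloud/<namespace>/eveai-api
    newTag: staging
```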

### Verification commands

Check ingresses and headers:

```bash
kubectl -n eveai-staging get ing
kubectl -n eveai-staging describe ing eveai-admin-ingress
kubectl -n eveai-staging describe ing eveai-api-ingress
kubectl -n eveai-staging describe ing eveai-chat-client-ingress
kubectl -n eveai-staging describe ing eveai-staging-ingress  # contains /verify
kubectl -n eveai-staging get cm eveai-admin-headers eveai-api-headers eveai-chat-headers -o yaml
```

- Each prefix Ingress must show the annotations: use-regex: true, rewrite-target: /$2, proxy-set-headers: eveai-staging/eveai-<prefix>-headers.
- In the ConfigMaps, the key X-Forwarded-Prefix must have the correct value (/admin, /api, /chat-client).

End-to-end tests:

- https://evie-staging.askeveai.com/admin/login → login page. In the app logs you see PATH without /admin (due to the rewrite) but URL with /admin.
- After login: 302 Location: /admin/user/tenant_overview.
- API: https://evie-staging.askeveai.com/api/… → backend receives the path without /api.
- Chat client: https://evie-staging.askeveai.com/chat-client/… → correct service.
- Verify: https://evie-staging.askeveai.com/verify → unchanged via ingress-https.yaml.
- Root: as long as the Bunny rule is not active, there is no automatic redirect on / (expected behavior).

### Phase 8: Install PgAdmin Tool

#### Create the secret eveai-pgadmin-admin in Scaleway Secret Manager (if it does not exist)

2 keys:
- `PGADMIN_DEFAULT_EMAIL`: e-mail address for the admin
- `PGADMIN_DEFAULT_PASSWORD`: password for the admin

#### Deploy secrets

```bash
kubectl apply -f scaleway/manifests/base/tools/pgadmin/externalsecrets.yaml

# Check
kubectl get externalsecret -n tools
kubectl get secret -n tools | grep pgadmin
```

#### Apply the Helm chart

```bash
helm repo add runix https://helm.runix.net
helm repo update
helm install pgadmin runix/pgadmin4 \
  -n tools \
  --create-namespace \
  -f scaleway/manifests/base/tools/pgadmin/values.yaml

# Check status
kubectl get pods,svc -n tools
kubectl logs -n tools deploy/pgadmin-pgadmin4 || true
```

#### Port Forward, Local Access

```bash
# Find the service name (often "pgadmin")
kubectl -n tools get svc
# Forward local port 8080 to service port 80
kubectl -n tools port-forward svc/pgadmin-pgadmin4 8080:80
# Browser: http://localhost:8080
# Login with PGADMIN_DEFAULT_EMAIL / PGADMIN_DEFAULT_PASSWORD (from eveai-pgadmin-admin)
```

### Phase 9: RedisInsight Tool Deployment

#### Installation via kubectl (without Helm)
Use a simple manifest with Deployment + Service + PVC in the `tools` namespace. This avoids external chart repositories and extra authentication.

```bash
# Apply manifest (creates the tools namespace if needed)
kubectl apply -f scaleway/manifests/base/tools/redisinsight/redisinsight.yaml

# Check resources
kubectl -n tools get pods,svc,pvc
```

#### (Optional) ExternalSecrets for convenience (not strictly needed)
If you want to mirror the Redis credentials and CA cert into the `tools` namespace (handy for exporting the CA file easily and/or provisioning later):
```bash
kubectl apply -f scaleway/manifests/base/tools/redisinsight/externalsecrets.yaml
kubectl -n tools get externalsecret
kubectl -n tools get secret | grep redisinsight
```

Save the CA file locally for the UI upload (only needed if you used ExternalSecrets):
```bash
kubectl -n tools get secret redisinsight-ca -o jsonpath='{.data.REDIS_CERT}' | base64 -d > /tmp/redis-ca.pem
```

#### Port Forward, Local Access
```bash
# RedisInsight v2 listens on port 5540
kubectl -n tools port-forward svc/redisinsight 5540:5540
# Browser: http://localhost:5540
```

#### UI: Connect to Redis
- Host: `172.16.16.2`
- Port: `6379`
- Auth: username `luke`, password from the secret (eveai-redis or redisinsight-redis)
- TLS: enable TLS and upload the CA certificate (PEM)
- Certificate verification: because you connect via IP and no hostname is present in the certificate, strict verification may fail. In that case disable "Verify server certificate"/"Check server identity" in the UI. This is normal for private networking via IP.

#### Troubleshooting
- Check pods, service, and PVC in `tools`:
```bash
kubectl -n tools get pods,svc,pvc
```
- NetworkPolicies: if active, allow egress from `tools` → `172.16.16.2:6379`.
- TLS issues via IP: disable verification or use a DNS hostname that matches the cert (if available).
- PVC not bound: specify a valid `storageClassName` in the manifest.

### Phase 10: Application Services Deployment

#### Create Scaleway Registry Secret
Create the docker pull secret via External Secrets (once):
```bash
kubectl apply -f scaleway/manifests/base/secrets/scaleway-registry-secret.yaml
kubectl -n eveai-staging get secret scaleway-registry-cred -o yaml | grep "type: kubernetes.io/dockerconfigjson"
```
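
The manifest itself is not reproduced here; as a rough sketch, an ExternalSecret that renders a kubernetes.io/dockerconfigjson secret looks like this (the remote key holding the docker config JSON is an assumption):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: scaleway-registry-cred
  namespace: eveai-staging
spec:
  secretStoreRef:
    kind: ClusterSecretStore
    name: scaleway-cluster-secret-store
  target:
    name: scaleway-registry-cred
    template:
      type: kubernetes.io/dockerconfigjson
      data:
        .dockerconfigjson: "{{ .dockerconfigjson }}"
  data:
    - secretKey: dockerconfigjson
      remoteRef:
        key: name:eveai-registry  # assumption: secret containing the docker config JSON
```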

#### Ops Jobs Invocation (if required)

Run the DB ops scripts manually, in order. Each manifest uses generateName, so use kubectl create (a sketch of such a job follows below):

```bash
kubectl create -f scaleway/manifests/base/applications/ops/jobs/00-env-check-job.yaml
kubectl wait --for=condition=complete job -n eveai-staging -l job-type=env-check --timeout=600s

kubectl create -f scaleway/manifests/base/applications/ops/jobs/02-db-bootstrap-ext-job.yaml
kubectl wait --for=condition=complete job -n eveai-staging -l job-type=db-bootstrap-ext --timeout=1800s

kubectl create -f scaleway/manifests/base/applications/ops/jobs/03-db-migrate-public-job.yaml
kubectl wait --for=condition=complete job -n eveai-staging -l job-type=db-migrate-public --timeout=1800s

kubectl create -f scaleway/manifests/base/applications/ops/jobs/04-db-migrate-tenant-job.yaml
kubectl wait --for=condition=complete job -n eveai-staging -l job-type=db-migrate-tenant --timeout=3600s

kubectl create -f scaleway/manifests/base/applications/ops/jobs/05-seed-or-init-data-job.yaml
kubectl wait --for=condition=complete job -n eveai-staging -l job-type=db-seed-or-init --timeout=1800s

kubectl create -f scaleway/manifests/base/applications/ops/jobs/06-verify-minimal-job.yaml
kubectl wait --for=condition=complete job -n eveai-staging -l job-type=db-verify-minimal --timeout=900s
```
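
A minimal sketch of one of these job manifests, showing why kubectl create is needed (image and command are assumptions; the label is what kubectl wait selects on):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  generateName: env-check-   # kubectl create appends a random suffix; apply would fail
  namespace: eveai-staging
  labels:
    job-type: env-check      # selected by kubectl wait -l job-type=env-check
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: env-check
          image: rg.fr-par.scw.cloud/<namespace>/eveai-ops:staging  # assumption
          command: ["python", "scripts/env_check.py"]               # assumption
          envFrom:
            - secretRef:
                name: eveai-secrets
```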

View logs (the created job name is printed by the create command):
```bash
kubectl -n eveai-staging get jobs
kubectl -n eveai-staging logs job/<created-job-name>
```

#### Creating the volume for eveai_chat_worker's CrewAI storage

```bash
kubectl apply -n eveai-staging -f scaleway/manifests/base/applications/backend/eveai-chat-workers/pvc.yaml
```

#### Application Services Deployment

Use the staging overlay to deploy apps with the registry rewrite and imagePullSecrets:
```bash
kubectl apply -k scaleway/manifests/overlays/staging/
```

##### Deploy backend workers
```bash
kubectl apply -k scaleway/manifests/base/applications/backend/

kubectl -n eveai-staging get deploy | egrep 'eveai-(workers|chat-workers|entitlements)'
# Optional: quick logs
kubectl -n eveai-staging logs deploy/eveai-workers --tail=100 || true
kubectl -n eveai-staging logs deploy/eveai-chat-workers --tail=100 || true
kubectl -n eveai-staging logs deploy/eveai-entitlements --tail=100 || true
```

##### Deploy frontend services
```bash
kubectl apply -k scaleway/manifests/base/applications/frontend/

kubectl -n eveai-staging get deploy,svc | egrep 'eveai-(app|api|chat-client)'
```

##### Verify Ingress routes (Ingress managed separately)
Ingress is intentionally not managed by the staging Kustomize overlay. Apply or update it manually using your existing manifest and handle it per your cluster-install.md guide:
```bash
kubectl apply -f scaleway/manifests/base/networking/ingress-https.yaml
kubectl -n eveai-staging describe ingress eveai-staging-ingress
```
Then verify the routes:
```bash
curl -k https://evie-staging.askeveai.com/verify/health
curl -k https://evie-staging.askeveai.com/admin/healthz/ready
curl -k https://evie-staging.askeveai.com/api/healthz/ready
curl -k https://evie-staging.askeveai.com/client/healthz/ready
```

#### Updating the staging deployment

- If you pushed the images again with the same tag (e.g. :staging) and your staging pods use imagePullPolicy: Always (as in this guide), you only need to trigger a rollout so the pods restart and pull the latest image.
- Do this in the correct namespace (probably eveai-staging) with kubectl rollout restart.

##### Fastest way (all deployments at once)
```bash
# Staging namespace (adjust if you use a different one)
kubectl -n eveai-staging rollout restart deployment

# Optional: follow the status until everything is ready
kubectl -n eveai-staging rollout status deploy --all

# Check which image is running per pod
kubectl -n eveai-staging get pods -o=jsonpath='{range .items[*]}{@.metadata.name}{"\t"}{range .spec.containers[*]}{@.image}{" "}{end}{"\n"}{end}'
```

This restarts all Deployments in the namespace. Because imagePullPolicy is set to Always, Kubernetes will pull the latest image for the tag in use (e.g. :staging).

##### Restart specific services
If you only want to restart certain services:
```bash
kubectl -n eveai-staging rollout restart deployment/eveai-app
kubectl -n eveai-staging rollout restart deployment/eveai-api
kubectl -n eveai-staging rollout restart deployment/eveai-chat-client
kubectl -n eveai-staging rollout restart deployment/eveai-workers
kubectl -n eveai-staging rollout restart deployment/eveai-chat-workers
kubectl -n eveai-staging rollout restart deployment/eveai-entitlements

kubectl -n eveai-staging rollout status deployment/eveai-app
```

##### Alternative: (re-)apply the manifests
This guide places the manifests in scaleway/manifests and describes the use of Kustomize overlays. You can also simply apply again:
```bash
# Overlay that rewrites images to the Scaleway registry and adds imagePullSecrets
kubectl apply -k scaleway/manifests/overlays/staging/

# Backend and frontend (if you use the base separately)
kubectl apply -k scaleway/manifests/base/applications/backend/
kubectl apply -k scaleway/manifests/base/applications/frontend/
```

Note: apply alone does not always trigger a rollout if there is no substantive spec change. Combine it with a rollout restart as above if needed.

##### If you work with version tags (production-like)
- If you do not use a channel tag (:staging/:production) but a fixed, version-bound tag (e.g. :v1.2.3) and imagePullPolicy: IfNotPresent, then you must either:
  - change the tag in your manifest/overlay to the new version and apply again, or
  - force a new ReplicaSet with a one-off set-image:
```bash
kubectl -n eveai-staging set image deploy/eveai-api eveai-api=rg.fr-par.scw.cloud/<namespace>/josakola/eveai-api:v1.2.4
kubectl -n eveai-staging rollout status deploy/eveai-api
```

##### Troubleshooting
- Check whether the registry pull secret is present (per this guide):
```bash
kubectl apply -f scaleway/manifests/base/secrets/scaleway-registry-secret.yaml
kubectl -n eveai-staging get secret scaleway-registry-cred
```
- Check events/logs if pods do not come up:
```bash
kubectl get events -n eveai-staging --sort-by=.lastTimestamp
kubectl -n eveai-staging describe pod <pod-name>
kubectl -n eveai-staging logs deploy/eveai-api --tail=200
```

### Phase 11: Cockpit Setup

#### Standard Cockpit Setup
- Create a Grafana user (Cockpit > Grafana Users > Add user)
- Open the Grafana Dashboard (Cockpit > Open Dashboards)
- Quite a few dashboards are available, e.g.:
  - Kubernetes cluster overview (metrics)
  - Kubernetes cluster logs (control plane logs)

### Phase 12: Flower Setup

#### Overview
Flower is the Celery monitoring UI. We deploy Flower in the `monitoring` namespace via the bjw-s/app-template Helm chart. There is no Ingress; access is local only via `kubectl port-forward`. The connection to Redis uses TLS with your private CA; hostname verification is disabled because you connect via IP.

#### Add the Helm repository
```bash
helm repo add bjw-s https://bjw-s-labs.github.io/helm-charts
helm repo update
helm search repo bjw-s/app-template
```

#### Deploy (recommended: only Flower via the Helm CLI)
Use targeted commands so that only Flower is managed by Helm and the rest of the monitoring stack is left untouched.
```bash
# 1) Create ExternalSecrets and NetworkPolicy
kubectl apply -f scaleway/manifests/base/monitoring/flower/externalsecrets.yaml
kubectl apply -f scaleway/manifests/base/monitoring/flower/networkpolicy.yaml

# 2) Install Flower via Helm (only this release)
helm upgrade --install flower bjw-s/app-template \
  -n monitoring --create-namespace \
  -f scaleway/manifests/base/monitoring/flower/values.yaml
```

What this deploys:

- ExternalSecrets: `flower-redis` (REDIS_USER/PASS/URL/PORT) and `flower-ca` (REDIS_CERT) from `scaleway-cluster-secret-store`
- Flower via Helm (bjw-s/app-template):
  - Image: `mher/flower:2.0.1` (pinned)
  - Start: `/usr/local/bin/celery --broker=$(BROKER) flower --address=0.0.0.0 --port=5555`
  - TLS to Redis with the CA mounted at `/etc/ssl/redis/ca.pem` and `ssl_check_hostname=false`
  - Hardened securityContext (non-root, read-only rootfs, capabilities dropped)
  - Probes and resource requests/limits
  - Service: ClusterIP `flower` on port 5555
- NetworkPolicy: ingress default-deny; egress only to Redis (172.16.16.2:6379/TCP) and CoreDNS (53 TCP/UDP)

#### Verify
```bash
# Helm release and resources
helm list -n monitoring
kubectl -n monitoring get externalsecret
kubectl -n monitoring get secret | grep flower
kubectl -n monitoring get deploy,po,svc | grep flower
kubectl -n monitoring logs deploy/flower --tail=200 || true
```

#### Access (port-forward)
```bash
kubectl -n monitoring port-forward svc/flower 5555:5555
# Browser: http://localhost:5555
```

#### Security & TLS
- No Ingress/external traffic; port-forward only.
- TLS to Redis with the CA mounted at `/etc/ssl/redis/ca.pem`.
- Because you reach Redis via IP, `ssl_check_hostname=false` is set.
- Strict egress NetworkPolicy: update the IP if your Redis IP changes.

#### Troubleshooting
```bash
# Secrets and ExternalSecrets
kubectl -n monitoring describe externalsecret flower-redis
kubectl -n monitoring describe externalsecret flower-ca

# Pods & logs
kubectl -n monitoring get pods -l app=flower -w
kubectl -n monitoring logs deploy/flower --tail=200

# NetworkPolicy
kubectl -n monitoring describe networkpolicy flower-policy
```

#### Alternative: Kustomize rendering (caution!)
You can also render Flower via Kustomize together with the monitoring chart:
```bash
kubectl kustomize --enable-helm scaleway/manifests/base/monitoring | kubectl apply -f -
```
Caution: this renders and applies ALL resources in the monitoring Kustomization, including the kube-prometheus-stack chart. Only use this if you deliberately want to update the full monitoring stack declaratively.

#### Migration & Cleanup
If you previously used the standalone Deployment/Service:
```bash
kubectl -n monitoring delete deploy flower --ignore-not-found
kubectl -n monitoring delete svc flower --ignore-not-found
```

## Verification and Testing

### Check Infrastructure Status
```bash
# Verify ingress controller
kubectl get pods -n ingress-nginx
kubectl describe service ingress-nginx-controller -n ingress-nginx

# Verify cert-manager
kubectl get pods -n cert-manager
kubectl get clusterissuers

# Check certificate status (may take a few minutes to issue)
kubectl describe certificate evie-staging-tls -n eveai-staging
```

### Test Services
```bash
# Get external IP from LoadBalancer
kubectl get svc -n ingress-nginx ingress-nginx-controller

# Test HTTPS access (replace with your domain)
curl -k https://evie-staging.askeveai.com/verify/health
curl -k https://evie-staging.askeveai.com/verify/info

# Test monitoring (if deployed)
kubectl port-forward -n monitoring svc/monitoring-grafana 3000:80
# Access Grafana at http://localhost:3000 (admin/admin123)
```

## DNS Configuration

### Update DNS Records
- Create an A-record pointing to the LoadBalancer external IP
- Or set up a CNAME if using a CDN

### Test Domain Access
```bash
# Test domain resolution
nslookup evie-staging.askeveai.com

# Test HTTPS access via domain
curl https://evie-staging.askeveai.com/verify/
```
## EveAI Chat Workers: Persistent logs storage and Celery process behavior

This addendum describes how to enable persistent storage for CrewAI tuning runs under /app/logs for the eveai-chat-workers Deployment, and clarifies Celery process behavior relevant to environment variables.

### Celery prefork behavior and env variables
- Pool: prefork (default). Each worker process (child) handles multiple tasks sequentially.
- Implication: any environment variable changed inside a child process persists for subsequent tasks handled by that same child, until it is changed again or the process is recycled.
- Our practice: set required env vars (e.g., CREWAI_STORAGE_DIR/CREWAI_STORAGE_PATH) immediately before initializing CrewAI and restore them immediately after. This prevents leakage to the next task in the same process. See the sketch after this list.
- CELERY_MAX_TASKS_PER_CHILD: the number of tasks a child will process before being recycled. Suggested starting range for heavy LLM/RAG workloads: 200–500; 1000 is acceptable if memory growth is stable. Monitor RSS and adjust.
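
A sketch of this set-and-restore practice (the helper is illustrative, not the project's actual code; only the variable names follow the guide):

```python
import os
from contextlib import contextmanager

@contextmanager
def scoped_env(**overrides):
    """Temporarily set environment variables inside a prefork child."""
    saved = {key: os.environ.get(key) for key in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        # Restore immediately so the next task in this child sees a clean state
        for key, old in saved.items():
            if old is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old

# Usage inside a Celery task, right before initializing CrewAI:
# with scoped_env(CREWAI_STORAGE_DIR="/app/logs/run-123"):
#     crew = build_crew()  # hypothetical CrewAI initialization
```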

### Create and mount a PersistentVolumeClaim for /app/logs
We persist tuning outputs under /app/logs by mounting a PVC in the worker pod.

Manifests added/updated (namespace: eveai-staging):
- scaleway/manifests/base/applications/backend/eveai-chat-workers/pvc.yaml
- scaleway/manifests/base/applications/backend/eveai-chat-workers/deployment.yaml (volume mount added)

Apply with kubectl (no Kustomize required):

```bash
# Create or update the PVC for logs
kubectl apply -n eveai-staging -f scaleway/manifests/base/applications/backend/eveai-chat-workers/pvc.yaml

# Update the Deployment to mount the PVC at /app/logs
kubectl apply -n eveai-staging -f scaleway/manifests/base/applications/backend/eveai-chat-workers/deployment.yaml
```

Verify that the PVC is bound and the pod mounts the volume:

```bash
# Check PVC status
kubectl get pvc -n eveai-staging eveai-chat-workers-logs -o wide

# Inspect the pod to confirm the volume mount
kubectl get pods -n eveai-staging -l app=eveai-chat-workers -o name
kubectl describe pod -n eveai-staging <pod-name>

# (Optional) Exec into the pod to check permissions and path
kubectl exec -n eveai-staging -it <pod-name> -- sh -lc 'id; ls -ld /app/logs'
```

Permissions and securityContext notes:
- The container runs as a non-root user (appuser) per Dockerfile.base. Some storage classes mount volumes owned by root. If you encounter permission issues (EACCES) writing to /app/logs:
  - Option A: set a pod-level fsGroup so the mounted volume is group-writable by the container user (see the snippet below).
  - Option B: use an initContainer to chown/chmod /app/logs on the mounted volume.
- Keep monitoring PVC usage and set alerts to avoid running out of space.
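
Option A as a pod-level securityContext on the Deployment (GID 1000 is an assumption; use the group of appuser from Dockerfile.base):

```yaml
spec:
  template:
    spec:
      securityContext:
        fsGroup: 1000  # volume is remounted group-writable for this GID
```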

Retention / cleanup recommendation:
- For 14-day retention, create a CronJob that runs daily to remove files older than 14 days and then delete empty directories, mounting the same PVC at /app/logs (a sketch follows below). Example command:

```bash
find /app/logs -type f -mtime +14 -print -delete; find /app/logs -type d -empty -mtime +14 -print -delete
```
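
A sketch of such a CronJob mounting the same PVC (name and schedule are illustrative):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: chat-workers-logs-cleanup
  namespace: eveai-staging
spec:
  schedule: "30 3 * * *"  # daily at 03:30
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: cleanup
              image: busybox:1.36
              command: ["sh", "-c", "find /app/logs -type f -mtime +14 -print -delete; find /app/logs -type d -empty -mtime +14 -print -delete"]
              volumeMounts:
                - name: logs
                  mountPath: /app/logs
          volumes:
            - name: logs
              persistentVolumeClaim:
                claimName: eveai-chat-workers-logs
```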

Operational checks after deployment:
1) Trigger a CrewAI tuning run; verify files appear under /app/logs and remain after pod restarts.
2) Trigger a non-tuning run; verify temporary directories are created and cleaned up automatically.
3) Monitor memory while varying CELERY_CONCURRENCY and CELERY_MAX_TASKS_PER_CHILD.