# EveAI Cluster Installation Guide (Updated for Modular Kustomize Setup)

## Prerequisites

### Required Tools

```bash
# Verify required tools are installed
kubectl version --client
kustomize version
helm version

# Configure kubectl for the Scaleway cluster
# <cluster-id>: ID of the Kapsule cluster (see: scw k8s cluster list)
scw k8s kubeconfig install <cluster-id>
kubectl cluster-info
```

### Scaleway Prerequisites

- Kubernetes cluster running
- Managed services configured (PostgreSQL, Redis, MinIO)
- Secrets stored in Scaleway Secret Manager:
  - `eveai-app-keys`, `eveai-mistral`, `eveai-object-storage`
  - `eveai-openai`, `eveai-postgresql`, `eveai-redis`, `eveai-redis-certificate`
- Flexible IP address (LoadBalancer)
  - First create a load balancer with a public IP
  - Then delete the load balancer but keep the flexible IP
  - This external IP is the address that must be set in `ingress-values.yaml`!

## New Modular Deployment Process

### Phase 1: Infrastructure Foundation

Deploy the core infrastructure components in this order:

```bash
# 1. Deploy namespaces
kubectl apply -f scaleway/manifests/base/infrastructure/00-namespaces.yaml

# 2. Add NGINX Ingress Helm repository
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update

# 3. Deploy NGINX ingress controller via Helm
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace \
  --values scaleway/manifests/base/infrastructure/ingress-values.yaml

# 4. Wait for ingress controller to be ready
kubectl wait --namespace ingress-nginx \
  --for=condition=ready pod \
  --selector=app.kubernetes.io/component=controller \
  --timeout=300s

# 5. Add cert-manager Helm repository
helm repo add jetstack https://charts.jetstack.io
helm repo update

# 6. Install cert-manager CRDs
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.15.3/cert-manager.crds.yaml

# 7. Deploy cert-manager via Helm (chart version pinned to match the CRDs from step 6)
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.15.3 \
  --values scaleway/manifests/base/infrastructure/cert-manager-values.yaml

# 8. Wait for cert-manager to be ready
kubectl wait --namespace cert-manager \
  --for=condition=ready pod \
  --selector=app.kubernetes.io/name=cert-manager \
  --timeout=300s

# 9. Deploy cluster issuers
kubectl apply -f scaleway/manifests/base/infrastructure/03-cluster-issuers.yaml
```

### Phase 2: Verify Infrastructure Components

```bash
# Verify ingress controller
kubectl get pods -n ingress-nginx
kubectl get svc -n ingress-nginx

# Verify cert-manager
kubectl get pods -n cert-manager
kubectl get clusterissuers

# Check LoadBalancer external IP (should be the flexible IP)
kubectl get svc -n ingress-nginx ingress-nginx-controller
```

### Phase 3: Monitoring Stack (Optional but Recommended)

#### Add Prometheus Community Helm Repository

```bash
# Add Prometheus community Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Verify chart availability
helm search repo prometheus-community/kube-prometheus-stack
```

#### Create Monitoring Values File

Create `scaleway/manifests/base/monitoring/prometheus-values.yaml`:

#### Deploy Monitoring Stack

```bash
# Install complete monitoring stack via Helm
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --values scaleway/manifests/base/monitoring/prometheus-values.yaml

# Monitor deployment progress (Ctrl+C stops watching)
kubectl get pods -n monitoring -w
# Wait until all pods show STATUS: Running
```

#### Verify Monitoring Deployment

```bash
# Check Helm release
helm list -n monitoring

# Verify all components are running
kubectl get all -n monitoring

# Check persistent volumes are created
kubectl get pvc -n monitoring

# Check ServiceMonitor CRDs are available (for application monitoring)
kubectl get crd | grep monitoring.coreos.com
```

#### Enable cert-manager Monitoring Integration

```bash
# Enable Prometheus monitoring in cert-manager now that ServiceMonitor CRDs exist.
# Keep --version in sync with the installed CRDs; without it, helm upgrade moves to the latest chart.
# Prometheus only scrapes ServiceMonitors that match its serviceMonitorSelector (see prometheus-values.yaml).
helm upgrade cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --version v1.15.3 \
  --set prometheus.enabled=true \
  --set prometheus.servicemonitor.enabled=true \
  --reuse-values
```

#### Access Monitoring Services

##### Grafana Dashboard

```bash
# Port forward to access Grafana
kubectl port-forward -n monitoring svc/monitoring-grafana 3000:80

# Access via browser: http://localhost:3000
# Username: admin
# Password: admin123 (from values file)
```

##### Prometheus UI

```bash
# Port forward to access Prometheus (the trailing & runs it in the background)
# The service name depends on the release name and values; list services with: kubectl get svc -n monitoring
kubectl port-forward -n monitoring svc/monitoring-prometheus 9090:9090 &

# Access via browser: http://localhost:9090
# Check targets: http://localhost:9090/targets
```

#### Cleanup Commands (if needed)

If you need to completely remove monitoring for a fresh start:

```bash
# Uninstall Helm release
helm uninstall monitoring -n monitoring

# Remove namespace
kubectl delete namespace monitoring

# Remove any remaining cluster-wide resources
# (review the list first: grep matches every name that contains "monitoring")
kubectl get clusterroles | grep monitoring | awk '{print $1}' | xargs -r kubectl delete clusterrole
kubectl get clusterrolebindings | grep monitoring | awk '{print $1}' | xargs -r kubectl delete clusterrolebinding
# Note: helm uninstall does not delete the chart's CRDs (*.monitoring.coreos.com)
```

#### What We Installed

With monitoring successfully deployed:

- Grafana provides pre-configured Kubernetes dashboards
- Prometheus collects metrics from all cluster components
- ServiceMonitor CRDs are available for application-specific metrics
- Alertmanager handles alert routing and notifications

### Phase 4: Secrets

#### Step 1: Install the External Secrets Operator

```bash
# Add Helm repository
helm repo add external-secrets https://charts.external-secrets.io
helm repo update

# Install External Secrets Operator
helm install external-secrets external-secrets/external-secrets \
  --namespace external-secrets-system \
  --create-namespace

# Verify installation
kubectl get pods -n external-secrets-system

# Check that the CRDs are installed
kubectl get crd | grep external-secrets
```

#### Step 2: Create Scaleway API Credentials

The operator needs Scaleway API credentials:

```bash
# Create secret with Scaleway API credentials
kubectl create secret generic scaleway-credentials \
  --namespace eveai-staging \
  --from-literal=access-key="YOUR_SCALEWAY_ACCESS_KEY" \
  --from-literal=secret-key="YOUR_SCALEWAY_SECRET_KEY"
```

**Note:** You can obtain these credentials via:

- Scaleway Console → Project settings → API Keys
- Or via `scw iam api-key list` if you use the CLI (the secret key is only shown once, when the key is created)

#### Step 3: Verify the SecretStore Configuration

Check the file `scaleway/manifests/base/secrets/scaleway-secret-store.yaml`. The correct project ID must be filled in there.

#### Step 4: Verify the ExternalSecret Resource

Check the file `scaleway/manifests/base/secrets/eveai-external-secrets.yaml`.

**Important:**

- The Scaleway provider requires the `key: name:secret-name` syntax
- SSL/TLS certificates cannot be fetched via `dataFrom/extract`
- Certificates must be added via the `data` section

#### Step 5: Deploy Secrets

```bash
# Deploy SecretStore
kubectl apply -f scaleway/manifests/base/secrets/scaleway-secret-store.yaml

# Deploy ExternalSecret
kubectl apply -f scaleway/manifests/base/secrets/eveai-external-secrets.yaml
```

#### Step 6: Verification

```bash
# Check ExternalSecret status
kubectl get externalsecrets -n eveai-staging

# Check that the Kubernetes secret has been created
kubectl get secret eveai-secrets -n eveai-staging

# List all keys in the secret
kubectl get secret eveai-secrets -n eveai-staging -o jsonpath='{.data}' | jq 'keys'

# Check a specific value (base64 decoded)
kubectl get secret eveai-secrets -n eveai-staging -o jsonpath='{.data.DB_HOST}' | base64 -d

# Check ExternalSecret events for troubleshooting
kubectl describe externalsecret eveai-external-secrets -n eveai-staging
```

#### Step 7: Use in Deployments

These secrets can now be used in the deployments of the application services that need them (TODO):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: eveai-app
  namespace: eveai-staging
spec:
  # Excerpt: selector, pod labels and image are omitted
  template:
    spec:
      containers:
        - name: eveai-app
          envFrom:
            - secretRef:
                name: eveai-secrets  # All environment variables from a single secret
# Your Python code simply keeps using environ.get('DB_HOST') etc.
```

#### Step 8: Use the Redis Certificate in Python

For SSL Redis connections with the certificate:

```python
# In config.py
import ssl
import tempfile
from os import environ

import redis


class StagingConfig(Config):  # Config: existing base class in config.py that defines REDIS_BASE_URI
    # PEM certificate injected via the eveai-secrets secret
    REDIS_CERT_DATA = environ.get('REDIS_CERT')

    def create_redis_connection(self):
        if self.REDIS_CERT_DATA:
            # ssl_ca_certs expects a file path, so write the certificate to a temporary file.
            # delete=False keeps the file on disk: it is read when TLS connections are opened.
            with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.pem') as f:
                f.write(self.REDIS_CERT_DATA)
                cert_path = f.name

            # TLS connection verified against the Redis CA certificate
            # (REDIS_BASE_URI must use the rediss:// scheme)
            return redis.from_url(
                self.REDIS_BASE_URI,
                ssl_cert_reqs=ssl.CERT_REQUIRED,
                ssl_ca_certs=cert_path,
            )
        return redis.from_url(self.REDIS_BASE_URI)

    # Redis client for sessions
    # Note: every access creates a new client (and writes a new temporary file)
    SESSION_REDIS = property(lambda self: self.create_redis_connection())
```

#### Scaleway Secret Manager Requirements

For this setup, your secrets in Scaleway Secret Manager must be structured as follows:

**JSON secrets (eveai-postgresql, eveai-redis, etc.):**

```json
{
  "DB_HOST": "your-postgres-host.rdb.fr-par.scw.cloud",
  "DB_USER": "eveai_user",
  "DB_PASS": "your-password",
  "DB_NAME": "eveai_staging",
  "DB_PORT": "5432"
}
```

**SSL/TLS certificate (eveai-redis-certificate):**

```
-----BEGIN CERTIFICATE-----
MIIDGTCCAgGg...z69LXyY=
-----END CERTIFICATE-----
```

#### Benefits of This Setup

- **Automatic sync**: Secrets are refreshed every 5 minutes (as configured via `refreshInterval` in the ExternalSecret)
- **No code changes**: Your `environ.get()` calls keep working
- **Secure**: Credentials are not stored in manifests, only in the cluster
- **Centralized**: All secrets live in Scaleway Secret Manager
- **Auditable**: The External Secrets Operator logs all actions
- **SSL support**: TLS certificates are handled correctly

#### File Structure

```
scaleway/manifests/base/secrets/
├── scaleway-secret-store.yaml
└── eveai-external-secrets.yaml
```

### Phase 5: TLS and Network Setup

#### Deploy HTTP ACME Ingress

To issue the certificate, create an A record in the DNS zone that points directly to the load balancer IP. Do not create the CNAME to Bunny.net yet; otherwise Bunny.net might interrupt the ACME process.

The certificate can only be issued through the HTTP ACME ingress, because the ACME HTTP-01 challenge is answered over plain HTTP:

```bash
kubectl apply -f scaleway/manifests/base/networking/ingress-http-acme.yaml
```

Check whether the certificate has been issued (READY must be `True`):

```bash
kubectl get certificate evie-staging-tls -n eveai-staging
# or, with more detail
kubectl -n eveai-staging describe certificate evie-staging-tls
```

This can take a while. Once the certificate has been issued, you can set up the HTTPS-only ingress:

```bash
kubectl apply -f scaleway/manifests/base/networking/ingress-https.yaml
```

To use Bunny.net:

- Now replace the A record with a CNAME record that points to the Bunny.net pull zone.
- In Bunny.net, configure the pull zone to point to the load balancer IP using the HTTPS protocol.

### Phase 6: Verification Service

This service can also be installed during Phase 5 to verify that the full network stack (via Bunny.net, the certificate, ...) works.
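Part of what this phase confirms (whether the domain presents a valid certificate, and who issued it) can also be inspected with Python's standard library. The sketch below is illustrative and not part of the EveAI tooling: `fetch_cert`, `issuer_org` and `days_until_expiry` are hypothetical helper names. Keep in mind that once the CNAME points at Bunny.net, clients receive the certificate served by the CDN edge rather than the one issued by cert-manager, so comparing both paths can help locate a TLS problem.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_cert(host: str, port: int = 443, timeout: float = 10.0) -> dict:
    """Open a verified TLS connection and return the peer certificate as a dict."""
    ctx = ssl.create_default_context()  # verifies the chain and the hostname
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def issuer_org(cert: dict) -> Optional[str]:
    """Return the issuer's organizationName from a getpeercert() dict, if present."""
    # 'issuer' is a tuple of RDNs; each RDN is a tuple of (key, value) pairs
    for rdn in cert.get("issuer", ()):
        for key, value in rdn:
            if key == "organizationName":
                return value
    return None


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days between `now` (default: current time) and a certificate notAfter string."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400
```

For example, `cert = fetch_cert("evie-staging.askeveai.com")` followed by `issuer_org(cert)` and `days_until_expiry(cert["notAfter"])` reports the issuer and remaining validity. Because `fetch_cert` verifies the chain, an invalid or self-signed certificate surfaces as an `ssl.SSLCertVerificationError` instead of a result.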
```bash
# Deploy verification service
kubectl apply -k scaleway/manifests/base/applications/verification/
```

### Phase 7: Complete Staging Deployment

```bash
# Deploy everything using the staging overlay
kubectl apply -k scaleway/manifests/overlays/staging/

# Verify complete deployment
kubectl get all -n eveai-staging
kubectl get ingress -n eveai-staging
kubectl get certificates -n eveai-staging
```

### Phase 8: Tools

#### PgAdmin

##### Create the `eveai-pgadmin-admin` Secret in Scaleway Secret Manager (if it does not exist)

Two keys:

- `PGADMIN_DEFAULT_EMAIL`: email address for the admin
- `PGADMIN_DEFAULT_PASSWORD`: password for the admin

##### Deploy the Secrets

The `tools` namespace must already exist at this point; if it does not, create it first with `kubectl create namespace tools`.

```bash
kubectl apply -f scaleway/manifests/base/tools/pgadmin/externalsecrets.yaml

# Check
kubectl get externalsecret -n tools
kubectl get secret -n tools | grep pgadmin
```

##### Apply the Helm Chart

```bash
helm repo add runix https://helm.runix.net
helm repo update
helm install pgadmin runix/pgadmin4 \
  -n tools \
  --create-namespace \
  -f scaleway/manifests/base/tools/pgadmin/values.yaml

# Check status
kubectl get pods,svc -n tools
kubectl logs -n tools deploy/pgadmin-pgadmin4 || true
```

##### Port Forward for Local Access

```bash
# Find the service name (with release name "pgadmin" this is usually "pgadmin-pgadmin4")
kubectl -n tools get svc

# Forward local port 8080 to service port 80
kubectl -n tools port-forward svc/pgadmin-pgadmin4 8080:80

# Browser: http://localhost:8080
# Login with PGADMIN_DEFAULT_EMAIL / PGADMIN_DEFAULT_PASSWORD (from eveai-pgadmin-admin)
```

#### RedisInsight Tool Deployment

### Phase 9: Ops Jobs Invocation (if required)

Run the DB ops scripts manually, in order. Each manifest uses `generateName`, so use `kubectl create` (`kubectl apply` requires a fixed `metadata.name`).
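When pasted into a shell, the commands below keep running even after a `kubectl wait` fails. The same sequence can instead be driven by a small script that stops at the first failed step, so a migration never runs after an earlier step failed. This is a sketch under assumptions, not a script from the repository: `run_ops_jobs` and `job_commands` are hypothetical names, and the job list mirrors the manifest names, `job-type` labels and timeouts used in the commands below. Note that `kubectl wait -l job-type=...` waits on every Job carrying that label, so failed Jobs left over from an earlier run should be deleted first.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

NAMESPACE = "eveai-staging"
JOBS_DIR = "scaleway/manifests/base/applications/ops/jobs"

# (manifest, job-type label, wait timeout in seconds), in execution order
OPS_JOBS: List[Tuple[str, str, int]] = [
    ("00-env-check-job.yaml", "env-check", 600),
    ("02-db-bootstrap-ext-job.yaml", "db-bootstrap-ext", 1800),
    ("03-db-migrate-public-job.yaml", "db-migrate-public", 1800),
    ("04-db-migrate-tenant-job.yaml", "db-migrate-tenant", 3600),
    ("05-seed-or-init-data-job.yaml", "db-seed-or-init", 1800),
    ("06-verify-minimal-job.yaml", "db-verify-minimal", 900),
]


def job_commands(manifest: str, job_type: str, timeout: int) -> List[List[str]]:
    """Build the kubectl create + kubectl wait commands for one ops job."""
    return [
        ["kubectl", "create", "-f", f"{JOBS_DIR}/{manifest}"],
        ["kubectl", "wait", "--for=condition=complete", "job",
         "-n", NAMESPACE, "-l", f"job-type={job_type}", f"--timeout={timeout}s"],
    ]


def run_kubectl(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_ops_jobs(run: Callable[[Sequence[str]], int] = run_kubectl) -> List[str]:
    """Run all ops jobs in order; raise on the first non-zero exit code."""
    done: List[str] = []
    for manifest, job_type, timeout in OPS_JOBS:
        for cmd in job_commands(manifest, job_type, timeout):
            if run(cmd) != 0:
                raise RuntimeError(f"{job_type} failed at: {' '.join(cmd)}")
        done.append(job_type)
    return done
```

Calling `run_ops_jobs()` from the repository root makes the relative manifest paths resolve. The `run` parameter exists so the sequence can be dry-run or tested without a cluster.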
```bash
kubectl create -f scaleway/manifests/base/applications/ops/jobs/00-env-check-job.yaml
kubectl wait --for=condition=complete job -n eveai-staging -l job-type=env-check --timeout=600s

kubectl create -f scaleway/manifests/base/applications/ops/jobs/02-db-bootstrap-ext-job.yaml
kubectl wait --for=condition=complete job -n eveai-staging -l job-type=db-bootstrap-ext --timeout=1800s

kubectl create -f scaleway/manifests/base/applications/ops/jobs/03-db-migrate-public-job.yaml
kubectl wait --for=condition=complete job -n eveai-staging -l job-type=db-migrate-public --timeout=1800s

kubectl create -f scaleway/manifests/base/applications/ops/jobs/04-db-migrate-tenant-job.yaml
kubectl wait --for=condition=complete job -n eveai-staging -l job-type=db-migrate-tenant --timeout=3600s

kubectl create -f scaleway/manifests/base/applications/ops/jobs/05-seed-or-init-data-job.yaml
kubectl wait --for=condition=complete job -n eveai-staging -l job-type=db-seed-or-init --timeout=1800s

kubectl create -f scaleway/manifests/base/applications/ops/jobs/06-verify-minimal-job.yaml
kubectl wait --for=condition=complete job -n eveai-staging -l job-type=db-verify-minimal --timeout=900s
```

### Phase 10: Application Services Deployment

## Verification and Testing

### Check Infrastructure Status

```bash
# Verify ingress controller
kubectl get pods -n ingress-nginx
kubectl describe service ingress-nginx-controller -n ingress-nginx

# Verify cert-manager
kubectl get pods -n cert-manager
kubectl get clusterissuers

# Check certificate status (may take a few minutes to issue)
kubectl describe certificate evie-staging-tls -n eveai-staging
```

### Test Services

```bash
# Get external IP from LoadBalancer
kubectl get svc -n ingress-nginx ingress-nginx-controller

# Test HTTPS access (replace with your domain)
# -k skips certificate verification; drop it once the certificate has been issued
curl -k https://evie-staging.askeveai.com/verify/health
curl -k https://evie-staging.askeveai.com/verify/info

# Test monitoring (if deployed)
kubectl port-forward -n monitoring svc/monitoring-grafana 3000:80
# Access Grafana at http://localhost:3000 (admin/admin123)
```

## DNS Configuration

### Update DNS Records

- Create an A record pointing to the LoadBalancer external IP
- Or set up a CNAME if using a CDN

### Test Domain Access

```bash
# Test domain resolution
nslookup evie-staging.askeveai.com

# Test HTTPS access via domain
curl https://evie-staging.askeveai.com/verify/
```

## CDN Setup (Bunny.net - Optional)

### Configure Pull Zone

- Create pull zone: evie-staging
- Origin: https://[LoadBalancer-IP] (note HTTPS!)
- Host header: evie-staging.askeveai.com
- Force SSL: Enabled

### Update DNS for CDN

- Change the A record to a CNAME pointing to the CDN endpoint
- Or update the A record to the CDN IP

## Key Differences from Old Setup

### Advantages of New Modular Approach

1. **Modular Structure**: Separate infrastructure from applications
2. **Environment Management**: Easy staging/production separation
3. **HTTPS-First**: TLS certificates managed automatically
4. **Monitoring Integration**: Prometheus/Grafana via Helm charts
5. **Scaleway Integration**: Secrets support for managed services
6. **Maintainability**: Clear separation of concerns

### Migration Benefits

- **Organized**: Base configurations with environment overlays
- **Scalable**: Easy to add new services or environments
- **Secure**: HTTPS-only from deployment start
- **Observable**: Built-in monitoring stack
- **Automated**: Less manual intervention required

## Troubleshooting

### Common Issues

```bash
# Certificate not issued
kubectl describe certificate evie-staging-tls -n eveai-staging
kubectl logs -n cert-manager deployment/cert-manager

# Ingress not accessible
kubectl describe ingress eveai-staging-ingress -n eveai-staging
kubectl logs -n ingress-nginx deployment/ingress-nginx-controller

# Check events for issues
kubectl get events -n eveai-staging --sort-by='.lastTimestamp'
```

For detailed troubleshooting, refer to the main deployment guide: `documentation/scaleway-deployment-guide.md`
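The "certificate not issued" check can also be done programmatically, for example from a CI step that waits for TLS before running the `curl` tests. This sketch assumes cert-manager's `Certificate` status layout, where `status.conditions` contains a condition of type `Ready`. `get_certificate` and `certificate_ready` are hypothetical helper names, not part of cert-manager or the EveAI repository.

```python
import json
import subprocess
from typing import Optional, Tuple


def get_certificate(name: str, namespace: str) -> dict:
    """Fetch a cert-manager Certificate as JSON via kubectl."""
    result = subprocess.run(
        ["kubectl", "get", "certificate", name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def certificate_ready(cert: dict) -> Tuple[bool, Optional[str]]:
    """Return (ready, message) taken from the Certificate's Ready condition."""
    for condition in cert.get("status", {}).get("conditions", []):
        if condition.get("type") == "Ready":
            return condition.get("status") == "True", condition.get("message")
    # No Ready condition yet, typically because cert-manager has not processed the resource
    return False, None
```

For example, `certificate_ready(get_certificate("evie-staging-tls", "eveai-staging"))` returns `(False, <message>)` while issuance is pending. The message usually points to the next place to look, such as the related `CertificateRequest` or ACME `Order`.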