- Staging cluster werkend tot op phase 6 van cluster-install.md, inclusief HTTPS, Bunny, verificatie service.
This commit is contained in:
355
documentation/scaleway-deployment-guide.md
Normal file
355
documentation/scaleway-deployment-guide.md
Normal file
@@ -0,0 +1,355 @@
|
||||
# EveAI Scaleway Deployment Guide
|
||||
|
||||
## Overview
|
||||
|
||||
This guide covers the deployment of EveAI to Scaleway Kubernetes using a modular Kustomize structure with Helm integration for monitoring services.
|
||||
|
||||
## Architecture
|
||||
|
||||
### Managed Services (Scaleway)
|
||||
- **PostgreSQL**: Database service
|
||||
- **Redis**: Message broker and cache
|
||||
- **MinIO**: Object storage (S3-compatible)
|
||||
- **Secret Manager**: Secure storage for secrets
|
||||
|
||||
### Kubernetes Services
|
||||
- **Infrastructure**: Ingress Controller, Cert-Manager, TLS certificates
|
||||
- **Applications**: EveAI services (app, api, workers, etc.)
|
||||
- **Monitoring**: Prometheus, Grafana, Pushgateway (via Helm)
|
||||
- **Verification**: Permanent cluster health monitoring service
|
||||
|
||||
## Directory Structure
|
||||
|
||||
```
|
||||
scaleway/manifests/
|
||||
├── base/ # Base configurations
|
||||
│ ├── infrastructure/ # Core infrastructure
|
||||
│ │ ├── 00-namespaces.yaml # Namespace definitions
|
||||
│ │ ├── 01-ingress-controller.yaml # NGINX Ingress Controller
|
||||
│ │ ├── 02-cert-manager.yaml # Cert-Manager setup
|
||||
│ │ └── 03-cluster-issuers.yaml # Let's Encrypt issuers
|
||||
│ ├── applications/ # Application services
|
||||
│ │ └── verification/ # Verification service
|
||||
│ │ ├── 00-configmaps.yaml # HTML content and nginx config
|
||||
│ │ ├── 01-deployment.yaml # Deployment specification
|
||||
│ │ └── 02-service.yaml # Service definition
|
||||
│ ├── monitoring/ # Monitoring stack (Helm)
|
||||
│ │ ├── kustomization.yaml # Helm chart integration
|
||||
│ │ └── values-monitoring.yaml # Prometheus stack values
|
||||
│ ├── secrets/ # Secret definitions
|
||||
│ │ └── scaleway-secrets.yaml # Scaleway Secret Manager integration
|
||||
│ └── networking/ # Network configuration
|
||||
│ └── ingress-https.yaml # HTTPS-only ingress
|
||||
└── overlays/ # Environment-specific configs
|
||||
├── staging/ # Staging environment
|
||||
│ └── kustomization.yaml # Staging overlay
|
||||
└── production/ # Production environment (future)
|
||||
└── kustomization.yaml # Production overlay
|
||||
```
|
||||
|
||||
## Prerequisites
|
||||
|
||||
### 1. Scaleway Setup
|
||||
- Kubernetes cluster running
|
||||
- Managed services configured:
|
||||
- PostgreSQL database
|
||||
- Redis instance
|
||||
- MinIO object storage
|
||||
- Secrets stored in Scaleway Secret Manager:
|
||||
- `eveai-app-keys`
|
||||
- `eveai-mistral`
|
||||
- `eveai-object-storage`
|
||||
- `eveai-openai`
|
||||
- `eveai-postgresql`
|
||||
- `eveai-redis`
|
||||
- `eveai-redis-certificate`
|
||||
|
||||
### 2. Local Tools
|
||||
```bash
|
||||
# Install required tools
|
||||
kubectl version --client
|
||||
kustomize version
|
||||
helm version
|
||||
|
||||
# Install Kustomize Helm plugin
|
||||
kubectl kustomize --enable-helm
|
||||
```
|
||||
|
||||
### 3. Cluster Access
|
||||
```bash
|
||||
# Configure kubectl for Scaleway cluster
|
||||
scw k8s kubeconfig install <cluster-id>
|
||||
kubectl cluster-info
|
||||
```
|
||||
|
||||
## Deployment Process
|
||||
|
||||
### Phase 1: Infrastructure Foundation
|
||||
|
||||
Deploy core infrastructure components in order:
|
||||
|
||||
```bash
|
||||
# 1. Deploy namespaces
|
||||
kubectl apply -f scaleway/manifests/base/infrastructure/00-namespaces.yaml
|
||||
|
||||
# 2. Deploy ingress controller
|
||||
kubectl apply -f scaleway/manifests/base/infrastructure/01-ingress-controller.yaml
|
||||
|
||||
# Wait for ingress controller to be ready
|
||||
kubectl wait --namespace ingress-nginx \
|
||||
--for=condition=ready pod \
|
||||
--selector=app.kubernetes.io/component=controller \
|
||||
--timeout=300s
|
||||
|
||||
# 3. Install cert-manager CRDs (required first)
|
||||
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.2/cert-manager.crds.yaml
|
||||
|
||||
# 4. Deploy cert-manager
|
||||
kubectl apply -f scaleway/manifests/base/infrastructure/02-cert-manager.yaml
|
||||
|
||||
# Wait for cert-manager to be ready
|
||||
kubectl wait --namespace cert-manager \
|
||||
--for=condition=ready pod \
|
||||
--selector=app.kubernetes.io/name=cert-manager \
|
||||
--timeout=300s
|
||||
|
||||
# 5. Deploy cluster issuers
|
||||
kubectl apply -f scaleway/manifests/base/infrastructure/03-cluster-issuers.yaml
|
||||
```
|
||||
|
||||
### Phase 2: Monitoring Stack
|
||||
|
||||
Deploy monitoring services using Helm integration:
|
||||
|
||||
```bash
|
||||
# Add Prometheus community Helm repository
|
||||
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
|
||||
helm repo update
|
||||
|
||||
# Deploy monitoring stack via Kustomize + Helm
|
||||
kubectl kustomize --enable-helm scaleway/manifests/base/monitoring/ | kubectl apply -f -
|
||||
|
||||
# Verify monitoring deployment
|
||||
kubectl get pods -n monitoring
|
||||
kubectl get pvc -n monitoring
|
||||
```
|
||||
|
||||
### Phase 3: Application Services
|
||||
|
||||
Deploy verification service and secrets:
|
||||
|
||||
```bash
|
||||
# Deploy secrets (update with actual Scaleway Secret Manager values first)
|
||||
kubectl apply -f scaleway/manifests/base/secrets/scaleway-secrets.yaml
|
||||
|
||||
# Deploy verification service
|
||||
kubectl apply -k scaleway/manifests/base/applications/verification/
|
||||
|
||||
# Deploy HTTPS ingress
|
||||
kubectl apply -f scaleway/manifests/base/networking/ingress-https.yaml
|
||||
```
|
||||
|
||||
### Phase 4: Complete Staging Deployment
|
||||
|
||||
Deploy everything using the staging overlay:
|
||||
|
||||
```bash
|
||||
# Deploy complete staging environment
|
||||
kubectl apply -k scaleway/manifests/overlays/staging/
|
||||
|
||||
# Verify deployment
|
||||
kubectl get all -n eveai-staging
|
||||
kubectl get ingress -n eveai-staging
|
||||
kubectl get certificates -n eveai-staging
|
||||
```
|
||||
|
||||
## Verification and Testing
|
||||
|
||||
### 1. Check Infrastructure
|
||||
```bash
|
||||
# Verify ingress controller
|
||||
kubectl get pods -n ingress-nginx
|
||||
kubectl get svc -n ingress-nginx
|
||||
|
||||
# Verify cert-manager
|
||||
kubectl get pods -n cert-manager
|
||||
kubectl get clusterissuers
|
||||
|
||||
# Check certificate status
|
||||
kubectl describe certificate evie-staging-tls -n eveai-staging
|
||||
```
|
||||
|
||||
### 2. Test Verification Service
|
||||
```bash
|
||||
# Get external IP
|
||||
kubectl get svc -n ingress-nginx
|
||||
|
||||
# Test HTTPS access (replace with actual IP/domain)
|
||||
curl -k https://evie-staging.askeveai.com/verify/health
|
||||
curl -k https://evie-staging.askeveai.com/verify/info
|
||||
```
|
||||
|
||||
### 3. Monitor Services
|
||||
```bash
|
||||
# Check monitoring stack
|
||||
kubectl get pods -n monitoring
|
||||
kubectl port-forward -n monitoring svc/monitoring-grafana 3000:80
|
||||
|
||||
# Access Grafana at http://localhost:3000
|
||||
# Default credentials: admin/admin123
|
||||
```
|
||||
|
||||
## Secret Management
|
||||
|
||||
### Updating Secrets from Scaleway Secret Manager
|
||||
|
||||
The secrets in `scaleway-secrets.yaml` use template placeholders. To use actual values:
|
||||
|
||||
1. **Manual approach**: Replace template values with actual secrets
|
||||
2. **Automated approach**: Use a secret management tool like External Secrets Operator
|
||||
|
||||
Example manual update:
|
||||
```bash
|
||||
# Replace template placeholders like:
|
||||
# password: "{{ .Values.database.password }}"
|
||||
#
|
||||
# With actual values from Scaleway Secret Manager:
|
||||
# password: "actual-password-from-scaleway"
|
||||
```
|
||||
|
||||
### Recommended: External Secrets Operator
|
||||
|
||||
For production, consider using External Secrets Operator to automatically sync from Scaleway Secret Manager:
|
||||
|
||||
```bash
|
||||
# Install External Secrets Operator
|
||||
helm repo add external-secrets https://charts.external-secrets.io
|
||||
helm install external-secrets external-secrets/external-secrets -n external-secrets-system --create-namespace
|
||||
```
|
||||
|
||||
## Monitoring and Observability
|
||||
|
||||
### Grafana Dashboards
|
||||
- **URL**: `https://evie-staging.askeveai.com/monitoring` (when ingress path is configured)
|
||||
- **Credentials**: admin/admin123 (change in production)
|
||||
- **Pre-configured**: EveAI-specific dashboards in `/EveAI` folder
|
||||
|
||||
### Prometheus Metrics
|
||||
- **Internal URL**: `http://monitoring-prometheus:9090`
|
||||
- **Scrapes**: Kubernetes metrics, application metrics, Scaleway managed services
|
||||
|
||||
### Pushgateway
|
||||
- **Internal URL**: `http://monitoring-pushgateway:9091`
|
||||
- **Usage**: For batch job metrics from EveAI workers
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
|
||||
1. **Certificate not issued**
|
||||
```bash
|
||||
kubectl describe certificate evie-staging-tls -n eveai-staging
|
||||
kubectl logs -n cert-manager deployment/cert-manager
|
||||
```
|
||||
|
||||
2. **Ingress not accessible**
|
||||
```bash
|
||||
kubectl describe ingress eveai-staging-ingress -n eveai-staging
|
||||
kubectl logs -n ingress-nginx deployment/ingress-nginx-controller
|
||||
```
|
||||
|
||||
3. **Monitoring stack issues**
|
||||
```bash
|
||||
kubectl logs -n monitoring deployment/monitoring-prometheus-server
|
||||
kubectl get pvc -n monitoring # Check storage
|
||||
```
|
||||
|
||||
4. **Secret issues**
|
||||
```bash
|
||||
kubectl get secrets -n eveai-staging
|
||||
kubectl describe secret database-secrets -n eveai-staging
|
||||
```
|
||||
|
||||
### Useful Commands
|
||||
|
||||
```bash
|
||||
# View all resources in staging
|
||||
kubectl get all -n eveai-staging
|
||||
|
||||
# Check resource usage
|
||||
kubectl top pods -n eveai-staging
|
||||
kubectl top nodes
|
||||
|
||||
# View logs
|
||||
kubectl logs -f deployment/verify-service -n eveai-staging
|
||||
|
||||
# Port forwarding for local access
|
||||
kubectl port-forward -n eveai-staging svc/verify-service 8080:80
|
||||
```
|
||||
|
||||
## Scaling and Updates
|
||||
|
||||
### Scaling Services
|
||||
```bash
|
||||
# Scale verification service
|
||||
kubectl scale deployment verify-service --replicas=3 -n eveai-staging
|
||||
|
||||
# Update image
|
||||
kubectl set image deployment/verify-service nginx=nginx:1.21-alpine -n eveai-staging
|
||||
```
|
||||
|
||||
### Rolling Updates
|
||||
```bash
|
||||
# Update using Kustomize
|
||||
kubectl apply -k scaleway/manifests/overlays/staging/
|
||||
|
||||
# Check rollout status
|
||||
kubectl rollout status deployment/verify-service -n eveai-staging
|
||||
```
|
||||
|
||||
## Production Deployment
|
||||
|
||||
For production deployment:
|
||||
|
||||
1. Create `scaleway/manifests/overlays/production/kustomization.yaml`
|
||||
2. Update domain names and certificates
|
||||
3. Adjust resource limits and replicas
|
||||
4. Use production Let's Encrypt issuer
|
||||
5. Configure production monitoring and alerting
|
||||
|
||||
```bash
|
||||
# Production deployment
|
||||
kubectl apply -k scaleway/manifests/overlays/production/
|
||||
```
|
||||
|
||||
## Security Considerations
|
||||
|
||||
1. **Secrets**: Use Scaleway Secret Manager integration
|
||||
2. **TLS**: HTTPS-only with automatic certificate renewal
|
||||
3. **Network**: Ingress-based routing with proper annotations
|
||||
4. **RBAC**: Kubernetes role-based access control
|
||||
5. **Images**: Use specific tags, not `latest`
|
||||
|
||||
## Maintenance
|
||||
|
||||
### Regular Tasks
|
||||
- Monitor certificate expiration
|
||||
- Update Helm charts and container images
|
||||
- Review resource usage and scaling
|
||||
- Backup monitoring data
|
||||
- Update secrets rotation
|
||||
|
||||
### Monitoring Alerts
|
||||
Configure alerts for:
|
||||
- Certificate expiration (< 30 days)
|
||||
- Pod failures and restarts
|
||||
- Resource usage thresholds
|
||||
- External service connectivity
|
||||
|
||||
## Support
|
||||
|
||||
For issues and questions:
|
||||
1. Check logs using kubectl commands above
|
||||
2. Verify Scaleway managed services status
|
||||
3. Review Kubernetes events: `kubectl get events -n eveai-staging`
|
||||
4. Check monitoring dashboards for system health
|
||||
Reference in New Issue
Block a user