Files
eveAI/documentation/scaleway-deployment-guide.md

355 lines
10 KiB
Markdown

# EveAI Scaleway Deployment Guide
## Overview
This guide covers the deployment of EveAI to Scaleway Kubernetes using a modular Kustomize structure with Helm integration for monitoring services.
## Architecture
### Managed Services (Scaleway)
- **PostgreSQL**: Database service
- **Redis**: Message broker and cache
- **MinIO**: Object storage (S3-compatible)
- **Secret Manager**: Secure storage for secrets
### Kubernetes Services
- **Infrastructure**: Ingress Controller, Cert-Manager, TLS certificates
- **Applications**: EveAI services (app, api, workers, etc.)
- **Monitoring**: Prometheus, Grafana, Pushgateway (via Helm)
- **Verification**: Permanent cluster health monitoring service
## Directory Structure
```
scaleway/manifests/
├── base/ # Base configurations
│ ├── infrastructure/ # Core infrastructure
│ │ ├── 00-namespaces.yaml # Namespace definitions
│ │ ├── 01-ingress-controller.yaml # NGINX Ingress Controller
│ │ ├── 02-cert-manager.yaml # Cert-Manager setup
│ │ └── 03-cluster-issuers.yaml # Let's Encrypt issuers
│ ├── applications/ # Application services
│ │ └── verification/ # Verification service
│ │ ├── 00-configmaps.yaml # HTML content and nginx config
│ │ ├── 01-deployment.yaml # Deployment specification
│ │ └── 02-service.yaml # Service definition
│ ├── monitoring/ # Monitoring stack (Helm)
│ │ ├── kustomization.yaml # Helm chart integration
│ │ └── values-monitoring.yaml # Prometheus stack values
│ ├── secrets/ # Secret definitions
│ │ └── scaleway-secrets.yaml # Scaleway Secret Manager integration
│ └── networking/ # Network configuration
│ └── ingress-https.yaml # HTTPS-only ingress
└── overlays/ # Environment-specific configs
├── staging/ # Staging environment
│ └── kustomization.yaml # Staging overlay
└── production/ # Production environment (future)
└── kustomization.yaml # Production overlay
```
## Prerequisites
### 1. Scaleway Setup
- Kubernetes cluster running
- Managed services configured:
- PostgreSQL database
- Redis instance
- MinIO object storage
- Secrets stored in Scaleway Secret Manager:
- `eveai-app-keys`
- `eveai-mistral`
- `eveai-object-storage`
- `eveai-openai`
- `eveai-postgresql`
- `eveai-redis`
- `eveai-redis-certificate`
### 2. Local Tools
```bash
# Install required tools
kubectl version --client
kustomize version
helm version
# Install Kustomize Helm plugin
kubectl kustomize --enable-helm
```
### 3. Cluster Access
```bash
# Configure kubectl for Scaleway cluster
scw k8s kubeconfig install <cluster-id>
kubectl cluster-info
```
## Deployment Process
### Phase 1: Infrastructure Foundation
Deploy core infrastructure components in order:
```bash
# 1. Deploy namespaces
kubectl apply -f scaleway/manifests/base/infrastructure/00-namespaces.yaml
# 2. Deploy ingress controller
kubectl apply -f scaleway/manifests/base/infrastructure/01-ingress-controller.yaml
# Wait for ingress controller to be ready
kubectl wait --namespace ingress-nginx \
--for=condition=ready pod \
--selector=app.kubernetes.io/component=controller \
--timeout=300s
# 3. Install cert-manager CRDs (required first)
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.2/cert-manager.crds.yaml
# 4. Deploy cert-manager
kubectl apply -f scaleway/manifests/base/infrastructure/02-cert-manager.yaml
# Wait for cert-manager to be ready
kubectl wait --namespace cert-manager \
--for=condition=ready pod \
--selector=app.kubernetes.io/name=cert-manager \
--timeout=300s
# 5. Deploy cluster issuers
kubectl apply -f scaleway/manifests/base/infrastructure/03-cluster-issuers.yaml
```
### Phase 2: Monitoring Stack
Deploy monitoring services using Helm integration:
```bash
# Add Prometheus community Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Deploy monitoring stack via Kustomize + Helm
kubectl kustomize --enable-helm scaleway/manifests/base/monitoring/ | kubectl apply -f -
# Verify monitoring deployment
kubectl get pods -n monitoring
kubectl get pvc -n monitoring
```
### Phase 3: Application Services
Deploy verification service and secrets:
```bash
# Deploy secrets (update with actual Scaleway Secret Manager values first)
kubectl apply -f scaleway/manifests/base/secrets/scaleway-secrets.yaml
# Deploy verification service
kubectl apply -k scaleway/manifests/base/applications/verification/
# Deploy HTTPS ingress
kubectl apply -f scaleway/manifests/base/networking/ingress-https.yaml
```
### Phase 4: Complete Staging Deployment
Deploy everything using the staging overlay:
```bash
# Deploy complete staging environment
kubectl apply -k scaleway/manifests/overlays/staging/
# Verify deployment
kubectl get all -n eveai-staging
kubectl get ingress -n eveai-staging
kubectl get certificates -n eveai-staging
```
## Verification and Testing
### 1. Check Infrastructure
```bash
# Verify ingress controller
kubectl get pods -n ingress-nginx
kubectl get svc -n ingress-nginx
# Verify cert-manager
kubectl get pods -n cert-manager
kubectl get clusterissuers
# Check certificate status
kubectl describe certificate evie-staging-tls -n eveai-staging
```
### 2. Test Verification Service
```bash
# Get external IP
kubectl get svc -n ingress-nginx
# Test HTTPS access (replace with actual IP/domain)
curl -k https://evie-staging.askeveai.com/verify/health
curl -k https://evie-staging.askeveai.com/verify/info
```
### 3. Monitor Services
```bash
# Check monitoring stack
kubectl get pods -n monitoring
kubectl port-forward -n monitoring svc/monitoring-grafana 3000:80
# Access Grafana at http://localhost:3000
# Default credentials: admin/admin123
```
## Secret Management
### Updating Secrets from Scaleway Secret Manager
The secrets in `scaleway-secrets.yaml` use template placeholders. To use actual values:
1. **Manual approach**: Replace template values with actual secrets
2. **Automated approach**: Use a secret management tool like External Secrets Operator
Example manual update:
```bash
# Replace template placeholders like:
# password: "{{ .Values.database.password }}"
#
# With actual values from Scaleway Secret Manager:
# password: "actual-password-from-scaleway"
```
### Recommended: External Secrets Operator
For production, consider using External Secrets Operator to automatically sync from Scaleway Secret Manager:
```bash
# Install External Secrets Operator
helm repo add external-secrets https://charts.external-secrets.io
helm install external-secrets external-secrets/external-secrets -n external-secrets-system --create-namespace
```
## Monitoring and Observability
### Grafana Dashboards
- **URL**: `https://evie-staging.askeveai.com/monitoring` (when ingress path is configured)
- **Credentials**: admin/admin123 (change in production)
- **Pre-configured**: EveAI-specific dashboards in `/EveAI` folder
### Prometheus Metrics
- **Internal URL**: `http://monitoring-prometheus:9090`
- **Scrapes**: Kubernetes metrics, application metrics, Scaleway managed services
### Pushgateway
- **Internal URL**: `http://monitoring-pushgateway:9091`
- **Usage**: For batch job metrics from EveAI workers
## Troubleshooting
### Common Issues
1. **Certificate not issued**
```bash
kubectl describe certificate evie-staging-tls -n eveai-staging
kubectl logs -n cert-manager deployment/cert-manager
```
2. **Ingress not accessible**
```bash
kubectl describe ingress eveai-staging-ingress -n eveai-staging
kubectl logs -n ingress-nginx deployment/ingress-nginx-controller
```
3. **Monitoring stack issues**
```bash
kubectl logs -n monitoring deployment/monitoring-prometheus-server
kubectl get pvc -n monitoring # Check storage
```
4. **Secret issues**
```bash
kubectl get secrets -n eveai-staging
kubectl describe secret database-secrets -n eveai-staging
```
### Useful Commands
```bash
# View all resources in staging
kubectl get all -n eveai-staging
# Check resource usage
kubectl top pods -n eveai-staging
kubectl top nodes
# View logs
kubectl logs -f deployment/verify-service -n eveai-staging
# Port forwarding for local access
kubectl port-forward -n eveai-staging svc/verify-service 8080:80
```
## Scaling and Updates
### Scaling Services
```bash
# Scale verification service
kubectl scale deployment verify-service --replicas=3 -n eveai-staging
# Update image
kubectl set image deployment/verify-service nginx=nginx:1.21-alpine -n eveai-staging
```
### Rolling Updates
```bash
# Update using Kustomize
kubectl apply -k scaleway/manifests/overlays/staging/
# Check rollout status
kubectl rollout status deployment/verify-service -n eveai-staging
```
## Production Deployment
For production deployment:
1. Create `scaleway/manifests/overlays/production/kustomization.yaml`
2. Update domain names and certificates
3. Adjust resource limits and replicas
4. Use production Let's Encrypt issuer
5. Configure production monitoring and alerting
```bash
# Production deployment
kubectl apply -k scaleway/manifests/overlays/production/
```
## Security Considerations
1. **Secrets**: Use Scaleway Secret Manager integration
2. **TLS**: HTTPS-only with automatic certificate renewal
3. **Network**: Ingress-based routing with proper annotations
4. **RBAC**: Kubernetes role-based access control
5. **Images**: Use specific tags, not `latest`
## Maintenance
### Regular Tasks
- Monitor certificate expiration
- Update Helm charts and container images
- Review resource usage and scaling
- Backup monitoring data
- Update secrets rotation
### Monitoring Alerts
Configure alerts for:
- Certificate expiration (< 30 days)
- Pod failures and restarts
- Resource usage thresholds
- External service connectivity
## Support
For issues and questions:
1. Check logs using kubectl commands above
2. Verify Scaleway managed services status
3. Review Kubernetes events: `kubectl get events -n eveai-staging`
4. Check monitoring dashboards for system health