355 lines
10 KiB
Markdown
355 lines
10 KiB
Markdown
# EveAI Scaleway Deployment Guide
|
|
|
|
## Overview
|
|
|
|
This guide covers the deployment of EveAI to Scaleway Kubernetes using a modular Kustomize structure with Helm integration for monitoring services.
|
|
|
|
## Architecture
|
|
|
|
### Managed Services (Scaleway)
|
|
- **PostgreSQL**: Database service
|
|
- **Redis**: Message broker and cache
|
|
- **MinIO**: Object storage (S3-compatible)
|
|
- **Secret Manager**: Secure storage for secrets
|
|
|
|
### Kubernetes Services
|
|
- **Infrastructure**: Ingress Controller, Cert-Manager, TLS certificates
|
|
- **Applications**: EveAI services (app, api, workers, etc.)
|
|
- **Monitoring**: Prometheus, Grafana, Pushgateway (via Helm)
|
|
- **Verification**: Permanent cluster health monitoring service
|
|
|
|
## Directory Structure
|
|
|
|
```
|
|
scaleway/manifests/
|
|
├── base/ # Base configurations
|
|
│ ├── infrastructure/ # Core infrastructure
|
|
│ │ ├── 00-namespaces.yaml # Namespace definitions
|
|
│ │ ├── 01-ingress-controller.yaml # NGINX Ingress Controller
|
|
│ │ ├── 02-cert-manager.yaml # Cert-Manager setup
|
|
│ │ └── 03-cluster-issuers.yaml # Let's Encrypt issuers
|
|
│ ├── applications/ # Application services
|
|
│ │ └── verification/ # Verification service
|
|
│ │ ├── 00-configmaps.yaml # HTML content and nginx config
|
|
│ │ ├── 01-deployment.yaml # Deployment specification
|
|
│ │ └── 02-service.yaml # Service definition
|
|
│ ├── monitoring/ # Monitoring stack (Helm)
|
|
│ │ ├── kustomization.yaml # Helm chart integration
|
|
│ │ └── values-monitoring.yaml # Prometheus stack values
|
|
│ ├── secrets/ # Secret definitions
|
|
│ │ └── scaleway-secrets.yaml # Scaleway Secret Manager integration
|
|
│ └── networking/ # Network configuration
|
|
│ └── ingress-https.yaml # HTTPS-only ingress
|
|
└── overlays/ # Environment-specific configs
|
|
├── staging/ # Staging environment
|
|
│ └── kustomization.yaml # Staging overlay
|
|
└── production/ # Production environment (future)
|
|
└── kustomization.yaml # Production overlay
|
|
```
|
|
|
|
## Prerequisites
|
|
|
|
### 1. Scaleway Setup
|
|
- Kubernetes cluster running
|
|
- Managed services configured:
|
|
- PostgreSQL database
|
|
- Redis instance
|
|
- MinIO object storage
|
|
- Secrets stored in Scaleway Secret Manager:
|
|
- `eveai-app-keys`
|
|
- `eveai-mistral`
|
|
- `eveai-object-storage`
|
|
- `eveai-openai`
|
|
- `eveai-postgresql`
|
|
- `eveai-redis`
|
|
- `eveai-redis-certificate`
|
|
|
|
### 2. Local Tools
|
|
```bash
|
|
# Install required tools
|
|
kubectl version --client
|
|
kustomize version
|
|
helm version
|
|
|
|
# Install Kustomize Helm plugin
|
|
kubectl kustomize --enable-helm
|
|
```
|
|
|
|
### 3. Cluster Access
|
|
```bash
|
|
# Configure kubectl for Scaleway cluster
|
|
scw k8s kubeconfig install <cluster-id>
|
|
kubectl cluster-info
|
|
```
|
|
|
|
## Deployment Process
|
|
|
|
### Phase 1: Infrastructure Foundation
|
|
|
|
Deploy core infrastructure components in order:
|
|
|
|
```bash
|
|
# 1. Deploy namespaces
|
|
kubectl apply -f scaleway/manifests/base/infrastructure/00-namespaces.yaml
|
|
|
|
# 2. Deploy ingress controller
|
|
kubectl apply -f scaleway/manifests/base/infrastructure/01-ingress-controller.yaml
|
|
|
|
# Wait for ingress controller to be ready
|
|
kubectl wait --namespace ingress-nginx \
|
|
--for=condition=ready pod \
|
|
--selector=app.kubernetes.io/component=controller \
|
|
--timeout=300s
|
|
|
|
# 3. Install cert-manager CRDs (required first)
|
|
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.2/cert-manager.crds.yaml
|
|
|
|
# 4. Deploy cert-manager
|
|
kubectl apply -f scaleway/manifests/base/infrastructure/02-cert-manager.yaml
|
|
|
|
# Wait for cert-manager to be ready
|
|
kubectl wait --namespace cert-manager \
|
|
--for=condition=ready pod \
|
|
--selector=app.kubernetes.io/name=cert-manager \
|
|
--timeout=300s
|
|
|
|
# 5. Deploy cluster issuers
|
|
kubectl apply -f scaleway/manifests/base/infrastructure/03-cluster-issuers.yaml
|
|
```
|
|
|
|
### Phase 2: Monitoring Stack
|
|
|
|
Deploy monitoring services using Helm integration:
|
|
|
|
```bash
|
|
# Add Prometheus community Helm repository
|
|
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
|
|
helm repo update
|
|
|
|
# Deploy monitoring stack via Kustomize + Helm
|
|
kubectl kustomize --enable-helm scaleway/manifests/base/monitoring/ | kubectl apply -f -
|
|
|
|
# Verify monitoring deployment
|
|
kubectl get pods -n monitoring
|
|
kubectl get pvc -n monitoring
|
|
```
|
|
|
|
### Phase 3: Application Services
|
|
|
|
Deploy verification service and secrets:
|
|
|
|
```bash
|
|
# Deploy secrets (update with actual Scaleway Secret Manager values first)
|
|
kubectl apply -f scaleway/manifests/base/secrets/scaleway-secrets.yaml
|
|
|
|
# Deploy verification service
|
|
kubectl apply -k scaleway/manifests/base/applications/verification/
|
|
|
|
# Deploy HTTPS ingress
|
|
kubectl apply -f scaleway/manifests/base/networking/ingress-https.yaml
|
|
```
|
|
|
|
### Phase 4: Complete Staging Deployment
|
|
|
|
Deploy everything using the staging overlay:
|
|
|
|
```bash
|
|
# Deploy complete staging environment
|
|
kubectl apply -k scaleway/manifests/overlays/staging/
|
|
|
|
# Verify deployment
|
|
kubectl get all -n eveai-staging
|
|
kubectl get ingress -n eveai-staging
|
|
kubectl get certificates -n eveai-staging
|
|
```
|
|
|
|
## Verification and Testing
|
|
|
|
### 1. Check Infrastructure
|
|
```bash
|
|
# Verify ingress controller
|
|
kubectl get pods -n ingress-nginx
|
|
kubectl get svc -n ingress-nginx
|
|
|
|
# Verify cert-manager
|
|
kubectl get pods -n cert-manager
|
|
kubectl get clusterissuers
|
|
|
|
# Check certificate status
|
|
kubectl describe certificate evie-staging-tls -n eveai-staging
|
|
```
|
|
|
|
### 2. Test Verification Service
|
|
```bash
|
|
# Get external IP
|
|
kubectl get svc -n ingress-nginx
|
|
|
|
# Test HTTPS access (replace with actual IP/domain)
|
|
curl -k https://evie-staging.askeveai.com/verify/health
|
|
curl -k https://evie-staging.askeveai.com/verify/info
|
|
```
|
|
|
|
### 3. Monitor Services
|
|
```bash
|
|
# Check monitoring stack
|
|
kubectl get pods -n monitoring
|
|
kubectl port-forward -n monitoring svc/monitoring-grafana 3000:80
|
|
|
|
# Access Grafana at http://localhost:3000
|
|
# Default credentials: admin/admin123
|
|
```
|
|
|
|
## Secret Management
|
|
|
|
### Updating Secrets from Scaleway Secret Manager
|
|
|
|
The secrets in `scaleway-secrets.yaml` use template placeholders. To use actual values:
|
|
|
|
1. **Manual approach**: Replace template values with actual secrets
|
|
2. **Automated approach**: Use a secret management tool like External Secrets Operator
|
|
|
|
Example manual update:
|
|
```bash
|
|
# Replace template placeholders like:
|
|
# password: "{{ .Values.database.password }}"
|
|
#
|
|
# With actual values from Scaleway Secret Manager:
|
|
# password: "actual-password-from-scaleway"
|
|
```
|
|
|
|
### Recommended: External Secrets Operator
|
|
|
|
For production, consider using External Secrets Operator to automatically sync from Scaleway Secret Manager:
|
|
|
|
```bash
|
|
# Install External Secrets Operator
|
|
helm repo add external-secrets https://charts.external-secrets.io
|
|
helm install external-secrets external-secrets/external-secrets -n external-secrets-system --create-namespace
|
|
```
|
|
|
|
## Monitoring and Observability
|
|
|
|
### Grafana Dashboards
|
|
- **URL**: `https://evie-staging.askeveai.com/monitoring` (when ingress path is configured)
|
|
- **Credentials**: admin/admin123 (change in production)
|
|
- **Pre-configured**: EveAI-specific dashboards in `/EveAI` folder
|
|
|
|
### Prometheus Metrics
|
|
- **Internal URL**: `http://monitoring-prometheus:9090`
|
|
- **Scrapes**: Kubernetes metrics, application metrics, Scaleway managed services
|
|
|
|
### Pushgateway
|
|
- **Internal URL**: `http://monitoring-pushgateway:9091`
|
|
- **Usage**: For batch job metrics from EveAI workers
|
|
|
|
## Troubleshooting
|
|
|
|
### Common Issues
|
|
|
|
1. **Certificate not issued**
|
|
```bash
|
|
kubectl describe certificate evie-staging-tls -n eveai-staging
|
|
kubectl logs -n cert-manager deployment/cert-manager
|
|
```
|
|
|
|
2. **Ingress not accessible**
|
|
```bash
|
|
kubectl describe ingress eveai-staging-ingress -n eveai-staging
|
|
kubectl logs -n ingress-nginx deployment/ingress-nginx-controller
|
|
```
|
|
|
|
3. **Monitoring stack issues**
|
|
```bash
|
|
kubectl logs -n monitoring deployment/monitoring-prometheus-server
|
|
kubectl get pvc -n monitoring # Check storage
|
|
```
|
|
|
|
4. **Secret issues**
|
|
```bash
|
|
kubectl get secrets -n eveai-staging
|
|
kubectl describe secret database-secrets -n eveai-staging
|
|
```
|
|
|
|
### Useful Commands
|
|
|
|
```bash
|
|
# View all resources in staging
|
|
kubectl get all -n eveai-staging
|
|
|
|
# Check resource usage
|
|
kubectl top pods -n eveai-staging
|
|
kubectl top nodes
|
|
|
|
# View logs
|
|
kubectl logs -f deployment/verify-service -n eveai-staging
|
|
|
|
# Port forwarding for local access
|
|
kubectl port-forward -n eveai-staging svc/verify-service 8080:80
|
|
```
|
|
|
|
## Scaling and Updates
|
|
|
|
### Scaling Services
|
|
```bash
|
|
# Scale verification service
|
|
kubectl scale deployment verify-service --replicas=3 -n eveai-staging
|
|
|
|
# Update image
|
|
kubectl set image deployment/verify-service nginx=nginx:1.21-alpine -n eveai-staging
|
|
```
|
|
|
|
### Rolling Updates
|
|
```bash
|
|
# Update using Kustomize
|
|
kubectl apply -k scaleway/manifests/overlays/staging/
|
|
|
|
# Check rollout status
|
|
kubectl rollout status deployment/verify-service -n eveai-staging
|
|
```
|
|
|
|
## Production Deployment
|
|
|
|
For production deployment:
|
|
|
|
1. Create `scaleway/manifests/overlays/production/kustomization.yaml`
|
|
2. Update domain names and certificates
|
|
3. Adjust resource limits and replicas
|
|
4. Use production Let's Encrypt issuer
|
|
5. Configure production monitoring and alerting
|
|
|
|
```bash
|
|
# Production deployment
|
|
kubectl apply -k scaleway/manifests/overlays/production/
|
|
```
|
|
|
|
## Security Considerations
|
|
|
|
1. **Secrets**: Use Scaleway Secret Manager integration
|
|
2. **TLS**: HTTPS-only with automatic certificate renewal
|
|
3. **Network**: Ingress-based routing with proper annotations
|
|
4. **RBAC**: Kubernetes role-based access control
|
|
5. **Images**: Use specific tags, not `latest`
|
|
|
|
## Maintenance
|
|
|
|
### Regular Tasks
|
|
- Monitor certificate expiration
|
|
- Update Helm charts and container images
|
|
- Review resource usage and scaling
|
|
- Backup monitoring data
|
|
- Update secrets rotation
|
|
|
|
### Monitoring Alerts
|
|
Configure alerts for:
|
|
- Certificate expiration (< 30 days)
|
|
- Pod failures and restarts
|
|
- Resource usage thresholds
|
|
- External service connectivity
|
|
|
|
## Support
|
|
|
|
For issues and questions:
|
|
1. Check logs using kubectl commands above
|
|
2. Verify Scaleway managed services status
|
|
3. Review Kubernetes events: `kubectl get events -n eveai-staging`
|
|
4. Check monitoring dashboards for system health |