Kubernetes Clusters¶
Kubernetes cluster configurations and management for the RCIIS DevOps platform.
Cluster Types¶
Local Development Clusters¶
- Technology: Kind (Kubernetes in Docker)
- Purpose: Developer workstations and local testing
- Configuration: Single or multi-node clusters
- Networking: Cilium or Calico CNI options
Testing Clusters (Proxmox/Talos)¶
- Technology: Talos Linux on Proxmox VE
- Purpose: Automated testing and CI/CD integration
- Configuration: 3 control plane + 3 worker nodes
- Networking: Cilium CNI with kube-proxy replacement
- Provisioning: OpenTofu/Terraform IaC (a minimal bootstrap sketch follows this list)
- Documentation: See Proxmox Talos Guide
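The full provisioning flow is covered in the Proxmox Talos Guide. As an orientation only, with a placeholder cluster endpoint and node IPs, bootstrapping a Talos cluster typically looks like this:

# Generate machine configs (cluster endpoint and node IPs are placeholders)
talosctl gen config rciis-test https://10.0.10.10:6443

# Apply configs to the freshly booted Talos VMs
talosctl apply-config --insecure --nodes 10.0.10.11 --file controlplane.yaml
talosctl apply-config --insecure --nodes 10.0.10.21 --file worker.yaml

# Bootstrap etcd on the first control plane node, then fetch a kubeconfig
talosctl --talosconfig ./talosconfig --endpoints 10.0.10.11 --nodes 10.0.10.11 bootstrap
talosctl --talosconfig ./talosconfig --endpoints 10.0.10.11 --nodes 10.0.10.11 kubeconfig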
Staging Clusters¶
- Purpose: Pre-production testing and validation
- Configuration: Production-like setup and scale
- Data: Sanitized production data copies
- Access: Controlled access for testing teams
Production Clusters (Planned)¶
- Purpose: Live production workloads
- Configuration: High availability and resilience
- Security: Enhanced security controls and monitoring
- Compliance: Regulatory and audit requirements
Cluster Configuration¶
Kind Cluster Setup¶
Basic Configuration:
# kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: rciis-local
nodes:
  - role: control-plane
    kubeadmConfigPatches:
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: "ingress-ready=true"
    extraPortMappings:
      - containerPort: 80
        hostPort: 80
        protocol: TCP
      - containerPort: 443
        hostPort: 443
        protocol: TCP
  - role: worker
  - role: worker
Multi-Node Setup:
# Create multi-node cluster
kind create cluster --config kind-config.yaml --name rciis-local
# Verify cluster
kubectl cluster-info --context kind-rciis-local
kubectl get nodes
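Locally built images can be loaded straight into the Kind nodes instead of being pushed to a registry; the image tag below is a placeholder:

# Load a locally built image into the Kind cluster (image tag is a placeholder)
kind load docker-image myapp:dev --name rciis-local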
CNI Configuration¶
Cilium Installation:
# Install Cilium
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium \
--namespace kube-system \
--set nodeinit.enabled=true \
--set kubeProxyReplacement=partial \
--set hostServices.enabled=false \
--set externalIPs.enabled=true \
--set nodePort.enabled=true \
--set hostPort.enabled=true \
--set image.pullPolicy=IfNotPresent \
--set ipam.mode=kubernetes
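Before deploying workloads, confirm the CNI is healthy. The kubectl check works on any cluster; the second command assumes the Cilium CLI is installed:

# Verify the Cilium agents are running
kubectl -n kube-system get pods -l k8s-app=cilium
# Optional: overall health via the Cilium CLI, if installed
cilium status --wait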
Calico Alternative:
# Install Calico
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
# Verify installation
kubectl get pods -n kube-system -l k8s-app=calico-node
Cluster Management¶
Node Management¶
Adding Nodes:
# Kind cannot add nodes to a running cluster; inspect the current nodes,
# then recreate the cluster with additional workers in kind-config.yaml
kind get clusters
docker ps | grep rciis-local
# For cloud clusters, use cluster autoscaler
kubectl apply -f cluster-autoscaler.yaml
Node Maintenance:
# Drain node for maintenance
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# Uncordon node after maintenance
kubectl uncordon <node-name>
# Remove node from cluster
kubectl delete node <node-name>
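When nodes are drained routinely, a PodDisruptionBudget keeps a minimum number of replicas running during evictions. A minimal sketch, using a hypothetical app label:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
  namespace: development
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: example-app   # placeholder label; match your workload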
Resource Management¶
Resource Quotas:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: development
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    persistentvolumeclaims: "10"
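Current consumption against the quota can be checked with:

kubectl describe resourcequota compute-quota -n development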
Limit Ranges:
apiVersion: v1
kind: LimitRange
metadata:
  name: pod-limit-range
  namespace: development
spec:
  limits:
    - default:
        cpu: 500m
        memory: 512Mi
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      type: Container
Storage Configuration¶
Storage Classes¶
Local Development:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: rancher.io/local-path
parameters:
  nodePath: /opt/local-path-provisioner
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
Production Storage:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
  replication-type: regional-pd
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain
Persistent Volumes¶
Static Provisioning:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: database-pv
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: fast-ssd
  hostPath:
    path: /mnt/data/database
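A statically provisioned PV is only consumed once a matching claim exists. A minimal claim for the volume above; the explicit volumeName pins the binding and the namespace is a placeholder:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-pvc
  namespace: development   # placeholder namespace
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  volumeName: database-pv
  resources:
    requests:
      storage: 20Gi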
Network Configuration¶
Service Mesh Preparation¶
Istio Integration (Future):
# Install Istio with the default revision
istioctl install --set values.defaultRevision=default
# Enable sidecar injection
kubectl label namespace nucleus istio-injection=enabled
Linkerd Integration (Alternative):
# Install Linkerd
linkerd install | kubectl apply -f -
# Inject proxy
kubectl get deployment nucleus -o yaml | linkerd inject - | kubectl apply -f -
Network Policies¶
Default Deny:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: nucleus
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
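With a default-deny policy in place, traffic must be re-allowed explicitly. For example, a policy permitting DNS egress to CoreDNS (the labels and ports follow upstream Kubernetes conventions):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: nucleus
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53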
Security Configuration¶
Pod Security Standards¶
Restricted Policy:
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
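Before enforcing the restricted profile on an existing namespace, a server-side dry run will surface pods that would violate it without changing anything:

# List pods that would violate the restricted profile (no changes are made)
kubectl label --dry-run=server --overwrite namespace production \
  pod-security.kubernetes.io/enforce=restricted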
Security Contexts¶
Container Security:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: secure-app
spec:
  selector:
    matchLabels:
      app: secure-app
  template:
    metadata:
      labels:
        app: secure-app
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
        runAsGroup: 1001
        fsGroup: 1001
      containers:
        - name: app
          image: registry.example.com/secure-app:latest  # placeholder image
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
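Once deployed, the effective user can be spot-checked from inside the container (assuming the image ships the id utility):

# Should print 1001 if the security context is applied
kubectl exec deployment/secure-app -- id -u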
Monitoring and Maintenance¶
Cluster Health Monitoring¶
Health Checks:
# Check cluster components (componentstatuses is deprecated in recent Kubernetes releases)
kubectl get componentstatuses
# Check node health
kubectl describe nodes
# Check system pods
kubectl get pods -n kube-system
# Check cluster events
kubectl get events --sort-by=.metadata.creationTimestamp
Maintenance Procedures¶
Cluster Updates:
# Kind clusters are updated by recreating them on a newer node image
kind delete cluster --name rciis-local
kind create cluster --config kind-config.yaml --name rciis-local
# Upgrade kubeadm-managed clusters in place
kubeadm upgrade plan
kubeadm upgrade apply v1.28.0
Backup Procedures:
# Backup cluster API objects (note: "get all" omits ConfigMaps, Secrets, and CRDs)
kubectl get all --all-namespaces -o yaml > cluster-backup.yaml
# Backup etcd (self-managed/kubeadm clusters; requires the etcd client certificates)
kubectl exec -n kube-system etcd-master -- etcdctl \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /tmp/etcd-backup.db
Troubleshooting¶
Common Issues¶
Node Not Ready:
# Check node status
kubectl describe node <node-name>
# Check kubelet logs
journalctl -u kubelet
# Restart kubelet
systemctl restart kubelet
Pod Scheduling Issues:
# Check scheduler logs (the scheduler runs as a static pod on kubeadm/Kind clusters)
kubectl logs -n kube-system -l component=kube-scheduler
# Check resource availability
kubectl describe node <node-name>
# Check taints and tolerations
kubectl describe node <node-name> | grep Taints
Diagnostic Commands¶
# Cluster information
kubectl cluster-info
kubectl version
# Resource usage
kubectl top nodes
kubectl top pods --all-namespaces
# Network connectivity
kubectl exec -it <pod-name> -- ping <target-ip>
kubectl exec -it <pod-name> -- nslookup kubernetes.default
Best Practices¶
Cluster Design¶
- Separation of Concerns: Separate clusters for different environments
- Resource Planning: Adequate capacity planning and monitoring
- Security First: Apply security controls from the start
- Backup Strategy: Regular backups and disaster recovery testing
Operational Excellence¶
- Monitoring: Comprehensive monitoring and alerting
- Automation: Automated deployment and scaling
- Documentation: Maintain current operational documentation
- Training: Regular team training and knowledge sharing
For specific cluster configurations, refer to the setup scripts in the /scripts directory.