Kubernetes Clusters

Kubernetes cluster configurations and management for the RCIIS DevOps platform.

Cluster Types

Local Development Clusters

  • Technology: Kind (Kubernetes in Docker)
  • Purpose: Developer workstations and local testing
  • Configuration: Single or multi-node clusters
  • Networking: Cilium or Calico CNI options

Testing Clusters (Proxmox/Talos)

  • Technology: Talos Linux on Proxmox VE
  • Purpose: Automated testing and CI/CD integration
  • Configuration: 3 control plane + 3 worker nodes
  • Networking: Cilium CNI with kube-proxy replacement
  • Provisioning: OpenTofu/Terraform IaC (see the sketch after this list)
  • Documentation: See Proxmox Talos Guide
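
The Proxmox VMs can be provisioned declaratively; a minimal OpenTofu sketch, assuming the community Telmate Proxmox provider (node, template, and resource names are illustrative; the authoritative code lives with the guide above):

terraform {
  required_providers {
    proxmox = {
      source  = "Telmate/proxmox"
      version = "~> 2.9"
    }
  }
}

# Three Talos control plane VMs cloned from a prepared template
resource "proxmox_vm_qemu" "talos_controlplane" {
  count       = 3
  name        = "talos-cp-${count.index}"
  target_node = "pve"              # hypothetical Proxmox node
  clone       = "talos-template"   # hypothetical Talos VM template
  cores       = 2
  memory      = 4096
}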

Staging Clusters

  • Purpose: Pre-production testing and validation
  • Configuration: Production-like setup and scale
  • Data: Sanitized production data copies
  • Access: Controlled access for testing teams (see the RBAC sketch below)
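
Controlled access is typically enforced with Kubernetes RBAC; a minimal sketch (namespace and group names are illustrative):

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: testing-team-edit
  namespace: staging
subjects:
- kind: Group
  name: testing-team              # hypothetical IdP group
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit                      # built-in aggregate ClusterRole
  apiGroup: rbac.authorization.k8s.io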

Production Clusters (Planned)

  • Purpose: Live production workloads
  • Configuration: High availability and resilience
  • Security: Enhanced security controls and monitoring
  • Compliance: Regulatory and audit requirements

Cluster Configuration

Kind Cluster Setup

Basic Configuration:

# kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: rciis-local
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "ingress-ready=true"
  extraPortMappings:
  - containerPort: 80
    hostPort: 80
    protocol: TCP
  - containerPort: 443
    hostPort: 443
    protocol: TCP
- role: worker
- role: worker

Multi-Node Setup:

# Create multi-node cluster
kind create cluster --config kind-config.yaml --name rciis-local

# Verify cluster
kubectl cluster-info --context kind-rciis-local
kubectl get nodes

CNI Configuration

Cilium Installation (option names below match older 1.x charts; verify them against the chart version you install):

# Install Cilium
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set nodeinit.enabled=true \
  --set kubeProxyReplacement=partial \
  --set hostServices.enabled=false \
  --set externalIPs.enabled=true \
  --set nodePort.enabled=true \
  --set hostPort.enabled=true \
  --set image.pullPolicy=IfNotPresent \
  --set ipam.mode=kubernetes
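
Verification (assumes the Cilium CLI is installed; otherwise check the agent pods directly):

# Wait until Cilium reports ready
cilium status --wait

# Or inspect the agent pods
kubectl get pods -n kube-system -l k8s-app=cilium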

Calico Alternative:

# Install Calico (legacy manifest URL; newer releases publish manifests
# in the projectcalico/calico GitHub repository)
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml

# Verify installation
kubectl get pods -n kube-system -l k8s-app=calico-node

Cluster Management

Node Management

Adding Nodes:

# Kind cannot add nodes to a running cluster; add node entries to
# kind-config.yaml and recreate the cluster (see Cluster Updates below)
kind get clusters
docker ps | grep rciis-local

# For cloud clusters, use the cluster autoscaler
kubectl apply -f cluster-autoscaler.yaml

Node Maintenance:

# Drain node for maintenance
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# Uncordon node after maintenance
kubectl uncordon <node-name>

# Remove node from cluster
kubectl delete node <node-name>

Resource Management

Resource Quotas:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: development
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    persistentvolumeclaims: "10"
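
Apply and verify the quota (file name is illustrative):

kubectl apply -f compute-quota.yaml
kubectl describe resourcequota compute-quota -n development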

Limit Ranges:

apiVersion: v1
kind: LimitRange
metadata:
  name: pod-limit-range
  namespace: development
spec:
  limits:
  - default:
      cpu: 500m
      memory: 512Mi
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    type: Container

Storage Configuration

Storage Classes

Local Development:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: rancher.io/local-path
parameters:
  nodePath: /opt/local-path-provisioner
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete

Production Storage:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
  replication-type: regional-pd
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain

Persistent Volumes

Static Provisioning:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: database-pv
spec:
  capacity:
    storage: 20Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual   # use a dedicated class name for statically provisioned volumes
  hostPath:
    path: /mnt/data/database
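
A claim binds to this volume when the class, access mode, and requested size match; a minimal sketch:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: manual
  resources:
    requests:
      storage: 20Gi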

Network Configuration

Service Mesh Preparation

Istio Integration (Future):

# Install Istio (istioctl installs the control plane directly)
istioctl install --set values.defaultRevision=default

# Enable sidecar injection
kubectl label namespace nucleus istio-injection=enabled

Linkerd Integration (Alternative):

# Install Linkerd
linkerd install | kubectl apply -f -

# Inject proxy
kubectl get deployment nucleus -o yaml | linkerd inject - | kubectl apply -f -
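
Linkerd ships its own validation command:

# Validate the control plane and data plane
linkerd check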

Network Policies

Default Deny:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: nucleus
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
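
With everything denied by default, required flows need explicit allow rules; for example, DNS egress to kube-system (selecting on the standard kubernetes.io/metadata.name namespace label):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: nucleus
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53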

Security Configuration

Pod Security Standards

Restricted Policy:

apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
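
Existing workloads can be audited against the policy before enforcement with a server-side dry run:

# Warn about pods that would violate the restricted profile
kubectl label --dry-run=server --overwrite ns production \
  pod-security.kubernetes.io/enforce=restricted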

Security Contexts

Container Security:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: secure-app
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
        runAsGroup: 1001
        fsGroup: 1001
      containers:
      - name: app
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop:
            - ALL

Monitoring and Maintenance

Cluster Health Monitoring

Health Checks:

# Check cluster components (componentstatuses is deprecated but still informative)
kubectl get componentstatuses

# Check node health
kubectl describe nodes

# Check system pods
kubectl get pods -n kube-system

# Check cluster events
kubectl get events --sort-by=.metadata.creationTimestamp

Maintenance Procedures

Cluster Updates:

# Update Kind cluster
kind delete cluster --name rciis-local
kind create cluster --config kind-config.yaml --name rciis-local

# Update Kubernetes version (kubeadm clusters; run on a control plane node)
kubeadm upgrade plan
kubeadm upgrade apply v1.28.0

Backup Procedures:

# Export namespaced objects ('get all' omits ConfigMaps, Secrets, and CRDs)
kubectl get all --all-namespaces -o yaml > cluster-backup.yaml

# Backup etcd (self-managed/kubeadm clusters; managed services handle this)
kubectl exec -n kube-system etcd-master -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /tmp/etcd-backup.db
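
The snapshot can be sanity-checked in place (etcdctl snapshot status still works, though newer etcd versions prefer etcdutl):

# Verify the snapshot contents
kubectl exec -n kube-system etcd-master -- etcdctl \
  --write-out=table snapshot status /tmp/etcd-backup.db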

Troubleshooting

Common Issues

Node Not Ready:

# Check node status
kubectl describe node <node-name>

# Check kubelet logs (run on the affected node)
journalctl -u kubelet

# Restart kubelet (run on the affected node)
sudo systemctl restart kubelet

Pod Scheduling Issues:

# Check scheduler logs (the scheduler runs as a static pod on kubeadm clusters)
kubectl logs -n kube-system -l component=kube-scheduler

# Check resource availability
kubectl describe node <node-name>

# Check taints and tolerations
kubectl describe node <node-name> | grep Taints

Diagnostic Commands

# Cluster information
kubectl cluster-info
kubectl version

# Resource usage
kubectl top nodes
kubectl top pods --all-namespaces

# Network connectivity
kubectl exec -it <pod-name> -- ping <target-ip>
kubectl exec -it <pod-name> -- nslookup kubernetes.default
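
Minimal images often lack ping and nslookup; an ephemeral debug container is an alternative (nicolaka/netshoot is a community image with networking tools, shown here as an example):

# Attach an ephemeral debug container to a running pod
kubectl debug -it <pod-name> --image=nicolaka/netshoot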

Best Practices

Cluster Design

  1. Separation of Concerns: Separate clusters for different environments
  2. Resource Planning: Adequate capacity planning and monitoring
  3. Security First: Apply security controls from the start
  4. Backup Strategy: Regular backups and disaster recovery testing

Operational Excellence

  1. Monitoring: Comprehensive monitoring and alerting
  2. Automation: Automated deployment and scaling
  3. Documentation: Maintain current operational documentation
  4. Training: Regular team training and knowledge sharing

For specific cluster configurations, refer to the setup scripts in the /scripts directory.