
Modern Platform Engineering Tools - The Essential DevOps Stack for 2024

Hariprasath Ravichandran · Senior Platform Engineer @ CData · 10 min read

Platform engineering has emerged as a critical discipline that bridges development and operations, focusing on building internal developer platforms (IDPs) that enable teams to ship software faster and more reliably. In this comprehensive guide, we'll explore the cutting-edge tools and practices that define modern platform engineering in 2024.


Disclaimer: Kubernetes®, Docker®, Terraform®, AWS®, Azure®, Google Cloud®, ArgoCD™, and other product names mentioned in this article are trademarks of their respective owners. All logos and trademarks are used for representation purposes only. No prior copyright or trademark authorization has been obtained. This content is for educational purposes only.


What is Platform Engineering?

Platform engineering is the discipline of designing and building toolchains and workflows that enable self-service capabilities for software engineering organizations in the cloud-native era. It's about creating "golden paths" that reduce cognitive load while maintaining flexibility.

Core Principles

🎯 Developer Experience First - Reduce friction and complexity
🔄 Self-Service - Enable teams to be autonomous
🔧 Automation - Eliminate manual toil
📊 Observability - Make systems transparent
🔒 Security by Default - Shift left on security

1. Infrastructure as Code (IaC)

Terraform: The Industry Standard

Terraform has become the de facto standard for infrastructure provisioning across multiple cloud providers.

# Modern Terraform with best practices
terraform {
  required_version = ">= 1.6"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  backend "s3" {
    bucket         = "terraform-state-prod"
    key            = "infrastructure/eks-cluster"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

# VPC Module
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "production-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway   = true
  single_nat_gateway   = false
  enable_dns_hostnames = true

  tags = {
    Environment = "production"
    ManagedBy   = "terraform"
  }
}

# EKS Cluster
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.0"

  cluster_name    = "production-eks"
  cluster_version = "1.28"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  eks_managed_node_groups = {
    general = {
      desired_size = 3
      min_size     = 2
      max_size     = 10

      instance_types = ["t3.large"]
      capacity_type  = "ON_DEMAND"

      labels = {
        workload = "general"
      }
    }
  }

  tags = {
    Environment = "production"
  }
}
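The VPC module above carves six /24 subnets out of the 10.0.0.0/16 block. A quick sanity check with Python's standard `ipaddress` module (values copied from the module above) confirms the layout is non-overlapping and shows the usable capacity per subnet — AWS reserves five addresses in every subnet:

```python
import ipaddress

# Subnet layout copied from the VPC module above
vpc = ipaddress.ip_network("10.0.0.0/16")
private = [ipaddress.ip_network(c) for c in ("10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24")]
public = [ipaddress.ip_network(c) for c in ("10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24")]

all_subnets = private + public
assert all(s.subnet_of(vpc) for s in all_subnets)  # every subnet sits inside the VPC CIDR
assert all(not a.overlaps(b) for a in all_subnets for b in all_subnets if a != b)

usable_per_subnet = private[0].num_addresses - 5  # AWS reserves 5 IPs per subnet
```

The /16 leaves plenty of headroom: 256 possible /24s, of which this layout uses six.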

Pulumi: Infrastructure as Real Code

For teams preferring general-purpose languages:

import * as aws from "@pulumi/aws";
import * as eks from "@pulumi/eks";

// Create VPC
const vpc = new aws.ec2.Vpc("production-vpc", {
  cidrBlock: "10.0.0.0/16",
  enableDnsHostnames: true,
  tags: { Name: "production-vpc" },
});

// Create EKS Cluster
const cluster = new eks.Cluster("production-eks", {
  vpcId: vpc.id,
  instanceType: "t3.medium",
  desiredCapacity: 3,
  minSize: 2,
  maxSize: 10,
  version: "1.28",
});

export const kubeconfig = cluster.kubeconfig;

OpenTofu: Open-Source Terraform Alternative

After HashiCorp moved Terraform to the Business Source License in 2023, OpenTofu emerged as the community-maintained, open-source fork under the Linux Foundation:

# Install OpenTofu
brew install opentofu

# Compatible with Terraform syntax
tofu init
tofu plan
tofu apply

2. GitOps: Declarative Infrastructure

ArgoCD: Kubernetes GitOps Leader

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: production-app
  namespace: argocd
spec:
  project: default

  source:
    repoURL: https://github.com/company/k8s-manifests
    targetRevision: main
    path: apps/production

    # Helm values
    helm:
      valueFiles:
        - values-production.yaml
      parameters:
        - name: image.tag
          value: "v1.2.3"

  destination:
    server: https://kubernetes.default.svc
    namespace: production

  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false

    syncOptions:
      - CreateNamespace=true
      - PruneLast=true

    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m

Flux CD: Alternative GitOps Solution

apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: production-repo
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/company/k8s-manifests
  ref:
    branch: main
  secretRef:
    name: git-credentials
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: production-apps
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: production-repo
  path: ./apps/production
  prune: true
  wait: true
  timeout: 5m
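ArgoCD and Flux both implement the same core loop: diff the desired state in Git against the live cluster state and converge. A minimal sketch of that diff logic in Python (hypothetical resource maps standing in for real Kubernetes objects):

```python
def reconcile(desired: dict, live: dict, prune: bool = True):
    """Return the actions a GitOps controller would take to converge the cluster."""
    to_create = {k: v for k, v in desired.items() if k not in live}
    to_update = {k: v for k, v in desired.items() if k in live and live[k] != v}
    to_delete = [k for k in live if k not in desired] if prune else []
    return to_create, to_update, to_delete

# Desired state (from Git) vs. live state (from the cluster)
desired = {"deploy/app": {"image": "app:v1.2.3"}, "svc/app": {"port": 80}}
live = {"deploy/app": {"image": "app:v1.2.2"}, "job/old": {}}

create, update, delete = reconcile(desired, live)
# job/old is deleted only because pruning is enabled, mirroring prune: true above
```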

3. CI/CD: Modern Pipeline Tools

GitHub Actions: Integrated CI/CD

name: Production Deployment

on:
  push:
    branches: [main]
    tags: ['v*']

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
      - uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=ref,event=branch
            type=semver,pattern={{version}}
            type=sha,prefix={{branch}}-
            type=raw,value=${{ github.sha }}

      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

      - name: Run tests
        run: |
          docker run --rm ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }} \
            npm test

  deploy:
    needs: build-and-test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'

    steps:
      - name: Update K8s manifests
        run: |
          git clone https://${GITHUB_TOKEN}@github.com/company/k8s-manifests
          cd k8s-manifests
          yq e -i '.spec.template.spec.containers[0].image = "${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}"' \
            apps/production/deployment.yaml
          git add .
          git commit -m "Update image to ${{ github.sha }}"
          git push
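The deploy job's `yq` step rewrites the container image in the manifest repository, which the GitOps controller then picks up. The same edit can be sketched in plain Python (a regex standing in for `yq`, against a hypothetical single-container manifest):

```python
import re

# Hypothetical single-container deployment manifest
manifest = """\
spec:
  template:
    spec:
      containers:
        - name: app
          image: ghcr.io/company/app:abc123
"""

def set_image(yaml_text: str, new_image: str) -> str:
    # Replace everything after "image:" on its line, keeping indentation intact
    return re.sub(r"(image:\s*).*", rf"\g<1>{new_image}", yaml_text)

updated = set_image(manifest, "ghcr.io/company/app:def456")
```

Committing this change back to Git is what actually triggers the deployment — the pipeline never touches the cluster directly.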

Tekton: Kubernetes-Native Pipelines

apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: build-and-deploy
spec:
  params:
    - name: git-url
    - name: image-name

  workspaces:
    - name: shared-workspace

  tasks:
    - name: fetch-repository
      taskRef:
        name: git-clone
      workspaces:
        - name: output
          workspace: shared-workspace
      params:
        - name: url
          value: $(params.git-url)

    - name: build-image
      taskRef:
        name: buildah
      runAfter:
        - fetch-repository
      workspaces:
        - name: source
          workspace: shared-workspace
      params:
        - name: IMAGE
          value: $(params.image-name)

    - name: deploy-to-k8s
      taskRef:
        name: kubernetes-actions
      runAfter:
        - build-image
      params:
        - name: script
          value: |
            kubectl set image deployment/app app=$(params.image-name)
            kubectl rollout status deployment/app

4. Service Mesh: Traffic Management

Istio: Feature-Rich Service Mesh

# Virtual Service for Canary Deployment
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews-route
spec:
  hosts:
    - reviews
  http:
    - match:
        - headers:
            user-agent:
              regex: ".*Chrome.*"
      route:
        - destination:
            host: reviews
            subset: v2
          weight: 100
    - route:
        - destination:
            host: reviews
            subset: v1
          weight: 90
        - destination:
            host: reviews
            subset: v2
          weight: 10
---
# Destination Rule
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  trafficPolicy:
    loadBalancer:
      simple: LEAST_REQUEST
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2

Linkerd: Lightweight Alternative

# Linkerd Traffic Split for Canary
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: api-canary
spec:
  service: api
  backends:
    - service: api-stable
      weight: 90
    - service: api-canary
      weight: 10
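Both meshes implement the same idea: split traffic across backends in proportion to their weights. A deterministic Python sketch of weighted routing (backend names match the TrafficSplit above; the routing scheme itself is illustrative, not Linkerd's actual algorithm):

```python
from collections import Counter

def route(request_id: int, backends: list[tuple[str, int]]) -> str:
    """Pick a backend deterministically, proportional to weight."""
    bucket = request_id % sum(w for _, w in backends)
    upto = 0
    for name, weight in backends:
        upto += weight
        if bucket < upto:
            return name

backends = [("api-stable", 90), ("api-canary", 10)]
counts = Counter(route(i, backends) for i in range(1000))
# With weights 90/10, exactly 900 of 1000 requests land on api-stable
```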

5. Observability Stack

The Three Pillars

Metrics → Logs → Traces

Prometheus + Grafana: Metrics

# ServiceMonitor for Prometheus Operator
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-metrics
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
---
# PrometheusRule for Alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: app-alerts
spec:
  groups:
    - name: app
      interval: 30s
      rules:
        - alert: HighErrorRate
          expr: |
            rate(http_requests_total{status=~"5.."}[5m]) > 0.05
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "High error rate detected"
            description: "Error rate is {{ $value | humanizePercentage }}"
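The alert expression fires when the per-second rate of 5xx responses exceeds 0.05. A small Python sketch of what `rate()` computes over a counter (sample values are made up; real `rate()` also handles counter resets and extrapolation):

```python
def rate(samples: list[float], window_s: int = 300) -> float:
    """Per-second increase of a monotonically increasing counter over a window,
    roughly what PromQL's rate() computes."""
    return (samples[-1] - samples[0]) / window_s

# Made-up http_requests_total{status="500"} samples across a 5-minute window
errors_5xx = [1000, 1004, 1010, 1019]
alert = rate(errors_5xx) > 0.05  # 19 errors / 300 s ≈ 0.063/s, so this fires
```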

Loki: Log Aggregation

# Promtail configuration
clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        target_label: app
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
    pipeline_stages:
      - docker: {}
      - json:
          expressions:
            level: level
            message: message
      - labels:
          level:

Jaeger/Tempo: Distributed Tracing

# OpenTelemetry Collector
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
data:
  config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318

    processors:
      batch:
        timeout: 10s
        send_batch_size: 1024

      memory_limiter:
        check_interval: 1s
        limit_mib: 512

    exporters:
      jaeger:
        endpoint: jaeger:14250
        tls:
          insecure: true

      prometheus:
        endpoint: "0.0.0.0:8889"

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [jaeger]
        metrics:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [prometheus]

6. Secret Management

HashiCorp Vault

# Vault Kubernetes Auth
path "auth/kubernetes/login" {
  capabilities = ["create", "read"]
}

# Policy for application secrets
path "secret/data/production/app/*" {
  capabilities = ["read", "list"]
}

# Dynamic database credentials
path "database/creds/app-readonly" {
  capabilities = ["read"]
}
}

External Secrets Operator

apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: vault-backend
spec:
  provider:
    vault:
      server: "https://vault.company.com"
      path: "secret"
      version: "v2"
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "app-role"
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-secrets
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: SecretStore
  target:
    name: app-secrets
    creationPolicy: Owner
  data:
    - secretKey: database-password
      remoteRef:
        key: production/app/database
        property: password

7. Policy as Code

Open Policy Agent (OPA)

package kubernetes.admission

# Deny pods without resource limits
deny[msg] {
  input.request.kind.kind == "Pod"
  container := input.request.object.spec.containers[_]
  not container.resources.limits
  msg := sprintf("Container %v must have resource limits", [container.name])
}

# Require non-root user
deny[msg] {
  input.request.kind.kind == "Pod"
  not input.request.object.spec.securityContext.runAsNonRoot
  msg := "Pods must run as non-root user"
}

# Deny privileged containers
deny[msg] {
  input.request.kind.kind == "Pod"
  container := input.request.object.spec.containers[_]
  container.securityContext.privileged
  msg := sprintf("Container %v cannot run in privileged mode", [container.name])
}
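The same checks are easy to reason about outside of Rego. A Python sketch mirroring the three rules above against a plain pod `spec` dict (field names follow the Kubernetes pod schema; the sample pods are hypothetical):

```python
def deny_reasons(pod_spec: dict) -> list[str]:
    """Collect denial messages, mirroring the three Rego rules above."""
    msgs = []
    for c in pod_spec.get("containers", []):
        if not c.get("resources", {}).get("limits"):
            msgs.append(f"Container {c['name']} must have resource limits")
        if c.get("securityContext", {}).get("privileged"):
            msgs.append(f"Container {c['name']} cannot run in privileged mode")
    if not pod_spec.get("securityContext", {}).get("runAsNonRoot"):
        msgs.append("Pods must run as non-root user")
    return msgs

bad_pod = {"containers": [{"name": "web", "securityContext": {"privileged": True}}]}
good_pod = {
    "securityContext": {"runAsNonRoot": True},
    "containers": [{"name": "web", "resources": {"limits": {"cpu": "500m"}}}],
}
```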

Kyverno: Kubernetes Native Policies

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-labels
spec:
  validationFailureAction: enforce
  rules:
    - name: check-labels
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Labels 'app' and 'env' are required"
        pattern:
          metadata:
            labels:
              app: "?*"
              env: "?*"
---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-network-policy
spec:
  rules:
    - name: default-deny
      match:
        any:
          - resources:
              kinds:
                - Namespace
      generate:
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        name: default-deny
        namespace: "{{request.object.metadata.name}}"
        data:
          spec:
            podSelector: {}
            policyTypes:
              - Ingress
              - Egress

8. Cost Management

OpenCost: Kubernetes Cost Monitoring

apiVersion: v1
kind: ConfigMap
metadata:
  name: opencost-config
data:
  default.json: |
    {
      "provider": "aws",
      "description": "AWS Cost Configuration",
      "CPU": "0.031611",
      "spotCPU": "0.006655",
      "RAM": "0.004237",
      "spotRAM": "0.000892",
      "GPU": "0.95",
      "storage": "0.00005479452",
      "zoneNetworkEgress": "0.01",
      "regionNetworkEgress": "0.01",
      "internetNetworkEgress": "0.12"
    }
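The rates in this config are dollars per resource-hour, so a node's monthly cost is simple arithmetic. A sketch using the CPU and RAM rates from the ConfigMap above, assuming a 730-hour month and a t3.large-shaped node (2 vCPU, 8 GiB):

```python
# Rates copied from the ConfigMap above ($ per resource-hour, on-demand)
CPU_HOURLY = 0.031611   # per vCPU-hour
RAM_HOURLY = 0.004237   # per GiB-hour
HOURS_PER_MONTH = 730   # average month length (assumption)

def monthly_node_cost(vcpus: int, ram_gib: int) -> float:
    """Blended monthly cost of a node from per-resource hourly rates."""
    return (vcpus * CPU_HOURLY + ram_gib * RAM_HOURLY) * HOURS_PER_MONTH

cost = monthly_node_cost(2, 8)  # a t3.large-shaped node: 2 vCPU, 8 GiB → ≈ $70.90
```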

Kubecost Integration

# Install Kubecost
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost --create-namespace \
  --set kubecostToken="YOUR_TOKEN"

# View cost allocation
kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090

9. Developer Portals

Backstage: Internal Developer Platform

# app-config.yaml
app:
  title: FreeOps Developer Portal
  baseUrl: https://portal.freeops.io

backend:
  baseUrl: https://portal.freeops.io
  cors:
    origin: https://portal.freeops.io
  database:
    client: pg
    connection:
      host: ${POSTGRES_HOST}
      port: ${POSTGRES_PORT}
      user: ${POSTGRES_USER}
      password: ${POSTGRES_PASSWORD}

catalog:
  rules:
    - allow: [Component, System, API, Resource, Location]
  locations:
    - type: url
      target: https://github.com/company/software-catalog/blob/main/catalog-info.yaml

Port: Developer Portal Alternative

# port-entity.yaml
identifier: payment-service
title: Payment Service
blueprint: microservice
properties:
  language: golang
  team: payments
  tier: critical
  lifecycle: production
  runtime: kubernetes
relations:
  dependsOn:
    - postgres-db
    - redis-cache

10. Platform Automation

Crossplane: Universal Control Plane

apiVersion: database.aws.crossplane.io/v1beta1
kind: RDSInstance
metadata:
  name: production-db
spec:
  forProvider:
    region: us-east-1
    dbInstanceClass: db.t3.medium
    masterUsername: admin
    allocatedStorage: 100
    engine: postgres
    engineVersion: "14.7"
    skipFinalSnapshot: false
  writeConnectionSecretToRef:
    name: db-credentials
    namespace: production
  providerConfigRef:
    name: aws-provider

Emerging Tools to Watch

🔥 Kratix - Framework for building platforms
🔥 Score - Workload specification standard
🔥 Dapr - Distributed application runtime
🔥 Telepresence - Local development for K8s
🔥 Garden - Development orchestration
🔥 Tilt - Multi-service development

Building Your Platform Stack

┌─────────────────────────────────────┐
│    Developer Portal (Backstage)     │
└─────────────────────────────────────┘

┌─────────────────────────────────────┐
│        GitOps (ArgoCD/Flux)         │
└─────────────────────────────────────┘

┌─────────────────────────────────────┐
│      Kubernetes (EKS/GKE/AKS)       │
│   + Service Mesh (Istio/Linkerd)    │
└─────────────────────────────────────┘

┌─────────────────────────────────────┐
│  Observability (Prometheus/Loki)    │
│      + Tracing (Jaeger/Tempo)       │
└─────────────────────────────────────┘

Conclusion

Platform engineering is about creating paved roads that make the right way the easy way. The tools landscape is rich and constantly evolving, but the principles remain constant:

Automate everything - Reduce manual toil
Self-service - Enable developer autonomy
Observability - Make systems transparent
Security - Shift left on security practices
Standards - Create golden paths
Documentation - Make knowledge accessible

Next Steps

  1. Assess current state - Understand pain points
  2. Define platform vision - Set clear goals
  3. Start small - Pick one area to improve
  4. Measure success - Track DORA metrics
  5. Iterate - Continuously improve based on feedback
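DORA metrics fall out of deployment records you likely already track. A Python sketch computing lead time for changes and change failure rate from hypothetical records:

```python
from datetime import datetime

# Hypothetical records: (commit time, deploy time, deploy caused an incident)
deploys = [
    (datetime(2024, 3, 1, 9), datetime(2024, 3, 1, 11), False),
    (datetime(2024, 3, 2, 14), datetime(2024, 3, 3, 10), True),
    (datetime(2024, 3, 4, 8), datetime(2024, 3, 4, 9), False),
]

# Lead time for changes: commit-to-production, in hours
lead_times_h = [(deployed - committed).total_seconds() / 3600
                for committed, deployed, _ in deploys]
mean_lead_time_h = sum(lead_times_h) / len(lead_times_h)

# Change failure rate: fraction of deploys that caused incidents
change_failure_rate = sum(failed for *_, failed in deploys) / len(deploys)
```

Deployment frequency is just `len(deploys)` over the observation window; time to restore needs incident records as well.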

The future of platform engineering is bright, with tools becoming more powerful and accessible. Start building your platform today and empower your teams to ship software faster and more reliably.


Which platform engineering tools are you excited about? Share your experiences and challenges in the comments!