all systems operational
// portfolio · pune · remote · utc+5:30

RISHABH GUPTA

Multi-cloud Kubernetes platforms on AWS, Azure & GCP — codified with Terraform, delivered by GitOps, secured with Vault, watched by Prometheus.

7+ years · 3 clouds · 50+ services · 70% faster MTTR

// 01 — whoami

Principal DevOps Engineer building enterprise-grade cloud platforms

I'm a Principal DevOps Engineer with 7+ years of experience designing and operating multi-cloud Kubernetes platforms on AWS, Azure, and GCP. I own the infrastructure-as-code, deployment, and observability systems that let engineering teams ship quickly and safely.

I built the Terraform module library and Terraform Cloud workflows that provision everything from EKS node groups to private PostgreSQL, and a service deployment framework that standardizes how 50+ microservices receive Vault secrets, Istio mesh, RBAC, and Argo CD delivery — onboarding a new service is a single module call.

My approach is simple: codify everything, secure by default, and make the right thing the easy thing. The result is infrastructure that is reproducible, auditable, and boring to operate — in the best possible way.

rg@prod-cluster:~

Multi-Cloud Architecture

Architecting production infrastructure across AWS EKS, Azure AKS, and GCP GKE with Terraform Cloud workspaces and unified state management.

Platform Engineering

Building reusable service deployment frameworks for 50+ microservices with Istio mesh, Vault secrets, RBAC, and technology-aware configurations.

Infrastructure as Code

Maintaining a modular Terraform codebase with GitLab VCS integration and automated plan/apply pipelines via Terraform Cloud.

Security & Compliance

Implementing HashiCorp Vault with Kubernetes auth backends, per-namespace policies, Trivy container scanning, and Calico network policies.

Observability

Enterprise monitoring with Prometheus, Grafana (custom dashboards), AlertManager routing to PagerDuty and Slack, and OpenSearch logging.

Data Infrastructure

Managing production databases: Azure PostgreSQL with private endpoints, GCP Cloud SQL, Memorystore Redis HA, and MongoDB replica sets.

// 02 — stack --list

The stack I run production on — every tile is in active daily use

Cloud Platforms

Production workloads on all three majors

AWS

eks · rds · msk · ecr

Microsoft Azure

aks · postgres · vnet

Google Cloud

gke · cloud sql · memorystore


Container & Orchestration

Kubernetes · Docker · Helm · Argo CD · Istio

CI/CD & Infrastructure as Code

Terraform · GitLab CI · GitHub Actions · Jenkins · Vault

Monitoring & Observability

Prometheus · Grafana · OpenSearch · ELK Stack · Trivy

Databases & Messaging

PostgreSQL · MongoDB · MySQL · Redis · RabbitMQ

Platform & Tools

Linux · Nginx · Git

// 03 — projects --featured

Kubernetes · Cloud · Terraform

Multi-Cloud Kubernetes Platform (EKS / AKS / GKE)

Production Kubernetes clusters on AWS EKS, Azure AKS, and GCP GKE — provisioned end-to-end with Terraform Cloud and operated with a consistent security and networking baseline across all three clouds.

Problem

The organization needed to run workloads across three cloud providers simultaneously — AWS for primary production, Azure for regional clients, and GCP for specialized services — each with its own networking, identity, and compliance requirements.

Architecture

Cloud-specific Terraform modules per layer: VPC/VNet networking with VPC peering (aws_vpc_peering_connection for cross-VPC access to HashiCorp Vault); EKS/AKS/GKE cluster provisioning with Calico network policies; managed node groups with dynamic taints via for_each; and a Kubernetes objects layer that deploys cert-manager, Traefik ingress, Prometheus monitoring, and Argo CD as Helm releases.
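
As a rough illustration of the networking and Kubernetes objects layers described above, the sketch below peers a cluster VPC with the VPC that hosts Vault and installs one platform add-on as a Helm release. All variable names and the tag are hypothetical. The sketch assumes both VPCs sit in the same AWS account and region (so auto_accept applies) and a cert-manager chart version that supports crds.enabled (v1.15 or later).

```hcl
# Hypothetical inputs; none of these names are taken from the real modules.
variable "clusterVpcId" {
  type = string
}

variable "vaultVpcId" {
  type = string
}

variable "vaultVpcCidr" {
  type = string
}

variable "privateRouteTableIds" {
  type = set(string)
}

# Peer the cluster VPC with the VPC that hosts Vault.
# auto_accept only works when both VPCs are in the same account and region.
resource "aws_vpc_peering_connection" "vault" {
  vpc_id      = var.clusterVpcId
  peer_vpc_id = var.vaultVpcId
  auto_accept = true

  tags = { Name = "cluster-to-vault" }
}

# Send Vault-bound traffic from each private route table over the peering link.
resource "aws_route" "to_vault" {
  for_each = var.privateRouteTableIds

  route_table_id            = each.value
  destination_cidr_block    = var.vaultVpcCidr
  vpc_peering_connection_id = aws_vpc_peering_connection.vault.id
}

# One platform add-on installed as a Helm release (cert-manager as the example).
resource "helm_release" "cert_manager" {
  name             = "cert-manager"
  repository       = "https://charts.jetstack.io"
  chart            = "cert-manager"
  namespace        = "cert-manager"
  create_namespace = true

  # Passing values as YAML avoids provider-version differences in "set" syntax.
  values = [yamlencode({ crds = { enabled = true } })]
}
```

Return routes on the Vault side and the matching security group rules are left out here; a working setup would need both.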

Outcome

Unified multi-cloud platform serving production traffic across 3 clouds with consistent security posture, automated provisioning via Terraform Cloud VCS-triggered runs, and environment parity from staging through production.

Terraform Cloud · AWS EKS · Azure AKS · GCP GKE · Calico · +2

Terraform · Cloud

Enterprise Terraform Module Library

A reusable Terraform module library covering service deployments, infrastructure provisioning, Vault integration, and monitoring — managed through Terraform Cloud workspaces with GitLab VCS-triggered runs.

Problem

Rapid growth from a handful of services to 50+ microservices across multiple clouds created unsustainable infrastructure complexity. Each team provisioned resources ad-hoc, resulting in configuration drift and security gaps.

Architecture

Layered module structure: /modules/infrastructure (eks-node-group, msk, ecr-account-sharing), /modules/kubernetes-objects (service-deployment with Argo CD, Istio, RBAC, Vault auth), /modules/shared (cross-environment variables). Terraform Cloud workspaces per environment (aws-production-infrastructure, azure-staging-kubernetes-objects) with VCS-triggered plans from GitLab.
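
To make the layering concrete, here is a hedged sketch of what a workspace root might look like when it calls the eks-node-group module. The input names mirror the variables used in the node group example in the `cat infra/*.tf` section; the relative source path and every value are placeholders, and the real module may require additional inputs.

```hcl
# Workspace root for a single environment (names taken from the workspace
# examples above; the organization is a placeholder).
terraform {
  cloud {
    organization = "my-org"
    workspaces { name = "aws-production-infrastructure" }
  }
}

module "general_workers" {
  # Relative path is an assumption about the repository layout.
  source = "../../modules/infrastructure/eks-node-group"

  clusterName   = "prod-cluster"                                  # placeholder
  namePrefix    = "prod-general"                                  # placeholder
  roleArn       = "arn:aws:iam::123456789012:role/eks-node-role"  # placeholder
  amiType       = "AL2_x86_64"
  capacityType  = "ON_DEMAND"
  instanceTypes = ["m5.large"]

  # One node group is created per entry; the key becomes part of its name.
  subnetIds = {
    a = "subnet-aaaa1111" # placeholder
    b = "subnet-bbbb2222" # placeholder
  }

  minSize = 2
  maxSize = 6

  labels           = { workload = "general" }
  noScheduleTaints = {}
}
```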

Outcome

Onboarding a new microservice reduced from days to a single Terraform module call with validated inputs. Configuration drift is kept in check by Terraform Cloud auto-apply on VCS-triggered runs, with workspace-level state isolation and RBAC.

Terraform · Terraform Cloud · GitLab VCS · HashiCorp Vault · AWS · +2

GitOps · Security · Kubernetes

GitOps Pipeline with Argo CD & Vault Secrets

End-to-end GitOps delivery with Argo CD and HashiCorp Vault: every deployment flows through Git, and every secret is scoped to its namespace via Kubernetes auth backend roles.

Problem

Manual kubectl deployments and secrets stored as plain Kubernetes Opaque Secrets led to configuration drift, credentials leaked into Git, and no audit trail for production changes.

Architecture

Argo CD deployed via helm_release with GitLab OAuth (Dex) for SSO and Slack notifications. Per-service Vault integration using vault_kubernetes_auth_backend_role bound to service accounts, vault_policy with granular KV paths (kv/data/{project}/{env}/{namespace}/*), and token-based access scoped to namespaces.
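
The per-service Vault wiring described above could look roughly like the following. This is a minimal sketch, not the actual module: the variable names, role naming scheme, read-only capability, and TTL are assumptions, while the resource types and the KV path pattern come from the description.

```hcl
# Hypothetical inputs for one service.
variable "project" {
  type = string
}

variable "env" {
  type = string
}

variable "namespace" {
  type = string
}

variable "serviceAccount" {
  type = string
}

variable "authBackendPath" {
  type    = string
  default = "kubernetes" # assumed mount path of the Kubernetes auth backend
}

# Policy limited to the service's own KV v2 subtree.
resource "vault_policy" "service" {
  name   = "${var.project}-${var.env}-${var.namespace}"
  policy = <<-EOT
    path "kv/data/${var.project}/${var.env}/${var.namespace}/*" {
      capabilities = ["read"]
    }
  EOT
}

# Role that only the named service account in the named namespace can log in with.
resource "vault_kubernetes_auth_backend_role" "service" {
  backend                          = var.authBackendPath
  role_name                        = "${var.env}-${var.namespace}"
  bound_service_account_names      = [var.serviceAccount]
  bound_service_account_namespaces = [var.namespace]
  token_policies                   = [vault_policy.service.name]
  token_ttl                        = 3600 # seconds; an arbitrary example value
}
```

Binding the role to both the service account name and the namespace is what keeps a token issued to one namespace from reading another namespace's path.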

Outcome

Complete secrets lifecycle management with zero secrets in git, automated rotation, and per-namespace access control. Deployment audit trail through GitLab commits and Argo CD sync history with Slack alerting.

Argo CD · HashiCorp Vault · Terraform · GitLab · Kubernetes · +1

Kubernetes · Security

Observability Stack: Prometheus, Grafana & Trivy

A Terraform-managed observability platform: kube-prometheus-stack, custom Grafana dashboards, Trivy container scanning, and severity-based alert routing to Slack and PagerDuty.

Problem

Without unified monitoring across clusters, customers discovered incidents before the engineering team did, and security vulnerabilities in container images went undetected until they caused production incidents.

Architecture

Terraform module (modules/kubernetes-objects/prometheus-grafana) deploying kube-prometheus-stack v51.2.0 with templatefile()-driven Helm values. Grafana with GitLab OAuth SSO, auto-provisioned datasources and dashboards (CoreDNS, Costs, Kubernetes, Logging, MongoDB). Trivy Operator v0.24.1 in dedicated namespace with Grafana dashboard integration. AlertManager routing to Slack channels by severity and PagerDuty for critical alerts.
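
A templatefile()-driven Helm release of the kind described above might be sketched as follows. The chart name, repository, and version are the public kube-prometheus-stack ones; the namespace, the template file name, and the two template variables are hypothetical stand-ins for whatever the real module passes through.

```hcl
# Hypothetical inputs rendered into the Helm values template.
variable "grafanaHostname" {
  type = string
}

variable "slackChannel" {
  type = string
}

resource "helm_release" "kube_prometheus_stack" {
  name             = "kube-prometheus-stack"
  repository       = "https://prometheus-community.github.io/helm-charts"
  chart            = "kube-prometheus-stack"
  version          = "51.2.0"
  namespace        = "monitoring" # assumed namespace
  create_namespace = true

  # values.yaml.tpl is an assumed file next to this module; templatefile()
  # substitutes the variables below before Helm sees the YAML.
  values = [
    templatefile("${path.module}/values.yaml.tpl", {
      grafana_hostname = var.grafanaHostname
      slack_channel    = var.slackChannel
    })
  ]
}
```

Keeping the values in a template file means the same module can serve several clusters while only the rendered inputs differ.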

Outcome

MTTR reduced by 70% with proactive alerting, 100% container image scanning coverage, and dashboards covering infrastructure costs, cluster health, application metrics, and security vulnerabilities.

Prometheus · Grafana · Trivy · AlertManager · PagerDuty · +2

Kubernetes · Helm · Terraform

Service Deployment Framework with Istio & RBAC

A single Terraform module that turns a new microservice into a production-ready deployment — namespace, RBAC, Vault auth, Argo CD application, Istio, autoscaling, and network policies — driven by a handful of typed inputs.

Problem

50+ microservices across PHP, Node, Golang, React, and Angular stacks needed standardized deployment patterns while allowing per-service customization for scaling, networking, and security requirements.

Architecture

Single module (modules/kubernetes-objects/service-deployment) with variable-driven feature toggles: istioEnabled, usesVault, requireHpas, requirePdbs, requireNetworkPolicies. Environment map (staging/production/productionUk) drives Vault paths, Sentry configs, Argo labels, and transit encryption backends. Technology map configures language-specific Sentry platforms and team routing.
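
The validated inputs and feature toggles could be expressed along these lines. The environment names and the requirePdbs toggle come from the description above; the lowercase technology strings, the serviceName variable, the contents of the environment map, and the PodDisruptionBudget settings are illustrative assumptions.

```hcl
variable "serviceName" {
  type = string
}

# Enum-style input: anything outside the list fails at plan time.
variable "environment" {
  type = string

  validation {
    condition     = contains(["staging", "production", "productionUk"], var.environment)
    error_message = "environment must be one of: staging, production, productionUk."
  }
}

variable "technology" {
  type = string

  validation {
    condition     = contains(["php", "node", "golang", "react", "angular"], var.technology)
    error_message = "technology must be one of: php, node, golang, react, angular."
  }
}

# Feature toggle; the default is an assumption.
variable "requirePdbs" {
  type    = bool
  default = true
}

locals {
  # Per-environment settings looked up from the validated input (example values).
  environments = {
    staging      = { vaultPathPrefix = "kv/data/demo/staging", argoLabel = "staging" }
    production   = { vaultPathPrefix = "kv/data/demo/production", argoLabel = "production" }
    productionUk = { vaultPathPrefix = "kv/data/demo/productionUk", argoLabel = "production-uk" }
  }
  env = local.environments[var.environment]
}

# Created only when the toggle is on.
resource "kubernetes_pod_disruption_budget_v1" "this" {
  count = var.requirePdbs ? 1 : 0

  metadata {
    name      = var.serviceName
    namespace = var.serviceName
    labels    = { environment = local.env.argoLabel }
  }

  spec {
    max_unavailable = "1"

    selector {
      match_labels = { app = var.serviceName }
    }
  }
}
```

Because the environment is validated before the map lookup, local.environments[var.environment] cannot fail on an unknown key.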

Outcome

Standardized deployment pattern for all 50+ services with validated input (environment/technology enums), eliminating misconfigurations. New service deployment takes one module call with ~5 required variables.

Terraform · Kubernetes · Istio · Vault · Argo CD · +2

Cloud · Terraform · HA

Multi-Cloud Database Architecture

Production database infrastructure across Azure, GCP, and AWS — private endpoints, enforced TLS, HA failover, and Vault-managed credentials, all provisioned and secured with Terraform.

Problem

Database provisioning was manual, inconsistent across environments, and lacked proper network isolation. No standardized backup policies, TLS enforcement, or private connectivity.

Architecture

Azure: PostgreSQL Flexible Server with TLS 1.2 minimum, private endpoints via azurerm_private_endpoint, auto-grow storage, and Vault-sourced credentials. GCP: Cloud SQL with authorized networks, Memorystore Redis HA with STANDARD_HA tier and dedicated failover zones. AWS: EKS-adjacent RDS with VPC peering for Vault access. All credentials fetched from Vault at plan time.
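
Three of the pieces named above are sketched below: a private endpoint in front of an existing PostgreSQL server, a Memorystore instance on the STANDARD_HA tier with a separate failover zone, and a Vault KV v2 read that resolves during plan. Resource names, the region and zones, the secret path, and all variables are hypothetical.

```hcl
# Hypothetical inputs; none of these names come from the original modules.
variable "location" {
  type = string
}

variable "resourceGroupName" {
  type = string
}

variable "endpointSubnetId" {
  type = string
}

variable "postgresServerId" {
  type = string
}

# Azure: private endpoint in front of an existing PostgreSQL server.
resource "azurerm_private_endpoint" "postgres" {
  name                = "pg-private-endpoint"
  location            = var.location
  resource_group_name = var.resourceGroupName
  subnet_id           = var.endpointSubnetId

  private_service_connection {
    name                           = "pg-connection"
    private_connection_resource_id = var.postgresServerId
    subresource_names              = ["postgresqlServer"]
    is_manual_connection           = false
  }
}

# GCP: Redis with a replica in a second zone for failover.
resource "google_redis_instance" "cache" {
  name           = "app-cache"
  tier           = "STANDARD_HA"
  memory_size_gb = 1
  region         = "asia-south1"

  location_id             = "asia-south1-a"
  alternative_location_id = "asia-south1-b"
}

# Vault: credentials are read when Terraform plans, not stored in the repo.
data "vault_kv_secret_v2" "postgres" {
  mount = "kv"
  name  = "demo/production/postgres" # hypothetical secret path
}

locals {
  # Values read this way still land in Terraform state, so state must be protected.
  postgresPassword = data.vault_kv_secret_v2.postgres.data["password"]
}
```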

Outcome

Zero-trust database access with private endpoints, encrypted connections, Vault-managed credentials, and automated provisioning. Database setup time reduced from days to minutes with consistent security baselines.

PostgreSQL · MySQL · Redis · Terraform · Azure Private Endpoints · +2

More on GitHub

// 04 — career --history

My journey building cloud infrastructure and DevOps culture

Principal DevOps Engineer

October 2018 - Present | Pune, Maharashtra, India
Client Projects: GoDaddy · Poynt · Ormae
  • Architect and operate a multi-cloud Kubernetes platform across AWS EKS, Azure AKS, and GCP GKE, serving production workloads for enterprise clients worldwide
  • Built an enterprise Terraform module library that fully automates infrastructure provisioning via Terraform Cloud workspaces and GitLab VCS integration, cutting new-service onboarding from days to a single module call
  • Designed a service deployment framework supporting 50+ microservices (PHP, Node, Golang, React, Angular) with Istio service mesh, RBAC, and Vault-based secrets management
  • Implemented GitOps workflows with Argo CD, GitLab OAuth SSO, and automated Slack notifications, giving every production change a complete audit trail
  • Built the observability stack — Prometheus, Grafana, Trivy container scanning, and AlertManager routing to PagerDuty and Slack — reducing incident MTTR by 70%
  • Managed HashiCorp Vault clusters with Kubernetes auth backends, per-namespace policies, and transit encryption for multi-environment secrets lifecycle
  • Architected multi-cloud database infrastructure: Azure PostgreSQL with private endpoints, GCP Cloud SQL, Memorystore Redis HA, and AWS RDS — all Terraform-managed with Vault credentials
  • Deployed cert-manager, Traefik ingress controllers, and Calico network policies across all clusters for TLS automation and zero-trust networking

Cloud Support Engineer

March 2018 - August 2018 | Hyderabad, Telangana, India
  • Implemented AWS infrastructure using EC2, S3, EBS, ELB, Route53, and VPC for IaaS workloads
  • Deployed containerized applications using Docker and Docker Swarm orchestration
  • Configured monitoring servers using Nagios/Icinga for proactive alerting and incident detection
  • Set up GCP services including Compute Engine, Cloud SQL, Cloud CDN, and firewall rules
  • Managed automated backups to S3 for MongoDB, Jenkins, and monitoring tools
  • Implemented network security with GCP Firewall rules and AWS Security Groups

// 05 — systems --design

System design patterns I use to build reliable, scalable infrastructure

Multi-Cloud Kubernetes Platform

Production-grade K8s clusters across AWS, Azure, and GCP with unified Terraform management, consistent security, and cross-cloud networking.

Terraform Cloud: VCS-Triggered Plans · Workspace Isolation · Remote State · RBAC
Network Layer: VPC Peering · Calico Policies · Private Endpoints · Subnet Isolation
Cluster Layer: EKS (AWS) · AKS (Azure) · GKE (GCP) · Node Auto-Scaling
Platform Layer: Argo CD · cert-manager · Traefik · Istio Mesh

Observability & Security Stack

End-to-end monitoring, alerting, and security scanning across all clusters with centralized dashboards and automated incident response.

Collection: Prometheus · Fluentd / Loki · Trivy Operator · Custom Metrics
Storage: Prometheus TSDB · OpenSearch · Grafana Datasources
Visualization: Grafana Dashboards · CoreDNS Metrics · Cost Analytics · Security Reports
Alerting: AlertManager · PagerDuty · Slack Channels · Sentry Integration

Secrets & Identity Architecture

HashiCorp Vault-based secrets management with Kubernetes auth, per-namespace policies, and transit encryption across all environments.

Identity: Vault K8s Auth · Service Accounts · GitLab OAuth / Dex · RBAC Roles
Secrets Engine: KV v2 Secrets · Transit Encryption · TOTP Engine · Dynamic Creds
Policy: Per-Namespace Policies · Environment Scoping · Least Privilege · Audit Logging
Delivery: Vault Agent Inject · Terraform Data Sources · Sealed Secrets · Cert-Manager

Service Deployment Pipeline

Standardized deployment framework for 50+ microservices with technology-aware configs, feature toggles, and automated GitOps delivery.

Source: GitLab Repos · Terraform Modules · Helm Values · Kustomize Overlays
Build: GitLab CI · Docker Build · ECR / ACR Push · Trivy Scan
Deploy: Argo CD Sync · Rolling Updates · Canary / Blue-Green · PDB-Safe Rollouts
Runtime: HPA Auto-Scale · Istio Traffic Mgmt · Network Policies · Vault Secrets

// cat infra/*.tf

Production-ready configuration examples from real infrastructure setups

# Multi-Cloud EKS Node Group Module with Dynamic Taints
# Based on actual production Terraform patterns

resource "aws_eks_node_group" "main" {
  # One node group per entry: each.key names the group, each.value is its subnet ID
  for_each = var.subnetIds

  cluster_name    = var.clusterName
  node_group_name = "${var.namePrefix}-${each.key}"
  node_role_arn   = var.roleArn
  ami_type        = var.amiType
  capacity_type   = var.capacityType

  subnet_ids     = [each.value]
  instance_types = var.instanceTypes

  launch_template {
    version = aws_launch_template.main.latest_version
    id      = aws_launch_template.main.id
  }

  scaling_config {
    # Fall back to minSize when no initial size is given
    desired_size = var.initialSize == null ? var.minSize : var.initialSize
    max_size     = var.maxSize
    min_size     = var.minSize
  }

  labels = var.labels

  lifecycle {
    # Let external scaling change desired_size without Terraform reverting it
    ignore_changes = [scaling_config[0].desired_size]
  }

  # One NO_SCHEDULE taint per key/value pair in var.noScheduleTaints
  dynamic "taint" {
    for_each = var.noScheduleTaints
    content {
      effect = "NO_SCHEDULE"
      key    = taint.key
      value  = taint.value
    }
  }

  tags = merge({ Name = "${var.namePrefix}-${each.key}" }, local.mergedTags)
}

# Terraform Cloud Backend with VCS Integration
terraform {
  cloud {
    organization = "my-org"
    workspaces { name = "aws-production-infrastructure" }
  }
}
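
The node group example references several inputs and a local without showing their declarations. The types below are inferred from how each name is used in that example; the defaults, the var.tags input, and the contents of local.mergedTags are assumptions added only to make the shape visible.

```hcl
# Map of short name => subnet ID; the key ends up in the node group name.
variable "subnetIds" {
  type = map(string)
}

# Map of taint key => taint value, each applied with the NO_SCHEDULE effect.
variable "noScheduleTaints" {
  type    = map(string)
  default = {}
}

variable "minSize" {
  type = number
}

variable "maxSize" {
  type = number
}

# Optional starting size; null makes the node group start at minSize.
variable "initialSize" {
  type    = number
  default = null
}

variable "labels" {
  type    = map(string)
  default = {}
}

# Hypothetical extra input feeding local.mergedTags.
variable "tags" {
  type    = map(string)
  default = {}
}

locals {
  # Assumed definition: module-wide tags merged with caller-supplied ones.
  mergedTags = merge({ ManagedBy = "terraform" }, var.tags)
}
```

Using maps rather than lists for subnetIds and noScheduleTaints keeps for_each keys stable, so adding or removing one entry does not force Terraform to recreate the others.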

// git remote -v

Open source projects and infrastructure code

// 06 — connect --now

Open to senior platform and DevOps opportunities, consulting, and conversations about cloud architecture.

Location

Pune, India (open to remote)

Response time

Usually within 24 hours