Professional Summary
Staff Site Reliability Engineer with 10+ years building large-scale CI/CD platforms and Kubernetes infrastructure. Lead SRE for Datadog's CI infrastructure, which processes 13M+ builds monthly; reduced annual CI costs by $7M. Expert in platform engineering, multi-region Kubernetes (5,000+ nodes), incident command, and cost optimization. Core Incident Commander leading enterprise-wide outage response.
Technical Skills
Core
Kubernetes, Docker, AWS, GCP, Python, Go, Java, Terraform, GitLab, Jenkins, CI/CD Platforms
Distributed Systems
Apache Kafka, gRPC, Service Mesh (Istio, Envoy), Event-Driven Architecture, Microservices
Observability
Prometheus, Grafana, Datadog, ELK Stack, SLI/SLO/SLA, Incident Management, On-Call Operations
Security
DevSecOps, Vault, PCI, SOC 2, FedRAMP, HIPAA, Policy-as-Code
Certifications
CKAD - Linux Foundation, 2020 | AWS Solutions Architect - Professional | AWS Solutions Architect - Associate | AWS Developer - Associate
Professional Experience
Staff Software Engineer, CI Infrastructure
Datadog
- Engineer and evolve custom CI platform in Go/Python processing 13M+ builds/month—implementing advanced features including distributed task scheduling, smart caching layers, and real-time build analytics serving 1,000+ engineers across 100+ teams
- Architect and implement multi-tenancy framework in Go using namespace isolation, resource quotas, and pod security policies—enabling secure build isolation across 100+ teams while maintaining 99.95% platform availability and preventing resource contention
- Continuously optimize CI infrastructure costs through intelligent workload placement algorithms, spot instance orchestration, and resource right-sizing automation—maintaining $7M annual savings while scaling platform to handle 30% growth in build volume
- Design and implement caching improvements and build optimization strategies reducing cold start times by 40%, including Docker layer optimization, artifact reuse across pipelines, and predictive cache warming based on historical build patterns
- Lead technical design reviews for 15+ RFCs covering CI infrastructure improvements, platform evolution, and architectural decisions—providing code-level feedback and prototyping proof-of-concepts to validate feasibility before team implementation
- Drive consensus across 30+ engineering teams on CI strategies, monorepo architecture, and repository patterns
- Serve as Core Incident Commander for enterprise-wide severe outages (since 2023), debugging production issues across distributed systems, implementing automated remediation, and leading post-incident improvements
Senior Software Engineer, CI Infrastructure
Datadog
- Built and maintained CI/CD infrastructure processing 13M builds/month, achieving 99.95%+ uptime SLA through automated failover and graceful degradation strategies
- Reduced annual CI infrastructure costs from $10M to $3M (70% reduction, $7M savings) through intelligent node selection, resource optimization, and automated scaling strategies
- Designed and built custom enterprise CI system with task engine framework enabling reusable pipelines and smart dependency detection—foundation for company-wide standard
- Reduced pipeline execution time from 70 minutes to 7-12 minutes (up to 90% faster) through persistent runner framework, Docker image warm caching, and build impact analysis integration
- Architected build impact analysis service analyzing code changes to determine affected dependencies—eliminating unnecessary builds and reducing infrastructure waste by 60%+
- Optimized Gitaly cluster configuration to handle 10,000+ commits/day across 20GB+ monorepo, reducing Git clone times from 15+ minutes to <2 minutes through custom checkout strategies
- Engineered persistent runner framework with intelligent caching for extreme-scale Git operations, solving checkout performance challenges for massive monorepo
- Implemented Vertical Pod Autoscaler (VPA) for automatic resource sizing across CI workloads, eliminating manual tuning overhead and optimizing cluster utilization
- Became Core Incident Commander in 2023, training IC team members on incident response playbooks, simulation exercises, and best practices for high-pressure incident management
Staff Site Reliability Engineer
VMware
- Led VMware's largest SaaS Kubernetes platform: 5,000+ nodes, 100+ clusters, 99.99%+ uptime for mission-critical workloads
- Built custom Kubernetes operators in Go automating cluster lifecycle—reduced manual operations from 40 hrs/week to <12 hrs/week
- Deployed global Istio service mesh across multi-cloud with zero-trust networking, mTLS, and circuit breaking for 300+ microservices
- Maintained PCI, HIPAA, and FedRAMP compliance through automated policy enforcement (OPA) and infrastructure-as-code validation
Senior Site Reliability Engineer
Toyota Connected
- Built enterprise Kubernetes platform on AWS for 80+ teams, improving availability from 99.5% to 99.9% and reducing costs by 40%
- Deployed global ELK cluster with Kafka processing 3TB/day, achieving 80% cost reduction vs commercial alternatives
- Created self-service developer platform reducing provisioning time from 3 days to 15 minutes
Site Reliability Engineer Manager
Capital One
- Managed Kubernetes platform architecture spanning AWS and GCP and supporting 500+ microservices, serving as technical lead (60% IC, 40% leadership)
- Led cloud migration of 100+ microservices from on-premise to AWS/Kubernetes, completing 3 months ahead of schedule
- Reduced incident MTTR from 2.1 hours to 52 minutes through automated runbooks and enhanced observability
Senior Software Engineer
Pariveda Solutions
- Architected automated hybrid cloud solutions using AWS, Chef, and Jenkins for enterprise clients
- Built RESTful API services for cross-platform mobile applications using Java and Spring Framework
Education
University of Texas at Dallas — B.S. Computer Science, Cum Laude, GPA: 3.92
2013
Open Source & Leadership
Kubernetes 1.21 Bug Triage - CNCF Release Team Member (2021) | Core Incident Commander - Enterprise outage response leader (2023-Present) | Technical Portfolio: github.com/desponda