Lead DevOps Engineer
About Mastercard
Mastercard is an American multinational payment card services corporation headquartered in Purchase, New York. It offers a range of payment transaction processing and other payment-related services (such as travel-related payments and bookings).
Description
Qualifications
Experience & Background
Years of Experience: 8–12+ years in DevOps, Site Reliability Engineering (SRE), or Platform Engineering, including senior or lead roles
Production Scale: Proven track record of architecting and operating production-grade infrastructure, particularly infrastructure supporting AI/ML workloads
Leadership: Experience translating ambiguous goals into clear plans, guiding engineers, and leading technical execution in fast-moving R&D environments
Technical Skills & Expertise
Cloud & Infrastructure
Cloud Platforms: Strong expertise in AWS, Azure, or GCP, including cloud-native services, serverless computing, and managed Kubernetes (EKS, AKS, GKE)
Infrastructure as Code (IaC): Expert-level proficiency in Terraform and orchestration tools such as Terragrunt
Containerization: Mastery of Kubernetes and Docker, including cluster management at scale, container security, and networking
AI/ML Platform Knowledge
MLOps: Experience building and scaling AI/ML infrastructure, including model registries, feature stores, and AI agents
Frameworks & Tools: Familiarity with Databricks, MLflow, LangChain, LlamaIndex, and Retrieval-Augmented Generation (RAG) techniques is highly valued
ML Workflows: Understanding of model serving, pipeline orchestration, and specific observability needs for ML workloads
CI/CD & Automation
Tools: Hands-on experience with Jenkins, GitHub Actions, GitLab CI, or similar tools
Security: Ability to build secure CI/CD systems, enforce workload isolation, and partner with security teams on access control and auditability
Scripting: Advanced skills in Bash and Python; familiarity with Go is a plus
Observability & Monitoring
Stacks: Experience with monitoring and logging tools such as Prometheus, Grafana, Splunk, and ELK
Tuning: Ability to tune observability for ML-specific use cases to ensure performance and reliability
Education
Degree: Bachelor’s degree in Computer Science, Engineering, or a related field
Soft Skills & Mindset
Problem-Solving: A systematic approach to issues, using data to select scalable and maintainable solutions
Collaboration: Strong communication skills to partner with cross-functional teams (ML engineers, software engineers, security)
Agile: Experience delivering iteratively using agile practices and managing milestones
Security Focus: Deep understanding of security best practices for MLOps, including data privacy, compliance, encryption, and secure service communication (e.g., mTLS)
Preferred (Bonus) Qualifications
Databricks: Hands-on experience with workspace administration, Unity Catalog, and Delta Lake
Advanced ML Platforms: Experience with Azure ML or SageMaker
ML Frameworks: Knowledge of TensorFlow, PyTorch, or Scikit-learn
Innovation: Experience implementing self-service platform automation or internal developer platforms (IDPs)
Certifications: Relevant certifications, personal projects, or open-source contributions are considered a plus