We are seeking two senior ML Platform Engineers to join a high-impact team building scalable, cloud-native services that operationalize machine learning workflows. This role is ideal for someone who blends backend/service engineering with a strong AWS infrastructure background, has hands-on experience with Databricks, and thrives in a complex, less structured, fast-paced environment. You’ll be working alongside architects and DevOps, supporting the integration of ML pipelines and services into a robust platform ecosystem.
You will be part of a cross-functional team of 6–7 people, including 4 platform engineers, 1 DevOps engineer, and data scientists. You’ll receive high-level task goals (e.g., “build a configuration service”) and will be expected to research, architect, and implement solutions with minimal hand-holding. A strong problem-solving mindset, curiosity, and self-drive are crucial. You’ll be encouraged to challenge assumptions and explore better approaches. The project builds upon an existing POC, but integration into a complex platform ecosystem is the key challenge ahead.
Responsibilities:
- Build AWS-connected services to orchestrate, scale, and support ML workloads, using TypeScript and Python.
- Design and implement CI/CD pipelines using GitHub Actions, enforcing high test coverage and deployment automation.
- Support Databricks-based ML workflows, including provisioning, configuration, and resource optimization.
- Integrate services into the broader customer platform, ensuring compatibility with authentication, data flow, and API architecture.
- Participate in service design discussions, ensuring that implementations support high concurrency, scalability, and fault tolerance.
- Collaborate with DevOps to ensure infrastructure is provisioned correctly (e.g., CloudFormation, Terraform for Databricks).
- Validate and iterate on architectural plans with internal architects while being capable of autonomous research and decision-making on implementation-level concerns.
- Engage with internal teams (data scientists, ML engineers, DevOps) to deliver robust, production-ready infrastructure and services.
Requirements:
Must-have skills:
- 8+ years of experience in software/platform engineering with a focus on AWS.
- Strong proficiency in TypeScript (Node.js services) and Python (for data/ML workflow scripting).
- Experience working with Databricks (jobs, clusters, configurations).
- Knowledge of Terraform (especially for Databricks provisioning) and CloudFormation (for AWS infra setup).
- Solid understanding of MLOps fundamentals, from model orchestration to serving and monitoring.
- Familiarity with AWS services including, but not limited to, Lambda, Step Functions, S3, IAM, and VPC.
- Practical experience with CI/CD pipelines, 100% unit test coverage, and GitHub Actions.
- Experience designing services that handle high-concurrency traffic and scalable workloads.
Nice-to-have:
- Understanding of Jupyter-based model development and what it takes to productionize such workflows.
- Experience with service orchestration involving large-scale event processing or configuration management.
- Previous work integrating services into multi-tenant SaaS platforms.