Machine Learning Platform Engineer

We are seeking two senior ML Platform Engineers to join a high-impact team building scalable, cloud-native services that operationalize machine learning workflows. This role is ideal for someone who blends backend/service engineering with a strong AWS infrastructure background, has hands-on experience with Databricks, and thrives in a complex, less structured, fast-paced environment. You’ll be working alongside architects and DevOps, supporting the integration of ML pipelines and services into a robust platform ecosystem.

You will be part of a 6–7 person cross-functional team, including 4 platform engineers, 1 DevOps engineer, and data scientists. You’ll receive high-level task goals (e.g., “build a configuration service”) and will be expected to research, architect, and implement solutions with minimal hand-holding. A strong problem-solving mindset, curiosity, and self-drive are crucial. You’ll be encouraged to challenge assumptions and explore better approaches. The project builds upon an existing POC, but integration into a complex platform ecosystem is the key challenge ahead.

Responsibilities:

Build AWS-connected services to orchestrate, scale, and support ML workloads, using TypeScript and Python.
Design and implement CI/CD pipelines using GitHub Actions, enforcing high test coverage and deployment automation.
Support Databricks-based ML workflows, including provisioning, configuration, and resource optimization.
Integrate services into the broader customer platform, ensuring compatibility with authentication, data flow, and API architecture.
Participate in service design discussions, ensuring that implementations support high concurrency, scalability, and fault tolerance.
Collaborate with DevOps to ensure infrastructure is provisioned correctly (e.g., CloudFormation for AWS, Terraform for Databricks).
Validate and iterate on architectural plans with internal architects while being capable of autonomous research and decision-making on implementation-level concerns.
Engage with internal teams (data scientists, ML engineers, DevOps) to deliver robust, production-ready infrastructure and services.

Requirements

Must-have skills:

8+ years of experience in software/platform engineering with a focus on AWS.
Strong proficiency in TypeScript (Node.js services) and Python (for data/ML workflow scripting).
Experience working with Databricks (jobs, clusters, configurations).
Knowledge of Terraform (especially for Databricks provisioning) and CloudFormation (for AWS infra setup).
Solid understanding of MLOps fundamentals, from model orchestration to serving and monitoring.
Familiarity with AWS services including, but not limited to, Lambda, Step Functions, S3, IAM, and VPC.
Practical experience with CI/CD pipelines and GitHub Actions, and with maintaining 100% unit test coverage.
Experience designing services that handle high-concurrency traffic and scalable workloads.

Nice-to-Have:

Understanding of Jupyter-based model development and what it takes to productionize such workflows.
Experience with service orchestration involving large-scale event processing or configuration management.
Previous work integrating services into multi-tenant SaaS platforms.

Submit Resume

Send us an application, and we’ll contact you shortly.
