At CloudGeometry, we're redefining how modern data and AI systems are built. As a leading cloud-native engineering firm, we work with pioneering technology companies to deliver high-impact solutions across infrastructure, machine learning, and intelligent applications.

We are looking for five highly skilled AI Infrastructure Engineers to join our growing team supporting large-scale AI/ML systems. This is a hands-on engineering role focused on building scalable, secure, and production-ready infrastructure that powers ML workflows end to end, from experimentation to deployment and monitoring.
What You’ll Do

  • Design, implement, and maintain robust infrastructure for ML workflows across real-time and batch environments.
  • Build and support production-grade model lifecycle systems, including registration, versioning, and deployment workflows.
  • Develop APIs and backend services in TypeScript and Python to support model integration and orchestration.
  • Manage and optimize infrastructure using AWS and infrastructure-as-code (CDK preferred).
  • Work with Databricks and MLflow for end-to-end model management, including asset bundling and model serving pipelines.
  • Collaborate with cross-functional teams including ML scientists, backend engineers, and DevOps to deliver high-impact features.
  • Monitor and improve infrastructure reliability, security, and performance across diverse deployment targets.
  • Contribute to CI/CD workflows, container orchestration (Docker, ECS), and automation for ML pipelines.

Why Join CloudGeometry?

You’ll work alongside top-tier engineers across the US, LATAM, and Europe on cutting-edge projects in AI, cloud, and enterprise SaaS. We value deep technical curiosity, strong collaboration, and a bias for action in solving meaningful problems.

  • Seniority Level

    Mid-Senior level

  • Industry

    • Software Development

  • Employment Type

    Full-time

  • Job Functions

    • Engineering
    • Information Technology

  • Skills

    • Large Language Models (LLM)
    • Software as a Service (SaaS)
    • Databricks Products
    • Python (Programming Language)
    • Infrastructure
    • TypeScript
    • MLflow
    • MLOps
    • Amazon Web Services (AWS)

Requirements

What We’re Looking For

  • 7+ years in software or infrastructure engineering with proven experience supporting AI/ML systems.
  • Deep hands-on experience with AWS services and modern IaC practices (Terraform/CDK).
  • Strong backend programming skills in TypeScript and Python.
  • Production-level experience using MLflow for model management and deployment.
  • Expertise in containerization (Docker), CI/CD automation, and orchestration tools.
  • Solid understanding of designing scalable and secure systems in cloud-native environments.
  • Strong communication skills and the ability to bridge gaps between engineering and product stakeholders.
  • Comfortable in fast-paced, collaborative environments working across time zones.


Nice to Have

  • Exposure to LLM infrastructure and frameworks (e.g., DSPy, LangChain).
  • Knowledge of LLM performance and cost management, including latency metrics, cost monitoring, and usage optimization.
  • Familiarity with semantic search tools and vector stores (e.g., OpenSearch, Pinecone).


Benefits

  • Remote work from anywhere
  • Financial coverage for a coworking space
  • Flexible working hours
  • B2B contract with multiple benefits
  • Paid days off annually
  • Workspace program: $2,500 for work equipment of your choice
  • Paid courses and certifications (e.g., AWS, CKA, and ML certifications)
  • Participation in international conferences such as CNCF summits, KubeCon, and others