At CloudGeometry, we're redefining how modern data and AI systems are built. As a leading cloud-native engineering firm, we work with pioneering technology companies to deliver high-impact solutions across infrastructure, machine learning, and intelligent applications.
We are looking for five highly skilled AI Infrastructure Engineers to join our growing team supporting large-scale AI/ML systems. This is a hands-on engineering role focused on building scalable, secure, and production-ready infrastructure that powers ML workflows end to end, from experimentation to deployment and monitoring.
What You’ll Do
- Design, implement, and maintain robust infrastructure for ML workflows across real-time and batch environments.
- Build and support production-grade model lifecycle systems, including registration, versioning, and deployment workflows.
- Develop APIs and backend services in TypeScript and Python to support model integration and orchestration.
- Manage and optimize infrastructure using AWS and infrastructure-as-code (CDK preferred).
- Work with Databricks MLflow for end-to-end model management, including asset bundling and serving pipelines.
- Collaborate with cross-functional teams including ML scientists, backend engineers, and DevOps to deliver high-impact features.
- Monitor and improve infrastructure reliability, security, and performance across diverse deployment targets.
- Contribute to CI/CD workflows, container orchestration (Docker, ECS), and automation for ML pipelines.
Why Join CloudGeometry?
You’ll work alongside top-tier engineers across the US, LATAM, and Europe on cutting-edge projects in AI, cloud, and enterprise SaaS. We value deep technical curiosity, strong collaboration, and a bias for action in solving meaningful problems.
Seniority Level
Mid-Senior level
Industry
- Software Development
Employment Type
Full-time
Job Functions
- Engineering
- Information Technology
Skills
- Large Language Models (LLM)
- Software as a Service (SaaS)
- Databricks Products
- Python (Programming Language)
- Infrastructure
- TypeScript
- MLflow
- MLOps
- Amazon Web Services (AWS)
Requirements
What We’re Looking For
- 7+ years in software or infrastructure engineering with proven experience supporting AI/ML systems.
- Deep hands-on experience with AWS services and modern IaC practices (Terraform/CDK).
- Strong backend programming skills in TypeScript and Python.
- Production-level use of MLflow for model management and deployment.
- Expertise in containerization (Docker), CI/CD automation, and orchestration tools.
- Solid understanding of designing scalable and secure systems in cloud-native environments.
- Strong communication skills, able to bridge gaps between engineering and product stakeholders.
- Comfortable in fast-paced, collaborative environments working across time zones.
Nice to Have
- Exposure to LLM infrastructure and frameworks (e.g., DSPy, LangChain).
- Knowledge of LLM performance metrics: latency, cost monitoring, and usage optimization.
- Familiarity with semantic search tools and vector stores (e.g., OpenSearch, Pinecone).
Benefits
- Remote anywhere
- Coworking space financial coverage
- Flexible working hours
- B2B contract with multiple benefits
- Paid days off annually
- Workspace program: $2,500 for work equipment of your choice
- Paid courses and certifications, for example AWS, CKA, and ML certifications
- Participation in international conferences such as CNCF summits, KubeCon, and others