Staff ML Infrastructure Engineer
Playlab
Location: Remote
Employment Type: Full time
Location Type: Remote
Department: Engineering
Compensation: Base Salary $180K – $240K
About Playlab
Playlab is a tech non-profit dedicated to helping educators and students become critical consumers and creators of AI.
We believe that an open-source, community-driven approach is key to harnessing the potential of AI in education. We equip communities with AI tools and hands-on professional development that empower educators and students to build custom AI apps for their unique context. Over 60,000 educators have published apps on Playlab, and the impact is growing every day.
At Playlab, we believe that AI is a new design material - one that should be shaped by many to bring their ideas about learning to life. If you're passionate about building creative, equitable futures for students and teachers, we hope you’ll join us.
The Role
Playlab seeks a Staff ML Infrastructure Engineer to join our growing Engineering team. In this role, you'll design the systems that keep AI accessible as we grow: balancing cutting-edge capabilities with cost efficiency, powering research into what works in educational AI, and building toward a future where sophisticated AI can run anywhere in the world.
Examples of the work
Build data pipelines that scrub PII, create research datasets, and power the research portal for educational AI studies
Architect the path toward self-hosted and on-device model deployments for privacy and global accessibility
Design and implement model orchestration systems that intelligently route requests across multiple AI providers (OpenAI, Anthropic, AWS Bedrock, open-source models; see the sketch after this list)
Build cost optimization infrastructure - implement conversation compression, prompt caching, and smart model selection to keep AI accessible
Create comprehensive observability systems for ML operations - track costs, latency, quality, and usage patterns across thousands of applications
Design and implement infrastructure for fine-tuning and deploying custom models
Build monitoring and alerting systems that help us maintain reliability as AI interactions scale
And more…
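To make the orchestration work above concrete, here is a minimal, hypothetical sketch of cheapest-first provider fallback using LiteLLM (listed under Technologies below). The model names, ordering policy, and error handling are illustrative assumptions, not Playlab's actual implementation.

```python
# Minimal sketch of multi-provider routing with LiteLLM (illustrative only).
# Assumes provider API keys are available via environment variables.
import litellm

# Hypothetical cheapest-first preference; fall back to the next provider on failure.
MODEL_PREFERENCE = [
    "gpt-4o-mini",                                      # OpenAI
    "claude-3-5-haiku-20241022",                        # Anthropic
    "bedrock/anthropic.claude-3-haiku-20240307-v1:0",   # AWS Bedrock
]

def complete(messages: list[dict]) -> str:
    """Try each model in preference order and return the first successful response."""
    last_error = None
    for model in MODEL_PREFERENCE:
        try:
            response = litellm.completion(model=model, messages=messages)
            return response.choices[0].message.content
        except Exception as err:  # rate limit, provider outage, etc.
            last_error = err
    raise RuntimeError("All providers failed") from last_error
```

A production router in this role would also weigh per-request cost, latency, prompt-cache hits, and quality signals, which is where the cost optimization and observability work described above comes in.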
Expectations
Design, build, and maintain production ML infrastructure that balances performance, cost, and reliability
Own data quality and research dataset creation - ensure data is properly scrubbed, documented, and useful for research partners
Stay on top of ML infrastructure technologies and techniques - from model serving to cost optimization to observability tools
Work cross-functionally with ML engineers, backend engineers, and product to ensure infrastructure supports real needs
Balance innovation with operational excellence - experiment with new approaches while maintaining system reliability and data quality
Mentor engineers on ML operations, cost optimization, and production ML best practices
Qualifications
7+ years building production ML/data systems, with experience in ML operations and infrastructure
Strong experience with model serving, orchestration, and optimization in production environments
Proficient in Python and data pipeline technologies (Airflow, ETL tools, etc.)
Experience with cloud infrastructure (AWS preferred) and containerization (Kubernetes, Docker)
Experience with cost optimization strategies for LLM-based systems
Thrive in high-agency, high-collaboration cultures
Strong communication skills that make remote-first collaboration work
Bonus Points For...
Experience in education, edtech, or other mission-driven organizations
Experience designing creative platforms
Experience with LiteLLM or similar model routing frameworks
Background in privacy-preserving ML or PII handling
Experience building research data infrastructure
Contributions to open source ML infrastructure projects
Technologies
Python, AWS, Kubernetes, Docker, Airflow, LiteLLM, PostgreSQL, Neo4j, Vector Databases, Terraform, Monitoring tools (New Relic, OpenTelemetry)
Compensation Range: $180K – $240K