In A Nutshell
Location: Remote
Salary: $106,000 - $120,000 / year
Job Type: Full-time
Experience Level: Mid-level
Deadline to apply: April 18, 2025
Responsible for maintaining data management systems and deploying machine learning models within those systems.
Responsibilities
Design, build, test, and maintain machine learning pipeline architectures (70%)
- Produce high-quality, reusable code for data ingestion, validation, and processing pipelines
- Architect and implement end-to-end ML pipelines including training, retraining, and inference systems for schools using the SST
- Design and build APIs to easily access, integrate, and manage data from different sources
- Ensure data infrastructure is in compliance with data governance and security policies
- Create comprehensive documentation for data infrastructure and ML pipelines, tailored for both technical and non-technical stakeholders
- Advance internal analytics reporting and automation capabilities as needed
Provide direct data support to partners (15%)
- Manage initial data lifecycle processes for new school onboarding including ingestion, transfer, audit, and validation
- Collaborate with data platform partners on integration and data transfer pipelines
- Provide technical guidance to partners on how to share data formatted in alignment with our data model and with appropriate data governance measures
- Address partner concerns regarding data security and ensure their specific requirements are satisfied
- Support data science initiatives through processing, cleaning, and analyzing data as needed
Collaborate and contribute across DataKind (15%)
- Support other data team members through code reviews and knowledge sharing across products
- Collaborate with the Product, Engineering, and Research teams to ensure seamless integration and alignment of work
- Effectively communicate project status and manage expectations with internal teams and partner organizations
- Maintain accurate and current project information in project management tools like Asana
Skillset
Required
- Alignment with DataKind’s mission and values, including our commitment to anti-racism
- Experience working across lines of difference (culture, identity, and time zone)
- At least 3 years of professional work experience in developing and deploying a machine learning product at scale
- Foundational understanding of machine learning and statistical methods for predictive modeling
- Expert in Python
- Experience with cloud computing (GCP preferred)
- Experience with databases (SQL, Postgres, PySpark, and/or other data query languages)
- Experience with Databricks or a similar data intelligence platform
- Experience with data warehousing, orchestration, integration, and ETL tools
- Experience with modern source code management and software repository systems (e.g., Git)
- Experience documenting and implementing RESTful APIs
- Proven track record of successfully managing full-lifecycle machine learning implementation projects with multiple stakeholders
- Solid understanding of software engineering principles and best practices and of the data science project lifecycle
- Comfort and skill in communicating highly technical information to semi- and non-technical audiences
- Self-motivated, results-driven, and persistent in the face of challenges
Preferred
- Experience integrating data from SaaS providers
- Experience in the nonprofit sector and/or in a small startup organization
- Experience in scaling machine learning products, handling data quality and volume
- Certifications in cloud computing
- Advanced experience in machine learning, with confidence in applying, tuning, and evaluating a wide variety of algorithms
- Experience with software development and/or web-dev work (frontends, dashboards, etc.)
- Track record of strong technical writing for a variety of audiences
- Proven track record of (internal or external) client service orientation