In A Nutshell
Location
Remote: Anywhere in Australia, Austria, Bangladesh, Belgium, Brazil, Canada, Colombia, Costa Rica, Croatia, Czech Republic, Denmark, Egypt, Estonia, Finland, France, Germany, Ghana, Greece, India, Indonesia, Ireland, Israel, Italy, Kenya, Mexico, Netherlands, Nigeria, Peru, Poland, Singapore, South Africa, Spain, Sweden, Switzerland, Uganda, United Kingdom, United States of America, and Uruguay.
Salary
$101,102-$156,045
Job Type
Full-time
Experience Level
Entry-level
Deadline to apply
August 28, 2025
Contribute to the Data Platform Engineering team's effort to unify data systems across the Wikimedia Foundation and deliver scalable solutions.
Responsibilities
- Designing and Building Data Pipelines: Develop scalable, robust infrastructure and processes using tools such as Airflow, Spark, and Kafka (a minimal sketch follows this list).
- Monitoring and Alerting for Data Quality: Implement systems to detect and address potential data issues promptly.
- Supporting Data Governance and Lineage: Assist in designing and implementing solutions to track and manage data across pipelines.
- Collaborating on the Shared Data Platform: Work with peers to improve and evolve the platform, enabling use cases like product analytics, bot detection, and image classification.
- Enhancing Operational Excellence: Identify and implement improvements in system reliability, maintainability, and performance.
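To give a flavor of the pipeline work described above, here is a minimal sketch of an Airflow DAG, assuming Airflow 2.x. The DAG id, task names, scripts, and spark-submit invocation are hypothetical illustrations, not the Foundation's actual setup.

```python
# A minimal sketch of the kind of Airflow pipeline this role builds.
# The DAG id, schedule, scripts, and spark-submit arguments are illustrative only.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="pageview_aggregation",  # hypothetical pipeline name
    start_date=datetime(2025, 1, 1),
    schedule="@daily",  # one run per day of data
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    # Submit a Spark job that aggregates raw events for the run's date.
    aggregate = BashOperator(
        task_id="aggregate_pageviews",
        bash_command=(
            "spark-submit --master yarn aggregate_pageviews.py "
            "--date {{ ds }}"  # Airflow's execution-date macro
        ),
    )

    # Basic data-quality gate: fail the run if the output looks wrong,
    # so monitoring and alerting can pick it up.
    check_output = BashOperator(
        task_id="check_row_count",
        bash_command="python check_row_count.py --date {{ ds }}",
    )

    aggregate >> check_output  # run the quality check after the job
```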
Skillset
- 3+ years of data engineering experience, with exposure to on-premise systems (e.g., Spark, Hadoop, HDFS).
- Understanding of engineering best practices with a strong emphasis on writing maintainable and reliable code.
- Hands-on experience in troubleshooting systems and pipelines for performance and scaling.
- Desirable: Exposure to architectural/system design or technical ownership.
- Desirable: Experience in data governance, data lineage, and data quality initiatives.
- Working experience with data pipeline tools like Airflow, Kafka, Spark, and Hive.
- Proficient in Python or Java/Scala, with working knowledge of the surrounding development tools and ecosystem.
- Knowledge of SQL and experience with various database/query dialects (e.g., MariaDB, HiveQL, CQL, Spark SQL, Presto); a minimal Spark SQL quality check is sketched after this list.
- Working knowledge of CI/CD processes and software containerization.
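The following is a minimal sketch of the sort of Spark SQL data-quality check this skillset supports, in the spirit of the monitoring responsibility above. The table, column, partition, and threshold are hypothetical, chosen only for illustration.

```python
# A minimal Spark SQL data-quality check. Table and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("null_rate_check").getOrCreate()

# Compute the fraction of rows with a missing user identifier.
result = spark.sql(
    """
    SELECT
        COUNT(*)                                  AS total_rows,
        SUM(CASE WHEN user_id IS NULL THEN 1
                 ELSE 0 END)                      AS null_user_ids
    FROM events.pageviews        -- hypothetical Hive table
    WHERE dt = '2025-01-01'      -- partition under test
    """
).first()

null_rate = result.null_user_ids / max(result.total_rows, 1)

# Fail loudly so an orchestrator (e.g., Airflow) marks the run as failed
# and an alert can fire.
if null_rate > 0.01:
    raise ValueError(f"user_id null rate {null_rate:.2%} exceeds 1% threshold")

spark.stop()
```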