Elasticsearch Consultant

Organized Crime and Corruption Reporting Project

Amsterdam, Netherlands
Posted on Monday, May 15, 2023

Location: Remote UTC - UTC+3

Application Deadline: June 30, 2023

About OCCRP

The Organized Crime and Corruption Reporting Project (OCCRP) is a growing, global nonprofit media organization that is reinventing investigative journalism for the public good. By developing and equipping a global network of investigative journalists and publishing their stories, we expose crime and corruption so the public can hold power to account. We see a future where organized crime and corruption are drastically reduced and democracy is strengthened. Our global team includes editors, researchers, data engineers, security specialists, administrators, technologists, and strategists, each with areas of in-depth expertise.

Aleph

To support investigative journalists and member centers in their work to uncover corruption and hold power to account, OCCRP developed an investigative data platform called OCCRP Aleph. The platform serves as a central repository for exploring leaks, datasets and data dumps that are critical to effective investigative journalism. It now contains over 3 billion records and has served as a cornerstone of many groundbreaking global investigations.

The platform allows journalists to ingest documents and both structured and unstructured data, and maps that data to an ontology that we developed called Follow the Money. Users can then search across all Aleph datasets to find entities and connections, and cross-reference entities between datasets to uncover patterns and wrongdoing. This is largely possible through our use of Elasticsearch (ES), a central part of the application that enables rapid search through records.

The platform that we’ve developed runs in a Kubernetes cluster hosted on GCP. Our ES cluster comprises 21 nodes, including 3 master nodes. Each node has a 2 TB SSD, 26 GB of RAM and 4 vCPUs.

The user-facing side of Aleph is written in Python and React with a Postgres database and an accompanying suite of CLI tools.

The Project

The Challenge

Aleph has now been in use for over seven years, and in that time our ES index has grown significantly. The ES index is currently 30 TB and is still growing. Although this growth is a testament to the success of the platform and its popularity with our users, it has brought with it very significant running costs.

As such, the primary goal of this project is to investigate and optimize our current ES setup, in the hope of making it more space-efficient and cost-effective without impacting the performance or usefulness of the platform for our users.
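
For a rough sense of scale, the figures above can be combined in a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the 3 master nodes are counted among the 21 and hold no data, and that the 30 TB figure is the total on-disk size. Neither assumption is confirmed in this posting, and the function name and `masters_hold_data` flag are purely illustrative.

```python
# Figures quoted in this posting.
TOTAL_NODES = 21
MASTER_NODES = 3
SSD_TB_PER_NODE = 2
INDEX_SIZE_TB = 30


def capacity_summary(total_nodes, master_nodes, ssd_tb_per_node,
                     index_size_tb, masters_hold_data=False):
    """Estimate data-node disk capacity and how much of it the index uses.

    masters_hold_data is a hypothetical switch: by default we ASSUME the
    master nodes are dedicated and store no index data.
    """
    data_nodes = total_nodes if masters_hold_data else total_nodes - master_nodes
    capacity_tb = data_nodes * ssd_tb_per_node
    return {
        "data_nodes": data_nodes,
        "capacity_tb": capacity_tb,
        "utilization": index_size_tb / capacity_tb,
    }


if __name__ == "__main__":
    summary = capacity_summary(TOTAL_NODES, MASTER_NODES,
                               SSD_TB_PER_NODE, INDEX_SIZE_TB)
    print(f"{summary['data_nodes']} data nodes, "
          f"{summary['capacity_tb']} TB of SSD, "
          f"{summary['utilization']:.0%} used")
```

Under these assumptions the index occupies most of the available SSD capacity, which is consistent with storage being a major cost driver. The real picture depends on replica counts and node roles that an audit would need to establish.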

Where you can help:

As an Elasticsearch expert, you will work closely with the Aleph development team to:

  • Audit the current Elasticsearch implementation and propose possible tweaks and other ‘low-hanging fruit’ changes we could make to improve cost efficiency.
  • Understand the current ES setup and implementation and how it is tied into Aleph’s core functionality.
  • Investigate avenues for potential improvement of cost efficiency.
  • Propose architectural changes (small or large) to our ES implementation or Aleph as a whole that could significantly improve cost effectiveness without compromising performance.
  • Help us implement these changes (development, planning, architecture) once we know what they are.

We’d expect you, as the expert, to make recommendations, but there are several avenues that we’ve already considered:

  • Optimizing our existing configuration to try and reduce the size of our index.
  • Setting up hot/cold nodes within the cluster for less frequently accessed data, where longer retrieval times are more acceptable.
  • Refactoring the way in which data is split across indexes to allow more efficient searches.
  • Setting up an archive process for taking data offline (and reducing the size of our index).
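
To make two of these avenues more concrete, here is a minimal sketch of the Elasticsearch requests they might involve. It only builds the request paths and bodies and does not contact a cluster. The index names and the node attribute name `temp` are hypothetical and do not reflect how Aleph is configured. The settings themselves (`index.codec` and `index.routing.allocation.require.*`) are standard Elasticsearch index settings.

```python
import json


def compressed_index_request(index_name):
    """Build a create-index request that uses the best_compression codec.

    index.codec is a static setting, so it has to be set when the index
    is created (or while the index is closed).
    """
    body = {"settings": {"index": {"codec": "best_compression"}}}
    return "PUT", f"/{index_name}", body


def cold_allocation_request(index_name, attribute="temp", value="cold"):
    """Build a settings update that requires an index's shards to live on
    nodes tagged with the given custom attribute.

    ASSUMPTION: the target nodes were started with a matching custom
    attribute, e.g. `node.attr.temp: cold` in elasticsearch.yml.
    """
    body = {f"index.routing.allocation.require.{attribute}": value}
    return "PUT", f"/{index_name}/_settings", body


if __name__ == "__main__":
    # Hypothetical index names, used purely for illustration.
    requests_to_send = [
        compressed_index_request("example-entities-v2"),
        cold_allocation_request("example-entities-archive"),
    ]
    for method, path, body in requests_to_send:
        print(method, path, json.dumps(body))
```

Whether attribute-based allocation, Elasticsearch's built-in data tiers, or another approach fits best would depend on the cluster's version and on how Aleph's data is split across indexes, which is exactly what the audit is meant to determine.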

What you bring to the table:

  • Significant experience working with Elasticsearch, ideally including performance and cost optimization.
  • Ideally, experience working with an NGO.
  • A solutions-oriented outlook and a collaborative mindset.

To Apply

If you’re interested in helping us with the project, either as an individual or as an organization, we’d love to hear from you.

For Individuals

If you’re an individual interested in getting involved, please email us a copy of your CV along with a cover letter detailing your interest and daily rates to jobs[at]occrp.org.

Once we’ve received this, we’ll evaluate it and reach out.

For Organizations

If you’re an organization looking to provide us with professional services, please email us at jobs[at]occrp.org with information on your organization, your rates, your availability, and why you believe that you are best placed to help us, and we’ll arrange a time to talk.

All applications must be submitted in English. Incomplete applications will not be considered. We apologize that we will not be able to reply to unsuccessful applicants.

As an equal opportunity employer, OCCRP values having a diverse workforce and continuously strives to maintain an inclusive and equitable workplace. We offer competitive compensation and benefits and encourage people from diverse backgrounds to apply. We do not discriminate against any person based upon race, religion, color, national origin, sex, medical conditions, family status, sexual orientation, gender identity, gender expression, age, disability, genetic information, or any other legally protected characteristics. If you are a qualified applicant requiring assistance or an accommodation to complete any step of the application process, please contact hr[at]occrp.org