Data Engineer

Location: Utrecht, The Netherlands
Company: Blendle

Advanced relocation package

Adaptation tips
Flight ticket
Help finding an apartment
Visa Services

About Blendle

Blendle is the largest platform for premium journalism in the Netherlands.

Blendle is basically an iTunes for Dutch newspapers: an app and a website where readers can buy individual articles.

Blendle fulfils an important task. Journalism is a vital component of any society, but even the greatest newspapers and magazines are seeing a steady decline in sales. Blendle offers a new revenue stream that will help ensure the continuity of great journalism.

Position

Journalism is changing, and we are part of that change. Nobody really knows where it’s going. Every new thing we try is an experiment, and we like to base those experiments on informed decisions. To make those decisions we use data, lots of data. That data is collected by the clients and the back-end, but it arrives raw, and raw data is only good for filling disk space.

We have an awesome ETL pipeline: it is built on PySpark, runs on Google Dataproc, and writes to an Amazon Redshift database, so we can visualize and explore the data in Looker. The pipeline combines data from more than 30 tables and buckets, including an event stream of at least 4 million events per day. All of that data needs structure, because it drives the insights and decisions that shape how people all around the world experience journalism.
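To give a feel for the shape of a job in this pipeline, here is a minimal PySpark sketch: read a day of raw events from a bucket, structure them, and load the result into Redshift. It is an illustration, not our actual code; every name in it (bucket, table, columns, credentials) is invented, and at real volume you would typically load Redshift through the spark-redshift connector, which stages data in S3, rather than plain JDBC.

    # Minimal sketch of a daily ETL job; all names are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("daily-events-etl").getOrCreate()

    # Raw client and back-end events land in a bucket as JSON files.
    # On Dataproc the GCS connector makes gs:// paths readable directly.
    events = spark.read.json("gs://example-events-bucket/2017-01-01/*.json")

    # Give the raw stream structure: one row per user, event type, and day.
    daily_counts = (
        events
        .withColumn("event_date", F.to_date(F.col("timestamp")))
        .groupBy("event_date", "user_id", "event_type")
        .agg(F.count("*").alias("event_count"))
    )

    # Load the structured table into Redshift so Looker can query it.
    (daily_counts.write
        .format("jdbc")
        .option("url", "jdbc:redshift://example-cluster.redshift.amazonaws.com:5439/warehouse")
        .option("dbtable", "fact_daily_events")
        .option("user", "etl_user")
        .option("password", "not-a-real-password")  # in practice, from a secret store
        .mode("append")
        .save())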

We’re looking for a data engineer with experience in dimensional modelling and big data. Neither is a hard requirement, but you should at least have solid experience modelling business processes, and you should love diving into new technologies. Above all, we are looking for a developer who takes pride in their work.

The things we currently work with include:

  • Python
  • Spark
  • Google Dataproc
  • Amazon Redshift  
  • Luigi
  • Looker
  • Kubernetes
  • ChartMogul
  • Ruby
  • Kafka
  • Go
  • Mixpanel
  • Google Analytics
  • BigQuery
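Luigi, from the list above, is the glue that strings the daily jobs together. As a rough sketch of the pattern (the task names, bucket, marker paths, and file layout are all invented, and the actual Dataproc submission step is elided), a dependency-tracked daily task looks roughly like this:

    import luigi
    from luigi.contrib.gcs import GCSTarget


    class RawEventsAvailable(luigi.ExternalTask):
        """Succeeds once the day's raw event dump has landed in the bucket."""
        date = luigi.DateParameter()

        def output(self):
            return GCSTarget("gs://example-events-bucket/%s/_SUCCESS"
                             % self.date.isoformat())


    class AggregateDailyEvents(luigi.Task):
        """Runs one day's aggregation once its raw input exists."""
        date = luigi.DateParameter()

        def requires(self):
            return RawEventsAvailable(date=self.date)

        def output(self):
            # Marker file: Luigi skips the task on re-runs once this exists.
            return luigi.LocalTarget("markers/daily-%s.done" % self.date.isoformat())

        def run(self):
            # Here the real pipeline would submit the PySpark job to Dataproc
            # (for example via the Dataproc jobs API) and wait for it to finish;
            # that step is left out so the skeleton stays self-contained.
            with self.output().open("w") as marker:
                marker.write("done")


    if __name__ == "__main__":
        luigi.run()

Saved as a hypothetical etl_tasks.py, running "python etl_tasks.py AggregateDailyEvents --date 2017-01-01 --local-scheduler" executes only the tasks whose output is missing, which is what makes re-running a failed day safe.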

Your main responsibilities will be:

  • develop and maintain the ETL process
  • troubleshoot problems in production
  • collaborate with other teams to identify information that should be available in the data warehouse
  • occasionally provide our controller with financial reports

Additional details

Not convinced yet?

Make sure to find out more about us and how we work in our handbook.

