Principal Data Engineer – Datalab

BBC · London · Full-time

BBC audiences expect the best content to be available to them in a single place, personalised to their preferences and interests. At the moment this is difficult for us to achieve, since our content and audience data is distributed across systems that are hard to connect. We're also missing metadata for many of our programmes, which makes them difficult to discover. As a result, we're currently unable to properly engage the next generation of TV licence fee payers, many of whom already have less affinity with the BBC than the rest of the UK population.

Datalab was formed to address these issues, by creating a simpler way to discover content. We are doing this by bringing all of our data together into one place, and by using machine learning to enrich it. As we do this, we become able to match our programming with individuals’ interests and context. Our approach is to build a data platform that can be extended by other BBC teams, and which allows many different products to create consistent and relevant experiences for audiences.

The Principal Data Engineer in Datalab leads the development of data pipelines that are scalable, repeatable, and secure, and that can serve multiple users within the BBC. They facilitate ingesting data from a variety of sources, transforming it into the right formats, ensuring it adheres to data quality standards, and ensuring that downstream users can access it quickly. The role usually functions as a leading member of an agile team.

Over the past year, we launched the first in-house personalised recommendation engine for BBC Sounds as well as the first ML-driven recommendation engine for BBC Sport and News short-form videos. We have many more exciting projects on the horizon, working across the BBC product portfolio.

Our team objectives are:

  • Make it easy for BBC teams to rapidly develop and deploy Machine Learning engines
  • Provide great recommendations across multiple BBC products

Main Responsibilities

We are aiming high and have an open brief to define what works best for our audience. We want to stay lean and move quickly to build, test and learn as we go so your contribution will make a difference from day one. We want everyone to feel responsible for our collective success.

You will help us create a data and machine learning environment that can scale to millions of users. You will help integrate new data sources and ensure that the code we write is robust and scalable. You have a keen interest in machine learning (but not necessarily previous experience). You are excited and knowledgeable about a tech stack that includes Google Cloud Platform, Python and Kubernetes with a commitment to micro-services and infrastructure as code.

You’ll engage with engineers working on other BBC apps and services, tapping into the wealth of knowledge and experience of an organisation already serving a vast global audience. Learning is an important part of the role, and you’ll have access to BBC Academy training programmes, along with the opportunity to attend technology conferences and use other resources to progress.

Are you the right candidate?

We would expect you to have significant experience deploying data applications to production (ideally in a cloud environment) for millions of users, as well as in coaching and managing more junior team members.

You will have broad exposure to different data storage systems and/or machine learning algorithms. Experience with model management and algorithmic lifecycle management, and involvement in the data engineering community, would be strong positives.

We are looking for the following skillsets:

  • Ability to communicate with and provide leadership within multi-functional teams
  • Passion for development and data best practices
  • Track record of delivering production-ready code
  • Significant demonstrable experience of writing Python and using associated frameworks
  • Significant experience developing APIs (if possible with async experience)
  • A test-driven development approach, with experience writing unit and functional tests
  • Significant experience of cloud-based development, with AWS or GCP experience being most beneficial
  • Working knowledge of machine learning systems
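To give a flavour of the async API work mentioned above, here is a minimal, hypothetical sketch (the function names and programme IDs are invented for illustration) of fanning out concurrent non-blocking calls with Python's standard-library asyncio, rather than awaiting each one sequentially:

```python
import asyncio

async def fetch_metadata(programme_id: str) -> dict:
    # Simulate a non-blocking I/O call, e.g. a lookup against a metadata service.
    await asyncio.sleep(0.01)
    return {"id": programme_id, "status": "ok"}

async def main() -> list:
    # Fan the requests out concurrently instead of awaiting them one by one.
    ids = ["p001", "p002", "p003"]
    return await asyncio.gather(*(fetch_metadata(i) for i in ids))

results = asyncio.run(main())
print([r["id"] for r in results])
```

In a real service the coroutines would wrap actual HTTP or database clients, but the concurrency pattern is the same.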

Also desirable:

  • Able to pair-program effectively with junior and senior developers
  • Experience building data streaming systems
  • Documenting source-to-target mappings
  • Experience developing and optimising ETL pipelines
  • Effective data preparation for analysis
  • Implementing machine learning and deploying similarity metrics (such as nearest neighbours) at scale
  • An understanding of how software design relates to overall system architecture
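As a small illustration of the nearest-neighbours point in the list above, here is a minimal sketch of scoring items by cosine similarity to a query embedding. The programme names and vectors are invented for the example; at the scale described in this role, this brute-force scan would be replaced by an approximate nearest-neighbour index.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors; 0.0 if either is zero-length.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest_neighbours(query, catalogue, k=2):
    # Rank catalogue items by similarity to the query embedding, keep the top k.
    scored = sorted(catalogue.items(),
                    key=lambda kv: cosine_similarity(query, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:k]]

# Hypothetical programme embeddings (in practice produced by an ML model).
embeddings = {
    "drama": [0.9, 0.1, 0.0],
    "news":  [0.1, 0.9, 0.2],
    "sport": [0.0, 0.8, 0.6],
}
print(nearest_neighbours([0.1, 0.9, 0.3], embeddings, k=2))
```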

To apply for this job please visit