Data Engineer III, Music Data Lake

Job ID: 2684328 | ADCI - Karnataka

DESCRIPTION

Amazon Music is an immersive audio entertainment service that deepens connections between fans, artists, and creators. From personalized music playlists to exclusive podcasts, concert livestreams to artist merch, Amazon Music is innovating at some of the most exciting intersections of music and culture. We offer experiences that serve all listeners with our different tiers of service: Prime members get access to all the music in shuffle mode, and top ad-free podcasts, included with their membership; customers can upgrade to Amazon Music Unlimited for unlimited, on-demand access to 100 million songs, including millions in HD, Ultra HD, and spatial audio; and anyone can listen for free by downloading the Amazon Music app or via Alexa-enabled devices. Join us for the opportunity to influence how Amazon Music engages fans, artists, and creators on a global scale.

If you love the challenges that come with big data, then this role is for you. We collect billions of events a day, manage petabyte-scale data on Redshift and S3, and develop data pipelines using Spark/Scala on EMR, SQL-based ETL, and Java services.

You are a talented, enthusiastic, and detail-oriented professional in data engineering, data science, business intelligence, or software development who knows how to take on big data challenges in an agile way. Duties include big data design and analysis, data modeling, and the development, deployment, and operation of big data pipelines. You will also help hire, mentor, and develop peers in the Music Data Experience team, including Data Scientists, Data Engineers, and Software Engineers. You'll help build Amazon Music's most important data pipelines and data sets, and expand self-service data knowledge and capabilities through an Amazon Music data university.

This role requires you to live at the intersection of data and engineering. You have a deep understanding of data, analytical techniques, and how to connect insights to the business, and you have practical experience insisting on the highest standards for operations in ETL and big data pipelines. With our Amazon Music Unlimited and Prime Music services, and our top music provider spot on the Alexa platform, providing high-quality, high-availability data to our internal customers is critical to our customer experiences.

The Music Data Experience team develops data for a set of key business domains, such as personalization and marketing, and provides and protects a robust self-service core data experience for all internal customers. We work with AWS technologies like Redshift, S3, EMR, EC2, DynamoDB, Kinesis Firehose, and Lambda. In 2020 your team will migrate Amazon Music's information model and data pipelines to a data exchange store (Data Lake) and an EMR/Spark processing layer. You'll build our data university and partner with Product, Marketing, BI, and ML teams to build new behavioral events, pipelines, datasets, models, and reporting to support their initiatives. You'll also continue to develop big data pipelines.

The successful candidate will work with multiple global site leaders, Business Analysts, Business Intelligence Engineers, Software Developers, Database Engineers, and Product Management, in addition to stakeholders in sales, finance, marketing, and service teams, to create a coherent customer view. They will:
- Define and lead the data strategy of various analytical products owned by the Amazon Music team.
- Develop and improve the current data architecture using AWS Redshift, AWS S3, and Hadoop/EMR/Spark.
- Improve upon the data ingestion models, ETL jobs, and alarming to maintain data integrity and data availability.
- Stay up to date with advances in data persistence and big data technologies, and run pilots to design a data architecture that scales with the growing data sets of advertiser experience.
- Design and manage data models that serve multiple releases/feature launches and other business critical reporting.

Key job responsibilities
- Build Data Platform and Data Lake solutions
- Build Data Engineering tools
- Build real-time and micro-batch data pipelines
- Build and manage Data Pipelines

About the team
The Data Engineering (IDEA) team owns the foundational data model and datasets for Amazon Music, the Spark and Datanet ETL jobs and business logic that build them, org-wide launch support (when required), the Executive Daily Summary (EDS), and batch dataset data quality and SLAs.

BASIC QUALIFICATIONS

- 5+ years of data engineering experience
- Experience with data modeling, warehousing and building ETL pipelines
- Experience with SQL
- Experience in at least one modern scripting or programming language, such as Python, Java, Scala, or NodeJS
- Experience mentoring team members on best practices

PREFERRED QUALIFICATIONS

- Experience with big data technologies such as Hadoop, Hive, Spark, and EMR
- Experience operating large data warehouses