Data Engineer I
DESCRIPTION
Amazon Music is an immersive audio entertainment service that deepens connections between fans, artists, and creators. From personalized music playlists to exclusive podcasts, concert livestreams to artist merch, Amazon Music is innovating at some of the most exciting intersections of music and culture. We offer experiences that serve all listeners with our different tiers of service: Prime members get access to all the music in shuffle mode, and top ad-free podcasts, included with their membership; customers can upgrade to Amazon Music Unlimited for unlimited, on-demand access to 100 million songs, including millions in HD, Ultra HD, and spatial audio; and anyone can listen for free by downloading the Amazon Music app or via Alexa-enabled devices. Join us for the opportunity to influence how Amazon Music engages fans, artists, and creators on a global scale.
If you love the challenges that come with big data, then this role is for you. We collect billions of events a day, manage petabyte-scale data on Redshift and S3, and develop data pipelines using Spark/Scala on EMR, SQL-based ETL, and Java services.
You are a talented, enthusiastic, and detail-oriented Data Engineer, Data Scientist, Business Intelligence Engineer, or Software Development Engineer who knows how to take on big data challenges in an agile way. Duties include big data design and analysis, data modeling, and the development, deployment, and operation of big data pipelines. You will also help hire, mentor, and develop peers on the Music Data Experience team, including Data Scientists, Data Engineers, and Software Engineers. You'll help build Amazon Music's most important data pipelines and data sets, and expand self-service data knowledge and capabilities through an Amazon Music data university.
This role requires you to live at the intersection of data and engineering. You have a deep understanding of data, analytical techniques, and how to connect insights to the business, and you have practical experience insisting on the highest standards in operating ETL and big data pipelines. With our Amazon Music Unlimited and Prime Music services, and our spot as the top music provider on the Alexa platform, providing high-quality, high-availability data to our internal customers is critical to our customer experiences.
The Music Data Experience team develops data for a set of key business domains, such as personalization and marketing, and provides and protects a robust self-service core data experience for all internal customers. We work with AWS technologies like Redshift, S3, EMR, EC2, DynamoDB, Kinesis Firehose, and Lambda. In 2020 your team will migrate Amazon Music's information model and data pipelines to a data exchange store (Data Lake) and an EMR/Spark processing layer. You'll build our data university and partner with Product, Marketing, BI, and ML teams to build new behavioral events, pipelines, datasets, models, and reporting to support their initiatives. You'll also continue to develop big data pipelines.
Amazon Music
Imagine being part of an agile team where your ideas have the potential to reach millions. Picture working on cutting-edge consumer-facing products, where every single team member is a critical voice in the decision-making process. Envision being able to leverage the resources of a Fortune 500 company within the atmosphere of a start-up. Welcome to Amazon Music, where ideas are born and come to life as Amazon Music Unlimited, Prime Music, and so much more.
Everyone on our team has a meaningful impact on product features, new directions in music streaming, and customer engagement. We are looking for new team members across a variety of job functions including software engineering/development, marketing, design, ops and more. Come join us as we make history by launching exciting new projects in the coming year.
Our team is focused on building a personalized, curated, and seamless music experience. We want to help our customers discover up-and-coming artists while also giving them access to their favorite established musicians. To support our customer base, we build large-scale distributed systems that span our music apps, web player, and voice-forward audio engagement on mobile and on Amazon Echo devices powered by Alexa. Amazon Music offerings are available in countries around the world, and our applications support our mission of delivering music to customers in new and exciting ways that enhance their day-to-day lives.
Come innovate with the Amazon Music team!
Key job responsibilities
- Build Data Platform and Data Lake solutions
- Build Data Engineering tools
- Build real-time and micro-batch data pipelines
About the team
The Music Data eXperience (MDX) team is responsible for the definition, design, production, and quality of the foundational datasets consumed by the whole org, for data management tools, and for the self-service data lake and warehouse platforms on which these datasets are published, stored, shared, and consumed for analytics and science modeling. MDX is split into two sub-teams: *PARAM* (Platform Architecture Research and AutoMation) and *IDEA* (Intelligence, Data Engineering & Analytics). The Data Platform (PARAM) team owns the self-service data lake, Data EXchange Store (DEX), and the Data Warehouse platforms; builds tools and frameworks for efficient data management; and owns the orchestration and configuration platform for data pipelines. The Data Engineering (IDEA) team owns the foundational data model and datasets, the Spark and Datanet ETL jobs and business logic that build them, away-team support for datasets, org-wide launch support (when required), the Executive Daily Summary (EDS), and future data quality frameworks for batch datasets.
BASIC QUALIFICATIONS
- 1+ years of data engineering experience
- Experience with SQL
- Experience with data modeling, warehousing and building ETL pipelines
- Experience with one or more query languages (e.g., SQL, PL/SQL, DDL, MDX, HiveQL, SparkSQL, Scala)
- Experience with one or more scripting languages (e.g., Python, KornShell)
PREFERRED QUALIFICATIONS
- Experience with big data technologies such as Hadoop, Hive, Spark, and EMR
- Experience with an ETL tool such as Informatica, ODI, SSIS, BODI, or Datastage