Sr Technical Program Manager, AWS Generative AI & ML Servers
DESCRIPTION
AWS Infrastructure Services owns the design, planning, delivery, and operation of all AWS global infrastructure. In other words, we’re the people who keep the cloud running. We support all AWS data centers and all of the servers, storage, networking, power, and cooling equipment that ensure our customers have continual access to the innovation they rely on. We work on the most challenging problems, with thousands of variables impacting the supply chain — and we’re looking for talented people who want to help.
You’ll join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers, and other vital roles. You’ll collaborate with people across AWS to help us deliver the highest standards for safety and security while providing seemingly infinite capacity at the lowest possible cost for our customers. And you’ll experience an inclusive culture that welcomes bold ideas and empowers you to own them to completion.
Our team designs, builds, and operates Amazon's fleet of Accelerated Servers using Amazon-designed silicon or special-purpose accelerators (EC2.TRN, INF, G, F + more instance types). We solve systemic hardware issues, and we build hardware and software systems to detect and mitigate future recurrences so that our customers can experience the highest quality of service possible!
You will be responsible for program managing the design and operations of a brand new segment of servers for the AWS fleet. You will be responsible for developing individual boards, managing third-party ODMs, integrating components from within and outside of Amazon, and integrating firmware/software stacks on top of the hardware to produce high-quality designs for our customers.
As the end-to-end owner of a complex server fleet, our team works closely with partners to root-cause failures and drive changes back into our current and future designs. Nothing is complete without closed-loop corrective actions that drive changes back into our development processes and behavior specifications.
As a member of the AWS Hardware Engineering organization, you will apply your technical experience and work with other subject matter experts in core component development, compute server development, networking development, custom hypervisor/virtualization development and other teams.
Key job responsibilities
As a member of the Accelerated Server Hardware Engineering team, you will own the development and operations of either newly developed accelerated servers or servers in our existing fleet of AI/ML products.
You will work closely with our customers to understand their technical needs and business goals and partner with development engineers within the org to architect the solutions that we will deploy at scale.
To deliver your products you will work with an interdisciplinary team of hardware design, silicon design, component, firmware, test, qualification, and integration engineers.
A day in the life
Your day-to-day responsibilities will include interfacing with our internal and external customers to understand project requirements and facilitate system development on top of your server design. You will be responsible for learning the operational challenges of our existing fleet, with the goal of improving the current customer experience as well as developing improved systems for future designs. You will work directly with vendors and ODM/JDM design teams to develop and manufacture your product at scale.
About the team
The Hardware Engineering AI / ML development team is a group of engineers and technical program managers directly responsible for launching hardware in the fleet. Based in Seattle, Cupertino, and Austin, we work on programs with global development teams (both internal and external to Amazon). Our servers are located in data centers globally.
The members of our team have a diverse set of technical backgrounds but all share a common trait of Bias for Action and strong Ownership. We enjoy applying a startup model of delivering fully functional solutions for our customers.
BASIC QUALIFICATIONS
- 5+ years of technical product or program management experience
- 7+ years of experience working directly with engineering teams
- Experience managing programs across cross functional teams, building processes and coordinating release schedules
- Ability to apply critical thinking in complex situations; experience working in cross functional groups including mechanical engineering, test, reliability, failure analysis
PREFERRED QUALIFICATIONS
- 5+ years of experience in project management disciplines including scope, schedule, budget, and quality, along with risk and critical path management
- Experience managing projects across cross functional teams, building sustainable processes and coordinating release schedules
- Experience defining KPIs/SLAs used to drive multi-million dollar businesses and reporting to senior leadership
- Experience developing and executing/delivering product and technical roadmaps
- Experience with server technologies such as: power optimization, board design, thermal, mechanical, BIOS, BMC, signal integrity, networking.
Amazon is committed to a diverse and inclusive workplace. Amazon is an equal opportunity employer and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status.
Los Angeles County applicants: Job duties for this position include: work safely and cooperatively with other employees, supervisors, and staff; adhere to standards of excellence despite stressful conditions; communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws and Company policies. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position. These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness and professionalism, and safeguard business operations and the Company’s reputation. Pursuant to the Los Angeles County Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.
Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $133,900/year in our lowest geographic market up to $231,400/year in our highest geographic market. Pay is based on a number of factors including market location and may vary depending on job-related knowledge, skills, and experience. Amazon is a total compensation company. Dependent on the position offered, equity, sign-on payments, and other forms of compensation may be provided as part of a total compensation package, in addition to a full range of medical, financial, and/or other benefits. For more information, please visit https://www.aboutamazon.com/workplace/employee-benefits. This position will remain posted until filled. Applicants should apply via our internal or external career site.