Software Development Engineer, AWS Parallel Computing Service, Slurm team
DESCRIPTION
The Parallel Computing Service (PCS) team at AWS is seeking a Software Development Engineer to join the core Slurm team. The role involves building and shipping services that focus on advancing PCS capabilities to run and scale high-performance computing (HPC) workloads using the open-source Slurm scheduler.
As a Software Development Engineer at PCS, you will learn about and contribute to the massive scale supported by AWS, enhancing customer experience globally. You will influence service development methods, provide best practices guidance, and offer architecture feedback.
As a team member, you'll collaborate with outstanding engineers and leaders to refine product requirements with product managers, define architecture, and take a leadership role in the implementation and launch of software. All team members actively participate in planning, product definition, technical architecture review, iterative development, code review, and operations. Additionally, you'll have the opportunity to interact with enterprise customers to ensure their needs are met. Clear, professional communication with teammates and customers is an essential part of the job.
Furthermore, as a member of the PCS team, you will be part of the larger Amazon EC2 family and its world-class team. This is an opportunity to operate and engineer systems on a massive scale and gain top-notch experience in cloud computing. You'll be surrounded by wickedly smart people who are passionate about cloud computing and believe that world-class service and great user experiences are critical to customer success.
The ideal candidate has thrived and succeeded in delivering high-quality solutions in a geographically distributed and dynamic team where priorities shift rapidly. If you're looking to solve challenging technical problems and create great products for customers, we would like to talk to you.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage you to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
Work/Life Balance
Our team believes in maintaining the right work/life balance. We understand the importance of a healthy and sustainable lifestyle, and we strive to create an environment that fosters productivity while respecting personal time. Join us in building cutting-edge technologies and making an impact in the high-performance computing space, all while enjoying a fulfilling and balanced professional life. If you are passionate about high-performance computing, want to join a collaborative and fast-paced environment, and want to contribute to the future of Slurm and computation on AWS, we encourage you to apply and be part of our talented team at PCS.
Key job responsibilities
- Architect, develop, and maintain core functionality to manage high-performance computing clusters.
- Develop tools to streamline deployment, monitoring, and maintenance processes for the services owned by the team.
- Functionally decompose complex problems into simple, straightforward solutions.
- Limit the use of short-term workarounds. Build solutions with the proper level of complexity the first time (or at least minimize incidental complexity).
- Apply a broad range of design approaches, knowing when each is appropriate (and when it is not), and deliver pragmatic solutions.
- Collaborate with the Slurm maintainers and open-source community to drive improvements and ensure alignment with industry best practices.
- Provide mentorship and knowledge sharing within the team to facilitate a collaborative and learning-oriented environment.
About the team
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects that help you develop your engineering expertise so you feel empowered to take on more complex tasks in the future.
BASIC QUALIFICATIONS
- Proven software development experience in at least one programming language, with a focus on distributed systems.
- Non-internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems.
- Solid knowledge of Linux fundamentals.
- Experience with cloud-native technologies.
PREFERRED QUALIFICATIONS
- Bachelor's degree in computer science or equivalent.
- Several years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations.
- Experience programming in Java.
- Experience scripting in Python.
- Experience with Slurm or other HPC schedulers (LSF, PBS, GridEngine, etc.) and/or other HPC technologies.
- Experience mentoring junior software development engineers and driving engineering excellence.
Amazon is an equal opportunities employer. We believe passionately that employing a diverse workforce is central to our success. We make recruiting decisions based on your experience and skills. We value your passion to discover, invent, simplify and build. Protecting your privacy and the security of your data is a longstanding top priority for Amazon. Please consult our Privacy Notice (https://www.amazon.jobs/en/privacy_page) to know more about how we collect, use and transfer the personal data of our candidates.