Job description
About The Role
Partners with stakeholders and leads team efforts to build and maintain Machine Learning backend services and solutions to support user-facing products, downstream services, or infrastructure tools and platforms used across Uber.
What the Candidate Will Do
• Design and build tools to empower production teams to innovate and productionize state-of-the-art deep learning models at Uber.
• Develop and maintain scalable, end-to-end deep learning training systems and frameworks.
• Ensure distributed training tools are reliable, efficient, and flexible enough to support new production use cases.
• Collaborate with cross-functional teams including machine learning engineers, backend engineers, data scientists, and data engineers to deliver robust ML solutions for Uber.
Basic Qualifications
• Master's degree in a relevant field (CS, EE, Math, Stats, etc.) and 8+ years of full-time software engineering work experience in deep learning
• Proficiency in Python and PyTorch
• Expertise in designing, debugging, and optimizing distributed deep learning systems.
• Working experience with distributed training in PyTorch at scale (e.g., data parallelism, model parallelism).
• Strong ability to translate complex DL requirements and problems into scalable solutions.
Preferred Qualifications
• Expertise in distributed training frameworks such as DDP, DeepSpeed, FSDP, or TorchRec.
• Familiarity with C++, Go or CUDA programming.
• Expertise in optimizing GPU/TPU training performance and data loading efficiency.
• Familiarity with large-scale distributed infrastructure tools such as Ray, OpenAI Triton, and PyTorch Lightning.
• Built and deployed end-to-end machine learning systems in production.
• Experience training large models (10B+ parameters), such as large recommendation systems or large language models (LLMs)
• PhD in relevant fields (CS, EE, Math, Stats, etc.)
For Sunnyvale, CA-based roles: The base salary range for this role is USD$232,000 per year - USD$258,000 per year. You will be eligible to participate in Uber's bonus program, and may be offered an equity award and other types of compensation. All full-time employees are eligible to participate in a 401(k) plan. You will also be eligible for various benefits. More details can be found at the following link: https://jobs.uber.com/en/benefits.