Engineering Lead, Benchmarking – AI research

  • Full Time
  • Anywhere
  • $200,000–$300,000 USD / Year

Position: Engineering Lead, Benchmarking
Organization: Epoch AI
Location: Remote (availability overlapping with UTC–8 to UTC preferred)
Compensation: $200,000–$300,000/year
Job Type: Full-time

About the Organization

Epoch AI is a nonprofit research institute focused on studying and communicating AI development trends. Through independent analysis, benchmarking, and data-driven reporting, Epoch aims to support informed decision-making among policymakers, researchers, and the public. Its work includes benchmarking advanced AI models, tracking compute trends, and publishing at top conferences such as NeurIPS and ICML.

Role Summary

Epoch AI is hiring a Senior Research Engineer to lead its Benchmarking team. This role is responsible for the technical execution and development of Epoch’s AI Benchmarking Hub, a platform that delivers rigorous and transparent evaluations of top AI models. As the Engineering Lead, you’ll own the engineering roadmap, guide a small team, and personally contribute to evaluating the latest frontier AI models using frameworks like Inspect. You’ll collaborate with researchers to ensure high-quality outputs that serve global stakeholders.

Responsibilities

  • Own and implement the engineering roadmap for Epoch’s Benchmarking Hub

  • Lead and mentor engineering team members

  • Conduct and oversee evaluations of major new AI models, often on tight timelines

  • Implement new benchmarks using the Inspect library and other tools

  • Ensure data integrity and support researchers with evaluation outputs

  • Collaborate closely with analysts and scientists to integrate benchmarking results into wider research

Qualifications

Required

  • Extensive engineering experience (10+ years preferred)

  • Proven leadership in small technical teams

  • Strong coding ability and system-level thinking

  • Ability to manage and build infrastructure to support AI benchmarking

  • Motivation aligned with Epoch AI’s mission for public, trustworthy evaluation of AI models

  • Professional fluency in English

  • Willingness to travel for retreats and occasional work gatherings

Preferred

  • Experience with LLM evaluation tools and frameworks (e.g., Inspect)

  • Familiarity with emerging AI capabilities and trends

  • Domain knowledge of machine learning models and evaluation metrics

Compensation and Benefits

  • Annual salary between $200,000 and $300,000 depending on experience and location

  • Fully remote work with flexible hours

  • Minimum 30 days of paid time off annually (includes vacation and public holidays)

  • Unlimited sick and personal leave

  • Up to 6 months of parental leave (a combination of paid and unpaid)

  • Health, life, and pension benefits where legally available

  • Equipment and professional development expense budget

  • Paid travel to retreats and conferences (3 team retreats annually)

  • Optional access to a Berkeley, CA office for in-person collaboration (minimum of 20 days of access per year for all staff)

Application Process

Applications are accepted on a rolling basis. No cover letter is required. Please avoid including unnecessary personal information. Submit materials in English, and direct inquiries to careers@epoch.ai.

To apply for this job, please visit careers.epoch.ai.