A system evaluates modifications to components of an autonomous vehicle (AV) stack. The system receives driving recommendations for traffic scenarios based on user annotations of video frames showing each traffic scenario. For each traffic scenario, the system predicts driving recommendations based on…