Methods, systems, and apparatus, including computer programs encoded on computer storage media, for agent behavior prediction using keypoint data. One of the methods includes obtaining data characterizing a scene in an environment, the data comprising: (i) context data comprising data…