I attempted to reproduce the results for one task from the "VLMs are Blind" paper: Task 1, counting line intersections.
I ran 150 examples, generated with the project's own code using a line thickness of 4.
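Each example needs a ground-truth count to score the model against. The sketch below counts the crossings between two lines, assuming each line is stored as a list of (x, y) vertices joined by straight segments. That representation, and the helper names `count_intersections` and `segments_cross`, are my assumptions for illustration, not the project's actual generator code. It counts only proper crossings, so lines that merely touch, overlap, or cross exactly at a vertex are not counted.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _orient(a: Point, b: Point, c: Point) -> int:
    # Sign of the cross product (b - a) x (c - a):
    # 1 = counter-clockwise, -1 = clockwise, 0 = collinear.
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def segments_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    # Proper crossing: the endpoints of each segment lie strictly
    # on opposite sides of the other segment.
    return (_orient(q1, q2, p1) * _orient(q1, q2, p2) < 0
            and _orient(p1, p2, q1) * _orient(p1, p2, q2) < 0)


def count_intersections(line_a: List[Point], line_b: List[Point]) -> int:
    # Hypothetical helper: test every segment of line_a against every segment of line_b.
    count = 0
    for i in range(len(line_a) - 1):
        for j in range(len(line_b) - 1):
            if segments_cross(line_a[i], line_a[i + 1], line_b[j], line_b[j + 1]):
                count += 1
    return count


red = [(0, 0), (5, 10), (10, 0)]
blue = [(0, 5), (5, 5), (10, 5)]
print(count_intersections(red, blue))  # 2
```

Here the red "peak" crosses the flat blue line once on the way up and once on the way down, so the count is 2.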
I started with the following prompt, using the model claude-3.5-sonnet with temperature 0:

"How many times do the blue and red lines intersect?"
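The model answers in free text, so its replies have to be turned into a number before they can be compared with the ground truth. Below is a minimal scoring sketch, assuming the first digit or number word in the reply is the answer. `extract_count` and `accuracy` are hypothetical helpers of mine, and the paper's own answer-extraction rules may well differ.

```python
import re
from typing import Optional, Sequence

# Number words a short count answer is likely to use (assumed list, not from the paper).
_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "once": 1, "twice": 2,
}
_PATTERN = re.compile(r"\b(\d+|" + "|".join(_WORDS) + r")\b", re.IGNORECASE)


def extract_count(reply: str) -> Optional[int]:
    # Take the first digit run or number word in the reply; None if there is none.
    m = _PATTERN.search(reply)
    if m is None:
        return None
    token = m.group(1).lower()
    return int(token) if token.isdigit() else _WORDS[token]


def accuracy(replies: Sequence[str], truths: Sequence[int]) -> float:
    # Percentage of replies whose extracted count equals the ground truth.
    correct = sum(1 for r, t in zip(replies, truths) if extract_count(r) == t)
    return 100.0 * correct / len(truths)


print(extract_count("The blue and red lines intersect twice."))  # 2
```

A reply such as "They do not intersect" yields None and is scored as wrong even when the true count is 0, so a real run would need a rule for that case.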
The paper reported 73.00% correctness…