VLMs aren't blind

I attempted to reproduce the results for one task from the VLMs are Blind paper:
Task 1, counting line intersections.
I ran 150 examples, with lines generated by the project's code at a line thickness of 4.

I started with the prompt:

How many times do the blue and red lines intersect?

using the model claude-3.5-sonnet with temperature 0.
The paper reported 73.00% correctness…

Read more →
Claude Artifacts

I spent some time working with Claude Artifacts for the first time.
I started with this prompt:

I want to see what you can do. Can you please create a 2d rendering of fluid moving around obstacles of different shapes?

In an effort not to spend this whole post quoting prompts[^1], I've exported the whole conversation returned from the Anthropic API, using the response from the following…

Read more →
Claude 3.5 Sonnet Codes Really Well

One of my favorite things to do with language models is to use them to write code.
I've been wanting to build a variation on tic-tac-toe involving a bit of game theory.
I called it "Tactic".
I wasn't even really sure if the game would be any more interesting than tic-tac-toe itself, which reliably ends in draws for any players who understand the…

Read more →