I can’t seem to find an all-in-one solution that can do that right now…
Yes, partly.
There is still no single public site that takes a plain-English request like “I want natural chat and accurate calculation” and returns a definitive ranked list. But there are now several places that come much closer to being benchmark-first or task-first than the Open LLM…