Ever since ChatGPT surged in popularity in November, the AI chatbot space has become increasingly crowded with ChatGPT alternatives. With differing large language models (LLMs), pricing, user interfaces, internet access, and more, it can be difficult to decide which chatbot to use. To make comparing them easier, the Large Model Systems Organization (LMSYS Org), an open research organization founded by students and faculty from the University of California, Berkeley, created the Chatbot Arena. This benchmark platform lets users pit two anonymous, randomly selected models against each other: they enter a prompt and pick the better answer without knowing which LLM produced either response. After users vote, they can see which LLMs generated the outputs. According to LMSYS Org, the results of these user ratings are then used to rank the LLMs on a leaderboard using the Elo rating system, which is widely used in chess.
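For readers curious about the mechanics, here is a minimal sketch of the textbook Elo update applied to blind head-to-head votes. It is an illustration under assumed parameters (every model starts at 1000 and the K-factor is 32), not LMSYS Org's actual leaderboard code; the function names and the (model_a, model_b, winner) vote format are hypothetical.

```python
from collections import defaultdict


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def compute_elo(battles, k: float = 32, initial: float = 1000.0) -> dict:
    """Turn a sequence of (model_a, model_b, winner) votes into Elo ratings.

    winner is "a", "b", or "tie". Votes are applied in order, one update each.
    The starting rating and K-factor are assumptions for illustration.
    """
    ratings = defaultdict(lambda: initial)
    for model_a, model_b, winner in battles:
        ra, rb = ratings[model_a], ratings[model_b]
        ea = expected_score(ra, rb)  # A's expected score before this vote
        if winner == "a":
            sa = 1.0
        elif winner == "b":
            sa = 0.0
        elif winner == "tie":
            sa = 0.5
        else:
            raise ValueError(f"unknown winner: {winner!r}")
        # The winner gains exactly what the loser gives up.
        ratings[model_a] = ra + k * (sa - ea)
        ratings[model_b] = rb + k * ((1.0 - sa) - (1.0 - ea))
    return dict(ratings)


def leaderboard(ratings: dict) -> list:
    """Sort models from highest to lowest rating."""
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    votes = [("vicuna-7b", "gpt4all-13b-snoozy", "a")]
    for name, rating in leaderboard(compute_elo(votes)):
        print(f"{name}: {rating:.0f}")
```

With a single vote between two new models, the winner moves from 1000 to 1016 and the loser drops to 984. Because the update is scaled by how surprising the result was, an upset (a low-rated model beating a high-rated one) shifts ratings more than an expected win does.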

To test this out for myself, I used the prompt, “Can you write me an email telling my boss that I will be out because I am going on a vacation that was planned months ago.” The two responses were very different: one offered much more context and length, plus fill-in-the-blank placeholders that suited the email. After picking “Model B” as the winner, I found out it was “vicuna-7b,” an LLM created by LMSYS Org and based on Meta’s LLaMA model. The losing LLM was “gpt4all-13b-snoozy,” developed by Nomic AI and fine-tuned from LLaMA 13B.

The leaderboard unsurprisingly places GPT-4, OpenAI’s most advanced LLM, in first place with an Arena Elo rating of 1227. In second place is Claude-v1, an LLM developed by Anthropic. GPT-4 powers both Bing Chat and ChatGPT Plus, which makes those two chatbots the best available right now, a conclusion that aligns with ZDNET’s own AI chatbot rankings. Anthropic’s second-ranked Claude is not available to the public just yet, but users can join a waitlist for early access. Ranked eighth on the leaderboard is PaLM-Chat-Bison-001, a submodel of PaLM 2, the LLM behind Google Bard. This ranking mirrors the general sentiment about Bard: not the worst, but not one of the best.

On the Chatbot Arena site, you can also choose the two specific models you want to compare, which could be a great resource if you want to experiment with particular LLMs. A recent study has also shown that financial and legal professionals see value in generative AI. With the Chatbot Arena, users can now easily compare different LLMs and find the best one for their needs.

The Chatbot Arena is an innovative platform with the potential to change how people evaluate AI chatbots. Because the models stay anonymous until after a vote, users can compare LLMs without being swayed by brand names, and the resulting leaderboard ranks the LLMs based on those ratings. The platform makes it easier for users to decide which chatbot to use, and it gives the AI community a better understanding of which LLMs perform best.

