

Toloka Team

Oct 18, 2023


Insights


Chess with ChatGPT and 160(!) Tolokers


There’s a recent trend our researchers couldn’t resist: playing chess with ChatGPT, just for fun. But because we’re Toloka, we added a twist and let a crowd of enthusiastic Tolokers play collectively for the human side. We didn’t expect to see sophisticated moves, but the game held some surprises.

In this post, we’ll share the setup and gameplay. See if you can guess who won the match!

Player profiles: Toloka and GPT 3.5

The goal of the match was not to discover the best chess players in Toloka’s huge global crowd. We were simply curious how the crowd would play collaboratively and how ChatGPT would fare against them. We wanted to involve as many people as possible, and our only requirement was that participants know how to play the game and follow our guidelines.

As for ChatGPT, it’s not trained to play chess, and it makes a lot of mistakes. But it plays surprisingly well for a language model.

Since neither the Tolokers nor GPT 3.5 specialize in chess, it seemed like the teams were evenly matched (despite the imbalance of 160 players against one language model).

GPT 3.5 tends to play strongly for the first 15 to 20 moves and then degrade as the game goes on. So the interesting question was whether Tolokers could withstand the pressure in the first phase and hold out for a win.

Game setup

The players analyzed a virtual chessboard in a Toloka task and input the next move in text format, following specific guidelines. We let multiple people answer until the same move was submitted by two or more players, which indicated a consensus and made the move official. There wasn’t any discussion of the game between Tolokers.
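The consensus rule above can be sketched as a small aggregation function. This is a minimal illustration, not Toloka’s actual implementation: the name `reach_consensus`, the `(player_id, move)` input shape, and the assumption that moves arrive already normalized to a single notation (such as SAN, e.g. `"e4"`) are all hypothetical.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def reach_consensus(
    submissions: Iterable[Tuple[str, str]], threshold: int = 2
) -> Optional[str]:
    """Return the first move backed by `threshold` distinct players, else None.

    Hypothetical helper: `submissions` yields (player_id, move) pairs in
    arrival order, with moves assumed to be normalized to one notation.
    """
    supporters: Dict[str, Set[str]] = {}
    for player_id, move in submissions:
        backers = supporters.setdefault(move, set())
        backers.add(player_id)  # a repeat vote from the same player is not counted twice
        if len(backers) >= threshold:
            return move  # enough independent agreement: the move becomes official
    return None  # no consensus yet; keep collecting answers


# Two different players independently propose e4, so e4 becomes official.
print(reach_consensus([("p1", "e4"), ("p2", "d4"), ("p3", "e4")]))  # e4
```

Counting distinct players rather than raw submissions means no single player can push a move through alone.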


We used APIs to get moves from ChatGPT and send them to the next Toloka task. Gameplay was tracked by a chess engine, which checked that every move was legal and decided when the game was over.
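The pipeline described here can be outlined as a turn loop. In this sketch the four callables are hypothetical stand-ins: `crowd_move` for publishing a Toloka task and waiting for consensus, `gpt_move` for the ChatGPT API call, and `is_legal` and `is_over` for the chess engine. The post does not say which engine was used; a library such as python-chess could plausibly supply the legality and game-over checks.

```python
from typing import Callable, List

Moves = List[str]


def play_match(
    crowd_move: Callable[[Moves], str],
    gpt_move: Callable[[Moves], str],
    is_legal: Callable[[Moves, str], bool],
    is_over: Callable[[Moves], bool],
    max_plies: int = 400,  # hypothetical safety cap, not from the post
) -> Moves:
    """Alternate crowd (white) and GPT (black) moves until the referee ends the game."""
    history: Moves = []
    sides = (crowd_move, gpt_move)  # index 0 moves on even plies, so white starts
    while not is_over(history) and len(history) < max_plies:
        side = sides[len(history) % 2]
        move = side(list(history))  # each side receives a copy of the move list so far
        if not is_legal(history, move):
            # Upstream steps should only deliver legal moves; fail loudly if not.
            raise ValueError(f"illegal move {move!r} at ply {len(history) + 1}")
        history.append(move)
    return history


# Toy stand-ins: every move is legal and the "game" ends after four plies.
game = play_match(
    crowd_move=lambda h: "w",
    gpt_move=lambda h: "b",
    is_legal=lambda h, m: True,
    is_over=lambda h: len(h) >= 4,
)
print(game)  # ['w', 'b', 'w', 'b']
```

Keeping the engine behind `is_legal` and `is_over` makes it the single referee for both sides, which matches the post’s description of the engine validating every move.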

In total, 1094 Toloka tasks were started, and 45% of them expired after hitting the time limit. Of the 604 moves submitted, 73% were rejected as invalid. In the end, 160 Tolokers took part in the game, which produced a 37-move chess match.

Stats

  • Expired: 45% of 1094 started tasks

  • Rejected: 73% of 604 submitted

  • Accepted: 163 answers from 160 Tolokers

  • Price: $0.04 per move

  • Cost of the match: $9
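A few lines of arithmetic show that the rounded percentages agree with the raw counts. This sketch assumes that every task that did not expire produced a submission and that “accepted” means submitted minus rejected; both are readings of the list rather than facts the post states outright. It also assumes the $0.04 price was paid once per accepted answer. Under that assumption, payouts come to about $6.52, below the $9 total, so the match cost presumably includes items the post does not itemize.

```python
# Counts taken from the stats list above.
started, submitted, accepted = 1094, 604, 163
price_per_move = 0.04  # USD

expired = started - submitted  # assumes every non-expired task was submitted
rejected = submitted - accepted

expired_share = expired / started
rejected_share = rejected / submitted
payout = round(accepted * price_per_move, 2)  # assumes one payment per accepted answer

print(expired, f"{expired_share:.0%}")    # 490 45%
print(rejected, f"{rejected_share:.0%}")  # 441 73%
print(f"${payout:.2f}")                   # $6.52
```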

What about cheating and random answers?

In terms of quality control, we opted for a no-training, no-exam approach. Anyone could try playing, but they were only paid for valid moves that were verified by the chess engine. If they submitted an invalid move, they were banned from the game.

We wanted to include as many participants as possible, so we placed a 2-hour temporary ban on a player after each valid submission. This helped us avoid having a handful of skilled chess players dominate the decision-making process.

To ensure fairness and prevent cheating with external chess apps, we imposed a strict time limit for each answer. This gave players just enough time to come up with a move, but not enough time to consult an external source.
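The participation rules in this section map onto a small gatekeeper. This is a hypothetical sketch: the class name `PlayerGate`, its methods, and the use of timestamps in seconds are assumptions. Only the rules themselves (pay only for valid moves, ban a player after an invalid move, pause a player for 2 hours after a valid one) come from the post. The per-answer time limit is left to the task itself, since the post does not give its value.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

COOLDOWN_SECONDS = 2 * 60 * 60  # 2-hour pause after each valid submission (from the post)


@dataclass
class PlayerGate:
    """Hypothetical helper enforcing the participation rules described above."""

    banned: Set[str] = field(default_factory=set)
    cooldown_until: Dict[str, float] = field(default_factory=dict)

    def may_play(self, player_id: str, now: float) -> bool:
        """True if the player is neither banned nor inside a cooldown window."""
        if player_id in self.banned:
            return False
        return now >= self.cooldown_until.get(player_id, 0.0)

    def record(self, player_id: str, move_is_valid: bool, now: float) -> bool:
        """Record an engine-checked submission; return True if it should be paid."""
        if not move_is_valid:
            self.banned.add(player_id)  # invalid move: the player leaves the game
            return False
        self.cooldown_until[player_id] = now + COOLDOWN_SECONDS
        return True


gate = PlayerGate()
gate.record("p1", move_is_valid=True, now=0.0)
print(gate.may_play("p1", now=3600.0))  # False: still inside the 2-hour pause
print(gate.may_play("p1", now=7200.0))  # True: the pause is over
```

The cooldown is what spreads decisions across many people: even a strong player can contribute at most one move every two hours.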

We couldn’t do anything to keep our opponent from trying to cheat. But whenever the model made an illegal move, we asked it to fix it. If it couldn’t come up with a valid move, we offered a list of all the possible moves and asked the model to pick one.
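This fallback procedure can be written as a short retry loop. In this sketch, `ask` is a hypothetical function that sends a prompt to ChatGPT and returns its reply as a move string, the prompt wording is invented, and `fix_attempts` (how many times the model may correct itself before seeing the list) is an assumed parameter. The post only says the model was first asked to fix the move and then shown every legal move.

```python
from typing import Callable, List, Optional


def request_gpt_move(
    ask: Callable[[str], str],
    legal_moves: List[str],
    fix_attempts: int = 1,
) -> Optional[str]:
    """Get a legal move from the model, falling back to a list of legal moves."""
    legal = set(legal_moves)
    move = ask("Your move?").strip()
    for _ in range(fix_attempts):
        if move in legal:
            return move
        # Step 1: point out the illegal move and ask the model to correct it.
        move = ask(f"{move} is illegal in this position. Please fix it.").strip()
    if move in legal:
        return move
    # Step 2: offer every legal move and ask the model to pick one.
    move = ask("Pick one of these legal moves: " + ", ".join(legal_moves)).strip()
    return move if move in legal else None  # None: the post doesn't say what came next


# Scripted replies standing in for the model: illegal, still illegal, then legal.
replies = iter(["Ke9", "Qxh7", "e5"])
print(request_gpt_move(lambda prompt: next(replies), ["e5", "d5"]))  # e5
```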

The play by play

Tolokers played white, and GPT played black. Here are some of the highlights of the 37-move chess match that ensued.

  • Opening phase. Tolokers open with unconventional moves, advancing on the flanks of the board instead of fighting for the center as is typical.

  • Move 9. Tolokers have so many options that the decision takes a long time: about 60 submissions before they reach agreement.

  • Move 10. GPT shows its first sign of weakness with an irrational move, and Tolokers respond reasonably well, although they are still trailing.

  • Move 14. GPT makes an obvious mistake, leading to a heated exchange of pieces and a check. Tolokers manage to regain balance on the board.

  • Move 22. In a pivotal moment, GPT overlooks the impending capture of its rook, resulting in the loss of a critical piece.

  • Endgame. With remarkable precision and elegance, Tolokers promote a pawn to a queen, which leads to a series of checks and, eventually, checkmate.

What’s fascinating about the gameplay is that it feels like intentional teamwork, even though it was a collective effort without any communication between players whatsoever.

As expected, GPT 3.5 showed strength in the early stages of the game, but Tolokers persevered and ultimately emerged victorious.

What we’ve learned

We came away with a few thoughts on humans and AI that aren’t exactly new but are still intriguing.

Collective intelligence is powerful. This was a fun way to explore distributed decision-making with a large (and diverse) group of people. The right setup and quality control helped us harness the wisdom of the crowd for complex problem-solving.

Human creativity and adaptability are still relevant. The experiment highlighted the human ability to think creatively and adapt to dynamic situations. Tolokers dominated the endgame with strategic moves that reflected human intuition and the capacity to handle unexpected scenarios with finesse.

GPT 3.5 is not a chess engine. Large language models are not designed to play chess, and they’re not particularly good at it. If you’re serious about playing against AI, you’re better off trying a chess engine like Stockfish. That being said, your first match with a language model could be a memorable experience.

For better or worse, we didn’t prove anything about Tolokers’ mastery of chess. But we witnessed the beauty of solving complex challenges with small contributions from a diverse group of people — the heart of Toloka.


