Google's contractors working on the company's Gemini AI are comparing its answers with those of Anthropic's rival model, Claude, according to internal correspondence seen by TechCrunch. The practice has raised questions about whether Google obtained Anthropic's permission to use Claude's outputs in testing Gemini.
Evaluating AI models against competitors is common practice in the industry, but it is typically done by running models through standardized benchmarks rather than by manually scoring a rival's responses. In this case, however, contractors tasked with rating the accuracy of Gemini's outputs are comparing them directly with Claude's, scoring each response against criteria such as truthfulness and verbosity.
The contractors are given up to 30 minutes per prompt to determine whose answer is better, Gemini's or Claude's. The internal correspondence seen by TechCrunch reveals that the contractors recently began noticing references to Anthropic's Claude appearing in the internal Google platform they use to compare Gemini to other unnamed AI models. At least one of the outputs presented to Gemini contractors explicitly stated, "I am Claude, created by Anthropic."
The contractors have observed differences in the responses of the two AI models, with Claude's outputs appearing to emphasize safety more than Gemini's. In certain cases, Claude wouldn't respond to prompts that it considered unsafe, such as role-playing a different AI assistant. In another instance, Claude avoided answering a prompt, while Gemini's response was flagged as a "huge safety violation" for including "nudity and bondage."
Anthropic's commercial terms of service forbid customers from accessing Claude "to build a competing product or service" or "train competing AI models" without approval from Anthropic. Notably, Google is a major investor in Anthropic. When asked about obtaining permission to access Claude, a spokesperson for Google DeepMind, which runs Gemini, declined to comment.
Shira McNamara, the spokesperson, did acknowledge that DeepMind compares model outputs as part of its evaluation process, but emphasized that it doesn't train Gemini on Anthropic models. "Of course, in line with standard industry practice, in some cases we compare model outputs as part of our evaluation process," McNamara said. "However, any suggestion that we have used Anthropic models to train Gemini is inaccurate."
The arrangement raises broader questions about fair use and permission in AI development: as tech companies race to build better models, the lines between collaboration and competition are becoming increasingly blurred.
Last week, TechCrunch exclusively reported that Google contractors working on the company's AI products are now being made to rate Gemini's responses in areas outside of their expertise, sparking concerns that the model could produce inaccurate information on sensitive topics such as healthcare. The latest revelation adds another layer of complexity to the ongoing debate over how AI models should be developed and deployed.
As the AI landscape continues to evolve, it remains to be seen how companies like Google and Anthropic will navigate the tension between collaboration, competition, and responsible development. One thing is clear: the stakes are high, and questions of transparency and accountability will keep following the companies building the AI models that will shape our future.