Google offers its AI watermarking technology as a free open source toolkit

    Google also notes that these types of watermarks work best when there is a lot of “entropy” in the LLM distribution, meaning there are multiple valid candidates for each token (for example, “my favorite tropical fruit is [mango, lychee, papaya, durian]”). In situations where an LLM “almost always returns the exact same answer to a given prompt” – such as basic factual questions or models tuned to a lower “temperature” – the watermark is less effective.
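The entropy point above can be made concrete with a toy calculation. This is an illustrative sketch only (the distributions and function names are made up, not Google's): a flat distribution over several plausible next tokens leaves room to bias token choice toward a watermark, while a near-deterministic distribution does not.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in bits) of a next-token probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# High-entropy case: many valid completions, as in
# "my favorite tropical fruit is [mango, lychee, papaya, durian]"
fruity = [0.3, 0.25, 0.25, 0.2]

# Low-entropy case: a basic factual question with one dominant answer,
# as you would also see from a model tuned to a low temperature
factual = [0.97, 0.01, 0.01, 0.01]

print(shannon_entropy(fruity))   # ~1.99 bits: room to embed a watermark
print(shannon_entropy(factual))  # ~0.24 bits: little room to bias the choice
```

The less entropy, the fewer opportunities the watermarking step has to nudge token selection without changing what the model would have said anyway.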

    A diagram explaining how SynthID's text watermarks work.

    Credit: Google/Nature

    Google says SynthID builds on previous similar AI text watermarking tools by introducing a so-called Tournament sampling approach. During the token generation loop, this approach runs each potential candidate token through a multi-stage bracket-style tournament, with each round being “judged” by a different randomized watermark function. Only the ultimate winner of this process ends up in the final output.
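The bracket idea can be sketched in a few lines of Python. To be clear, this is a toy illustration of the mechanism as described above, not Google's actual SynthID implementation: the hash-based "judge" functions, the key names, and the round structure are all assumptions made for the example.

```python
import hashlib
import random

def judge(token, round_key, watermark_key):
    """A randomized watermark function: deterministically scores a token for a
    given tournament round, keyed by a secret watermark key (illustrative)."""
    digest = hashlib.sha256(f"{watermark_key}:{round_key}:{token}".encode()).digest()
    return digest[0]  # pseudo-random but reproducible score in [0, 255]

def tournament_sample(candidates, watermark_key, rounds=3, rng=random):
    """Run candidate tokens through a multi-round bracket; each round is
    'judged' by a differently-keyed watermark function, and only the
    ultimate winner is emitted into the output."""
    pool = list(candidates)
    for r in range(rounds):
        rng.shuffle(pool)
        winners = []
        for i in range(0, len(pool), 2):
            pair = pool[i:i + 2]  # a single leftover token advances unopposed
            winners.append(max(pair, key=lambda t: judge(t, r, watermark_key)))
        pool = winners
        if len(pool) == 1:
            break
    return pool[0]

token = tournament_sample(["mango", "lychee", "papaya", "durian"], "secret-key")
```

Because the judges are keyed functions rather than true randomness, a detector holding the same key can later check whether a text's tokens look like consistent tournament winners.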

    Can they tell it's Folgers?

    Changing the token selection process of an LLM with a randomized watermarking tool can obviously have a negative effect on the quality of the generated text. But in its paper, Google shows that SynthID can be “non-distortionary” at the level of individual tokens or short strings of text, depending on the specific settings used for the tournament algorithm. Other settings can increase the “distortion” introduced by the watermarking tool while also increasing the watermark's detectability, Google says.

    To test how potential watermark distortions could affect the perceived quality and usability of LLM output, Google ran “a random fraction” of Gemini queries through the SynthID system and compared them to non-watermarked counterparts. Out of a total of 20 million responses, users gave 0.1 percent more “thumbs up” ratings and 0.2 percent fewer “thumbs down” ratings to the watermarked responses, suggesting barely any human-perceivable difference across a large number of real LLM interactions.

    Google's research shows that SynthID is more reliable than other AI watermarking tools, but the success rate depends heavily on length and entropy.

    Credit: Google/Nature

    Google's testing also found that its SynthID detection algorithm could successfully detect AI-generated text significantly more often than previous watermarking schemes such as Gumbel sampling. But the extent of this improvement, and the overall rate at which SynthID can successfully detect AI-generated text, depends heavily on the length of the text in question and the temperature setting of the model being used. For example, SynthID was able to detect almost 100 percent of 400-token AI-generated text samples from Gemma 7B-1T at a temperature of 1.0, compared to about 40 percent for 100-token samples from the same model at a temperature of 0.5.
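The detection side of such a scheme can also be sketched. Again, this is a hypothetical illustration built on the toy keyed "judge" function, not SynthID's actual detector or scoring statistic: the idea is simply that text whose tokens repeatedly won keyed tournaments should score higher than ordinary text, and that more tokens give the detector more evidence, which is consistent with detection working better on longer samples.

```python
import hashlib

def judge(token, round_key, watermark_key):
    """Same keyed scoring function assumed at generation time (illustrative)."""
    digest = hashlib.sha256(f"{watermark_key}:{round_key}:{token}".encode()).digest()
    return digest[0] / 255.0  # score normalized to [0, 1]

def detection_score(tokens, watermark_key, rounds=3):
    """Mean judge score over all tokens and rounds. Watermarked text, whose
    tokens were selected for winning these judgments, is biased high."""
    scores = [judge(t, r, watermark_key) for t in tokens for r in range(rounds)]
    return sum(scores) / len(scores)

def is_watermarked(tokens, watermark_key, threshold=0.55):
    """Flag text whose average score exceeds a threshold; unwatermarked text
    should hover around 0.5. The threshold value here is an assumption."""
    return detection_score(tokens, watermark_key) > threshold
```

With only a handful of tokens, the average score is noisy and the two cases overlap; as the sample grows, the averages concentrate, which mirrors why detection rates in Google's results climb with text length.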