
Some of the top 100,000 websites collect everything you type before you hit submit

    When you sign up for a newsletter, make a hotel reservation, or check out online, you probably take it for granted that if you mistype your email three times or change your mind and X out of the page, it won’t matter. Nothing actually happens until you hit the submit button, right? Well, maybe not. As with so many assumptions about the internet, this isn’t always the case, according to new research: A surprising number of websites collect some or all of your data as you type it into a digital form.

    Researchers from KU Leuven, Radboud University, and the University of Lausanne crawled and analyzed the top 100,000 websites, looking at scenarios in which a user visits from the European Union versus from the United States. They found that 1,844 websites collected an EU user’s email address without their consent, and a whopping 2,950 collected a US user’s email address in some form. Many of the sites seemingly don’t intend to collect the data, but they incorporate third-party marketing and analytics services that cause the behavior.

    After crawling sites specifically for password leaks in May 2021, the researchers also found 52 websites where third parties, including the Russian tech giant Yandex, were incidentally collecting password data before the form was submitted. The group disclosed its findings to these sites, and all 52 cases have since been resolved.

    “If there is a Submit button on a form, the reasonable expectation is that it does something: that it sends your data when you click on it,” says Güneş Acar, a professor and researcher in the digital security group at Radboud University and one of the leaders of the study. “We were super surprised by these results. We thought we might find a few hundred websites where your email is collected before you submit it, but this far exceeded our expectations.”

    The researchers, who will present their findings at the Usenix Security conference in August, say they were inspired to investigate what they call “leaky forms” by media reports, most notably from Gizmodo, about third parties that collect form data regardless of submission status. They point out that the behavior is essentially similar to so-called keyloggers, usually malicious programs that log everything a target types. But on an ordinary site ranked in the top 1,000, users probably won’t expect their information to be keylogged. In practice, the researchers saw some variations on the behavior: some sites logged input keystroke by keystroke, but many captured the entire contents of a field when the user clicked on the next one.

    “In some cases, when you click on the next field, they collect the previous one, like you click on the password field and they collect the email, or you just click anywhere and they immediately collect all the information,” says Asuman Senol, a privacy and identity researcher at KU Leuven and one of the co-authors of the study. “We didn’t expect to find thousands of websites, and in the US the numbers are very high, which is interesting.”
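    The difference between these behaviors comes down to which browser event a script listens for. The toy Python simulation below is a hypothetical sketch, not code from the study or from any real tracker: a made-up `HypotheticalTracker` replays a user who types an email address, clicks into the password field, types a password, and then abandons the form. Depending on its assumed mode, it sends data on every keystroke, when focus leaves a field (a “blur”), or only when the form is submitted.

```python
from dataclasses import dataclass, field


@dataclass
class FormEvent:
    kind: str        # "input", "blur", or "submit"
    name: str = ""   # form field the event applies to
    value: str = ""  # field contents after this keystroke ("input" events only)


@dataclass
class HypotheticalTracker:
    """Toy third-party script (hypothetical). 'mode' decides when it sends data:
    'keystroke' sends on every input event, 'blur' sends a field's value when
    focus leaves it, and 'submit' waits for the submit button."""
    mode: str
    sent: list = field(default_factory=list)

    def handle(self, event: FormEvent, form_state: dict) -> None:
        if self.mode == "keystroke" and event.kind == "input":
            self.sent.append((event.name, event.value))
        elif self.mode == "blur" and event.kind == "blur":
            self.sent.append((event.name, form_state.get(event.name, "")))
        elif self.mode == "submit" and event.kind == "submit":
            self.sent.extend(form_state.items())


def type_into(name: str, text: str) -> list:
    """One 'input' event per keystroke, each carrying the field's value so far."""
    return [FormEvent("input", name, text[:i]) for i in range(1, len(text) + 1)]


def run(events: list, tracker: HypotheticalTracker) -> list:
    """Replay the events against a fresh form and return what the tracker sent."""
    form_state = {}
    for event in events:
        if event.kind == "input":
            form_state[event.name] = event.value
        tracker.handle(event, form_state)
    return tracker.sent


# The user types an email, clicks into the password field, types, then leaves
# the page without ever pressing submit.
abandoned = (
    type_into("email", "alice@example.com")
    + [FormEvent("blur", "email")]
    + type_into("password", "hunter2")
)

for mode in ("keystroke", "blur", "submit"):
    print(mode, run(abandoned, HypotheticalTracker(mode)))
```

    Only the “submit” mode matches what users expect. The other two send data even though the form is never submitted, which is the pattern Senol describes.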

    The researchers say the regional differences may be related to companies being more careful about user tracking, and possibly even integrating with fewer third parties, due to the EU’s General Data Protection Regulation. But they emphasize that this is just one possibility, and the study did not examine explanations for the disparity.

    Through a substantial effort to notify websites and third parties that collect data in this way, the researchers learned that some of the unexpected data collection may stem from the difficulty of distinguishing a “submit” action from other user actions on certain web pages. But the researchers emphasize that this is not a sufficient justification from a privacy perspective.

    Since completing the paper, the group has also made a discovery about Meta Pixel and TikTok Pixel, invisible marketing trackers that companies embed on their websites to track users across the web and show them ads. Both said in their documentation that customers could enable “automatic advanced matching,” which would collect data when a user submitted a form. In practice, however, the researchers found that these tracking pixels grabbed hashed email addresses, an obfuscated version of email addresses used to identify internet users across platforms, before the form was submitted. For users in the US, 8,438 sites may have leaked data to Meta, Facebook’s parent company, via its pixel, and 7,379 sites may have done so for users in the EU. For TikTok Pixel, the group found 154 sites for US users and 147 for EU users.
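    Hashing hides the raw address, but it does not stop the address from working as an identifier, because the same input always produces the same digest. The minimal sketch below assumes SHA-256 over a trimmed, lowercased address, a scheme commonly described for this kind of matching; the helper names are hypothetical, and this is an illustration rather than Meta’s or TikTok’s code.

```python
import hashlib


def normalize_email(email: str) -> str:
    # Assumed normalization: strip surrounding whitespace and lowercase.
    return email.strip().lower()


def hashed_email(email: str) -> str:
    # Assumed hash: SHA-256 hex digest of the normalized address.
    return hashlib.sha256(normalize_email(email).encode("utf-8")).hexdigest()


# The same person typing their address on two unrelated sites, with
# different capitalization and a stray trailing space.
on_site_a = hashed_email("Alice@Example.com ")
on_site_b = hashed_email("alice@example.com")

print(on_site_a == on_site_b)                   # True
print(on_site_a == hashed_email("bob@example.com"))  # False
```

    Any party that receives the digest from two different sites can link the visits, and anyone holding a list of known addresses can hash them and look for matches.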

    The researchers filed a bug report with Meta on March 25, and the company quickly assigned an engineer to the case, but the group has not received an update since. The researchers notified TikTok on April 21 (they discovered the TikTok behavior more recently) and haven’t heard back. Meta and TikTok did not immediately return WIRED’s request for comment on the findings.

    “The privacy risks for users are that they are tracked even more efficiently; they can be tracked across different websites, in different sessions, on mobile and desktop,” says Acar. “An email address is such a handy identifier for tracking because it’s global, unique, and constant. You can’t delete it like you delete your cookies. It’s a very powerful identifier.”

    Acar also points out that as tech companies look to phase out cookie-based tracking in a nod to privacy concerns, marketers and other analysts will increasingly rely on static identifiers such as phone numbers and email addresses.

    Because the findings indicate that deleting data from a form before submitting it may not be enough to protect yourself from all collection, the researchers created a Firefox extension called LeakInspector to detect leaky form data collection. They say they hope their findings will raise awareness of the problem not only among regular internet users, but also among website developers and administrators, who can proactively monitor whether their own systems or any of the third parties they use are collecting form data without permission.
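    The article doesn’t describe how LeakInspector works internally. As a rough idea of what such monitoring can look like, the hypothetical check below takes a known test value typed into a form, plus the outgoing requests captured while typing, and searches those requests for the value in plaintext or in a few common encodings and hashes. The function names, the placeholder tracker URLs, and the list of encodings are all assumptions; a real tool would need to cover many more transformations.

```python
import base64
import hashlib
from urllib.parse import quote


def encodings_of(value: str) -> dict:
    """Hypothetical, partial list of forms a leaked value might take in a request."""
    raw = value.encode("utf-8")
    return {
        "plaintext": value,
        "url-encoded": quote(value, safe=""),
        "base64": base64.b64encode(raw).decode("ascii"),
        "md5": hashlib.md5(raw).hexdigest(),
        "sha1": hashlib.sha1(raw).hexdigest(),
        "sha256": hashlib.sha256(raw).hexdigest(),
    }


def find_leaks(value: str, outgoing_requests: list[str]) -> list[tuple[str, str]]:
    """Return (encoding, request) pairs where some encoding of value appears."""
    candidates = encodings_of(value)
    hits = []
    for request in outgoing_requests:
        for name, encoded in candidates.items():
            if encoded in request:
                hits.append((name, request))
    return hits


# Test value typed into the form (never submitted) and requests captured
# meanwhile. The hostnames are placeholders, not real trackers.
email = "alice@example.com"
captured = [
    "https://tracker.example/collect?e=" + hashlib.sha256(email.encode()).hexdigest(),
    "https://cdn.example/logo.png",
    "https://analytics.example/t?field=alice%40example.com",
]

for encoding, request in find_leaks(email, captured):
    print(f"{encoding}: {request}")
```

    Matching on a known test value keeps the check simple: if any captured request contains that value, or a transformation of it, before the submit button is pressed, the page is leaking form data.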

    Leaky forms are just one more type of data collection to watch out for in an already crowded field of online tracking.

    This story originally appeared on wired.com.