Researchers at Princeton's Center for Information Technology Policy (CITP) claim that over 400 of the world's top 50,000 websites use 'session replay scripts' to track user behaviour. While this in itself may not be that disconcerting, the researchers add that these sites often fail to strip personally identifiable information from the behaviour data they collect. Should that data be exposed, hackers could gain access to a trove of personal details, sometimes even including passwords.
Detailing their findings last week in the first of several posts about online privacy, CITP researchers Steven Englehardt, Gunes Acar, and Arvind Narayanan said they looked at seven of the top session replay companies, which provide session replay scripts and frameworks to websites: Clicktale, FullStory, Hotjar, SessionCam, Smartlook, UserReplay, and Yandex. To scrutinise what data was collected and how the collection took place, the researchers set up test pages with session replay scripts from six of these companies. They were also able to estimate the number of popular sites that use such scripts.
The researchers claim that at least 482 of the world's top 50,000 websites use session replay scripts, and that the true number may be higher: the scripts don't record every visitor, so some sites that use them would have gone undetected. The researchers have compiled a full list of the script-using websites they found. As for why this business practice can backfire on users, the researchers say a host of information usually ends up being collected during each session, some of which can be linked to personally identifiable data.
"Collection of page content by third-party replay scripts may cause sensitive information such as medical conditions, credit card details, and other personal information displayed on a page to leak to the third-party as part of the recording. This may expose users to identity theft, online scams, and other unwanted behavior. The same is true for the collection of user inputs during checkout and registration processes," the CITP researchers explain.
Some session replay script providers, such as SessionCam and UserReplay, don't collect user data at all and instead track clicks, and almost all provide a dashboard with automatic and manual redaction tools to remove user data. However, a few problems remain with this approach. Some user data usually still ends up being collected, because the sheer volume makes manual scrubbing infeasible, and content displayed on screen is always collected. The latter is especially worrying: even sites with other user data redaction methods in place often end up collecting all displayed content, which in the case of Walgreens included user names, medical conditions, and prescriptions.
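To illustrate why rule-based redaction tends to leave gaps, here is a minimal Python sketch of the general idea. It is not any provider's actual implementation: the function name, the field-name hints, and the card-number pattern are all assumptions made for the example.

```python
import re

# Hypothetical hints for spotting sensitive fields by name; real providers
# apply their own (and differing) rules.
SENSITIVE_HINTS = ("password", "card", "cvv", "ssn")

# Rough pattern for 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def redact_value(field_name: str, field_type: str, value: str) -> str:
    """Return a copy of `value` that is safer to include in a recording."""
    name = field_name.lower()
    # Mask the whole value when the field type or name looks sensitive.
    if field_type == "password" or any(hint in name for hint in SENSITIVE_HINTS):
        return "*" * len(value)
    # Otherwise only scrub substrings that look like card numbers.
    return CARD_PATTERN.sub("[REDACTED]", value)


print(redact_value("pwd", "password", "hunter2"))
print(redact_value("comment", "text", "my card is 4111 1111 1111 1111 ok"))
print(redact_value("condition", "text", "asthma"))
```

In this sketch the third call passes a medical condition through untouched, because nothing in the field's name or type marks it as sensitive. That is the kind of gap the researchers describe.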
Finally, while websites hosting session replay scripts may themselves be protected by the encrypted HTTPS protocol, the session replay dashboards, like those provided by Hotjar, Smartlook, and Yandex, may use unencrypted HTTP, the CITP researchers noted. HTTP would allow attackers to use man-in-the-middle attacks to get access to the user data as it is transmitted to third-party servers. Responding to the claims in a statement to Motherboard, Yandex said, "HTTP is used intentionally, as session recordings load websites using iframe. Unfortunately, loading HTTP content from HTTPS websites is prohibited on the browser level so HTTP player is required to support HTTP websites for this feature."
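The browser rule Yandex refers to is commonly called mixed-content blocking. The Python sketch below is a deliberately simplified model of it, assuming only the URL scheme matters; real browsers apply more detailed rules, and the function name and example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def is_mixed_content(page_url: str, frame_url: str) -> bool:
    """True when an HTTPS page tries to embed a frame served over plain HTTP.

    Simplified assumption: only the scheme is compared. Browsers block this
    kind of embedded frame.
    """
    page_scheme = urlsplit(page_url).scheme.lower()
    frame_scheme = urlsplit(frame_url).scheme.lower()
    return page_scheme == "https" and frame_scheme == "http"


# An HTTPS replay player cannot frame an HTTP site...
print(is_mixed_content("https://player.example/replay", "http://shop.example/"))
# ...whereas an HTTP player can, at the cost of an unencrypted dashboard.
print(is_mixed_content("http://player.example/replay", "http://shop.example/"))
```

This is the trade-off in Yandex's statement: serving the player over HTTP lets it replay HTTP websites, but the recordings then travel unencrypted.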
Among the sites that use session replay scripts, major names include Bonobos and Fidelity, in addition to the already named Walgreens. After the publication of the CITP study last week, Bonobos told Wired it had ended data sharing with FullStory and was reviewing its protocols to better protect user data. A Fidelity spokesperson told Motherboard that the protection of customer data was its highest priority, but didn't clarify whether it would stop using such scripts. Walgreens took the same tack as Bonobos, and said it had in an "abundance of caution" stopped sharing data with FullStory while it investigated the claims.
The study notes that ad-blocking and tracking-protection lists like EasyList and EasyPrivacy do provide some measure of safety, but do not block everything. Motherboard reports that Adblock Plus has been updated since the publication of the CITP study to block all named scripts.
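As a rough illustration of why list-based blocking is only partial, the Python sketch below matches script URLs against a set of blocked domains. It is far simpler than the filter syntax used by EasyList or Adblock Plus, and the domains and function name are placeholders invented for the example.

```python
from urllib.parse import urlsplit

# Placeholder entries, not taken from any real block list.
BLOCKED_DOMAINS = {"replay-vendor.example", "tracker.example"}


def is_blocked(script_url: str, blocked=BLOCKED_DOMAINS) -> bool:
    """True if the script's host is a blocked domain or one of its subdomains."""
    host = (urlsplit(script_url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in blocked)


print(is_blocked("https://cdn.replay-vendor.example/rec.js"))   # listed vendor
print(is_blocked("https://unlisted-vendor.example/rec.js"))     # not on the list
```

A script served from a host that is not on the list passes straight through in this sketch, so a list can only protect against the providers it already names.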