Tumblr, WordPress Plan to Sell User Data to OpenAI and Midjourney to Train AI Models: Report

Tumblr and WordPress data compiled for sharing reportedly included private data in addition to public content.

Written by Akash Dutta, Edited by David Delima | Updated: 28 February 2024 18:49 IST

Photo Credit: Pexels/Tracy Le Blanc

Tumblr and WordPress let users post public and private content

Highlights

Tumblr and WordPress will offer options to opt out of sharing data
Automattic said only public posts will be shared with partners
Google has also struck a similar deal with Reddit

Tumblr and WordPress users might soon find that their data is being used to train artificial intelligence (AI) models, according to a report. Automattic, the parent company of the two blogging platforms, has allegedly struck deals with OpenAI and Midjourney to sell user-generated content that will reportedly be used to help train AI models. While the details of the deals and the data-sharing practices remain unclear at the moment, the report has raised questions about data privacy and the ethics of companies sharing their users' data with third parties.

Internal communications by Automattic employees, viewed by 404 Media, both confirmed the deals with the AI companies and revealed details of these practices. In its report, the publication said that Automattic's deals with OpenAI and Midjourney could be announced soon. Further, it appears that data compilation for the AI firms has already begun. An internal post by product manager Cyle Gage suggested that all of Tumblr's public post content between 2014 and 2023 had been compiled.

The report also highlights a specific message suggesting that private and deleted user content was automatically compiled alongside public data. It is not clear whether that data has already been shared with the AI firms. Because including such content would put users' private information at risk, the report also raises questions about the company's ethics policies and data safety infrastructure.

Automattic said in a statement on Tuesday, “AI is rapidly transforming nearly every aspect of our world, including the way we create and consume content. At Automattic, we've always believed in a free and open web and individual choice. Like other tech companies, we're closely following these advancements, including how to work with AI companies in a way that respects our users' preferences.”

The post detailed several measures the company is taking for its users, including blocking AI platform crawlers, offering a setting that discourages search engines from indexing a site on WordPress and Tumblr, and promising an opt-out setting for users who do not wish to share their data with third parties. “Currently, no law exists that requires crawlers to follow these preferences,” the post stated.

The mechanism to opt out of data sharing is also somewhat unclear. While the company stated in the post that the AI firms will respect the opt-out settings and even remove past content from users who newly opt out, the report claims the reality is more complicated.

The report found an internal document from February 23 where an employee asked whether the company had any assurance that the data partner would respect the opt-out decision made by users. Andrew Spittle, Automattic's Head of AI, reportedly replied, “We will ask that content be deleted and removed from any future training runs. I believe partners will honor this based on our conversations with them to this point. I don't think they gain much overall by retaining it.”

According to the report, the response was vague and did not confirm whether Automattic had an agreement to that effect. Further, the entire line of reasoning appears to rest on the assumption that AI firms will not gain much by retaining the user data. It should be noted that the practice of third-party data sharing is not new, and most social media platforms hold the rights to user-generated public content on their platforms. However, making such deals without disclosing them to users could potentially expose private information to companies that are using the same data to train AI systems.
