The Flawed Paradigm of Opting Out
There’s an ongoing discourse (that’s the polite description for “lots of noise”) on various platforms about what generative AI means for your life, job, family, the economy, and the world at large. In the search industry, we’re in the middle of a heated debate about AI crawlers and data scraping for training machine learning models. Most discussions focus on whether users should have the option to opt out of data collection, and on the impact on their visibility and revenue. Right now, content creators, the people bringing something of value to the world, have to jump through hoops to protect their intellectual property by opting out. But why should they bear that burden? Shouldn’t platforms and companies need our explicit permission before using our data for AI training? Wasn’t that type of thing settled already? Copyright as a legal concept has evolved over several centuries.
The debate about whether to allow LLMs to gobble up our valuable creative output fundamentally misaligns with principles of user control and privacy. Instead of arguing over whether users should opt out, we should be looking at the forms of protection we could be implementing. I believe we are turning more and more towards “walled gardens”: closed ecosystems in which the provider controls the content, applications, and media available to users within its network. Doesn’t that sound like many successful newsletters or Patreon accounts? Of course it does.
To be clear, I think companies should have offered the ability to opt out from the start. I also believe that all content creators should be voluntarily opting in rather than being opted in by default. The notion of consent underlies this perspective.
The Vanishing Logout Button, a Great Parallel
This isn’t the first time big tech companies have gotten away with outrageous behavior. Being opted in by default was the norm for many people online in the 00s and 2010s; many loved the Facebook sign-in option instead of having to remember passwords for everything. To understand the mess we’re in, let’s look at the concept of logging off in today’s digital landscape. Terry Nguyen nailed it when they said the logout button is becoming irrelevant. We’re always logged in, and the design of websites and apps reflects this trend: logout buttons are hidden in settings menus or gone entirely. Despite the cultural emphasis on urging people to “log off” for mental well-being, the technical reality is that we stay logged in. Logging off is increasingly impractical.
We all leave browser tabs and apps open. Everything keeps operating in the background, right along with data collection. According to UI designer Jesse Showalter, the logout button has lost significance for both companies and consumers.
Companies might have an incentive to keep users logged in, as reflected in interface designs that either bury the logout button in settings menus or remove it entirely. But before we take a detour into conspiracy theories, let's remember that user behavior influences design decisions. The contemporary expectation of seamless data transfer across devices contributes to the need for users to stay constantly logged in.
Until users feel the urge to demand more control over their data, we’ll be stuck with the binary of logged in and logged out, no nuance for the in-between. And I fear the same binary is settling in for our content: scraped by default, or invisible.
Read more here: The logout button has become defunct.
Developers may prioritize other features over the logout button unless they’re dealing with platforms that handle sensitive user information. This shift in design priorities reflects a broader trend towards seamless connectivity at the expense of user control. So is our content considered sensitive information? It could absolutely have an impact on our wellbeing, financial stability, or overall future if it were used in a way that is not the explicit reason we published it.
Should ChatGPT, Gemini and others be able to train on your content or use it by default?
So how should we treat AI crawlers and scrapers used for LLM training data? In the SEO community, there is a lot of noise about the ability to "opt out" of LLM training and its impact on your visibility in search engine results pages. Regulation is slow, but it's not like these companies respect the current legislation in place when it comes to intellectual property. What I worry about is that opting out is very much a baseline that will not be respected.
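And that baseline is thin to begin with. The current “opt out” mechanism mostly amounts to robots.txt directives naming each company’s crawler, and compliance is entirely voluntary. A minimal sketch, using the publicly documented user-agent tokens for a few well-known AI crawlers:

```text
# robots.txt — a voluntary request, not an enforcement mechanism

# OpenAI's training crawler
User-agent: GPTBot
Disallow: /

# Google's token for controlling AI training use (separate from Googlebot)
User-agent: Google-Extended
Disallow: /

# Common Crawl, a frequent source of training corpora
User-agent: CCBot
Disallow: /
```

Which is exactly the problem: this is a request, not a lock. A crawler that ignores it faces no immediate consequence, and new crawlers launch under new names faster than anyone can keep a blocklist current.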
Opt out like Scarlett Johansson did? That went well. As we grapple with the complexities, it's essential to consider the implications for human agency and privacy. Scarlett Johansson's recent experience with OpenAI highlights the risks of companies disregarding our long established legal rights. Johansson is understandably angry that OpenAI launched a chatbot with a voice eerily similar to hers, despite her previous refusal to participate. She revealed that OpenAI had approached her to voice the chatbot, which she declined twice. When the new voice, named Sky, debuted, it drew immediate comparisons to Johansson’s role in the 2013 film "Her." The actress had to hire legal counsel and send multiple legal notices before OpenAI agreed to take down the voice.
Being opted in by default assumes content creators must be proactive about protecting their intellectual property. Scarlett Johansson was proactive. She also has the means to be reactive. Most of us do not. I would go on a tirade about how some folks simply do not understand consent, but that’s not the case here. It’s a blatant disregard of consent. Do I trust these companies to respect copyright and requests not to scrape content? Sure. Maybe after eating up everything that is available, dragging court cases out for years, and finally settling for a “sorry, our bad” once they’ve made billions and no longer rely on those practices.
I logged into ChatGPT today and saw they offer a “connect your apps” option in ChatGPT premium. Cool, but now I have to check whether they are going to just go wild and eat up all of my content, which would be an issue because I specifically pay for Google Workspace in order to have a modicum of “privacy” regarding my own content there. See what I did there? That sneaky “logged in at all times” practice is along for the ride. OpenAI is following a playbook that paid off for many big tech companies before them.
I didn’t get that far. I didn’t connect the two, because it’s frankly exhausting to go down rabbit holes in terms and conditions. And we know that one wrong click means consequences for the client data entrusted to me and kept safe in my Google Workspace account, a service I pay for precisely so it stays that way. I won’t claim to fully understand the deal the company is offering me if I connect both, but I will say that I am very leery of it based on their practices.
So let’s go further: it’s like we’re supposed to keep feeding a beast that doesn’t even want to give us crumbs for our labor. At least Google’s ecosystem offered something in exchange.
Empowering Content Creators
The best way forward, in my opinion, is to give content creators the ability to monetize a walled garden in flexible ways that adapt to whatever models they develop. That’s something creators have already built up with companies like Patreon, Substack, and OnlyFans.
“We do it with advertisers and ads.txt, so why not with AI crawlers? It’s about respecting creators and their content by asking for permission first, rather than assuming.” This is not the same thing. With advertising, creators entered an agreement where advertisers provided some modicum of compensation for benefiting from the value creators generate; with AI crawlers, no such agreement is on the table. It is very hard to ask for permission first when you know your business model depends on gobbling up as much quality human content as possible before you are legally required to stop. Upton Sinclair, the American novelist and social reformer, said it best: “It is difficult to get a man to understand something when his salary depends on his not understanding it.”
YouTube followed the same arc: early on, there were no mechanisms in place to help content creators on the platform monetize their work. Empowering content creators to monetize their work within controlled environments is a viable option. Platforms like Patreon and Substack have provided content creators with a path forward. I trust CMS companies, social media companies, and search engines can do so as well. Governments tend to align with that type of vision, too:
The EU addressed this through the Copyright Directive, particularly Article 15 (formerly Article 11), also known as the "link tax." This directive, adopted in 2019, requires tech giants like Google to pay publishers for using snippets of their articles in services like Google News.
A similar scenario unfolded in Canada with Meta (formerly Facebook) in relation to the Online News Act, also known as Bill C-18. This law mandates that tech platforms like Meta and Google negotiate compensation deals with news publishers when their content is shared on these platforms.
Ideally, data poisoning should also be something content creators could opt into. It’s not yet a practical approach (ahem, Nightshade), but it should be improved and considered as a means of making sure the companies that do not respect the “opt out” do not benefit from theft. Think of all the videos on YouTube of dogs trying to grab hedgehogs and learning to never do that again. That is the type of deterrent content creators need.
The public square is not a shared chat
In "No Logo", Naomi Klein discusses the replacement of public spaces by malls in the eighties and nineties. This hegemony of the mall continued for two more decades. Klein argues that malls have increasingly taken over the role traditionally played by public squares and parks. This shift had several implications:
Malls represent a controlled, privatized, commercialized environment focused on consumption. This sounds eerily like the situation we are discussing now.
Public spaces are typically accessible to everyone and serve as venues for free speech, community gatherings, and public demonstrations. In contrast, malls are private properties with strict rules governing behavior, limiting the types of activities that can take place. This sounds eerily like the situation we are discussing now regarding quite a few platforms. You rent the space for your DMs, group chats, water cooler banter on Slack - you are existing on someone else's digital land.
Impact on social interaction: malls changed the nature of social interactions. Interactions in malls are mediated by commercial interests, often reducing social exchanges to consumer transactions. Doesn't that sound a bit like social media? I have a short post about this very problem.
Implications for democracy: public spaces traditionally serve as venues for political activism and civic engagement. As these spaces disappear, opportunities for public discourse and collective action diminish, potentially weakening democratic participation. Bots are the new threat on this front: thanks to LLMs, the bullshit published online can now be scaled to unprecedented magnitude.
Homogenization: malls contributed to the homogenization of public life by promoting the same brands and experiences across different locations. This diminishes the unique cultural and social character of local communities, replacing it with a standardized, corporate-driven environment. Today, the pervasive trend of homogeneity across various fields is mind-numbing: from interior design to architecture, travel, fashion, and even personal appearance. We are converging into a big pile of bland aesthetics and clichés.
Microblogging
Enter the microblogging option offered by ChatGPT. I ponder the “microblogging into the ether” vibes the “Make this chat discoverable” option brings. It’s giving web 1.0, but on a platform that eats you. I see the parallel with malls. On Reddit, we see content being copy-pasted without regard for paywalls or copyright all the time. That’s the equivalent of bringing outside food into a restaurant and having the restaurant steal the recipe along with the food. This is why I am comfortable saying we need a deterrent, something that brings real-time or short-term consequences rather than legal ones. I have wished for years that governments would handle this, but it’s been a giant letdown. These companies can afford a war of attrition.
I am curious to see what comes next in terms of user adoption. Howard Rheingold’s work provides a foundational understanding of the internet's potential to empower individuals and foster community. Rheingold’s concept of virtual communities underscores the power of collective action and peer-to-peer support in digital environments. The success of online platforms, whether it be a site, microblogging platform (Twitter or Tumblr being examples), a channel or anything else has always been humans engaging and interacting with the content. I feel like this feature is a literal “eat the entire internet while we are at it” move instead of an attempt to foster peer-to-peer engagement.
Fin
In conclusion, being opted out should be the default state, placing the onus on platforms to seek explicit consent. This approach not only upholds principles of user control and privacy but also fosters a more transparent and equitable digital ecosystem. As we continue to navigate the complexities of the digital age, let’s prioritize policies and practices that empower users and safeguard their rights in an increasingly interconnected world. If you have any examples of online communities, platforms, tools, or companies that get this right, please let me know.