
Now that generative artificial intelligence (AI) tools that produce data and information are becoming commonplace, studies are showing that humans have an insatiable appetite for data. This appetite can result in unhealthy data-snacking behaviors and addictions, a practice some refer to as “going down rabbit holes” or having a “data binge.”
For practical purposes, we refer to this technology as “chatbot media (CM).” In this article, CM is defined as text-based generative AI products and services built on large language models (LLMs) and machine learning (ML), provided to the public as assistive writing tools and information sources. Although these tools are still in the experimental stage, they are already being deployed and enjoyed by many people for practical and entertainment purposes. Because we are talking about data as a form of text-generated information, the terms “data” and “information” are used interchangeably in context. It should also be noted that not all chatbots are presently powered by AI whose corpus is modeled on LLMs and ML. An example of a chatbot not powered by AI is a rule-based, scripted chatbot, designed to follow preset rules when responding to user inputs and most often found in online customer service.
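To illustrate the distinction, the following is a minimal sketch of such a rule-based, scripted chatbot, written in Python with hypothetical rules and replies. The point is that it matches keywords against preset scripts and learns nothing from its users, in contrast to LLM-based CM.

```python
# A minimal sketch of a rule-based, scripted chatbot of the kind described
# above: it follows preset rules rather than a learned language model.
# The rules and replies here are hypothetical examples.

RULES = {
    "hours": "We are open 9 a.m. to 5 p.m., Monday through Friday.",
    "refund": "Refunds are processed within 5-7 business days.",
    "agent": "Transferring you to a human representative now.",
}

def scripted_reply(user_message: str) -> str:
    """Match the user's message against preset keywords; no learning occurs."""
    text = user_message.lower()
    for keyword, reply in RULES.items():
        if keyword in text:
            return reply
    return "Sorry, I did not understand. Try asking about hours, refunds, or an agent."

print(scripted_reply("What are your hours?"))
```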
Defining “data hunger”
In simplest terms, “data,” in the context of the pursuit of information, are defined as: “…a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted” [1].
The hunger for data is part of the human essence and DNA. It is an integral part of the basic existential quest for wisdom. Among data and information science canons, data have sometimes been viewed as the foundation, or first stage, of the hierarchical path to wisdom, as depicted in the widely referenced Data, Information, Knowledge, Wisdom (DIKW) pyramid, also known as the “Wisdom Hierarchy” or “Knowledge Hierarchy,” whose origins are uncertain but are often traced to T. S. Eliot’s poem “Choruses,” which appeared in the 1934 pageant play “The Rock” [2].
The DIKW hierarchy attracted attention in 1989 when American organizational theorist Ackoff [3] published the seminal paper, “From Data to Wisdom,” in the Journal of Applied Systems Analysis, following his 1988 Presidential Address to the International Society for General Systems Research (ISGSR). The hierarchy has, however, been criticized by some data and computer scientists, including Frické [4], who posits in “The knowledge pyramid: a critique of the DIKW hierarchy” that “the hierarchy is unsound and methodologically undesirable.”
Although Ackoff did not present the Wisdom Hierarchy graphically in that paper, he has been credited with its graphic representation as a pyramid (Figure 1).
Figure 1. DIKW pyramid (also known as the Wisdom Hierarchy).
In this analysis, the data and information that users acquire from CM might not actually bring them any closer to answers or wisdom, because the data and information could be inaccurate or contain mis-data or dis-data, more commonly known as misinformation or disinformation. Equally concerning is the supply of and demand for data. In our consumerist society, if we liken data to a product, the more products there are to consume, the more opportunities there are for consumption and the greater the potential for overconsumption. The greater the overconsumption, the greater the production, whether the product is food, fashion, or data. Consequently, overconsumption of products can have severe impacts on natural resources [5].
This article is concerned with protecting the natural resource of human intelligence (HI).
This article offers a perspective that frames the sociotechnical challenge of data-generating AI as one of chronic societal overconsumption, which has led to a voracious appetite and hunger for data, a demand that has been accelerated by the most innovative data providers.
Circularity between AI and human data mining
In this article, “data hunger” in the context of CM has three meanings within the circularity of data sharing:
- human-produced data inputted into AI by humans;
- AI-generated data outputted to humans;
- human-produced data, generated by AI, outputted into the world [6].
We can look at this circularity using a data recycling paradigm in a larger interrelated system, where data are inputted, processed, and then outputted, and the output of one process becomes the input of another process, and so on (Figure 2); a minimal code sketch of this loop follows the figure.

Figure 2. Circularity of data sharing.
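The following is a minimal, illustrative sketch of that recycling loop, written in Python with hypothetical names (generate, corpus); it is not the architecture of any real system, but it shows how the output of one agent becomes the input of the other, and how anything in the corpus, accurate or not, is carried into the next cycle.

```python
# A minimal sketch of the circularity of data sharing. All names are
# hypothetical stand-ins, not the API of any real chatbot or LLM.

corpus = ["human-produced text"]  # data inputted into the AI by humans

def generate(corpus):
    """Stand-in for an LLM: its output reflects whatever the corpus
    contains, including any misinformation or disinformation."""
    return "AI-generated text derived from: " + "; ".join(corpus)

for cycle in range(3):
    output = generate(corpus)   # AI-generated data outputted to humans
    published = output          # humans reuse that output in the world
    corpus.append(published)    # published data are mined back into the corpus
    print(f"Cycle {cycle + 1}: corpus now holds {len(corpus)} items")

# The output of one process becomes the input of another: once an
# inaccuracy enters the corpus, each cycle recycles it.
```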
Another way of looking at this data recycling phenomenon is that two agents, the AI chatbot and the human, give life to CM, and those two agents, as well as the data shared between them, have a symbiotic relationship. The AI chatbot and the human are codependent, each giving and receiving. However, the exchange is currently imbalanced, with greater benefits going to the AI chatbot and greater harms going to some of the humans.
There is a circularity of mutual taking and giving of data, i.e., the mining and sharing of information between AI chatbots and humans, with both agents gathering and giving data from and to one another. Problems, however, can arise and be compounded when misinformation and disinformation are exchanged between the two and continuously recycled.
Although both agents glean information from one another, this circularity of data exchange currently works more in the AI chatbot’s favor: the chatbot is designed as an ML algorithm that improves with data, and as yet, the AI chatbot models remain imperfect in terms of accuracy.
How can we create a better balance with both entities benefitting and fewer harms being recycled?
Anticipated problems
Before we delve into some possible solutions, we need to recognize that the problems do not lie in CM as a whole. The problems stem from the coding and modeling of CM, which, at its core, is about the taking and making of data.
Even as standards and regulations are imposed, some harm will be unavoidable. We cannot predict or control all the unanticipated consequences of a new innovation, but we should nonetheless be prepared for those consequences and do what we can to prevent harms before they happen. Publicly available technology assessment reports and studies from reputable organizations, such as the U.S. Government Accountability Office (GAO), the Institute of Electrical and Electronics Engineers (IEEE), and the U.S. National Science Foundation (NSF), provide high-quality, expert research to policymakers and the public.
Preparation is key, and the only way we can reach substantial preparation is through funding for research. While the implementation of policies and regulations will lag behind the creation of emerging technologies, it is never too early to start thinking ahead about ways we can prevent a crisis or address one as soon as it arrives. The race can seem daunting when a technology such as AI chatbots has had a massive head start and spawned ancillary support systems. In the case of AI chatbots, in addition to the rapidly rising number of CM products, there are already apps and human specialists, known as “prompt engineers” [7], that help users craft prompts to yield the most accurate outputs. These service apps are themselves coded by humans with their own sets of biases, thus presenting an additional layer of data in the circularity of human-coded, text-to-text algorithms.
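As a minimal illustration of that additional layer, consider the sketch below (hypothetical names and template, not any real service’s code): a prompt-assistance tool’s hard-coded template injects its author’s framing, and therefore the author’s biases, into every prompt it produces.

```python
# A minimal sketch of a prompt-assistance ("prompt engineering") tool.
# The template is a hypothetical example; real services differ, but any
# hard-coded template similarly embeds its author's framing and biases.

TEMPLATE = (
    "You are a helpful expert. Answer concisely and cite your sources.\n"
    "Question: {question}\n"
)

def engineer_prompt(question: str) -> str:
    """Wrap the user's question in the tool author's preset framing."""
    return TEMPLATE.format(question=question)

print(engineer_prompt("What causes data hunger?"))
# Every generated prompt inherits the template author's choices about tone,
# authority, and what counts as a good answer -- one more human-coded layer.
```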
Surveillance capitalism [8] has taught us that within web-based data and information technologies, such as social media, search engines, and even interactive extended reality (XR) entertainment (e.g., Pokemon Go), the user audience is the product [9]. Big Tech companies are hungry for each user’s data and habits so that they can build a profile on each user for various business model reasons [10], such as targeting ads with the goal of selling products to the user, selling user information to third parties, curating the user experience so that users keep returning to the site, or any combination of the above. Essentially, human data are mined by CM.
These data producers require massive amounts of data input to keep advancing. When more data are required, companies turn to new methods of extracting data to increase the volume necessary for continual innovation. A New York Times investigation found that OpenAI, Meta, and Google turned to transcribing YouTube videos without copyright consent. All online information, including photographs and movies, is susceptible to being mined for its data without permission from the creators [11].
Similarly, humans mine massive amounts of data from CM, a form of data and information provider with similarities to search engines and social media, for their own self-interests. Besides using CM as a word production assistant, people use the data for multiple purposes, from planning gardens, workouts, and meals to coping with attention-deficit/hyperactivity disorder (ADHD) and dyslexia to playing games and even building new games [12]. Hence, humans share a codependency and a voracious appetite for data with AI chatbots. After all, AIs are programmed by humans in their own self-image. AI reflects the views, preferences, biases, and cultural history of those who program it [13].
Data snacking
In a Medical News Today article titled “Are our brains addicted to information?,” author Maria Cohut, Ph.D., states: “While, throughout the past, the human race hungrily sought information to maximize the odds of survival, easy access to useless information may now lead to an overload” [14]. Cohut asserts that humans’ modern-day obsession with information-seeking can be harmful and that seeking information out of idle curiosity might give the brain the equivalent of empty calories, as when people scan voraciously on social media or read idle, byte-sized pieces about nothing in particular. They might be developing a dependency on idle information to constantly graze on.
Her information source is a report produced from an investigation into what idle curiosity looks like inside the human brain. The study was performed by researchers Kenji Kobayashi and Ming Hsu of the Helen Wills Neuroscience Institute and the Haas School of Business at the University of California, Berkeley. Kobayashi and Hsu sought to balance the psychological and economic perspectives on idle curiosity, that is, the reasons why people seek out information. What they found was that the act of information-gathering uses the same neural codes that respond to money, food, recreational drugs, and learning the odds of winning in a game, all of which produce high levels of dopamine, a hormone and chemical transmitter that motivates behavior.
In their report, “Common neural code for reward and information value” [15], Kobayashi and Hsu posit that compulsive information-seeking behavior by humans has never been greater. Although it would be ideal for humans to gather only information that is useful to them, they frequently seek information for reasons of idle curiosity and enjoyment of the unknown.
In an interview with Cohut, Hsu said: “To the brain, information is its own reward, above and beyond whether it’s useful. And just as our brains like empty calories from junk food, they can overvalue information that makes us feel good but may not be useful—what some may call idle curiosity” [14].
In other words, junk information, like junk food, can be unhealthy and addictive.
In the digital age, enormous amounts of idle data and information are available to us via search engines, social media, and CM, and receiving data and information out of idle curiosity, whether useful or useless, offers us a pleasurable reward. The report essentially states that humans consume, and sometimes overconsume, all types of data and information for the thrill of anticipation and receive a dopamine rush from seeking data and information, whether out of idle curiosity or for a purpose-driven task. That can lead to a reward cycle, which can become highly addictive and produce constant data hunger. An example of our data hunger is our inability to stop checking for texts and news on our phones or PCs, or to leave our devices behind for extended periods of time, such as a day or more. We want and need data and information, whether from a text message, a social media feed, or a CM.
Data hunger that results in “data snacking,” where one seeks data purely for idle curiosity’s sake or the fun of it, without an end goal, can be likened to munching on junk food. In terms of CM, data snacking is not harmful in and of itself, but gathering data and reapplying those data without fully understanding their accuracy can result in unanticipated consequences and unintended harm, because some CM data are inaccurate and biased. If the human is counting on those data being accurate, yet the data are actually inaccurate or biased, and the human uses them to produce a fact-based product, then misinformation and biases will be propagated, and a vicious cycle of societal harm will be perpetuated. Such introduced inaccuracies or biases regress the status of marginalized persons and communities that have worked hard to dispel the stories that have historically oppressed them, and they make it increasingly challenging for those communities to find ways of improving their future.
Presently, the CM corpus is modeled on LLM data sets that are biased, inaccurate, and hurtful to many people. A report by Okerlund [16] from the University of Michigan Ford School of Public Policy found pressing problems with LLMs, including: “Exacerbating environmental injustice, accelerating the thirst for data, normalizing LLMs, reinforcing social inequalities, remaking labor and expertise, and increasing social fragmentation.”
If we have an addiction to data and information, CM has the potential to feed our hunger for idle, purposeful, or entertaining data, similar to social media, and to make us hungry for more. We have learned that social media has resulted in addictive behavior, misinformation, and disinformation, as well as harm to many marginalized and vulnerable persons, particularly young people. Science-backed research from the American Psychological Association (APA) on the harms and positive outcomes of social media for teens found that: “The potential risks of social media may be especially acute during early adolescence when puberty delivers an onslaught of biological, psychological, and social changes. One longitudinal analysis of data from youth in the United Kingdom found distinct developmental windows during which adolescents are especially sensitive to social media’s impact. During those windows—around 11 to 13 for girls and 14 to 15 for boys—more social media use predicts a decrease in life satisfaction a year later, while lower use predicts greater life satisfaction” [17], [18].
Before CM becomes normalized in people’s lives and possibly leads to addictive behaviors, it would be wise and prudent to pause, research, and consider the ways in which CM can help or hurt people and to find ways of maximizing the benefits and minimizing the harms. Most importantly, strong preventative measures and guardrails, policies, and regulations would serve and protect the public. Otherwise, a culture of new data (re)generating inaccurate or biased old data will be perpetuated.
Public trust and generative AI
Public distrust of generative AI and digital texts, as well as digital images and audio recordings, which can be viewed as mirrors of us, will likely grow. If we cannot trust these technologies, and these technologies are mirrors of ourselves, how can we trust ourselves or each other? In this mirror, it is possible that as we become increasingly wary of AI inputs and outputs, AI will become wary of our inputs and outputs. That wariness has positive and negative implications. The positive is the hope that we will learn to view and utilize these tools as assistants, keeping in mind that they are flawed and that we need to fact-check everything. The negative is that we might not be able to distinguish the accurate from the inaccurate, nor know where to go to fact-check.
Within the first two months of its release on 30 November 2022, ChatGPT became the fastest-growing consumer application in history [19]. It has become a popular tool for acquiring data, writing assistance, and entertainment. Professionals in various industries, from engineering to law, are touting its competence. Some economists predict that hundreds of millions of jobs, especially those in the creative sector, such as writers, editors, and artists, will be lost and replaced by AI, resulting in “a cataclysmic reorganization of the workforce mirroring the industrial revolution” [20]. Others believe that AI will replace entry-level jobs but not the jobs where humans currently outperform AI, such as high-level legal analysis. That might change, however, as CM advances and becomes more sophisticated.
AI chatbots feed on our data to sustain themselves. We feed on AI chatbot data to assist us. As earlier mentioned, the circularity of data exchange between the AI chatbot and the human is currently imbalanced, with more benefits going to CM and some harms potentially being imposed on society. How can we create a better balance with careful design and business models?
Recommendations
Many questions remain open at this point, and further research and studies are highly recommended on how to responsibly build these systems in sustainable ways that protect people and the planet while remaining profitable.
Need for understanding the past to prepare for the future
AI has great potential to benefit humanity in terms of connectivity, education, entertainment, and knowledge, and in making human resources the most valuable asset we have.
Before moving forward with a new technology, it is important to understand its lineage and precedents. As technologists, we see that CM exhibits all the characteristics of creative destruction, also known as Schumpeter’s gale, which is a good or a bad thing depending on one’s point of view on innovation, competition, entrepreneurship, and capitalism. In this case, creative destruction, a term describing “a process in which new innovations replace and make obsolete older innovations” [21], is exhibited by the ways in which AI chatbots are replacing humans for text-related tasks such as writing, editing, and word-based communications. We are undergoing a period of transition in which the ways we digitally gather and produce data and information are changing.
Sam Altman, CEO of OpenAI, the company that built ChatGPT, said that this is a “printing press moment” [22], making a generalized comparison of the revolutionary AI chatbot technology to the revolutionary printing press, invented by Johannes Gutenberg in Germany around 1440, which changed the way we disseminated and consumed information and launched the Printing Revolution [23].
CM has the potential to reduce some productivity problems, but at what expense? It also has the potential to make some problems bigger. Solutions are often most effective when approached holistically and gathered by lessons learned from past mistakes. “We cannot solve our problems with the same level of thinking that created them” [Albert Einstein].
Need for widespread stakeholder engagement
It has been said that the first step in solving a problem is recognizing that there is a problem. Taking that idea further, there are multiple ways of looking at a problem and arriving at a solution. Therefore, engagement, conversation, and exchanges of perspective among the quintuple innovation helix of academia, industry, government, the public, and the natural environment [24] are key to increasing the chances of robust, self-reflexive discussions and solutions. In a 2021 report on creating frameworks for responsible development of AI, the U.S. GAO recommends wide-ranging stakeholder engagement. Figure 3 is a visual chart of the key players who should be seated at the table.
Need for sustainable AI data centers
The data centers powering generative AI consume enormous amounts of energy and water. They should be designed sustainably, running on renewable energy and using less water.
Need for better preparation
The major players competing for first mover advantage and leadership of this emerging technology are not unaware that their systems contain flaws. However, they cannot anticipate all the flaws on their own nor think of all the ways in which to mitigate the flaws that could cause harm. Making mistakes happens quickly. Fixing mistakes takes a long time and is socially, environmentally, and economically costly.
On 30 October 2023, an executive order (EO) on AI was announced by President Biden [26]. Highlights of the EO include the following goals: “safety and security, innovation and competition, worker support, consideration of AI bias and civil rights, consumer protection, privacy, federal use of AI, and international leadership” [27].
However, as of this printing, that EO is under revocation by the current administration.
Need for a multidisciplinary approach
Following that EO, the National Endowment for the Humanities (NEH) launched the research initiative Humanities Perspectives on AI in 2023, in response to the President’s EO on safe, secure, and trustworthy AI, which aims to set new standards for AI safety and security, protect Americans’ privacy, and advance equity and civil rights. NEH’s mission is: “to support research projects that seek to understand and address the ethical, legal, and societal implications of AI. NEH is particularly interested in projects that explore the impacts of AI-related technologies on truth, trust, and democracy; safety and security; and privacy, civil rights, and civil liberties” [28].
Again, it should be noted that, as of this printing, the EO is under revocation by the current administration.
CM, and generative AI in general, has the power to damage an already fragile social and environmental ecosystem. Yet it also has enormous potential to provide helpful services to the public and advance the common good. Like most innovations of the past, such as calculators (which replaced longhand arithmetic and memorization of multiplication tables) and word processors (which replaced typewriters and handwriting), CM will become a useful tool but should not be counted on to be the final creator. It will lighten the workload and save time for users. It will provide answers and instant gratification easily, and we will come to understand that we need to fact-check those answers.
Until that happens, finding ways to maximize the benefits in safe, responsible ways is necessary to protect the public. Innovations led by the infamous motto, “move fast and break things” (Mark Zuckerberg) [29], have resulted in various harms to society. Maybe it is time to slow down and fix things. As a counterbalance to our spirit of myopically deploying new sociotechnical innovations without holistic considerations, proper training, or responsible design and implementation, we are presented with a calling to focus more on efficiency and conservation, i.e., improving the systems currently in place. We would be remiss in not listening to that calling. Basic economics shows that wisely saving and investing money is as crucial as making money [30].
The societal habit and culture of overconsuming everything from food to material possessions to data have had negative impacts on people and the planet. Overconsumption of food causes obesity. Overconsumption of material products causes pollution and anthropogenic climate change. It is yet to be fully understood where overconsumption of data will lead us, but we know that the tech companies that build data-producing AI chatbots constantly need to mine massive amounts of idle online data to innovate and sustain the AI’s corpus, and they could consequently be creating more so-called data pollution.
ACKNOWLEDGMENTS
This article emerged from the Master’s thesis of Teana Davies completed on 1 June 2023. I thank Katina Michael for her detailed copyediting and supervision. This article would not have been possible without her boundless enthusiasm, support, and faith.