What’s It Like to Trust an LLM? The Devolution of Trust Psychology

November 16th, 2025, in Social Implications of Technology

By Simon T. Powers, Neil Urquhart, Chloe M. Barnes, Theodor Cimpeanu, Anikó Ekárt, The Anh Han, Jeremy Pitt, and Michael Guckert

 

The original purpose of large language models (LLMs) was in generative natural language processing (NLP) tasks. Text generated by earlier versions of GPT was easily identifiable as untouched by human hand, and that remained the case in 2020 [1]. However, it was only two years later that ChatGPT was released to a general public that had effectively been softened up by search engines, voice-activated virtual assistants, and anthropomorphized robots [2] but otherwise was completely unprepared for an instrument that appeared to have access to arcane knowledge and even agency—although all its output was based on a stochastic sequence-creating process. The use of LLMs by people lacking expertise and experience, and for purposes they were not originally intended for, is, on the one hand, a fine example of technological generativity; on the other hand, it is having profound and consequential effects on trust relationships. The issue is not just that there is a problematic trust relationship between people and technology [2], in which reliance on LLM output displaces the user's own reasoning. This cognitive deskilling is having a secondary, deleterious metalevel impact on trust decisions and trusting relationships between people, and between people and organizations.

It is the timid response to this metalevel issue that we draw attention to in this article. We start by demolishing the common category mistake that LLMs are “just the same” as pocket calculators, i.e., LLMs are simply a logical progression in the development of tools to improve efficiency, enhance human competence, or free time for other more socially productive pursuits. We conclude that LLMs represent, in fact, a difference in kind, not degree. Based on this perspective, we go back to first principles and apply a general model for making trust decisions, through which we argue that the input signals for a trust decision with respect to LLM usage are being distorted: what people think they are trusting—in fact, overtrusting—is really a thin veneer for what they are actually, and perhaps mistakenly, trusting. These unintended consequences have deep effects on human trust relationships, which are still largely unexplored.

Pocket calculator category mistake

It has been argued that concerns over LLMs are no different than a previous concern that pocket calculators were deskilling a younger generation’s education in mental arithmetic. However, this analogy is substantively different, both qualitatively and quantitatively. Evidence for the potential harm done by using LLMs for teaching critical skills such as logical reasoning or math has already been observed [3]. We argue that the difference between LLMs and pocket calculators includes the following.

  • The output of an LLM is probabilistic, as opposed to the deterministic output of a pocket calculator (a minimal sketch after this list illustrates the contrast).
  • The pocket calculator does not have a built-in bias based on the data that it has been trained on, nor an algorithm that can customize or constrain the output that is displayed.
  • The pocket calculator does not exhibit a form of agency that can, deliberately or otherwise, produce a symmetric coshaping between users and the tool.
  • The pocket calculator does not try to anthropomorphize its output with chaff like “let me think about it,” which creates false expectations of human-like interaction.
  • Domain specificity: there is a finite limit to what a pocket calculator can do, but, with LLMs, every conceivable knowledge domain and skill can be undermined.
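
To make the first point concrete, the following minimal Python sketch contrasts a deterministic calculation with sampling from a toy, hand-specified next-token distribution. It is an illustration of a stochastic sequence-creating process, not a real language model; the continuations and their probabilities are invented for the example.

    import random

    def calculator(expression):
        """Deterministic: the same expression always yields the same value."""
        return eval(expression, {"__builtins__": {}})  # toy arithmetic only

    def toy_llm(prompt, temperature=1.0):
        """A toy stand-in for LLM sampling (not a real model): the continuation
        is drawn from a weighted distribution, so repeated calls with the same
        prompt can produce different outputs."""
        continuations = {"is 4": 0.85, "is 5": 0.10, "depends on the base": 0.05}
        tokens = list(continuations)
        weights = [p ** (1.0 / temperature) for p in continuations.values()]
        return prompt + " " + random.choices(tokens, weights=weights)[0]

    print(calculator("2 + 2"))        # always 4
    for _ in range(3):
        print(toy_llm("2 + 2"))       # may differ from run to run, by design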

 


Therefore, LLM interaction is not, as often misrepresented and misconceived, a natural language interface to an omniscient database or oracle. Rather dangerously, the form of its language presents an illusion of well-informed authority, even while its content is absolute garbage. This means that content requires careful validation, which presupposes that the motivation for inquiry is the pursuit of knowledge and not short-cutting expediency. In turn, some necessary domain knowledge is required to evaluate whether or not the output is approximately correct. Only by asking many questions do users notice the model’s limited variety in sentence structure and length, which reveals some nonhuman-like behavioral tics that might induce users to rethink their perception of and relation to the model and shape their own tool usage in turn.

We have already seen that when search engines became widely accessible and allowed individuals to access vast amounts of information, their use shaped how society reasoned. With search engines, metrics such as relevance of information and order of presentation of results are determined by the collective action of their users (who links to whom) and by the willingness of information sources to procure attention for themselves by paying for it through advertising [4]. A similar coshaping is occurring with the widespread adoption of LLMs, but with two key differences: 1) the agentic nature of the LLM implies active intent on the shaping of society by the LLM and 2) the proliferation of different LLMs—each trained on its own datasets and manifesting its own biases—creates a kind of epistemic fragmentation as each different subset of society is shaped by its chosen LLM.
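
One well-known mechanism by which "who links to whom" determines the order of results is PageRank. The following minimal sketch illustrates the general idea; it is not the ranking algorithm of any particular search engine, and the toy link graph is invented for the example.

    def pagerank(links, damping=0.85, iterations=50):
        """links: dict mapping each page to the list of pages it links to."""
        pages = list(links)
        rank = {p: 1.0 / len(pages) for p in pages}
        for _ in range(iterations):
            new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
            for page, outlinks in links.items():
                if not outlinks:  # dangling page: share its rank evenly
                    for p in pages:
                        new_rank[p] += damping * rank[page] / len(pages)
                else:
                    for target in outlinks:
                        new_rank[target] += damping * rank[page] / len(outlinks)
            rank = new_rank
        return rank

    toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    print(sorted(pagerank(toy_web).items(), key=lambda kv: -kv[1]))

The ranking emerges entirely from the link structure created by the collective action of page authors, which is the coshaping dynamic described above.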

In addition, there are two further issues of time and timing related to generic tool use. With a pocket calculator, the time the interaction takes scales with the complexity of the question and the effort of formulating the expression, and the answer is relatively simple. With LLMs, by contrast, simple questions that are quick to enter can generate complex answers; this loss of proportionality between input effort and output is important. One reason why “chalk and talk” was a successful mode of education was that it slowed down the rate of presentation (by the teacher) to the rate of processing (by the students). Changing to the use of prepared slides incrementally increased the rate of presentation, so the students stopped writing. LLMs threaten to accelerate the rate of presentation even further, so that students stop not only writing but also reading. Yet, studies have shown that reading printed books leads to better reading comprehension, deeper learning, and better recall compared to digital books [5].

With regard to timing, apprenticeships generally progress with exposure to, and practice with, certain tools only once a certain level of accomplishment has been achieved (e.g., carpentry starts with hacksaws, not chainsaws). Pocket calculators are generally only supplied once a student has a sufficient grounding in abstract concepts (e.g., a symbolic representation of nothing) and has mastered the basic principles of number theory and the transformative effect of mathematical operators. LLMs are available to anyone, at any age, with a keyboard, a screen, and an internet connection. Therefore, LLM adopters can lack the sophistication needed to understand what is written, let alone analyze it critically—assuming (as suggested above) that they even read it. Learning math before being exposed to the calculator makes the user “consciously competent”—they can perform the task (finding the answer to a calculation) themselves, and they can verify the result because they understand the underlying operations that lead to the answer. Being taught math using a calculator from the beginning could arguably result in the user becoming “unconsciously competent”—they can find the answer, using a calculator, but have no understanding of how the answer is derived. In this situation, the user becomes dependent on the calculator to perform the task: remove access to the calculator, and the user becomes “consciously incompetent”: they cannot perform the task, yet they are aware of what they are missing (i.e., math skills).

These substantive differences between what ostensibly describes the same situation—human competence extended with some tool—are, we will argue, a difference of kind rather than one of degree. To better understand this difference, we examine the implications of human–LLM interaction from the perspective of trust.

First principles trust analysis

The analysis of the previous section reveals a consequential issue with the dismissive “same as a pocket calculator” argument: it transfers preexisting trust relationships and implicitly exploits automation bias to leverage a situational disposition to trust. This converts LLM usage from a first encounter “risk-exposure” trust decision to a reliance trust decision as an energy-saving cognitive shortcut.

To demonstrate this, we now carry out a systematic appraisal of a decision to trust an LLM using the Lewis and Marsh [6] general functional model for trust decisions. This model describes how subjective trustworthiness judgments of the trustor are based on trustworthiness features of the artifact to be trusted, which are derived from available information. These trustworthiness features, or criteria, as shown in the leftmost column of Table 1, are competence, predictability, honesty and integrity, and willingness and benevolence. Together with additional subjective and situational factors, such as a general disposition toward risk, this leads to either trusting or nontrusting behavior toward the artifact [6]. Table 1 summarizes the evaluation of the pocket calculator and an LLM with respect to each criterion, with Wikipedia interposed as an intermediate case. The intention is to demonstrate an incremental increase in degree from calculator to Wikipedia but a significant difference in kind from calculator and Wikipedia to an LLM.

Table 1
Evaluation of the Pocket Calculator, Wikipedia, and LLMs According to the Criteria of the Lewis and Marsh General Trust Model
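
As an illustration of how such a functional model might be operationalized, the following Python sketch combines subjective scores for the four criteria with situational factors into a binary trust decision. The equal weighting, the additive adjustment, and the threshold are assumptions made for the example; they are not taken from Lewis and Marsh [6].

    from dataclasses import dataclass

    @dataclass
    class TrustworthinessFeatures:
        # Subjective judgments in [0, 1] for each criterion
        competence: float
        predictability: float
        honesty_integrity: float
        willingness_benevolence: float

    def trust_decision(features, disposition_to_trust, perceived_risk, threshold=0.6):
        """Illustrative sketch only: the equal weighting, additive form, and
        threshold are assumptions, not part of the cited model."""
        trustworthiness = (
            features.competence
            + features.predictability
            + features.honesty_integrity
            + features.willingness_benevolence
        ) / 4.0
        # Situational factors shift the judgment up or down
        adjusted = trustworthiness + 0.2 * disposition_to_trust - 0.2 * perceived_risk
        return adjusted >= threshold

    # Example: high perceived competence but low predictability and unknown benevolence
    llm = TrustworthinessFeatures(competence=0.8, predictability=0.4,
                                  honesty_integrity=0.5, willingness_benevolence=0.3)
    print(trust_decision(llm, disposition_to_trust=0.7, perceived_risk=0.5))  # False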

Crucially, it is not just the decision to trust an LLM that is problematic. The increasing use of generative AI undermines our ability to make trust decisions in other contexts, as we now illustrate with an analysis of the decision to trust in the age of LLMs according to each Lewis and Marsh criterion.

Competence

While technologies such as smartphones and search engines offer individuals the opportunity to access vast amounts of information online, a distinction is made between physical access to the technology and “access” in the sense of possessing the skills and knowledge to use the technology effectively [7], [8]. Proficient use of technology here is not equated to proficiency in skills such as learning, comprehension, or critical thinking. The same argument can be made for the use of LLMs—being able to effectively use an LLM for a goal does not require, or inherently lead to, competency or the development of skills. However, these critical thinking skills are needed to make a trust decision about anything, given the inputs to the functional trust model described above.

Traditionally, effort has been a signal of competency—someone gains competency by having put effort into training, e.g., through working as an apprentice or studying at a university. However, in the age of LLMs, effort is no longer valued in that way—anyone can generate an essay, or computer code, without any investment in training. Indeed, investing effort to create an artifact risks being seen as a lack of competence and a symptom of technology aversion. Thus, the established signals of competency are at risk of becoming more and more obscured by the increasing use of generative AI: if anyone can generate a recommendation letter in 30 seconds using an LLM, then recommendation letters decline in value and eventually become irrelevant. Consequently, the human ability to make trust decisions based on perceived competency is undermined, not just in terms of whether to trust an LLM, but even in terms of trust decisions in interpersonal or business relationships.

Predictability

The essentially probabilistic nature of LLMs requires well-trained and skilled users who can competently prompt the model to produce a specific behavior and create output that does not consist of hallucinated text. Therefore, proper use of an LLM requires not only expertise in the domain of application but also the emerging skill of prompt engineering.

A cautious user of LLMs (for example, in a safety-critical application rather than a task with high requirements for creativity) is aware of the inherently probabilistic rather than deterministic nature of the output, plus the possibility of “hallucination,” bias, and potential model collapse. When LLMs are used in that way, users undergo a shift from just doing a job to needing to also consciously evaluate the quality of that part of the job effectively subcontracted out to the LLM. To do this successfully, they not only need to know how to do their own job but also how to “teach” the LLM (through appropriate prompting and correction) to produce consistent, reliable, and accurate results.

This neatly illustrates a new LLM-enabled style of automatic programming, in which the LLM is used as a mechanism to produce “low-level” programming code. Programmers are enabled to think (i.e., program) at a higher level of abstraction, such as providing glue code to interoperate with the LLM-generated code or components. While LLMs seem to make things easier at first sight by not requiring the specification of quality criteria, this evaluation becomes implicitly part of the duty of the human. However, passing on the unpredictability of LLM outputs as the responsibility of the human to validate those outputs—which, in the context of LLM-generated code, we might translate as an obligation of “caveat programmor”—diminishes the nature of trusting relationships in general.
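
A minimal sketch of this “caveat programmor” obligation is shown below: a hypothetical function assumed to have been generated by an LLM, wrapped in human-written checks that encode the quality criteria the prompt never stated. The function and its defect are invented for illustration.

    # Hypothetical function assumed to have been produced by an LLM from the prompt
    # "write a function that returns the median of a list of numbers".
    def generated_median(values):
        ordered = sorted(values)
        mid = len(ordered) // 2
        return ordered[mid]          # subtly wrong for even-length inputs

    # Human-written validation: the "caveat programmor" step. The checks encode
    # the quality criteria that the prompt never stated explicitly.
    def check_median(fn):
        assert fn([3, 1, 2]) == 2                    # odd length: passes
        assert fn([1, 2, 3, 4]) == 2.5               # even length: exposes the bug
        assert fn([5]) == 5

    try:
        check_median(generated_median)
        print("generated code accepted")
    except AssertionError:
        print("generated code rejected: validation failed")

Here the even-length test exposes the defect and the generated code is rejected; the point is that the validation burden falls on the human.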

Honesty and integrity

Reputation-based scoring systems have become a well-established feature of online interactions and commercial transactions. They are designed to assign a quantitative value to an individual’s or organization’s reputation based on a number of data points: a specific example is the Airbnb host–guest rating system, which is a key tool for hosts to build trust with potential guests, and for guests to be accepted by potential hosts. These systems offer a potentially rich source of third-party signals, which can be used to inform trustor decisions (and, indeed, provide constructive feedback for trustees to improve a product or service). There is widespread evidence of systematic abuse of such scoring systems, for example, fake reviews or blackmail threats to leave poor reviews. However, the more significant problem here is uniformity: grade inflation (everyone has to get the highest score) converges with easy access to generative AI that is indifferent to the quality or accuracy of its output, producing prosaic, clichéd, and essentially indistinguishable reviews. Such standardization of reviews implies both thoughtlessness and a certain degree of dishonesty, and the resulting distrust is being observed in other domains of human activity and interaction. Anecdotally, for example, it is reported that teachers are inclined to automatically discount student dissertations at the first occurrence of certain notorious trigger words [9].

However, it is not that such reviews or references are syntactically identical, but that they look and sound sufficiently similar to lend unwarranted credence to their honesty or veracity. This is linked to the rhetorical quality of ethos, the character or integrity of the speaker. Integrity is crucially important not only in the misuse of LLMs but also in the foundations on which they are built, i.e., in the training data itself. In the creative space, authors often remain uncompensated when their works are used to train LLMs, having been accessed through pirate websites or other illegally distributed content sources. While this could reasonably count as gross misconduct, it serves as a warning of what could happen if “faux science” (i.e., statements that give a vague appearance of scientific validity but are incompatible with the scientific method) is similarly distributed intentionally, without the regulation offered by the rigors of the scientific method, the peer review process (for all its potential biases and idiosyncrasies), and reliable gatekeepers of knowledge (for all the perverse financial incentives for publishing houses). It provides another route for the “Merchants of Doubt” [10] to “flood the zone” with fraudulent pseudoscience and conspiracy theories.

Willingness and benevolence

Willingness, as a trustworthiness feature, is concerned with the alignment of the potential trustee’s motivations with the trustor’s needs. In this context, willingness can be considered in terms of benevolence: the trustee acts in the trustor’s interest, not in their own. This is not necessarily the case in interactions with a commercial LLM. First, the trustee (the LLM) might be telling the user what the trustee wants the user to know (or not know, e.g., DeepSeek and Tiananmen Square [11]). Second, each interaction allows the collection of more data points about the user. Third, commercial LLMs are not a charity—venture capitalists will seek a return on investment. This return might be either direct, in terms of a revenue stream, or indirect, in terms of social or political influence and domination. Services are offered for free until a critical mass is achieved, and users will then have to pay a subscription or “pay by ask.” Even financial loss can be acceptable for those who can write it off as overheads or a business expense if there is sufficient indirect recompense, such as increasing the dependence of users on the technology or reducing debate about complex and nuanced subjects to polarized slanging matches with armies of AIs distorting both balance and proportion.

Proxy trust

Lewis and Marsh say that through proxy trust, an object or technology can also act as a facade for other trust relationships. Thus, proxy trust can create trust in both the technological artifact itself and trust in the providing organization, and there is a potential duality: the LLM can be trusted because the providing organization is trustworthy, and the providing organization can be trusted because the LLM is trustworthy.

Here, however, there is a direct parallel with cryptocurrencies, where people are effectively asked to detach an existing and well-established trust relationship with an institution (in this context, a bank or building society with a reputation for trustworthiness) and reattach it to the owners and programmers of a blockchain, and so, by proxy, to the blockchain application itself. However, there are already enough examples of the abuse of blockchains in fraud, financial misconduct, pyramid schemes, and money laundering to suggest that such proxy trust decisions can be misguided because of these misconceptions about the transfer of trust-relation analogies. Similarly, there is a preexisting proxy trust relationship between people and the traditional nonprofit gatekeepers of knowledge that is being co-opted and reattached to the commercial organizations deploying LLMs, which equally should be reappraised from a first encounter trust perspective.

Situational disposition to trust

Humans are naturally context-aware, yet that feature can be a bug in the widespread adoption of LLMs. One might gravitate toward a simple Google search to fact-check or briefly research information and then be presented, unprompted, with an LLM-generated summary of the results. Search engines are one domain where we expect relevant and factual information to rise to the top, but those results are instead being replaced by error- and hallucination-prone summaries. This behavior can be reinforced since search engines are used by society to access information, while, simultaneously, the collective action of their users influences the relevance and order of displayed information [4]; this may, in turn, facilitate the propagation of errors or hallucinations (although the term “hallucination” is itself a misdirection: the statistical model produces a false exterior response rather than a false interior experience [12]). Trust in such contexts might be inaccurately biased, raising important questions about the deployment of LLMs in situationally sensitive domains, such as political opinion, medical advice, or even subjective reviewing.


In summary, the preceding analysis identifies two features of evolutionary social behavior that have previously enabled human communities to maintain large-scale cooperation and trust, but that are now put at risk by current and future generative AI technologies. These two features are as follows.

  • The loss of costly signaling: for example, a well-crafted reference letter conveys evidence of its own reliability because of the cost to produce it, which is negated if such letters can be produced without cost; similarly, the reported surge in job applications is partly attributed to LLMs reducing barriers to entry, rendering the resume practically worthless as a document.
  • The loss of indirect reciprocity: indirect reciprocity encourages social cooperation by third-party observation of benevolent interactions, which is how reputation and reviewing systems work; however, some restaurant owners have been blackmailed with fake negative reviews that are easy to generate in potentially convincing detail using the reviewing system data itself.

 

Evolutionary biology proposes that animals have evolved reliable signaling systems to ensure coordination between the receiver and the signaller. This has been refined with the idea of costly signaling: since signals must be reliable and the interaction starts with the receiver, the signaller must actually incur some resource cost that both prioritizes attention to the signal and, crucially, reinforces the interpretation that the receiver can trust the signaller. Evolutionarily, this affords a competitive edge compared to individuals with less ability who could not afford such a cost [13].

In many domains of human endeavor, using an LLM-generated output of a human–computer interaction as the indistinguishable input signal to a human–human interaction can be seen as “cheaply signaling”—it can be interpreted by the receiver as an indicator of a lack of care, respect, and thoughtfulness. However, if it becomes harder and harder to distinguish LLM-generated text from human-generated text, then the ability to recognize cheap signaling breaks down, and hence, the ability for costly signaling to support cooperation and trust decisions breaks down.

How could we prevent this? The loss of costly signaling affects our ability to judge competence. To prevent this, we need to change the signaling system, that is, change how competence is assessed. To return to the reference letter example, requirements could be changed so that evidence must be attached to support the claims being made, or the format could be changed to a phone conversation with the referee to assess their personal experience of working with the applicant. In the higher education system, university assessments could be changed to restore confidence in the signal, for example, through closed-book or oral examinations.

The second social support from our evolutionary past that is at risk is indirect reciprocity [14]. This is the ability to condition cooperation and trust on information from third parties, i.e., reputation. This risks being undermined by the ease with which reviews can now be generated.

How could we prevent this? Reputation systems need to be changed to prevent exploitation by automated generation. While there could be attempts to detect and prevent generated reviews, e.g., monitoring for spamming, a more fruitful avenue may be to weight reviews by the amount of objective evidence they provide. For example, hotel reviews that include proof that the reviewer stayed there during those dates may be weighted higher when calculating the average star rating compared to reviews without such evidence.
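
A minimal sketch of such evidence weighting might look as follows; the 3:1 weight for verified reviews and the toy data are assumptions made for illustration only.

    def weighted_star_rating(reviews):
        """Each review is (stars, has_verified_stay). Verified reviews count more.
        The 3:1 weighting is an illustrative assumption, not a recommendation."""
        weights = {True: 3.0, False: 1.0}
        total = sum(stars * weights[verified] for stars, verified in reviews)
        norm = sum(weights[verified] for _, verified in reviews)
        return total / norm if norm else 0.0

    reviews = [(5, False), (5, False), (5, False),   # easy-to-generate, unverified
               (2, True), (3, True)]                  # reviews with verified stays
    print(round(weighted_star_rating(reviews), 2))    # 3.33

Unweighted, the five reviews above average 4.0 stars; with the evidence weighting, the verified low ratings pull the score down to about 3.3, reducing the payoff from flooding a listing with cheaply generated reviews.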

In conclusion, generative AI is conflicting with our psychology, which has evolved over millions of years to support trust decisions. There is, therefore, an urgent need to understand how the outsourcing of cognition to generative AI may lead to cognitive deskilling. This article has examined this in the context of trust decisions. Future studies should examine the risk of losing other defining features of our humanity.

Author Information

Simon T. Powers is a lecturer in trustworthy computer systems at the University of Stirling, FK9 4LA Stirling, U.K. His research builds evolutionary and game-theoretic models of human social systems, with a focus on cooperation, institutions, and trust. He has applied the insights from these models to the design of a range of sociotechnical systems, including AI governance mechanisms, community energy systems, and trust interactions between people and artificial agents. Powers has a PhD in computer science from the University of Southampton, Southampton, U.K. Email: s.t.powers@stir.ac.uk.

Neil Urquhart is a lecturer in computer science at Edinburgh Napier University, EH10 5DT Edinburgh, U.K. He specializes in the use of AI techniques to solve real-world problems. He has worked on projects that have applied AI to food logistics, parcel deliveries, health care scheduling, and public transport.

Chloe M. Barnes is a lecturer in applied AI and robotics at Aston University, B4 7ET Birmingham, U.K., and champions the Evolutionary and Adaptive Intelligence Theme at the Aston Centre for Artificial Intelligence Research and Application (ACAIRA). Her research interests include computational intelligence and artificial life, and exploring the unintended consequences of interaction in systems.

Theodor Cimpeanu is a post-doctoral research fellow at the University of Stirling, FK9 4LA Stirling, U.K. His research interests include regulation—how to build better institutions and how to avoid the pitfalls of AI that arise in human–machine interactions and from races to the bottom. He tackles these problems through evolutionary game theory and has a keen focus on fairness, with a side view on modeling emotions.

Anikó Ekárt is a professor of artificial intelligence at Aston University, B4 7ET Birmingham, U.K., and the director of the Aston Centre for Artificial Intelligence Research and Application (ACAIRA). Her research is centered around AI methods and their application, with a focus on evolutionary algorithms and genetic programming. In addition to methods for improving the performance of genetic programming, she has successfully contributed to applications of AI techniques to health, engineering, transport, and art. She is an advocate of co-creating trustworthy AI solutions with stakeholders.

The Anh Han is a professor of computer science and the lead of the Center for Digital Innovation at the School of Computing, Engineering and Digital Technologies, Teesside University, TS1 3BX Middlesbrough, U.K. His current research interests include evolutionary game theory, LLM simulations, and AI governance modeling.

Jeremy Pitt is a professor of intelligent and self-organizing systems in the Department of Electrical and Electronic Engineering at Imperial College London, SW7 2BT London, U.K., where he leads the Self-Governing Systems Laboratory. His research interests include developing formal models of social processes using computational logic and their application in self-organizing multiagent systems for engineering cyber–physical and sociotechnical systems, wherein sustainability, justice, collective action, civic dignity, and the asymmetric distribution of legitimate political authority are central issues.

Michael Guckert is a professor of business informatics at Technische Hochschule Mittelhessen University of Applied Sciences, 35390 Gießen, Germany. His research focuses on the use of artificial intelligence in industrial production and medical applications. He is a member of the founding faculty of hessian.AI, the Hessian Center for Artificial Intelligence in Darmstadt, and a member of the board of that institution.
