Sobering up on "AI": Why Designing with LLMs is Confusing (Part 2)

Interactive 3D model of a tennis ball. Drag to orbit, otherwise auto-rotates slowly. — Figure 00 3D tennis ball. Drag to rotate.

The Inversion of Mental Models

Imagine a tennis ball. Actually, why imagine? Here is a 3D model of one.

It has a texture, weight, shape, and other properties that tell us humans (and our animal companions) what we can do with it. For starters, obviously, we can play tennis. But human creativity takes this further by finding novel actions and tasks we can map onto this object to solve particular problems.

We can throw the ball to play fetch with our dogs.
We can put tennis balls under chair legs so that the chairs do not make a sound when kids drag them.
We can hang one from the garage ceiling to help us park our cars.

Three vignettes: a puppy with a tennis ball in its mouth, classroom chairs with tennis balls on their feet, and a tennis ball hung from a garage ceiling above a parked car. — Figure 01 Same tennis ball, multiple uses.

These novel uses come from an innate ability we share with our great ape cousins: making tools out of rocks and sticks. When we map actions to objects, they become tools for us. A rock becomes a hammer to open nuts in the wild. A stick becomes a spear when we throw the pointy end at an animal.

This ability to map actions onto objects has arguably been one of the driving forces in the co-evolution of our tools and our language.

As we started building more sophisticated tools, we eventually developed proto-linguistic constructs to help conceptualize what we were doing. We had to develop an ability to think ahead, which meant we had to think hierarchically in time and space to plan complex tools that needed group effort. We had to come up with a shared vocabulary to pass knowledge from one generation to the next and improve our chances of survival.

Progression of tools from a plain rock, to a sharpened stick, to a stick with a stone point, to a metal-tipped spear, to a basic bow, to a modern takedown bow. — Figure 02 As language evolved so did our tools.

As a human-computer interaction (HCI) specialist, I leverage these and other facts about human cognition while creating products. For example, let us take a button. I can place a button below a box labeled “username” and another box labeled “password.” If the button says “Login,” it becomes a metaphorical door to your shopping. Then I place the same kind of button next to an image of an item and label it “Add to cart,” and it becomes the action of you putting the item in the basket. “Checkout” becomes the metaphor of you walking over to the cashier’s counter. Finally, with a button that says “Place your order,” I can invoke the idea that you safely transacted by inserting your card into the credit card terminal.

When I place the proper objects (buttons) with the right call to action in the appropriate context, I can reuse the same object in multiple ways. I am banking on cognitive rules that an operator will reliably and intuitively follow. When the interaction model correctly maps to the user’s mental model, the product achieves usability.
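The reuse described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the names Button, ACTIONS, and press are hypothetical and do not come from any real UI toolkit. The point is that one kind of object, given a label and a context, maps to exactly one action.

```python
from dataclasses import dataclass

# Hypothetical names throughout: Button, ACTIONS, and press are
# illustrative only, not part of any real UI toolkit.


@dataclass(frozen=True)
class Button:
    label: str    # the call to action shown to the user
    context: str  # what the button sits next to on the screen


# The same kind of object maps to a different real-world metaphor
# depending on its label and its context.
ACTIONS = {
    ("Login", "credentials form"): "walk through the store's door",
    ("Add to cart", "product image"): "put the item in the basket",
    ("Checkout", "cart"): "walk to the cashier's counter",
    ("Place your order", "payment form"): "pay at the card terminal",
}


def press(button: Button) -> str:
    """Return the metaphor this button invokes; the same input always gives the same result."""
    return ACTIONS[(button.label, button.context)]


if __name__ == "__main__":
    for label, context in ACTIONS:
        print(f"{label!r} next to the {context}: {press(Button(label, context))}")
```

The lookup is deliberately a plain dictionary: the operator's trust rests on the mapping being fixed and predictable.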

This creates a reliable design framework when only one entity in the interaction paradigm has cognition. But an LLM spoofs cognitive heuristics specifically on the inanimate-object side, turning this framework on its head.

Stage Left, Enter: LLM

One evening, I had an enthusiastic, albeit heated, experience with an ATM. We argued about why I should be allowed to withdraw only $12 instead of the $40 I had asked for. I wanted to grab a meal at a McDonald’s near this ATM. The new and improved LLM-backed algorithm argued that my demand for $40 would lead to overindulgence. It predicted my need to lose the holiday weight (before the holidays, mind you!) because I had started my gym membership using the same card the week before. That evening, it concluded that these two financial decisions were at odds, and the ATM was just looking out for me.

This example is fictitious, and I do not mean to upset current or future sentient ATMs. Designing with a pure, unbridled LLM in a product that has not accounted for human factors is like debating with an ATM. My foundational need for autonomy and task completion does not go away when I interact with inanimate objects through touch interfaces.

Here is what happens: we take most of the internet’s content for training and tuning large language models, blend it with our understanding of linguistics, apply layers of complex algebra, statistics, and matrix multiplications, and package it all into an interface built on decades of computer science and the psychology of human conversation. The result? A bizarre collapse of interaction models where language becomes the object, and my regular old brain cannot compute.

Stacked diagram of the theoretical sub-components beneath a modern LLM: linguistics, statistics, matrix algebra, psychology, and computer science layered like sediment. — Figure 04 Theoretical sub-components that shape the modern LLMs.

Diagram showing all the layers of linguistics, statistics, ML, and UI stacking on top of one another inside the HCI paradigm for LLMs. — Figure 05 All these layers working on top of each other in the HCI paradigm for LLMs.

How do you design for a thing that no longer fulfills the perceptual requirements to be a “thing”?

For one thing, the “weights” and “parameters” of the LLM are talking with me, and that fires all my conversational habits as I interact with it. This makes me believe I am conversing with something beyond the garden-variety chatbot. I’m not sure where my interaction biases start; maybe it’s because I’m in a chat interface, and having grown up with Yahoo Messenger, I default to assuming that there is cognition on the other side. It also uses emojis like my friends do, and other sophisticated conversational techniques like turn-taking and gentle disagreement, along with a very real human tendency to gaslight the reader with its hallucinations.

Chat-only interfaces will likely show poor feature utilization (for the reasons discussed in my previous article) once users have spent some time in the application. I infer this from the “prompt-engineering-like” buttons that have appeared in the most prominent public LLM products. Buttons like “Search the web” or “Create Image” act as interaction accelerators for the batch of sub-tasks the user must complete to achieve their goals. Previously, you had to memorize specific keywords and type them to get the best prompt output. The buttons help users who could not explore the entire feature set on their own. And yes, I know you can get more out of these LLMs if you talk directly to their APIs. As with any other expert application, expert users will have their own needs and challenges to cope with as the models evolve. Their entire solution set or business model might collapse because their solution is absorbed into the LLM’s feature list.
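One way to picture such a button is as a stored prompt template that the user no longer has to remember. The sketch below is an assumption on my part: the public products do not document how their buttons are wired, and the template strings, ACCELERATORS, and build_prompt are hypothetical names made up for illustration.

```python
from typing import Optional

# Hypothetical templates: how real products wire these buttons is not
# public, so the strings below are assumptions for illustration only.
ACCELERATORS = {
    "Search the web": "Search the web and cite sources for: {request}",
    "Create Image": "Generate an image that shows: {request}",
}


def build_prompt(request: str, button: Optional[str] = None) -> str:
    """Wrap the user's text in the template of the pressed button, if any."""
    if button is None:
        # No button pressed: the user must phrase everything themselves.
        return request
    return ACCELERATORS[button].format(request=request)


if __name__ == "__main__":
    print(build_prompt("history of the tennis ball"))
    print(build_prompt("history of the tennis ball", "Search the web"))
```

The button carries the memorized keyword on the user's behalf, which is the acceleration the paragraph above describes.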

Five human figures standing outside an enormous golden orb. Still from the 1998 film Sphere. — Figure 06 A still from the movie Sphere (1998).

It is safe to assume that it is not worth the effort to take the conversational interface as is and plant it in your next big product idea. If you do, it starts to look like the alien orb from the 1998 movie “Sphere”: the user is pontificating outside your application, with all their instincts telling them to do copywriting-adjacent tasks, and they get no visibility into how the system is interpreting their input.

Since the interaction model is inverted, the human operator may constantly hope that the LLM reads their mind and magically does what they vaguely typed. Instead of the UI being buttons, menus, and a layout mapped to intuitive actions, actions are executed without any such objects grounding the user’s metaphor in an interaction modality. With a button, you see it, click it, and get deterministic feedback: something happens on the screen. With an LLM, your UI wrapper has to accommodate the user’s interaction with an inferential backend, which puts the labor back on the user to create order out of chaos.
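The contrast between a deterministic click and an inferential backend can be shown with a toy sketch. To be clear about the assumptions: infer below is not an LLM. It is a crude stand-in that picks one of several plausible readings at random, only to mimic the fact that the same vague input may be interpreted differently each time. All names and strings here are hypothetical.

```python
import random


def click(button: str) -> str:
    """Deterministic path: the same button always triggers the same action."""
    handlers = {
        "Checkout": "open checkout page",
        "Login": "open login form",
    }
    return handlers[button]


def infer(text: str, rng: random.Random) -> str:
    """Stand-in for an inferential backend (NOT a real LLM).

    The text is ignored on purpose; a random choice among plausible
    readings mimics the user's uncertainty about what will happen.
    """
    readings = [
        "open checkout page",
        "show cart contents",
        "ask a clarifying question",
    ]
    return rng.choice(readings)


if __name__ == "__main__":
    rng = random.Random()
    print("button:", click("Checkout"), "|", click("Checkout"))
    print("typed: ", infer("I'm done shopping", rng), "|", infer("I'm done shopping", rng))
```

Running it a few times shows the button line never changes, while the typed line may, and the user is the one left to reconcile the difference.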

We know from experience that when an LLM does something, it is tough to judge why it did it. Until we get a stable and public version of Explainable AI (XAI), this orb-shaped LLM might have all the affordances of a tennis ball, yet it still feels alien to design for, and we cannot play tennis with it (yet).

Figure 07 Conceptual LLM ball: affordances without grounding.

I am intentionally (but temporarily) parking the “Ghost in the Shell” type of argument, which asks whether there is consciousness in LLMs. That question handicaps the interaction design paradigm in a way that is hard to resolve in a LinkedIn article, and maybe in my lifetime. Instead, I want to focus on implementing various HFE/HCI techniques with a careful choice of specific tasks for LLM automation. Doing so consistently over time is how we arrive at a cohesive and usable experience while mitigating risk to human life. Product owners must closely examine which aspects of traditional workflows can be automated using these technologies and compare the results against existing usability metrics to see whether there are any significant gains.

And finally, I want to state this in a large quote block:

LLMs are not reducing HCI’s scope of work; they are, in fact, doubling it.

I assume in good faith that product and business owners want to mitigate the risk associated with poor product usability. If so, it’s vital to recognize that these technologies are still evolving and imperfect, and that is a risk you must be aware of and manage. Your product is currently tied to version 1.0 of the tool kit. With tech debt being what it is and system architecture having its own momentum, your product’s technology infrastructure might struggle to adapt quickly to version 2.0, potentially baking in newer risks and unintended consequences. Effective product design with an LLM at its core means doing the HCI research and design effort twice: first to address the human operator’s needs, and then a second set of HCI activities for the purchased version of the LLM. This process is no different from evaluating the pros and cons of designing an application around relational databases or real-time databases and accounting for how those database designs handicap and enable HCI in specific ways.

We’ll explore this further in Part 3.