Sobering up on "AI": A Human-Computer Interaction Retrospective (Part 1)

Clusters of dark spheres orbit a hex-tessellated core in a diffuse grey chamber, an abstract visual for the unpredictable dynamics and opaque internal state of large language models. — Figure 00 The system, mid-oscillation.

Everywhere I look in my digital landscape, there is now an alluring “AI” button with its colorful gradients, practically begging me to click it, like answering the call of a mythical siren, asking me to dip my toes in the lukewarm waters of life-changing intelligence outside my own. Few applications have yet to onboard something akin to Artificial Intelligence in our relentless march toward an adopt-or-perish mentality. I can’t help but feel its pull when it is so easy for me to ask it to “Rewrite this email for me” or “Explain Einstein’s postulates like I am five.” This ability to rewrite my email and then dumb down physics is both awe-inspiring and paralyzing. Awe-inspiring because writing emails is still a tedious task, and admittedly, proofreading is painful, so when it saves me some effort, it has at least earned some “Awe.” In the same breath, it is also paralyzing: since I didn’t fully understand Einstein to begin with, I don’t know if I can trust the machine’s attempt at deconstructing his work. To be sure about its output, I still need someone with at least college-level physics to explain it to me. At which point, why bother with this tool? The Gell-Mann amnesia effect is a true phenomenon in my own experience: I’ve never trusted the output of these tools regarding human-computer interaction, my own field, yet I am tempted to trust them on everything else.

The paradox is that we start to believe that thinking machines can do everything, which they cannot, and in return, we begin to feel that we cannot do the things they do, or worse, that we do not want to do those things anymore.

We ask it for the mysteries of the universe, yet when it struggles to give the correct Excel formula to split a date and time into two separate columns, we get frustrated by its artificiality, as if it broke a promise. This bizarre contradiction in interaction is deeply fascinating to me and an all too familiar sight from the observer’s seat in the usability lab.

A man using a stepstool to reach his stove even when he can clearly access it without one. — Figure 01 The AI-powered stepstool for 2025. Commentary on LLM over-reliance.

Designing for products before LLMs felt like walking down the streets of Delhi 6, where every new product was a chance to experience new flavors, clothing, or hidden tombs of knowledge around every corner. Like old Delhi, the product landscape captured a unique aspect of our attempts to solve problems. At first, it gave the impression of pure chaos to a casual observer, but on closer inspection, a system would emerge. I felt delight in studying these new pockets of reality, synthesizing them, and making something of my own for my clients. Cracking the usability code feels much like the quiet pride a therapist experiences when their client connects the dots on their own. It is a subtle, gratifying moment where the solution feels organic, as if it had been there all along, waiting to be uncovered.

Busy street scene with pedestrians, rickshaws, and shopfronts in the narrow lanes near Jama Masjid, Old Delhi. — Figure 02 Streets near Jama Masjid, Delhi. Photo: Edmund Nigel Gall.

After natural language processing systems like IBM’s Watson, we now have large language models, the next iteration of linguistic computation. LLMs draw the landscape of Gold Rush-era California in the mind’s eye. Specifically, that of 19th-century frontier saloons and their player pianos, which would play music by themselves without a human operator. I can only imagine the novelty of this “thing” that looks like a piano, but no one is playing it, and the mixture of fascination and trepidation first-time patrons must have felt. To saloon owners, it must have seemed like an otherworldly and excellent business opportunity to attract customers. The current iteration of language models evoked the same mixture of thoughts and emotions in my group chats. Some people predicted limitless wealth, while others spoke with fear about the inevitable doom of humanity. I was left curious about how things would unfold. Much like the player piano, which has now evolved into our music playlists, LLMs have been around long enough for us to sit with them and play with them to figure out how to use them effectively.

Weathered wooden storefronts (saloon, bank, bath house, and livery stable) lining Mane Street in Pioneertown, California. — Figure 03 Pioneertown, California (1946 movie set). Photo: Matthew Field.

This swing in human emotion is not unfounded or without merit. Every piece of technology has had people pontificate about it, either with wide-eyed optimism or with doomsday predictions about humanity’s future. For example, Instagram is outstanding in its ability to help you grow your business or share art with the world so you, as an artist, can support yourself (optimistic bucket). On the other hand, teaching our grandmother how to change her Instagram password is like pulling teeth with electrified pliers (not so subtle, pessimistic bucket). Let’s be clear, the problem isn’t with our grandma. Password resets are still a horrible maze of confusion, with their resetting, OTP verification, and multi-factor authentication. As the user, she shouldn’t be burdened with this level of challenge.

You see, these kinds of lived experiences with our digital products provide evidence for the spread of our collective hope or despair. Straddling the divide between the inanimate and the human, we, as HF/HCI Engineers, will play a part in deciding where the pendulum stops. The problem isn’t that people oscillate between hope and despair; it is whether we, as the technology’s creators, investors, and even objectors to dark design, will have the foresight to make the pendulum stop in the right place. We don’t need to speculate about the urgency of this moment either; we only need to look at our recent history for clear examples, whether it’s the Bhopal Gas Tragedy (BGT) or the Three Mile Island (TMI) incident. In the TMI incident, there was a lack of human-factors oversight to ensure that critical systems and technology were designed with humans in mind, and the aftermath brought profound changes in energy policy for the country. In the case of BGT, non-compliance with established safety engineering procedures led to one of the worst, if not THE worst, industrial catastrophes, whose repercussions are felt to this day.

Apprentice to the Nuclear Scientist

For the next few moments, imagine you are a fly on the wall in a room filled with scientists and engineers as they work on the next generation of nuclear reactors. The room has the palpable silence of focus and brain sweats. The math is starting to look like hieroglyphics (or maybe that’s just me, the fly). You would feel the weight of the task: designing something not just powerful but something that must be profoundly safe. The significance of this task cannot be overstated, as it must improve on earlier attempts and has the potential to shape the future.

The Three Mile Island nuclear reactor near Harrisburg, Pennsylvania, left an impression on my field of HFE. On March 28, 1979, Unit 2 of the reactor had a partial meltdown. TMI’s story is easily accessible for anyone who wants to refresh their memory of the incident and how it changed the course of energy production in the USA. We now know, from several investigative agencies’ post-mortems of the incident, what a mistake it was to overlook Human Factors. For future architects and scientists, publicly accessible guidelines were developed after the TMI investigation to standardize control panel and interface design. The Nuclear Regulatory Commission created these documents explicitly to specify critical aspects of human-system interaction, informing future generations how to design the control panel, prioritize alarms, and lay out interface elements. Since their release, the documents have been updated as operating systems improved and complex graphical user interfaces arrived.

Black-and-white photo of operators crowded around banks of analog dials and indicator panels in the Three Mile Island Unit 2 control room during the 1979 crisis. — Figure 04 TMI Unit 2 control room during the 1979 crisis. NRC via Flickr.

The aftermath of TMI significantly changed how we engineer for safety in the United States. From cognitive ergonomics to macro-ergonomics, every aspect of human interaction with systems was reevaluated. Looking back at the news stories of that time, it’s clear that there was a fundamental erosion of trust in nuclear energy. Even though the residents of Harrisburg did not have to experience loss of life during or after the incident (thankfully), TMI did create an energy debate, and the words “Never Again” echoed. One side declared it publicly, with passion and heart, against nuclear power, while the other whispered it in academic corridors, quietly shouldering the responsibility for a better path forward.

It is humbling to think about how much humanity learned from that moment. Now, we find ourselves taking a similar plunge; we collectively paste our thoughts, our businesses, and our Google Drives into Gen-“AI”. It is not just a technology but a trillion-dollar experiment reshaping how we do things. The potential is breathtaking, but so is the risk, and if it all turns out to be uneventful, I have plans to have high tea with ChatGPT. However, if there is merit to the fear of a potential catastrophe, and we continue to commit to integrating this tool into places and products where human lives are at stake, then we must research and design its guardrails now (preferably last year).

Just like those nuclear engineers and scientists, we are designing something that could (potentially) change the course of several industries, for better or worse. And if you think that is hyperbole, consider this: the same system generating your emails and helping you brainstorm last-minute dinner menu ideas could, in theory, craft medical diagnoses and legal arguments, and contribute to policy writing. We know from experience how much back-and-forth it takes to get that darn dinner menu right because we forgot to “prompt” it about the stuff we do not have in the fridge, so the menu implies a trip to the store. We cannot afford to miss a “prompt” in a medical diagnosis or a policy impacting millions of lives. There is no room for a “we will fix it later” mentality.

I am not saying safety falls only to human-factors engineers and human-computer interaction specialists. No, this is a call for collaboration. While the ML researchers are doing the math, we must design the metaphorical nuclear reactor interfaces, dashboards, and control panels to ensure we design for ourselves. So, if we continue to absorb more linguistic computation into our products, let us roll up our sleeves and get our usability Leatherman out. The stakes are high, but so is the opportunity, IF we get it right. And if nothing else, let us make sure the future flies on the wall look at what we are building today and see the right story.

Figure 05 Leatherman multi-tool, folded. Wikimedia / Gerifalte Del Sabana.

However, these models have an unintuitiveness baked into them, which makes them hard to design for when you try to integrate them into an application. Some software engineers would have you believe that command-line interfaces (CLIs) are the correct mental model for interacting with a computer, and they would be right for their particular use cases. However, GUIs are superior for the general audience by a mile. The conversational interface of the big LLMs has fundamentally recreated the command-line problem. Through trial and error, the user must arrive at the correct syntax or memorize a non-intuitive “natural-language-adjacent” syntax to get the computer to do something they could have done with a few clicks in the GUI metaphor. Remember the prompt engineering cheat sheets that became popular a while back? They attempted to solve this syntax problem by standardizing it. And let us not forget the heart of the problem, the black-box nature of these tools: you cannot decipher why an LLM did what it did because the “logic” is opaque even to the ML researchers making these tools.