productdesignLA

UI Prompt Controls for AI Products

My family and I took our biannual autumn trip to Yosemite National Park recently. I dusted off my camera and quickly got reacquainted with shooting in varying lighting conditions given how quickly the weather can change there. 

I’m not ashamed to admit that I sometimes use the scene modes on my cameras – especially if it’s an important shot that I don’t want to mess up. 

We hiked near the top of Yosemite Falls and I wanted to get a photo of my wife and daughter with the valley down below. I snapped a few and thought miniature mode would be pretty neato: the two figures sharply focused, with the landscape blurred around them.

Miniature Mode example courtesy of DALL-E.

If you’re not familiar with miniature mode, it is basically a setting that makes real-life scenes look like tiny models or “miniatures.” It uses selective focus to blur portions of the image to create the illusion that only part of the scene is in focus, while the rest appears soft or blurred. In other words, like what happens in close-up photos of miniature models.

This narrow depth of field, if you will, means the viewer perceives only a relatively thin strip of the scene as being in focus.

The technique got me thinking about some of the interaction design these days – especially when it comes to AI product interfaces and their corresponding prompt controls. (As an avid ChatGPT user, I think this was also partially triggered by the release of OpenAI’s Canvas, its first UI update in two years.)

Prompt controls are essentially the UI components in and around the chat input field of conversational interfaces. These include everything from helpful hints to guide users and aid discoverability, to the affordances and interaction elements that actually allow users to do things (e.g., buttons, cards, chips, and sliders).
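As a rough illustration, this family of controls could be modeled as a small discriminated union, from passive hints to interactive affordances. The names and shapes below are my own assumptions for the sketch, not any real product’s API:

```typescript
// Hypothetical model of prompt controls; the names and fields here are
// illustrative, not taken from any actual product.
type PromptControl =
  | { kind: "hint"; text: string }                               // passive guidance
  | { kind: "chip"; label: string; insert: string }              // tappable shortcut
  | { kind: "button"; label: string; action: string }            // explicit command
  | { kind: "slider"; label: string; min: number; max: number }; // parameter control

// Produce a one-line description of any control; the switch is
// exhaustive over every `kind` in the union.
function describe(c: PromptControl): string {
  switch (c.kind) {
    case "hint":
      return `Hint: ${c.text}`;
    case "chip":
      return `Chip "${c.label}" inserts: ${c.insert}`;
    case "button":
      return `Button: ${c.label}`;
    case "slider":
      return `Slider: ${c.label} (${c.min}-${c.max})`;
  }
}
```

Modeling the controls as data like this is what lets a product swap, reorder, or A/B test them without touching the rendering logic.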

ChatGPT 4o includes several options that both educate users about what’s possible and provide a kind of shortcut for specific tasks (e.g., Create image).

It makes sense to add functionality to this area of screen real estate. That’s where user attention is. 

However, these design choices also implicitly state that these things are important, either to the user or to the business. Ideally, both.

DALL-E surfaces multiple styles to spark creativity. These refresh and offer alternatives on each visit.
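That refresh-on-each-visit behavior can be sketched in a few lines. The style names and the deterministic, visit-keyed rotation below are my own assumptions for illustration, not how DALL-E actually implements it:

```typescript
// Hypothetical sketch: rotate a fresh subset of style "chips" on each visit,
// so returning users see different suggestions. All names are illustrative.
type Chip = { label: string; prompt: string };

const ALL_STYLES: Chip[] = [
  { label: "Watercolor", prompt: "in a soft watercolor style" },
  { label: "Miniature", prompt: "as a tilt-shift miniature scene" },
  { label: "Line art", prompt: "as minimal black line art" },
  { label: "Isometric", prompt: "as an isometric 3D illustration" },
  { label: "Collage", prompt: "as a paper-cut collage" },
];

// Keying the rotation on a visit counter keeps it simple and testable,
// and guarantees consecutive visits never show an identical set.
function chipsForVisit(visit: number, count = 3): Chip[] {
  const start = (visit * count) % ALL_STYLES.length;
  return Array.from(
    { length: count },
    (_, i) => ALL_STYLES[(start + i) % ALL_STYLES.length]
  );
}
```

A real product would likely personalize or randomize the selection, but even this trivial rotation delivers the design goal: the input area teaches something new each time you return.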

Although input fields such as text boxes have been around since the dawn of GUI-based computing, Google really made the humble search box the star of the show. Their business model, speed of retrieval, and emphasis on ease-of-use really changed the game for quickly accessing information in an online world.

Of course, other text-centric and asynchronous communication products have also continued to augment search bar real estate with UI controls. Think Slack, for example. 

All in all, the modest little field — with its old-school, rectangular shape — has matured to become much mightier — with even a cute makeover consisting of rounded corners! 🤭

Both Google Search and Google Gemini have added Search by voice and Search by image into the field itself.

Much of this design rationale has to do with inline and contextual information benefitting from proximity and similarity. For humans, elements that are visually connected are perceived as more related. That’s just how perception and cognition work.

This approach is also educational – as in, it teaches users what’s possible with the product offering – and it improves discoverability for specific tasks and workflows, like streamlining the first step for creating images or analyzing data 📊.

This is especially important because interfacing with AI is still a new experience for the vast majority of people. Users still tend to rely on their mental models of interacting with search engines.

After all, who can blame them, given that it’s been almost 30 years of typing keywords in to see a list of results — results that then take you out of the search engine experience. Now, users need to edit and refine their prompts conversationally.

It’s a much different interaction model.

So now, with conversational chat interfaces quickly becoming the predominant means of interacting with AI, continued restraint will be key. It’s certainly easy to continue to add things to the screen; much harder to simplify and remove stuff. (Just think of all the fiefdoms within companies that want their thing front and center on home pages, leading to fractured user experiences!)

As indicated above, this comes down to prioritizing user and business value — and how that manifests in the interface.

Although we’ve focused largely on the text box here, it’s really only one part of the AI product experience paradigm that’s quickly emerging.  

Perhaps next, we’ll widen the aperture 📸 a bit and look at how all the other major interface areas are seemingly coming together to create early design patterns (e.g., context sidebars, settings panels) across the AI product landscape.

Marc