The following is one (of many) lessons I learned from being a Product Manager on Amazon Alexa and Google Home voice projects at TribalScale. The voice space is relatively new, and I’ve learned a lot about the platform after being in the trenches with design and engineering over the past year. With my clients, we’ve explored feasibility, limitations, and best practices for this emerging platform.
I will share more of my insights in future blogs, but for now, here’s one key takeaway about lists:
Getting users to choose from a long list of items is straightforward on web or mobile platforms, especially with dropdown lists or pickers. However, designing for voice is a whole lot harder from both a “delivery” and a “selection” perspective. Here’s why:
A general design best practice for voice is that your device’s responses to a user request should be brief and pass the “One-Breath Test”:
If you can say the words at a conversational pace and in one breath, the length of the response is usually good.
In the case where you have a substantial number of selectable options (i.e. 6 or more), you can’t simply present them all up-front because it would take too long. Not only that, it might be hard for the user to remember all the options presented by the voice device, especially if the options themselves are long and complex.
To address the response length issue, one method is to deliver the options in multiple sets, asking the user “Would you like to hear more?” at the end of each set. This approach won’t be as effective if you have too many sets, because users might have trouble remembering the options from one set to the next. A more extreme design method is to present one option per voice response and ask the user whether that option is the one they want or whether they want to hear the next one. Continue until you’ve exhausted all selectable options.
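The multi-set approach can be sketched as a simple paging function. This is a minimal, platform-agnostic sketch; the function name, page size, and prompt wording are all hypothetical, not from any particular voice SDK:

```python
# Minimal sketch of paginated option delivery. Assumes the skill keeps
# track of the current page number between turns (e.g. in session state).

PAGE_SIZE = 3  # options per "breath"

def build_options_prompt(options, page):
    """Build the spoken prompt for one page of options."""
    start = page * PAGE_SIZE
    chunk = options[start:start + PAGE_SIZE]
    if not chunk:
        return "Those were all the options."
    prompt = "You can choose: " + ", ".join(chunk) + "."
    if start + PAGE_SIZE < len(options):
        # More options remain, so offer to continue.
        prompt += " Would you like to hear more?"
    return prompt
```

When the user answers “yes,” the skill re-invokes the same function with `page + 1`; any other answer drops into the selection flow.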
I recommend limiting the options you present to begin with, giving users only the options that are top priority for your voice use-case. Don’t feel pressured to port the exact same experience or options from mobile or web to voice. Think of voice as a separate experience altogether. What’s effective on mobile may not be effective on voice platforms, and vice versa. I’m a firm believer that existing paradigms on web/mobile don’t necessarily translate directly into the greatest voice experiences, so feel free to explore better and newer options. Get creative!
Another recommendation is to re-organize options into multiple sessions or flows where applicable.
For example: a user is looking for a recipe and has a choice of 20 options. The recipes inherently fall into different categories: Appetizers, Mains, and Desserts. Instead of asking the user “Which recipe do you want?” and proceeding to list all 20 options (or 2 sets of 10) up-front in one continuous session, break the options down further between sessions by using your categories. The delivery time might be longer, but it’s more user-friendly.
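As a rough illustration, that category-first flow could be modeled as a two-step prompt: ask for a category in one turn, then list only that category’s recipes in the next. The recipe names and function names below are made up for the example:

```python
# Hypothetical two-step category flow for a recipe skill.

RECIPES = {
    "appetizers": ["bruschetta", "stuffed mushrooms"],
    "mains": ["lasagna", "pad thai", "roast chicken"],
    "desserts": ["tiramisu", "brownies"],
}

def category_prompt():
    """First turn: offer the categories instead of all recipes."""
    names = ", ".join(RECIPES)
    return f"I have {names}. Which category would you like?"

def recipe_prompt(category):
    """Second turn: list only the chosen category's recipes."""
    options = RECIPES.get(category.lower(), [])
    if not options:
        return "Sorry, I don't have that category."
    return "In " + category.lower() + ", you can make: " + ", ".join(options) + "."
```

Each turn now stays comfortably within the One-Breath Test, even though the total catalog is large.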
The other challenge with lists in voice apps is how a user selects an option vocally. Part of this issue stems from limitations of the voice platforms themselves. Without diving too deep into the technical details, voice skills can only (to a certain extent) interpret context as far as you initially define it in your voice models (utterances, entities, slots, etc.). That means that for the most accurate results, your selectable options should be predefined in your models.
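On Alexa, for instance, predefining options means listing them as values of a custom slot type in the interaction model. A minimal fragment for a hypothetical recipe slot might look like this (slot and value names are made up for illustration):

```json
{
  "name": "RecipeName",
  "values": [
    { "name": { "value": "lasagna", "synonyms": ["lasagne"] } },
    { "name": { "value": "tiramisu" } }
  ]
}
```

Anything the user says that isn’t covered by these values (or their synonyms) is much more likely to be misrecognized or routed to the wrong intent.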
However, this is not easy to do for custom skills that pull dynamic data from APIs. Take a news-headline custom skill that pulls its article content from a third-party API as an example. The options being presented to users are constantly changing and can’t be defined up-front. A user can’t simply ask their device to pull up a specific article by the headline alone; the voice assistant won’t understand the context of the ask. It doesn’t know that the user is trying to pick an article from a list of options, and instead those words might trigger another feature within the voice app, which was not the original intention.
I suggest adding numeric or alphabetic values to your options and instructing the user to select option “1, 2, 3” or “A, B, C” instead of the full option name. It’s easier to code the logic on the custom-skill side to tie in the context: “If the user says 2, give them the second article that was presented to them from the API.” This way you can work around the limitation while still giving your voice models a full (and bounded) context.
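A sketch of that number-based selection, assuming the skill can store state between turns (the session dict and function names are hypothetical, not from any particular voice SDK):

```python
# Number-based selection: remember which items were just read out, then
# map the spoken number back to the item it referred to.

def present_headlines(articles, session):
    """Read out up to three headlines, numbered, and remember them."""
    session["presented"] = articles[:3]
    numbered = [f"Option {i + 1}: {a}"
                for i, a in enumerate(session["presented"])]
    return ". ".join(numbered) + ". Say the number you want."

def resolve_choice(number_said, session):
    """Map a spoken number (e.g. "2") back to the presented article."""
    presented = session.get("presented", [])
    index = int(number_said) - 1
    if 0 <= index < len(presented):
        return presented[index]
    return None  # out of range; the skill should reprompt
```

Because the voice model only needs to recognize a small, fixed set of numbers, recognition stays reliable even though the article list changes constantly.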
Another potential solution, if the options themselves are quite long, is the “One option at a time” flow mentioned previously.
This method achieves the same end result, but the experience has better flow and is simpler for the user to understand.
When it comes to delivery, the goal is to find the best balance between how much information the voice assistant delivers to the user and how long that delivery takes for your given use-case. Use the methods described above to break the options down into sets and/or sessions. For selection, be mindful of the platform limitations: ensure users can effectively select the option they intend, and that the platform understands that selection. Good luck!
Leo Lee is a product manager at TribalScale with an engineering background. He has over 5 years of experience working with Fortune 500 companies, developing and building the strategy of their mobile, web, and emerging-tech products that are touched by millions of users. He has quickly become a voice technology expert in VUI design and development through his work on TribalScale’s Alexa and Google Action projects with major sports associations and media companies.