Search auto-suggest — the drop-down menu of suggestions displayed while typing into a search box — is the first and often the only search interaction that users encounter on a site or intranet. When done correctly, it is nearly invisible, offering both popular and rare search terms, and showing the vocabulary and scope of relevant searches. Web search engines such as Google have millions of queries and decades of search relevance statistics to generate these lists, but smaller sites can tune the suggestions and offer great value to users. While user activity can dynamically add new suggestion terms and re-rank depending on current value, this must be balanced with an ongoing evaluation and curation process to avoid suggesting inappropriate or misleading terms.
Note: auto-suggest differs from interactive typing autocomplete, type-ahead or word completion, because there are usually several possible suggestions, and selecting one is not a simple text substitution but an action that sends a search or goes to a specific content page.
Technically, auto-suggest is a simplified text search, with an index, retrieval, results ranking and presentation. However, each of these is quite different from traditional document or product search.
Auto-suggest search index
The internal representation of auto-suggest index terms has three main elements: searchable text, block lists and display text.
The searchable text is what is matched when the user types and the front end sends the letters typed so far to the auto-suggest backend. It is usually built from frequent, successful user queries. In addition, add lists contribute positive terms such as product names, brands, document titles and new vocabulary. Some of these, such as those dealing with shipping or returns on a commerce site or a travel portal on an intranet, may redirect to specific URLs instead of sending a query to the search engine.
Block lists allow additional control, avoiding suggesting obsolete terms, unreleased products, out-of-scope queries and known misspellings. By managing these lists, domain experts and merchandisers can keep the suggestions current and avoid inappropriate suggestions.
Note: it is extremely important not to mislead users by suggesting terms that have no matches in the search results. This generally happens when a user can only see part of a search index, for example a department or a specific section within a larger system, such as a particular city library’s holdings within a provider’s ebook corpus. Storing a scope flag with each entry in the auto-suggest index, and sending the same flag as part of the retrieval request, avoids this disappointing user experience.
This is even more important when terms related to sensitive topics such as unreleased products, policy changes, or employment should not be displayed unless the user has access permissions. Filtering on the user access permissions allows appropriate retrieval of this material, while avoiding the possibility of other users combining terms to identify topics that are not public, even if they cannot see the documents or records in search results.
The display text includes the correct capitalisation, singular/plural format, diacritics and punctuation, according to the requirements of the site. For example, a certain sporting event should be shown as the Super Bowl, even if a user types superbowl, and the French “thé” (tea) should not be shown as “the”. This may also include variations on a term such as plum (fruit) and plum (colour), or a specific location in a geospatial search.
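The elements described above can be sketched as a small data structure. This is a minimal illustration, not a reference design: the `Suggestion` fields, the `normalise` and `build_index` helpers, the `/help/returns` URL and the blocked product name are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Suggestion:
    searchable_text: str                 # normalised text matched against the typed letters
    display_text: str                    # text shown in the menu
    redirect_url: Optional[str] = None   # set when the entry goes straight to a page
    scope: Optional[str] = None          # section or permission group; None means visible to all


def normalise(text: str) -> str:
    """Case-fold and collapse whitespace; diacritics are kept so 'thé' stays distinct from 'the'."""
    return " ".join(text.casefold().split())


def build_index(entries, block_list):
    """Key entries by normalised searchable text, skipping anything on the block list."""
    blocked = {normalise(term) for term in block_list}
    index = {}
    for entry in entries:
        key = normalise(entry.searchable_text)
        if key not in blocked:
            index[key] = entry
    return index


index = build_index(
    [
        Suggestion("superbowl", "Super Bowl"),
        Suggestion("super bowl", "Super Bowl"),
        Suggestion("thé", "thé"),
        Suggestion("returns", "Returns", redirect_url="/help/returns"),
        Suggestion("mint chip v2", "Mint Chip v2"),
    ],
    block_list=["Mint Chip V2"],  # hypothetical unreleased product
)
```

Here both superbowl and super bowl lead to the same display text, and the blocked entry never reaches the index.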
Search URL and optional query parameters
The query text consists of the words from the suggestion that are passed along to the full-text search engine. In most cases, multi-word queries should be sent with a phrase-match setting. The request could also specify a field, permissions group or other filter, based on the display text.
For known items, such as brand names, categories or departments, the suggestion may simply redirect to the specific page.
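As a rough sketch of how a selected suggestion might be turned into its target, the function below either returns the redirect page or builds a search URL. The `/search` path and the `q` and `scope` parameter names are assumptions for illustration, and quoting the query is only one common way to request a phrase match; the actual syntax depends on the search engine in use.

```python
from urllib.parse import urlencode


def suggestion_target(query_text, redirect_url=None, scope=None, search_path="/search"):
    """Return the URL to follow when a suggestion is selected."""
    if redirect_url:
        # Known items (brands, categories, departments) go straight to their page.
        return redirect_url
    # Assumption: the engine treats a double-quoted query as a phrase match.
    q = f'"{query_text}"' if " " in query_text else query_text
    params = {"q": q}
    if scope:
        params["scope"] = scope  # hypothetical filter parameter
    return f"{search_path}?{urlencode(params)}"
```

For example, `suggestion_target("apple sorbet")` produces `/search?q=%22apple+sorbet%22`, while an entry with a redirect URL bypasses the search engine entirely.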
Matching and retrieval
Suggestion query matching is unlike full-text search in that it generally performs a beginning (left) match on the searchable text, instead of requiring a full token (word) match. For example, a search for plu may match plum or pluot.
The most valuable suggestions are those with exact matches and matches at the beginning of the phrase. When none of those are available, falling back to matching on the beginning of another word is still quite useful. For English-language suggestions, a partial match in the middle of a word is difficult to understand. For many German words, however, it would be clear and helpful.
Example: Beginning (left) matches
- apple sorbet
- apple-celery granita
Fallback token left matches
- green apple ice
- custard apple frozen yogurt
True substring match may be necessary in some cases:
- Orangensaft (orange juice)
As described above, if there is a filter to what a user can see in search results, it’s vital to include the same filter as a flag during suggestion retrieval.
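The matching tiers and the scope filter can be sketched together as follows. This is an illustrative outline under simple assumptions: entries arrive as `(searchable_text, required_scope)` pairs already in rank order, hyphens are treated as token separators, and the function and parameter names are hypothetical.

```python
def match_tier(typed, searchable_text):
    """Return 0 for a left match, 1 for a left match on a later token,
    2 for a substring match, or None when nothing matches."""
    typed = typed.casefold()
    text = searchable_text.casefold()
    if text.startswith(typed):
        return 0
    tokens = text.replace("-", " ").split()
    if any(token.startswith(typed) for token in tokens[1:]):
        return 1
    if typed in text:
        return 2
    return None


def suggest(typed, entries, user_scopes=frozenset(), allow_substring=False, limit=10):
    """Collect matches tier by tier, skipping entries the user is not allowed to see."""
    tiers = {0: [], 1: [], 2: []}
    for text, required_scope in entries:
        if required_scope is not None and required_scope not in user_scopes:
            continue  # same filter the full-text search would apply
        tier = match_tier(typed, text)
        if tier is None or (tier == 2 and not allow_substring):
            continue
        tiers[tier].append(text)
    return (tiers[0] + tiers[1] + tiers[2])[:limit]


entries = [
    ("green apple ice", None),
    ("apple sorbet", None),
    ("custard apple frozen yogurt", None),
    ("apple-celery granita", None),
    ("pineapple sherbet", None),
    ("apple prototype x", "staff"),  # hypothetical restricted entry
]
```

With these entries, typing apple returns the two left matches first, then the two token matches. The substring match and the restricted entry are left out unless `allow_substring` is set or the user has the matching scope.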
The order of suggestion results is not based on relevance, but on an internal value order. This can be precalculated using a set of algorithms based on search frequency and success metrics such as clicks and conversions, preferably calculated using ML features. There should also be boost factors for novelty, and decay factors based on the recent frequency and interactions. Some sites, such as those covering food or decoration, may require higher decay factors to reduce the rank of outdated suggestions, such as santa claus in February.
In the frequent cases where there are ties on ranking values, subsorting the ranks alphabetically makes it easy for users to skim the displayed results.
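One possible shape for such a value calculation is sketched below. The formula is an assumption made for illustration (log-scaled frequency, multiplied by a success rate, an exponential decay and an optional novelty boost), not a recommended or standard scoring model; the parameter names and defaults are hypothetical. Values are rounded before sorting so that near-equal scores count as ties and fall back to alphabetical order.

```python
import math


def suggestion_value(searches, conversions, days_since_last_use,
                     is_new=False, half_life_days=30.0, novelty_boost=1.5):
    """Combine frequency, success and recency into a single precalculated value."""
    success_rate = conversions / searches if searches else 0.0
    decay = 0.5 ** (days_since_last_use / half_life_days)  # halves every half_life_days
    value = math.log1p(searches) * success_rate * decay
    return value * novelty_boost if is_new else value


def rank(scored):
    """scored: iterable of (display_text, value). Highest value first, ties alphabetical."""
    ordered = sorted(scored, key=lambda item: (-round(item[1], 3), item[0].casefold()))
    return [text for text, _ in ordered]
```

A seasonal site could pass a shorter `half_life_days` so that out-of-season terms drop down the list more quickly.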
Note that inappropriate suggestions, even innocuous ones, may be intriguing to users and cause them to select from the menu to see what happens. For example, there may be no ice cream called bandana, but if customers type the term often enough, it may rise to appear in the suggestions. Once suggested, customers may want to know what it means, and select it, creating a problematic feedback loop. To avoid this, the ranking algorithm should track the searches and subsequent conversions very closely, and alert domain experts about anomalies so they can add the term either to the searchable index or to the block list.
Once the list is generated in order, it’s a simple matter to retrieve the first ten (or so) left-matching suggestions. Even on web search and very large commerce sites, the number of choices generally limits itself quickly.
- banana nut bread
- apples and bananas
As explained earlier, if there aren’t enough matches on the left, search again, allowing left matches on other tokens, and rank those after the left matches.
The returned content should be the “display text” versions of the terms, with optional query parameters to send to the full-text search engine. The search engine may redirect these to a particular URL.
In some cases, there will simply be no matches for the letters typed. This can be a spelling problem, so as a fallback, use the search engine spellchecker, and if a very good match is found, display it as the only option. Users will generally select that correct spelling without confusion, as it is what they meant to type in the first place.
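The spelling fallback can be approximated as below. This sketch uses Python's standard `difflib` module as a stand-in for the search engine's own spellchecker, which is an assumption made to keep the example self-contained; the `cutoff` value is also illustrative and would need tuning against real queries.

```python
import difflib


def spelling_fallback(typed, known_terms, cutoff=0.85):
    """Return a single very close known term, or None if nothing is close enough."""
    matches = difflib.get_close_matches(typed.casefold(), known_terms, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With known terms such as chocolate, vanilla and pistachio, typing choclate yields chocolate as the only option, while an out-of-scope query such as iphone yields nothing.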
It may also be a scope problem: for example, there are no relevant items on an iced dessert site for queries such as iphone or n95. This is a case where showing no suggestions gives customers negative but useful feedback.
High-traffic terms with zero matches should be marked as anomalies, so that domain experts can add them either to the internal searchable text or to the block lists.
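A simple way to surface these terms for review is to count the typed strings that returned no suggestions. The log format `(typed_text, suggestion_count)` and the `min_count` threshold below are assumptions for illustration.

```python
from collections import Counter


def zero_match_anomalies(query_log, min_count=50):
    """Return (term, count) pairs for frequent typed strings that produced no suggestions."""
    counts = Counter(
        typed.casefold().strip()
        for typed, suggestion_count in query_log
        if suggestion_count == 0
    )
    return [(term, count) for term, count in counts.most_common() if count >= min_count]
```

The resulting list is meant as input to a human review step, not as an automatic update to the add or block lists.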
Suggestion menu user interface
In the display of suggestions, the most common user interface bolds the text that matches what users have already typed, and A/B testing tends to support that. However, some sites, such as Yelp, successfully present the suggestions with the term that matches in regular weight, and the remaining text bolded.
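Both highlighting styles can be produced by one small helper, sketched here. The use of `<b>` tags and the `bold_match` parameter are assumptions for the example; a real front end may use CSS classes or a component framework instead.

```python
import html
import re


def highlight(display_text, typed, bold_match=True):
    """Mark up the first case-insensitive occurrence of the typed text.

    bold_match=True bolds the typed portion; False bolds the remaining text.
    """
    found = re.search(re.escape(typed), display_text, flags=re.IGNORECASE) if typed else None
    if found is None:
        return html.escape(display_text)
    before = html.escape(display_text[:found.start()])
    match = html.escape(found.group(0))
    after = html.escape(display_text[found.end():])
    if bold_match:
        return f"{before}<b>{match}</b>{after}"
    bold_before = f"<b>{before}</b>" if before else ""
    bold_after = f"<b>{after}</b>" if after else ""
    return f"{bold_before}{match}{bold_after}"
```

Escaping the text before adding markup keeps user-derived suggestion text from being interpreted as HTML.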
Images in the suggestions menu should be limited to controlled suggestions such as brand, type or very specific products. Even in ecommerce sites, it’s difficult to create and curate small images that are sufficiently distinct to be recognisably different in this situation.
Auto-suggest evaluation: an ongoing process
To measure and evaluate auto-suggest usage, the system should log whether the menu was displayed, whether one of the entries was selected, and at what position. This can be correlated with ongoing results of quality testing. For A/B testing, the logging should also indicate which test setting was used. These results can also be compared to non-suggestion queries and (where available) SEO query metrics to identify whether the auto-suggestions are performing well.
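The logged fields and a few summary metrics might look like the sketch below. The `SuggestEvent` record, its field names and the chosen metrics are hypothetical; they only illustrate the kind of data described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SuggestEvent:
    menu_displayed: bool
    selected_position: Optional[int] = None  # 1-based position; None if nothing was selected
    test_variant: Optional[str] = None       # A/B test setting, if any


def summarise(events):
    """Compute display rate, selection rate and mean selected position."""
    shown = [e for e in events if e.menu_displayed]
    selected = [e for e in shown if e.selected_position is not None]
    return {
        "display_rate": len(shown) / len(events) if events else 0.0,
        "selection_rate": len(selected) / len(shown) if shown else 0.0,
        "mean_selected_position": (
            sum(e.selected_position for e in selected) / len(selected) if selected else None
        ),
    }
```

Grouping the events by `test_variant` before calling `summarise` gives one set of figures per A/B test setting.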
This chapter was originally published in Search Insights 2021.