Get Faster and More Relevant Search Results (Generally Available)

We’ve updated the search engine, bringing you faster, smarter search and more relevant results throughout Salesforce, including global search, sidebar search, and advanced search in the full Salesforce site, Salesforce1, and your custom search implementations that use the SOSL search API.
Available in: All Editions

In Winter ’15, Salesforce Knowledge article search moved to the new search infrastructure. In Spring ’15, we’re extending this infrastructure to all Salesforce search features. Before Spring ’15, this broader rollout was available only through a pilot program.

These improvements and new search capabilities become available on a rolling basis between the Spring ’15 release and the Summer ’15 release.

Faster indexing
It now takes less time for records to be searchable after they’re created or updated.
Improved tokenization
A key enhancement of the new search infrastructure is the switch from bigram tokenization to morphological tokenization. When a user searches, the search engine returns results whose indexed tokens match the tokens in the search string. Because content is now indexed more accurately, search results contain fewer irrelevant matches.
Morphological tokenization segments text into meaningful words, so searches in Asian languages where spaces don’t reliably mark word boundaries, such as Chinese, Japanese, Korean, and Thai (CJKT), return accurate results. Previously, the search engine indexed a string of characters by splitting it into overlapping pairs of adjacent characters, known as bigrams.
For example, before Spring ’15, a Japanese search for 京都 (Kyoto) incorrectly returned records containing 東京都 (Tokyo Prefecture).
With bigram tokenization, 東京都 (Tokyo Prefecture) was split into these bigrams:

東京 (Tokyo)
京都 (Kyoto)
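To make the bigram problem concrete, the following Python sketch splits a string into overlapping character pairs and treats a record as a match when every query bigram appears in its index. The function names and the all-tokens-must-match rule are illustrative assumptions, not Salesforce's implementation.

```python
# Illustrative sketch of character-bigram tokenization (not Salesforce's code).


def bigram_tokens(text: str) -> list[str]:
    """Split a string into overlapping two-character tokens."""
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


def matches(query: str, indexed_text: str) -> bool:
    """Assumed rule: a record matches if every query token is in its index."""
    index = set(bigram_tokens(indexed_text))
    return all(token in index for token in bigram_tokens(query))


print(bigram_tokens("東京都"))    # ['東京', '京都']
print(matches("京都", "東京都"))  # True: the false positive described above
```

Because the middle and last characters of 東京都 happen to form 京都, the index contains a Kyoto token even though the text never mentions Kyoto.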

With morphological tokenization, the same phrase is correctly segmented into these tokens:

東京 (Tokyo)
都 (Prefecture)

Both tokens are meaningful in this context, and 京都 (Kyoto) is no longer produced as a token.

Now, a search for 京都 (Kyoto) returns only results that include 京都 (Kyoto) and not 東京都 (Tokyo Prefecture).
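Real morphological analyzers rely on large dictionaries and statistical models. The Python sketch below substitutes a much simpler technique, greedy longest-match segmentation against a tiny hypothetical dictionary, only to show why dictionary-based segmentation stops 京都 from matching 東京都. The dictionary contents and function names are assumptions for illustration.

```python
# Stand-in for morphological tokenization: greedy longest-match segmentation
# against a tiny, hypothetical dictionary. Real analyzers use large
# dictionaries and statistical models; this only illustrates the idea.

DICTIONARY = {"東京", "京都", "都"}  # hypothetical word list


def segment(text: str, dictionary: set[str] = DICTIONARY) -> list[str]:
    """Split text into dictionary words, preferring the longest match at each position."""
    longest = max(len(word) for word in dictionary)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary or size == 1:
                tokens.append(candidate)
                i += size
                break
    return tokens


def matches(query: str, indexed_text: str) -> bool:
    """Assumed rule: a record matches if every query token is in its index."""
    index = set(segment(indexed_text))
    return all(token in index for token in segment(query))


print(segment("東京都"))          # ['東京', '都']
print(matches("京都", "東京都"))  # False
print(matches("京都", "京都"))    # True
```

Greedy longest-match happens to pick the right split here; production analyzers resolve ambiguous boundaries with far more context.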

Important

If you use CJKT languages, we recommend testing your business scenarios in a sandbox to confirm that integrations relying on SOSL continue to work as expected after the upgrade.

Limitation for Japanese-language users searching records tokenized as Chinese
If a record contains at least 300 characters and consists only of Kanji (no Katakana or Hiragana), its content is tokenized as Chinese. As a result, a Japanese-language user searching for this record doesn’t find it in search results. Kanji-only records with fewer than 300 characters are tokenized as both Chinese and Japanese.
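The 300-character rule above can be expressed as a small decision function. In this Python sketch, the function name, the ISO language codes it returns, the Unicode range used to detect Kanji, and the choice to ignore whitespace when counting characters are all assumptions; only the threshold and the two outcomes come from the text.

```python
from typing import Optional

# Hypothetical sketch of the 300-character rule for Kanji-only content.
# Function name, language codes, and whitespace handling are assumptions.

KANJI_ONLY_THRESHOLD = 300  # from the release note


def is_kanji(ch: str) -> bool:
    # CJK Unified Ideographs block, U+4E00 to U+9FFF. Kana (U+3040 to U+30FF)
    # falls outside this range, so any Hiragana or Katakana fails the check.
    return "\u4e00" <= ch <= "\u9fff"


def tokenization_languages(text: str) -> Optional[list[str]]:
    """Return the languages Kanji-only text is tokenized as, or None if the rule doesn't apply."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars or not all(is_kanji(ch) for ch in chars):
        return None  # contains kana or other characters: this rule doesn't apply
    if len(chars) >= KANJI_ONLY_THRESHOLD:
        return ["zh"]  # Chinese only, so Japanese-language searches miss the record
    return ["zh", "ja"]  # shorter Kanji-only text: tokenized as Chinese and Japanese


print(tokenization_languages("東京都" * 100))  # ['zh']
print(tokenization_languages("東京都"))        # ['zh', 'ja']
print(tokenization_languages("きっと来る"))    # None
```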
Improved alphanumeric search
Thanks to more efficient handling of punctuation, searches for specialized strings such as URLs, email addresses, and phone numbers return better results. The following punctuation symbols are removed from the beginning and end of tokens, in both indexed content and users’ searches: < > [ ] { } ( ) ! , . ; : " '. Removing these characters makes it easier for the search engine to recognize when a user is searching for a phone number. For example, (415) is tokenized as 415.
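Here is a minimal Python sketch of the boundary-punctuation rule, assuming tokens are split on whitespace (the real tokenizer is more involved). Python's `str.strip` removes any combination of the given characters from both ends of a string, which leaves interior punctuation in URLs and email addresses untouched.

```python
# Sketch of the boundary-punctuation rule: strip the listed symbols from the
# start and end of each token. Whitespace splitting is a simplifying assumption.

BOUNDARY_PUNCTUATION = "<>[]{}()!,.;:\"'"


def trim_token(token: str) -> str:
    """Remove listed punctuation from both ends of a token."""
    return token.strip(BOUNDARY_PUNCTUATION)


def tokenize(text: str) -> list[str]:
    """Split on whitespace, trim boundary punctuation, and drop empty tokens."""
    trimmed = (trim_token(t) for t in text.split())
    return [t for t in trimmed if t]


print(tokenize("Call (415) or email <jane@example.com>."))
# ['Call', '415', 'or', 'email', 'jane@example.com']
```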

Previously, a Japanese search for きっと、来る didn’t return records containing きっと来る, because the punctuation mark (、) prevented the match. Now, the same search matches きっと来る.

Words that contain both letters and numbers are split into separate tokens, and the full word is indexed as well. For example, web2lead is indexed as these tokens: web2lead, web2, web, 2, and lead. A search for web matches items that contain web2lead, but a search for web2lead returns only results that include the full term, web2lead.

Previously, a search for web2lead returned matches with web, 2, and lead, even if those terms were in separate places within the item.
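The difference between the old and new matching behavior can be sketched in Python by modeling each record as a set of indexed tokens. The helper names and sample records are hypothetical, and case-insensitive matching is left out for brevity.

```python
import re

# Hypothetical comparison of old and new matching for an alphanumeric query.
# Each record is modeled as a set of indexed tokens; case-folding is ignored.


def split_runs(term: str) -> list[str]:
    """Split a term into runs of letters and runs of digits."""
    return re.findall(r"[A-Za-z]+|[0-9]+", term)


def old_match(query: str, record_tokens: set[str]) -> bool:
    # Before Spring '15: every piece had to appear, but anywhere in the record.
    return all(piece in record_tokens for piece in split_runs(query))


def new_match(query: str, record_tokens: set[str]) -> bool:
    # Now: the full term must match an indexed token.
    return query in record_tokens


# Tokens for a record reading "web form 2, lead source" (pieces far apart).
scattered = {"web", "form", "2", "lead", "source"}
# Tokens for a record containing web2lead.
whole = {"web2lead", "web2", "web", "2", "lead"}

print(old_match("web2lead", scattered))  # True: the old false positive
print(new_match("web2lead", scattered))  # False
print(new_match("web2lead", whole))      # True
print(new_match("web", whole))           # True: web still finds web2lead
```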

As another example, a record name that includes letters, numbers, and punctuation is broken up into several tokens.
Record Name: ABC-Record-XYZ1234
Indexed Tokens: ABC-Record-XYZ1234, ABCRecordXYZ1234, ABC, Record, 1234, XYZ1234, XYZ

A search for any of these indexed tokens returns the record ABC-Record-XYZ1234.
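The following Python function is a hypothetical approximation that reproduces the token lists given for web2lead and ABC-Record-XYZ1234. It was reverse-engineered from those two examples alone, so Salesforce's actual tokenization rules may differ for other inputs; the function name is illustrative.

```python
import re

# Hypothetical approximation, reverse-engineered from the web2lead and
# ABC-Record-XYZ1234 examples; Salesforce's actual rules may differ.


def index_tokens(term: str) -> set[str]:
    tokens = {term}  # the full term as written
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", term) if p]
    if len(parts) > 1:
        tokens.add("".join(parts))  # punctuation removed: ABCRecordXYZ1234
        tokens.update(parts)        # each punctuation-delimited part
    for part in parts:
        runs = re.findall(r"[A-Za-z]+|[0-9]+", part)
        if len(runs) > 1:
            tokens.update(runs)  # letter and digit runs: XYZ, 1234
            for left, right in zip(runs, runs[1:]):
                if left.isalpha() and right.isdigit():
                    tokens.add(left + right)  # letters followed by digits: web2
    return tokens


print(sorted(index_tokens("web2lead")))
# ['2', 'lead', 'web', 'web2', 'web2lead']
print("XYZ" in index_tokens("ABC-Record-XYZ1234"))  # True
```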
In addition, when you search for an exact match, either by enclosing the term in quotation marks (“”) or by using sidebar search, special characters are treated as part of the search term, which helps you find the record you’re looking for. For example, an exact-match search for 100!% matches only 100!%, not items with 100%.

Before Spring ’15, if you searched for an exact match for 100!%, we matched items with 100 or 100%.

Improved validation of the AND NOT operator
If a search doesn’t include a term both before and after the AND NOT operator, “and” and “not” are treated as ordinary search terms rather than as an operator. For example, a search for AND NOT apples returns items with the word apples, while a search for oranges AND NOT apples excludes items with the word apples.
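A minimal Python sketch of this validation rule, assuming whitespace-separated words and an uppercase AND NOT operator, shows how the same words are treated differently depending on whether terms surround the operator. The function name and return shape are hypothetical.

```python
# Hypothetical sketch of the AND NOT validation rule. It recognizes only the
# uppercase operator and splits on whitespace; both are simplifying assumptions.


def parse_and_not(query: str) -> tuple[list[str], list[str]]:
    """Return (terms to include, terms to exclude)."""
    words = query.split()
    for i in range(len(words) - 1):
        if words[i] == "AND" and words[i + 1] == "NOT":
            before, after = words[:i], words[i + 2:]
            if before and after:
                return before, after  # valid operator: exclude the terms after it
            break  # no term on one side: keep every word as a search term
    return words, []


print(parse_and_not("oranges AND NOT apples"))  # (['oranges'], ['apples'])
print(parse_and_not("AND NOT apples"))          # (['AND', 'NOT', 'apples'], [])
```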

For more information about searching on the new search infrastructure, see “How Search Works” in the Salesforce Help.