Configuring the Word Breaker

Word breaking is the process of splitting text into individual tokens, or words. Many languages, especially those written in the Roman alphabet, use word separators (such as white space) and punctuation to distinguish words, phrases, and sentences. Word breakers rely on language-specific heuristics to provide reliable and accurate results.

Word breaking is more complex for character-based writing systems or script-based alphabets, where the meaning of individual characters is determined from context. A word breaker is vital for the proper indexing of most Asian languages (for example, Japanese, Chinese, and Arabic) and other languages.
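As a rough illustration of why separator-based word breaking is not enough for every language, the following Python sketch splits text on anything that is not a letter or digit. It is not how GFI Archiver or the Windows word breaker works internally; the function name break_words is hypothetical and used only for this example.

```python
import re


def break_words(text: str) -> list[str]:
    """Naive word breaker (illustrative only).

    Returns each run of word characters as one token and treats
    white space and punctuation as separators.
    """
    return re.findall(r"\w+", text)


# White space and punctuation separate the English words cleanly.
english_tokens = break_words("Index the archived email.")

# Chinese text has no spaces between words, so the whole sentence
# comes back as a single token that a search index cannot use well.
chinese_tokens = break_words("我是学生")

print(english_tokens)
print(chinese_tokens)
```

The English sentence yields one token per word, while the Chinese sentence is returned as a single unbroken token. A language-aware word breaker is needed to find the word boundaries inside such text.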

Setting Up the Language Analyzer

1. Select the Configuration tab and click Archive Stores (a collection of email sources, email metadata, and search indexes within GFI Archiver).

2. Select Index Management.

3. Configure one of the following language analyzer options:

Option: Enable built-in word breaker (recommended)
Description: The GFI Archiver language analyzer is enabled by default. For optimal indexing performance, it is highly recommended that you leave it enabled.

Option: Enable Microsoft Windows word breaker
Description: Select this option to disable the GFI Archiver built-in word breaker and use the word breaker of your Windows operating system. Use the Default Language drop-down list to specify the language used to index archived data.

NOTE

If the required language is not listed in the Default Language drop-down list, add it from the Regional settings option in the Windows® Control Panel.

Alternatively, select the Enable automatic language detection check box to let Windows detect the language automatically.