Languages
Zoom provides options for searching sites in different languages and encodings. For additional information, please see “International / foreign language support”.

Encoding and character sets

This sets the encoding and character set (charset) of the resulting search page and index files. Zoom scans files and converts their content from the character set specified on the web pages to the encoding selected here. Files that do not specify a charset are assumed to use the character set selected here.

Use these settings to configure foreign language support and character encoding. First, if your website uses Unicode UTF-8, you must enable the “Use Unicode” option. Otherwise, specify the encoding (also known as charset) used. “windows-1252” is the most common option for English, French, German, and a number of other Latin-based languages.
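The conversion rule described above (use the charset a page declares, otherwise fall back to the configured one) can be pictured with a short Python sketch. This is only an illustration of the idea, not Zoom's implementation; the function name `to_index_encoding` and its parameters are hypothetical.

```python
from typing import Optional


def to_index_encoding(raw: bytes,
                      declared_charset: Optional[str],
                      index_encoding: str = "windows-1252") -> bytes:
    """Re-encode page bytes into the encoding chosen for the index.

    Hypothetical helper: pages without a declared charset are assumed
    to already use the configured index encoding.
    """
    source_charset = declared_charset or index_encoding
    # Undecodable or unencodable characters are replaced rather than
    # raising, so one bad byte does not abort the whole page.
    text = raw.decode(source_charset, errors="replace")
    return text.encode(index_encoding, errors="replace")


# A UTF-8 page converted for a windows-1252 index.
utf8_page = "café".encode("utf-8")
converted = to_index_encoding(utf8_page, "utf-8")

# A page with no declared charset is passed through unchanged.
undeclared = to_index_encoding(b"caf\xe9", None)
```

Characters that do not exist in the target encoding are lost in a conversion like this, which is one reason a UTF-8 site needs the “Use Unicode” option rather than a single-byte charset.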
International searching options

Enable accent/diacritic/ligature insensitivity: This maps all occurrences of accented characters to their non-accented equivalents (e.g. ó, ò, ô, etc. will all be treated as “o”). With this enabled, a user can enter the search word “cliché” and find all occurrences of the word on your website spelt as either “cliché” or “cliche”. You can now enable or disable this feature separately for accents (ó, ò, ô, õ, etc.), umlauts (ä, ë, ï, ö, ü), and ligatures (å, ø, æ).
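As a rough illustration of accent insensitivity, the Python sketch below folds accented letters to their base letters using Unicode decomposition, then compares the folded forms. It is an assumption about how such folding can be done in general, not a description of Zoom's internal mapping; `fold_accents` is a hypothetical name.

```python
import unicodedata


def fold_accents(word: str) -> str:
    """Return the word with combining accent marks removed."""
    # NFD splits "é" into "e" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", word)
    # Drop the combining marks and keep the base letters.
    # Letters such as "ø" and "æ" have no Unicode decomposition,
    # so they would need an explicit mapping table instead.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


folded_query = fold_accents("cliché")
folded_page_word = fold_accents("cliche")
is_match = folded_query == folded_page_word
```

Because both the search word and the indexed word are folded the same way, “cliché” and “cliche” compare equal in either direction.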
Use digraphs for umlauts: When this option is enabled together with umlaut insensitivity, characters like "ö" will be considered the same as "oe" as opposed to "o". Similarly, "ä"="ae", "ü"="ue", etc.

Support single-case languages (e.g. Asian languages): This should only be used if you are using a language that has no case distinction, where problems can occur when the script or indexer attempts to convert case (such as some East Asian languages).

Substring matches for all searches: With this enabled, the script treats a search word that occurs within another word as a match (e.g. a search for the word “hot” will match “hotcake”, “shotgun”, etc.). This may be useful for East Asian languages where words are not separated by spaces and you always want to search for a single character within a set of words.

Strip Arabic diacritic marks in words: This option strips diacritical marks from Arabic words. These marks are typically never searched for and are only used to represent the accurate pronunciation of a word.

Stemming

This option is not available in the JavaScript version. When this feature is enabled, search results will match similar words or words which are derivatives of each other (e.g. plurals). For example, searching for the word "fish" will return pages containing the words "fishes", "fishing", etc. The CGI version features an improved stemmer which can be configured for languages other than English. The PHP and ASP versions only support English stemming. Stemming is not available when single-case matching (i.e. "Support single-case languages") is enabled.
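To make the idea of stemming concrete, here is a deliberately naive suffix-stripping sketch in Python. It is not the algorithm Zoom uses (the documentation above does not describe it); the suffix list and the names `naive_stem` and `stem_match` are assumptions chosen only to reproduce the "fish" example.

```python
# Hypothetical, minimal suffix list; real stemmers use far richer rules.
SUFFIXES = ("ing", "es", "s")


def naive_stem(word: str) -> str:
    """Strip one common English suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_match(query: str, page_word: str) -> bool:
    """Two words match when they reduce to the same stem."""
    return naive_stem(query) == naive_stem(page_word)


stems = [naive_stem(w) for w in ("fish", "fishes", "fishing")]
```

A search for "fish" would then match "fishes" and "fishing" because all three reduce to the same stem, while an unrelated word such as "hotcake" does not. The lowercasing step also hints at why stemming and single-case matching do not combine well.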
Search page language

You can modify the text that appears on the search page and in the search results by customizing the Zoom Language Files (.ZLANG files). Almost every piece of text on the search page can be modified or translated, including “Search results for…” and “x results found”. This allows you to translate the search page into the language of your choice without having to modify the search script.

Zoom also comes with a few pre-translated language files. These serve as examples and also allow you to create French or German search pages straight out of the box by selecting the language from the drop-down menu. For more information on how to create your own translations, please refer to “Translating the search page”.