gettext: Aspects

 
 1.3 Aspects in Native Language Support
 ======================================
 
    For a totally multi-lingual distribution, there are many things to
 translate beyond output messages.
 
    • As of today, GNU ‘gettext’ offers a complete toolset for
      translating messages output by C programs.  Perl scripts and shell
      scripts will also need to be translated.  Even if there are today
      some hooks by which this can be done, these hooks are not
      integrated as well as they should be.
 
    • Some programs, like ‘autoconf’ or ‘bison’, are able to produce
      other programs (or scripts).  Even if the generating programs
      themselves are internationalized, the generated programs they
      produce may need internationalization on their own, and this
      indirect internationalization could be automated right from the
      generating program.  In fact, quite usually, generating and
      generated programs could be internationalized independently, as the
      effort needed is fairly orthogonal.
 
    • A few programs include textual tables which might need translation
      themselves, independently of the strings contained in the program
      itself.  For example, RFC 1345 gives an English description for
      each character which the ‘recode’ program is able to reconstruct at
      execution.  Since these descriptions are extracted from the RFC by
      mechanical means, translating them properly would require a prior
      translation of the RFC itself.
 
    • Almost all programs accept options, which are often spelled out so
      as to be descriptive for English readers; one might want to
      consider offering translated versions of program options as well.
 
    • Many programs read, interpret, compile, or are somewhat driven by
      input files which are texts containing keywords, identifiers, or
      replies which are inherently translatable.  For example, one may
      want ‘gcc’ to allow diacriticized characters in identifiers or use
      translated keywords; ‘rm -i’ might accept something other than ‘y’
      or ‘n’ for replies, etc.  Even if the program eventually produces
      most of its output in a foreign language, one has to decide
      whether the input syntax, option values, etc., are to be localized
      or not.
 
    • The manual accompanying a package, as well as all documentation
      files in the distribution, could surely be translated, too.
      Translating a manual, with the intent of later keeping up with
      updates, is generally a major undertaking in itself.
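
    The question of localized replies, raised above for ‘rm -i’, can be
 illustrated with a short C fragment.  This is only a sketch under stated
 assumptions, not code taken from ‘gettext’: it assumes a POSIX system
 that provides ‘nl_langinfo’ with the ‘YESEXPR’ item and the ‘regcomp’
 family of functions, and the helper name ‘is_affirmative’ is
 hypothetical.

```c
#include <langinfo.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if REPLY counts as "yes" in the current locale, 0 if it
   does not, and -1 if the locale's pattern cannot be compiled.  */
int is_affirmative(const char *reply)
{
    regex_t re;
    int matched;

    /* YESEXPR is an extended regular expression supplied by the
       LC_MESSAGES category of the current locale.  */
    if (regcomp(&re, nl_langinfo(YESEXPR), REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    matched = (regexec(&re, reply, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

    A program would call ‘setlocale (LC_ALL, "")’ once at startup, so
 that the pattern reflects the user's language rather than the default
 "C" locale.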
 
    As we already stressed, translation is only one aspect of locales.
 Other internationalization aspects are system services and are handled
 in GNU ‘libc’.  There are many attributes that are needed to define a
 country’s cultural conventions.  These attributes include, besides the
 country’s native language, the formatting of the date and time, the
 representation of numbers, the symbols for currency, etc.  These local
 “rules” are termed the country’s locale.  The locale represents the
 knowledge needed to support the country’s native attributes.
 
    There are a few major areas which may vary between countries and
 hence define what a locale must describe.  The following list helps
 put multi-lingual messages into the proper context of other tasks
 related to locales.  See the GNU ‘libc’ manual for details.
 
 _Characters and Codesets_
 
      The codeset most commonly used throughout the USA and most
      English-speaking parts of the world is the ASCII codeset.  However,
      there are many characters needed by various locales that are not
      found within this codeset.  The 8-bit ISO 8859-1 codeset has most
      of the special characters needed to handle the major European
      languages.  However, in many cases, choosing ISO 8859-1 is
      nevertheless not adequate: it doesn’t even handle the major
      European currency, since it lacks the Euro sign.  Hence each locale
      will need to specify which codeset it needs to use and will need to
      have the appropriate character handling routines to cope with the
      codeset.
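
      As a small illustration of how a program can find out which
      codeset the current locale specifies, here is a sketch that
      assumes a POSIX system providing ‘nl_langinfo’ with the ‘CODESET’
      item.  The helper name ‘current_codeset’ is hypothetical.

```c
#include <langinfo.h>
#include <locale.h>

/* Return the name of the codeset selected by the LC_CTYPE category
   of the current locale, for example "UTF-8" or "ISO-8859-1".  */
const char *current_codeset(void)
{
    return nl_langinfo(CODESET);
}
```

      The result is only meaningful for the user's environment after a
      call such as ‘setlocale (LC_CTYPE, "")’; before that, the program
      runs in the "C" locale.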
 
 _Currency_
 
      The currency symbols vary from country to country, as does the
      position of the symbol relative to the amount.  Software needs to
      be able to transparently display currency figures in the native
      mode for each locale.
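
      The ISO C function ‘localeconv’ exposes these conventions.  The
      following sketch shows one way of using a few of its fields; the
      helper ‘format_money’ is hypothetical, handles only non-negative
      amounts, ignores digit grouping, and picks its own fallbacks
      where the locale leaves a field unspecified.

```c
#include <limits.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: write a non-negative AMOUNT into BUF following
   the LC_MONETARY conventions of the current locale.  Returns the
   value returned by snprintf.  */
int format_money(char *buf, size_t size, double amount)
{
    const struct lconv *lc = localeconv();
    /* The "C" locale leaves these fields empty or set to CHAR_MAX,
       so fall back to defaults chosen for this sketch.  */
    const char *point = *lc->mon_decimal_point ? lc->mon_decimal_point : ".";
    int digits = (lc->frac_digits == CHAR_MAX) ? 2 : lc->frac_digits;
    const char *space = (lc->p_sep_by_space == 1) ? " " : "";
    char number[64];
    long long scale = 1, total;
    int i;

    for (i = 0; i < digits; i++)
        scale *= 10;
    total = (long long)(amount * scale + 0.5);  /* round to nearest */

    if (digits > 0)
        snprintf(number, sizeof number, "%lld%s%0*lld",
                 total / scale, point, digits, total % scale);
    else
        snprintf(number, sizeof number, "%lld", total);

    if (lc->p_cs_precedes)      /* symbol goes before the amount */
        return snprintf(buf, size, "%s%s%s",
                        lc->currency_symbol, space, number);
    return snprintf(buf, size, "%s%s%s",
                    number, space, lc->currency_symbol);
}
```

      POSIX systems also offer ‘strfmon’, which covers grouping and
      negative amounts and is usually preferable to hand-written
      formatting.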
 
 _Dates_
 
      The format of dates varies between locales.  For example, Christmas
      day in 1994 is written as 12/25/94 in the USA and as 25/12/94 in
      Australia.  Other countries might use ISO 8601 dates, etc.
 
      The time of day may be written as HH:MM, HH.MM, or otherwise.  Some
      locales require time to be specified in 24-hour mode rather than as
      AM or PM.  Further, the nature and yearly extent of the Daylight
      Saving correction vary widely between countries.
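
      In C, the ‘%x’ and ‘%X’ conversions of ‘strftime’ request the
      date and time representations preferred by the current locale.
      The wrappers below are a minimal sketch; their names are
      hypothetical.

```c
#include <time.h>

/* Format the date in TM using the locale's preferred date
   representation.  Returns the number of characters written.  */
size_t format_local_date(char *buf, size_t size, const struct tm *tm)
{
    return strftime(buf, size, "%x", tm);
}

/* Likewise for the time of day.  */
size_t format_local_time(char *buf, size_t size, const struct tm *tm)
{
    return strftime(buf, size, "%X", tm);
}
```

      In the "C" locale, ‘%x’ is equivalent to ‘%m/%d/%y’, so Christmas
      day in 1994 comes out as 12/25/94; other locales produce their own
      ordering once selected with ‘setlocale (LC_TIME, "")’.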
 
 _Numbers_
 
      Numbers can be represented differently in different locales.  For
      example, the following numbers are all written correctly for their
      respective locales:
 
           12,345.67       English
           12.345,67       German
            12345,67       French
           1,2345.67       Asia
 
      Some programs could go further and use different unit systems, like
      English units or Metric units, or even take into account variations
      in how numbers are spelled out in full.
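
      For the radix character at least, the standard C library already
      follows the ‘LC_NUMERIC’ category.  The following is a small
      sketch with hypothetical helper names; it does not insert
      grouping separators.

```c
#include <locale.h>
#include <stdio.h>

/* Write VALUE with two decimals.  The radix character used by
   snprintf comes from the LC_NUMERIC category of the current locale.  */
int format_number(char *buf, size_t size, double value)
{
    return snprintf(buf, size, "%.2f", value);
}

/* Return the radix character of the current locale.  */
const char *radix_character(void)
{
    return localeconv()->decimal_point;
}
```

      The grouping separator is available in the same structure as
      ‘thousands_sep’, together with the ‘grouping’ field that describes
      the size of each group of digits.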
 
 _Messages_
 
      The most obvious area is the language support within a locale.
      This is where GNU ‘gettext’ provides the means for developers and
      users to easily change the language that the software uses to
      communicate with the user.
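
      A minimal sketch of what this looks like in a C program follows.
      The functions ‘setlocale’, ‘bindtextdomain’, ‘textdomain’ and
      ‘gettext’ are the real interfaces; the domain name "myprog", the
      catalog directory and the helper names are placeholders chosen for
      this example.

```c
#include <libintl.h>
#include <locale.h>

#define _(msgid) gettext(msgid)

/* Select the user's locale and the message catalog to use.  The
   domain name and directory are placeholders for this sketch.  */
void init_i18n(void)
{
    setlocale(LC_ALL, "");
    bindtextdomain("myprog", "/usr/share/locale");
    textdomain("myprog");
}

/* Return the greeting in the user's language, or the original
   English string when no translation is available.  */
const char *greeting(void)
{
    return _("Hello, world!");
}
```

      On systems whose C library does not include these functions,
      the program additionally has to be linked with ‘-lintl’.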
 
    These areas of cultural conventions are called _locale categories_.
 It is an unfortunate term; _locale aspects_ or _locale feature
 categories_ would be better terms, because each “locale category”
 describes an area or task that requires localization.  The concrete data
 that describes the cultural conventions for such an area and for a
 particular culture is also called a _locale category_.  In this sense, a
 locale is composed of several locale categories: the locale category
 describing the codeset, the locale category describing the formatting of
 numbers, the locale category containing the translated messages, and so
 on.
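
    In the C interface, each locale category corresponds to a constant
 such as ‘LC_CTYPE’, ‘LC_NUMERIC’ or ‘LC_MESSAGES’ that can be passed to
 ‘setlocale’.  The sketch below, with hypothetical helper names, shows
 that categories can be queried and set independently of each other.

```c
#include <locale.h>

/* Return the name of the locale currently in effect for CATEGORY,
   for example LC_CTYPE, LC_NUMERIC, LC_TIME or LC_MESSAGES.  */
const char *locale_for(int category)
{
    return setlocale(category, NULL);
}

/* Adopt the user's locale for every category, then switch number
   formatting alone back to the "C" locale, as a program might do
   when it writes data files that must stay machine-readable.  */
void use_user_locale_with_c_numbers(void)
{
    setlocale(LC_ALL, "");
    setlocale(LC_NUMERIC, "C");
}
```

    After the second function has run, messages and dates follow the
 user's environment while ‘locale_for (LC_NUMERIC)’ reports "C".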
 
    Components of locale outside of message handling are standardized in
 the ISO C standard and the POSIX:2001 standard (also known as the SUSv3
 specification).  GNU ‘libc’ fully implements this, and most other modern
 systems provide more or less reasonable support for at least some of
 the missing components.