Internationalization - A growing pain for Internet protocol design

International characters are becoming a growing pain for Internet protocol design.
Many protocols handle only 7-bit ASCII at the basic protocol level, while demand is increasing for expressing information in local languages at the application layer. Various protocols, such as Internet mail and DNS, have addressed this issue by defining conventions for carrying international characters over 7-bit ASCII. The problem is fairly straightforward as long as the task is limited to presenting data in a local language context, but it becomes very hard when the task expands to comparing canonicalized strings from different sources. It is harder still if matching of encoded character strings must be consistent with visual matching.
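To make the three levels of difficulty concrete, here is a minimal Python sketch. It is only an illustration: the example strings are made up for this purpose, Python is an arbitrary choice, and Python's built-in "idna" codec follows the older IDNA 2003 rules, so it should not be read as how any particular protocol or registry behaves today.

```python
import unicodedata

# 1. Carrying international characters over 7-bit ASCII (the DNS convention).
#    "b\u00fccher" is "bücher"; the built-in codec applies IDNA 2003 rules.
name = "b\u00fccher.example"
ace = name.encode("idna")          # ASCII-compatible encoding of the name
print(ace)                         # b'xn--bcher-kva.example'
print(ace.decode("idna") == name)  # True: the encoding round-trips

# 2. Comparing strings from different sources requires canonicalization.
#    Both strings render as "café" but use different code point sequences.
composed = "caf\u00e9"       # precomposed LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"    # "e" followed by COMBINING ACUTE ACCENT

def canonical(s: str) -> str:
    """Return the NFC (canonical composed) form of s."""
    return unicodedata.normalize("NFC", s)

print(composed == decomposed)                        # False
print(canonical(composed) == canonical(decomposed))  # True

# 3. Visual matching is a separate problem: normalization does not unify
#    look-alike characters from different scripts.
latin = "paypal"
mixed = "p\u0430ypal"        # second letter is CYRILLIC SMALL LETTER A
print(canonical(latin) == canonical(mixed))          # False
```

The first step is mechanical, the second depends on every party agreeing on the same normalization form, and the third cannot be solved by encoding rules alone because the two strings really are different text.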

The technical plenary at the 76th IETF in Hiroshima (November 8-13, 2009) recently focused on this particular problem.
(Presentation slides from the plenary on Internationalization)

The following was stated by the Internet Architecture Board (IAB) in connection with the plenary session:


IETF 76 Technical Plenary, 1630-1930 Thursday 12th November 2009.

Internationalization in Names and Other Identifiers


This technical plenary can be seen as following on from the plenary at
IETF68 on March 2007 in Prague on "Internationalization and Internet
Engineering" and builds on an old realization that smooth and interoperable
functioning of the Internet depends on text strings being interpreted in the
same way by all systems connected to it. RFC 2277 (IETF Policy on Character
Sets and Languages, January 1998) and RFC 5198 (Unicode Format for Network
Interchange, March 2008) discuss the use of international text in Internet
protocols.

Internationalization of protocol elements is ongoing work within the IETF
(e.g. IDNABIS WG, EAI WG, IRI BOF) and deployment of these technologies is
ongoing. For example, there has been much publicity and excitement around
the announcement that ICANN plans to start issuing international text
country-code top level domains soon.

The issues and complexities of handling international text affect many of
our protocols, most Internet users, and many IETF Working Groups. However,
the issues are complicated and there is not always a shared understanding
even among experts in the field.

This IETF Technical Plenary presentation will explore examples of the kinds
of things that can go wrong with internationalized text, and examples of
cases where even when things go "right", the result may still not be what
the average human might expect or want. The IAB is currently working on
draft-iab-idn-encoding, with the goal of describing some of the important
issues and problems, and giving guidance for future protocol design to
reduce such problems.

This plenary is a working session of the IAB where we want to share our
experiences and learn from yours.


For the IAB,
Olaf Kolkman, Chair.