On June 19, 2007, the Internet Corporation for Assigned Names and Numbers (ICANN) announced its .test plan for evaluating the Internationalized Domain Names (IDN) concept (www.icann.org/announcements/announcement-2-19jun07.htm). Those wishing to comment on or respond to the proposed .test plan may do so online by July 31, 2007.
ICANN is reconsidering what some have called "non-English" generic top-level domain names (gTLDs), or domain suffixes. Instead of Roman (not English) characters after the final "dot" in an Internet address, other characters could find their way to the right of that dot. To set the record straight, ICANN does not refer to "non-English" characters; indeed, we are not talking about English "letters" at all. ICANN uses a more neutral term: Internationalized Domain Names (IDNs). So, as a first order of business, let us stop referring to non-English characters and start calling them what they are: IDNs.
As early as September 2000, the ICANN governing board asserted "that it is important that the Internet evolve to be more accessible to those who do not use the ASCII character set" and that "the internationalization of the Internet’s domain name system must be accomplished through standards that are open, non-proprietary, and fully compatible with the Internet’s existing end-to-end model and that preserve globally unique naming in a universally resolvable public name space" (www.icann.org/topics/idn).
Incorporating non-Roman characters may prove to be a complex undertaking. The Rosetta Project (www.rosettaproject.org/about-us/about-us) has identified more than 2,500 different languages, some of which are not written at all. But consider just some of the major extant character sets: Arabic, Chinese, Cyrillic, Greek, Hebrew, Hindi, Korean, Japanese, Roman, Thai, and Vietnamese, to say nothing of the full range of orthography, living and dead. These comprise alphabetic or phonetic scripts, where symbols represent (more or less) sounds; ideographic or logographic scripts, where symbols represent concepts; and systems that combine the two. Even phonetically based scripts incorporate logograms. Consider these in common use: €, ∞, and ©, not to mention national flags and countless other symbols.
In principle, there is no reason why symbol sets could not be intermixed. For example, by entering Unicode symbols at random, I propose a TLD made up of the characters at Unicode code points FB38, 21AF, and 2345. It is equally possible to remain in the Roman character domain and add accent marks to the mix: what if .com were revised and ICANN supported .com, .còm, .cóm, .côm, .cõm, and .cöm?
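This kind of mixing is easy to reproduce. As an illustration only (the function names are hypothetical), the Python sketch below uses the standard `unicodedata` module to name the three code points. They turn out to come from unrelated Unicode blocks: a Hebrew letter, an arrow, and an APL programming symbol. The sketch also builds the accented .com variants. Each accented "o" can be stored either as one precomposed character or as a plain "o" followed by a combining accent. The two forms look identical on screen, so the sketch normalizes to Unicode Normalization Form C (NFC).

```python
import unicodedata

# Code points cited in the text for the hypothetical mixed-script TLD.
PROPOSED_TLD_CODE_POINTS = [0xFB38, 0x21AF, 0x2345]


def describe(code_points):
    """Return (character, Unicode name, general category) for each code point."""
    return [
        (chr(cp), unicodedata.name(chr(cp)), unicodedata.category(chr(cp)))
        for cp in code_points
    ]


def accent_variants(base="com"):
    """Build the accented .com variants: attach a combining diacritic to the
    'o', then compose each result into precomposed characters (NFC)."""
    # Combining grave, acute, circumflex, tilde, and diaeresis.
    marks = ["\u0300", "\u0301", "\u0302", "\u0303", "\u0308"]
    variants = [base]
    for mark in marks:
        decomposed = base.replace("o", "o" + mark)
        variants.append(unicodedata.normalize("NFC", decomposed))
    return variants


if __name__ == "__main__":
    tld = "".join(chr(cp) for cp in PROPOSED_TLD_CODE_POINTS)
    print("." + tld)
    for ch, name, category in describe(PROPOSED_TLD_CODE_POINTS):
        print(f"U+{ord(ch):04X} {name} ({category})")
    print(", ".join("." + v for v in accent_variants()))
```

The general categories reported are Lo (a letter) for the Hebrew character and So (a symbol) for the other two. This is a reminder that a single label can draw on letters and symbols from different writing systems.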
There are software issues, as explained in the ICANN document. Domain names such as www.infotoday.com are surrogates for a specific numeric address, hence the "Names and Numbers" in ICANN's name. Name servers (the Domain Name System, or DNS) convert alphanumeric domain names into these numbers: infotoday.com actually has the Internet address, or IP number, 22.214.171.124. More than one alphanumeric domain name may be associated with a single IP number. ICANN is concerned that the name/number resolving software may not yet be robust enough to accommodate the many possible symbol permutations that may be proposed for gTLDs. Hence the .test plan.
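Name servers were built around ASCII names, so how can they cope with IDNs at all? Under the IDNA standard (RFC 3490), each non-ASCII label is converted into an ASCII-compatible "xn--" label using the Punycode algorithm (RFC 3492). The names the DNS actually handles therefore remain ASCII. The minimal Python sketch below uses Python's built-in `idna` codec, which implements IDNA 2003, to show how the accented .com variants would look to a resolver. The helper names are hypothetical, and attaching accented suffixes to www.infotoday is purely illustrative.

```python
# Python's built-in "idna" codec implements IDNA 2003 (RFC 3490), which
# encodes each non-ASCII label with Punycode (RFC 3492) behind an "xn--" prefix.


def to_ascii_domain(domain: str) -> str:
    """Convert a possibly internationalized domain name to its ASCII form
    (hypothetical helper name)."""
    return domain.encode("idna").decode("ascii")


def to_unicode_domain(ascii_domain: str) -> str:
    """Reverse the conversion, decoding any "xn--" labels for display."""
    return ascii_domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    suffixes = ["com", "c\u00f2m", "c\u00f3m", "c\u00f4m", "c\u00f5m", "c\u00f6m"]
    for suffix in suffixes:
        name = "www.infotoday." + suffix
        ascii_name = to_ascii_domain(name)
        print(f"{name:22} -> {ascii_name}")
        # The ASCII form must round-trip back to the original name.
        assert to_unicode_domain(ascii_name) == name
```

The plain .com name passes through unchanged. Each accented suffix becomes a distinct "xn--" label, so every variant a registry approved would be a separate name that resolvers and applications must handle correctly.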
I don’t doubt for a minute that name/number servers can, or soon will be able to, resolve most character sets. The implications for the Searcher set are another matter. Consider three examples from the ccTLD (country code) arena. Many ccTLDs use an abbreviation at the second level (2LD) to categorize the site. These sometimes follow gTLD nomenclature (the original "seven") and sometimes do not. Thus, a government site in Mexico carries a .gob.mx "label": .gob for gobierno (Spanish for government) and .mx for Mexico. Academic sites in the United Kingdom are indicated by .ac.uk. Organizational sites in Japan carry the 2LD and ccTLD .or.jp. At a minimum, it is useful for a searcher to be aware of these conventions. Abbreviations drawn from other languages, like .gob, add another level of complexity. Far more complexity enters the system if and when .or.jp, for example, is rendered in Japanese. As the number of approved scripts increases, so will the level of complexity.

I don’t believe that the Internet should be restricted to Latin scripts or to some limited number of character strings. Every language system should have a place at the table. That said, the complexity inherent in domain names will make the searching profession ever more challenging. All in all, I think we will manage. My key worry is how to key in what are certain to be difficult email addresses.
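A searcher who wants to apply the ccTLD conventions systematically might keep a lookup table of second-level labels. The Python sketch below is a hypothetical illustration. Its table holds only the three examples from the text (.gob.mx, .ac.uk, .or.jp), and the function name `classify_2ld` is invented. A working tool would need far more entries. The community-maintained Public Suffix List is one existing source of such suffixes, and every newly approved script would add more.

```python
from typing import Optional

# Hypothetical table holding only the three ccTLD conventions cited in the
# text; real coverage would require many more entries.
SECOND_LEVEL_LABELS = {
    ("gob", "mx"): "government (Mexico)",
    ("ac", "uk"): "academic (United Kingdom)",
    ("or", "jp"): "organization (Japan)",
}


def classify_2ld(hostname: str) -> Optional[str]:
    """Return the category implied by a hostname's 2LD and ccTLD, or None."""
    # Normalize case and drop a trailing root dot before splitting into labels.
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) < 2:
        return None
    return SECOND_LEVEL_LABELS.get((labels[-2], labels[-1]))


if __name__ == "__main__":
    hosts = ["www.example.gob.mx", "www.example.ac.uk",
             "www.example.or.jp", "www.infotoday.com"]
    for host in hosts:
        print(host, "->", classify_2ld(host))
```

Keying the table on ASCII labels is the simplest choice. If .or.jp were ever rendered in Japanese, the table would need parallel entries for the new labels, which is exactly the growth in complexity described above.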