IRIs and IDNs: Testing, Implementations, and Specification Evolvement

IUC 31, San Jose, 2007

Martin J. DÜRST

duerst@it.aoyama.ac.jp

Aoyama Gakuin University


© 2007 Martin J. Dürst, Aoyama Gakuin University

Overview

Paper and Slides available online

Background

Internet/Web internationalization in waves:

What are IRIs and IDNs?

IRIs: Internationalized Resource Identifiers, internationalization of URIs (/URLs)

IDNs: Internationalized Domain Names, internationalization of domain names

Internationalization here means:

Why IRIs and IDNs?

Native script is easier to:

Because of higher familiarity and no need for transcription

[from a talk of mine 10 years ago]

URI/IRI structure

scheme:hierarchical-part?query#fragment-identifier

Example of an actual URI containing all four parts:

http://www.w3.org/2005/11/Translations/Query?titleMatch=HTML&lang=fr#xhtml1-2

hierarchical-part often includes a domain name (www.w3.org in the above example)
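
As a rough illustration of this structure, the example URI can be split into its parts with Python's standard `urllib.parse.urlsplit`. The helper name `split_parts` is made up for this sketch; note that the Python library further divides the hierarchical part into `netloc` (the domain name here) and `path`.

```python
from urllib.parse import urlsplit


def split_parts(uri: str) -> dict:
    """Split a URI/IRI into scheme, hierarchical part, query and fragment.

    Illustrative helper (not from the talk); urlsplit reports the
    hierarchical part as two fields, netloc and path.
    """
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "domain": parts.netloc,   # domain name inside the hierarchical part
        "path": parts.path,       # rest of the hierarchical part
        "query": parts.query,
        "fragment": parts.fragment,
    }


example = ("http://www.w3.org/2005/11/Translations/Query"
           "?titleMatch=HTML&lang=fr#xhtml1-2")
for name, value in split_parts(example).items():
    print(f"{name:9} {value}")
```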

Encoding of IRIs

When:

How:

Examples: Dürst → D%C3%BCrst, 渋谷駅 → %E6%B8%8B%E8%B0%B7%E9%A7%85
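
The two examples above can be reproduced with a minimal sketch, assuming the conversion is UTF-8 encoding followed by percent-escaping of each byte. The function name `percent_encode` is made up for illustration; `quote` and `unquote` are from Python's standard library.

```python
from urllib.parse import quote, unquote


def percent_encode(text: str) -> str:
    """Encode text as UTF-8, then write each non-ASCII byte as %HH.

    safe="" means reserved ASCII characters such as "/" are escaped too,
    so this is meant for a single path segment or similar component.
    """
    return quote(text, safe="", encoding="utf-8")


print(percent_encode("Dürst"))    # D%C3%BCrst
print(percent_encode("渋谷駅"))   # %E6%B8%8B%E8%B0%B7%E9%A7%85

# The mapping is reversible as long as the escapes are valid UTF-8.
print(unquote("%E6%B8%8B%E8%B0%B7%E9%A7%85"))
```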

Encoding of IDNs

When:

How:

Examples: 渋谷駅.jp → xn--i5wq75dpjj.jp, www.résumé.jp → www.xn--rsum-bpad.jp
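
A minimal sketch of the domain name conversion, assuming Python's built-in `idna` codec (which implements the 2003 IDNA specifications, i.e. Nameprep plus Punycode with the `xn--` prefix) is an acceptable stand-in for whatever a given client uses. The function names are made up for illustration.

```python
def idn_to_ascii(domain: str) -> str:
    """Convert each non-ASCII label to its xn-- (Punycode) form."""
    return domain.encode("idna").decode("ascii")


def ascii_to_idn(domain: str) -> str:
    """Convert xn-- labels back to Unicode for display."""
    return domain.encode("ascii").decode("idna")


print(idn_to_ascii("渋谷駅.jp"))           # xn--i5wq75dpjj.jp
print(idn_to_ascii("www.résumé.jp"))       # www.xn--rsum-bpad.jp
print(ascii_to_idn("xn--i5wq75dpjj.jp"))   # 渋谷駅.jp

# The raw Punycode step, without the xn-- prefix, for a single label:
print("résumé".encode("punycode"))          # b'rsum-bpad'
```

Only labels that contain non-ASCII characters change; plain ASCII labels such as `www` and `jp` pass through untouched.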

Fallbacks

Foreign script not usable due to:

Solutions:

Testing

Why Testing?

Why testing first: Test-driven development

Charmod Testing Requirements

Axes listed in W3C Character Model 1.0: Resource Identifiers:

  1. IRIs in several document formats (HTML, CSS, SVG, Atom,...)
  2. IRIs in several locations in the same document format
  3. non-ASCII characters in different parts of an IRI (e.g. domain name part, path part)
  4. IRIs in documents with various widely used character encodings and with characters from various scripts
  5. Document-specific escapes in IRIs
  6. IRIs in various URI schemes
  7. Setup of various servers for IRIs
  8. Translation of IRIs into URIs (needed for all the above)
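
Item 8 is the step every other axis depends on. The following is a simplified sketch of such a translation, not the full procedure of the IRI specification: it assumes an IRI without userinfo, uses Python's built-in IDNA (2003) codec for the domain name, and percent-encodes UTF-8 bytes everywhere else. The name `iri_to_uri` and the `SAFE` character set are choices made for this illustration.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# ASCII characters left untouched; "%" is included so that escapes
# already present in the IRI are not escaped a second time.
SAFE = "/:@!$&'()*+,;=%"


def iri_to_uri(iri: str) -> str:
    """Translate an IRI to a URI (simplified; userinfo is not handled)."""
    parts = urlsplit(iri)
    host = parts.hostname or ""
    if host:
        # Domain name part: IDNA (Punycode with xn-- prefix).
        host = host.encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # All other parts: UTF-8 bytes, percent-encoded.
    path = quote(parts.path, safe=SAFE)
    query = quote(parts.query, safe=SAFE + "?")
    fragment = quote(parts.fragment, safe=SAFE + "?")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(iri_to_uri("http://www.résumé.jp/渋谷駅?q=Dürst#é"))
```

A test suite along axis 3 would feed this kind of function IRIs with non-ASCII characters in each part in turn and compare the result with the request the server actually receives.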

Over the years, IRI tests for various purposes have been created and made available at various locations. An overview is given at http://www.w3.org/International/iri-edit/testing.html; if some tests are not listed there, please inform the author.

Testing Framework

Framework Idea and History

Test Types

Modalities

Abstraction conveniently combining:

Human-oriented vs. Machine-oriented Tests

Demo!

Version 0.10 of tests published today:

http://www.sw.it.aoyama.ac.jp/2005/iritest/

Next Steps

Can you ever have enough tests?

Other Tests

Overview page with pointers at http://www.w3.org/International/iri-edit/testing.html

If you know about some test that is not linked, please tell me!

More tests are needed for aspects other than resolution:

Browser Implementations

Coverage is reasonably good:

(mostly checked on Windows)

Other Implementations

Implementing IRIs/IDNs in cURL

Specification Update

The IETF Standards Track

IETF: Internet Engineering Task Force

Three standards levels:

RFC: Request for Comments (also: Experimental, Informational, Historical, Obsolete)

For implementers, Proposed Standard is good enough; even just an RFC is fine

Very few things make it to full Standard (currently 67, in: URIs, out: SMTP)

Current Issues in draft-duerst-iri-bis

Issues list at http://www.w3.org/International/iri-edit

Mailing list is public-iri@w3.org (archives at http://lists.w3.org/Archives/Public/public-iri/)

Open Issues

Email Address Internationalization

Top-Level Domain Names

Internationalization of URI Schemes

Conclusions & Outlook

Q & A