Internationalization of the Ruby Scripting Language

IUC 31, San Jose, 2007

Martin J. Dürst

duerst@it.aoyama.ac.jp

Aoyama Gakuin University


© 2007 Martin J. Dürst, Aoyama Gakuin University

Outline

Assumptions

The talk assumes that you know

Slides are available online.

What is Ruby?

Ruby History

Ruby Highlights

Internationalization in Ruby 1.8

Bad news:

Good news:

Character Encoding Modes

Also available: [0x9752, 0x5C71].pack('U*')
produces the string '青山' (UTF-8 only)
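A short round trip, assuming Ruby 1.9 or later (where the packed string is tagged as UTF-8): pack('U*') turns code points into a UTF-8 string, and unpack('U*') recovers them.

```ruby
# Code points for 青 (U+9752) and 山 (U+5C71)
code_points = [0x9752, 0x5C71]

# 'U*' packs each integer as one UTF-8 encoded character
aoyama = code_points.pack('U*')
puts aoyama                    # prints 青山
puts aoyama.bytesize           # prints 6 (three bytes per character)

# unpack('U*') decodes the UTF-8 bytes back into integers
back = aoyama.unpack('U*')
puts back.map { |cp| format('U+%04X', cp) }.join(' ')   # prints U+9752 U+5C71
```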

Standard Libraries

(need to be required)

Other Libraries

Various things available, but scattered

Unicode in Ruby on Rails

Own Work

Charesc

require 'charesc'
print "#{U677Eu672Cu884Cu5F18}"
      # => 松本行弘
print U9752u5C71u5B66u9662u5927u5B66
      # => 青山学院大学
print "Martin J. D#{U00FC}rst"
      # => Martin J. Dürst
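The escapes above (U677E..., U00FC) are valid Ruby constant names, which suggests they can be resolved through Ruby's const_missing hook. The following is a minimal sketch of that idea, not the charesc source. It assumes each escape is U or u followed by exactly four hex digits (so only BMP characters), and it only handles constants referenced at top level.

```ruby
# Hypothetical sketch of charesc-style escapes (NOT the actual charesc code).
# Assumption: each escape is "U" or "u" followed by exactly four hex digits.
ESCAPE_RUN = /\A(?:[Uu]\h{4})+\z/

def Object.const_missing(name)
  text = name.to_s
  if ESCAPE_RUN.match?(text)
    # "U9752u5C71" -> ["9752", "5C71"] -> [0x9752, 0x5C71] -> "青山"
    text.scan(/[Uu](\h{4})/).flatten.map { |hex| hex.to_i(16) }.pack('U*')
  else
    super # ordinary missing constants still raise NameError
  end
end

print "#{U677Eu672Cu884Cu5F18}\n"   # 松本行弘
print U9752u5C71, "\n"               # 青山
print "Martin J. D#{U00FC}rst\n"     # Martin J. Dürst
```

Falling back to super keeps genuine typos in constant names visible as errors instead of silently turning them into strings.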

Langtag

require 'langtag'
tag = Langtag.new('de-Latn-ch')
tag.script             # => 'Latn'
tag.wellformed?        # => true
tag.region = 'at'
print tag              # => de-Latn-at
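To make the example concrete, here is a hypothetical, much simplified class (SimpleLangtag, not the langtag library) that recognizes only the language, optional script, and optional region subtags of RFC 4646 (BCP 47) tags; variants, extensions, private-use and grandfathered tags are ignored.

```ruby
# Hypothetical, simplified sketch; NOT the API or code of the langtag library.
# Only language[-script][-region] is recognized (a small subset of BCP 47).
class SimpleLangtag
  PATTERN = /\A([a-z]{2,3})(?:-([a-z]{4}))?(?:-([a-z]{2}|\d{3}))?\z/i

  attr_accessor :language, :script, :region

  def initialize(text)
    @text = text
    match = PATTERN.match(text)
    @wellformed = !match.nil?
    @language, @script, @region = match.captures if match
  end

  def wellformed?
    @wellformed
  end

  # Reassemble the subtags that are present, keeping their original case.
  def to_s
    return @text unless @wellformed
    [language, script, region].compact.join('-')
  end
end

tag = SimpleLangtag.new('de-Latn-ch')
puts tag.script        # Latn
puts tag.wellformed?   # true
tag.region = 'at'
puts tag               # de-Latn-at
```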

Multilingualization

M17N: a term often used in Japan

Yukihiro Matsumoto and Masahiko Nawate, Multilingual Text Manipulation Method for Ruby Language, IPSJ Journal, Vol. 46, No. 11, Nov. 2005 (in Japanese).

Future internationalization architecture for Ruby:

UCS: Universal Code Set

CSI: Code Set Independent
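In the CSI model each string carries its own encoding, instead of all text being converted to one internal code set. A small illustration, assuming Ruby 1.9 or later as eventually released (String#encoding, String#encode); details may differ from the development state discussed here.

```ruby
# Source file assumed to be saved as UTF-8 (the default since Ruby 2.0).
aoyama_utf8 = '青山'
aoyama_euc  = aoyama_utf8.encode('EUC-JP')   # transcode into another code set

# Same two characters, different encodings and byte sequences
puts aoyama_utf8.encoding.name   # UTF-8
puts aoyama_euc.encoding.name    # EUC-JP
puts aoyama_utf8.bytesize        # 6
puts aoyama_euc.bytesize         # 4
puts aoyama_euc.length           # 2 (characters, not bytes)

# Strings in different non-ASCII encodings do not compare equal...
puts aoyama_utf8 == aoyama_euc                   # false
# ...until one is transcoded into the other's encoding.
puts aoyama_euc.encode('UTF-8') == aoyama_utf8   # true
```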

In Defense of the CSI Approach

Main Problems of CSI

From Ruby 1.8.x to Ruby 1.9

Already Working in Ruby 1.9

Soon to Come: Character Escapes

print "\u677E\u672C\u884C\u5F18"
      # => 松本行弘
print "Martin J. D\u00FCrst"
      # => Martin J. Dürst
print "Martin J. D\u{FC}rst"
      # => Martin J. Dürst
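A few checks of these escapes, assuming Ruby 1.9 or later, where they were eventually adopted: \u takes exactly four hex digits, \u{...} takes one to six and may list several code points separated by spaces, and a literal containing them is UTF-8.

```ruby
four  = "\u677E\u672C\u884C\u5F18"   # \u followed by exactly four hex digits
short = "Martin J. D\u{FC}rst"       # \u{...} allows one to six hex digits
multi = "\u{9752 5C71}"              # several code points in one \u{...}

puts four                 # 松本行弘
puts short                # Martin J. Dürst
puts multi.codepoints.map { |cp| cp.to_s(16) }.join(' ')   # 9752 5c71
puts short.encoding.name  # UTF-8
```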

Current Discussions

Where We Need to Go

Transcoding Policies

Transcoding Policies: Where

Transcoding Policies: What

Acknowledgements

Conclusions

Further Information

An up-to-date version of this paper as well as slides used for the talk connected to this paper are available at http://www.sw.it.aoyama.ac.jp/2007/pub/IUC31-ruby.

Code for Demos

System.out.println("Hello World!");
print 'Hello World!'
5.times { print 'Hello World!' }

So it is possible to write "Hello".upcase (returning the string HELLO).
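Because literals are ordinary objects, methods can be sent to them directly. A short sketch, with class names as reported by current Ruby (Ruby 1.8 reported Fixnum rather than Integer for 5.class):

```ruby
shout = "Hello".upcase     # a String literal receives a method call
puts shout                 # HELLO
puts 5.class               # Integer
puts 3.14159.round(2)      # 3.14

count = 0
5.times { count += 1 }     # the Integer 5 runs the block five times
puts count                 # 5
```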

who = 'Unicode Conference'
print "Hello #{who}!"
[1, 2, 3.14159, "four", "five and a quarter"]
[0x9752, 0x5C71].pack('U*')
aoyama = '青山学院大学'
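The demo lines above combine string interpolation and heterogeneous arrays. A short sketch of both: #{...} is evaluated only inside double-quoted strings, and one array can hold objects of different classes (class names as reported by current Ruby).

```ruby
who = 'Unicode Conference'
greeting = "Hello #{who}!"   # interpolated: Hello Unicode Conference!
literal  = 'Hello #{who}!'   # single quotes: #{who} stays as written
sum      = "#{1 + 2} items"  # any expression can be interpolated: 3 items

mixed = [1, 2, 3.14159, "four", "five and a quarter"]
puts greeting
puts literal
puts sum
puts mixed.map(&:class).inspect   # [Integer, Integer, Float, String, String]
```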

$KCODE = 'u'       # Ruby 1.8: treat strings as UTF-8
require 'jcode'
aoyama.length      # => 18 (bytes)
aoyama.jlength     # => 6 (characters)

"abc".length
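In Ruby 1.9 and later (as eventually released), String#length counts characters according to the string's encoding, so jcode and jlength are no longer needed; bytesize gives the byte count. A short comparison, assuming a UTF-8 source file:

```ruby
aoyama = '青山学院大学'
puts aoyama.length       # 6 characters
puts aoyama.bytesize     # 18 bytes (three per character in UTF-8)
puts "abc".length        # 3; for ASCII, characters and bytes coincide
puts "abc".bytesize      # 3
puts aoyama.chars.first  # 青
```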

Blocks:
5.times { print "Hello World!" }
File.open(fn) do |file| ... end
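A runnable variant of the File.open block form, using a hypothetical file name in the system temporary directory: the block receives the open file, File.open returns the block's value, and the file is closed when the block exits, even if the block raises.

```ruby
require 'tmpdir'

# Hypothetical demo file; any writable path would do.
fn = File.join(Dir.tmpdir, 'iuc31-blocks-demo.txt')

# Write two lines; the file is closed automatically when the block ends.
File.open(fn, 'w:UTF-8') do |file|
  file.puts 'Hello World!'
  file.puts '青山'
end

# Read them back; File.open returns the value of the block.
handle = nil
lines = File.open(fn, 'r:UTF-8') do |file|
  handle = file
  file.each_line.map(&:chomp)
end

puts lines.inspect
puts handle.closed?   # true: closed without an explicit file.close
File.delete(fn)
```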