https://www.sw.it.aoyama.ac.jp/2019/pub/RubyKaigiLight/
duerst@it.aoyama.ac.jp, Aoyama Gakuin University
© 2019 Martin J. Dürst, Aoyama Gakuin University
Each new version of Ruby is based on the latest version of Unicode. This lightning talk looks at the updates from Unicode 10.0.0 to Unicode 12.1.0. Unicode 12.1.0 is a special version that adds a single composed character for the new Japanese era Reiwa, which will begin on May 1st. Unicode 12.0.0 added several scripts and characters, but didn't need any updates to Ruby's internals. Unicode 11.0.0, on the other hand, required some changes in Ruby internals, which led to the discovery of a bug affecting, among others, the zombie emoji.
These slides have been created in HTML, for projection with Opera (≤12.17 Windows/Mac/Linux; use F11 to switch to projection mode). The slides have been optimized for a screen size of 1920x1080 pixels, but can easily be viewed on other screens, too. Texts in gray, like this one, are comments/notes which do not appear on the slides. Please note that depending on the browser and OS you use, some rare characters or special character combinations may not display as intended, but e.g. as empty boxes, as question marks, or separately rather than composed.
RbConfig::CONFIG["UNICODE_VERSION"]
⇒ '12.1.0'
Who cares?
Does it really matter?
New Scripts and Characters
New Emoji
Reiwa (令和) support
Year (Y) | Unicode Version (U), published in Spring/Summer | Ruby Version (R), published around Christmas |
---|---|---|
2014 | 7.0.0 | 2.2 |
2015 | 8.0.0 | 2.3 |
2016 | 9.0.0 | 2.4 |
2017 | 10.0.0 | 2.5 |
2018 | 11.0.0 | 2.6 |
2019 | 12.0.0 | 2.7 |
U = Y - 2007 = 10R - 15
R = (Y - 1992) · 0.1 = 0.1U + 1.5
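The two formulas can be checked against the table with a few lines of Ruby. This is only a minimal sketch: the method names `unicode_major` and `ruby_version` are made up for illustration, and the relation is just the pattern visible in the 2014 to 2019 rows, not a rule that either project follows.

```ruby
# Illustrative helpers (hypothetical names): the formulas from the slide as code.
# Y = year, U = Unicode major version, R = Ruby version (major.minor).

def unicode_major(year)
  year - 2007            # U = Y - 2007
end

def ruby_version(year)
  tenths = year - 1992   # R = (Y - 1992) * 0.1, kept in integer tenths
  "#{tenths / 10}.#{tenths % 10}"
end

(2014..2019).each do |year|
  puts "#{year}: Unicode #{unicode_major(year)}.0.0, Ruby #{ruby_version(year)}"
end
```

Working in integer tenths avoids floating-point rounding when the Ruby version is printed.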
Faster!!!
Supported Unicode version | Ruby version | Time to Publication |
---|---|---|
11.0.0: June 5, 2018 | 2.6.0: December 25, 2018 | 203 days |
12.0.0: March 5, 2019 | 2.6.2: March 13, 2019 | 8 days |
12.1.0: May 7, 2019 | 2.6.3: April 17, 2019 | -20 days |
Unicode 12.1.0 is still in beta
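The time to publication is simply the difference between the two release dates, which Ruby's standard `date` library can compute. A minimal sketch, assuming the dates listed in the table; the constant `RELEASES` and the method `days_to_publication` are illustrative names, not part of Ruby or of the talk.

```ruby
require "date"

# Release dates as listed in the table:
# Unicode version => [Unicode release date, Ruby release date].
RELEASES = {
  "11.0.0" => [Date.new(2018, 6, 5), Date.new(2018, 12, 25)],
  "12.0.0" => [Date.new(2019, 3, 5), Date.new(2019, 3, 13)],
  "12.1.0" => [Date.new(2019, 5, 7), Date.new(2019, 4, 17)],
}

# Days from the Unicode release to the Ruby release that supports it.
# A negative value means that Ruby shipped before the Unicode version was final.
def days_to_publication(unicode_date, ruby_date)
  (ruby_date - unicode_date).to_i
end

RELEASES.each do |version, (unicode_date, ruby_date)|
  puts "Unicode #{version}: #{days_to_publication(unicode_date, ruby_date)} days"
end
```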
This is what an easy (I wish I could say typical) upgrade looks like:
common.mk
make up; …
make check
/\X/
Example: Flag of Wales
flag_of_Wales = "\u{1F3F4 E0067 E0062 E0077 E006C E0073 E007F}"
flag_of_Wales.length
⇒ 7 (characters)
flag_of_Wales.bytes.length
⇒ 28 (bytes)
flag_of_Wales.grapheme_clusters.length
⇒ 1 (extended grapheme clusters)
"A#{flag_of_Wales}Z".match? /A\XZ/
⇒ true
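The three ways of counting the same string can be put side by side. The following is a minimal sketch using only core `String` methods; the variable names are illustrative, and the grapheme cluster result assumes a Ruby version whose Unicode data treats tag characters as extending the preceding character.

```ruby
# Flag of Wales: U+1F3F4 (black flag) followed by six tag characters,
# five spelling "gbwls" and a final U+E007F (cancel tag).
flag_of_wales = "\u{1F3F4 E0067 E0062 E0077 E006C E0073 E007F}"

counts = {
  code_points:       flag_of_wales.codepoints.length,
  bytes:             flag_of_wales.bytesize,
  grapheme_clusters: flag_of_wales.grapheme_clusters.length,
}
counts.each { |unit, n| puts "#{unit}: #{n}" }

# \X in a regular expression matches one extended grapheme cluster,
# so scanning "A<flag>Z" splits it into "A", the flag, and "Z".
pieces = "A#{flag_of_wales}Z".scan(/\X/)
puts pieces.length
```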
\X
Implemented? node_extended_grapheme_cluster in regparse.c
\X
Rewrite it!
700 lines, 1 function
⇒ 300 lines, 5 functions
Very Different Code Style
No way to reuse node tree?
No way to convert subexpression to node tree?
Test First!
enc/unicode/data/emoji/11.0.0/emoji-sequences.txt, … (4 total)
ruby test/runner.rb test/ruby/enc/test_emoji_breaks.rb
4 tests, 81777 assertions, 0 failures, 0 errors, 0 skips
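The idea of such a data-driven test can be sketched in a few lines. This is not the code of test_emoji_breaks.rb; it is a simplified illustration under stated assumptions: the sample lines are typed by hand in the general style of the emoji data files (code points, then `;`-separated fields), the type names are only illustrative, and `sequence_from_line` is a hypothetical helper.

```ruby
# Hand-written sample lines in the style of an emoji data file
# (code points ; type ; description). Illustrative only, not copied
# from the actual emoji-sequences.txt.
SAMPLE_LINES = <<~DATA
  # comment lines and blank lines are skipped

  0031 FE0F 20E3 ; Emoji_Keycap_Sequence ; keycap: 1
  1F1EF 1F1F5 ; Emoji_Flag_Sequence ; Japan
  1F44B 1F3FB ; Emoji_Modifier_Sequence ; waving hand: light skin tone
  1F3F4 E0067 E0062 E0077 E006C E0073 E007F ; Emoji_Tag_Sequence ; Wales
DATA

# Turn one data line into a string, or nil for comments and blank lines.
def sequence_from_line(line)
  return nil if line.strip.empty? || line.start_with?("#")
  hex_fields = line.split(";").first.split
  hex_fields.map { |hex| hex.to_i(16) }.pack("U*")
end

sequences = SAMPLE_LINES.each_line.map { |line| sequence_from_line(line) }.compact

# The property under test: every listed sequence is exactly one
# extended grapheme cluster.
failures = sequences.reject { |seq| seq.grapheme_clusters.length == 1 }
puts "#{sequences.length} sequences, #{failures.length} failures"
```

Driving the assertions from the data files means that new sequences are covered automatically when the files are updated, which is presumably why so few tests can produce so many assertions.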
Unicode says that zombies canNOT have skin colors
Zombie Skin Color
Test on phony beta:
U+32FF SQUARE ERA NAME SAYUU
(左右, left-right)
From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Subject: New Japanese Era Name
Date: Mon, 1 Apr 2019 11:43:24 +0900
The new Japanese era name has been announced a few minutes ago:
令和 U+4EE4 U+548C; Reading: REIWA
This is the information that's needed for Unicode 12.1.0. Does anybody
know what's the schedule for the official approval of this new
version?
Answer: Because they are in a meeting the week before :-)
From: Ken Whistler <xxxxxx@yyyyy.zzz>
Date: Sun, 31 Mar 2019 21:51:03 -0700
If all you need is just the code point, name, and decomposition, then by
all means, start prepping your updates as soon as you can.
puts debugger
puts commit (commit puts?)
(debugger puts: r67445, more puts: r67447, removed: r67451)
In 2.6.2:
"\u32FF".match? /\p{age=12.1}/
⇒ SyntaxError: invalid character property name
In 2.6.3:
"\u32FF".unicode_normalize :nfkc
⇒ "令和"
"\u32FF".match? /\p{age=12.1}/
⇒ true
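Code that has to run on several Ruby versions can check the Unicode version before relying on U+32FF. A minimal sketch, assuming RubyGems is loaded (the default) so that `Gem::Version` is available for the comparison; the constant and variable names are illustrative.

```ruby
require "rbconfig"

# U+32FF SQUARE ERA NAME REIWA was added in Unicode 12.1.0.
REIWA_SQUARE = "\u32FF"

unicode_version = RbConfig::CONFIG["UNICODE_VERSION"]
supports_12_1 = Gem::Version.new(unicode_version) >= Gem::Version.new("12.1.0")

if supports_12_1
  # Build the regexp at run time, so that an older Ruby does not fail
  # with a syntax error while parsing this file.
  age_12_1 = Regexp.new('\p{age=12.1}')
  puts REIWA_SQUARE.unicode_normalize(:nfkc)   # compatibility decomposition
  puts REIWA_SQUARE.match?(age_12_1)
else
  puts "Unicode #{unicode_version}: no data for U+32FF yet"
end
```

Using `Regexp.new` with a string instead of a regexp literal is the design choice that matters here: a literal with an unknown property name is rejected when the file is parsed, before any version check can run.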
Enjoy!