String-to-Integer vs Unicode additional digit groups table
Important note: I created this table out of curiosity, so it might be a little chaotic. I tried not to make any errors, but I cannot guarantee that there are none. If you have any comments, or have tested another programming language, library, digit group, etc. and would like to share the results, feel free to e-mail me or leave a comment under this post.
As for test cases, most of the scripts/programs I've written for the purpose of making this table used one of these files as input:
* test_case_string.txt - format: <HEX CODE> <CHARACTER NAME>
* test_case_utf8.txt - format: <UTF-8 ENCODED CHARACTER> <CHARACTER NAME>
Please note that the above files don't contain any Roman numerals (their code points are U+2160 to U+216F and U+2170 to U+217F; e.g. U+216C is Ⅼ, i.e. decimal 50).
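For illustration, here is a minimal Python sketch of how a file in the test_case_string.txt format could be fed to a string-to-integer function. It is not the script I actually used. The sample lines are made up for this example (the Roman numeral is included only to show the contrast), and I'm assuming the hex code is separated from the name by a single space and may or may not carry a "U+" prefix. The helper names parse_line and try_int are hypothetical.

```python
import unicodedata

def parse_line(line):
    """Split '<HEX CODE> <CHARACTER NAME>' into (character, name)."""
    hex_code, _, name = line.strip().partition(" ")
    # Assumption: the hex code may or may not carry a "U+" prefix.
    if hex_code.upper().startswith("U+"):
        hex_code = hex_code[2:]
    return chr(int(hex_code, 16)), name

def try_int(ch):
    """Return int(ch) if Python accepts the conversion, else None."""
    try:
        return int(ch)
    except ValueError:
        return None

# Illustrative lines only, not the real contents of the test file.
sample_lines = [
    "0661 ARABIC-INDIC DIGIT ONE",
    "0967 DEVANAGARI DIGIT ONE",
    "216C ROMAN NUMERAL FIFTY",
]

results = {}
for line in sample_lines:
    ch, name = parse_line(line)
    results[name] = try_int(ch)
    # Print the name, the Unicode general category and the int() result.
    print(name, unicodedata.category(ch), results[name])
```

In Python 3, int() accepts characters from the Nd (decimal digit) category, so the first two lines convert to 1, while the Roman numeral (category Nl) is rejected.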
There are some more things that could be checked, e.g.:
* Other programming languages (like Go, Objective-C or Delphi), libraries and functions could be tested.
* Does this have any security implications (filter bypassing, perhaps)? See also Unicode Security Considerations.
* Are non-decimal digits supported in some cases?
* Did I miss any digit groups?
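To make the filter-bypassing question a bit more concrete, here is a small Python sketch. The filter function naive_filter_allows is hypothetical and deliberately simplistic; it only stands in for any check that looks for ASCII digits while a later step converts the string with int(). Whether anything like this happens in real code is exactly the open question.

```python
import re

def naive_filter_allows(value):
    """Hypothetical filter: allow input only if it has no ASCII digits."""
    return re.search(r"[0-9]", value) is None

def accepted_by_int(value):
    """Return True if Python's int() accepts the string."""
    try:
        int(value)
        return True
    except ValueError:
        return False

payload = "\u0661\u0662\u0663"  # ARABIC-INDIC DIGITS ONE, TWO, THREE

passed_filter = naive_filter_allows(payload)  # no ASCII digits present
converted = int(payload)                      # int() still parses it

# The mismatch can also go the other way: SUPERSCRIPT TWO counts as a
# digit for str.isdigit(), but int() refuses to convert it.
superscript_two = "\u00b2"
looks_like_digit = superscript_two.isdigit()
converts = accepted_by_int(superscript_two)

print(passed_filter, converted, looks_like_digit, converts)
```

The point is just that "is this a digit?" and "can this be converted to an integer?" are answered by different rules, so a validation step and a conversion step may disagree.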
By the way...
There are more blog posts you might like on my company's blog: https://hexarcana.ch/b/
Also, I would like to thank the following people for pointing me to various languages (some I tested, some I didn't): Roi Martin, Tomasz Dąbrowski, Maciej Tebecha, himn1, argasek, dfgg, meal and nathell.
Cheers,
Comments:
Agreed, it does look inconsistent.
Maybe some Perl expert could look into it? ;)
That actually was a typo in the description - I've used 1.9.2 for tests (the default bundle for Windows).