The interesting difference between ASCII and Unicode is that the former defines only one group of digits (30h to 39h), while the latter defines 42 decimal digit groups (I think it actually defines more, but never mind). A common operation in programming is converting a sequence of digit characters (yes, a number) to a machine-understandable integer. Does any language's default string-to-integer function support Unicode digits? Does any is-digit function return true for Unicode digits? Well, I did some checking and created a table (programming language/version/library vs. digit group) that addresses these questions.
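To make the two questions concrete, here is a minimal sketch in Python 3 (my choice for illustration, not necessarily one of the tested configurations). It feeds ARABIC-INDIC digits to `int()` and compares `str.isdigit()` with `str.isdecimal()`. The helper `try_int` is a hypothetical name introduced only for this example.

```python
# ARABIC-INDIC DIGIT ONE, TWO, THREE (U+0661..U+0663), written as escapes
arabic_indic = "\u0661\u0662\u0663"

# Python 3's int() accepts characters that have a Unicode decimal digit value
value = int(arabic_indic)

# SUPERSCRIPT TWO (U+00B2) is a "digit" for isdigit(), but it has no decimal
# digit value, so isdecimal() says no and int() rejects it
superscript_two = "\u00b2"


def try_int(text):
    """Return int(text), or None if the conversion is rejected."""
    try:
        return int(text)
    except ValueError:
        return None


print(value)                                                   # 123
print(arabic_indic.isdigit(), arabic_indic.isdecimal())        # True True
print(superscript_two.isdigit(), superscript_two.isdecimal())  # True False
print(try_int(superscript_two))                                # None
```

Note that the is-digit check and the conversion function do not have to agree: here `isdigit()` accepts a character that `int()` refuses.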
String-to-Integer vs Unicode additional digit groups table
Important note: I created this table out of curiosity, so it might be a little chaotic. I tried not to make any errors, but I cannot guarantee that there are none. If you have any comments, or have tested another programming language/library/digit group/etc. and would like to share the results, feel free to either e-mail me or leave a comment under this post.
As for test cases, most of the scripts/programs I've written for the purpose of making this table used one of these files as input:
* test_case_string.txt - format: <HEX CODE> <CHARACTER NAME>
* test_case_utf8.txt - format: <UTF-8 ENCODED CHARACTER> <CHARACTER NAME>
Please note that the above files don't contain any Roman numerals (their character codes are U+2160 to U+216F and U+2170 to U+217F; e.g. U+216C is Ⅼ aka decimal 50).
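For context on why the Roman numerals are a separate case, the sketch below (Python 3 with the standard `unicodedata` module, used here purely as an illustration) looks up U+216C. It has a numeric value of 50 but no decimal digit value, and its general category is "Nl" (letter number) rather than "Nd" (decimal digit). The variable names are my own.

```python
import unicodedata

roman_fifty = "\u216c"  # ROMAN NUMERAL FIFTY

name = unicodedata.name(roman_fifty)
category = unicodedata.category(roman_fifty)            # "Nl", not "Nd"
numeric_value = unicodedata.numeric(roman_fifty)        # numeric value as a float
decimal_value = unicodedata.decimal(roman_fifty, None)  # None: no decimal digit value

# int() only understands decimal digits, so the Roman numeral is rejected
try:
    int(roman_fifty)
    accepted_by_int = True
except ValueError:
    accepted_by_int = False

print(name, category, numeric_value, decimal_value, accepted_by_int)
```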
There are some more things that could be checked, e.g.:
* Other programming languages (like Go, Objective-C or Delphi), libraries and functions could be tested.
* Does this have any security implications (filter bypassing, perhaps)? See also Unicode Security Considerations.
* Are non-decimal digits supported in some cases?
* Did I miss any digit groups?
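One rough way to approach the "did I miss any digit groups?" question is to ask a Unicode database directly. The sketch below is an assumption on my part and not the method used for the table. It uses Python 3's `unicodedata` to collect every character whose decimal digit value is 0, i.e. the "DIGIT ZERO" of each decimal group. The count depends on the Unicode version bundled with the interpreter, so no fixed number is shown here.

```python
import sys
import unicodedata


def decimal_digit_zeros():
    """Return the code points of all characters with decimal digit value 0."""
    zeros = []
    for cp in range(sys.maxunicode + 1):
        # The default (None) is returned for characters without a decimal value
        if unicodedata.decimal(chr(cp), None) == 0:
            zeros.append(cp)
    return zeros


groups = decimal_digit_zeros()

# Unicode version of the bundled database and the number of groups it knows
print(unicodedata.unidata_version, len(groups))

# Show the first few groups by the name of their zero digit
for start in groups[:5]:
    print(f"U+{start:04X} {unicodedata.name(chr(start))}")
```

Swapping `unicodedata.decimal` for `unicodedata.digit` or `unicodedata.numeric` would be one way to start exploring the non-decimal digits as well.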
Also, I would like to thank the following people for pointing me to various languages (some I tested, some I didn't): Roi Martin, Tomasz Dąbrowski, Maciej Tebecha, himn1, argasek, dfgg, meal and nathell.
Cheers,
// copyright © Gynvael Coldwind
// design & art by Xa
// logo font (birdman regular) by utopiafonts / Dale Harris
/* the author and owner of this blog hereby allows anyone to test the security of this blog (on HTTP level only, the server is not mine, so let's leave it alone ;>), and try to break in (including successful breaks) without any consequences of any kind (DoS attacks are an exception here) ... I'll add that I planted in some places funny photos of some kittens, there are 7 of them right now, so have fun looking for them ;> let me know if You find them all, I'll add some congratz message or sth ;> */
Vulns found in blog:
* XSS (pers, user-inter) by ged_
* XSS (non-pers) by Anno & Tracerout
* XSS (pers) by Anno & Tracerout
* Blind SQLI by Sławomir Błażek
* XSS (pers) by Sławomir Błażek
Comments:
Agreed, it does look inconsistent.
Maybe some Perl expert could look into it?;)
That actually was a typo in the description - I've used 1.9.2 for tests (the default bundle for Windows).