The interesting difference between ASCII and Unicode is that ASCII defines only one group of digits (30h to 39h), while Unicode defines 42 decimal digit groups (I think it actually defines more, but never mind). A common operation in programming is converting a sequence of digit characters (yes, a number) to a machine-understandable integer. Does any language's default string-to-integer function support Unicode digits? Does any is-digit function return true for Unicode digits? Well, I did some checking and created a table (programming language/version/library vs digit group) that addresses these questions.
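To make the two questions concrete, here is a minimal sketch in Python 3 (picked purely for illustration; it is not necessarily one of the versions in the table). The helper name try_int is made up for this example. It feeds the Arabic-Indic digits U+0661 to U+0663 to the built-in int() and str.isdigit():

```python
# Arabic-Indic digits ONE, TWO, THREE, written as escapes so the
# example does not depend on the editor's font or encoding.
arabic_indic = "\u0661\u0662\u0663"

def try_int(text):
    """Hypothetical helper: return int(text), or None if Python rejects it."""
    try:
        return int(text)
    except ValueError:
        return None

print(arabic_indic.isdigit())   # True
print(try_int(arabic_indic))    # 123
print(try_int("123"))           # 123
```

So at least in Python 3 both answers are "yes" for this digit group; other languages and libraries can behave differently.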
String-to-Integer vs Unicode additional digit groups table
Important note: I created this table out of curiosity, so it might be a little chaotic. I tried not to make any errors, but I cannot guarantee that there are none. If you have any comments, or have tested another programming language/library/digit group/etc. and would like to share the results, feel free to either e-mail me or leave a comment under this post.
As for test cases, most of the scripts/programs I've written for the purpose of making this table used one of these files as input:
* test_case_string.txt - format: <HEX CODE> <CHARACTER NAME>
* test_case_utf8.txt - format: <UTF-8 ENCODED CHARACTER> <CHARACTER NAME>
Please note that the above files don't contain any Roman numerals (their character codes are U+2160 to U+216F and U+2170 to U+217F; e.g. U+216C is Ⅼ, i.e. decimal 50).
There are some more things that could be checked, e.g.:
* Other programming languages (like Go, Objective-C or Delphi), libraries and functions could be tested.
* Does this have any security implications (filter bypassing, perhaps)? See also Unicode Security Considerations.
* Are non-decimal digits supported in some cases?
* Did I miss any digit groups?
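Two of the open questions above can be poked at with a few lines of Python 3. This is only a sketch under the assumption that a filter is written with the standard re module; the function names are hypothetical and do not come from any real application.

```python
import re

def naive_is_number(text):
    """Hypothetical filter: for str patterns, \\d also matches non-ASCII decimal digits."""
    return re.fullmatch(r"\d+", text) is not None

def ascii_only_is_number(text):
    """Stricter variant that accepts only the ASCII digits 30h to 39h."""
    return re.fullmatch(r"[0-9]+", text) is not None

arabic_indic = "\u0661\u0662"   # ARABIC-INDIC DIGIT ONE, TWO
print(naive_is_number(arabic_indic))       # True
print(ascii_only_is_number(arabic_indic))  # False

# A non-decimal digit: SUPERSCRIPT TWO is a "digit" but not a "decimal".
superscript_two = "\u00B2"
print(superscript_two.isdigit())    # True
print(superscript_two.isdecimal())  # False
try:
    int(superscript_two)
except ValueError:
    print("int() rejects SUPERSCRIPT TWO")
```

The possible filter-bypass angle is the mismatch: a check that accepts the input and a later consumer (a database, another language, a downstream parser) that interprets the same characters differently or not at all.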
Also, I would like to thank the following people for pointing me to various languages (some I tested, some I didn't): Roi Martin, Tomasz Dąbrowski, Maciej Tebecha, himn1, argasek, dfgg, meal and nathell.
Cheers,

// copyright © Gynvael Coldwind
// design & art by Xa
// logo font (birdman regular) by utopiafonts / Dale Harris
/* the author and owner of this blog hereby allows anyone to test the security of this blog (on HTTP level only, the server is not mine, so let's leave it alone ;>), and try to break in (including successful breaks) without any consequences of any kind (DoS attacks are an exception here) ... I'll add that I planted in some places funny photos of some kittens, there are 7 of them right now, so have fun looking for them ;> let me know if You find them all, I'll add some congratz message or sth ;> */
Vulns found in blog:
* XSS (pers, user-inter) by ged_
* XSS (non-pers) by Anno & Tracerout
* XSS (pers) by Anno & Tracerout
* Blind SQLI by Sławomir Błażek
* XSS (pers) by Sławomir Błażek
Comments:
Agreed, it does look inconsistent.
Maybe some Perl expert could look into it?;)
That actually was a typo in the description - I've used 1.9.2 for tests (the default bundle for Windows).