The interesting difference between ASCII and Unicode is that the former defines only one group of digits (30h to 39h), while the latter defines 42 decimal digit groups (I think it actually defines more, but never mind). A common operation in programming is converting a sequence of digit characters (yes, a number) to a machine-understandable integer. Does any built-in string-to-integer function support Unicode digits? Does any is-digit function return true for Unicode digits? I did some checking and created a table (programming language/version/library vs. digit group) that addresses these questions.
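To make the two questions concrete, here is a minimal sketch in Python 3 (one language picked for illustration; it is not one of the scripts used to build the table). The helper name `describe` is made up for this example. It asks the built-in `str.isdigit()` and `int()` about the same number written with ASCII, Arabic-Indic and Devanagari digits.

```python
import unicodedata

ascii_digits = "123"
arabic_indic = "\u0661\u0662\u0663"   # ARABIC-INDIC DIGIT ONE, TWO, THREE
devanagari = "\u0967\u0968\u0969"     # DEVANAGARI DIGIT ONE, TWO, THREE

def describe(s):
    """Hypothetical helper: return (is-digit result, parsed value or None)."""
    try:
        value = int(s)          # built-in string-to-integer conversion
    except ValueError:
        value = None            # the conversion rejected the string
    return s.isdigit(), value

for sample in (ascii_digits, arabic_indic, devanagari):
    names = [unicodedata.name(c) for c in sample]
    print(names, describe(sample))
```

In Python 3 all three samples are reported as digits and parse to the same integer; other languages and versions may well behave differently, which is exactly what the table is about.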
String-to-Integer vs Unicode additional digit groups table
Important note: I created this table out of curiosity, so it might be a little chaotic. I tried not to make any errors, but I cannot guarantee that there are none. If you have any comments, or have tested another programming language/library/digit group/etc. and would like to share the results, feel free to either e-mail me or leave a comment under this post.
As for test cases, most of the scripts/programs I've written for the purpose of making this table used one of these files as input:
* test_case_string.txt - format: <HEX CODE> <CHARACTER NAME>
* test_case_utf8.txt - format: <UTF-8 ENCODED CHARACTER> <CHARACTER NAME>
Please note that the above files don't contain any Roman numerals (their character codes are U+2160 to U+216F and U+2170 to U+217F; e.g. U+216C is Ⅼ aka decimal 50).
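For anyone who wants to build a similar input file, the following is a rough sketch using Python's `unicodedata` module. It is not the script that produced the files above: the function name `decimal_digit_lines` and the exact hex formatting are assumptions, and the number of characters found depends on the Unicode version bundled with the interpreter. It also shows why the Roman numerals are a separate case: they have a numeric value but are not in the decimal digit category.

```python
import sys
import unicodedata

def decimal_digit_lines():
    """Yield '<HEX CODE> <CHARACTER NAME>' for every decimal digit (category Nd)."""
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Nd":
            # The 4-digit uppercase hex format is an assumption for this sketch.
            yield f"{cp:04X} {unicodedata.name(ch, 'UNKNOWN')}"

lines = list(decimal_digit_lines())
print(len(lines), "decimal digits known to this Python build")
print(lines[0])

# Roman numerals are "letter numbers" (category Nl), not decimal digits.
roman_fifty = "\u216C"  # ROMAN NUMERAL FIFTY
print(unicodedata.category(roman_fifty), unicodedata.numeric(roman_fifty))
```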
There are some more things that could be checked, e.g.:
* Other programming languages (like Go, Objective-C or Delphi), libraries and functions could be tested.
* Does this have any security implications (filter bypassing, perhaps)? See also Unicode Security Considerations.
* Are non-decimal digits supported in some cases?
* Did I miss any digit groups?
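The filter-bypass and non-decimal-digit questions above can be illustrated with a small Python 3 sketch. This is a hypothetical scenario, not a finding from the table: the function names `loose_is_number` and `strict_is_number` are invented for the example. The idea is that an "is this a number?" check and the code consuming the value may disagree about what counts as a digit.

```python
import re

# In Python 3, \d on str patterns matches any Unicode decimal digit by default;
# re.ASCII restricts it to 0-9.
UNICODE_DIGITS = re.compile(r"\d+")
ASCII_DIGITS = re.compile(r"\d+", re.ASCII)

def loose_is_number(s):
    """Hypothetical filter that accepts any Unicode decimal digits."""
    return UNICODE_DIGITS.fullmatch(s) is not None

def strict_is_number(s):
    """Hypothetical filter that accepts ASCII digits only."""
    return ASCII_DIGITS.fullmatch(s) is not None

arabic_indic = "\u0661\u0662\u0663"  # ARABIC-INDIC DIGIT ONE, TWO, THREE
print(loose_is_number(arabic_indic), strict_is_number(arabic_indic))

# A non-decimal digit: SUPERSCRIPT TWO passes isdigit() but int() rejects it.
superscript_two = "\u00b2"
print(superscript_two.isdigit(), superscript_two.isdecimal())
try:
    int(superscript_two)
except ValueError:
    print("int() rejected SUPERSCRIPT TWO")
```

Whether such a mismatch is exploitable depends entirely on what sits behind the filter, so treat this as a starting point for testing rather than a demonstrated vulnerability.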
Also, I would like to thank the following people for pointing me to various languages (some I tested, some I didn't): Roi Martin, Tomasz Dąbrowski, Maciej Tebecha, himn1, argasek, dfgg, meal and nathell.
Cheers,
// copyright © Gynvael Coldwind
// design & art by Xa
// logo font (birdman regular) by utopiafonts / Dale Harris
/* the author and owner of this blog hereby allows anyone to test the security of this blog (on HTTP level only, the server is not mine, so let's leave it alone ;>), and try to break in (including successful breaks) without any consequences of any kind (DoS attacks are an exception here) ... I'll add that I planted in some places funny photos of some kittens, there are 7 of them right now, so have fun looking for them ;> let me know if You find them all, I'll add some congratz message or sth ;> */
Vulns found in blog:
* XSS (pers, user-inter) by ged_
* XSS (non-pers) by Anno & Tracerout
* XSS (pers) by Anno & Tracerout
* Blind SQLI by Sławomir Błażek
* XSS (pers) by Sławomir Błażek
Comments:
Agreed, it does look inconsistent.
Maybe some Perl expert could look into it?;)
That actually was a typo in the description - I've used 1.9.2 for tests (the default bundle for Windows).
For example, applying Witch Ordinals to well-known Poweliks mshtml trick:
rundll32.exe javascript:alert('๓ē໓นkค-iŞ-๖ēคนtฯ');window.close();"\..\mshtml #৩੧೪໑၅৯២៦୫໓໕໘९൭៩೩٢۳๘൪၆๒୬༤୩৫৬၉੯൯"