Security in HTML 5 and HTTP

data dump:html 5
For various reasons I've decided to take a deeper look at the evolving HTML 5 standard and related new HTTP extensions (or proposals of extensions). To tell you the truth, I was extremely surprised by the number of HTML tags that I had never even heard of (like <ruby>, <kbd>, <meter>, <progress>, etc). Another thing that surprised me was a few security features I was not familiar with... so I decided to write down what I found interesting (so yes, this is a 'data dump' only).

HTML 5: keygen tag

Let's start with the <keygen> tag, which is used to generate a pair of RSA keys (OK, so the standard says that RSA is supported, however it seems Mozilla added a few other key types like DSA and EC). The public key is sent to the server (keygen is used in forms) and the private one is stored in the local keystore.
So... crypto finally arrived at HTML/JS, that's good news. However, I do wonder what this will be used for - any ideas? Maybe user+browser identification?
Anyway, I hope no webmaster will try to substitute HTTPS with JS RSA crypto on HTTP, that's basically asking for trouble (i.e. in a MITM scenario, which HTTPS is designed to protect against, the attacker can easily change the content of HTML/JS, e.g. by injecting a script that will send the not-yet-encrypted/already-decrypted data to the evil-server DOT com).

HTML 5: iframe sandbox

Apparently the <iframe> tag (btw, the <frame> and <frameset> tags are not supported in HTML 5) has gained a sandbox attribute, which enables a set of extra restrictions on any content hosted by the iframe. [...] When the attribute is set, the content is treated as being from a unique origin, forms and scripts are disabled, links are prevented from targeting other browsing contexts, and plugins are disabled.
To relax the restrictions you can specify a set of keywords like:
* allow-forms - allows form submission (i.e. normally submitting forms is disabled (?))
* allow-scripts - allows running scripts (i.e. normally scripts are disabled)
* allow-same-origin - allows the page to be treated as same-origin (i.e. normally the framed page is always treated as a different, unique origin)
* allow-top-navigation - self-describing (i.e. normally the framed content cannot do it)
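
A toy model of the keywords listed above, written in Python just to make the "everything is off unless re-enabled" logic explicit. This is only a sketch: the names DEFAULT_RESTRICTIONS, ALLOW_KEYWORDS and active_restrictions are made up for illustration, and a real browser does far more than this.

```python
# Toy model only: restrictions in force when the sandbox attribute is present
# with no keywords, as described in the text above.
DEFAULT_RESTRICTIONS = {
    "forms": "form submission is blocked",
    "scripts": "scripts are blocked",
    "same-origin": "content is treated as a unique origin",
    "top-navigation": "navigating the top-level page is blocked",
    "plugins": "plugins are disabled",
}

# Keyword -> the restriction it lifts. None of the listed keywords lifts "plugins".
ALLOW_KEYWORDS = {
    "allow-forms": "forms",
    "allow-scripts": "scripts",
    "allow-same-origin": "same-origin",
    "allow-top-navigation": "top-navigation",
}


def active_restrictions(sandbox_value):
    """Return the sorted names of restrictions still in force for a sandbox value."""
    remaining = set(DEFAULT_RESTRICTIONS)
    for token in sandbox_value.split():
        lifted = ALLOW_KEYWORDS.get(token.lower())
        if lifted is not None:  # unknown keywords are simply ignored here
            remaining.discard(lifted)
    return sorted(remaining)
```

For example, active_restrictions("allow-forms allow-scripts") leaves the plugins, same-origin and top-navigation restrictions in place, while an empty value leaves all five.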

Hmm, I'm wondering why exactly form submission is disabled. I mean, there are many ways to emulate this functionality and send e.g. phished data elsewhere, right? Actually only partly right, since one would require scripting to be enabled on the page (i.e. allow-scripts). Guess this makes sense after all.

On a side note, it seems that there is no way to re-enable plugins.

Actually there are a few screen-pages of description of how the sandbox works, so it's best to read it. A couple of times perhaps.
However, I'll quote a few warnings from the standard since they are quite interesting:

Warning! If the allow-scripts keyword is set along with allow-same-origin keyword, and the file is from the same origin as the iframe's Document, then a script in the "sandboxed" iframe could just reach out, remove the sandbox attribute, and then reload itself, effectively breaking out of the sandbox altogether.
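
The condition in this warning is simple enough to write down as a check. A minimal sketch in Python, assuming you have the sandbox attribute value as a string and already know whether the framed file comes from the same origin as the embedding document (can_escape_sandbox is a hypothetical helper name, not anything from the standard):

```python
def can_escape_sandbox(sandbox_value, framed_is_same_origin):
    """True if the combination described in the warning applies.

    Both keywords must be present AND the framed file must be served from
    the same origin as the document that contains the iframe.
    """
    tokens = set(sandbox_value.lower().split())
    risky_pair = {"allow-scripts", "allow-same-origin"}
    return framed_is_same_origin and risky_pair <= tokens
```

So can_escape_sandbox("allow-scripts allow-same-origin", True) is the dangerous case, while either keyword alone, or a framed file from another origin, is not covered by this particular warning.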

Warning! These flags only take effect when the nested browsing context of the iframe is navigated. Removing them, or removing the entire sandbox attribute, has no effect on an already-loaded page.
So, the sandboxing features/flags are applied at load time. I guess that's why the previously quoted warning has the 'and then reload itself' part included.

Warning! Sandboxing hostile content is of minimal help if an attacker can convince the user to just visit the hostile content directly, rather than in the iframe. To limit the damage that can be caused by hostile HTML content, it should be served using the text/html-sandboxed MIME type.
I'll get back to text/html-sandboxed later.

Another quote - on how the cookies are handled on the unique-origin page:
If the contents are sandboxed into a unique origin (in an iframe with the sandbox attribute) or the resource was labeled as text/html-sandboxed, a SECURITY_ERR exception will be thrown on getting and setting.
That seems to be a good solution.

And one last thing: when I saw that this allows iframing with scripting turned off, it came to my mind that 'the frame busting scripts will stop working'. However, since there is the From-Origin header I've mentioned the other day, everything should be OK. That is, if a given browser implements both features at the same time or the From-Origin header first; otherwise there will be a time window in which the frame busting scripts won't work. Hmm, but I guess there is the X-Frame-Options header to help with the time window. Concluding: yes, WWW was made using the patchwork technique.
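
Since X-Frame-Options is a plain response header, it does not depend on scripts running inside the frame. A rough sketch of how a site could add it, assuming a Python WSGI application (deny_framing is a made-up name, and DENY is just one of the header's values; SAMEORIGIN is the other common one):

```python
def deny_framing(app):
    """Wrap a WSGI app so that every response carries X-Frame-Options: DENY."""

    def wrapped(environ, start_response):
        def start_with_header(status, headers, exc_info=None):
            # Drop any value set by the inner app, then add our own.
            headers = [(k, v) for k, v in headers if k.lower() != "x-frame-options"]
            headers.append(("X-Frame-Options", "DENY"))
            return start_response(status, headers, exc_info)

        return app(environ, start_with_header)

    return wrapped
```

Usage would be something like application = deny_framing(application) in the WSGI entry point.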

HTTP: text/html-sandboxed MIME type

Looks like an HTTP server can tell the browser that a certain HTML document is not trusted, which translates to 'having a unique origin' (i.e. not being in the same origin as the rest of the site). So, a text/html-sandboxed page should have no access to cookies of the hosting domain, nor could it e.g. read the content of other pages hosted on this server.
On a side note: the proposed file extension is ".sandboxed", and using the ".html" or ".htm" extensions is discouraged due to the danger that legacy user agents might render/execute the pages with full same-origin permissions.
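
On the server side this boils down to picking the Content-Type per file. A minimal sketch in Python, under the assumption that untrusted documents live under a dedicated path; the /user-content/ prefix and the content_type_for name are hypothetical, only the MIME type and the ".sandboxed" extension come from the proposal:

```python
import posixpath

# Assumption for this sketch: everything under this made-up prefix is untrusted.
UNTRUSTED_PREFIX = "/user-content/"

TRUSTED_TYPES = {".html": "text/html", ".htm": "text/html"}


def content_type_for(path):
    """Pick a Content-Type for a request path; untrusted HTML gets the sandboxed type."""
    ext = posixpath.splitext(path)[1].lower()
    if path.startswith(UNTRUSTED_PREFIX):
        # Only the dedicated extension is served as sandboxed HTML; anything
        # else in the untrusted area is never served as HTML at all.
        if ext == ".sandboxed":
            return "text/html-sandboxed"
        return "application/octet-stream"
    return TRUSTED_TYPES.get(ext, "application/octet-stream")
```

Note that an untrusted file named evil.html deliberately does not get text/html here, which mirrors the advice against the ".html" and ".htm" extensions.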

I admit it - I'm having doubts regarding this feature. Two reasons:
1. A commonly repeated phrase is that the address bar of a browser is the only way to tell what page you are on (as opposed to e.g. the status bar or the text of the link). But in this case, the potentially-evil page would be hosted on the same domain (served as text/html-sandboxed of course) and would be a good base for phishing ("I've checked the domain and it was OK!!!!1").
2. The ".sandboxed" extension might not help with legacy browsers due to the content sniffing mechanism - since the MIME type will be unknown to the legacy browsers, they might switch into content sniffing. And if they do, guess what they'll find. Yep. HTML.

I guess the 1st point could be addressed by using the same approach as the <iframe> sandbox uses - i.e. disallow submitting data in any way as well as scripting.
As for the second point, well we're still fighting with IE6 which was released 10 years ago. So I guess this won't be a problem... in 2051.

Speaking of which...

MIMESNIFF: type sniffing specification

So yes, content sniffing has been a problem for some years now. You host a file thinking it's an image, but the browser assumes it's an HTML document (because there's some random HTML <tag> at the beginning) and treats it as such. Result? XSS.
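
To show the failure mode, here is a deliberately naive sniffer in Python. It is NOT how any real browser or the MIMESNIFF draft works; the tag list, the 256-byte window and the naive_sniff name are all made up for the demonstration.

```python
import re

# Toy heuristic: a handful of tags that "look like HTML" near the start of the body.
HTML_HINT = re.compile(rb"<(html|script|body|iframe|img)\b", re.IGNORECASE)


def naive_sniff(declared_type, body, window=256):
    """Return the type a naive sniffer would act on, ignoring the declared one if it sees HTML."""
    if HTML_HINT.search(body[:window]):
        return "text/html"
    return declared_type


# A "GIF" whose first bytes also happen to contain a script tag.
fake_image = b"GIF89a<script>alert(1)</script>"
sniffed = naive_sniff("image/gif", fake_image)
```

With this toy logic the file declared as image/gif ends up treated as text/html, which is exactly the XSS scenario described above.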
One of the solutions to this problem was the X-Content-Type-Options: nosniff header introduced in IE8 in 2008. Another was to force-download every resource using the Content-Disposition: attachment header. Etc.
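
Both workarounds are just response headers. A small sketch in Python of what a server could send for a user-uploaded file (download_headers is a hypothetical helper name, and the filename sanitising is a simplification):

```python
def download_headers(filename, content_type="application/octet-stream"):
    """Build response headers that disable sniffing and force a download."""
    # Replace quotes, backslashes and control characters so the name
    # cannot break out of the quoted header value.
    safe_name = "".join(
        "_" if (ch in '"\\' or ord(ch) < 32 or ord(ch) == 127) else ch
        for ch in filename
    )
    return [
        ("Content-Type", content_type),
        ("X-Content-Type-Options", "nosniff"),
        ("Content-Disposition", 'attachment; filename="%s"' % safe_name),
    ]
```

The two headers are independent: nosniff alone keeps the resource viewable inline, while the attachment disposition forces the download regardless of type.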

But in the end the important part is that content sniffing was actually a usability feature, so you might actually want to leave it ON. And so, a safe & secure content sniffing procedure is needed.
And it seems there is a draft of it already. I have yet to read through it though, so no comments on it at this point.


1. There is a very interesting Unicode security report linked in the HTML 5 standard. Worth looking through.
2. The Content Security Policy, although not a part of the HTML 5 standard, looks very interesting and I'm curious to see whether it will spread around.

And that's that.


2011-08-07 21:45:45 = Krzysztof Kotowicz
Great post!

As for the keygen tag: Is JS crypto a good thing(TM)? There are mixed opinions. One flaw of that is obviously XSS like you mentioned, but there are other fundamental problems with it - as described by Nate Lawson - http://rdist.root.org/2010/11/29/final-post-on-javascript-crypto/

Iframe sandbox is a funny thing - it does offer protection when you knowingly want to embed questionable content - and not have your page modified along the way. But at the same time it's a great tool for clickjacking attacks ( http://html5sec.org/#122 ). The From-Origin header is just a new idea, while iframe sandbox has already been working in Chrome for months. So, basically, X-Frame-Options is the only way to protect your site from being clickjacked. But the adoption rates of this header are scary. So - the final effect is that the clickjackers got a new tool to launch their attack, and all the websites that want to protect themselves need to adjust somehow. Not the best approach IMHO [ but the web is broken anyway ;) ].

"The address bar of a browser is the only way to tell what page you are on". No, unfortunately not in 2k11. (history.pushState - https://developer.mozilla.org/en/DOM/Manipulating_the_browser_history ). I have a feeling that the sandboxed mime type etc. are unlikely to be used widely, I'd bet on Content Security Policy, I think it will get better adoption.

HTML5 is a broad subject, there are many other security issues with it, to mention only Cross Origin Resource Sharing and Offline Web Applications. While I think it's great that HTML gets new possibilities, it really needs attention from the security community and security-minded developers, as there are many new issues and quirks in the spec that might introduce new vulnerabilities even in legacy applications.
2011-08-08 19:23:14 = Pawel Golen
ENISA has published an interesting paper lately: "A Security Analysis of Next Generation Web Standards" (http://www.enisa.europa.eu/act/application-security/web-security/a-security-analysis-of-next-generation-web-standards). It is undoubtedly worth reading, but the problem with papers like this is their overblown and formal language. It's definitely not a quick read.
2011-08-10 07:11:05 = eneon
I agree with Krzysztof Kotowicz that the new iframe features in HTML5 lead to more clickjacking attacks, but there is a way to successfully protect against clickjacking, see here: http://websec.rooted.pl/2011/07/more-accurate-framebusting.html
2011-08-10 10:51:24 = Krzysztof Kotowicz
@eneon yes, there are methods to protect, the one described in your blog comes from a great Stanford paper - http://seclab.stanford.edu/websec/framebusting/framebust.pdf and is currently the safest method of protection (X-Frame-Options + hiding the contents with CSS and showing them with JS) - although I'd use display:none instead of visibility:hidden.

The point is that few websites will employ these methods. Last time I checked (late autumn), nk.pl - the Polish social network, had the old JS framebusting code and no X-Frame-Options header at all. New ways of clickjacking are discovered and used (using iframe sandbox is one of them) and the web now has to somehow upgrade all its websites to protect them from an old (2008) vulnerability. That is what's wrong.
2011-08-10 20:52:50 = eneon
@Krzysztof Kotowicz yeah, papers from Stanford are great and my example in fact comes from there, but I like the 'visibility' CSS property because 'display' has a more complicated uncloak form.
Please note the top.location.replace(location.href) call, which doesn't let users go back to the framing site after jumping out.

Indeed, clickjacking with HTML5 is easy. I think the biggest difficulty to overcome is not implementing the text/html-sandboxed MIME type but actually serving the content of our framed pages with it, because until every web browser accepts it as valid content, the old problems remain exactly where they were without this MIME type.
2011-08-25 09:20:39 = kravietz
Keygen is not a new thing, it's used to generate RSA keys for SSL client authentication for example. You can find it mostly on the sites of certification authorities (CAs) that issue X.509 certificates (low security profiles).
