Greg Weber eb5b78d429 export just sanitizeXSS		2010-09-26 08:09:49 -07:00
Text/HTML	export just sanitizeXSS	2010-09-26 08:09:49 -07:00
.gitignore	gitignore file	2010-09-23 18:09:21 -07:00
LICENSE	add LICENSE, update README, cabalize	2010-09-25 13:11:34 -07:00
README.md	export just sanitizeXSS	2010-09-26 08:09:49 -07:00
test.hs	finish cabalizing package	2010-09-25 13:31:37 -07:00
xss-sanitize.cabal	finish cabalizing package	2010-09-25 13:31:37 -07:00

README.md

Summary

Provides a function, Text.HTML.SanitizeXSS.sanitizeXSS, that filters html to prevent XSS attacks.

Use Case

All html from an untrusted source (such as a user of a web application) should be run through this function. If you trust the html (you wrote it), you do not need to use this.

Detail

This is not escaping! Escaping html does prevent XSS attacks, and plain strings should be html escaped both to show up properly and to prevent XSS attacks. However, escaping html markup ruins its display: the tags show up as literal text instead of being rendered.
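To make the distinction concrete, here is a minimal sketch of plain html escaping, using only the base library. The name escapeHtml is hypothetical and is not part of this package. Every tag is neutralized, including the harmless ones, which is why escaping alone is not enough when the markup is supposed to render.

```haskell
-- Minimal html escaping, for illustration only (not part of this package).
escapeHtml :: String -> String
escapeHtml = concatMap escapeChar
  where
    escapeChar '<'  = "&lt;"
    escapeChar '>'  = "&gt;"
    escapeChar '&'  = "&amp;"
    escapeChar '"'  = "&quot;"
    escapeChar '\'' = "&#39;"
    escapeChar c    = [c]

main :: IO ()
main = putStrLn (escapeHtml "<b>bold</b> <script>alert(1)</script>")
-- prints: &lt;b&gt;bold&lt;/b&gt; &lt;script&gt;alert(1)&lt;/script&gt;
```

The script tag can no longer run, but the bold tag no longer renders either.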

This function removes any tags or attributes that are not in its white-list. This may sound picky, but most html should make it through unchanged, making the process unnoticeable to the user but giving us safe html.
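The white-list idea can be sketched roughly as follows. This is a toy under stated assumptions, not the package's implementation: the Token type, the two lists, and sanitizeTokens are all hypothetical names, the lists are far smaller than a usable white-list, and the real function works on TagSoup's parsed tags rather than this type.

```haskell
-- Toy token type standing in for a parsed html stream (hypothetical).
data Token
  = Open String [(String, String)]  -- tag name and its attributes
  | Close String
  | Text String
  deriving (Eq, Show)

-- Hypothetical white-lists, kept tiny for the example.
allowedTags :: [String]
allowedTags = ["a", "b", "i", "p"]

allowedAttrs :: [String]
allowedAttrs = ["href", "title"]

-- Drop tags that are not white-listed; on the tags that remain,
-- keep only white-listed attributes. Text passes through untouched.
sanitizeTokens :: [Token] -> [Token]
sanitizeTokens = concatMap keep
  where
    keep (Open name attrs)
      | name `elem` allowedTags =
          [Open name [a | a@(k, _) <- attrs, k `elem` allowedAttrs]]
      | otherwise = []
    keep (Close name)
      | name `elem` allowedTags = [Close name]
      | otherwise = []
    keep t = [t]

main :: IO ()
main = mapM_ print (sanitizeTokens
  [ Open "p" [("onclick", "steal()"), ("title", "greeting")]
  , Text "hi"
  , Close "p"
  , Open "script" []
  , Text "alert(1)"
  , Close "script"
  ])
```

In this run the p tag survives with only its title attribute, and the script tags are dropped while their contents remain as inert text. The toy does not inspect attribute values, so it says nothing about how a real sanitizer should treat the contents of an attribute such as href.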

Limitations

TagSoup is used to parse the HTML, and it does a good job. However, TagSoup does not maintain all white space, and it does not distinguish between the following cases:

<a href="foo">, <a href=foo>
<a   href>, <a href>
<a></a>, <a/>

img and br tags will be output as a single self-closing tag. Other self-closing tags will be output as an open and closing pair. So <img></img> or <img/> converts to <img />, and <a></a> or <a/> converts to <a></a>. There are future updates to TagSoup planned to fix these cases.

Integration

It is recommended to integrate this so that it is automatically used whenever an application receives untrusted html data, rather than just before that data is displayed. See the Yesod web framework for an example.
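One way to sanitize on receipt is to wrap sanitized html in its own type, so that unsanitized strings cannot reach the rendering code by accident. The sketch below is only an illustration of that design and is not how Yesod or this package does it: SafeHtml, fromUntrusted, and render are hypothetical names, and the sanitizer is passed in as a parameter so the example runs with the base library alone.

```haskell
-- Hypothetical wrapper: a SafeHtml value has already been sanitized.
-- In a real module, export the type but not its constructor.
newtype SafeHtml = SafeHtml String
  deriving (Eq, Show)

-- Called once, at the point where untrusted input enters the application.
-- The sanitizer is a parameter here; an application would supply its
-- actual sanitizing function in place of the stub used below.
fromUntrusted :: (String -> String) -> String -> SafeHtml
fromUntrusted sanitizer = SafeHtml . sanitizer

-- Display code accepts only SafeHtml, never a raw String.
render :: SafeHtml -> String
render (SafeHtml s) = s

main :: IO ()
main = do
  -- Placeholder that just deletes angle brackets; NOT a real sanitizer.
  let stubSanitizer = filter (`notElem` "<>")
      comment = fromUntrusted stubSanitizer "<script>alert(1)</script>"
  putStrLn (render comment)
```

Because sanitizing happens when the value is constructed, it runs once per input rather than on every display, and the type checker flags any code path that tries to display a plain string.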

Credit

This was taken from John MacFarlane's Pandoc (with permission), modified to be faster, with the parsing redone using TagSoup.