use html5lib as reference instead of Pandoc
parent 66a6139a4f
commit f2b4400472
37
README.md
@ -13,8 +13,20 @@ This is not escaping! Escaping html does prevents XSS attacks. Strings should be
|
||||
|
||||
This function removes any tags or attributes that are not in its white list. This may sound picky, but most html should make it through unchanged, making the process unnoticeable to the user but giving us safe html.
|
||||
|
||||
Integration
|
||||
===========
|
||||
It is recommended to integrate this so that it is automatically used whenever an application receives untrusted html data (instead of before it is displayed). See the Yesod web framework as an example.
|
||||
|
||||
Credit
|
||||
===========
|
||||
This was taken from John MacFarlane's Pandoc (with permission), modified to be faster, with the parsing redone using TagSoup. html5lib is also being used as a reference (BSD style license).
|
||||
|
||||
|
||||
Limitations
|
||||
-----------
|
||||
===========
|
||||
|
||||
TagSoup Parser
|
||||
--------------
|
||||
TagSoup is used to parse the HTML, and it does a good job. However TagSoup does not maintain all white space. TagSoup does not distinguish between the following cases:
|
||||
|
||||
<a href="foo">, <a href=foo>
|
||||
@ -23,10 +35,21 @@ TagSoup is used to parse the HTML, and it does a good job. However TagSoup does
|
||||
|
||||
In the third case, img and br tags will be output as a single self-closing tag. Other self-closing tags will be output as an open and closing pair. So `<img /> or <img></img>` converts to `<img />`, and `<a></a> or <a/>` converts to `<a></a>`. There are future updates to TagSoup planned so that TagSoup will be able to render tags exactly the same as they were parsed.
|
||||
|
||||
Integration
|
||||
===========
|
||||
It is recommended to integrate this so that it is automatically used whenever an application receives untrusted html data (instead of before it is displayed). See the Yesod web framework as an example.
|
||||
Where is the white list from?
|
||||
-----------------------------
|
||||
Ultimately this is where your security comes from, although I would tend to think that even a basic, incomplete white list would act as a strong deterrent.
|
||||
|
||||
Credit
|
||||
===========
|
||||
This was taken from John MacFarlane's Pandoc (with permission) modified to be faster and parsing redone with TagSoup
|
||||
Version 0.1 of the white list is from Pandoc. That list is probably from an older version of [a wiki page containing a white list](http://wiki.whatwg.org/wiki/Sanitization_rules). Having some prior experience editing Wikipedia, I am a little wary of directly using a wiki for a purpose like this, although it does seem to be watched over.
|
||||
|
||||
Version >= 0.2 uses [the source code of html5lib](http://code.google.com/p/html5lib/source/browse/python/html5lib/sanitizer.py) as the source of the white list and as my implementation reference. They do reference that wiki page as their source, but hopefully they are careful about when they import it into their code. I would definitely consider working with the maintainers of html5lib, but it doesn't make sense to merge the projects because sanitization is just one aspect of html5lib (they also have a parser).
|
||||
|
||||
If anyone knows of better sources or thinks a particular tag/attribute/value may be vulnerable, please let me know.
|
||||
|
||||
attributes data and style
|
||||
-------------------------
|
||||
The href attribute is white listed, but its value must also pass through a white list. This is how the data and style attributes should work as well. However, this was never implemented in Pandoc, and the html5lib code is a little complicated and relies on regular expressions that I don't understand. So for now these attributes are not on the white list.
|
||||
|
||||
svg and mathml
|
||||
--------------
|
||||
A mathml white list is fully implemented.
|
||||
There is a white list for svg elements and attributes. However, some elements are not included because they need further filtering (just like the data and style html attributes).
|
||||
|
||||
@ -2,13 +2,14 @@ module Text.HTML.SanitizeXSS (sanitizeXSS) where
|
||||
|
||||
import Text.HTML.TagSoup
|
||||
|
||||
import Data.Set (Set(), member, fromList)
|
||||
import Data.Set (Set(), member, notMember, (\\), fromList)
|
||||
import Data.Char ( toLower )
|
||||
|
||||
import Network.URI ( parseURIReference, URI (..),
|
||||
isAllowedInURI, escapeURIString, uriScheme )
|
||||
import Codec.Binary.UTF8.String ( encodeString )
|
||||
|
||||
-- | sanitize the html to prevent XSS attacks. See README.md <http://github.com/gregwebs/haskell-xss-sanitize> for more details
|
||||
sanitizeXSS :: String -> String
|
||||
sanitizeXSS = renderTagsOptions renderOptions {
|
||||
optMinimize = \x -> x `elem` ["br","img"] -- <img></img> converts to <img />, <a/> converts to <a></a>
|
||||
@ -28,14 +29,14 @@ safeTagName tagname = tagname `member` sanitaryTags
|
||||
|
||||
safeAttribute :: (String, String) -> Bool
|
||||
safeAttribute (name, value) = name `member` sanitaryAttributes &&
|
||||
(name `notElem` ["href","src"] || sanitaryURI value)
|
||||
(name `notMember` attrValIsUri || sanitaryURI value)
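
The two-stage attribute check in this hunk can be tried in isolation. The following is only a minimal sketch: the white lists are shortened stand-ins, `demoSanitaryURI` is a placeholder prefix test rather than the `Network.URI` based check used by the module, and every `demo*` name is hypothetical.

```haskell
import Data.List (isPrefixOf)
import Data.Set (Set, fromList, member, notMember)

-- Hypothetical, heavily shortened stand-in for sanitaryAttributes.
demoAttributes :: Set String
demoAttributes = fromList ["href", "src", "title", "alt"]

-- Attributes whose values are URIs and therefore need a second check.
demoAttrValIsUri :: Set String
demoAttrValIsUri = fromList ["href", "src"]

-- Placeholder URI check (an assumption, not the module's sanitaryURI):
-- accept only values that start with an http or https prefix.
demoSanitaryURI :: String -> Bool
demoSanitaryURI v = any (`isPrefixOf` v) ["http://", "https://"]

-- Same shape as safeAttribute: the name must be white listed, and a
-- URI-valued attribute must additionally carry an acceptable value.
demoSafeAttribute :: (String, String) -> Bool
demoSafeAttribute (name, value) =
  name `member` demoAttributes &&
    (name `notMember` demoAttrValIsUri || demoSanitaryURI value)
```

With these stand-ins, a `title` attribute passes on its name alone, while an `href` passes only if its value also satisfies the URI check.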
|
||||
|
||||
|
||||
-- | Returns @True@ if the specified URI is not a potential security risk.
|
||||
sanitaryURI :: String -> Bool
|
||||
sanitaryURI u =
|
||||
case parseURIReference (escapeURI u) of
|
||||
Just p -> (map toLower $ uriScheme p) `member` safeURISchemes
|
||||
Just p -> (init (map toLower $ uriScheme p)) `member` safeURISchemes
|
||||
Nothing -> False
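
A note on the `init` call in this hunk: `uriScheme` from `Network.URI` returns the scheme with its trailing colon (for example `"http:"`), so `init` strips the colon before the lookup. For a relative reference the scheme is the empty string, and `init` raises an error on an empty list. The sketch below shows one way to guard that case. It works on the scheme string directly so that it needs no URI parser, `demoSchemes` and `demoSafeScheme` are hypothetical names, and accepting relative references is an assumed policy (the old scheme list did contain `""`).

```haskell
import Data.Char (toLower)
import Data.Set (Set, fromList, member)

-- Shortened, hypothetical stand-in for safeURISchemes (no trailing colons).
demoSchemes :: Set String
demoSchemes = fromList ["http", "https", "ftp", "mailto"]

-- Takes the scheme in the form uriScheme returns it:
-- "http:" for an absolute URI, "" for a relative reference.
-- The empty scheme is accepted here (assumed policy) instead of being
-- handed to init, which would fail on an empty string.
demoSafeScheme :: String -> Bool
demoSafeScheme ""     = True
demoSafeScheme scheme = map toLower (init scheme) `member` demoSchemes
```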
|
||||
|
||||
|
||||
@ -44,9 +45,175 @@ sanitaryURI u =
|
||||
escapeURI :: String -> String
|
||||
escapeURI = escapeURIString isAllowedInURI . encodeString
|
||||
|
||||
|
||||
safeURISchemes :: Set String
|
||||
safeURISchemes = fromList [ "", "http:", "https:", "ftp:", "mailto:", "file:",
|
||||
safeURISchemes = fromList acceptable_protocols
|
||||
|
||||
sanitaryTags :: Set String
|
||||
sanitaryTags = fromList (acceptable_elements ++ mathml_elements ++ svg_elements)
|
||||
\\ (fromList svg_allow_local_href) -- extra filtering not implemented
|
||||
|
||||
sanitaryAttributes :: Set String
|
||||
sanitaryAttributes = fromList (acceptable_attributes ++ mathml_attributes ++ svg_attributes)
|
||||
\\ (fromList svg_attr_val_allows_ref) -- extra unescaping not implemented
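
The `\\` operator used for `sanitaryTags` and `sanitaryAttributes` is set difference from `Data.Set`. Function application binds tighter than the operator, so each definition reads as "the combined white list minus the names that still need extra filtering". A small sketch with made-up, shortened lists (all `demo*` names are hypothetical):

```haskell
import Data.Set (Set, fromList, member, (\\))

-- Shortened stand-in for the combined element lists.
demoElements :: [String]
demoElements = ["a", "svg", "rect", "use", "animate"]

-- Shortened stand-in for svg_allow_local_href.
demoNeedsFiltering :: [String]
demoNeedsFiltering = ["animate", "use", "altGlyph"]

-- Parses as (fromList demoElements) \\ (fromList demoNeedsFiltering).
demoTags :: Set String
demoTags = fromList demoElements \\ fromList demoNeedsFiltering
```

Names that appear only in the subtracted list, such as `altGlyph` here, are simply ignored by the difference.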
|
||||
|
||||
attrValIsUri :: Set String
|
||||
attrValIsUri = fromList ["href", "src", "cite", "action", "longdesc",
|
||||
"xlink:href", "xml:base"]
|
||||
|
||||
acceptable_elements :: [String]
|
||||
acceptable_elements = ["a", "abbr", "acronym", "address", "area",
|
||||
"article", "aside", "audio", "b", "big", "blockquote", "br", "button",
|
||||
"canvas", "caption", "center", "cite", "code", "col", "colgroup",
|
||||
"command", "datagrid", "datalist", "dd", "del", "details", "dfn",
|
||||
"dialog", "dir", "div", "dl", "dt", "em", "event-source", "fieldset",
|
||||
"figure", "footer", "font", "form", "header", "h1", "h2", "h3", "h4",
|
||||
"h5", "h6", "hr", "i", "img", "input", "ins", "keygen", "kbd",
|
||||
"label", "legend", "li", "m", "map", "menu", "meter", "multicol",
|
||||
"nav", "nextid", "ol", "output", "optgroup", "option", "p", "pre",
|
||||
"progress", "q", "s", "samp", "section", "select", "small", "sound",
|
||||
"source", "spacer", "span", "strike", "strong", "sub", "sup", "table",
|
||||
"tbody", "td", "textarea", "time", "tfoot", "th", "thead", "tr", "tt",
|
||||
"u", "ul", "var", "video"]
|
||||
|
||||
mathml_elements :: [String]
|
||||
mathml_elements = ["maction", "math", "merror", "mfrac", "mi",
|
||||
"mmultiscripts", "mn", "mo", "mover", "mpadded", "mphantom",
|
||||
"mprescripts", "mroot", "mrow", "mspace", "msqrt", "mstyle", "msub",
|
||||
"msubsup", "msup", "mtable", "mtd", "mtext", "mtr", "munder",
|
||||
"munderover", "none"]
|
||||
|
||||
-- this should include altGlyph I think
|
||||
svg_elements :: [String]
|
||||
svg_elements = ["a", "animate", "animateColor", "animateMotion",
|
||||
"animateTransform", "clipPath", "circle", "defs", "desc", "ellipse",
|
||||
"font-face", "font-face-name", "font-face-src", "g", "glyph", "hkern",
|
||||
"linearGradient", "line", "marker", "metadata", "missing-glyph",
|
||||
"mpath", "path", "polygon", "polyline", "radialGradient", "rect",
|
||||
"set", "stop", "svg", "switch", "text", "title", "tspan", "use"]
|
||||
|
||||
acceptable_attributes :: [String]
|
||||
acceptable_attributes = ["abbr", "accept", "accept-charset", "accesskey",
|
||||
"action", "align", "alt", "autocomplete", "autofocus", "axis",
|
||||
"background", "balance", "bgcolor", "bgproperties", "border",
|
||||
"bordercolor", "bordercolordark", "bordercolorlight", "bottompadding",
|
||||
"cellpadding", "cellspacing", "ch", "challenge", "char", "charoff",
|
||||
"choff", "charset", "checked", "cite", "class", "clear", "color",
|
||||
"cols", "colspan", "compact", "contenteditable", "controls", "coords",
|
||||
-- "data", TODO: allow this with further filtering
|
||||
"datafld", "datapagesize", "datasrc", "datetime", "default",
|
||||
"delay", "dir", "disabled", "draggable", "dynsrc", "enctype", "end",
|
||||
"face", "for", "form", "frame", "galleryimg", "gutter", "headers",
|
||||
"height", "hidefocus", "hidden", "high", "href", "hreflang", "hspace",
|
||||
"icon", "id", "inputmode", "ismap", "keytype", "label", "leftspacing",
|
||||
"lang", "list", "longdesc", "loop", "loopcount", "loopend",
|
||||
"loopstart", "low", "lowsrc", "max", "maxlength", "media", "method",
|
||||
"min", "multiple", "name", "nohref", "noshade", "nowrap", "open",
|
||||
"optimum", "pattern", "ping", "point-size", "prompt", "pqg",
|
||||
"radiogroup", "readonly", "rel", "repeat-max", "repeat-min",
|
||||
"replace", "required", "rev", "rightspacing", "rows", "rowspan",
|
||||
"rules", "scope", "selected", "shape", "size", "span", "src", "start",
|
||||
"step",
|
||||
-- "style", TODO: allow this with further filtering
|
||||
"summary", "suppress", "tabindex", "target",
|
||||
"template", "title", "toppadding", "type", "unselectable", "usemap",
|
||||
"urn", "valign", "value", "variable", "volume", "vspace", "vrml",
|
||||
"width", "wrap", "xml:lang"]
|
||||
|
||||
acceptable_protocols :: [String]
|
||||
acceptable_protocols = [ "ed2k", "ftp", "http", "https", "irc",
|
||||
"mailto", "news", "gopher", "nntp", "telnet", "webcal",
|
||||
"xmpp", "callto", "feed", "urn", "aim", "rsync", "tag",
|
||||
"ssh", "sftp", "rtsp", "afs" ]
|
||||
|
||||
mathml_attributes :: [String]
|
||||
mathml_attributes = ["actiontype", "align", "columnalign", "columnalign",
|
||||
"columnalign", "columnlines", "columnspacing", "columnspan", "depth",
|
||||
"display", "displaystyle", "equalcolumns", "equalrows", "fence",
|
||||
"fontstyle", "fontweight", "frame", "height", "linethickness", "lspace",
|
||||
"mathbackground", "mathcolor", "mathvariant", "mathvariant", "maxsize",
|
||||
"minsize", "other", "rowalign", "rowalign", "rowalign", "rowlines",
|
||||
"rowspacing", "rowspan", "rspace", "scriptlevel", "selection",
|
||||
"separator", "stretchy", "width", "width", "xlink:href", "xlink:show",
|
||||
"xlink:type", "xmlns", "xmlns:xlink"]
|
||||
|
||||
svg_attributes :: [String]
|
||||
svg_attributes = ["accent-height", "accumulate", "additive", "alphabetic",
|
||||
"arabic-form", "ascent", "attributeName", "attributeType",
|
||||
"baseProfile", "bbox", "begin", "by", "calcMode", "cap-height",
|
||||
"class", "clip-path", "color", "color-rendering", "content", "cx",
|
||||
"cy", "d", "dx", "dy", "descent", "display", "dur", "end", "fill",
|
||||
"fill-opacity", "fill-rule", "font-family", "font-size",
|
||||
"font-stretch", "font-style", "font-variant", "font-weight", "from",
|
||||
"fx", "fy", "g1", "g2", "glyph-name", "gradientUnits", "hanging",
|
||||
"height", "horiz-adv-x", "horiz-origin-x", "id", "ideographic", "k",
|
||||
"keyPoints", "keySplines", "keyTimes", "lang", "marker-end",
|
||||
"marker-mid", "marker-start", "markerHeight", "markerUnits",
|
||||
"markerWidth", "mathematical", "max", "min", "name", "offset",
|
||||
"opacity", "orient", "origin", "overline-position",
|
||||
"overline-thickness", "panose-1", "path", "pathLength", "points",
|
||||
"preserveAspectRatio", "r", "refX", "refY", "repeatCount",
|
||||
"repeatDur", "requiredExtensions", "requiredFeatures", "restart",
|
||||
"rotate", "rx", "ry", "slope", "stemh", "stemv", "stop-color",
|
||||
"stop-opacity", "strikethrough-position", "strikethrough-thickness",
|
||||
"stroke", "stroke-dasharray", "stroke-dashoffset", "stroke-linecap",
|
||||
"stroke-linejoin", "stroke-miterlimit", "stroke-opacity",
|
||||
"stroke-width", "systemLanguage", "target", "text-anchor", "to",
|
||||
"transform", "type", "u1", "u2", "underline-position",
|
||||
"underline-thickness", "unicode", "unicode-range", "units-per-em",
|
||||
"values", "version", "viewBox", "visibility", "width", "widths", "x",
|
||||
"x-height", "x1", "x2", "xlink:actuate", "xlink:arcrole",
|
||||
"xlink:href", "xlink:role", "xlink:show", "xlink:title", "xlink:type",
|
||||
"xml:base", "xml:lang", "xml:space", "xmlns", "xmlns:xlink", "y",
|
||||
"y1", "y2", "zoomAndPan"]
|
||||
|
||||
-- the values for these need to be escaped
|
||||
svg_attr_val_allows_ref :: [String]
|
||||
svg_attr_val_allows_ref = ["clip-path", "color-profile", "cursor", "fill",
|
||||
"filter", "marker", "marker-start", "marker-mid", "marker-end",
|
||||
"mask", "stroke"]
|
||||
|
||||
svg_allow_local_href :: [String]
|
||||
svg_allow_local_href = ["altGlyph", "animate", "animateColor",
|
||||
"animateMotion", "animateTransform", "cursor", "feImage", "filter",
|
||||
"linearGradient", "pattern", "radialGradient", "textpath", "tref",
|
||||
"set", "use"]
|
||||
|
||||
{- style value (css) filtering not implemented
|
||||
-
|
||||
- this is used for css filtering
|
||||
allowed_svg_properties = fromList acceptable_svg_properties
|
||||
acceptable_svg_properties = [ "fill", "fill-opacity", "fill-rule",
|
||||
"stroke", "stroke-width", "stroke-linecap", "stroke-linejoin",
|
||||
"stroke-opacity"]
|
||||
|
||||
|
||||
allowed_css_properties = fromList acceptable_css_properties
|
||||
allowed_css_keywords = fromList acceptable_css_keywords
|
||||
acceptable_css_properties = ["azimuth", "background-color",
|
||||
"border-bottom-color", "border-collapse", "border-color",
|
||||
"border-left-color", "border-right-color", "border-top-color", "clear",
|
||||
"color", "cursor", "direction", "display", "elevation", "float", "font",
|
||||
"font-family", "font-size", "font-style", "font-variant", "font-weight",
|
||||
"height", "letter-spacing", "line-height", "overflow", "pause",
|
||||
"pause-after", "pause-before", "pitch", "pitch-range", "richness",
|
||||
"speak", "speak-header", "speak-numeral", "speak-punctuation",
|
||||
"speech-rate", "stress", "text-align", "text-decoration", "text-indent",
|
||||
"unicode-bidi", "vertical-align", "voice-family", "volume",
|
||||
"white-space", "width"]
|
||||
acceptable_css_keywords = ["auto", "aqua", "black", "block", "blue",
|
||||
"bold", "both", "bottom", "brown", "center", "collapse", "dashed",
|
||||
"dotted", "fuchsia", "gray", "green", "!important", "italic", "left",
|
||||
"lime", "maroon", "medium", "none", "navy", "normal", "nowrap", "olive",
|
||||
"pointer", "purple", "red", "right", "solid", "silver", "teal", "top",
|
||||
"transparent", "underline", "white", "yellow"]
|
||||
-}
|
||||
|
||||
|
||||
-- I don't know where this is from!
|
||||
-- The rest of pandoc's lists were smaller than the ones in html5lib
|
||||
-- This one is bigger.
|
||||
{-
|
||||
pandoc_acceptable_protocols = [ "", "http:", "https:", "ftp:", "mailto:", "file:",
|
||||
"telnet:", "gopher:", "aaa:", "aaas:", "acap:", "cap:", "cid:",
|
||||
"crid:", "dav:", "dict:", "dns:", "fax:", "go:", "h323:", "im:",
|
||||
"imap:", "ldap:", "mid:", "news:", "nfs:", "nntp:", "pop:",
|
||||
@ -56,33 +223,4 @@ safeURISchemes = fromList [ "", "http:", "https:", "ftp:", "mailto:", "file:",
|
||||
"ldaps:", "magnet:", "mms:", "msnim:", "notes:", "rsync:",
|
||||
"secondlife:", "skype:", "ssh:", "sftp:", "smb:", "sms:",
|
||||
"snews:", "webcal:", "ymsgr:"]
|
||||
|
||||
sanitaryTags :: Set String
|
||||
sanitaryTags = fromList ["a", "abbr", "acronym", "address", "area", "b", "big",
|
||||
"blockquote", "br", "button", "caption", "center",
|
||||
"cite", "code", "col", "colgroup", "dd", "del", "dfn",
|
||||
"dir", "div", "dl", "dt", "em", "fieldset", "font",
|
||||
"form", "h1", "h2", "h3", "h4", "h5", "h6", "hr",
|
||||
"i", "img", "input", "ins", "kbd", "label", "legend",
|
||||
"li", "map", "menu", "ol", "optgroup", "option", "p",
|
||||
"pre", "q", "s", "samp", "select", "small", "span",
|
||||
"strike", "strong", "sub", "sup", "table", "tbody",
|
||||
"td", "textarea", "tfoot", "th", "thead", "tr", "tt",
|
||||
"u", "ul", "var"]
|
||||
|
||||
sanitaryAttributes :: Set String
|
||||
sanitaryAttributes = fromList ["abbr", "accept", "accept-charset",
|
||||
"accesskey", "action", "align", "alt", "axis",
|
||||
"border", "cellpadding", "cellspacing", "char",
|
||||
"charoff", "charset", "checked", "cite", "class",
|
||||
"clear", "cols", "colspan", "color", "compact",
|
||||
"coords", "datetime", "dir", "disabled",
|
||||
"enctype", "for", "frame", "headers", "height",
|
||||
"href", "hreflang", "hspace", "id", "ismap",
|
||||
"label", "lang", "longdesc", "maxlength", "media",
|
||||
"method", "multiple", "name", "nohref", "noshade",
|
||||
"nowrap", "prompt", "readonly", "rel", "rev",
|
||||
"rows", "rowspan", "rules", "scope", "selected",
|
||||
"shape", "size", "span", "src", "start",
|
||||
"summary", "tabindex", "target", "title", "type",
|
||||
"usemap", "valign", "value", "vspace", "width"]
|
||||
-}
|
||||
|
||||
@ -1,11 +1,11 @@
|
||||
name: xss-sanitize
|
||||
version: 0.1.1
|
||||
version: 0.2.0
|
||||
license: BSD3
|
||||
license-file: LICENSE
|
||||
author: Greg Weber <greg@gregweber.info>
|
||||
maintainer: Greg Weber <greg@gregweber.info>
|
||||
synopsis: sanitize untrusted HTML to prevent XSS attacks
|
||||
description: sanitize untrusted HTML to prevent XSS attacks with Text.HTML.SanitizeXSS.sanitizeXSS. see README.md for more details
|
||||
description: run untrusted HTML through Text.HTML.SanitizeXSS.sanitizeXSS to prevent XSS attacks. see README.md <http://github.com/gregwebs/haskell-xss-sanitize> for more details
|
||||
|
||||
category: Web
|
||||
stability: Stable
|
||||
|
||||