Thursday, April 28, 2011

Bad Characters to Use in Web-based Filenames

RFC 2396 is the governing document for "Uniform Resource Identifiers (URI): Generic Syntax." This RFC defines what can and can't be used in a URI, as well as what shouldn't be used.

First, section "2.2. Reserved Characters" contains the following list of reserved characters:

reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","

Second, section "2.4.3. Excluded US-ASCII Characters" contains the following lists of delimiter and unwise characters:

delims = "<" | ">" | "#" | "%" | <"> unwise = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"
ToDo: Add a Regular Expression to explicitly exclude these characters in SanitizeText

No comments: