Input data validation is a fundamental practice that web applications need to get right to protect users, their data, your data and your web site. However, free text entry fields in web forms can be problematic - we all make mistakes, misread the instructions, type in the wrong fields, skip the mandatory items and so on. So how should the web application respond to these errors?
Users can submit data to web applications in many ways - file uploads, page requests, URL parameters, form parameters, request headers, cookies and so on, but it is in conventional forms where we need to be somewhat lenient in how we respond to data validation issues.
Form data must be checked for integrity, business rules and validity of the following:
- Required (whether the field must be submitted, not null)
- Type (e.g. positive integer, text string, date, true/false boolean)
- Length/Range (e.g. numerical limits, number of decimal places, length of text, applicability of a date)
- Format (e.g. DD/MMM/YYYY for a date, international format for telephone numbers, allowed/disallowed characters)
- Character encoding/character set.
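As a rough illustration of the checks listed above, here is a minimal Python sketch for two made-up fields, a quantity and a date. The function names, limits and error messages are assumptions for the example, not part of any particular framework, and each function returns a friendly message rather than rejecting the request outright.

```python
from datetime import datetime

def validate_quantity(raw, minimum=1, maximum=99):
    """Return (value, error) for a hypothetical 'quantity' form field."""
    # Required: the field must be submitted and must not be blank
    if raw is None or raw.strip() == "":
        return None, "Quantity is required."
    text = raw.strip()
    # Length: refuse absurdly long input before doing any conversion
    if len(text) > 4:
        return None, "Quantity is too long."
    # Type: ASCII digits only, so no signs, decimals or letters
    if not (text.isascii() and text.isdigit()):
        return None, "Quantity must be a whole number."
    value = int(text)
    # Range: numerical limits (assumed business rule)
    if not minimum <= value <= maximum:
        return None, f"Quantity must be between {minimum} and {maximum}."
    return value, None

def validate_date(raw):
    """Return (date, error) for a hypothetical DD/MMM/YYYY date field."""
    if raw is None or raw.strip() == "":
        return None, "Date is required."
    try:
        # Format: %b is the abbreviated month name in the current locale
        # (English names under Python's default "C" locale)
        return datetime.strptime(raw.strip(), "%d/%b/%Y").date(), None
    except ValueError:
        return None, "Date must look like 07/Mar/2024."
```

Text typed into the date field simply comes back with an explanation, which is the lenient behaviour argued for here.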
HTML forms allow radio buttons, checkboxes and drop down lists where the value(s) meant to be submitted are predetermined. In these cases it is reasonable to assume that a validation failure is caused by a malicious user tampering with values, or by some fault in the application itself.
But what about text fields where users can enter any value in free format? I mentioned in Separate the Text from the Code a posting elsewhere about the display of form entry error messages. At what point do you reject the user input completely as malicious?
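For predetermined values the check can be a simple membership test. The sketch below assumes a hypothetical drop down list of titles; the option values and the function name are invented for the example.

```python
# Hypothetical option values rendered in a drop down list
ALLOWED_TITLES = {"mr", "mrs", "ms", "dr"}

def check_title(submitted):
    """Classify the value submitted for the drop down list."""
    if submitted in ALLOWED_TITLES:
        return "ok"
    # A normal browser can only send one of the listed options, so anything
    # else points to tampering or to a fault in the application itself
    return "suspicious"
```

A "suspicious" result here can reasonably be logged and treated more severely than a typing mistake in a free text field.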
You may have mechanisms looking for dangerous input such as custom application code, an application filter module, security settings in your web server software or a web application firewall. If a user accidentally or without malice types text that could be construed as malicious because it matches a signature for something like cross site scripting or SQL injection, when they submit the form they could be presented with something much less friendly. This could occur if the control points are set to intervene rather than simply monitor, and so the 'friendly' message will never appear and they will see something else, like these three examples:
So be wary about validation and active blocking. In some cases, user data should be accepted and then sanitised before explaining the problem and possibly re-displaying their data appropriately encoded. The problem might be as simple as putting an item of text in a date field, or using an "incorrect" date format. If the filtering mechanisms are too strict, the usability of the application will be much degraded.
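A minimal sketch of re-displaying rejected input safely, using Python's standard `html.escape`. The function name and the markup are assumptions for the example, and the field name is assumed to come from the developer rather than from the user.

```python
import html

def redisplay_field(name, raw_value, error):
    """Build an input element that shows the user's value and an error message."""
    # Encode the user's text for an HTML attribute context (quotes included)
    safe_value = html.escape(raw_value, quote=True)
    # Encode the message as well, in case it echoes any user input
    safe_error = html.escape(error)
    return (
        f'<input type="text" name="{name}" value="{safe_value}">'
        f'<span class="error">{safe_error}</span>'
    )
```

The user sees exactly what they typed together with the explanation, while the browser treats it as text and not as markup.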
For each item of user input, try to identify:
- What is allowed to be received by the web application for each field
- Exact validation rules before the application can use, store or transmit the data value
- Which control points will be checking and enforcing these
- The sensitivity of the values
- How the data will be used subsequently
- How the data values should be encoded when output.
In some cases the first two items (what may be received and the exact validation rules) will be the same, but more often they will be different. Build these into your application test cases.
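One way to record the list above is as a small per-field specification that the test cases can exercise. This is only a sketch: the `FieldSpec` class, the surname field and both patterns are hypothetical, chosen to show the difference between what is received and what is valid.

```python
import re
from dataclasses import dataclass

@dataclass
class FieldSpec:
    name: str
    accepted: re.Pattern   # first item: what the application will receive at all
    valid: re.Pattern      # second item: what it may use, store or transmit
    control_point: str     # which component checks and enforces the rules
    sensitive: bool        # sensitivity of the value
    used_for: str          # how the data will be used subsequently
    output_encoding: str   # how the value is encoded when output

SURNAME = FieldSpec(
    name="surname",
    # Received: up to 200 characters, no control characters
    accepted=re.compile(r"[^\x00-\x1f\x7f]{0,200}"),
    # Valid: letters, with single spaces, apostrophes or hyphens between them
    valid=re.compile(r"[^\W\d_]+(?:[ '\-][^\W\d_]+)*"),
    control_point="application code",
    sensitive=True,
    used_for="stored in the customer record",
    output_encoding="HTML entity encoding",
)

def classify(spec, raw):
    """Return 'reject', 'explain' or 'use' for a submitted value."""
    if not spec.accepted.fullmatch(raw):
        return "reject"    # outside what should ever be received
    if not spec.valid.fullmatch(raw):
        return "explain"   # received, but the user needs a friendly message
    return "use"
```

Test cases can then assert, for example, that `classify(SURNAME, "O'Brien")` gives "use" while `classify(SURNAME, "<script>")` gives "explain" and not an outright block.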
After all, some people live near "Alert Street" and "Script Drive":
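A deliberately naive signature check shows how such addresses can trip a filter. The pattern and the mode names below are invented for the illustration; real filters and web application firewalls use far more elaborate rules.

```python
import re

# Hypothetical, over-simple signature of the kind a strict filter might apply
NAIVE_XSS_SIGNATURE = re.compile(r"script|alert", re.IGNORECASE)

def filter_decision(value, mode="monitor"):
    """Return what a control point would do with a submitted value."""
    if not NAIVE_XSS_SIGNATURE.search(value):
        return "pass"
    # In intervene mode the request is stopped before the application can
    # show its own friendly message; in monitor mode it is only logged
    return "block" if mode == "intervene" else "log"
```

With this pattern, `filter_decision("12 Alert Street", mode="intervene")` returns "block" even though the address is perfectly legitimate.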
For more information on integrity checks, validation and business rules, see data validation from OWASP's Development Guide. I'll discuss client-side validation, whitelisting and blacklisting in future posts.