File Upload Validation Techniques

File upload filtering is a critical part of web application security that is also notoriously hard to get right. The stakes are high: vulnerabilities in file upload functionality can quickly become critical, exploitable issues, with impacts that include remote code execution on the underlying web server. So let’s summarize the common file upload validation techniques that can and should be used to thwart many of the common file upload filtering bypasses.

File Extension Validation

Your first line of defense against someone uploading dangerous files to your web application is extension filtering: checking the extension of the presented file and making sure it is one that you should be accepting. This check is non-trivial, as there are multiple ways to evade it depending on how it is implemented.

First and foremost, use an allow list of extensions that your application should accept, as opposed to a deny list of extensions that are explicitly rejected. It is nearly impossible to build a deny list that protects against every dangerous extension, because there are ways to mask the extension from filters, and there are extensions you’ve probably never heard of that can be abused. Consider things like .php5 or .phtml, which can be executed like PHP pages. Your allow list, in turn, should be as limited as possible and based on what is required for business operations.

The next thing to worry about with file extension filtering is two well-known bypass techniques. One is the use of double extensions: .jpg.php may pass certain filters because it contains .jpg as the first extension encountered in the string (if your regex is an unanchored “\.jpg”, for example), but it will execute like a PHP page because .php is the last extension. Alternatively, an attacker could use a null byte (URL-encoded as %00) to manipulate what extension actually gets saved with a file once it has passed extension filtering. Submitting a document named “.php%00.jpg” can result in a .php file actually getting saved, because the filter sees the trailing .jpg while the code that writes the file treats the null byte as the end of the name.
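To make the two bypasses concrete, here is a minimal Python sketch of an allow-list check that looks only at the final extension and rejects null bytes outright. The function name and the contents of ALLOWED_EXTENSIONS are hypothetical, and the sketch assumes the filename has already been URL-decoded by your framework. It is an illustration of the idea, not a replacement for vetted filtering code.

```python
import os

# Hypothetical allow list; keep it as small as the business need permits.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".pdf"}


def is_allowed_extension(filename: str) -> bool:
    """Return True only if the final extension is on the allow list."""
    # A null byte has no place in a legitimate filename; reject it
    # instead of trying to strip or repair it.
    if "\x00" in filename:
        return False
    # splitext returns only the LAST extension, so "shell.jpg.php"
    # is judged by ".php", not by the ".jpg" that appears earlier.
    _, extension = os.path.splitext(filename)
    return extension.lower() in ALLOWED_EXTENSIONS
```

With this check, "photo.JPG" is accepted, while "shell.jpg.php" and a decoded "shell.php\x00.jpg" are both rejected.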

The TLDR here is: use an explicit allow list of file extensions that is as restrictive as possible, and rely on known-good extension filtering code rather than trying to build it yourself. The OWASP Input Validation Cheat Sheet is a good resource.

Content-Type Validation

Any time a file is uploaded by a user, there is a Content-Type header associated with it. This could be something like “image/jpeg” or “application/php”, and it travels with the request carrying the file to be uploaded. This header should not be trusted, as it can be easily manipulated by the user uploading the file, but it can provide a quick sanity check that the uploaded file matches its identified extension. This isn’t going to be anything ground-breaking from a prevention perspective, but it does raise the bar slightly on the required complexity for a successful file upload bypass. As with extensions, you should preferably use an explicit allow list for the MIME types you accept, and then go a step further and make sure the declared MIME type matches the extension of the uploaded file.
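One way to express the "declared type must match the extension" check is a small lookup table, sketched below in Python. The EXTENSION_TO_MIME mapping and the function name are assumptions made for illustration; a real application would list only the types it actually accepts.

```python
import os

# Hypothetical mapping from allowed extensions to the MIME types
# each one is permitted to be declared as.
EXTENSION_TO_MIME = {
    ".jpg": {"image/jpeg"},
    ".jpeg": {"image/jpeg"},
    ".png": {"image/png"},
    ".gif": {"image/gif"},
    ".pdf": {"application/pdf"},
}


def content_type_matches(filename: str, declared_type: str) -> bool:
    """Return True if the declared Content-Type is allowed for the extension."""
    _, extension = os.path.splitext(filename)
    allowed_types = EXTENSION_TO_MIME.get(extension.lower())
    if allowed_types is None:
        # Extension is not on the allow list at all.
        return False
    # Drop any parameters (e.g. "; charset=binary") and normalize case.
    mime_type = declared_type.split(";", 1)[0].strip().lower()
    return mime_type in allowed_types
```

Because the header is attacker-controlled, a passing result here only means the request is internally consistent, not that the file is safe.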

Signature Validation

Another complementary check you can make is to verify that the uploaded file’s signature also matches its extension and content-type. This can be done by reading the first 4 to 6 bytes of a file, which are reserved as identifying bytes for the content that follows. As an example, a GIF’s first 6 bytes should be \x47\x49\x46\x38\x37\x61, which is “GIF87a” in ASCII; the newer GIF89a variant differs only in the fifth byte, \x39. While this is again trivial to spoof, it makes an attack more complex.
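A signature check along these lines can be sketched in a few lines of Python. The SIGNATURES table below is a hypothetical, deliberately short list of well-known leading bytes, and the function name is made up for this example; extend the table only for the formats you actually accept.

```python
# Hypothetical table of leading "magic" bytes per allowed extension.
SIGNATURES = {
    ".gif": (b"GIF87a", b"GIF89a"),
    ".png": (b"\x89PNG\r\n\x1a\n",),
    ".jpg": (b"\xff\xd8\xff",),
    ".jpeg": (b"\xff\xd8\xff",),
    ".pdf": (b"%PDF-",),
}


def signature_matches(extension: str, header: bytes) -> bool:
    """Return True if the file's first bytes match its claimed extension.

    `header` is the first few bytes read from the uploaded file.
    """
    expected = SIGNATURES.get(extension.lower())
    if expected is None:
        # Unknown extension: nothing to compare against, so reject.
        return False
    # bytes.startswith accepts a tuple and matches any of its entries.
    return header.startswith(expected)
```

In practice you would read the first 8 or so bytes of the uploaded stream and pass them in as `header`, alongside the extension that already passed the allow-list check.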

File Name Sanitization

There are a ton of different attacks that can abuse an application’s file naming for uploaded files. These attacks make use of things like directory traversal, special or control characters, or restricted filenames to try to induce dangerous behavior. Without going down the rabbit hole of how these attacks work, suffice it to say you should always use a GUID-style renaming model to generate a new filename for any uploaded file. These files should be stored on a different host if possible (or even a properly secured S3 bucket or Azure blob) to ensure there is separation between them and the web server itself, which reduces the impact of a compromise.
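GUID-style renaming can be as simple as the Python sketch below, which discards the client-supplied name entirely and keeps only an extension that has already passed the allow-list check. The function name is hypothetical, and the sketch assumes the extension was validated earlier in the pipeline.

```python
import uuid


def generate_stored_name(validated_extension: str) -> str:
    """Build a storage filename that contains nothing the client chose.

    `validated_extension` is assumed to have already passed the
    allow-list check (for example ".jpg").
    """
    # uuid4 is random, so the stored name is unpredictable and cannot
    # contain traversal sequences, control characters, or reserved names.
    return f"{uuid.uuid4().hex}{validated_extension.lower()}"
```

If users need to see their original file name later, it can be kept as metadata (in a database row, say) rather than used on disk.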

If for some reason you have a legitimate need for clients to specify their own file names, you really need strict input validation on both the client side and the server side to avoid leaving yourself exposed to attacks. Enforce a maximum length for file names and define a list of allowable characters (e.g., alphanumerics, periods, underscores, hyphens).
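The length and character rules above can be sketched server-side in Python as follows. MAX_FILENAME_LENGTH and the function name are hypothetical, and rejecting names that start with a period is an extra assumption added here to rule out "." and ".." as well as hidden files. The sketch does not cover OS-specific restricted names.

```python
import re

# Hypothetical limit; pick one that suits your storage layer.
MAX_FILENAME_LENGTH = 100

# Alphanumerics, periods, underscores, and hyphens only.
_ALLOWED_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_safe_filename(name: str) -> bool:
    """Return True if a client-chosen filename meets the basic rules."""
    if not name or len(name) > MAX_FILENAME_LENGTH:
        return False
    # fullmatch requires EVERY character to be in the allowed set, so
    # slashes, backslashes, null bytes, and whitespace are all rejected.
    if _ALLOWED_NAME.fullmatch(name) is None:
        return False
    # Assumption for this sketch: no leading period, which excludes
    # ".", "..", and hidden files such as ".htaccess".
    if name.startswith("."):
        return False
    return True
```

For example, "report_2024-01.pdf" passes, while "../etc/passwd", "..", and ".htaccess" do not.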

File Content Validation

There are some additional security controls that can be implemented for certain types of uploaded files, depending on what you’re accepting. For example, image rewriting can destroy malicious content injected into an image, and Microsoft Office documents can be checked with validation libraries that help verify their contents. You could even perform manual review of uploaded files if the volume is low or you have sufficient resources available. Then there are third-party scanning services you can consider, such as VirusTotal, that check files against known malware signatures so you avoid accepting known malware.

File Parsing Library Vulnerabilities

Finally, slightly outside the realm of actual file input validation but still certainly related to the security risks of file uploads, we should discuss file parsers. Depending on the type of uploads you are accepting and what you are doing with them, you may have a file or image parser that handles uploaded documents in some way. You should always keep a close eye on these parsing libraries and make sure you are using the latest version. Subscribe to relevant news sources or periodically check for publicly disclosed vulnerabilities associated with your parsing libraries. Issues discovered in this area tend to be critical (e.g. ImageTragick) and, as such, have to be monitored closely.

So as you can see, one simple feature of a web application can have a lot of attack surface and many security considerations associated with it. All of these vulnerabilities are explored as part of a web application penetration test, but you should really have a good understanding of these potential issues and how you are going to mitigate them during the development process. If you have any questions or want to see how your implementation defends against file upload bypasses, let us know and we’d be happy to help!