
File mime type validation in Django

django Mar 3, 2020

Let's say we are building an app that lets users upload files to our server. We can run some basic validation checks on the client side, but we cannot rely on them, because a client can bypass them entirely. We certainly do not want users to upload malicious files and compromise our system, so such applications need extra precautions on the server. This post shows one way to approach the problem.

A simple model for such an app would look something like this -

# models.py

from django.db import models
from django.contrib.auth import get_user_model

from .validators import FileMimeValidator

User = get_user_model()


class ProfilePictures(models.Model):
    user = models.ForeignKey(
        User,
        related_name="profile_pictures",
        on_delete=models.CASCADE,
    )

    image = models.FileField(
        "image",
        upload_to="path/to/upload",
        validators=[FileMimeValidator()],
    )

    objects = models.Manager()

    def __str__(self):
        return f"{self.user.get_full_name()}'s profile picture"

Here, FileMimeValidator is responsible for validating the file that is being uploaded. For the validation we can use python-magic, a Python library that identifies file types using the libmagic library. Let us now see what our FileMimeValidator class looks like -

# validators.py

from pathlib import Path

import magic
from django.core.exceptions import ValidationError
from django.utils.deconstruct import deconstructible


@deconstructible
class FileMimeValidator:
    messages = {
        "malicious_file": "File looks malicious. Allowed extensions are: '%(allowed_extensions)s'.",
        "not_supported": "File extension '%(extension)s' is not allowed. "
                         "Allowed extensions are: '%(allowed_extensions)s'."
    }
    code = 'invalid_extension'
    ext_cnt_mapping = {
        'jpeg': 'image/jpeg',
        'png': 'image/png'
    }

    def __init__(self):
        self.allowed_extensions = [
            allowed_extension.lower() for allowed_extension in self.ext_cnt_mapping
        ]

    def __call__(self, data):
        extension = Path(data.name).suffix[1:].lower()
        # Check the extension first, so the mapping lookup below never hits an unknown key.
        if extension not in self.allowed_extensions:
            raise ValidationError(
                self.messages['not_supported'],
                code=self.code,
                params={
                    'extension': extension,
                    'allowed_extensions': ', '.join(self.allowed_extensions)
                }
            )
        # Sniff the MIME type from the start of the file (python-magic's README
        # recommends at least 2048 bytes), then rewind so later readers see the whole file.
        content_type = magic.from_buffer(data.read(2048), mime=True)
        data.seek(0)
        if content_type != self.ext_cnt_mapping[extension]:
            raise ValidationError(
                self.messages['malicious_file'],
                code=self.code,
                params={
                    'allowed_extensions': ', '.join(self.allowed_extensions)
                }
            )

    def __eq__(self, other):
        # The class defines `messages` (plural); comparing `self.message`
        # would raise AttributeError.
        return (
            isinstance(other, self.__class__) and
            self.allowed_extensions == other.allowed_extensions and
            self.messages == other.messages and
            self.code == other.code
        )

This is a fairly straightforward solution. We allow users to upload images with the extensions jpeg and png, and we perform these two checks -

  • First, we take the extension of the file and match it against the allowed extensions.
  • Then we check the file signature by reading the first few bytes of the file. These bytes are known as a file signature, magic number, or magic bytes. Most binary file formats have a distinctive signature, which helps us identify the file type regardless of the file name. For example, a PDF document starts with %PDF-.

We raise a ValidationError if either of the checks fails.
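
To make the two checks concrete, here is a minimal sketch that reproduces them without Django or libmagic. The lookup tables and the helpers `sniff_mime` and `check_upload` are hypothetical names used only for illustration, and the signature table covers just three formats; the validator itself still relies on python-magic, which knows far more.

```python
from pathlib import Path

# Hypothetical lookup tables for illustration; libmagic knows far more formats.
EXTENSION_TO_MIME = {"jpeg": "image/jpeg", "png": "image/png"}
SIGNATURE_TO_MIME = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"%PDF-": "application/pdf",
}


def sniff_mime(head: bytes):
    """Return the MIME type whose signature starts `head`, or None if unknown."""
    for signature, mime in SIGNATURE_TO_MIME.items():
        if head.startswith(signature):
            return mime
    return None


def check_upload(name: str, head: bytes) -> str:
    """Run the extension check, then the signature check, in the validator's order."""
    extension = Path(name).suffix[1:].lower()
    if extension not in EXTENSION_TO_MIME:
        return "not_supported"
    if sniff_mime(head) != EXTENSION_TO_MIME[extension]:
        return "malicious_file"
    return "ok"


print(check_upload("resume.pdf", b"%PDF-1.7"))                  # not_supported
print(check_upload("pwnscript.png", b"#!/bin/sh\nrm -rf ~"))    # malicious_file
print(check_upload("landscape.PNG", b"\x89PNG\r\n\x1a\n\x00"))  # ok
```

The second call shows why the extension check alone is not enough: a shell script renamed to end in .png passes the first check and is only caught by looking at its leading bytes.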

Now let us write some tests for the validation code. Below is the factories.py file, which we can use across our tests. For the sake of simplicity, I am using django_dynamic_fixture. Do check this library out if you have not come across it. It is a lifesaver, and it will save you a lot of time while writing tests.

# tests/factories.py

from django.apps import apps
from django.conf import settings
from django.core.files.uploadedfile import SimpleUploadedFile
from django_dynamic_fixture import G


def create_user(**kwargs):
    User = apps.get_model(settings.AUTH_USER_MODEL)
    user = G(User, **kwargs)
    user.set_password(kwargs.get('password', 'test'))
    user.save()
    return user


def generate_file(**kwargs):
    file = SimpleUploadedFile(**kwargs)
    return file

Assuming we have exposed the endpoint via django-rest-framework and our pytest fixtures are in place, our test cases will look something like this -

# tests/integration/test_image_upload.py

import pytest
from django.urls import reverse

from .. import factories as f

# Every test creates a user, so all of them need database access.
pytestmark = pytest.mark.django_db


def test_image_upload_for_empty_file(client):
    user = f.create_user()
    url = reverse("image-upload")
    payload = {
        "user": user.id,
        "image": f.generate_file(
            name="image.jpg",
            content=b"",
            content_type="image/jpeg"
        )
    }
    response = client.post(url, payload, format="multipart")
    assert response.data['message']['image'][0] == "The submitted file is empty."


def test_image_upload_for_unsupported_file_extension(client):
    user = f.create_user()
    url = reverse("image-upload")
    payload = {
        "user": user.id,
        "image": f.generate_file(
            name="resume.pdf",
            content=b"%PDF-",
            content_type="application/pdf"
        )
    }
    response = client.post(url, payload, format="multipart")
    assert response.data['message']['image'][0] == \
        "File extension 'pdf' is not allowed. Allowed extensions are: 'jpeg, png'."


def test_image_upload_for_malicious_file(client):
    user = f.create_user()
    url = reverse("image-upload")
    payload = {
        "user": user.id,
        # A script disguised with an allowed extension: it passes the
        # extension check, so only the content check can catch it.
        "image": f.generate_file(
            name="pwnscript.png",
            content=b"malicious content!!!!",
            content_type="image/png"
        )
    }
    response = client.post(url, payload, format="multipart")
    assert response.data['message']['image'][0] == \
        "File looks malicious. Allowed extensions are: 'jpeg, png'."


def test_image_upload(client):
    user = f.create_user()
    url = reverse("image-upload")
    payload = {
        "user": user.id,
        # The real PNG signature plus the start of the IHDR chunk; a placeholder
        # such as b".PNG...." is not a PNG. If your libmagic version needs more
        # data to report image/png, use the bytes of a real image here.
        "image": f.generate_file(
            name="landscape.png",
            content=b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR",
            content_type="image/png"
        )
    }
    response = client.post(url, payload, format="multipart")
    assert response.status_code == 200
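
The happy-path test needs content that libmagic actually recognises as a PNG. One option, sketched here with only the standard library, is to build a tiny valid PNG in memory. The helper names `tiny_png` and `_chunk` are hypothetical and not part of the factories above, so treat this as one possible approach under those assumptions.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, payload: bytes) -> bytes:
    """Build one PNG chunk: length, type, payload, CRC over type + payload."""
    crc = zlib.crc32(kind + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)


def tiny_png() -> bytes:
    """Return the bytes of a valid 1x1 greyscale PNG."""
    # IHDR: width=1, height=1, bit depth=8, colour type=0 (greyscale),
    # compression=0, filter=0, interlace=0.
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    # One scanline: filter byte 0 followed by a single black pixel.
    idat = zlib.compress(b"\x00\x00")
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", idat)
        + _chunk(b"IEND", b"")
    )
```

In test_image_upload, the payload could then pass `content=tiny_png()` to `f.generate_file`, which keeps the test independent of image files on disk.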

Hope this post was helpful.

Related reads

django-dynamic-fixture

Magic numbers and List of File Signatures
