Checking file's "magic numbers"

A few days ago I had a very interesting task. Our customer required that we check the so-called "magic numbers" of uploaded files to determine whether an uploaded file really corresponds to its extension.
We already allowed uploads only for files with certain predefined extensions (PDF, DOC ...). But this cannot prevent a malicious user from uploading an EXE file after renaming it to PDF or DOC.
So first I will explain what "magic numbers" are, and then I will show how we handle them.

File's magic numbers

You can think of a magic number as a file ID: a constant sequence of bytes that applications use to identify the kind of data stored in a file. The sequence sits at a fixed position in the file. Usually it is at the very start of the file, but that is not the case for every file type. It is also possible for a format to use more than one byte sequence to identify its type.
For example, when you open a PDF document, the PDF reader checks the content of the file for these bytes and so determines whether it can read the file. You may ask why a file needs a special byte sequence for identification when it already has an extension. The reason is that the same extension can be shared by more than one application, while the content of those files can be completely different and incompatible between applications.
This should give you an idea of what "magic numbers" are and why we use them. In the next few paragraphs I will explain how we implemented the extension check.
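
To make the idea concrete, here is a minimal sketch (not the validator described below) that reads only the first few bytes of a stream and compares them with the PDF signature, the ASCII characters %PDF (hex 25 50 44 46). The class name MagicNumberPeek and its methods are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of the project code. */
final class MagicNumberPeek {

    // A PDF file starts with the ASCII characters "%PDF".
    static final byte[] PDF_MAGIC = {0x25, 0x50, 0x44, 0x46};

    /** Reads up to 'length' bytes from the start of the stream. */
    static byte[] readHeader(InputStream in, int length) throws IOException {
        byte[] header = new byte[length];
        int read = 0;
        // read() may return fewer bytes than requested, so loop until the
        // buffer is full or the stream ends.
        while (read < length) {
            int n = in.read(header, read, length - read);
            if (n < 0) {
                break;
            }
            read += n;
        }
        return Arrays.copyOf(header, read);
    }

    /** Returns true if 'content' contains 'magic' exactly at 'offset'. */
    static boolean matchesAt(byte[] content, int offset, byte[] magic) {
        if (offset < 0 || content.length < offset + magic.length) {
            return false; // content is too short to hold the signature
        }
        byte[] candidate = Arrays.copyOfRange(content, offset, offset + magic.length);
        return Arrays.equals(magic, candidate);
    }

    public static void main(String[] args) throws IOException {
        byte[] pdf = "%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII);
        byte[] exe = {0x4D, 0x5A, (byte) 0x90, 0x00}; // begins with "MZ", like a Windows executable

        byte[] pdfHeader = readHeader(new ByteArrayInputStream(pdf), PDF_MAGIC.length);
        byte[] exeHeader = readHeader(new ByteArrayInputStream(exe), PDF_MAGIC.length);

        System.out.println(matchesAt(pdfHeader, 0, PDF_MAGIC)); // prints true
        System.out.println(matchesAt(exeHeader, 0, PDF_MAGIC)); // prints false
    }
}
```

Reading only the header keeps the check cheap even for large files, since the rest of the content never has to be loaded.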

Checking file extension

I will start with the model class used to represent a file type, and then I will show the code and configuration of the validator class.
/**
 * Model class used for describing a file type.
 * Nested inside the validator class; the enclosing file needs
 * "import java.util.Arrays;".
 *
 * @author IMA
 */
public static class FileType {

    private String extension;
    private byte[] magicBytes;
    private int offset;
    private String description;

    /**
     * File extension
     */
    public String getExtension() {
        return extension;
    }

    public void setExtension(String extension) {
        this.extension = extension;
    }

    /**
     * Magic numbers which are used for file type detection
     */
    public byte[] getMagicBytes() {
        return magicBytes;
    }

    // The magic word arrives as text from the configuration and is
    // converted to bytes by the project's StringByteConvertor helper.
    public void setMagicWord(String magicWord) {
        this.magicBytes = StringByteConvertor.convertStringToByte(magicWord);
    }

    /**
     * Start position of magic numbers
     */
    public int getOffset() {
        return offset;
    }

    public void setOffset(int offset) {
        this.offset = offset;
    }

    /**
     * Description of file type
     */
    public String getDescription() {
        return description;
    }

    public void setDescription(String description) {
        this.description = description;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((description == null) ? 0 : description.hashCode());
        result = prime * result + ((extension == null) ? 0 : extension.hashCode());
        result = prime * result + Arrays.hashCode(magicBytes);
        result = prime * result + offset;
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        FileType other = (FileType) obj;
        if (description == null) {
            if (other.description != null)
                return false;
        } else if (!description.equals(other.description))
            return false;
        if (extension == null) {
            if (other.extension != null)
                return false;
        } else if (!extension.equals(other.extension))
            return false;
        if (!Arrays.equals(magicBytes, other.magicBytes))
            return false;
        if (offset != other.offset)
            return false;
        return true;
    }
}
As you can see, this is a simple model class that describes a file type. It holds the information needed to detect files with a given extension. The code above uses only one byte array for the magic numbers, which is not sufficient in some cases: as already mentioned, some file types use more than one byte sequence.
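
The single-array limitation could be lifted by letting a file type carry several (offset, bytes) pairs that must all match. The sketch below is one possible shape for that, not the code used in the project; the class names Signature and MultiSignatureFileType are hypothetical. The WAV example relies on the RIFF container layout: the ASCII bytes RIFF at offset 0 and WAVE at offset 8.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical: one magic byte sequence expected at a fixed offset. */
final class Signature {
    final int offset;
    final byte[] bytes;

    Signature(int offset, byte[] bytes) {
        this.offset = offset;
        this.bytes = bytes.clone();
    }

    boolean matches(byte[] content) {
        if (offset < 0 || content.length < offset + bytes.length) {
            return false;
        }
        return Arrays.equals(bytes, Arrays.copyOfRange(content, offset, offset + bytes.length));
    }
}

/** Hypothetical: a file type recognised only if every signature matches. */
final class MultiSignatureFileType {
    final String extension;
    final List<Signature> signatures;

    MultiSignatureFileType(String extension, List<Signature> signatures) {
        this.extension = extension;
        this.signatures = new ArrayList<>(signatures);
    }

    boolean matches(byte[] content) {
        for (Signature signature : signatures) {
            if (!signature.matches(content)) {
                return false;
            }
        }
        return true;
    }

    /** WAV: "RIFF" at offset 0 and "WAVE" at offset 8. */
    static MultiSignatureFileType wav() {
        return new MultiSignatureFileType("WAV", Arrays.asList(
                new Signature(0, "RIFF".getBytes(StandardCharsets.US_ASCII)),
                new Signature(8, "WAVE".getBytes(StandardCharsets.US_ASCII))));
    }
}
```

With this shape, an AVI file (which also starts with RIFF but carries a different marker at offset 8) would not be mistaken for a WAV file.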
I will continue with the validator class. This class determines whether a given file has a supported extension. Supported file types are stored in a map:

// Map holding the possible file types, keyed by extension.
Map<String, Set<FileType>> supportedExtensions = new HashMap<String, Set<FileType>>();

The next method sets the list of supported file extensions:
public void setSupportedFileExtension(List<FileType> supportedFileTypes) {
    this.supportedFileTypes = supportedFileTypes;
    if (supportedFileTypes == null) {
        return;
    }
    // Group the file types by extension. Validation upper-cases the uploaded
    // file's extension, so configured extensions are expected in upper case.
    for (FileType fileType : supportedFileTypes) {
        String extension = fileType.getExtension();
        Set<FileType> fileExtensions = supportedExtensions.get(extension);
        if (fileExtensions == null) {
            fileExtensions = new HashSet<FileType>();
        }
        fileExtensions.add(fileType);
        supportedExtensions.put(extension, fileExtensions);
    }
}
It takes the list of supported file types and stores them in the map, grouped by extension. Later, during validation, we check only the file types registered for the given extension.
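
On Java 8 or newer, the get / null-check / put sequence in the setter can be collapsed with Map.computeIfAbsent. The sketch below shows the same grouping idea on plain strings so that it runs on its own; the class name ExtensionGrouping and the sample data are made up for illustration. As an extra assumption, keys are upper-cased so that they line up with the upper-cased extension used during validation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical demo of grouping entries by extension; not the project code. */
final class ExtensionGrouping {

    /** Each entry is a pair: { extension, magic bytes written as hex text }. */
    static Map<String, Set<String>> groupByExtension(List<String[]> entries) {
        Map<String, Set<String>> byExtension = new HashMap<>();
        for (String[] entry : entries) {
            String extension = entry[0].toUpperCase(Locale.ROOT);
            // Creates the set on first use, then adds to it.
            byExtension.computeIfAbsent(extension, key -> new HashSet<>()).add(entry[1]);
        }
        return byExtension;
    }

    public static void main(String[] args) {
        List<String[]> entries = Arrays.asList(
                new String[] {"docx", "50 4B 03 04"},
                new String[] {"DOCX", "50 4B 03 04 14 00 06 00"},
                new String[] {"pdf", "25 50 44 46"});
        Map<String, Set<String>> grouped = groupByExtension(entries);
        System.out.println(grouped.get("DOCX").size()); // prints 2
        System.out.println(grouped.get("PDF").size());  // prints 1
    }
}
```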
The main part of this class is the validation method, which determines whether the uploaded file's extension is supported and whether the file content matches it.
Here is the code snippet of this method:
/**
 * Validates whether the file content corresponds to the file extension.
 *
 * @param file the uploaded file
 * @return true if the content matches a supported type for the extension
 *         (or if no file name is present), false otherwise
 * @throws IOException if the file content cannot be read
 */
public boolean validateFileExtension(MultipartFile file) throws IOException {
    // 1. Nothing is configured, so nothing can be accepted.
    if (supportedFileTypes == null || supportedFileTypes.isEmpty()) {
        LOG.debug("List with supported files extension is empty");
        return false;
    }
    // 2. No file name means no file was uploaded; there is nothing to reject.
    String fileName = file.getOriginalFilename();
    if (fileName == null || StringUtils.isEmpty(fileName)) {
        return true;
    }
    // 3. Look up the file types registered for this extension.
    String extension = (FilenameUtils.getExtension(fileName)).toUpperCase();
    Set<FileType> fileTypes = supportedExtensions.get(extension);
    if (fileTypes == null || fileTypes.isEmpty()) {
        LOG.debug("Unsupported extension:" + extension);
        return false;
    }
    byte[] fileContent = file.getBytes();
    if (fileContent == null) {
        return false;
    }
    // 4. Compare the bytes at each configured offset with the magic bytes.
    for (FileType fileType : fileTypes) {
        int offset = fileType.getOffset();
        byte[] magicBytes = fileType.getMagicBytes();
        if (fileContent.length >= offset + magicBytes.length) {
            byte[] fileMagicBytes = Arrays.copyOfRange(fileContent, offset, offset + magicBytes.length);
            if (Arrays.equals(magicBytes, fileMagicBytes)) {
                return true;
            }
        }
    }
    return false;
}
In short, this method does the following:
  1. Checks whether supported file types are configured. If they are, it proceeds to the next step; otherwise it returns false.
  2. Checks whether a file name is present. If there is none (no file was uploaded), it returns true; otherwise it continues with the extension check.
  3. Extracts the file extension and checks whether it is in the map of supported extensions. If it is not, it returns false.
  4. Iterates through the file types registered for that extension and compares their magic bytes with the file content. When a match is found it returns true, meaning the file is supported; otherwise the method returns false.
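
The four steps can also be condensed into a framework-free sketch that takes a file name and its content instead of a Spring MultipartFile. It is an illustration under assumptions, not the project's validator: the class name SimpleMagicValidator, the register method, and the use of String.lastIndexOf in place of FilenameUtils.getExtension are all stand-ins.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical, framework-free version of the validation steps. */
final class SimpleMagicValidator {

    private static final class Rule {
        final int offset;
        final byte[] magic;

        Rule(int offset, byte[] magic) {
            this.offset = offset;
            this.magic = magic;
        }
    }

    private final Map<String, List<Rule>> rules = new HashMap<>();

    /** Registers magic bytes (given as int values 0..255) for an extension. */
    void register(String extension, int offset, int... magic) {
        byte[] bytes = new byte[magic.length];
        for (int i = 0; i < magic.length; i++) {
            bytes[i] = (byte) magic[i];
        }
        rules.computeIfAbsent(extension.toUpperCase(Locale.ROOT), key -> new ArrayList<>())
             .add(new Rule(offset, bytes));
    }

    boolean isValid(String fileName, byte[] content) {
        // Step 1: nothing configured, nothing accepted.
        if (rules.isEmpty()) {
            return false;
        }
        // Step 2: no file name means no upload, so there is nothing to reject.
        if (fileName == null || fileName.isEmpty()) {
            return true;
        }
        // Step 3: the extension must be one of the registered ones.
        int dot = fileName.lastIndexOf('.');
        String extension = dot < 0 ? "" : fileName.substring(dot + 1).toUpperCase(Locale.ROOT);
        List<Rule> candidates = rules.get(extension);
        if (candidates == null || content == null) {
            return false;
        }
        // Step 4: at least one registered signature must match the content.
        for (Rule rule : candidates) {
            int end = rule.offset + rule.magic.length;
            if (content.length >= end
                    && Arrays.equals(rule.magic, Arrays.copyOfRange(content, rule.offset, end))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SimpleMagicValidator validator = new SimpleMagicValidator();
        validator.register("pdf", 0, 0x25, 0x50, 0x44, 0x46); // "%PDF"

        byte[] realPdf = "%PDF-1.4".getBytes(StandardCharsets.US_ASCII);
        byte[] renamedExe = {0x4D, 0x5A, (byte) 0x90, 0x00}; // starts with "MZ"

        System.out.println(validator.isValid("report.pdf", realPdf));    // prints true
        System.out.println(validator.isValid("report.pdf", renamedExe)); // prints false
    }
}
```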
For the validator to work, we need to provide it with the supported extensions. This is a Spring project, so the logical place to do that is the XML configuration file. Here is a simple example of a bean configuration for only one file extension:

<bean name="fileExtensionTypeValidator" class="validator.FileExtensionTypeValidator">
    <property name="supportedFileExtension">
        <list>
            <bean class="validator.FileExtensionTypeValidator$FileType">
                <property name="extension" value="DOCX" />
                <property name="offset" value="0"/>
                <property name="description" value="Microsoft Office Open XML Format (OOXML) Document"/>
                <property name="magicWord" value="50 4B 03 04 14 00 06 00"/>
            </bean>
        </list>
    </property>
</bean>
Nothing complicated, just a regular Spring bean :).
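
The magicWord property is written as space-separated hexadecimal bytes, and the StringByteConvertor that parses it is not shown in this post. A hypothetical stand-in could look like the sketch below; the class name HexWords and its behavior on malformed input are assumptions, not the project's code.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the hex-text to byte[] conversion. */
final class HexWords {

    /** Parses text such as "50 4B 03 04" into the bytes 0x50, 0x4B, 0x03, 0x04. */
    static byte[] parse(String magicWord) {
        String trimmed = magicWord.trim();
        if (trimmed.isEmpty()) {
            return new byte[0];
        }
        String[] tokens = trimmed.split("\\s+");
        byte[] bytes = new byte[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // Integer.parseInt throws NumberFormatException for non-hex text.
            int value = Integer.parseInt(tokens[i], 16);
            if (value < 0 || value > 0xFF) {
                throw new IllegalArgumentException("Not a single byte: " + tokens[i]);
            }
            bytes[i] = (byte) value;
        }
        return bytes;
    }

    public static void main(String[] args) {
        byte[] docx = parse("50 4B 03 04 14 00 06 00");
        System.out.println(Arrays.toString(docx)); // prints [80, 75, 3, 4, 20, 0, 6, 0]
    }
}
```

Failing fast on malformed values is a design choice here: a typo in the configuration then surfaces at startup instead of silently producing a signature that never matches.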


As you can see, it is not a big deal, but it can prevent the evil kids on the block from uploading an executable file as a PDF, DOC, etc. This can reduce the amount of spam reaching this application's users.
