A few days ago I had a very interesting task. Our customer required that we check a file's so-called "magic numbers" to determine whether an uploaded file really corresponds to its extension.
We already allow uploads only for files with certain predefined extensions (PDF, DOC ...). But this cannot prevent a malicious user from uploading an exe file after renaming it to PDF or DOC.
So first I will explain what "magic numbers" are, and then I will show how we handle them.
File's magic numbers
You can think of a magic number as a file ID: a fixed value that an application uses to identify the data stored in a file. It is a sequence of bytes at a specific position in the file. Usually these are the first bytes of the file, but that is not the case for every file type. It is also possible for an application to use more than one byte sequence to determine the file type.
For example, when you open a PDF document, the PDF reader checks the content of the file for these bytes and in that way determines whether it can read the file. You may ask yourself why a file needs a special byte sequence for identification when it already has an extension. The reason is that the same file extension can be shared by more than one application, while the content of those files can be completely different and incompatible between applications.
This should give you an idea of what "magic numbers" are and why we use them. In the next few paragraphs I will explain how we implemented the extension check.
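To make the idea concrete, here is a minimal sketch that is not part of the project code. It relies on two well-known signatures: a PDF file starts with the ASCII characters "%PDF" (bytes 25 50 44 46), and a Windows executable starts with "MZ". The class name `MagicNumberDemo` and the method `hasSignature` are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class MagicNumberDemo {

    // "%PDF" in ASCII: the signature a PDF file starts with.
    static final byte[] PDF_MAGIC = {0x25, 0x50, 0x44, 0x46};

    /** Returns true if content holds the given signature at the given offset. */
    static boolean hasSignature(byte[] content, byte[] signature, int offset) {
        if (offset < 0 || content.length - offset < signature.length) {
            return false; // content is too short to contain the signature
        }
        byte[] candidate = Arrays.copyOfRange(content, offset, offset + signature.length);
        return Arrays.equals(candidate, signature);
    }

    public static void main(String[] args) {
        byte[] realPdf = "%PDF-1.7 ...".getBytes(StandardCharsets.US_ASCII);
        // An executable keeps its "MZ" header no matter what the file is renamed to.
        byte[] renamedExe = "MZ ...".getBytes(StandardCharsets.US_ASCII);

        System.out.println(hasSignature(realPdf, PDF_MAGIC, 0));    // true
        System.out.println(hasSignature(renamedExe, PDF_MAGIC, 0)); // false
    }
}
```

Renaming a file changes only its name, so the leading bytes still give the real type away.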
Checking file extension
I will start with the model class that represents a file type, and then I will show the code and configuration of the validator class.

    // Nested inside the validator class; needs java.util.Arrays and java.util.Objects.
    /**
     * Model class used for describing a file type.
     *
     * @author IMA
     */
    public static class FileType {
        private String extension;
        private byte[] magicBytes;
        private int offset;
        private String description;

        /**
         * File extension
         */
        public String getExtension() {
            return extension;
        }

        public void setExtension(String extension) {
            this.extension = extension;
        }

        /**
         * Magic numbers which are used for file type detection
         */
        public byte[] getMagicBytes() {
            return magicBytes;
        }

        public void setMagicWord(String magicWord) {
            this.magicBytes = StringByteConvertor.convertStringToByte(magicWord);
        }

        /**
         * Start position of magic numbers
         */
        public int getOffset() {
            return offset;
        }

        public void setOffset(int offset) {
            this.offset = offset;
        }

        /**
         * Description of file type
         */
        public String getDescription() {
            return description;
        }

        public void setDescription(String description) {
            this.description = description;
        }

        @Override
        public int hashCode() {
            final int prime = 31;
            int result = 1;
            result = prime * result + ((description == null) ? 0 : description.hashCode());
            result = prime * result + ((extension == null) ? 0 : extension.hashCode());
            result = prime * result + Arrays.hashCode(magicBytes);
            result = prime * result + offset;
            return result;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) {
                return true;
            }
            if (obj == null || getClass() != obj.getClass()) {
                return false;
            }
            FileType other = (FileType) obj;
            return offset == other.offset
                    && Objects.equals(description, other.description)
                    && Objects.equals(extension, other.extension)
                    && Arrays.equals(magicBytes, other.magicBytes);
        }
    }

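The `setMagicWord` setter above delegates to `StringByteConvertor`, which is not shown in this post. Below is a minimal sketch of what such a converter could look like, assuming the magic word is written as hex bytes separated by whitespace (for example "50 4B 03 04"). The class name `HexStringToBytes` is hypothetical, and the real helper may behave differently.

```java
import java.util.Arrays;

final class HexStringToBytes {

    /** Parses e.g. "50 4B 03 04" into the bytes 0x50, 0x4B, 0x03, 0x04. */
    static byte[] convertStringToByte(String hex) {
        String trimmed = hex.trim();
        if (trimmed.isEmpty()) {
            return new byte[0];
        }
        String[] tokens = trimmed.split("\\s+");
        byte[] bytes = new byte[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // Parse as int first: Byte.parseByte rejects hex values above 7F.
            int value = Integer.parseInt(tokens[i], 16);
            if (value < 0 || value > 0xFF) {
                throw new IllegalArgumentException("Not a byte: " + tokens[i]);
            }
            bytes[i] = (byte) value;
        }
        return bytes;
    }

    public static void main(String[] args) {
        byte[] zipHeader = convertStringToByte("50 4B 03 04");
        System.out.println(Arrays.toString(zipHeader)); // [80, 75, 3, 4]
    }
}
```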
As you can see, FileType is a simple model class that describes a file type. It holds the information needed to recognize a file with a given extension. Note that the code above uses only one byte array for the magic numbers, which is not sufficient in every case: as I already mentioned, some file types use more than one byte sequence.
Next comes the validator class. This class determines whether a given file has a supported extension and matching content. Supported file types are stored in a map:

    // Map holding the possible file types for each extension.
    Map<String, Set<FileType>> supportedExtensions = new HashMap<String, Set<FileType>>();

The next method sets the list of supported file types:

    public void setSupportedFileExtension(List<FileType> supportedFileTypes) {
        this.supportedFileTypes = supportedFileTypes;
        if (supportedFileTypes == null) {
            return;
        }
        // Group the file types by extension. Validation looks extensions up
        // in upper case, so they must be configured in upper case as well.
        for (FileType fileType : supportedFileTypes) {
            String extension = fileType.getExtension();
            Set<FileType> fileExtensions = supportedExtensions.get(extension);
            if (fileExtensions == null) {
                fileExtensions = new HashSet<FileType>();
            }
            fileExtensions.add(fileType);
            supportedExtensions.put(extension, fileExtensions);
        }
    }

This method takes the list of supported file types and stores them in the map, grouped by extension. Later, during validation, we check only the file types registered for the given extension.
The main part of this class is the validation method, which determines whether the content of the uploaded file matches one of the supported file types registered for its extension.
Here is the code snippet of this method:

    /**
     * Validates whether the file content corresponds to the file extension.
     *
     * @param file the uploaded file
     * @return true if the content matches a supported type for the extension
     * @throws IOException if the file content cannot be read
     */
    public boolean validateFileExtension(MultipartFile file) throws IOException {
        if (supportedFileTypes == null || supportedFileTypes.isEmpty()) {
            LOG.debug("List with supported files extension is empty");
            return false;
        }
        String fileName = file.getOriginalFilename();
        // No file name means no file was uploaded, so there is nothing to reject.
        if (StringUtils.isEmpty(fileName)) {
            return true;
        }
        String extension = FilenameUtils.getExtension(fileName).toUpperCase();
        Set<FileType> fileTypes = supportedExtensions.get(extension);
        if (fileTypes == null || fileTypes.isEmpty()) {
            LOG.debug("Unsupported extension:" + extension);
            return false;
        }
        byte[] fileContent = file.getBytes();
        if (fileContent == null) {
            return false;
        }
        // Compare the bytes at each configured offset with the expected magic bytes.
        for (FileType fileType : fileTypes) {
            int offset = fileType.getOffset();
            byte[] magicBytes = fileType.getMagicBytes();
            if (fileContent.length >= offset + magicBytes.length) {
                byte[] fileMagicBytes = Arrays.copyOfRange(fileContent, offset, offset + magicBytes.length);
                if (Arrays.equals(magicBytes, fileMagicBytes)) {
                    return true;
                }
            }
        }
        return false;
    }

In short, this method does the following:
- Checks whether the supported file types are configured. If they are, it proceeds with the next step; otherwise it returns false.
- Checks whether the file has a name. If the name is missing or empty there is nothing to validate and the method returns true; otherwise it continues with the extension check.
- Extracts the file extension and checks whether it is in the map of supported extensions. If it is not, the method returns false.
- Iterates over the file types registered for that extension and compares the magic bytes of each one with the bytes at the configured offset in the file. As soon as a match is found, the method returns true, meaning that the file is supported; if nothing matches, it returns false.
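The steps above can also be sketched without Spring. The version below is an illustration only, not the project's validator. `SignatureCheckSketch`, `Signature`, `register` and `matches` are hypothetical names. It also differs from the method above in two ways: it reads only the leading bytes from an `InputStream` instead of loading the whole upload with `file.getBytes()`, and it rejects a file whose name has no registered extension, including an empty name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class SignatureCheckSketch {

    /** Hypothetical, trimmed-down stand-in for the post's FileType. */
    static final class Signature {
        final int offset;
        final byte[] magicBytes;

        Signature(int offset, byte[] magicBytes) {
            this.offset = offset;
            this.magicBytes = magicBytes.clone();
        }
    }

    private final Map<String, List<Signature>> byExtension = new HashMap<>();

    /** Registers a signature; extensions are stored in upper case. */
    void register(String extension, int offset, byte[] magicBytes) {
        byExtension
                .computeIfAbsent(extension.toUpperCase(Locale.ROOT), key -> new ArrayList<>())
                .add(new Signature(offset, magicBytes));
    }

    boolean matches(String fileName, InputStream in) {
        // Extract the extension and look up the signatures registered for it.
        int dot = fileName.lastIndexOf('.');
        String extension = dot < 0 ? "" : fileName.substring(dot + 1).toUpperCase(Locale.ROOT);
        List<Signature> candidates = byExtension.get(extension);
        if (candidates == null) {
            return false; // unsupported extension, or nothing configured at all
        }
        // Read only as many leading bytes as the longest candidate needs.
        int needed = 0;
        for (Signature s : candidates) {
            needed = Math.max(needed, s.offset + s.magicBytes.length);
        }
        byte[] head = new byte[needed];
        int read = readFully(in, head);
        // Accept the file as soon as one signature matches.
        for (Signature s : candidates) {
            int end = s.offset + s.magicBytes.length;
            if (end <= read
                    && Arrays.equals(Arrays.copyOfRange(head, s.offset, end), s.magicBytes)) {
                return true;
            }
        }
        return false;
    }

    /** Fills the buffer as far as the stream allows and returns the byte count. */
    private static int readFully(InputStream in, byte[] buffer) {
        int read = 0;
        try {
            while (read < buffer.length) {
                int n = in.read(buffer, read, buffer.length - read);
                if (n < 0) {
                    break; // end of stream before the buffer was full
                }
                read += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return read;
    }

    public static void main(String[] args) {
        SignatureCheckSketch validator = new SignatureCheckSketch();
        validator.register("docx", 0, new byte[] {0x50, 0x4B, 0x03, 0x04});

        byte[] zipLike = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x06, 0x00};
        byte[] exeLike = {0x4D, 0x5A, 0x00, 0x00, 0x00, 0x00};
        System.out.println(validator.matches("report.docx", new ByteArrayInputStream(zipLike))); // true
        System.out.println(validator.matches("report.docx", new ByteArrayInputStream(exeLike))); // false
    }
}
```

Reading only the header keeps memory use small for large uploads, at the cost of handling the stream by hand.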
For the validator to work, we need to provide it with the supported file types. This is a Spring project, so the logical place to do that is the XML configuration file. Here is a simple example of the bean configuration for just one file extension:

    <bean name="fileExtensionTypeValidator" class="validator.FileExtensionTypeValidator">
        <property name="supportedFileExtension">
            <list>
                <bean class="validator.FileExtensionTypeValidator$FileType">
                    <property name="extension" value="DOCX" />
                    <property name="offset" value="0"/>
                    <property name="description" value="Microsoft Office Open XML Format (OOXML) Document"/>
                    <property name="magicWord" value="50 4B 03 04 14 00 06 00"/>
                </bean>
            </list>
        </property>
    </bean>

Nothing complicated, just a regular Spring bean :).
Conclusion
As you can see, it is not a big deal, but it can stop some evil kid from the block from uploading an executable file disguised as a PDF, DOC, etc. This can reduce the amount of spam that reaches this application's users.