
11.4. What algorithm does PureCM use to 'guess' the initial file type when adding a file?

When you add a file to PureCM, it is automatically assigned a file type. You can change the file type manually, but checking each file becomes laborious when adding many files. PureCM determines the file type using the algorithm described below.

1) If the file matches the filter of a file type that has the 'Force this File Type to be Used' flag set, that file type is used. For example, if you create a file type 'text/xml', set its filter to '*.xml' and set the 'Force this File Type to be Used' flag, then the file file1.xml will always use the 'text/xml' file type.

Otherwise...

2) PureCM reads the file to determine its encoding. If 'Options | General | Read entire file to determine file type when adding' is set, the whole file is read; otherwise only the first chunk is read. This matters because some files (e.g. pdf files) can look like ordinary text at the start but contain binary content at the end. When adding many files, however, reading only the first chunk is quicker than reading each whole file. Note that if you can identify which files cause this problem, you can set the 'Force this File Type to be Used' flag on a matching file type and continue to read only the first chunk.
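The trade-off in step 2 can be illustrated with a minimal Python sketch. This is not PureCM's actual detection code, which is not documented here. The chunk size, the function names and the byte-level heuristics (BOM check, NUL bytes as a sign of binary content) are all assumptions made for illustration.

```python
import codecs

CHUNK_SIZE = 4096  # hypothetical; the source does not state PureCM's chunk size


def detect_encoding(data: bytes) -> str:
    """Classify sampled bytes (illustrative heuristics, not PureCM's own).

    Returns 'utf-16', 'utf-8', 'local', 'local-or-utf-8' or 'binary'.
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if b"\x00" in data:
        return "binary"  # assumption: NUL bytes indicate binary content
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if all(byte < 0x80 for byte in data):
        return "local-or-utf-8"  # pure ASCII is valid in both encodings
    try:
        # final=False tolerates a multi-byte character cut off at the chunk end
        codecs.getincrementaldecoder("utf-8")().decode(data, final=False)
        return "utf-8"
    except UnicodeDecodeError:
        return "local"


def sample_file(path: str, read_entire_file: bool) -> bytes:
    """Read the whole file, or only the first chunk when the option is off."""
    with open(path, "rb") as handle:
        return handle.read() if read_entire_file else handle.read(CHUNK_SIZE)


# A file that starts like text but has binary content at the end:
pdf_like = b"%PDF-1.4\n" + b"text " * 2000 + b"\x00\x01\x02"
first_chunk_result = detect_encoding(pdf_like[:CHUNK_SIZE])
whole_file_result = detect_encoding(pdf_like)
```

Under these assumptions the first chunk is classified as text while the whole file is classified as binary, which is the situation the 'Read entire file' option is meant to catch.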

3) PureCM iterates through the file types to find one whose filter matches the file path and whose encoding matches the detected encoding of the file. For example, a file type 'text/xml' with filter '*.xml' and encoding UTF-8 will not be used for a file file1.xml that is encoded as UTF-16, because the encodings differ. Note also that when detecting the encoding it is not always possible to distinguish between the local encoding and UTF-8. In this case the algorithm searches for file types with either encoding.

If no file type matched then...

4) If the encoding was local use 'text/plain'. If the encoding was UTF-8 use 'text/utf8'. If the encoding was UTF-16 use 'text/utf16'. Otherwise use 'application/generic'.
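The selection logic in steps 1, 3 and 4 can be sketched in Python as follows. This is only an illustration of the described behaviour, not PureCM's implementation. The `FileType` class, the `guess_file_type` function and the use of `fnmatch` for filter matching are hypothetical. The detected encoding is passed in as a set so that the ambiguous local/UTF-8 case can be represented. The source does not say which fallback is used when the encoding is ambiguous, so preferring 'text/plain' is an assumption.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass
class FileType:
    name: str            # e.g. 'text/xml'
    pattern: str         # filter, e.g. '*.xml'
    encoding: str        # 'local', 'utf-8', 'utf-16' or 'binary'
    force: bool = False  # the 'Force this File Type to be Used' flag


FALLBACKS = {"local": "text/plain", "utf-8": "text/utf8", "utf-16": "text/utf16"}


def guess_file_type(path, detected, file_types):
    """Pick a file type name for `path`.

    `detected` is the set of candidate encodings from step 2, for example
    {'local', 'utf-8'} when the two cannot be told apart.
    """
    # Step 1: a forced file type wins on filter match alone.
    for file_type in file_types:
        if file_type.force and fnmatch(path, file_type.pattern):
            return file_type.name

    # Step 3: otherwise both the filter and the encoding must match.
    for file_type in file_types:
        if fnmatch(path, file_type.pattern) and file_type.encoding in detected:
            return file_type.name

    # Step 4: fall back on the encoding alone.
    # Assumption: when ambiguous, 'local' is preferred over 'utf-8'.
    for encoding in ("local", "utf-8", "utf-16"):
        if encoding in detected:
            return FALLBACKS[encoding]
    return "application/generic"


xml_types = [FileType("text/xml", "*.xml", "utf-8")]
utf16_result = guess_file_type("file1.xml", {"utf-16"}, xml_types)
ambiguous_result = guess_file_type("file1.xml", {"local", "utf-8"}, xml_types)
```

With the example from step 3, the UTF-16 file file1.xml does not match the UTF-8 'text/xml' file type and falls through to 'text/utf16', whereas a file whose encoding could be either local or UTF-8 still matches 'text/xml'.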
