# File System Indexer and Mount Configuration
The file system indexer (FSIndexer), or file indexer, maps one or more file systems to the database. It regularly checks whether files have been added, deleted, or updated in the file system and updates the database accordingly. The start folder for the mapping is the mount point.
The following are supported:
- up to 20,000,000 files and folders
- read-only file systems
- the file system's nodeId, so that moved files are recognized
- cluster installations
- multiple mount points
Also note:
- Mount points are configured as relative paths
- Hidden files and folders will not be imported
- Files with "Combining Characters" will not be imported
To speed up the synchronization and import process, imported folders are distributed across several so-called index parts, which are processed in parallel.
The file system indexer settings can be configured in the mount configuration admin snap-in.
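The idea of index parts can be illustrated with a minimal sketch. It is not the indexer's actual implementation: the function names, the thread pool, and the simple "cut into consecutive chunks" strategy are assumptions made for illustration, and the role of the minimum part size is left out because it is not specified here. The value 150 is the max_index_part_size from the default mount XML.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_index_parts(folders, max_part_size):
    """Cut the folder list into consecutive parts of at most max_part_size folders."""
    if max_part_size < 1:
        raise ValueError("max_part_size must be at least 1")
    return [folders[i:i + max_part_size]
            for i in range(0, len(folders), max_part_size)]


def index_part(part):
    """Hypothetical placeholder for the real work; it only counts the folders."""
    return len(part)


def index_in_parallel(folders, max_part_size=150, workers=4):
    """Process all index parts concurrently and return one result per part."""
    parts = split_into_index_parts(folders, max_part_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns the results in the same order as the parts
        return list(pool.map(index_part, parts))


folders = [f"folder_{n}" for n in range(400)]
print(index_in_parallel(folders))  # [150, 150, 100]
```

With 400 folders and a maximum part size of 150, the sketch produces three parts that can be worked on independently.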
# Mapping Data with the Database
When started, the file system indexer compares the data from the file system with the database.
During synchronization, files are sorted into a positive list (files found or changed in the file system) and a negative list (files not found in the file system).
The positive list is built by searching for each file using the following criteria:
- by relative_path, mount and file_id
- by relative_path, mount and fs_hash
- by mount and file_id (except for hard links)
- by relative_path, mount and name
The negative list consists of all files that were not found in the file system. They will be deleted from the database.
Please note: Without an enabled mount, files cannot be found during synchronization.
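The matching described above can be sketched as follows. This is an illustrative model only: the Entry class, the in-memory lists, and the assumption that the four lookups are tried one after another in the listed order are not taken from the product. New files are kept in the positive list with no matching record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical record shape, used for database rows and scanned files alike."""
    mount: str
    relative_path: str
    name: str
    file_id: Optional[str] = None
    fs_hash: Optional[str] = None
    is_hardlink: bool = False


def _eq(a, b):
    """Compare two optional values; a missing value never matches."""
    return a is not None and a == b


def _same_place(db, fs):
    return db.mount == fs.mount and db.relative_path == fs.relative_path


# One predicate per lookup from the list above, tried in that order (assumption).
LOOKUPS = [
    lambda db, fs: _same_place(db, fs) and _eq(db.file_id, fs.file_id),
    lambda db, fs: _same_place(db, fs) and _eq(db.fs_hash, fs.fs_hash),
    lambda db, fs: (db.mount == fs.mount and _eq(db.file_id, fs.file_id)
                    and not fs.is_hardlink),
    lambda db, fs: _same_place(db, fs) and db.name == fs.name,
]


def synchronize(db_records, scanned):
    """Return (positive, negative) lists for one synchronization run."""
    matched = set()   # indexes of database records that were found again
    positive = []     # (scanned entry, matching record or None for a new file)
    for fs in scanned:
        hit = None
        for lookup in LOOKUPS:
            hit = next((i for i, db in enumerate(db_records)
                        if i not in matched and lookup(db, fs)), None)
            if hit is not None:
                break
        if hit is not None:
            matched.add(hit)
        positive.append((fs, db_records[hit] if hit is not None else None))
    # Records that no scanned file matched would be deleted from the database.
    negative = [db for i, db in enumerate(db_records) if i not in matched]
    return positive, negative


db = [Entry("data", "docs", "a.txt", file_id="1"),
      Entry("data", "docs", "b.txt", file_id="2"),
      Entry("data", "docs", "c.txt", file_id="3")]
scan = [Entry("data", "docs", "a.txt", file_id="1"),      # unchanged
        Entry("data", "archive", "b.txt", file_id="2"),   # moved, same file_id
        Entry("data", "docs", "d.txt", file_id="4")]      # new file
positive, negative = synchronize(db, scan)
print([entry.name for entry in negative])  # ['c.txt']
```

In this example the moved file b.txt is recognized through the lookup by mount and file_id, so only c.txt ends up in the negative list.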
# Cache
Information about all imported folders is stored in the folder cefs/{folder module name}/file_system_indexer/{mount_name}. Deleting the cache files causes the file indexer to perform a complete synchronization.
# Mount Configuration
A configured and enabled mount is required for the file system indexer to work.
A mount configuration is stored in the file system under file/mounts, e.g. the default mount point file data.xml.
To configure the default mount data or create a new mount, go to the admin snap-in DAM/Mount configuration:
Field | XML | Description |
---|---|---|
Mount-ID | | Assign an identification name for this mount, e.g. "data" (max. 28 characters). This name will be the XML file's name. Please note: Because multiple mount points are supported, each mount must have a unique name. |
Enable mount | enabled | Enable or disable this mount in the system. If set to "false", the mount is shown, but files or folders cannot be created, edited, deleted, copied, or moved. |
Mount read-only | read_only | If set to "true", the file system permission is set to "read only". Files and folders cannot be edited or deleted (but still imported). |
File module name | file_module | Read-only. Set to "Files" by default. |
Folder module name | folder_module | Read-only. Set to "Folder" by default. |
Root directory | base_path | Define the root directory here. The file structure will be imported from this directory, e.g. /4allportal/data/data. |
Event handler | change_handlers | Define the Java classes that react to changes in the file structure, e.g. by creating or changing a database entry for folders and/or files. Default: com.cm4ap.ce.fsi.handler.FileChangeHandler and com.cm4ap.ce.fsi.handler.LogOutputHandler. |
Ignore folders (regex) | exclude_folders | Define all folders that should be ignored during the import process. They are excluded from the DAM (see "Default Regex Pattern"). |
Ignore files (regex) | exclude_files | Define all files that should be ignored during the import process. They are excluded from the DAM (see "Default Regex Pattern"). |
Execution interval (seconds) | period | Define the interval, in seconds, at which the file system indexer scans and synchronizes. Default: 43,200 seconds (12 hours). To make uploads available faster, choose a shorter interval, e.g. 60 seconds. |
Use File-IDs | use_fileid | If set to "true", the unique file-IDs (Windows) or inodes (Linux) of files and folders are used (see "File-IDs and Volume-IDs"). |
Use Volume-IDs | use_volumeid | If set to "true", a volumeID is used to identify the mount, so the file system indexer indexes the correct mount even after the network drive has been renamed or moved (see "File-IDs and Volume-IDs"). |
Minimum index part size | min_index_part_size | Minimum number of folders in one index part. |
Maximum index part size | max_index_part_size | Maximum number of folders allowed in one index part. |
Use milliseconds for file modification time | | If set to "true", the milliseconds of the file modification time are considered when indexing files (recommended). Please note: When using a Docker image, Docker trims milliseconds when packaging files. |
# Default Regex Pattern
By default, a predefined "exclude" pattern always applies to folders and files:
    ^([.~].*)|(.*[\\/][.~]).*$
Files and folders matching this regex pattern are ignored during the import process or, if already imported, are not shown in 4ALLPORTAL.
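To see which paths the default pattern catches, it can be tried out in a few lines of Python. This is only a sketch: the helper name is_excluded and the sample paths are made up, and it assumes that the indexer's regex engine treats this pattern the same way Python's re module does and that the pattern is applied to the relative path.

```python
import re

# Default exclude pattern, copied verbatim from the section above.
DEFAULT_EXCLUDE = re.compile(r"^([.~].*)|(.*[\\/][.~]).*$")


def is_excluded(relative_path):
    """Return True if the path matches the default exclude pattern."""
    return DEFAULT_EXCLUDE.search(relative_path) is not None


samples = [
    ".hidden",            # starts with "."
    "~temp.docx",         # starts with "~"
    "docs/.git/config",   # a path segment starts with "."
    "docs\\~lock.odt",    # backslash separator followed by "~"
    "docs/report.pdf",    # regular file
    "notes~.txt",         # "~" is not at the start of a segment
]
for path in samples:
    print(f"{path!r}: {is_excluded(path)}")
```

The first four sample paths are reported as excluded, the last two are not: the pattern only looks at a "." or "~" at the very start of the path or directly after a "/" or "\" separator.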
# File-IDs and Volume-IDs
Unique file-IDs/inodes are required for SMB and WebDAV connections. Not all file systems support them. Before setting use_fileid to "true", check whether your file system supports unique file-IDs. If they are not supported, the file system indexer stops.
If use_fileid is set to "true", a .4apmount file is created in the root directory of the mount during the first import process.
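Why a file-ID helps to recognize moved files can be shown with a small experiment. This is a sketch, not the indexer's own check: it uses Python's os.stat, whose st_ino field reports the inode number on Linux and the file index on Windows, and the helper name file_id is made up for illustration. Network file systems may behave differently from a local disk, so this should be run on the mount in question before drawing conclusions.

```python
import os
import tempfile


def file_id(path):
    """Return the inode number (Linux) or file index (Windows) reported by the OS."""
    return os.stat(path).st_ino


with tempfile.TemporaryDirectory() as root:
    old_path = os.path.join(root, "report.txt")
    new_path = os.path.join(root, "renamed.txt")
    with open(old_path, "w") as handle:
        handle.write("content")

    id_before = file_id(old_path)
    os.rename(old_path, new_path)   # rename within the same file system
    id_after = file_id(new_path)

print("ID unchanged after rename:", id_before == id_after)
```

If the ID stays the same after the rename, a lookup by mount and file-ID can still find the file even though its path and name have changed.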
A volumeID is stored on the connected network drive (mount) so that the file system indexer can clearly identify the mount. If the shared network drive has no volumeID or a wrong one, the file system indexer stops.
# Default Mount XML
The default mount XML in file/mounts looks like this:
    <mount enabled="true">
      <file_module>file</file_module>
      <folder_module>folder</folder_module>
      <base_path>/4allportal/data/data</base_path>
      <min_index_part_size>15</min_index_part_size>
      <max_index_part_size>150</max_index_part_size>
      <read_only>false</read_only>
      <use_fileid>true</use_fileid>
      <use_volumeid>true</use_volumeid>
      <exclude_folders> <!-- regex pattern to exclude folder -->
        <exclude_folder>(^|.+[/])layout($|[/].*$)</exclude_folder>
      </exclude_folders>
      <exclude_files> <!-- regex pattern to exclude file -->
        <exclude_file>4ap_fileimport[.]xml</exclude_file>
        <exclude_file>Thumbs[.]db</exclude_file>
      </exclude_files>
      <period>43200</period> <!-- repeat in n seconds -->
      <change_handlers>
        <change_handler>com.cm4ap.ce.fsi.handler.FileChangeHandler</change_handler>
        <change_handler>com.cm4ap.ce.fsi.handler.LogOutputHandler</change_handler>
      </change_handlers>
    </mount>
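For scripted checks of a mount file, for example before a deployment, the XML can be read with Python's standard library. The following is a minimal sketch: the function name load_mount and the returned dictionary layout are illustrative assumptions, and MOUNT_XML is a shortened copy of the default mount shown above.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the default mount XML from this section.
MOUNT_XML = """<mount enabled="true">
  <base_path>/4allportal/data/data</base_path>
  <min_index_part_size>15</min_index_part_size>
  <max_index_part_size>150</max_index_part_size>
  <read_only>false</read_only>
  <exclude_files> <!-- regex pattern to exclude file -->
    <exclude_file>4ap_fileimport[.]xml</exclude_file>
    <exclude_file>Thumbs[.]db</exclude_file>
  </exclude_files>
  <period>43200</period> <!-- repeat in n seconds -->
  <change_handlers>
    <change_handler>com.cm4ap.ce.fsi.handler.FileChangeHandler</change_handler>
    <change_handler>com.cm4ap.ce.fsi.handler.LogOutputHandler</change_handler>
  </change_handlers>
</mount>"""


def load_mount(xml_text):
    """Parse a mount XML string into a plain dictionary (hypothetical layout)."""
    root = ET.fromstring(xml_text)  # XML comments are dropped by the parser
    return {
        "enabled": root.get("enabled") == "true",
        "base_path": root.findtext("base_path"),
        "read_only": root.findtext("read_only") == "true",
        "period": int(root.findtext("period")),
        "min_index_part_size": int(root.findtext("min_index_part_size")),
        "max_index_part_size": int(root.findtext("max_index_part_size")),
        "exclude_files": [e.text for e in root.findall("exclude_files/exclude_file")],
        "change_handlers": [e.text for e in root.findall("change_handlers/change_handler")],
    }


config = load_mount(MOUNT_XML)
print(config["base_path"], config["period"] // 3600, "hours")
```

A check such as `config["min_index_part_size"] <= config["max_index_part_size"]` can then be run against every file in file/mounts.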