About

This book serves as a reference and guide for the fidx tool.

Introduction

fidx (file indexer) is a tool for indexing file archives.

The primary goal of this tool is to help maintain a list of hashes for files in backup archives. In addition, it can tag files and deduplicate files based on their hashes.

Known limitations

  • Only supports Unicode-compliant pathnames (non-compliant file system entries are skipped).
  • Deduplication is currently unimplemented, though the tool can report duplicates.
  • It is undecided whether deduplication will be supported on Windows.

Installation

To install fidx from crates.io, run:

$ cargo install fidx

To install from source, clone the repository, open a work directory and run cargo install --path . in it:

$ fossil clone --workdir fidx https://repos.qrnch.tech/pub/fidx fidx.fossil
$ cd fidx
$ cargo install --path .

Initialization

The first step in using fidx on a directory tree is to initialize a state database. This is done by running fidx init in the directory tree's base directory.

If the tool will be used to maintain checksums on an external disk which is mounted on /Volumes/backup_archive, then run:

$ cd /Volumes/backup_archive
$ fidx init

All the entries in the database will be relative to the directory where the database was initialized.

Hashing

To avoid masking bit rot caused by degrading storage media, the checksum database must only be updated when the file archive has been modified deliberately. fidx accomplishes this by never recalculating the hash of a file whose modification timestamp[1] in the filesystem is unchanged.
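
The rule above can be sketched in a few lines of Python. This is an illustration only, not fidx's implementation: the dictionary db, the function names and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
import os


def file_hash(path):
    """Return the SHA-256 hex digest of a file (the algorithm is an assumption)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def update_entry(db, path):
    """Rehash `path` only if its mtime differs from the recorded one.

    `db` is a hypothetical dict mapping path -> (mtime_ns, hex digest).
    Returns True if the entry was hashed (new or modified file).
    """
    mtime = os.stat(path).st_mtime_ns
    known = db.get(path)
    if known is not None and known[0] == mtime:
        return False  # unchanged mtime: keep the stored hash untouched
    db[path] = (mtime, file_hash(path))
    return True
```

The consequence worth noting: if a file's contents change while its mtime stays the same, the stored hash is left alone, so a later verification can still flag the mismatch.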

Ignore lists

fidx supports ignore lists, which can be used to ignore filesystem entries by glob expressions. There is no way to manage ignore entries using the fidx command line tool. Instead, users must manipulate the database directly using their favorite SQLite database editor, such as the standard sqlite3 command line tool.

The default ignore list for a newly initialized fidx database is:

$ sqlite3 .findex
sqlite> .mode column
sqlite> .header on
sqlite> SELECT * FROM ignore;
id  path
--  -------------
1   .findex
2   .findex-shm
3   .findex-wal
4   .findexer.log
5   **/.*.swp
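
Because ignore entries are edited with plain SQL, adding one amounts to inserting a row into the ignore table. The following is a minimal sketch under stated assumptions: the table and column names are taken from the listing above, but the column types, the automatically assigned id and the helper add_ignore are guesses made for the example. It uses an in-memory stand-in database so that it runs without fidx.

```python
import sqlite3


def add_ignore(conn, pattern):
    """Add a glob pattern to the ignore table unless it is already present."""
    cur = conn.execute("SELECT id FROM ignore WHERE path = ?", (pattern,))
    if cur.fetchone() is None:
        conn.execute("INSERT INTO ignore (path) VALUES (?)", (pattern,))
        conn.commit()


# Stand-in for a real database; the schema below is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ignore (id INTEGER PRIMARY KEY, path TEXT NOT NULL)")
for default in (".findex", ".findex-shm", ".findex-wal", ".findexer.log", "**/.*.swp"):
    add_ignore(conn, default)

# Example pattern chosen for illustration; it is not a fidx default.
add_ignore(conn, "**/.DS_Store")
rows = conn.execute("SELECT id, path FROM ignore ORDER BY id").fetchall()
```

Make a copy of the database file before editing a real one, since a mistake here bypasses any checks the fidx tool itself might perform.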

Updating

To update the hash database run the subcommand update:

$ fidx update

This must be run from the base directory (i.e. where the fidx initialization was performed, and the state database resides).

Verify

To verify the integrity of the archive against the stored database run the subcommand verify:

$ fidx verify

The verify subcommand will verify all files from the current subdirectory within the managed tree. To verify the entire archive the verify subcommand must be run from the base directory (the directory where the fidx database resides).

An optional argument can be passed to the verify subcommand to specify a subset to verify. If a tree is managed under /backups, which contains a subdirectory foo, which in turn contains a subdirectory bar (i.e. /backups/foo/bar), the following would only verify files under /backups/foo/bar:

$ cd /backups/foo
$ fidx verify bar
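
Conceptually, verification recomputes the hash of each file in the selected subtree and compares it with the stored value, using paths relative to the base directory. The sketch below illustrates that idea only; the dictionary db, the function names, the use of SHA-256 and the convention that subdir is given relative to the base directory are assumptions, not a description of fidx internals.

```python
import hashlib
import os


def sha256_of(path):
    """Return the SHA-256 hex digest of a file (the algorithm is an assumption)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(base, db, subdir="."):
    """Return the relative paths under `subdir` whose hash differs from `db`.

    `db` is a hypothetical dict mapping base-relative path -> hex digest.
    Files without a database entry are skipped in this sketch.
    """
    mismatches = []
    for dirpath, _dirs, files in os.walk(os.path.join(base, subdir)):
        for name in files:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, base)
            expected = db.get(rel)
            if expected is not None and sha256_of(full) != expected:
                mismatches.append(rel)
    return sorted(mismatches)
```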

[1] In other words: fidx uses the file's modification time to detect intended updates of files. Don't do anything creative or abnormal with mtimes that would break this important assumption.

Tagging

The tagging system in fidx is simple (and limited), but it has one particular quirk which can cause some confusion: tags are internally associated with hashes, not file entries. This is based on the assumption that tags are used to describe the contents of a file, which has two benefits:

  • If a file is renamed[1] (without changing its contents) it will not lose its tags.
  • If several files share the same contents, only one will need to be tagged but all of the files will gain the tag(s).

Tags are stored in the fidx tree database, and thus are local to the tree.
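
The effect of keying tags by hash rather than by path can be shown with a toy model. Everything in this sketch (the dictionaries, the file names, the placeholder hashes h1 and h2, and the helper functions) is hypothetical and only mirrors the behavior described above.

```python
# Toy model: paths map to content hashes, and tags hang off the hashes.
files = {"a/report.pdf": "h1", "b/copy.pdf": "h1", "notes.txt": "h2"}
tags_by_hash = {}


def tag(path, *names):
    """Attach tags to the content hash of `path`."""
    tags_by_hash.setdefault(files[path], set()).update(names)


def tags(path):
    """Return the sorted tags of whatever content `path` currently holds."""
    return sorted(tags_by_hash.get(files[path], set()))


tag("a/report.pdf", "finance")

# A rename is seen as a deletion plus a new entry with the same hash.
files["archive/report.pdf"] = files.pop("a/report.pdf")
```

After the rename, tags("archive/report.pdf") still yields the finance tag, and so does tags("b/copy.pdf"), because both entries share the same content hash.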

Managing tags

Before files can be tagged, the tags must exist in the database. To add tags use the subcommand add-tag (more than one tag can be added at a time):

$ fidx add-tag some-tag another

(Each tag must be unique.)

Once there are tags in the database, use the subcommand list-tags to list them:

$ fidx list-tags
    2 another
    1 some-tag

A tag can be renamed using the rename-tag subcommand. The following would rename a tag called from-name to to-name:

$ fidx rename-tag from-name to-name

(Only one tag can be renamed at a time.)

Tags can be removed using the remove-tag subcommand:

$ fidx remove-tag some-tag another

Tagging/Untagging files

Note: Because tags are associated with hashes, it is important to only manage tagging/untagging in a tree which has not been modified since the last database update. Always run an update before managing tag associations in a tree.

To attach one or more tags to a file use the subcommand tag, which takes as its first argument the name of the file to tag, and the remaining arguments are tag names:

$ fidx tag fossil-repos-2020.tar scm fossil source-archives

To inspect what tags a file is associated with use the subcommand tags:

$ fidx tags fossil-repos-2020.tar
list tags for fossil-repos-2020.tar
   2 fossil
   6 scm
   7 source-archives

Removing tags from a file can be done using the untag subcommand, which has the same format as tag: the first argument is the file name and the following arguments are the tags to disassociate from that file:

$ fidx untag fossil-repos-2020.tar scm

To search for a file based on tags use the subcommand search:

$ fidx search some-tag another

The search is very basic and only supports an implicit boolean AND, meaning that only files which carry all of the given tags will be returned.
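
The AND semantics correspond to a subset test: a file matches when the set of requested tags is contained in the file's tag set. A minimal sketch follows; the function, the index dictionary and all file names other than fossil-repos-2020.tar are made up for illustration.

```python
def search(tags_by_file, *wanted):
    """Return the files that carry every tag in `wanted` (implicit AND)."""
    need = set(wanted)
    return sorted(name for name, tags in tags_by_file.items() if need <= tags)


# Hypothetical tag index used only for this example.
index = {
    "fossil-repos-2020.tar": {"scm", "fossil", "source-archives"},
    "git-repos.tar": {"scm", "source-archives"},
    "photos.zip": {"images"},
}

both = search(index, "scm", "fossil")   # only the file with both tags
either = search(index, "scm")           # every file tagged scm
none = search(index, "scm", "images")   # no file carries both
```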


[1] fidx has no concept of a file/directory rename; it treats a rename as a deletion and a new file, where the deletion would cause the tag association to disappear if tags were associated with file entries rather than their contents.

Deduplication

Caution: The deduplication feature in fidx is built around its database of hashes, which means that it's important not to perform deduplication unless the database is known to be up to date with the current state of the file system tree.

Listing duplicates

To list all known duplicates use the subcommand dups:

$ fidx dups

This list will not include entries which have already been deduplicated.
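
Finding duplicates from a hash database comes down to grouping paths by hash and keeping the groups with more than one member. The sketch below shows that grouping; the function name, the input dictionary and the placeholder hashes are assumptions, and it does not model how fidx tracks entries that have already been deduplicated.

```python
from collections import defaultdict


def find_duplicates(hash_by_path):
    """Group paths by hash and return only the groups with several paths."""
    groups = defaultdict(list)
    for path, digest in hash_by_path.items():
        groups[digest].append(path)
    return {d: sorted(paths) for d, paths in groups.items() if len(paths) > 1}


# Hypothetical database contents used only for this example.
dups = find_duplicates({
    "2019/a.tar": "h1",
    "2020/a-copy.tar": "h1",
    "2020/b.tar": "h2",
})
```

Such a listing is only as trustworthy as the hashes it is built from, which is why the database should be up to date before acting on it.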

Deduplicating

To perform a deduplication use the subcommand dedup:

$ fidx dedup