find: LOCATE02 Database Format

 
 4.2.1 LOCATE02 Database Format
 ------------------------------
 
 'updatedb' runs a program called 'frcode' to "front-compress" the list
 of file names, which reduces the database size by a factor of 4 to 5.
 Front-compression (also known as incremental encoding) works as follows.
 
    The database entries are a sorted list (sorted case-insensitively,
 for users' convenience).  Since the list is sorted, each entry is likely
 to share a prefix (initial string) with the previous entry.  Each
 database entry begins with an offset-differential count byte: the number
 of prefix characters this entry shares with the preceding entry, minus
 the number the preceding entry shared with its own predecessor.  (The
 count can therefore be negative.)  Following the count is a
 null-terminated ASCII remainder - the part of the name that follows the
 shared prefix.
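
    As an illustration only, the following Python sketch computes the
 offset-differential counts and remainders for a short sorted list.  The
 function names and the sample paths are hypothetical; this is not the
 'frcode' source, just a minimal model of the scheme described above.

```python
def common_prefix_len(a: bytes, b: bytes) -> int:
    """Return the number of leading bytes that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def front_compress(names):
    """Yield (differential_count, remainder) pairs for a sorted list of names."""
    prev = b""          # previous entry (empty before the first entry)
    prev_shared = 0     # prefix length the previous entry shared with its predecessor
    for name in names:
        shared = common_prefix_len(prev, name)
        # Store only the change in shared-prefix length, plus the unshared tail.
        yield shared - prev_shared, name[shared:]
        prev, prev_shared = name, shared


entries = list(front_compress(
    [b"/usr/bin/ls", b"/usr/bin/lsblk", b"/usr/lib", b"/var"]))
for count, remainder in entries:
    print(count, remainder)
```

    Here the third entry shares only 5 bytes with its predecessor, where
 the second entry shared 11, so its count is negative (-6).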
 
    If the offset-differential count does not fit in a single byte
 (that is, it lies outside the range -127 to +127), the byte has the
 value 0x80 and the count follows in a 2-byte word, with the high byte
 first (network byte order).
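
    A minimal sketch of this count encoding in Python follows.  The
 helper names 'encode_count' and 'decode_count' are hypothetical, and the
 sketch assumes that both the single byte and the 2-byte word are signed
 two's-complement values.

```python
import struct


def encode_count(diff: int) -> bytes:
    """Encode one offset-differential count (assumes two's-complement values)."""
    if -127 <= diff <= 127:
        return struct.pack("b", diff)          # fits in one signed byte
    # Escape byte 0x80, then a signed 16-bit word, high byte first.
    return b"\x80" + struct.pack(">h", diff)


def decode_count(buf: bytes, pos: int = 0):
    """Decode a count at buf[pos]; return (count, position after the count)."""
    (small,) = struct.unpack_from("b", buf, pos)
    if small != -128:                          # 0x80 read as a signed byte is -128
        return small, pos + 1
    (wide,) = struct.unpack_from(">h", buf, pos + 1)
    return wide, pos + 3


print(encode_count(-6).hex(), encode_count(200).hex())
```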
 
    Every database begins with a dummy entry for a file called
 'LOCATE02'.  'locate' checks for this entry to ensure that the database
 file has the correct format, and ignores it when searching.
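
    The following Python sketch shows how a reader might check the dummy
 entry and then expand the remaining entries.  It is not the 'locate'
 implementation: the function name 'read_locate02' is hypothetical, the
 sample data is made up, and the sketch assumes the dummy entry is stored
 with a count byte of 0 followed by the null-terminated name.

```python
import struct

# Assumed layout of the dummy entry: count byte 0, then "LOCATE02" and a null.
MAGIC = b"\x00LOCATE02\x00"


def read_locate02(data: bytes):
    """Return the list of names stored in a LOCATE02-style byte string."""
    if not data.startswith(MAGIC):
        raise ValueError("not a LOCATE02 database")
    pos = len(MAGIC)        # skip the dummy entry; it is never reported
    shared = 0              # running shared-prefix length
    prev = b""
    names = []
    while pos < len(data):
        (diff,) = struct.unpack_from("b", data, pos)
        pos += 1
        if diff == -128:    # 0x80 escape: signed 16-bit count, high byte first
            (diff,) = struct.unpack_from(">h", data, pos)
            pos += 2
        shared += diff
        end = data.index(b"\x00", pos)          # remainder is null-terminated
        prev = prev[:shared] + data[pos:end]
        names.append(prev)
        pos = end + 1
    return names


sample = (MAGIC + b"\x00/usr/bin/ls\x00" + b"\x0bblk\x00"
          + b"\xfalib\x00" + b"\xfcvar\x00")
print(read_locate02(sample))
```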
 
    Databases cannot be concatenated together, even if the first (dummy)
 entry is trimmed from all but the first database.  This is because the
 offset-differential count in the first entry of the second and following
 databases will be wrong.
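
    To make the problem concrete, the Python sketch below encodes two
 small lists separately (without the dummy entry) and decodes their
 concatenation.  The 'encode' and 'decode' helpers are hypothetical and
 handle only one-byte counts.  The first count of the second list was
 computed relative to a shared-prefix length of 0, but the decoder's
 running length is still 6 at that point, so the names come out wrong.

```python
def encode(names):
    """Front-compress sorted names; sketch limited to one-byte counts."""
    out, prev, prev_shared = bytearray(), b"", 0
    for name in names:
        shared = 0
        while (shared < min(len(prev), len(name))
               and prev[shared] == name[shared]):
            shared += 1
        diff = shared - prev_shared
        assert -127 <= diff <= 127      # no 0x80 escape in this sketch
        out += bytes([diff & 0xFF]) + name[shared:] + b"\x00"
        prev, prev_shared = name, shared
    return bytes(out)


def decode(body):
    """Expand entries, keeping a running shared-prefix length."""
    names, prev, shared, pos = [], b"", 0, 0
    while pos < len(body):
        byte = body[pos]
        shared += byte - 256 if byte > 127 else byte   # signed count byte
        end = body.index(b"\x00", pos + 1)
        prev = prev[:shared] + body[pos + 1:end]
        names.append(prev)
        pos = end + 1
    return names


first = [b"/bin/cat", b"/bin/cp"]
second = [b"/etc/hosts", b"/etc/passwd"]
joined = decode(encode(first) + encode(second))
print(joined)
```

    Decoded on its own, the second list round-trips correctly; only the
 concatenation is corrupted.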
 
    In the output of 'locate --statistics', the new database format is
 referred to as 'LOCATE02'.