groff: Gtroff Internals

 
 5.32 'gtroff' Internals
 =======================
 
 'gtroff' processes input in three steps.  One or more input characters
 are converted to an "input token".(1)  Then, one or more input tokens
 are converted to an "output node".  Finally, output nodes are converted
 to the intermediate output language understood by all output devices.
 
    Actually, before step one happens, 'gtroff' converts certain escape
 sequences into reserved input characters (not accessible by the user);
 such reserved characters are also used for other internal processing.
 This is the very reason why not all characters are valid input.  See
 Identifiers for more on this topic.
 
    For example, the input string 'fi\[:u]' is converted into a character
 token 'f', a character token 'i', and a special token ':u' (representing
 u umlaut).  Later on, the character tokens 'f' and 'i' are merged into a
 single output node representing the ligature glyph 'fi' (provided the
 current font has a glyph for this ligature); the special token ':u' is
 likewise converted to an output node representing its glyph.
 All output glyph nodes are 'processed', which means that they are
 invariably associated with a given font, font size, advance width, etc.
 During the formatting process, 'gtroff' itself adds various nodes to
 control the data flow.
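
    As a small illustration of the difference between tokens and nodes,
 the following sketch formats the same input twice.  It assumes that the
 current font provides an 'fi' ligature glyph; the 'lg' request with
 argument 0 switches ligature formation off.

      fi\[:u]
      .br
      .lg 0     \" no ligatures from here on
      fi\[:u]

 Both lines are read as the same three input tokens.  Only the conversion
 to output nodes should differ: the first line can yield a single 'fi'
 ligature node, the second separate glyph nodes for 'f' and 'i'.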
 
    Macros, diversions, and strings collect elements in two chained
 lists: a list of input tokens that have been passed unprocessed, and a
 list of output nodes.  Consider the following diversion.
 
      .di xxx
      a
      \!b
      c
      .br
      .di
 
 It contains these elements.
 
 node list            token list   element number
                                   
 line start node      --           1
 glyph node 'a'       --           2
 word space node      --           3
 --                   'b'          4
 --                   '\n'         5
 glyph node 'c'       --           6
 vertical size node   --           7
 vertical size node   --           8
 --                   '\n'         9
 
 Elements 1, 7, and 8 are inserted by 'gtroff'; the latter two (which are
 always present) specify the vertical extent of the last line, possibly
 modified by '\x'.  The 'br' request finishes the current partial line,
 inserting a newline input token, which is subsequently converted to a
 space when the diversion is reread.  Note that the word space node has a
 fixed width that isn't stretchable anymore.  To convert horizontal space
 nodes back to input tokens, use the 'unformat' request.
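
    A minimal sketch of 'unformat' follows.  It reuses the diversion name
 'xxx' from above, but the content is chosen purely for illustration.

      .di xxx
      a b c
      .br
      .di
      .unformat xxx   \" word space nodes become input tokens again
      .xxx

 After 'unformat', the spaces between the words are ordinary input again,
 so they should be adjustable like any other word space when 'xxx' is
 reread.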
 
    Macros only contain elements in the token list (and the node list is
 empty); diversions and strings can contain elements in both lists.
 
    Note that the 'chop' request simply reduces the number of elements in
 a macro, string, or diversion by one.  The exceptions are "compatibility
 save" and "compatibility ignore" input tokens, which 'chop' skips over
 rather than counting them as elements.  The 'substring' request also
 ignores those input tokens.
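
    For instance, consider two strings whose names and contents are
 chosen only for illustration.  Indices for 'substring' start at 0, and
 negative indices count from the end of the string.

      .ds greeting hello!
      .chop greeting
      \*[greeting]
          => hello
      .ds xxx abcdefgh
      .substring xxx 1 -4
      \*[xxx]
          => bcde

 In the first case 'chop' drops the final element '!'; in the second,
 'substring' keeps the elements from index 1 ('b') up to the fourth
 element from the end ('e').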
 
    Some requests like 'tr' or 'cflags' work on glyph identifiers only;
 this means that the associated glyph can be changed without destroying
 this association.  This can be very helpful for substituting glyphs.  In
 the following example, we assume that glyph 'foo' isn't available by
 default, so we provide a substitution using the 'fchar' request and map
 it to input character 'x'.
 
      .fchar \[foo] foo
      .tr x \[foo]
 
 Now let us assume that we install an additional special font 'bar' that
 has glyph 'foo'.
 
      .special bar
      .rchar \[foo]
 
 Since glyphs defined with 'fchar' are searched before glyphs in special
 fonts, we must call 'rchar' to remove the definition of the fallback
 glyph.  The translation set up with 'tr' remains active throughout; 'x'
 now maps to the real glyph 'foo'.
 
    Macro and request arguments preserve the compatibility mode:
 
      .cp 1     \" switch to compatibility mode
      .de xx
      \\$1
      ..
      .cp 0     \" switch compatibility mode off
      .xx caf\['e]
          => café
 
 Since compatibility mode is on while 'de' is called, the macro 'xx'
 activates compatibility mode while executing.  Argument '$1' can still
 be handled properly because it inherits the compatibility mode status
 which was active at the point where 'xx' is called.
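
    One way to observe this is the read-only register '.C', which is 1
 if compatibility mode is in effect and 0 otherwise.  The sketch below
 writes it to standard error with 'tm'; the macro name 'yy' is
 hypothetical.

      .cp 1
      .de yy
      .tm inside yy: \\n(.C
      ..
      .cp 0
      .yy
      .tm outside yy: \n[.C]

 Given the behavior described above, the first message is expected to
 report 1 and the second 0.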
 
    After expansion of the parameters, the compatibility save and restore
 tokens are removed.