<!--

  __COPYRIGHT__

  Permission is hereby granted, free of charge, to any person obtaining
  a copy of this software and associated documentation files (the
  "Software"), to deal in the Software without restriction, including
  without limitation the rights to use, copy, modify, merge, publish,
  distribute, sublicense, and/or sell copies of the Software, and to
  permit persons to whom the Software is furnished to do so, subject to
  the following conditions:

  The above copyright notice and this permission notice shall be included
  in all copies or substantial portions of the Software.

  THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
  KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
  WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
  NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
  LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
  OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
  WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

-->

<!--

=head1 Using and writing dependency scanners

QuickScan allows simple target-independent scanners to be set up for
source files. Only one QuickScan scanner may be associated with any given
source file and environment, although the same scanner may (and should)
be used for multiple files of a given type.

A QuickScan scanner is only ever invoked once for a given source file,
and it is only invoked if the file is used by some target in the tree
(i.e., there is a dependency on the source file).

QuickScan is invoked as follows:

  QuickScan CONSENV CODEREF, FILENAME [, PATH]

The subroutine referenced by CODEREF is expected to return a list of
filenames included directly by FILE. These filenames will, in turn, be
scanned. The optional PATH argument supplies a lookup path for finding
FILENAME and/or files returned by the user-supplied subroutine.  The PATH
may be a reference to an array of lookup-directory names, or a string of
names separated by the system's separator character (':' on UNIX systems,
';' on Windows NT).

The subroutine is called once for each line in the file, with $_ set to the
current line. If the subroutine needs to look at additional lines, or, for
that matter, the entire file, then it may read them itself, from the
filehandle SCAN. It may also terminate the loop, if it knows that no further
include information is available, by closing the filehandle.

Whether or not a lookup path is provided, QuickScan first tries to lookup
the file relative to the current directory (for the top-level file
supplied directly to QuickScan), or from the directory containing the
file which referenced the file. This is not very general, but seems good
enough, especially if you have the luxury of writing your own utilities
and can control the use of the search path in a standard way.

Here's a real example, taken from a F<Construct> file here:

  sub cons::SMFgen {
      my($env, @tables) = @_;
      foreach $t (@tables) {
	  $env->QuickScan(sub { /\b\S*?\.smf\b/g }, "$t.smf",
			  $env->{SMF_INCLUDE_PATH});
	  $env->Command(["$t.smdb.cc","$t.smdb.h","$t.snmp.cc",
			 "$t.ami.cc", "$t.http.cc"], "$t.smf",
			q(smfgen %( %SMF_INCLUDE_OPT %) %<));
      }
  }

The subroutine above finds all names of the form <name>.smf in the
file. It will return the names even if they're found within comments,
but that's OK (the mechanism is forgiving of extra files; they're just
ignored on the assumption that the missing file will be noticed when
the program, in this example, smfgen, is actually invoked).

[NOTE that the form C<$env-E<gt>QuickScan ...>  and C<$env-E<gt>Command
...> should not be necessary, but, for some reason, is required
for this particular invocation. This appears to be a bug in Perl or
a misunderstanding on my part; this invocation style does not always
appear to be necessary.]

Here is another way to build the same scanner. This one uses an
explicit code reference, and also (unnecessarily, in this case) reads
the whole file itself:

  sub myscan {
      my(@includes);
      do {
	  push(@includes, /\b\S*?\.smf\b/g);
      } while <SCAN>;
      @includes
  }

Note that the order of the loop is reversed, with the loop test at the
end. This is because the first line is already read for you. This scanner
can be attached to a source file by:

  QuickScan $env \&myscan, "$_.smf";

This final example, which scans a different type of input file, takes
over the file scanning rather than being called for each input line:

  $env->QuickScan(
      sub { my(@includes) = ();
	  do {
	     push(@includes, $3)
		 if /^(#include|import)\s+(\")(.+)(\")/ && $3
	  } while <SCAN>;
	  @includes
      },
      "$idlFileName",
      "$env->{CPPPATH};$BUILD/ActiveContext/ACSCLientInterfaces"
  );

-->

  <para>

    &SCons; has built-in scanners that know how to look in
    C, Fortran and IDL source files for information about
    other files that targets built from those files depend on.
    For example, in the case of files that use the C preprocessor,
    these are the <filename>.h</filename> files that are specified
    using <literal>#include</literal> lines in the source.
    You can use the same mechanisms that &SCons; uses to create
    its built-in scanners to write scanners of your own for file types
    that &SCons; does not know how to scan "out of the box."

  </para>

  <section>
  <title>A Simple Scanner Example</title>

    <para>

      Suppose, for example, that we want to create a simple scanner
      for <filename>.k</filename> files.
      A <filename>.k</filename> file contains some text that
      will be processed,
      and can include other files on lines that begin
      with <literal>include</literal>
      followed by a file name:

    </para>

    <programlisting>
      include filename.k
    </programlisting>

    <para>

      Scanning a file will be handled by a Python function
      that you must supply.
      Here is a function that will use the Python
      <filename>re</filename> module
      to scan for the <literal>include</literal> lines in our example:

    </para>

    <programlisting>
      import re
      
      include_re = re.compile(r'^include\s+(\S+)$', re.M)
      
      def kfile_scan(node, env, path, arg=None):
          contents = node.get_text_contents()
          return env.File(include_re.findall(contents))
    </programlisting>
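
    <para>

      To see what the regular expression in the function above extracts,
      it can be tried in plain Python without &SCons;.
      This is only an illustrative sketch:
      the <literal>sample</literal> text and the file names in it
      are made up for the demonstration.

    </para>

```python
import re

# Same pattern as in the scanner function above.
include_re = re.compile(r'^include\s+(\S+)$', re.M)

# Hypothetical contents of a .k file, for illustration only.
sample = """some text to process
include other_file.k
more text
include sub/common.k
"""

# re.M lets ^ and $ match at every line boundary, so each
# "include" line contributes one captured file name.
includes = include_re.findall(sample)
print(includes)
```

    <para>

      Because the pattern is anchored with <literal>^</literal>,
      an <literal>include</literal> line that is indented
      would not be picked up by this expression.

    </para>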

    <para>
    
      Note that the scanner function must return
      a list of File nodes;
      plain strings containing the file names will not do.
      As in the examples shown here,
      you can use the &File;
      function of the current construction environment to create nodes on the fly from
      a sequence of file names with relative paths.
      
    </para>

    <para>

      The scanner function must
      accept the four specified arguments
      and return a list of implicit dependencies.
      Presumably, these would be dependencies found
      from examining the contents of the file,
      although the function can perform any
      manipulation at all to generate the list of
      dependencies.

    </para>

    <variablelist>

      <varlistentry>
      <term>node</term>

      <listitem>
      <para>

      An &SCons; node object representing the file being scanned.
      The path name of the file can be obtained
      by converting the node to a string
      using the <literal>str()</literal> function,
      and the contents of the file can be fetched with the node's
      <literal>get_text_contents()</literal> method.

      </para>
      </listitem>
      </varlistentry>

      <varlistentry>
      <term>env</term>

      <listitem>
      <para>

      The construction environment in effect for this scan.
      The scanner function may choose to use construction
      variables from this environment to affect its behavior.

      </para>
      </listitem>
      </varlistentry>

      <varlistentry>
      <term>path</term>

      <listitem>
      <para>

      A list of directories that form the search path for included files
      for this scanner.
      &SCons; uses this argument to implement the search paths given by the
      &cv-link-CPPPATH; and &cv-link-LIBPATH; variables.

      </para>
      </listitem>
      </varlistentry>

      <varlistentry>
      <term>arg</term>

      <listitem>
      <para>

      An optional argument that you can choose to
      have passed to this scanner function by
      various scanner instances.

      </para>
      </listitem>
      </varlistentry>

    </variablelist>
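
    <para>

    The calling convention described above can be exercised
    outside of a build with simple stand-in objects.
    In this sketch, <literal>FakeNode</literal> and
    <literal>FakeEnv</literal> are hypothetical classes
    written only for the demonstration; they are not part of &SCons;
    and imitate just the two node methods and the one
    environment method that the scanner function uses.

    </para>

```python
import re

include_re = re.compile(r'^include\s+(\S+)$', re.M)

def kfile_scan(node, env, path, arg=None):
    contents = node.get_text_contents()
    return env.File(include_re.findall(contents))

class FakeNode:
    """Hypothetical stand-in for an SCons file node."""
    def __init__(self, name, text):
        self.name = name
        self.text = text
    def __str__(self):
        # str(node) gives the path name of the file.
        return self.name
    def get_text_contents(self):
        # Returns the contents of the file as text.
        return self.text

class FakeEnv:
    """Hypothetical stand-in for a construction environment."""
    def File(self, names):
        # The real File method returns node objects;
        # here each name is only tagged so the result is visible.
        return ['File(%s)' % n for n in names]

node = FakeNode('foo.k', 'include other_file.k\nbody text\n')
deps = kfile_scan(node, FakeEnv(), path=())
print(str(node), deps)
```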

    <para>

    A Scanner object is created using the &Scanner; function,
    which typically takes an <literal>skeys</literal> argument
    listing the file suffixes that this scanner should handle.
    The Scanner object must then be added to the
    &cv-link-SCANNERS; construction variable of a construction environment,
    typically by using the &Append; method:

    </para>

    <programlisting>
       kscan = Scanner(function = kfile_scan,
                       skeys = ['.k'])
       env.Append(SCANNERS = kscan)
    </programlisting>

    <para>

    When we put it all together, it looks like:

    </para>

    <programlisting>
        import re

        include_re = re.compile(r'^include\s+(\S+)$', re.M)

        def kfile_scan(node, env, path, arg=None):
            contents = node.get_text_contents()
            includes = include_re.findall(contents)
            return env.File(includes)

        kscan = Scanner(function = kfile_scan,
                        skeys = ['.k'])

        env = Environment(ENV = {'PATH' : '/usr/local/bin'})
        env.Append(SCANNERS = kscan)

        env.Command('foo', 'foo.k', 'kprocess &lt; $SOURCES &gt; $TARGET')
    </programlisting>

    <!--

    <para>

      Now if we run &scons;
      and then re-run it after changing the contents of
      <filename>other_file</filename>,
      the <filename>foo</filename>
      target file will be automatically rebuilt:

    </para>

    <scons_output example="scan">
      <scons_output_command>scons -Q</scons_output_command>
      <scons_output_command output="    [CHANGE THE CONTENTS OF other_file]">edit other_file</scons_output_command>
      <scons_output_command>scons -Q</scons_output_command>
      <scons_output_command>scons -Q</scons_output_command>
    </scons_output>

    -->

  </section>

  <section>
  <title>Adding a search path to a scanner: &FindPathDirs;</title>

    <para>

    Many scanners need to search for included files or dependencies
    using a path variable; this is how &cv-link-CPPPATH; and
    &cv-link-LIBPATH; work.  The path to search is passed to your
    scanner as the <literal>path</literal> argument.  Path variables
    may be lists of nodes, strings separated by the system path
    separator, or may even contain construction variables that need to be
    expanded.  Fortunately, &SCons; provides the &FindPathDirs; function,
    which returns a function that expands a given path (named by a
    construction variable) into a list of directories at the time the
    scanner is called.  Deferring evaluation until that point allows, for
    instance, the path to contain <literal>$TARGET</literal> references
    that differ for each file scanned.

    </para>

    <para>

    Using &FindPathDirs; is straightforward.  Continuing the above example,
    and using <literal>KPATH</literal> as the construction variable that
    holds the search path (analogous to &cv-link-CPPPATH;), we just modify
    the &Scanner; constructor call to include a
    <literal>path_function</literal> keyword argument:

    </para>
    
    <programlisting>
        kscan = Scanner(function = kfile_scan,
                        skeys = ['.k'],
                        path_function = FindPathDirs('KPATH'))
    </programlisting>
    
    <para>
    
    &FindPathDirs; returns a callable object that, when called,
    expands the elements of <literal>env['KPATH']</literal> and tells the
    scanner to search in those directories.  It also adds the
    related repository and variant directories to the search list.  As a side
    note, the returned object stores the path in an efficient form, so
    lookups are fast even when variable substitutions are needed.
    This matters because many files are scanned in a typical build.
    
    </para>
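
    <para>

    Once the scanner function receives the expanded
    <literal>path</literal>, it still has to decide how to use it.
    The following plain-Python sketch shows one possible
    first-match lookup over a list of directories.
    The helper <literal>find_include</literal> and the directory
    names are hypothetical, and the sketch uses ordinary strings;
    in a real scanner the entries of <literal>path</literal>
    are typically directory nodes, so they would need to be
    converted (for example with <literal>str()</literal>) first.

    </para>

```python
import os
import tempfile

def find_include(name, search_dirs):
    """Return the first existing path for name in search_dirs, or None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None

# Build a throwaway directory tree so the sketch runs on its own.
root = tempfile.mkdtemp()
first = os.path.join(root, 'first')
second = os.path.join(root, 'second')
os.makedirs(first)
os.makedirs(second)
with open(os.path.join(second, 'other_file.k'), 'w') as f:
    f.write('body text\n')

# Directories are tried in order; the file only exists in the second one.
found = find_include('other_file.k', [first, second])
missing = find_include('absent.k', [first, second])
print(found, missing)
```

    <para>

    Returning <literal>None</literal> for a file that cannot be found
    is a design choice of this sketch; a scanner could equally
    skip such names or report them.

    </para>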
    </section>