<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
        "http://www.w3.org/TR/REC-html40/loose.dtd">
<html lang="en">

<head>
    <title>File Selection Language Reference</title>
    <link rel="stylesheet" href="fsl.css" type="text/css">
    <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
    <meta name="author" content="Kristian Ovaska">
</head>

<body>

<h1>File Selection Language Reference</h1>

<p>
Version 0.5.1 (2005-12-05)<br>
Kristian Ovaska (kristian.ovaska [at] helsinki.fi)

<h2>Contents</h2>

<ul>

<li><a href="#overview">1. Overview</a>
<li><a href="#general-syntax">2. General syntax</a>
<li><a href="#glob-patterns">3. Glob patterns</a>
<li><a href="#rules">4. Rules</a>
    <ul>
    <li><a href="#glob-list">4.1 Glob list rule</a>
    <li><a href="#for-each">4.2 For-each rule</a>
    <li><a href="#in-block">4.3 IN-block</a>
    <li><a href="#if-block">4.4 IF-block</a>
    </ul>
<li><a href="#expressions">5. Expressions</a>
    <ul>
    <li><a href="#expressions-general">5.1 General</a>
    <li><a href="#functions">5.2 Built-in functions</a>
    </ul>
<li><a href="#eval-order">6. Rule evaluation order</a>
<!--
<li><a href="#examples">7. Examples</a>
-->
</ul>

<h2 id="overview">1. Overview</h2>

<p>
File Selection Language (FSL) is a descriptive language for
file selection. An FSL program, also called a rule set, is
made up of rules. Each rule tells whether a file should
or should not be included in the file set.

<p>
FSL rules utilize glob patterns. The pattern <tt>*</tt> matches
all files, <tt>dir1/*</tt> matches all files under dir1,
<tt>dir1/somefile</tt> matches only the file dir1/somefile, and
so on.
However, FSL rules are not limited to bare globs. See
<a href="#rules">below</a> for a full specification of the rules.

<p>
There are two basic kinds of rules: inclusive and exclusive.
Inclusive rules state that certain files should be included
in the file set. When an inclusive rule is evaluated, the
file system is scanned for all files matching the rule.

<p>
An exclusive rule (one that starts with "NOT") is an exception to an
inclusive rule. It says that even if a file matched an
inclusive rule earlier, it must be excluded from the file set.
Notice that exclusive rules don't cause any file system
scanning by themselves: all scanning comes from inclusive rules.

<p>
If you exclude a directory with an exclusive rule, you exclude
all files and subdirectories of it as well. This is a good
way to speed up file scanning.

<h2 id="general-syntax">2. General syntax</h2>

<p>
The program consists of a list of rules, which are separated by newlines.
Simple rules (usually) fit on one line, while complex ones span several lines.

<p>
The character # marks the beginning of a comment. Everything after it
on the same line is ignored.

<p>
Inside so-called block rules (see <a href="#rules">below</a>), the child
rules must be indented with spaces or tabs. You can choose the indentation
level, but you must always use the same amount within the rule file.
Mixing spaces and tabs is unwise. Nested blocks are indented in the
same manner.

<p>
Simple (non-block) rules generally fit on one line. Due to the block
indentation system, a simple rule normally cannot span several lines;
doing so is a syntax error.
However, expressions (see <a href="#expressions">below</a>)
that have an open parenthesis can span several lines freely.
Python programmers will recognize the FSL indentation system.

<p>
Everything is case-insensitive: keywords and glob patterns.

<h2 id="glob-patterns">3. Glob patterns</h2>

<p>
When you write glob patterns, you can use two forms: bare strings
and quoted strings. Bare strings are written as-is, while quoted
strings have quotation marks around them.

<p>
Bare string: <tt>dir1/*</tt><br>
Quoted string: <tt>"dir1/*"</tt>

<p>
There are limitations to bare strings. Bare strings:
<ul>
<li>may not contain whitespace
<li>may not contain the characters <tt>, ( ) " &lt; &gt; = # | !</tt>
<li>may not be a reserved word: AND, EACH, IF, IN, NONREC, NOT, OR
<li>can't be used in expressions (see <a href="#expressions">below</a>)
</ul>

<p>
For example, the pattern "aaa bbb" must be a quoted string.

<p>
Glob patterns may contain both forward slashes (<tt>/</tt>) and
backward slashes (<tt>\</tt>). Forward slashes work on Windows, too,
and backward slashes work on Unix. Glob patterns may contain full
Windows drive specifiers (e.g. <tt>c:\somedir\*</tt>); obviously, these do
not work on Unix.

<p>
By default, glob patterns are recursive, i.e. <tt>*</tt> matches all files,
including those in subdirectories. You get nonrecursive behaviour by
appending "NONREC" to the glob pattern. For example, <tt>* NONREC</tt>
matches only the files in the current root directory, but not those in
subdirectories.
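
<p>
As a rough illustration of the difference, the following Python sketch lists
the files that <tt>*</tt> and <tt>* NONREC</tt> would select under a given
root directory. It is not part of FSL: the function <tt>list_files</tt> and
its <tt>recursive</tt> parameter are hypothetical names chosen for this
example, and only the pattern <tt>*</tt> is modelled.
<pre class="code">
import os
import tempfile

# Hypothetical helper, not part of FSL: models only the pattern "*".
def list_files(root, recursive=True):
    """Return the files under root, as paths relative to root."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.append(os.path.relpath(full, root).replace(os.sep, "/"))
        if not recursive:
            break  # NONREC: look at the root directory only
    return sorted(found)

# Build a small throwaway tree containing a.txt and sub/b.txt.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(root, rel), "w") as fh:
            fh.write("x")
    print(list_files(root))                   # ['a.txt', 'sub/b.txt']
    print(list_files(root, recursive=False))  # ['a.txt']
</pre>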

<p>
There are two flavours of glob patterns: absolute and relative.
Absolute patterns start with a forward or backward slash or a Windows
drive specifier. A pattern that is not absolute is, logically, a
relative pattern.
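
<p>
The classification above can be sketched in a few lines of Python. The
function name <tt>is_absolute</tt> is an assumption made for this example;
the actual FSL interpreter may perform the test differently.
<pre class="code">
import re

# A Windows drive specifier such as "c:" at the start of the pattern.
_DRIVE = re.compile(r"^[A-Za-z]:")

# Hypothetical helper, not part of FSL.
def is_absolute(pattern):
    """Return True if the pattern starts with a slash, a backslash,
    or a Windows drive specifier."""
    return pattern.startswith(("/", "\\")) or _DRIVE.match(pattern) is not None

print(is_absolute("/usr/local/*"))    # True
print(is_absolute("c:\\somedir\\*"))  # True
print(is_absolute("dir1/*"))          # False
</pre>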

<p>
Relative glob patterns are evaluated in the context of a root
directory. By default, the root directory is the current working
directory, but may be set to any directory.
For example, the rule <tt>*</tt> will produce all files in the
file system if the root directory is the file system root, but only
the files under <tt>/usr/local</tt> if the root directory
is <tt>/usr/local</tt>.
The root directory is given to the FSL interpreter as a parameter. Also,
so-called IN-blocks (see <a href="#in-block">below</a>) change the
effective root directory temporarily. Absolute patterns are not
allowed inside IN-blocks.

<p>
Usually, it is better to use relative globs, since they are more
flexible than absolute globs.
Absolute globs are always evaluated in the context of the same
root directory, the file system root.
Let's say you have created a relative rule set for your Unix machine
that you normally evaluate with <tt>/</tt> as the root directory.
Some day, you mirror the file system onto another Unix machine
(or a Windows machine using Samba), into the directory <tt>/usr/somedir</tt>.
Now, you can simply use your existing relative rule set with
<tt>/usr/somedir</tt> as the root directory. This wouldn't
be possible if you had hard-coded all the paths into the rules.


<h2 id="rules">4. Rules</h2>

<p>
There are two basic rule types: the <a href="#glob-list">glob list</a> rule
and the <a href="#for-each">for-each</a> rule.
Both may be prefixed with "NOT", which makes them exclusive rules.
There are also two compound rule types: the <a href="#in-block">IN-block</a>
and the <a href="#if-block">IF-block</a>.

<pre class="code">
&lt;rule&gt; := (NOT)? &lt;glob-list&gt;
        | (NOT)? &lt;for-each&gt;
        | IN &lt;directory&gt; &lt;start-block&gt; &lt;rule&gt;+ &lt;end-block&gt;
        | IF &lt;expression&gt; &lt;start-block&gt; &lt;rule&gt;+ &lt;end-block&gt;
</pre>

<h3 id="glob-list">4.1 Glob list rule</h3>

<p>
This is the most basic rule. A glob list rule is, as the
name implies, a list of glob patterns separated by commas.
A file matches a glob list rule if it matches any of the globs.

In a glob list, bare strings and quoted strings may be mixed
freely, as can recursive and nonrecursive (NONREC) glob patterns.

<p>
Format:
<pre class="code">
&lt;glob-pattern&gt; (, &lt;glob-pattern&gt;)* (IF &lt;expression&gt;)?
</pre>

<p>
The IF-expression is optional. If present, the
glob list rule is applied only if the <a href="#expressions">expression</a>
is true. The expression is evaluated only once, not for every
file. Usually, however, you want an expression to be evaluated
for every file in turn; for that,
you have to use the <a href="#for-each">for-each</a> rule.

<p>
Examples:
<pre class="code">
usr/local/*
somefile, "some file with spaces"
*.gif, *.jpg, *.png
NOT *.ps, *.eps                   # excludes both *.ps and *.eps
*.html IF exists("index.html")    # includes *.html files only if index.html is present
</pre>

<h3 id="for-each">4.2 For-each rule</h3>

<p>
Format:
<pre class="code">
EACH &lt;variable name&gt; (IN &lt;glob list&gt;)? IF &lt;expression&gt;
</pre>

<p>
A for-each rule is an enhanced glob list rule. Each file matched by
the glob list is included/excluded only if the <a href="#expressions">expression</a>
is true for that file.
The expression is evaluated for every file in turn.

<p>
The IN-section is optional. If omitted, the glob <tt>*</tt> is used.

<p>
Examples:
<pre class="code">
EACH f IN * IF size(f) &gt; 1024           # includes files larger than 1 KB
EACH f IF size(f) &gt; 1024                # the same
NOT EACH f IN *.ps IF date(f) &lt; "2000"  # excludes *.ps files last modified before the year 2000
</pre>

<h3 id="in-block">4.3 IN-block</h3>

<p>
An IN-block contains a list of rules that are executed in a different
root directory. The effective root directory is calculated by
concatenating the previous root directory and the directory given
in the IN-block header.

<p>
The rules under the IN-block can be any rules: glob lists,
for-each rules, or other IN-blocks.

<p>
Format:
<pre class="code">
IN &lt;directory&gt;
    &lt;rule1&gt;
    &lt;rule2&gt;
    ...
</pre>

<p>
All glob patterns inside an IN-block must be relative. The directory
specifier may be absolute if the IN-block is a top-level IN-block. In a nested
IN-block, the directory specifier must also be relative.

<p>
Example:
<pre class="code">
IN dir1
    *
</pre>

<p>This includes all files under dir1 and is exactly the same as the rule
<pre class="code">
dir1/*
</pre>

<p>
Example of nested IN-blocks:
<pre class="code">
IN dir1
    IN dir2
        IN dir3
            *
</pre>

<p>
This matches the files <tt>dir1/dir2/dir3/*</tt>.
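
<p>
The way nested IN-blocks build up the effective root directory can be
sketched in Python as follows. The function <tt>effective_root</tt> is a
hypothetical name used only for this illustration, and forward slashes are
assumed as the separator.
<pre class="code">
import posixpath

# Hypothetical helper, not part of FSL.
def effective_root(root, in_dirs):
    """Concatenate the directories of nested IN-blocks onto root,
    outermost block first."""
    for directory in in_dirs:
        root = posixpath.join(root, directory)
    return root

print(effective_root("/usr/local", ["dir1", "dir2", "dir3"]))
# /usr/local/dir1/dir2/dir3
</pre>

<p>
Note that <tt>posixpath.join</tt> discards the earlier components when a
later one is absolute, which happens to agree with an absolute directory
in a top-level IN-block replacing the root directory. The sketch does not
check that nested directory specifiers are relative.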

<h3 id="if-block">4.4 IF-block</h3>

<p>
An IF-block is a bit like a glob list rule with an IF-expression,
but an IF-block may contain several rules. The rules are applied
only if the expression evaluates to true. The expression is
evaluated only once.

<p>
Format:
<pre class="code">
IF &lt;expression&gt;
    &lt;rule1&gt;
    &lt;rule2&gt;
    ...
</pre>


<h2 id="expressions">5. Expressions</h2>

<h3 id="expressions-general">5.1 General</h3>

<p>
Expressions are used in for-each rules, glob list rules and IF-block rules
to determine whether a rule should be applied. Each expression evaluates
to true or false.

<p>
Expressions are made of:
<ul>
<li>integer, floating point, string and timestamp literals, e.g. <tt>50</tt>, 
    <tt>5.23</tt>, <tt>"abc"</tt>, <tt>"2005-08-05 21:30"</tt>
<li>variable references if inside a for-each rule, e.g. <tt>f</tt>
<li>function calls, e.g. <tt>size(f)</tt>, <tt>now()</tt>
<li>logical operators NOT, AND, OR, e.g. "<tt>NOT expr</tt>",
    "<tt>expr1 AND expr2</tt>", "<tt>expr1 OR expr2</tt>"
<li>comparison operators <tt>&lt; &lt;= &gt; &gt;= = !=</tt>
    (these work for numbers, strings and timestamps)
</ul>

<p>
Timestamp literals are written inside quotation marks just like strings.
However, they are converted to a "real" timestamp representation internally.
An invalid timestamp literal results in a parse error.
Accepted timestamp formats are:
<ul>
<li><tt>yyyy</tt> (month=1, day=1, hour=0, minute=0, second=0)
<li><tt>yyyy-mm</tt>
<li><tt>yyyy-mm-dd</tt>
<li><tt>yyyy-mm-dd hh:mm</tt>
<li><tt>yyyy-mm-dd hh:mm:ss</tt>
</ul>
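
<p>
A minimal Python sketch of parsing these five formats is shown below. It is
an illustration only, not the interpreter's code: the names
<tt>FORMATS</tt> and <tt>parse_timestamp</tt> are assumptions, and Python's
<tt>strptime</tt> may be more lenient than FSL (for instance, it accepts a
one-digit month).
<pre class="code">
from datetime import datetime

# The five accepted layouts, expressed as strptime format strings.
FORMATS = [
    "%Y",
    "%Y-%m",
    "%Y-%m-%d",
    "%Y-%m-%d %H:%M",
    "%Y-%m-%d %H:%M:%S",
]

# Hypothetical helper, not part of FSL.
def parse_timestamp(text):
    """Try each accepted format in turn; raise ValueError if none fits."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("invalid timestamp literal: " + repr(text))

print(parse_timestamp("2000"))              # 2000-01-01 00:00:00
print(parse_timestamp("2005-08-05 21:30"))  # 2005-08-05 21:30:00
</pre>

<p>
Fields missing from the literal take their lowest value, as in the
<tt>yyyy</tt> case above.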

<p>
Notice that logical NOT (inside an expression) is
conceptually different from the exclusion NOT before a rule.
Logical NOT merely reverses the truth value of an expression.

<p>
Expression format:
<pre class="code">
&lt;expression&gt; := &lt;simple-expression&gt; ((AND | OR) &lt;expression&gt;)?
&lt;simple-expression&gt; := (NOT)? &lt;atom&gt; (&lt;compare-op&gt; &lt;atom&gt;)?
                     | (NOT)? "(" &lt;expression&gt; ")"
&lt;atom&gt; := &lt;string&gt;
        | &lt;number&gt;
        | &lt;variable-name&gt;
        | &lt;function-name&gt; "(" (&lt;atom&gt; ("," &lt;atom&gt;)*)? ")"
&lt;compare-op&gt; := "&lt;" | "&lt;=" | "&gt;" | "&gt;=" | "=" | "!="
</pre>

<p>
Expressions with an open parenthesis can span several lines, unlike
normal simple FSL rules. Inside the expression, indentation doesn't
matter.

<p>
Example:
<pre class="code">
EACH f IN *.txt IF (size(f) &lt; 1000
                  OR age(f) &lt; 30)
</pre>

<p>
The following example would be a syntax error because there
is no open parenthesis:
<pre class="code">
EACH f IN *.txt IF size(f) &lt; 1000
                 OR age(f) &lt; 30
</pre>
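
<p>
One way a parser could implement this rule is to join physical lines into
one logical line for as long as a parenthesis remains open. The Python
sketch below shows the idea under simplifying assumptions: the function
<tt>logical_lines</tt> is hypothetical, and parentheses inside quoted
strings or comments are not treated specially.
<pre class="code">
# Hypothetical helper, not part of FSL.
def logical_lines(text):
    """Join physical lines while a parenthesis is still open."""
    result = []
    parts = []   # physical lines collected for the current logical line
    depth = 0    # number of parentheses currently open
    for line in text.splitlines():
        # Keep the indentation of the first physical line only,
        # since block indentation is significant.
        parts.append(line.strip() if parts else line)
        depth += line.count("(") - line.count(")")
        if depth == 0:
            result.append(" ".join(parts))
            parts = []
    if parts:  # unbalanced input: return what was collected
        result.append(" ".join(parts))
    return result

rules = "EACH f IN *.txt IF (size(f) = 0\n    OR age(f) = 0)\n*.gif"
for line in logical_lines(rules):
    print(line)
# EACH f IN *.txt IF (size(f) = 0 OR age(f) = 0)
# *.gif
</pre>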

<h3 id="functions">5.2 Built-in functions</h3>

<p>
Expressions can use built-in functions, which can be divided into two
categories: predicate and value functions. Predicate functions return
a truth value and value functions return a value (like a number).
Value function calls can't be used as complete expressions on their own;
they must be combined with comparison operators.

<p>
<table>
<tr>
    <th align="left">Function
    <th align="left">Type
    <th align="left">Description
<tr>
    <td><em>age(filename)</em>
    <td><tt>filename -&gt; float</tt>
    <td>Return age of file in days as floating point number (based on modification date)
<tr>
    <td><em>base(filename)</em>
    <td><tt>filename -&gt; filename</tt>
    <td>Return file name without (outermost) extension,
    e.g. for filename <tt>"dir/aaa.ext"</tt>, return <tt>"dir/aaa"</tt>
<tr>
    <td><em>date(filename)</em>
    <td><tt>filename -&gt; datetime</tt>
    <td>Return modification date of file
<tr>
    <td><em>exists(filename)</em>
    <td><tt>filename -&gt; boolean</tt>
    <td>Return true if given file exists
<tr>
    <td><em>extract(time, part)</em>
    <td><tt>datetime, string -&gt; int</tt>
    <td>Extract part from timestamp. Part is one of "year", "month", "day",
    "hour", "minute", "second", "week", "weekday".
<tr>
    <td><em>now()</em>
    <td><tt>-&gt; datetime</tt>
    <td>Return current time
<tr>
    <td><em>size(filename)</em>
    <td><tt>filename -&gt; int</tt>
    <td>Return size of file in bytes
</table>
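
<p>
To make the descriptions in the table concrete, here are rough Python
equivalents of four of the functions. They are assumptions based only on
the descriptions above, not the interpreter's actual implementation.
<pre class="code">
import os
import tempfile
import time
from datetime import datetime

# Python stand-ins for the FSL built-ins of the same name (assumed behaviour).
def base(filename):
    """File name without its outermost extension."""
    return os.path.splitext(filename)[0]

def size(filename):
    """Size of the file in bytes."""
    return os.path.getsize(filename)

def date(filename):
    """Modification time of the file as a datetime."""
    return datetime.fromtimestamp(os.path.getmtime(filename))

def age(filename):
    """Days since the file was last modified, as a float."""
    return (time.time() - os.path.getmtime(filename)) / 86400.0

print(base("dir/aaa.ext"))  # dir/aaa
print(base("a.tar.gz"))     # a.tar

# Try size() on a throwaway five-byte file.
with tempfile.NamedTemporaryFile(delete=False) as fh:
    fh.write(b"hello")
path = fh.name
print(size(path))           # 5
os.remove(path)
</pre>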


<h2 id="eval-order">6. Rule evaluation order</h2>

<p>
Rules are evaluated from the first to the last, and for each file
the last matching rule is applied.

<p>
For example, consider the following rules:

<pre class="code">
*
NOT *.jpg
</pre>

<p>
The first rule matches all files, but the second rule excludes
all *.jpg files. As a result, all files except *.jpg files are
included in the file set.

<p>
A rule set that may not do what one wishes it to do:

<pre class="code">
NOT *.jpg
*
</pre>

<p>
This matches all files, including *.jpg files, because
the last rule (*) says to include all files. An exclusive
rule at the beginning of a rule set never has any effect.
Indeed, the FSL interpreter warns you in this case:
    <i>Warning: exclusive rule at beginning - has no effect</i>.

<p>
Usually, you should have inclusion rules at the beginning of the
rule set and exclusion rules at the end.
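
<p>
The "last matching rule is applied" principle can be modelled in a few
lines of Python. This is a simplified sketch, not the interpreter's
algorithm: <tt>is_included</tt> is a hypothetical function, rules are
reduced to (include, pattern) pairs, Python's <tt>fnmatch</tt> stands in
for FSL glob matching, and file system scanning is ignored.
<pre class="code">
from fnmatch import fnmatch

# Hypothetical helper, not part of FSL.
def is_included(filename, rules):
    """rules is a list of (include, pattern) pairs in rule-set order.
    include is False for rules prefixed with NOT."""
    included = False  # a file matched by no rule is not in the file set
    for include, pattern in rules:
        # Lower-case both sides because FSL globs are case-insensitive.
        if fnmatch(filename.lower(), pattern.lower()):
            included = include  # a later matching rule overrides earlier ones
    return included

good = [(True, "*"), (False, "*.jpg")]   # *  then  NOT *.jpg
bad = [(False, "*.jpg"), (True, "*")]    # NOT *.jpg  then  *

print(is_included("photo.jpg", good))  # False
print(is_included("notes.txt", good))  # True
print(is_included("photo.jpg", bad))   # True
</pre>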

<!--
<h2 id="examples">7. Examples</h2>

<p>
Includes config.ini file under root directory, if present.
<pre class="code">
config.ini
</pre>

<hr>
<p>
Includes config.ini files under all directories, except in my/prog
and its subdirectories.
<pre class="code">
*/config.ini
NOT my/prog
</pre>
-->

<p>
<hr>
Up: <a href="index.html">FSL index</a><br>
Updated 2005-11-04
</body>
</html>