Euler diagram support

Issue #3 new
Robert Massaioli repo owner created an issue

I have often found myself in the position where I simply want to generate a full venn diagram of a set of files. In order to do this with just three files I need seven definitions. Let's see a quick example of that using (a, b, c) as the identifiers:

aOnly: a - bcWithoutA
bOnly: b - acWithoutB
cOnly: c - abWithoutC
abWithoutC: (a /\ b) - c
acWithoutB: (a /\ c) - b
bcWithoutA: (b /\ c) - a
abc: a /\ b /\ c

It would be really nice if there was a general pattern and syntax for implementing this feature. Any solution would need to have the following properties:

  1. Be short and simple to write.
  2. Be easy to read and understand.
  3. Generate standard identifiers (that are, ideally, ordering independent)

Comments (6)

  1. Robert Massaioli reporter

    My initial thoughts on this problem was to just support a kind of set notation for this. For example:

    abcVenn: [a, b, c]
    

    The brackets indicate the venn elements. From there you can use 'abcVenn' as a prefix for the implicit identifiers and can use the remaining set identifiers to specify more.

    As for how you use the abcVenn identifier, and how it is output, that is trickier. We need to run an exploration.

  2. Robert Massaioli reporter

    One idea is to use bracket notation to specify which elements you mean. For example:

    abcVenn[a, b] \/ c
    

    In this notation, you refer to the euler diagram by the sets you wanted to include and explicitly NOT include any that are not mentioned. Therefore we could write the original example as:

    abcVenn: [a, b, c]
    
    aOnly: abcVenn[a]
    bOnly: abcVenn[b]
    cOnly: abcVenn[c]
    abWithoutC: abcVenn[a, b]
    acWithoutB: abcVenn[a, c]
    bcWithoutA: abcVenn[b, c]
    abc: abcVenn[a, b, c]
    

    That would give a really simple and short notation that we could use to specify exactly which sets we need. Such a feature would add to the complexity of the setdown code but may have major payouts in the future.

  3. Robert Massaioli reporter

    On further thought, we should only allow the Euler diagrams to be a definition; but we should require that they are only output if they are mapped to an identifier. That way we avoid the combinatorial explosion of set operations problem. For example:

    • 2 elements => 2C1 + 2C2 => 3 unique sets
    • 3 elements => 3C1 + 3C2 + 3C3 => 7 unique sets
    • 4 elements => 4C1 + 4C2 + 4C3 + 4C4 = 15 unique sets

    Hopefully you can see that it is just the sum of the nth row of pascals triangle minus 1 (since we never compute nC0). The sum of the nth row of pascals triangle is 2 ^ n. This means that if we have n elements in the venn diagram then we are going to have (2 ^ n) - 1 unique sets.

    Thus, if we had 10 elements in the diagram then we would have to perform 1023 set operations. That is starting to get expensive; especially when you consider that the setdown definition maintainer might only require the output of only a few of the sets.

    We need a syntax that makes it obvious that when you define a euler diagram that it is only a definition: no files will be output as a result of that definition. Or maybe we should just add in a syntax for definitions that should never be output; that way we could just support lazy behaviour. Maybe a keyword called 'def' or 'lazy' would do the trick.

  4. Robert Massaioli reporter

    There is also the idea that you might want to perform a venn definition inline. For example, something like this:

    result: [a, b, c](a, b) - d
    

    In this syntax we replace the abcVenn[a, b] syntax from using brackets to using parens like so abcVenn(a, b) this is so that we can write venn diagram statements inline. This could be a useful feature and, even if we don't support it immediately, it is likely to be very handy to support long term.

  5. Robert Massaioli reporter

    It is worth noting that Euler himself would write [a, b, c](a b) as abc'. He seemed to use the prime symbol as a special character that could invert the inclusion of a set in the result. This is not bad syntax for mathematical notation but I think that we can do better. For example, this notation becomes cumbersome when you start having 10 elements and need to refer to them more than once. My proposed syntax encourages the good behaviour of creating a shared identifier for your venn diagram.

  6. Robert Massaioli reporter

    The expression [a,b,c,d](a,b) directly translates to (a /\ b) - (c \/ d).

    The general pattern is:

    [all](included) => (/\ included) - (\/ (all - included))
    

    That is, you do the set intersection of all of the included sets minus the set union of all of the non-included sets. That should give you your general answer.

    Then it is just up to us to make that operation fast.

  7. Log in to comment