negative regex does not work in archiving rules

Issue #337 closed
Michal Mistina created an issue

Hello Janos.

I am testing the following. According to http://www.regexr.com, the string ^(?!.greenband.sk).$ should match everything that does not contain greenband.sk. For reference I used this site and this site. I put the aforementioned regex string in the From: field in the archiving rules (screenshot: 2014-07-01 11_29_06-Archiving rules.png). I tested it with pilertest on the following message:

[root@mailpiler regextests]# pilertest blocked_341.eml 
locale: en_US.UTF-8
build: 877
parsing...
post parsing...
message-id: <005401cf91d1$f95d7e00$ec187a00$@external.net> / 263ee49b07f75d05d7c03b650957888e3da2b2f9c6108870663c25f224738dfd
from: *Lolek lolek@external.net lolek external net  (external.net)*
to: *admin admin@mil.sk admin mil sk  (mil.sk )*
reference: **
subject: *blocked_341*
body: *This is a multipart message in MIME format - NextPart 000 0055 01CF91E2.BC *
sent: 1403850923, delivered-date: 0
hdr len: 1354
body digest: d854416075aa625960506e71edff00e8088f30e0d645dd272361a4a6839e9c40
rules check: (null)
retention period: 1625130466
attachments:
direction: 0
spam: 0

If I put the contents of the From: field into RegExr, it works and matches that string. Does mailpiler have restrictions such that not all regular expressions work in it? The test was performed on the master branch 39a2e6a306b7.

Comments (19)

  1. Michal Mistina reporter

    I guess Bitbucket reparsed the regex string I mentioned. Here is the correct one:

    ^(?!.*greenband\.sk).*$
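
For reference, this is how the corrected pattern behaves in an engine that supports lookahead. The sketch uses Python's `re` module purely as a stand-in; it is not the tre library discussed in this thread. POSIX regular expression syntax defines no lookahead construct, so tre may simply not implement `(?!...)`; that is an assumption here, not something confirmed in this thread. The two From: strings follow the format pilertest prints.

```python
import re

# Corrected pattern from the comment above: match only when
# "greenband.sk" appears nowhere in the string.
NOT_GREENBAND = re.compile(r"^(?!.*greenband\.sk).*$")

# From: field contents as pilertest prints them (between the asterisks).
external = "Lolek lolek@external.net lolek external net  (external.net)"
internal = "Lolek lolek@greenband.sk lolek greenband sk  (greenband.sk)"

external_hit = NOT_GREENBAND.search(external) is not None
internal_hit = NOT_GREENBAND.search(internal) is not None

print(external_hit)  # True: no "greenband.sk" anywhere in the string
print(internal_hit)  # False: the lookahead rejects the string
```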
    
  2. Michal Mistina reporter

    It's OK for me to block every e-mail which comes from other domains except greenband.sk. That is the target. I will narrow this rule by defining the To: field. This is one little step towards the requirements :-)

    So I've tried more things.

    1. I am testing with the same e-mail message as before.

    2. Set archiving rule - From: field to lolek@external.net --- matched

    3. Set archiving rule - From: field to [^lolek@external.net] --- matched

    [root@mailpiler regextests]# pilertest blocked_341.eml 
    locale: en_US.UTF-8
    build: 877
    parsing...
    post parsing...
    message-id: <005401cf91d1$f95d7e00$ec187a00$@external.net> / 263ee49b07f75d05d7c03b650957888e3da2b2f9c6108870663c25f224738dfd
    from: *Lolek lolek@external.net lolek external net  (external.net)*
    to: *admin admin@mil.sk admin mil sk  (mil.sk )*
    reference: **
    subject: *blocked_341*
    body: *This is a multipart message in MIME format - NextPart 000 0055 01CF91E2.BC *
    sent: 1403850923, delivered-date: 0
    hdr len: 1354
    body digest: d854416075aa625960506e71edff00e8088f30e0d645dd272361a4a6839e9c40
    rules check: domain=,from=[^lolek@external.net],to=,subject=,size>0,att.name=,att.type=,att.size>0,spam=-1
    retention period: 1625141879
    attachments:
    direction: 0
    spam: 0
    

    Why did the rule match in point 3? The laurikari web page says "If the list begins with ^ the meaning is negated; any character matching no item in the list is matched."

    I don't understand, then. I thought that point 3 should match only if "lolek@external.net" is not inside the From: field of the e-mail message. I expected a (null) match.

    I also tried [^[lolek@external.net]] without success.
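
A bracket expression describes a set of single characters, not a string, so `[^lolek@external.net]` reads as "any one character other than l, o, e, k, @, x, t, r, n, a or `.`". The From: field contains several such characters, which is why the rule hits. The sketch below shows this with Python's `re` as a stand-in for tre; for a simple bracket expression like this one the two should behave alike, but that is an assumption.

```python
import re

from_field = "Lolek lolek@external.net lolek external net  (external.net)"

# [^...] negates a set of characters; it does not negate the string inside.
negated_set = re.compile(r"[^lolek@external.net]")

first = negated_set.search(from_field)
print(repr(first.group()))  # 'L' (uppercase, so not in the set)

# Every distinct character in this From: field that the negated set matches.
outside_set = sorted(set(negated_set.findall(from_field)))
print(outside_set)  # [' ', '(', ')', 'L']
```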

  3. Michal Mistina reporter

    I'm leaving for a few days starting today, so I won't be here to test it until next week. Please wait before closing the issue due to inactivity. Thanks.

  4. Janos SUTO repo owner

    Today I experimented with it, but still no luck, so I contacted the tre developer for help. Hopefully he will help us to fix the regex string.

  5. Michal Mistina reporter

    I tried to experiment with regex matching in the subject field. E.g.

    "subject: *test351*"
    

    matches regex [test351] and everything else e.g.

    "subject: *x test351*"
    

    matches [^test351]. That means negative regex is working. The issue is that if I want to match all incoming domains except one (e.g. greenband.sk), I have to compare against the whole string inside the From: field. I will take an example of the From field from pilertest:

    from: *Lolek lolek@greenband.sk lolek greenband sk  (greenband.sk)*
    

    The steps to achieving this could be these:

    • Match everything that matches following string:
    Lolek lolek@greenband.sk lolek greenband sk  (greenband.sk)
    
    • Negate it.

    I can match everything with this not very clever regex:

    .+?\ .+?@greenband\.sk\ .+?\ greenband\ sk\ +\(greenband\.sk\)
    

    But I don't know how to negate the whole thing, so that if the field is, for example:

    Lolek lolek@external.net lolek external net  (external.net)
    

    it is hit as a whole by pilertest.

    An easier way to negate everything would be to implement it in the mailpiler code, for example by adding a "not" checkbox next to the form boxes of the archiving rules. If "not" is checked, the whole regex string in that form box would be negated.
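
The "not" checkbox idea amounts to running an ordinary positive pattern and inverting the result in code. A minimal sketch in Python follows; `from_rule_hits` and its `negate` flag are hypothetical names, and nothing in this thread suggests that piler has such an option.

```python
import re

# Positive pattern: a sender address in greenband.sk, followed by
# whitespace or the end of the string.
GREENBAND = re.compile(r"@greenband\.sk(?:\s|$)")

def from_rule_hits(from_field: str, negate: bool = False) -> bool:
    """Return True when the From: condition is met (hypothetical helper)."""
    found = GREENBAND.search(from_field) is not None
    # With negate=True the condition is met when the pattern is absent.
    return not found if negate else found

internal = "Lolek lolek@greenband.sk lolek greenband sk  (greenband.sk)"
external = "Lolek lolek@external.net lolek external net  (external.net)"

print(from_rule_hits(internal, negate=True))  # False
print(from_rule_hits(external, negate=True))  # True
```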

  6. Janos SUTO repo owner

    Indeed, it's definitely a way to do so; however, you may set up several parameters (from, to, subject, size, attachment, etc.) for a single rule. Then, if all the set conditions are met, it's a match.

    But let's say you create a new rule with a from and a size value. Then how should it be evaluated when using NOT (or inverted) logic?

    Anyway it would be far more manageable if it could be fixed with the tre regex stuff. Unfortunately no response yet from the developer.

  7. Janos SUTO repo owner

    Eg. rule #1: from: @domain.com, size: >20000

    The point is: how to evaluate the negated expression if there are two (or more) conditions?

    Eg.

    message #1: from=...@aaa.fu, size=23000
    message #2: from=...@bbb.fu, size=15000
    message #3: from=...@domain.com, size=23000
    message #4: from=...@domain.com, size=15000

    So which message should be discarded by the negated rule #1?

  8. Michal Mistina reporter

    I think only message no. 4 will be archived and the other messages will be discarded. I don't think size should have a "NOT" checkbox available. How is the result of the rule built? I thought there is a logical AND between the individual conditions: !(From) AND (Size). Or did you mean to negate the whole rule after it is built? That is, !((From) AND (Size)).
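
The two readings can be checked against the four example messages from the previous comment. This is only a sketch of the boolean logic in Python, assuming (as the question above implies) that a message hit by an archiving rule is discarded; the function names are made up for illustration.

```python
# (sender domain, size in bytes) for messages #1..#4 from the example.
messages = {
    1: ("aaa.fu", 23000),
    2: ("bbb.fu", 15000),
    3: ("domain.com", 23000),
    4: ("domain.com", 15000),
}

def from_ok(domain: str) -> bool:
    """From: condition of rule #1."""
    return domain == "domain.com"

def size_ok(size: int) -> bool:
    """Size condition of rule #1."""
    return size > 20000

def hits(rule) -> list:
    """Message numbers for which the given rule function returns True."""
    return [n for n, (domain, size) in messages.items() if rule(domain, size)]

plain = hits(lambda d, s: from_ok(d) and size_ok(s))
negated_from = hits(lambda d, s: (not from_ok(d)) and size_ok(s))
negated_rule = hits(lambda d, s: not (from_ok(d) and size_ok(s)))

print(plain)         # [3]        (From) AND (Size)
print(negated_from)  # [1]        !(From) AND (Size)
print(negated_rule)  # [1, 2, 4]  !((From) AND (Size))
```

The two negated forms hit different sets of messages, so the semantics would have to be pinned down before a NOT option could be added.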

    How are the rules currently handled? Like in firewalls, where the first match ends the evaluation of the rules that follow?

    Sorry for the confusion. I am not a developer, so I don't see behind the scenes.

  9. Janos SUTO repo owner

    Still no response from the tre developer, and I'm a bit reluctant to complicate the rule processing with negation logic.

    Anyway, when a message is processed, all the rules are checked until there's a hit (just as with firewalls, like you mentioned). The rules are checked in no specific order, just as the SQL server returns them.
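
The evaluation described here can be pictured as a first-match loop. The sketch below is illustrative Python, not piler's actual code; the `Rule` fields and `first_hit` are assumed names, and only the From: and size conditions are modelled.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rule:
    from_pattern: str = ""   # empty means "condition not set"
    min_size: int = 0        # message must be larger than this

    def matches(self, from_field: str, size: int) -> bool:
        # All conditions that are set must hold (logical AND).
        if self.from_pattern and not re.search(self.from_pattern, from_field):
            return False
        return size > self.min_size

def first_hit(rules, from_field: str, size: int) -> Optional[Rule]:
    """Return the first rule that matches, or None when no rule hits."""
    for rule in rules:          # order is whatever the database returned
        if rule.matches(from_field, size):
            return rule         # stop at the first hit
    return None

rules = [Rule(from_pattern=r"@domain\.com", min_size=20000),
         Rule(from_pattern=r"@external\.net")]

hit = first_hit(rules, "Lolek lolek@external.net lolek external net  (external.net)", 1354)
miss = first_hit(rules, "admin admin@mil.sk admin mil sk  (mil.sk )", 1354)
print(hit)   # the second rule
print(miss)  # None
```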

  10. Michal Mistina reporter

    All right. Do you want to leave this issue open till the tre developer replies? From my perspective there is no other way to solve it, only the political way: to tell the customer that he needs to archive both outbound and inbound e-mails. No problem in that. Maybe if somebody else asks for it, you can now save it as a feature request ;-)

  11. Janos SUTO repo owner

    I'm not sure if he will reply at all. Even I reply within a few days :-) However, if you describe your environment, I may be able to help you achieve it, perhaps without complex rules. For example (e.g. with postfix) you may choose to archive incoming emails only.

  12. Michal Mistina reporter

    Thank you for the offer, but the issue is that there is a black box whose behaviour we cannot influence. It just journals everything, i.e. sends everything somewhere (mailpiler). The vendor of the black box is going to add such filtering of outgoing/incoming mail, but that may take years to be released, because it's only a feature request. I tried to solve it in mailpiler. Given these difficulties, we can close this case, and if the tre developer responds it can be reopened. Thank you again.
