Incorrect "from" address extracted during import

Issue #1130 closed
Mark Serellis created an issue

Hello,

Continuing my analysis of the imported eml files, I have noticed a scenario where the incorrect “from” address is stored in the metadata table. The email header contains the following:

From: "Some Person@domain.co.uk" <some.person@domain.co.uk>

pilerimport extracted only “person@domain.co.uk” instead of “some.person@domain.co.uk”, perhaps because it looked in the quoted part instead of the angle-bracket part?

pilertest shows the below:

from: *some person@domain.co.uk person domain co uk some.person@domain.co.uk (domain.co.uk)*

Regards

Mark

PS: No way to assign this to anybody while creating issue.

Comments (8)

  1. Janos SUTO repo owner

    Hello, it’s ok to leave it unassigned. I’ll test with this line, and let you know.

  2. Janos SUTO repo owner

    Well, pilertest (I mean the parser) has extracted not only person@domain.co.uk, but some.person@domain.co.uk as well. Just like you said in the issue description.

    If the parser finds a string containing @, it assumes it’s an email address. Anyway, it’s pretty silly to put an email-address-looking name between the quotes, where a name usually stands. I don’t think it’s an issue.

  3. Mark Serellis reporter

    Thanks for getting back on this. I agree it's silly, but it is the part of the email address we have no control over, so people can put all sorts in there, and they usually do. Shouldn’t your code first recognise a standards-based “from” line (i.e. name between quotes and email between angle brackets), and only if it doesn’t find that try to get an address some other way? I’ve just tried an advanced search in the web GUI for the full email address and the emails were still returned, so it isn’t such a big issue (I’ll downgrade the issue to minor). I have imported just over 1.1 million emails into piler and can let you know exactly how many instances we have of this, if you would find that useful to base a “fix” or “won’t fix” decision on.
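
The "angle brackets first, fallback second" order suggested in this comment could look roughly like the sketch below. It is only an illustration in Python, not piler's parser (which is written in C); the function name `extract_from_address` and both regular expressions are hypothetical, and the patterns are deliberately loose rather than a full RFC 5322 implementation.

```python
import re

# Loose pattern for "something@something"; not a full RFC 5322 validator.
ADDR_RE = re.compile(r"[^\s<>\"',;]+@[^\s<>\"',;]+")
# An address enclosed in angle brackets, e.g. <user@example.com>.
ANGLE_RE = re.compile(r"<\s*([^<>\s]+@[^<>\s]+)\s*>")


def extract_from_address(header_value):
    """Return the most likely sender address, or None if nothing matches."""
    # Drop quoted strings first, so a display name that merely looks like
    # an address cannot be mistaken for the real one.
    unquoted = re.sub(r'"(?:[^"\\]|\\.)*"', " ", header_value)
    match = ANGLE_RE.search(unquoted)
    if match:
        return match.group(1).lower()
    # Fallback: the first address-like token anywhere in the original value.
    match = ADDR_RE.search(header_value)
    return match.group(0).lower() if match else None


header = '"Some Person@domain.co.uk" <some.person@domain.co.uk>'
print(extract_from_address(header))
```

With the header from the issue description, the bracketed address wins; a header that has no angle brackets at all still falls back to the first token containing an @.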

  4. Janos SUTO repo owner

    Well, it might, though I didn’t anticipate that anyone would put something that looks like an email address, but is not the sender's address, in the From: line. Anyway, I’m curious about the number you mentioned.

  5. Mark Serellis reporter

    I’ve finished my analysis and can confirm that this really isn’t worth your time to fix. Around 0.15% of emails were affected, and of those the vast majority were spam trying to make the email appear as though it came from someone internal. A very small proportion were genuine (like the one I found when I raised the issue), and as you can find the email using either address, it really isn’t a problem.

    I’ll close the issue.
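
For anyone wanting to repeat this kind of count on their own archive, a minimal sketch follows. It is not the method the reporter used (that is not described in the thread); the helper names `is_affected` and `affected_share` are hypothetical, and the input is assumed to be a list of raw From: header values that have already been extracted from the eml files.

```python
import re

# Contents of the first double-quoted string (the usual display name).
QUOTED_RE = re.compile(r'"((?:[^"\\]|\\.)*)"')
# An address enclosed in angle brackets.
ANGLE_RE = re.compile(r"<\s*([^<>\s]+@[^<>\s]+)\s*>")


def is_affected(from_value):
    """True if the quoted name contains '@' and differs from the bracketed address."""
    quoted = QUOTED_RE.search(from_value)
    angle = ANGLE_RE.search(from_value)
    if not quoted or not angle or "@" not in quoted.group(1):
        return False
    return quoted.group(1).strip().lower() != angle.group(1).lower()


def affected_share(from_values):
    """Fraction of header values for which is_affected() is true."""
    values = list(from_values)
    if not values:
        return 0.0
    return sum(is_affected(v) for v in values) / len(values)
```

A display name that simply repeats the bracketed address is not counted, since the parser would store the right address in that case anyway.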
