Confusion about fuzzy matching behavior (possibly a bug?)

Issue #370 closed
Yu Qiu created an issue

Hi,

When I tried to use the fuzzy matching feature, I ran into the following behavior, which confused me.

I am trying to match a currency amount string, such as $ 123,456,789.00.

  • Case 1: First I tried
regex.match('(?e)(?:^(\\$ )?\\d{1,3}(,\\d{3})*(\\.\\d{2})$){e}', '$ 10,112.111.12')

and it returned:

<regex.Match object; span=(0, 15), match='$ 10,112.111.12', fuzzy_counts=(6, 0, 5)>

  • Case 2: Then, since I thought only one substitution should be needed, I tried:
regex.match('(?e)(?:^(\\$ )?\\d{1,3}(,\\d{3})*(\\.\\d{2})$){s<=1}', '$ 10,112.111.12')

and it returned:

<regex.Match object; span=(0, 15), match='$ 10,112.111.12', fuzzy_counts=(1, 0, 0)>

This is what I expected.

  • Case 3: Finally, I tried this:
regex.match('(?e)(?:^(\\$ )?\\d{1,3}(,\\d{3})*(\\.\\d{2})$){s<=1,i<=1,d<=1}', '$ 10,112.111.12')

and it returned None.
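Case 2 implies that a single substitution turns the input into an exact match. That claim can be checked with the standard library's re module, which accepts the same pattern once the regex-only (?e) flag and the fuzzy constraint are removed. The index 8 below is simply where the first '.' sits in this particular input.

```python
import re

# The pattern from the cases above, minus the regex-only (?e) flag and the
# fuzzy constraint, so the standard library can compile it.
MONEY = re.compile(r"^(\$ )?\d{1,3}(,\d{3})*(\.\d{2})$")

text = "$ 10,112.111.12"
# Substitute the '.' at index 8 with ',' (exactly one substitution).
repaired = text[:8] + "," + text[9:]

print(MONEY.fullmatch(text))  # None: the input is not an exact match
print(repaired)  # $ 10,112,111.12
print(MONEY.fullmatch(repaired) is not None)  # True
```

Because one substitution is enough, the None returned in Case 3 is not explained by its {s<=1,i<=1,d<=1} limits themselves.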


So now I am confused. I expected the match to report the minimum possible number of edit operations between the source string and the target (the regex), but that is not what happens.

I understand that matching proceeds from left to right, so in Case 1 the matcher cannot know that a single substitution would suffice until it has scanned the whole string.

But in Case 3, why does it return None?

And how can I get matching to return the minimum number of edit operations between the source string and the target (the regex)?

Any help will be appreciated. Thanks!
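The regex module's documentation describes a BESTMATCH flag, (?b), which asks fuzzy matching for the best match rather than the first acceptable one; that may be what is wanted here. To make "minimum edit operations" concrete without depending on regex, below is a brute-force sketch using only the standard library's re: it searches breadth-first over one-character edits of the string until the pattern fully matches, and reports counts in regex's (substitutions, insertions, deletions) order. The helper names, the ALPHABET of candidate characters, and the max_errors cap are illustrative assumptions. The search grows exponentially with max_errors, so it only suits short strings.

```python
import re

# Hypothetical alphabet of characters the search may substitute or insert;
# keeping it small keeps the brute force tractable. Adjust for other patterns.
ALPHABET = "0123456789$,. "


def single_edits(text):
    """Yield (edited_text, op) for every one-character edit of text.

    op is named from the pattern's point of view, as in the regex module:
    "s" = substitution, "i" = text has an extra character (removed here),
    "d" = text lacks a pattern character (added here).
    """
    for pos in range(len(text)):
        for ch in ALPHABET:
            if ch != text[pos]:
                yield text[:pos] + ch + text[pos + 1:], "s"
        yield text[:pos] + text[pos + 1:], "i"
    for pos in range(len(text) + 1):
        for ch in ALPHABET:
            yield text[:pos] + ch + text[pos:], "d"


def min_fuzzy_counts(pattern, text, max_errors=2):
    """Breadth-first search for the fewest edits that make pattern fullmatch.

    Returns (substitutions, insertions, deletions) for one minimal repair,
    or None if no repair exists within max_errors edits.
    """
    compiled = re.compile(pattern)
    frontier = {text: (0, 0, 0)}
    seen = {text}
    for depth in range(max_errors + 1):
        # Every string in the frontier is exactly `depth` edits away.
        for candidate, counts in frontier.items():
            if compiled.fullmatch(candidate):
                return counts
        if depth == max_errors:
            break
        next_frontier = {}
        for candidate, (s, i, d) in frontier.items():
            for edited, op in single_edits(candidate):
                if edited not in seen:
                    seen.add(edited)
                    next_frontier[edited] = (
                        s + (op == "s"), i + (op == "i"), d + (op == "d")
                    )
        frontier = next_frontier
    return None


MONEY_PATTERN = r"^(\$ )?\d{1,3}(,\d{3})*(\.\d{2})$"
print(min_fuzzy_counts(MONEY_PATTERN, "$ 10,112.111.12"))  # (1, 0, 0)
```

On Case 1's input this reports one substitution, agreeing with Case 2. When several minimal repairs exist, the sketch returns whichever it reaches first, so the split between operation types is not always unique.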

Comments (7)

  1. Yu Qiu reporter

    I also found another confusing example:

    regex.match('(?e)(?:^(\\$ )?\\d{1,3}(,\\d{3})*(\\.\\d{2})$){s<=3}', '$ 10,1a2.111.12')
    

    It returned:

    <regex.Match object; span=(0, 15), match='$ 10,1a2.111.12', fuzzy_counts=(2, 0, 0)>
    

    But if I use {s<=2} instead, it returns None:

    regex.match('(?e)(?:^(\\$ )?\\d{1,3}(,\\d{3})*(\\.\\d{2})$){s<=2}', '$ 10,1a2.111.12')
    # returns None
    

    According to the fuzzy_counts above, the match needs only 2 substitutions. So why does it return None when I specify {s<=2}?

  2. Yu Qiu reporter

    Thanks for the quick fix. However, something is still confusing me.

    For the input:

    print(regex.compile('(^\\$ \\d{1,3},\\d{3},\\d{3}$){e<=1}').match('$3,038,444'))
    

    I think the edit should be an insertion at index 1. However, it gives this result:

    <regex.Match object; span=(0, 10), match='$3,038,444', fuzzy_counts=(0, 0, 1)>
    

    and

    print(regex.compile('(^\\$ \\d{1,3},\\d{3},\\d{3}$){e<=1}').match('$3,038,444').fuzzy_changes)
    # returns ([], [], [1])
    

    According to the documentation, doesn’t that mean a deletion at index 1?

    Thanks for any help in advance!

  3. Matthew Barnett repo owner

    No, it's not a bug.

    In your example, you want a string that starts with "$ ", but the string you have does not have a space at index 1. It's a deletion.

    Here's another example:

    >>> import regex
    >>> regex.fullmatch('(?:cart){e<=1}', 'cat')
    <regex.Match object; span=(0, 3), match='cat', fuzzy_counts=(0, 0, 1)>
    

    You want "cart", but you have "cat". The character "r" is missing; it's a deletion.

  4. Yu Qiu reporter

    Oh, I had the direction wrong. So the operations describe how to get from the pattern to the string, not the other way around. Got it! Thanks.
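The direction described above (edits turn the pattern into the string) corresponds to an edit-distance alignment that treats the pattern as the source and the string as the target. A minimal sketch for literal patterns only, assuming no regex syntax: edit_changes is a hypothetical helper that aligns the two with a dynamic-programming table and returns substitution, insertion, and deletion positions as indices into the string, loosely mirroring the three lists of fuzzy_changes. When several alignments tie, it picks one, so its positions may differ from what the regex module reports.

```python
def edit_changes(pattern, text):
    """Align a literal pattern with text and classify one optimal set of edits.

    Direction: edits turn the pattern into the text.
    Returns (substitutions, insertions, deletions) as lists of text indices.
    """
    m, n = len(pattern), len(text)
    # dist[i][j] = fewest edits turning pattern[:i] into text[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i  # every pattern character deleted
    for j in range(n + 1):
        dist[0][j] = j  # every text character inserted
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            differ = pattern[i - 1] != text[j - 1]
            dist[i][j] = min(
                dist[i - 1][j - 1] + differ,  # match or substitution
                dist[i][j - 1] + 1,  # insertion: text[j - 1] is extra
                dist[i - 1][j] + 1,  # deletion: pattern[i - 1] is missing
            )

    # Walk back from the bottom-right corner along one optimal path.
    subs, ins, dels = [], [], []
    i, j = m, n
    while i > 0 or j > 0:
        differ = i > 0 and j > 0 and pattern[i - 1] != text[j - 1]
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + differ:
            if differ:
                subs.append(j - 1)
            i, j = i - 1, j - 1
        elif j > 0 and dist[i][j] == dist[i][j - 1] + 1:
            ins.append(j - 1)  # extra character at this text index
            j -= 1
        else:
            dels.append(j)  # pattern character missing at text index j
            i -= 1
    return sorted(subs), sorted(ins), sorted(dels)


print(edit_changes("cart", "cat"))  # ([], [], [2])
print(edit_changes("$ 3,038,444", "$3,038,444"))  # ([], [], [1])
```

Both examples from the thread come out as deletions: "cart" is missing its "r" at index 2 of "cat", and "$ 3,038,444" is missing its space at index 1 of "$3,038,444".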
