SourceMD creating duplicates of DOIs with '<' characters

Issue #22 new
Former user created an issue

The example I ran across was Q56831927 (which has been added at least 10 times by SourceMD), but there are apparently many others. The DOI is not being entered correctly, so when somebody tries to re-enter the same DOI, a duplicate is created (with the same incorrect DOI). See further discussion here:

https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Source_MetaData#Duplicated_bad_DOIs_-_Wiley_journals

From this discussion it looks like this can be fixed by just entering the correct full DOI in the first place.

Comments (2)

  1. Arthur Smith

    Hi Magnus - the problem is in the call to fixStringLengthAndHTML - the

    preg_replace ( '/<.+?>/' , ' ' , $s ) ;
    

    bit is wiping out the <...> portion of these DOIs. I think you probably want to completely skip the call to this function whenever the property in question is an ID - certainly for P356 anyway. Maybe keep a list of properties that don't need filtering?
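
    To illustrate, here is a minimal sketch of the effect and of the property-list idea. It only reproduces the regex quoted above, not the rest of fixStringLengthAndHTML. The DOI is a made-up value in the Wiley style, and UNFILTERED_PROPERTIES and cleanValue are hypothetical names that do not exist in SourceMD.

    ```php
    <?php
    // Made-up DOI in the Wiley SICI style (assumption, not a real item's DOI).
    $doi = '10.1002/(SICI)1234-5678(199901)1:1<1::AID-EXAMPLE1>3.0.CO;2-X';

    // The tag-stripping step quoted above, applied on its own:
    // everything from '<' to the next '>' is replaced by a space.
    $stripped = preg_replace('/<.+?>/', ' ', $doi);
    echo $stripped, "\n"; // prints: 10.1002/(SICI)1234-5678(199901)1:1 3.0.CO;2-X

    // Hypothetical list of properties whose values must be kept verbatim.
    const UNFILTERED_PROPERTIES = ['P356']; // P356 = DOI

    // Hypothetical wrapper: skip the filter for listed properties,
    // strip tag-like spans from everything else.
    function cleanValue(string $property, string $value): string {
        if (in_array($property, UNFILTERED_PROPERTIES, true)) {
            return $value;
        }
        return trim(preg_replace('/<.+?>/', ' ', $value));
    }
    ```

    With something like this, cleanValue('P356', $doi) would return the DOI unchanged, while titles and other free-text values would still be filtered.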

  2. Trilotat

    Magnus, have you resolved issues with SourceMD such that you and Pintoch can restore SourceMD to full operation?
