A large number of items now generate different md5sum values when the db is updated from the imdb lists than when the same movie is retrieved from the imdb website using its imdb id.
So, using http I'd expect this code snippet to produce an xrefkey which matches the one generated in the db for the same movie:
    imdb_item = i.get_movie(imdbId)
    if imdb_item:
        title = build_title(imdb_item, ptdf=1, _emptyString='').encode('utf_8')
        xrefkey = md5(title).hexdigest()
In fact, I get different key pairs for over 1600 videos. Has the pre-md5-hashing character encoding changed somewhere?
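To illustrate what I mean by an encoding change, here is a minimal sketch. It is not IMDbPY code: the `xrefkey` helper and the sample title are made up for the illustration, and it is written for Python 3. It only shows that the same title yields a different md5 key when the bytes fed to the hash differ, for example through another codec or another Unicode normalization form. Whether either of these is what actually happens here is only a guess on my part.

```python
import hashlib
import unicodedata


def xrefkey(title, encoding="utf_8", form=None):
    """Encode a title and return its md5 hex digest (hypothetical helper)."""
    if form is not None:
        # Optionally normalize first, e.g. "NFC" (composed) or "NFD" (decomposed).
        title = unicodedata.normalize(form, title)
    return hashlib.md5(title.encode(encoding)).hexdigest()


# Made-up title, chosen only because it contains a non-ASCII character.
sample = "Am\u00e9lie (2001)"

keys = {
    "utf_8": xrefkey(sample),
    "latin_1": xrefkey(sample, encoding="latin_1"),
    "utf_8 + NFD": xrefkey(sample, form="NFD"),
}

for label, key in keys.items():
    print(label, key)
```

A pure-ASCII title hashes identically under utf_8 and latin_1, so a codec change alone would only affect titles containing non-ASCII characters.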
I've attached a list of 1645 titles which show this issue (the attachment shows the http-generated key, which differs from the md5sum value generated when updating the imdb lists), but one example is:
As far as I can tell, the various items don't have anything in common: affected titles range from 1923 (imdb_id 14142, The Hunchback of Notre Dame) to 2013 (Jay and Silent Bob Get Irish: The Swearing O' the Green, imdb_id 2759112), and their kind_ids vary. Some are movies, some are tv episodes or tv series.