Feature Request: Enable constraints on automatching

Issue #26 (new)
Simon Kissane created an issue

Suppose I want to import a catalog about the geography of a particular country or part of a country, for example the VICNAMES catalog I've already imported (catalog#393). It would be useful to be able to apply constraints to the automatic matching. For that catalog you would want the constraint that if P17 (country) is present, it must have the value Q408 (Australia). Any item that violates the constraint would then be ruled out as a candidate by the automatic matching engine. Right now, it will match a town in the UK or the USA to a town in Australia just because they share a name. The automatching isn't smart enough to know that "this catalog is only about Australia, so items from other countries can't be matches."

(I'd probably say that items with P17 missing should still be included, since many country-specific items don't have P17 set. But it depends on the dataset, so there should be the flexibility to choose either behaviour, i.e. whether items lacking the constrained property are included in or excluded from automatching.)
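To make the proposal concrete, here is a minimal Python sketch of such a filter. Everything in it is hypothetical: `Constraint`, `satisfies` and `filter_candidates` are illustrative names, not part of Mix'n'match. It assumes each candidate arrives as a mapping from property ID to a set of value item IDs. When an item has several P17 values, it treats one allowed value as enough. The `include_missing` flag is the per-catalog choice described above.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set

# Hypothetical representation of a candidate item: property ID -> set of value item IDs.
Claims = Dict[str, Set[str]]


@dataclass(frozen=True)
class Constraint:
    """Hypothetical catalog-level constraint, e.g. "P17 must be Q408"."""
    prop: str                      # constrained property, e.g. "P17"
    allowed: FrozenSet[str]        # permitted values, e.g. {"Q408"}
    include_missing: bool = True   # keep candidates that lack the property?


def satisfies(claims: Claims, c: Constraint) -> bool:
    values = claims.get(c.prop)
    if not values:
        # Property absent (or empty): the per-catalog policy decides.
        return c.include_missing
    # Property present: assume one allowed value is enough.
    return bool(values & c.allowed)


def filter_candidates(candidates: Dict[str, Claims],
                      constraints: List[Constraint]) -> List[str]:
    """Keep only candidates that satisfy every catalog constraint."""
    return [key for key, claims in candidates.items()
            if all(satisfies(claims, c) for c in constraints)]


# Three same-named towns found by name lookup (illustrative keys, not real items).
candidates = {
    "town_au": {"P17": {"Q408"}},   # Australia
    "town_uk": {"P17": {"Q145"}},   # United Kingdom
    "town_no_country": {},          # no P17 set
}
lenient = Constraint("P17", frozenset({"Q408"}))
strict = Constraint("P17", frozenset({"Q408"}), include_missing=False)

print(filter_candidates(candidates, [lenient]))  # ['town_au', 'town_no_country']
print(filter_candidates(candidates, [strict]))   # ['town_au']
```

With the lenient setting the UK town is dropped but the item without P17 survives. The strict setting keeps only the item explicitly in Australia.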

You could possibly pull the constraints from the {{Constraint}} templates on the property talk page, but if you bothered with that at all, I would only use them as a starting point that can be overridden. For example, I was thinking of importing a list of Australian postcodes into P281, e.g. from https://github.com/timbennett/australian_postcodes. For that import you'd want a P17=Q408 constraint, but you can't set that constraint on P281 itself, because the property applies to all countries, not just Australia. So sometimes the constraint is specific to the dataset being imported rather than to the target property.

Ideally it would be possible to edit the constraints after the catalog was created, in case you got them wrong when creating it. That implies an ability to retrigger the automated matching with the new constraints in place.

It would also be good to apply these constraints to manual matches, though there it should probably just be a warning rather than a prohibition, because a dataset may have outliers that genuinely violate the constraint. (For example, of the thousands of items in the Australian Heritage Database, all but three are located in Australia: the exceptions are one item in Turkey, one in the UK, and one in Antarctica.)
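Re-running matching after a constraint edit and warning on manual matches could fit together roughly as follows. This is again a hypothetical Python sketch, not the tool's actual data model. It assumes each existing match records whether it was made automatically or manually. Automatic matches that now violate a constraint are collected so they can be unmatched and sent back through automatching. Manual matches are only flagged with a warning, so outliers such as the heritage items outside Australia can stay matched.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Hypothetical constraint: (property, allowed values, include items lacking the property?)
Constraint = Tuple[str, Set[str], bool]


@dataclass
class Match:
    item: str                     # matched Wikidata item (illustrative)
    manual: bool                  # True if a user made the match by hand
    claims: Dict[str, Set[str]]   # the item's property -> value item IDs


def violations(claims: Dict[str, Set[str]],
               constraints: List[Constraint]) -> List[str]:
    """Describe each constraint the item breaks (an empty list means OK)."""
    problems = []
    for prop, allowed, include_missing in constraints:
        values = claims.get(prop)
        if not values:
            if not include_missing:
                problems.append(f"{prop} is missing")
        elif not values & allowed:
            problems.append(f"{prop}={', '.join(sorted(values))}, "
                            f"expected one of {', '.join(sorted(allowed))}")
    return problems


def recheck(matches: Dict[str, Match],
            constraints: List[Constraint]) -> Tuple[List[str], Dict[str, List[str]]]:
    """After the constraints change: list automatches to redo, warn on manual ones."""
    to_rematch: List[str] = []
    warnings: Dict[str, List[str]] = {}
    for entry_id, m in matches.items():
        problems = violations(m.claims, constraints)
        if not problems:
            continue
        if m.manual:
            warnings[entry_id] = problems   # warn only; the match is kept
        else:
            to_rematch.append(entry_id)     # unmatch, then run automatching again
    return to_rematch, warnings


matches = {
    "entry_1": Match("item_a", manual=False, claims={"P17": {"Q145"}}),  # UK, auto
    "entry_2": Match("item_b", manual=True, claims={"P17": {"Q43"}}),    # Turkey, manual
    "entry_3": Match("item_c", manual=False, claims={"P17": {"Q408"}}),  # Australia, auto
}
to_rematch, warnings = recheck(matches, [("P17", {"Q408"}, True)])
print(to_rematch)  # ['entry_1']
print(warnings)    # {'entry_2': ['P17=Q43, expected one of Q408']}
```

Keeping the warning separate from the match itself means the user can review it and leave a deliberate outlier in place.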
