Duplicated data returned by hs_read_buffer when using MySQL
Symptom: calling mjsonrpc's hs_read_arraybuffer can sometimes result in “duplicated” data being returned - you will be told that there are twice as many data points as there should be for a variable, and you will get the data in the order T0,T1…TN,T0,T1…TN.
The problem appears for data coming from the “System” event (e.g. links defined in /History/Links/System), and I think it only affects MySQL history systems.
The main function to study is SchemaHistoryBase::hs_read_buffer in history_schema.cxx.
The underlying cause is that fSchema (populated by SqlHistoryBase::read_schema calling MysqlHistory::read_table_and_event_names) contains schemas where the event_name is system AND where the event_name is System (note the different capitalisation). The first of these is created directly in MysqlHistory::read_table_and_event_names, has a time_from of 0, and sets the event_name to be the same as the table_name; the latter is created in ReadMysqlTableNames, has a time_from that depends on when the schema was last changed, and sets the event_name to be the “real” MIDAS event name.
The problem is that SchemaHistoryBase::hs_read_buffer reads data from BOTH of these schemas! Reading from multiple schemas normally makes sense: if the schema changed during your period of interest, you still want all the data. But here the schemas have overlapping validity periods (and in my case we read the full set of data twice).
I don't know enough about the MySQL history system to suggest the correct resolution. Are both versions of the schema required (the one based on the table name and the one based on the event name)? If not, then you could probably remove the call to ReadMysqlTableNames? If both are required, then perhaps some extra logic needs to be added to hs_read_buffer so that if a variable matches multiple schemas, we don't re-read data for periods we've already read? Or you could do a “de-duplication” at the end of hs_read_buffer, if that's easier?
Comments (11)
-
restored mysql database. K.O.
-
I think I see trouble when history events are renamed, i.e. /equipment/slow becomes /equipment/Slow.
The sqlite history completely broke from this because the sqlite database name “mh_slow_slow.sqlite3” is not case-sensitive on a Mac, so “mh_Slow_slow” is the same as “mh_slow_slow”.
But this does not seem to create duplicated data.
So to return duplicate data, either there is duplicate data in the database or we read the same data twice.
Also, in SchemaHistoryBase::hs_read_buffer(), we select the schemas we will read, then we read them one at a time, but we do not keep track of time progress - so we trust that the schemas are already time-ordered and we trust that we do not have duplicate/aliased schemas.
I think I will add a check there - keep track of time going forward, and complain if the data is not time-ordered. This will also catch duplicate/aliased schemas.
I think I can also catch aliased schemas - if two schemas refer to the same sql table and their time ranges overlap, we have aliasing.
But I have no way to reproduce the problem to confirm it is fixed…
K.O.
-
In history_schema.cxx there is confusion between case-sensitive and case-insensitive things. In some places I use std::string operator==(), which is case-sensitive; in other places I use strcasecmp(), which is case-insensitive.
Since we now use UTF-8 strings, I think we should bite the bullet and ditch all the case-insensitive stuff and use case-sensitive string comparisons everywhere.
K.O.
-
I did the opposite: I made all event names and variable names case-insensitive. This fixes the problem with the partially-case-insensitive sqlite and the confusion with case sensitivity in mysql. K.O.
-
Initial attempt to implement protection against duplicate data was unsuccessful. Will try again… K.O.
-
Ok, I see the problem with js_hs_read_arraybuffer() - in history_schema.cxx, “class ReadBuffer” is protected against non-time-monotonic and duplicate data, but in mjsonrpc.cxx “class ReadBuffer” does not have this protection and duplicate data is possible. I think protection is best done in the history_schema base class where we know what is happening and are in a better position to complain about non-monotonic data and to detect duplicate data and duplicate schema. K.O.
-
Now that the history reader detects and complains about duplicate data, I see (maybe) duplicated data for run transition data using the FILE history. K.O.
https://daq16.triumf.ca/?cmd=History&group=PWB&panel=v_p2&A=1596475933&B=1596562333
-
Detection of duplicate data was not quite correct. Fixed. Looks good now. K.O.
-
mjsonrpc.cxx “class ReadBuffer” is now protected against duplicate and non-monotonic data by code in HsSqlSchema::read_data() and HsFileSchema::read_data(). K.O.
-
- changed status to resolved
Fixed in commit series ending with b13ffea. Branch feature/midas-2020-07 (by mistake, should be on branch "develop"). K.O.
-
working on restoring the mysql database on my laptop… K.O.