mfe.c update_odb misaligns some structs during readout

Issue #39 closed
Ben Loer created an issue

the update_odb function of mfe.c copies banks to the Variables equipment subtree. When iterating through a struct defined in the bank list, the VALIGN macro is used to account for additional padding in the struct inserted by the compiler. However it does this by aligning the data pointer relative to 0, rather than relative to the start of the struct. This causes readout of structs starting with 8 byte variables (e.g. doubles) to be potentially misaligned by 4 bytes.

On my machine, 64 bit SLF6, I can produce this effect via:

typdef struct{
  double a;
  double b;
} mybank;

char* mybank_str[] = {
  "a = DOUBLE : 0",
  "b = DOUBLE : 0",
  0
};

BANK_LIST my_bank_list[] = {
  {"TEST", TID_STRUCT, sizeof(mybank), mybank_str},
  {""}
};

EQUIPMENT equipment[] = {

  { EQ_NAME,         /* equipment name */
    {
      EQ_EVID, (1<<EQ_EVID), /* event ID, trigger mask */
      "",              /* event buffer */
      EQ_PERIODIC,          /* equipment type */
      0,                    /* event source */
      "MIDAS",              /* format */
      TRUE,                 /* enabled */
      RO_RUNNING|RO_STOPPED|RO_PAUSED|RO_ODB, /* When to read */
      5000,                 /* poll every so milliseconds */
      0,                    /* stop run after this event limit */
      0,                    /* number of sub events */
      5,                    /* history period*/
      "", "", ""
    },
    read_slow_event, /* readout routine */
    NULL,NULL, /*not used*/
    my_bank_list  /* bank list structure */
  },
  { "" }
};


int read_slow_event(char *pevent, int off)
{
  bk_init32(pevent);

  mybank* pdata;


  bk_create(pevent, "TEST", TID_STRUCT, (void**)&pdata);
  pdata->a = 1.1;
  pdata->d = 2.2;
  bk_close(pevent, pdata+1);
  return bk_size(pevent);
}

The attached patch for mfe.c fixes the issue by explicitly making the alignment relative to the start of the struct.

Comments (6)

  1. dd1

    As summary, without running this code, TID_STRUCT with two doubles is correct in the bank but gibberish in ODB, correct? K.O.

    BTW, in the data bank, the two double will be misaligned (aligned to 4 bytes instead of 8 bytes) because the bank header is 3 four-byte integers, not 4 four-byte integers.

  2. Ben Loer reporter

    The data is gibberish in the ODB. I haven't tested whether readout in the bank is correct as this was intended to be a history-only equipment. Readout does work corretly with the patch so I think the bank must be fine.

    I suspect this problem could affect any code that relies on the VALIGN macro not just the one function that caused my problem.

  3. dd1

    I forgot to write this - I think if you change your code to use TID_DOUBLE instead of TID_STRUCT (as we usually do), the bug will go away and everything will work correctly. (the bug is still there, though). K.O.

  4. dd1

    Thomas and myself looked it this again. I think we understand what happened - if the data structure is in memory allocated by compiler (i.e. on the stack) or in memory allocated by malloc()/calloc(), the first element of the structure is aligned on it's natural alignment and the code works correctly. However for this function, the data is inside a midas event bank and the alignement is kind of arbitrary (it is aligned to 4 bytes, but not necessary to 8 bytes). In this case, if structure contains 8-bytes elements (TID_DOUBLE or bigger, i.e. 16-byte "long double"), the alignement code will always malfunction.

    If your code used TID_FLOAT instead of TID_DOUBLE, everything would have worked correctly.

    We are not sure if your fix is correct yet. But we understand your bug. K.O.

  5. Stefan Ritt

    To me the fix makes sense. I developed this code when 64-bit OSes were non-existing (yeah!). In principle we could align banks inside an event on a 64-bit boundary, but that would break binary compatibility. So to fix Ben's problem, I would vote for applying this patch. Anybody against?

  6. Thomas Lindner
    • edited description
    • changed status to closed

    No objections to Ben's fix for last 6 months, so commit. Fix worked for my tests.

  7. Log in to comment