Compare-talos doesn't understand PGO/non-PGO split

Issue #13 new
Justin Lebar created an issue

Check out Hit show details at the top, and look at Dromaeo CSS. The reported score is

3105.28 (min: 2736.49, max: 3472.94)

That's a pretty big difference! Hm. Looking at TBPL [1] reveals the source of the difference. Compare-talos doesn't understand that the Linux-PGO and Linux-NonPGO builds are different, and lumps both tests' scores together. Ouch.

This is dangerous breakage, and it makes compare-talos basically useless. If you compare to a revision which has PGO, everything will look like a regression. [And you can miss regressions which are hidden by compare-talos computing a huge variance for the test.]
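
A minimal sketch of the failure mode described above. The scores are invented for illustration (they are not real Talos results), and `summarize` and `summarize_by_build_type` are hypothetical helpers, not compare-talos code. It assumes PGO and non-PGO runs cluster around different values: pooling them produces one wide min/max range, while grouping by build type first gives two tight ones.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical runs for one test on one platform: (build_type, score).
# These numbers are made up for illustration only.
runs = [
    ("pgo", 3450.0),
    ("pgo", 3470.0),
    ("non-pgo", 2740.0),
    ("non-pgo", 2760.0),
]


def summarize(scores):
    """Return (mean, min, max) for a list of scores."""
    return mean(scores), min(scores), max(scores)


def summarize_by_build_type(runs):
    """Group scores by build type, then summarize each group separately."""
    groups = defaultdict(list)
    for build_type, score in runs:
        groups[build_type].append(score)
    return {build_type: summarize(scores) for build_type, scores in groups.items()}


# Lumping everything together, as the report says compare-talos does.
pooled = summarize([score for _, score in runs])

# Keeping the two build types apart.
split = summarize_by_build_type(runs)

print("pooled:", pooled)
for build_type, stats in sorted(split.items()):
    print(build_type, stats)
```

With these made-up inputs the pooled range spans 730 points, while each build type on its own spans only 20, so a regression of a few dozen points would be lost in the pooled spread.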

I'm tempted to say we should take down the tool until we fix this, so people don't waste their time chasing regressions which appear and disappear depending on which original revs they pick.

Also, someone needs to own this tool.


Comments (4)

  1. mconnor repo owner

    a) That's pointing at sdwilsh's test env, which is well behind tip by now.
    b) There's a long list of things c-t doesn't understand, by design. It was designed to compare arbitrary csets against each other. Separating results based on buildconfig/branch/etc is an explicit non-goal.
    c) I think cset selection for PGO/non-PGO is a tbpl problem.
    d) I still own c-t, and we've been fixing stuff as it comes up and pushing to prod as needed. There have been some discussions about handing it off, but I don't see a huge win in it.

  2. Phil Ringnalda

    tbpl respectfully declines to take the blame.

    Either releng needs to report that ts_foo and ts_foo_pgo are different tests on the same platform, or graphserver needs to expose an API that will say that they are different platforms/tests, or you need to switch to using an API that already does so, if it exists. But it's not tbpl's fault that people need to be able to compare the talos runs on a cset which happened to be on the tip when a timer-triggered PGO build happened with one which was not, or to compare one that only got one PGO run with one which sat on the tip until it got four, or to compare the change on PGO builds separately from the change on non-PGO builds.

    Take tbpl completely out of the mix: I have a (local, unbroken, hitting graphs-old) copy of the compare-talos web page open, and I want to know whether or not there were talos changes as a result of my merge in Right now, because it has gotten 3 timed PGO runs plus a nightly, I can only compare it (muddily, comparing the average of (4 x PGO + 1 x non-PGO)) to other csets with 1 non-PGO and 4 PGO runs. I could trigger another set of PGO runs on the cset before it, since lucky for me that sat on tip long enough to get 3 PGO runs, except it just ticked past 18:00, so I'm already getting my 4th timed PGO run on mine, so I'd need to trigger 2 more PGO runs on the one before, and then compare during the brief window when my latest and those two had completed but before mine gets another timed build, or another nightly.
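
A rough illustration of why the run mix matters in the scenario above. The per-build scores are invented, and `pooled_mean` is a hypothetical helper, not part of compare-talos or graphserver. It assumes both changesets perform identically within each build type, so any difference in the pooled averages comes purely from how many PGO and non-PGO runs each changeset happened to get.

```python
from statistics import mean

# Assumed per-build-type scores, made up for illustration. Both changesets
# are taken to perform identically within each build type.
PGO_SCORE = 3400.0
NON_PGO_SCORE = 2800.0


def pooled_mean(n_pgo, n_non_pgo):
    """Mean over all runs when PGO and non-PGO scores are lumped together."""
    scores = [PGO_SCORE] * n_pgo + [NON_PGO_SCORE] * n_non_pgo
    return mean(scores)


# A changeset that sat on tip long enough to collect 4 PGO runs and 1 non-PGO run.
merge_cset = pooled_mean(n_pgo=4, n_non_pgo=1)

# A changeset that only got 1 PGO run and 1 non-PGO run.
older_cset = pooled_mean(n_pgo=1, n_non_pgo=1)

print(merge_cset, older_cset, merge_cset - older_cset)
```

Under these assumptions the two pooled averages differ by 180 points even though nothing changed between the changesets, which is why a fair pooled comparison needs both sides to have the same mix of runs.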

  3. mconnor repo owner

    So, coming back to this, I'm not sure what the right outcome is here. On one hand, comparing PGO and non-PGO results can be useful if that's what you want to do. On the other hand, it's not something you want to do accidentally.

    I don't think graph server is giving us anything useful here, and I think that'll have to be the part that gets fixed first. Is there a bug on that?

  4. Justin Lebar reporter

    Graphserver has been abandoned in favor of the signal-from-noise project. If you want something fixed there, you'll probably have to do it yourself.
