Check out http://goo.gl/LwuOA. Click "show details" at the top, and look at Dromaeo CSS. The reported score is
3105.28 (min: 2736.49, max: 3472.94)
That's a pretty big difference between min and max! Hm. Looking at TBPL reveals the source of the difference. Compare-talos doesn't understand that the Linux-PGO and Linux-NonPGO builds are different, and lumps the scores from both build types together. Ouch.
This is dangerous breakage, and it makes compare-talos basically useless. If you compare a non-PGO push to a revision which has PGO runs, everything will look like a regression. [And you can miss real regressions, because they get hidden by the huge variance compare-talos computes for the test.]
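
To make both failure modes concrete, here is a rough sketch in Python. The numbers are made up for illustration (they are not real Talos data), and the names baseline, candidate_nonpgo, summarize and percent_change are hypothetical, not anything from compare-talos itself. It pools PGO and non-PGO baseline scores the way the tool appears to, then compares a non-PGO push against both the pooled baseline and a non-PGO-only baseline.

```python
from statistics import mean, stdev

# Hypothetical scores, NOT real Talos data. Higher is better.
baseline = {
    "pgo": [3450.0, 3470.0, 3460.0],
    "nonpgo": [2740.0, 2760.0, 2750.0],
}

# A hypothetical push that only has non-PGO runs and is a bit slower
# than the non-PGO baseline.
candidate_nonpgo = [2640.0, 2650.0, 2630.0]


def summarize(scores):
    """Return (mean, sample standard deviation) for a list of scores."""
    return mean(scores), stdev(scores)


def percent_change(new, old):
    """Relative change of new against old, in percent."""
    return 100.0 * (new - old) / old


# What lumping both build types together amounts to.
pooled_mean, pooled_sd = summarize(baseline["pgo"] + baseline["nonpgo"])

# What a like-for-like comparison would use instead.
nonpgo_mean, nonpgo_sd = summarize(baseline["nonpgo"])

cand_mean, cand_sd = summarize(candidate_nonpgo)

print("pooled baseline:  mean=%.2f sd=%.2f" % (pooled_mean, pooled_sd))
print("non-PGO baseline: mean=%.2f sd=%.2f" % (nonpgo_mean, nonpgo_sd))
print("change vs pooled:  %.1f%%" % percent_change(cand_mean, pooled_mean))
print("change vs non-PGO: %.1f%%" % percent_change(cand_mean, nonpgo_mean))
```

With these invented numbers, the pooled baseline makes a 4% slowdown look like roughly a 15% regression, and at the same time the pooled standard deviation is far larger than the real 4% difference, so a tool that relies on that variance could just as easily write the change off as noise.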
I'm tempted to say we should take down the tool until we fix this, so people don't waste their time chasing regressions which appear and disappear depending on which original revs they pick.
Also, someone needs to own this tool.