Repeated CV option overestimates RMSE

Issue #15 new
Kareem Youssef created an issue

Using 'crossvalidate' to predict n times averages the results in a way that is different from what I compute myself.

For example, if I do a 0.10 100 CV (hold out 10% of the dataset, repeated 100 times), Magpie pools the 100 test sets into one collection 10 times larger than the dataset. Averaging over all of these data points gives, say, an error of 110K. But if I first average the duplicate entries across all the repeats, and THEN average the consolidated data that remains, which should be the correct average, I get an error that is about 5% lower.
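To make the discrepancy concrete, here is a minimal numeric sketch (the compounds and error values are made up for illustration) of the two orders of averaging:

```python
import math

# Hypothetical absolute errors for two compounds across the repeats;
# the same compound can land in several of the 100 test sets.
errors = {
    "NaCl":  [50.0, 250.0, 240.0],  # drawn into 3 repeats
    "Fe2O3": [100.0],               # drawn into 1 repeat
}

# Pooled RMSE: every appearance counts once.
pooled = [e for errs in errors.values() for e in errs]
rmse_pooled = math.sqrt(sum(e**2 for e in pooled) / len(pooled))

# Consolidate first: average each compound's errors, then take the RMSE.
consolidated = [sum(errs) / len(errs) for errs in errors.values()]
rmse_consolidated = math.sqrt(
    sum(e**2 for e in consolidated) / len(consolidated))

print(rmse_pooled, rmse_consolidated)  # ~182.1 vs ~145.6: they disagree
```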

Comments (9)

  1. Logan Ward

    I'm not entirely sure how you are averaging the results from this crossvalidation test.

    You are correct that Magpie computes the RMSE by combining all of the test sets from each iteration into a single dataset and measuring the RMSE over all of those entries.

    In contrast, the CV accuracy is often measured by computing performance metrics for each test iteration and then averaging the metrics over all test iterations. Is this the approach you are using?

    For MAE, these two approaches (compute the metric over the union of all test sets vs. average the metric across test sets) are equivalent, provided the test sets are the same size. However, this is not the case for all metrics (e.g., RMSE).
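    A quick way to see the difference between the two schemes is a sketch with synthetic residuals (this is NumPy on made-up data, not Magpie's actual code): with equal-sized test sets the MAE comes out the same either way, while the RMSE does not, because the square root of a mean is not the mean of square roots.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n_folds, fold_size = 100, 50
    # Synthetic residuals (predicted - actual), one row per test iteration,
    # with a different noise level per fold so the folds are not identical.
    scales = rng.uniform(0.5, 2.0, size=(n_folds, 1))
    residuals = rng.normal(scale=scales, size=(n_folds, fold_size))

    # Scheme A (pooled): combine all test entries, then compute one metric.
    pooled = residuals.ravel()
    rmse_pooled = np.sqrt(np.mean(pooled**2))
    mae_pooled = np.mean(np.abs(pooled))

    # Scheme B (per fold): compute the metric per fold, then average.
    rmse_avg = np.sqrt(np.mean(residuals**2, axis=1)).mean()
    mae_avg = np.abs(residuals).mean(axis=1).mean()

    print(np.isclose(mae_pooled, mae_avg))  # True: equal fold sizes
    print(rmse_pooled, rmse_avg)            # generally differ
    ```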

  2. Kareem Youssef reporter

    Yes, the approach you describe in your second point is what I am using. In most cases (not all), the RMSE computed with my method is lower. And yes, the two are not equivalent; in my opinion, the per-iteration CV accuracy measurement is more useful.

  3. Logan Ward

    Magpie currently doesn't easily support the "compute metrics for each iteration, then average the metrics" measurement. Typically, what I do for this kind of evaluation is dump the results of the CV test to a file and analyze them in another code. I find this solution tedious, but I haven't figured out an easy way to implement this test via the text interface of Magpie.

    Also, which "CV accuracy measurement" are you referring to?

  4. Kareem Youssef reporter

    The CV accuracy measurement I'm talking about is: dump the Magpie CV results as a CSV, use Excel to consolidate duplicate compound entries by averaging their errors (a built-in Excel feature), then calculate the RMSE myself over the consolidated list.
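    For anyone who wants to skip the Excel step, the same consolidation is a few lines of pandas. This is only a sketch: the file name and the "composition", "measured", and "predicted" column names are placeholders to be matched against the headers in the actual dumped CSV.

    ```python
    import numpy as np
    import pandas as pd

    df = pd.read_csv("cv_results.csv")  # hypothetical dump of Magpie CV output
    df["error"] = df["predicted"] - df["measured"]

    # Consolidate duplicate compound entries by averaging their errors...
    per_compound = df.groupby("composition")["error"].mean()

    # ...then compute the RMSE over the consolidated list.
    rmse = np.sqrt(np.mean(per_compound**2))
    print(rmse)
    ```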

  5. Logan Ward

    I also prefer the approach of averaging the results of each individual iteration, as it allows you to assess the variation in performance across multiple selections of training and test sets.

    As the way Magpie currently evaluates CV performance is valid, I'm going to reclassify this issue as an "improvement/proposal" and not a "bug." I'll keep the priority up because it is a real annoyance to have to dump data out of Magpie to get CV statistics. Any objections?

    Also, I'm concerned about how you are measuring CV performance. You shouldn't need to consolidate duplicate entries when evaluating performance across test iterations. Magpie appends the results of each CV test sequentially to the output (e.g., in a 90%/10% × 10 run, the first 10% of entries come from the first iteration, the next 10% from the second, and so on). So, the way to measure performance for each fold is to split the data by position in the output file, which does not require knowing which entries happen to be in each fold.
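    As a sketch of that positional split (again with placeholder file and column names), the per-fold statistics fall out of simple slicing:

    ```python
    import numpy as np
    import pandas as pd

    df = pd.read_csv("cv_results.csv")  # iterations appended sequentially
    n_iters = 100
    fold_size = len(df) // n_iters      # each fold is 10% of the dataset

    per_fold_rmse = []
    for i in range(n_iters):
        fold = df.iloc[i * fold_size:(i + 1) * fold_size]
        resid = fold["predicted"] - fold["measured"]
        per_fold_rmse.append(np.sqrt(np.mean(resid**2)))

    # Mean RMSE across folds, plus the fold-to-fold spread.
    print(np.mean(per_fold_rmse), np.std(per_fold_rmse))
    ```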

  6. Logan Ward

    On that note: Having to know that Magpie stores results from each test sequentially in the dataset is another reason why I want to overhaul this CV code.

  7. Kareem Youssef reporter

    Yeah that's fine with me.

    Oh, you're saying it might be better to compute the RMSE for each 10% test set in order, and then average all 100 of those RMSEs into one number. Yeah, I don't actually know if that would be significantly different. I can try that too.
