Repeating 'leave-one-out CV' n times

Issue #14 resolved
Kareem Youssef created an issue

The 'crossvalidate' command does not seem to support repeating leave-one-out CV an arbitrary number of times. For example, the following line crashes even though it follows the same procedure as a 10/90 test/train split:

    res = model crossvalidate $data 0.01 100

The intent is to run leave-one-out CV, or a 1/99 test/train split, 100 times.

The above throws the following error:

    java.util.NoSuchElementException
        at java.util.ArrayList$Itr.next(ArrayList.java:854)
        at magpie.data.Dataset.getMeasuredClassArray(Dataset.java:1565)
        at magpie.statistics.performance.RegressionStatistics.evaluate(RegressionStatistics.java:32)
        at magpie.models.BaseModel.crossValidate(BaseModel.java:362)
        at magpie.models.BaseModel.runCommand(BaseModel.java:824)
        at magpie.models.regression.BaseRegression.runCommand(BaseRegression.java:94)
        at magpie.user.CommandHandler.runCommandOnVariable(CommandHandler.java:384)
        at magpie.user.CommandHandler.assignment(CommandHandler.java:338)
        at magpie.user.CommandHandler.runCommand(CommandHandler.java:257)
        at magpie.user.CommandHandler.readFile(CommandHandler.java:406)
        at magpie.Magpie.main(Magpie.java:61)
    null


  1. Logan Ward

    I'm not sure what's causing this error. I suspect your dataset has fewer than 100 entries, but I cannot tell from the stack trace.

    Could you send me the script necessary to recreate this error? I'll see if I can figure it out and have Magpie yield a more useful error message.

    Also, a better way to do leave-one-out CV is to run k-fold CV with k equal to the number of entries in your dataset. Random-split CV draws a new random test set each time, so some entries will likely land in the test set more than once while others never do. In leave-one-out CV, each entry appears in the test set exactly once.
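    To illustrate the difference in test-set coverage, here is a minimal Java sketch that counts how often each entry is tested under repeated random splits versus leave-one-out CV. The class and method names (`TestSetCoverage`, `randomSplitCounts`, `looCounts`) are hypothetical and this is not Magpie's implementation; it assumes a random split is drawn by shuffling the entry indices and taking the first `testSize` of them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch (not Magpie code): test-set coverage of random splits vs. LOOCV. */
public class TestSetCoverage {

    /** Counts how many times each entry is tested over `repeats` random splits of size `testSize`. */
    static int[] randomSplitCounts(int n, int testSize, int repeats, long seed) {
        int[] counts = new int[n];
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) indices.add(i);
        Random rng = new Random(seed);
        for (int r = 0; r < repeats; r++) {
            Collections.shuffle(indices, rng);           // draw a fresh random split
            for (int j = 0; j < testSize; j++) {
                counts[indices.get(j)]++;                // first testSize entries form the test set
            }
        }
        return counts;
    }

    /** Leave-one-out CV: fold i tests only entry i, so every entry is tested exactly once. */
    static int[] looCounts(int n) {
        int[] counts = new int[n];
        for (int fold = 0; fold < n; fold++) counts[fold]++;
        return counts;
    }

    public static void main(String[] args) {
        int n = 44;
        int[] random = randomSplitCounts(n, 1, n, 42L);  // n random 1-entry test sets
        int[] loo = looCounts(n);
        int neverTested = 0, maxCount = 0;
        for (int c : random) {
            if (c == 0) neverTested++;
            maxCount = Math.max(maxCount, c);
        }
        System.out.println("Random splits: " + neverTested + " entries never tested, max count " + maxCount);
        System.out.println("LOOCV: every entry tested exactly " + loo[0] + " time(s)");
    }
}
```

    Both schemes run the same number of single-entry tests; only LOOCV guarantees that each entry is tested once and only once.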

  2. Logan Ward

    It turns out that the problem was that 1% of your 44-entry dataset is 0.44, which is less than one entry. The test sets therefore contained 0 entries, which caused an error in the class that computes the statistics.
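    The arithmetic behind the crash, and the kind of up-front check that gives a clearer message, can be sketched in Java as follows. The thread does not show how Magpie converts the fraction into a test-set size, so this sketch assumes truncation toward zero (rounding to the nearest integer would also give 0 for 0.44). `SplitSize.testSetSize` is a hypothetical helper, not part of Magpie.

```java
/** Hypothetical sketch (not Magpie code): validating a random-split test fraction. */
public class SplitSize {

    /** Returns the test-set size, assuming truncation toward zero; rejects empty test sets. */
    static int testSetSize(int nEntries, double testFraction) {
        int size = (int) (nEntries * testFraction);      // e.g. 44 * 0.01 = 0.44 -> 0
        if (size < 1) {
            throw new IllegalArgumentException(String.format(
                "Test fraction %.3f of %d entries gives an empty test set; use roughly %.3f or more",
                testFraction, nEntries, 1.0 / nEntries));
        }
        return size;
    }

    public static void main(String[] args) {
        System.out.println(testSetSize(44, 0.1));        // 44 * 0.1 = 4.4 -> 4 entries
        try {
            testSetSize(44, 0.01);                       // the failing case from this issue
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

    Failing fast on the split size reports the real cause instead of an exception deep inside the statistics code.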

    I've added several new error messages to Magpie that should make this kind of error easier to catch.

    Also, if you want to run LOOCV, try this command:

    res = model crossvalidate $data loocv

    This is just a shortcut for k-fold CV where k is equal to the number of entries.
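    The `loocv` shortcut amounts to k-fold CV with k = n. The following minimal Java sketch of fold assignment shows why. `KFoldSplits` is a hypothetical class, not Magpie's code. It deals entries into folds round-robin without shuffling; a typical k-fold implementation would shuffle first, which makes no difference when k = n.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not Magpie code): k-fold assignment, with LOOCV as the k = n case. */
public class KFoldSplits {

    /** Assigns entry i to fold i % k; when k == n every fold holds exactly one entry. */
    static List<List<Integer>> folds(int n, int k) {
        if (k < 2 || k > n) {
            throw new IllegalArgumentException("k must satisfy 2 <= k <= n");
        }
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) folds.add(new ArrayList<>());
        for (int i = 0; i < n; i++) folds.get(i % k).add(i);
        return folds;
    }

    public static void main(String[] args) {
        int n = 44;
        List<List<Integer>> loo = folds(n, n);           // the "loocv" case
        System.out.println(loo.size() + " folds, first fold: " + loo.get(0));
        // prints "44 folds, first fold: [0]"
    }
}
```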

  3. Kareem Youssef reporter

    Could I do something like:

    res = model crossvalidate $data loocv 100

    to run LOOCV 100 times?

  4. Logan Ward

    No, I don’t yet support running k-fold cross-validation multiple times. I’ve opened a separate issue for that.
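    For reference, repeated k-fold CV reshuffles the entries before splitting them into k folds in each repeat. Below is a hypothetical Java sketch (`RepeatedKFold` is not Magpie code) that builds the fold partition for each repeat. One consequence is relevant to the original question. With k = n every repeat yields the same singleton folds. Repeating LOOCV therefore changes the result only if model training itself is randomized.

```java
import java.util.*;

/** Hypothetical sketch (not Magpie code): fold partitions for repeated k-fold CV. */
public class RepeatedKFold {

    /** One repeat: shuffle the indices, then deal them round-robin into k folds. */
    static List<Set<Integer>> shuffledFolds(int n, int k, Random rng) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) idx.add(i);
        Collections.shuffle(idx, rng);
        List<Set<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) folds.add(new HashSet<>());
        for (int j = 0; j < n; j++) folds.get(j % k).add(idx.get(j));
        return folds;
    }

    /** Returns the partition (as an unordered set of folds) produced by each of `repeats` repeats. */
    static List<Set<Set<Integer>>> repeat(int n, int k, int repeats, long seed) {
        Random rng = new Random(seed);
        List<Set<Set<Integer>>> partitions = new ArrayList<>();
        for (int r = 0; r < repeats; r++) {
            partitions.add(new HashSet<>(shuffledFolds(n, k, rng)));
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Set<Set<Integer>>> loo = repeat(44, 44, 100, 7L);
        boolean allSame = loo.stream().allMatch(p -> p.equals(loo.get(0)));
        System.out.println("LOOCV partitions identical across repeats: " + allSame);
        // prints "LOOCV partitions identical across repeats: true"

        List<Set<Set<Integer>>> tenFold = repeat(44, 10, 100, 7L);
        System.out.println("Distinct 10-fold partitions: " + new HashSet<>(tenFold).size());
    }
}
```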
