SVM (fit model) takes too long to run

Issue #144 closed
Thang Hanam created an issue

EnMAP-Box 3 seems to take forever to run a grid search for SVM. With the same dataset, RF performs both "fit" and "predict" very fast. QGIS shows no errors, but the process never completes.

I have to kill QGIS from the Task Manager.

Is this a data issue or a problem with the Python code?

Thank you so much for all your efforts!

Thang

Comments (18)

  1. Thang Hanam reporter

    It completed, but it took a very long time:

    Processing algorithm
    Algorithm 'Fit SVC' starting
    Input parameters:
    { 'classification' : 'C:/Users/...../AppData/Local/Temp/processing_570d65150a5e41f5b5ff8184864ca365/98215dc828fb4e4b988b1e23911eb653/outClassification.bsq', 'code' : 'from sklearn.pipeline import make_pipeline\nfrom sklearn.model_selection import GridSearchCV\nfrom sklearn.preprocessing import StandardScaler\nfrom sklearn.svm import SVC\n\nsvc = SVC(probability=False)\nparam_grid = {\'kernel\': [\'rbf\'],\n \'gamma\': [0.001, 0.01, 0.1, 1, 10],\n \'C\': [0.001, 0.01, 0.1, 1, 10]}\ntunedSVC = GridSearchCV(cv=3, estimator=svc, scoring=\'f1_macro\', param_grid=param_grid)\nestimator = make_pipeline(StandardScaler(), tunedSVC)\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n', 'mask' : None, 'outEstimator' : 
    
    Execution completed in 965.25 seconds
    Results:
    {'outEstimator': ''}
    
    Loading resulting layers
    Algorithm 'Fit SVC' finished
    

    My desktop computer:

    Core i7, 3.4 GHz

    RAM: only 8 GB (but the lagoon is not big at the moment)

    Windows 10, 64-bit.

    I will test on Linux again.

  2. Benjamin Jakimow

    @hnthang thanks for your report (and for the previous one too! :-) ). @janzandr will have a look at it.

  3. Thang Hanam reporter

    Yes. I can.

    Please find the dataset at:

    https://goo.gl/xGtBCt

    It contains:

    • A polygon shapefile for training

    • A polygon shapefile for validation

    • A TIFF file

    Thank you!

    Update: with the same dataset, EnMAP-Box IDL takes 20 minutes to fit the SVC model.

  4. Andreas Janz

    I looked into your dataset. After rasterizing the class polygons into the image grid, you have the following number of samples per class: [23878, 16167, 5875, 484, 428]

    I would say that classes one and two are way too big, which results in slow SVM training.
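To put a number on this: scikit-learn's SVC documentation notes that fit time scales at least quadratically with the number of samples, and the grid search in the log above repeats that fit many times. A back-of-the-envelope count, using the grid from the log and the class counts above (plain Python plus scikit-learn's `ParameterGrid`; the per-fold training size is an approximation):

```python
from sklearn.model_selection import ParameterGrid

# Parameter grid copied from the 'code' parameter in the 'Fit SVC' log
param_grid = {'kernel': ['rbf'],
              'gamma': [0.001, 0.01, 0.1, 1, 10],
              'C': [0.001, 0.01, 0.1, 1, 10]}
cv = 3

# Samples per class after rasterizing the training polygons (numbers from the comment above)
class_counts = [23878, 16167, 5875, 484, 428]

n_candidates = len(ParameterGrid(param_grid))   # 1 kernel * 5 gamma * 5 C = 25 combinations
n_fits = n_candidates * cv + 1                  # one fit per candidate and fold, plus the final refit
n_samples = sum(class_counts)
n_train_per_fold = n_samples * (cv - 1) // cv   # approximate size of each CV training split

print(f"{n_candidates} candidates, {n_fits} SVC fits")
print(f"{n_samples} samples in total, about {n_train_per_fold} per CV training split")
```

With `refit=True` (the `GridSearchCV` default) that is 76 SVC fits: 75 on roughly 31,000 pixels each and a final one on all 46,832.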

    You could draw a stratified random subset using the "Random -> Random Points from Classification" algorithm. The resulting point layer can be used as a mask inside the "Classification -> Fit SVC" algorithm to only use the randomly drawn points for training.
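For scripting outside the GUI, the same idea can be sketched in NumPy. `stratified_subsample` is a hypothetical helper: it is not part of EnMAP-Box or hubflow, and it is not how the "Random Points from Classification" algorithm is implemented. It assumes the labels are available as a 1-D array, caps every class at a fixed number of randomly drawn samples, and keeps small classes complete:

```python
import numpy as np

def stratified_subsample(y, n_per_class, seed=0):
    """Return sorted indices of a random subset with at most n_per_class samples per class.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    selected = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)            # positions of this class
        take = min(n_per_class, idx.size)           # small classes are kept completely
        selected.append(rng.choice(idx, size=take, replace=False))
    return np.sort(np.concatenate(selected))

# Synthetic labels with the class sizes reported above (classes 1..5)
counts = [23878, 16167, 5875, 484, 428]
y = np.repeat(np.arange(1, 6), counts)

subset = stratified_subsample(y, n_per_class=1000)
print(np.unique(y[subset], return_counts=True))
```

With a cap of 1000, the three large classes shrink to 1000 samples each while the two small classes keep all 484 and 428 samples, so the SVM trains on 3,912 instead of 46,832 samples.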

  5. Thang Hanam reporter

    Thank you very much. It worked. It's so simple, why didn't I think of it first? :-)

    Ah, does EnMAP-Box 3 support a way to read a PKL file? I would like to check some attributes (OOB score, feature importance, ...) when I change the code.

    And if not, how can I check the OOB score and feature importance after model fitting?

  6. Andreas Janz

    Currently there is no GUI for that, but you can load a pkl file via the QGIS Python console:

    >>> rfc = Classifier.unpickle(r'C:/Users/janzandr/AppData/Local/Temp/processing_e7b37ea1a0a640b1aec380578bd724fc/a343fb2d2abe41ca8ea2e6eeb9732c58/outEstimator.pkl')
    >>> rfc.sklEstimator().oob_score_
    0.59959349593495936
    >>> rfc.sklEstimator().feature_importances_
    array([ 0.02207857,  0.02242417,  0.03868761,  0.04419304,  0.00263426,
            0.00785004,  0.0030435 ,  0.00979142,  0.00419932,  0.00566204,
            0.00575763,  0.00616819,  0.007472  ,  0.01606803,  0.02260101,
            0.00485058,  0.00259755,  0.00146085,  0.00337161,  0.00823481,
            0.00263131,  0.00307766,  0.00383824,  0.02410678,  0.00112342,
            0.0064868 ,  0.00679441,  0.00209385,  0.00418144,  0.00266206,
            0.00159367,  0.00266199,  0.00282572,  0.00375651,  0.00766116,
            0.00115968,  0.00592162,  0.00665346,  0.0005438 ,  0.00214737,
            0.02105924,  0.00303319,  0.00735307,  0.00189556,  0.00143297,
            0.01354313,  0.00893671,  0.00414995,  0.0090635 ,  0.00354392,
            0.00590729,  0.00146984,  0.00404902,  0.00228352,  0.0019509 ,
            0.00443734,  0.00133683,  0.00069481,  0.00297971,  0.00750999,
            0.00222324,  0.        ,  0.003636  ,  0.00077036,  0.00056854,
            0.00736293,  0.00479188,  0.00367434,  0.00482264,  0.00449001,
            0.00577165,  0.0016629 ,  0.00998304,  0.0043807 ,  0.00071108,
            0.00138022,  0.00853932,  0.00137204,  0.00163447,  0.00168628,
            0.0014465 ,  0.        ,  0.00335613,  0.00373376,  0.00045216,
            0.00227724,  0.00364296,  0.0050091 ,  0.00254149,  0.00270735,
            0.00515205,  0.00324603,  0.00135614,  0.00403323,  0.0015638 ,
            0.00408272,  0.01565615,  0.014656  ,  0.02227166,  0.00155571,
            0.00708097,  0.02193056,  0.00254782,  0.02405818,  0.01035958,
            0.01659258,  0.00506722,  0.00204223,  0.00058577,  0.00761214,
            0.00765514,  0.00095496,  0.00146222,  0.00117895,  0.00515196,
            0.00373413,  0.00745615,  0.00098959,  0.00043197,  0.00415811,
            0.00362786,  0.00725489,  0.00218093,  0.00182656,  0.00555986,
            0.0051268 ,  0.00514899,  0.00938855,  0.00702344,  0.00180092,
            0.00877679,  0.00173388,  0.00350579,  0.00556548,  0.00176326,
            0.00075152,  0.00730184,  0.003002  ,  0.00571242,  0.00337799,
            0.00219824,  0.00326059,  0.00280145,  0.00091224,  0.00621938,
            0.0011134 ,  0.00301511,  0.00226865,  0.00310923,  0.00535463,
            0.00155644,  0.01121872,  0.00286472,  0.00028323,  0.00861104,
            0.00504801,  0.00177807,  0.00050921,  0.        ,  0.00450059,
            0.00474561,  0.00382514,  0.01129892,  0.0063613 ,  0.00414795,
            0.00419037,  0.00676443,  0.00165656,  0.00665025,  0.00906159,
            0.00755743,  0.00549716,  0.00564823,  0.00886059,  0.00608699,
            0.00266069,  0.01098436])
    

    Note that you have to enable OOB estimation in the code snippet of the Fit dialog:

    from sklearn.ensemble import RandomForestClassifier
    estimator = RandomForestClassifier(oob_score=True)
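The same attributes can be checked on a plain scikit-learn forest outside the EnMAP-Box. A minimal sketch on synthetic data (`make_classification` stands in for the real training pixels, and "band" here just means column index); presumably `sklEstimator()` in the console session above returns a fitted object of this kind:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the training pixels (the real data are not used here)
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)

# oob_score=True makes the forest compute the out-of-bag accuracy during fit
estimator = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
estimator.fit(X, y)

print("OOB score:", estimator.oob_score_)

# Rank columns ("bands") by impurity-based importance; the importances sum to 1
ranking = np.argsort(estimator.feature_importances_)[::-1]
for band in ranking[:5]:
    print(f"band {band + 1}: {estimator.feature_importances_[band]:.4f}")
```

Without `oob_score=True`, the fitted forest has no `oob_score_` attribute at all, which is why the flag has to be set in the Fit dialog code.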
    
  7. Thang Hanam reporter

    Yeah, I got it, but I could not run your code as-is. Instead, I used:

    import pickle

    # open the pkl file read-only ('r') in binary mode ('b'); 'with' closes it again
    with open('path to pkl file', 'rb') as f:
        rf_new = pickle.load(f)

    rf_new.sklEstimator().oob_score_              # check the OOB score
    rf_new.sklEstimator().feature_importances_    # check the feature importances
    

    and it returns the same results.

    Just wondering:

    rfc = Classifier.unpickle(r'C...')
    

    For "Classifier", do we need to import something?

    And does the 'r' mean the 'read only' attribute?

  8. Andreas Janz

    Sorry, I forgot to show the import of Classifier:

    >>> from hubflow.core import Classifier
    >>> rfc = Classifier.unpickle(r'C:/Users/janzandr/AppData/Local/Temp/processing_e7b37ea1a0a640b1aec380578bd724fc/a343fb2d2abe41ca8ea2e6eeb9732c58/outEstimator.pkl')
    >>> rfc.sklEstimator().oob_score_
    0.59959349593495936
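Regarding the `r` question: in Python, `r'...'` marks a raw string literal, in which backslashes are not treated as escape characters. It is unrelated to the `'rb'` mode passed to `open()`. It matters mostly for Windows paths written with backslashes; with forward slashes, as in the path above, the prefix changes nothing. A small illustration (the path is made up):

```python
# Raw string literal: backslashes are kept as literal characters
raw = r'C:\Users\name\outEstimator.pkl'
# The same text written as a normal string needs doubled backslashes
escaped = 'C:\\Users\\name\\outEstimator.pkl'

print(raw == escaped)            # the two literals describe the same string
print(len(r'\n'), len('\n'))     # r'\n' is backslash + 'n'; '\n' is a single newline

# The 'r' in open(path, 'rb') is something else: the file mode (read, binary)
```

Without the prefix, a literal such as `'C:\Users'` would not even compile, because `\U` starts a Unicode escape sequence.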
    
  9. Andreas Janz

    Oh, I missed that one: you can inspect the Classifier in the EnMAP-Box DataSources Panel:

    [Screenshot: Unbenannt.PNG]

    @jakimowb it seems that the list is not fully shown, which I guess was intended.

    [Screenshot: Unbenannt2.PNG]

    But it would be great if the user could click on the last line with the '...' to get the full list!

  10. Benjamin Jakimow

    I think it's better to provide a <to be named> "Model Visualizer" that, in the case of RF, visualizes absolute and relative feature importance, considers band names (important for non-spectral input), etc. Similarly, there should be an SVM-specific GUI, e.g. to visualize grid-search results and give information on support vectors, etc.
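Until such a GUI exists, grid-search results and support-vector information can be read from scikit-learn directly. A sketch on synthetic data, assuming a bare `GridSearchCV` over the same C and gamma values as the default snippet (the real model additionally has a `StandardScaler` in front):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data; C and gamma values as in the default 'Fit SVC' snippet
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
Cs = [0.001, 0.01, 0.1, 1, 10]
gammas = [0.001, 0.01, 0.1, 1, 10]
search = GridSearchCV(SVC(kernel='rbf'), {'C': Cs, 'gamma': gammas},
                      cv=3, scoring='f1_macro')
search.fit(X, y)

# Arrange the mean cross-validated score of every (C, gamma) pair as a table
res = search.cv_results_
scores = np.full((len(Cs), len(gammas)), np.nan)
for params, score in zip(res['params'], res['mean_test_score']):
    scores[Cs.index(params['C']), gammas.index(params['gamma'])] = score

print("mean f1_macro, rows: C, columns: gamma =", gammas)
for C, row in zip(Cs, scores):
    print(f"C={C:<6}", " ".join(f"{s:.3f}" for s in row))

# Support-vector information of the refitted best model
best = search.best_estimator_
print("best parameters:", search.best_params_)
print("support vectors per class:", best.n_support_)
print("total support vectors:", best.support_vectors_.shape[0])
```

Very small C values may trigger scikit-learn warnings about undefined F1 scores for classes that are never predicted; those candidates simply score low in the table.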

  11. Andreas Janz

    Additional GUIs in the future are fine, but for now, having the full list by clicking on '...' would be great. Or simply remove the '...' and show the whole list.

  12. Thang Hanam reporter

    Looking forward to those features!

    For now, why does the EnMAP-Box show no information for the model estimator (as illustrated in the attached image)?

    I ran an SVM grid search (the default), but I see nothing under the Classifier in Models.

    Thanks!

    [Screenshot: sv_model.jpg]
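In the meantime, the fitted grid search can be reached from the QGIS Python console. The default snippet wraps `GridSearchCV` in `make_pipeline`, and `make_pipeline` names each step after its lowercased class name. The sketch below rebuilds that pipeline on synthetic data; for a saved model one would presumably start from `Classifier.unpickle(...).sklEstimator()` instead, assuming it returns this pipeline (not verified here):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the training pixels
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# Same structure as the default 'Fit SVC' code snippet shown in the log
param_grid = {'kernel': ['rbf'],
              'gamma': [0.001, 0.01, 0.1, 1, 10],
              'C': [0.001, 0.01, 0.1, 1, 10]}
tunedSVC = GridSearchCV(cv=3, estimator=SVC(probability=False),
                        scoring='f1_macro', param_grid=param_grid)
estimator = make_pipeline(StandardScaler(), tunedSVC)
estimator.fit(X, y)

# make_pipeline names the steps 'standardscaler' and 'gridsearchcv'
search = estimator.named_steps['gridsearchcv']
best_svc = search.best_estimator_          # the SVC refitted with the best parameters

print("steps:", list(estimator.named_steps))
print("best parameters:", search.best_params_)
print("best CV score (f1_macro):", round(search.best_score_, 3))
```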

  13. Andreas Janz

    The current version of this model browser has some limitations that will be overcome soon. We are working on it.
