No input dataset in Fit PCA...

Issue #1060 new

Agustin Lobo created an issue 2022-03-08

There is no place for the input dataset in Fit PCA:

[screenshot]

Comments (10)

Andreas Janz
Not sure what you mean by “no place for the input dataset”.

Input data (i.e. training data (?)) goes here:

[screenshot]
- 2022-03-08T20:22:07+00:00
Agustin Lobo reporter
And where do you select the input image? Note that this input is optional; where is the mandatory input?

Also, you can only select pkl files.
- 2022-03-09T07:15:24+00:00

Agustin Lobo reporter

See the example in https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html

    >>> import numpy as np
    >>> from sklearn.decomposition import PCA
    >>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
    >>> pca = PCA(n_components=2)
    >>> pca.fit(X)
    PCA(n_components=2)
    >>> print(pca.explained_variance_ratio_)
    [0.9924... 0.0075...]
    >>> print(pca.singular_values_)
    [6.30061... 0.54980...]

Where do you define X in the EnMAP-Box panel?

- 2022-03-09T08:14:41+00:00

Andreas Janz
You have multiple options to create an unsupervised dataset (i.e. X data only). The first option is what you asked for:

[screenshots]
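To make the answer to "where do you define X?" concrete: an unsupervised dataset is just the feature matrix X. It has one row per pixel and one column per band, with no labels y. The sketch below builds such an X from a raster array with NumPy. The helper name `raster_to_unsupervised_X`, the (bands, rows, cols) layout and the no-data handling are assumptions for illustration. This is not how the EnMAP-Box "Create unsupervised dataset (from feature raster)" algorithm is implemented internally.

```python
import numpy as np


def raster_to_unsupervised_X(raster, nodata=None):
    """Flatten a (bands, rows, cols) raster into an (n_pixels, n_bands) matrix X.

    Hypothetical helper for illustration only; it is not the EnMAP-Box code.
    Pixels where any band equals `nodata` are dropped.
    """
    bands, rows, cols = raster.shape
    # One row per pixel (row-major pixel order), one column per band.
    X = raster.reshape(bands, rows * cols).T
    if nodata is not None:
        valid = np.all(X != nodata, axis=1)
        X = X[valid]
    return X


# Synthetic 4-band, 50 x 60 pixel "feature raster" with one no-data pixel.
rng = np.random.default_rng(0)
raster = rng.normal(size=(4, 50, 60))
raster[:, 0, 0] = -9999.0

X = raster_to_unsupervised_X(raster, nodata=-9999.0)
print(X.shape)  # (2999, 4): unlabelled samples only, no y
```

The resulting X has the same shape convention (samples by features) as the `X` in the scikit-learn example quoted earlier in this thread, so it could be passed to `PCA(n_components=...).fit(X)`.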
- 2022-03-09T08:31:24+00:00
Andreas Janz
> And where do you select the input image? Note that this input is optional; where is the mandatory input?

If you don’t specify a training dataset, an unfitted estimator is created:

[screenshot]
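A pickled unfitted estimator holds only hyperparameters, not a model. Here is a minimal sketch, assuming the pkl simply contains a pickled scikit-learn estimator. That is an assumption, since the thread does not describe the EnMAP-Box file layout. The restored object fails scikit-learn's `check_is_fitted` until it is fitted to some X.

```python
import pickle

import numpy as np
from sklearn.decomposition import PCA
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted

# An estimator object that carries only its hyperparameters (no data yet).
unfitted = PCA(n_components=2)
blob = pickle.dumps(unfitted)  # roughly what an "unfitted estimator" pkl would hold

restored = pickle.loads(blob)
try:
    check_is_fitted(restored)  # no components_, singular_values_, ... yet
    was_fitted = True
except NotFittedError:
    was_fitted = False

# The object only becomes a usable transformer once it is fitted to some X.
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
restored.fit(X)
check_is_fitted(restored)  # passes now
print(was_fitted, restored.components_.shape)  # False (2, 2)
```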
- 2022-03-09T08:33:33+00:00
Agustin Lobo reporter
1. Using the term “training” for transformations is very odd, as transformations are not ML methods. You estimate a model, but not by training. In the case of PCA, you apply SVD. I suppose this comes from scikit-learn terminology, but I guess this is going to confuse many users.
2. I do not understand what you get if no input at all is selected. What can you do with that pkl?
3. I guess, by default, the 1st option should be selected. This is how PCA is going to be run in 99% of the cases. I do applaud having the other options, though. I will certainly use them.
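Point 1 can be checked directly. scikit-learn's `PCA` is estimated from a singular value decomposition of the mean-centered data, so the numbers in the documentation example above can be reproduced with NumPy alone. This is a sketch: the function names `pca_via_svd` and `pca_transform` are hypothetical. It also ignores options that scikit-learn applies, such as whitening and sign normalization of the components.

```python
import numpy as np


def pca_via_svd(X, n_components):
    """Estimate PCA with a plain SVD of the mean-centered data.

    Returns the per-feature mean, the principal axes, the singular values and
    the explained-variance ratio, i.e. the quantities scikit-learn's PCA
    exposes as mean_, components_, singular_values_, explained_variance_ratio_.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean  # center each feature (band)
    # Thin SVD: Xc = U @ diag(S) @ Vt; the rows of Vt are the principal axes.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = S**2 / np.sum(S**2)  # share of total variance per component
    return mean, Vt[:n_components], S[:n_components], ratio[:n_components]


def pca_transform(X, mean, components):
    """Project samples onto the principal axes (the 'apply' step)."""
    return (np.asarray(X, dtype=float) - mean) @ components.T


X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
mean, components, singular_values, ratio = pca_via_svd(X, n_components=2)
print(singular_values)  # approx. [6.30061 0.54980], as in the scikit-learn docs
print(ratio)            # approx. [0.99244 0.00756]
scores = pca_transform(X, mean, components)
```

The signs of individual components may differ from scikit-learn's output, because the SVD is only unique up to sign per component.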
- 2022-03-09T08:44:12+00:00
Andreas Janz
> Using the term “training” for transformations is very odd, as transformations are not ML methods. You estimate a model, but not by training. In the case of PCA, you apply SVD. I suppose this comes from scikit-learn terminology, but I guess this is going to confuse many users.

I see your point, but as you guessed, we want to be consistent with Scikit-Learn terminology and workflows.

ENVI Classic users will be confused, for sure. Hopefully, they will adopt.


> I do not understand what you get if no input at all is selected. What can you do with that pkl?

So far, not much. I originally needed it in the classification case, for later cross-validation or feature ranking, but that does not apply here.

I will make it non-optional.
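For context on the classification case: cross-validation in scikit-learn starts from an unfitted estimator, because `cross_val_score` clones it and fits a fresh copy on each training fold. The sketch below assumes a pickled scikit-learn classifier, with synthetic data from `make_classification` standing in for a real training dataset. Both the choice of `RandomForestClassifier` and the data are illustrative assumptions.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical unfitted classifier, as it might be stored in a pkl file.
blob = pickle.dumps(RandomForestClassifier(n_estimators=50, random_state=0))
clf = pickle.loads(blob)

# Toy labelled data standing in for a real training dataset.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# cross_val_score clones `clf` and fits one copy per fold, so only the
# unfitted configuration is needed; `clf` itself stays unfitted.
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())
```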


> I guess, by default, the 1st option should be selected. This is how PCA is going to be run in 99% of the cases.

Not sure what you mean by “by default, the 1st option should be selected”. You have to click on one of the options.

- 2022-03-09T09:30:40+00:00
Agustin Lobo reporter
> we want to be consistent with Scikit-Learn terminology and workflows.
>
> ENVI Classic users will be confused, for sure. Hopefully, they will adopt.

Hopefully they will not, and they certainly will not be able to use that terminology in scientific articles. This is not an issue of ENVI terminology; it is conceptual. These transforms are not based on training; they are based on linear algebra.

> I guess, by default, the 1st option should be selected. This is how PCA is going to be run in 99% of the cases.


> Not sure what you mean by “by default, the 1st option should be selected”. You have to click on one of the options.

One option could be:

Input Dataset

Instead of
Training Dataset [optional]

And when you select ...
it should open the "Create unsupervised dataset (from feature raster)" panel directly, instead of searching for a pkl.

Another option could be:

Select Type of Input
(instead of Training Dataset [optional] and the tiny wheel)
with the same options as the wheel, plus a sixth option, "None (for unfitted transform pkl file)".

- 2022-03-09T10:37:08+00:00
Andreas Janz

> Hopefully they will not, and certainly will not be able to use that terminology in scientific articles. This is not an issue of ENVI terminology, it is conceptual. These transforms are not based on training, they are based on linear algebra.


I can see why you think that in the case of a “simple” PCA, but transformers really are machine learners. I guess you would agree in the case of KernelPCA, right?

Even if simple PCA only requires linear algebra, it can be argued that it is a simple form of sample-based (machine) learning. Even if you put in a whole raster image, technically it is still a training sample.

Anyway, I see your point, but I think I want to keep it as is.
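The point that even a whole raster image is "still a training sample" can be illustrated numerically. The PCA axes are estimated from whatever pixels are supplied, and the fitted model is then applied to every pixel. The sketch below uses a synthetic 3-band image (an assumption, not EnMAP-Box data). It compares axes fitted on all pixels with axes fitted on a random 5 % subsample. `fit_pca` is a hypothetical helper.

```python
import numpy as np


def fit_pca(X):
    """Return (mean, principal axes) estimated from the sample X via SVD."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt


rng = np.random.default_rng(42)
# Synthetic 3-band image of 100 x 100 pixels, already flattened to (pixels, bands):
# one shared latent signal with different band weights, plus a little noise.
n_pixels = 100 * 100
latent = rng.normal(size=(n_pixels, 1))
X_image = np.hstack([latent * w for w in (3.0, 2.0, 1.0)])
X_image += 0.1 * rng.normal(size=(n_pixels, 3))

# Estimate the model from all pixels, and from a random 5 % subsample.
mean_all, axes_all = fit_pca(X_image)
idx = rng.choice(n_pixels, size=n_pixels // 20, replace=False)
mean_sub, axes_sub = fit_pca(X_image[idx])

# Either fitted model can then be applied to every pixel of the image.
scores_all = (X_image - mean_all) @ axes_all.T
scores_sub = (X_image - mean_sub) @ axes_sub.T

# |cosine| between the two first axes: close to 1 here, but in general
# the estimate depends on which sample the model was fitted to.
agreement = abs(axes_all[0] @ axes_sub[0])
print(round(agreement, 4))
```

Whether this step is called "training" or "estimation", the fitted axes are a function of the input sample. The whole image is simply the largest sample available.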


> One option could be:
>
> Input Dataset
>
> Instead of
> Training Dataset [optional]

I’ll discuss this with the team. Maybe “Input” is better than “Training”.

I already removed the [optional].


> And when you select ...
> it should open the "Create unsupervised dataset (from feature raster)" panel directly, instead of searching for a pkl.

I don’t like that too much. I really want it to behave like the other learners (classifier, regressor, clusterer).

- 2022-03-09T11:08:27+00:00
Agustin Lobo reporter
Well, you need to find a balance between standardization and respect for diversity. Not all algorithms can use the same terminology, but having a particular terminology for each algorithm would be a mess.
- 2022-03-09T11:12:53+00:00

Assignee: –

Type: bug

Priority: major

Status: new

Component: –

Milestone: –

Version: –

Votes: 0

Watchers: 1