Let us know if it looks fine. If not, how much performance degradation is acceptable?
Additionally, you can tweak the optimizations based on this change if needed.
Thank you for the tuning. I now understand your needs well.
I have some ideas.
Giving up performance seems difficult, so I want to offer compression as an option.
If you don't mind, I would like to change it as follows.
Basically, it ships a compressed dictionary and lets users choose at initialization time whether to use the uncompressed "Morpheme" or to use "CompressedMorpheme" directly.
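A minimal sketch of what such an initialization-time switch could look like. Every name here (MorphemeLike, PlainMorpheme, PackedMorpheme, Dictionary, the compress flag) is hypothetical and only illustrates the idea; it is not the project's actual code, and the UTF-8 byte array is merely a stand-in for whatever compact encoding CompressedMorpheme uses.

```scala
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical names throughout; not the project's real API.
sealed trait MorphemeLike { def surface: String }

// Uncompressed form: holds the string directly, so access is a plain field read.
final case class PlainMorpheme(surface: String) extends MorphemeLike

// Compact form: stores encoded bytes and decodes them on every access.
final class PackedMorpheme(bytes: Array[Byte]) extends MorphemeLike {
  def surface: String = new String(bytes, UTF_8)
}

// The representation is chosen once, when the dictionary is built.
final class Dictionary(entries: Seq[String], compress: Boolean) {
  private def build(s: String): MorphemeLike =
    if (compress) new PackedMorpheme(s.getBytes(UTF_8)) else PlainMorpheme(s)

  private val morphemes: Vector[MorphemeLike] = entries.map(build).toVector

  def lookup(i: Int): MorphemeLike = morphemes(i)
}
```

Callers only see MorphemeLike, so the rest of the analyzer would not need to know which mode was selected.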
If the initial decompression time is too long, we could consider offering two versions separately.
What do you think about this?
If you agree, I'll write the additional code after merging it.
I agree with you. Giving users the option to choose whether to use a memory-optimized dictionary would be helpful.
You can merge the change and make it optional as you mentioned; I am fine with that.
Thank you very much.
I do not know if this process will be clean. If not, I will think about separating the dictionary.
Could you open the PR against another branch? I can't change the target branch.
If possible, a new branch would be good. If not, the develop branch, please.
Sure, I changed the PR to target a new branch.
I've made some modifications to the code. I need to review the results. (Look at the heap_optimize branch.)
Please confirm that there is no major memory problem, and compare it against the existing compression rate.
And if you have any other opinions, please tell me.
The changes are as follows:
Removed dictionary compression, since the jar already compresses it.
Removed Snappy compression of CompressedMorpheme.surface. It consumed most of the execution time and seems to have little compression effect because the surface strings are short.
Added a "compress" option to the elasticsearch plugin.
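To illustrate why compressing each short surface string separately tends not to pay off, here is a rough sketch. It uses the JDK's java.util.zip.Deflater purely as a stand-in because it is in the standard library; it is not Snappy, and the exact byte counts will differ from Snappy's. The object and method names are made up for this example.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.zip.Deflater

// Illustration only: Deflater stands in for Snappy, and the names are hypothetical.
object ShortStringCompression {
  // Returns the number of bytes produced when `text` is compressed on its own.
  def deflatedSize(text: String): Int = {
    val input = text.getBytes(UTF_8)
    val deflater = new Deflater()
    deflater.setInput(input)
    deflater.finish()
    val buffer = new Array[Byte](input.length + 64)
    var total = 0
    // Only the byte count matters here, so the buffer is simply reused.
    while (!deflater.finished()) {
      total += deflater.deflate(buffer)
    }
    deflater.end()
    total
  }

  def main(args: Array[String]): Unit = {
    val surface = "analyzer" // a short, surface-sized string
    val rawSize = surface.getBytes(UTF_8).length
    println(s"raw=$rawSize bytes, deflated=${deflatedSize(surface)} bytes")
  }
}
```

Each compressed value carries fixed framing overhead and a short string contains little redundancy to exploit, so the compressed form can end up larger than the input while still costing CPU time on every access.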
In many ways, your ideas have been very helpful. If we have time, we can increase memory efficiency in more areas.
I have also optimized memory to some extent in BasicMorpheme.
Please also check whether your system can run in the uncompressed mode.
Changes to BasicMorpheme are shown below.
Changed the analysis results from a list to a stream.
Changed "feature" from an Array[String] to a String.
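A rough sketch of the two changes above, under stated assumptions: the class names, the comma-separated feature layout, and the sample value "NNG,*,T" are all hypothetical, and an Iterator is used here as a simple stand-in for the stream mentioned in the thread.

```scala
// Hypothetical sketch; names and the feature layout are assumptions.

// Before: one array object plus one String object per feature field.
final case class ArrayFeatureMorpheme(surface: String, feature: Array[String])

// After: a single String; the fields are split only when a caller asks for them.
final case class StringFeatureMorpheme(surface: String, feature: String) {
  // The -1 limit keeps trailing empty fields instead of dropping them.
  def featureFields: Array[String] = feature.split(",", -1)
}

object LazyResults {
  // Results are built on demand as the caller iterates,
  // rather than being materialized up front as a whole List.
  def analyze(tokens: Seq[String]): Iterator[StringFeatureMorpheme] =
    tokens.iterator.map(t => StringFeatureMorpheme(t, "NNG,*,T"))
}
```

The trade-off is that splitting now happens on every featureFields call, so callers that need the fields repeatedly may want to keep the result around.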
In the long run, we will continue to optimize memory, and we want to keep only one mode.
The function name is CompressedAnalyzer.parse(str).