Harvesting scripts speed up
I am currently harvesting 10000 games for sizes 15 down to 3. After 24 hours I am down to size 6; it will take another 12 hours to finish.
More than 90% of the processor time is spent by oakfoam loading the gammas.
It would be great if the gammas were only loaded once per size, and all games were then run through one oakfoam instance (similar to the solution for training gammas).
This would make my parallel changes unnecessary, I think.
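The proposal above can be sketched structurally as follows. `load_gammas` and `harvest_one_game` are hypothetical stand-ins for the real oakfoam invocations, since the actual commands depend on the scripts in the repository; only the loop structure (setup once per size, then all games) is the point.

```shell
#!/bin/sh
# Structural sketch: do the expensive per-size setup (loading the
# gammas) once, then run every game through that single instance.
# load_gammas and harvest_one_game are hypothetical placeholders.

load_gammas() {
    echo "gammas loaded for size $1"
}

harvest_one_game() {
    echo "harvested $2 at size $1"
}

for size in 5 4 3; do            # the real run would use 15 down to 3
    load_gammas "$size"          # expensive step: now once per size
    for game in game1.sgf game2.sgf; do
        harvest_one_game "$size" "$game"
    done
done
```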
Comments (13)
-
reporter - seems to be easier than I thought. I started (first suggestion) and will let you know.
-
reporter - attached harvest-collection-circular2.sh
Could you have a quick look at this? You will probably see much more quickly than I would if I made something wrong.
Should be a replacement for harvest-collection-circular.sh
Thanks a lot
-
reporter ok, experience: harvesting 10000 games took 2 hours with this new harvest-collection-circular2.sh.
This is about 10 times faster than before; I did not finish the last run with the old script, but stopped it after 30 hours with more than 10 hours left.
How should I handle this? Safest way: I add
harvest-collection-circular2.sh harvest-collection-circular-range2.sh
to my repository.
-
reporter I double checked: it seems to give exactly the same result as the previous script if one sets the featurelist probability to 1.0.
-
repo owner The reason the circular patterns are loaded for each game separately is that, when the patterns were harvested with one process, the output grew very large and the post-processing (sorting and 'uniq -c') would crash.
I had a look at the attached file:
- Lines 20-23 don't make sense there.
- Lines 60-64 will probably take very long when a large number of patterns are found.
How much RAM was used for the post-processing parts?
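For context, the post-processing mentioned above amounts to counting duplicate pattern lines. A minimal illustration, with a tiny made-up sample standing in for the harvested pattern output:

```shell
#!/bin/sh
# Hedged illustration of the post-processing step: count how often
# each harvested pattern occurs. The input here is made-up sample
# data; the real scripts feed in the harvested pattern lines.

printf 'pat_a\npat_b\npat_a\npat_a\npat_c\npat_b\n' > patterns.txt

# sort groups identical lines, uniq -c prefixes each with its count,
# and the final sort orders patterns by frequency (most common first).
sort patterns.txt | uniq -c | sort -rn

rm -f patterns.txt
```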
-
reporter harvest-collection-circular2.sh never used a significant amount of RAM. I did some googling, and it seems the Unix tools sort and uniq use temp files in an intelligent manner. But of course you are right, it takes some time.
I had no crash at all. I will keep you posted. In the original setup, harvest-combine.sh used a lot of RAM and crashed (8 GB in the awk process when harvesting 16000 games).
-
repo owner You say it takes "some time". How long are we talking? I recall now that the time it took was very long and provided zero feedback until it either completed or crashed.
-
reporter when harvesting 10000 games for sizes 15-3 it took about two hours. In this lines it was running about < 10 min for each size. If I remember correctly size 15 took 19 min in total.
The speed up is from the smaller sizes, were loading gammas took >90% of time before.
Detlef
-
repo owner My only issue is that 10 min without any feedback is not ideal; then I am unsure whether the script is working or has failed. I will think a bit about whether it is possible to show progress while doing it in one step.
-
reporter A little help would be to display a dot between the stages (grep, sed, grep, sort, uniq, sort),
so the dead time is perhaps 3 min, but it will increase with higher game counts, of course.
Detlef
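The dot-between-stages idea could be sketched with a small wrapper function; the stage commands below are simple stand-ins, not the real harvesting pipeline:

```shell
#!/bin/sh
# Sketch: print a dot after each stage of the post-processing so a
# long run gives visible feedback. Stage commands are stand-ins.

step() {
    "$@"            # run one stage
    printf '.' >&2  # progress marker on stderr; stdout stays clean
}

printf 'b\na\nb\n' > stage_in.txt
step sort -o stage_sorted.txt stage_in.txt
step sh -c 'uniq -c stage_sorted.txt > stage_counts.txt'
step sort -rn -o stage_final.txt stage_counts.txt
printf '\n' >&2

cat stage_final.txt
rm -f stage_in.txt stage_sorted.txt stage_counts.txt stage_final.txt
```

Writing the dots to stderr keeps them out of any redirected output.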
-
reporter A short note: if sort is crashing, it is probably because /tmp is mounted as a RAM disk?! Disk space is usually not an issue...
Detlef
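If /tmp really is a small RAM disk, sort can be pointed at a disk-backed directory for its intermediate files. A minimal sketch; the directory name is illustrative only:

```shell
#!/bin/sh
# Sketch of the workaround: give sort a disk-backed temp directory
# via -T so its merge files do not fill a RAM-backed /tmp.

SORT_TMP=./sort-tmp          # assumed to live on real disk
mkdir -p "$SORT_TMP"

printf '3\n1\n2\n' > nums.txt
# -T tells sort where to write its intermediate merge files;
# setting TMPDIR has a similar effect for most implementations.
sort -T "$SORT_TMP" -n nums.txt

rm -rf "$SORT_TMP" nums.txt
```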
-
repo owner - changed status to resolved
It seems like this has been resolved
There is another problem with the scripts:
The harvest-combine.sh script uses a lot of RAM; it cannot handle 16000 games on 8 GB of RAM. As I would like to be able to harvest more than 100000 games (this is what is available from KGS players stronger than 6d), and 16 GB of RAM costs $150 :), a solution would be nice.
I could look after this part; do you have time to do the scripting for gammas? I am so very bad with these scripts :(
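One possible direction for the RAM problem, sketched under the assumption that the awk step accumulates counts in an associative array (the usual pattern): replace the in-memory aggregation with a sort | uniq -c stream, which spills to temp files so memory stays bounded. File name and sample data below are illustrative.

```shell
#!/bin/sh
# Lower-memory alternative to an in-memory awk count. An awk
# associative array grows with the number of distinct patterns; the
# streaming version below keeps memory bounded by spilling to disk.

printf 'x\ny\nx\nz\nx\ny\n' > combined.txt

# In-memory counting (can exhaust RAM with many distinct patterns):
#   awk '{c[$0]++} END {for (p in c) print c[p], p}' combined.txt
# Streaming equivalent; GNU sort also accepts -S to cap its buffer
# size, e.g. sort -S 512M.
sort combined.txt | uniq -c | sort -rn

rm -f combined.txt
```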