nrnr / perfnotes.txt

Create neighborhood: 15.615 sec
80r = Search block 5120872: 0.296 sec
40r = Search block 639449: 0.114 sec
20r = Search block 79971: 0.047 sec
10r = Search block 20031: 0.021 sec
Cleanup: 0.012 sec

40r 4d: sig11
40r 3d: .119 sec
40r 2d: .057 sec
40r 1d: .023 sec

inserting 10000000 random vectors... 29.000 sec
80r = range query returned 2680049 items in 1.02700 sec
40r = range query returned 334901 items in 0.16200 sec
20r = range query returned 41902 items in 0.02000 sec
10r = range query returned 5175 items in 0.00300 sec
cleaned up in 2.32100 sec

Create Kdtree: 12.735sec
80r = Search: 2680049, 0.512sec
40r = Search: 334901, 0.076sec  ! crazy, similar speed between 20 and 40 unit radius !
20r = Search: 41902, 0.07sec
10r = Search: 5175, 0.001sec
Cleanup: 0.0279999sec

On the creation of the neighborhood. Reduce the 'inverse' to a single array instead
of one per dimension. This shortens create time from 16.5s to 12.0s. Reduces memory
by 76mb
    Create neighborhood: 12.069 sec
    Search block 639983: 0.116 sec
    Cleanup: 0.011 sec

Area of a unit sphere is ~.52. Validated by the number of items found
in the rectangular search compared to the spherical search of kdtree and nanoflann

Minimize the code inside the search loop. Gives about 20% performance.
Not digging in the right place, but gets the 40radius test to .1 second.
    Create neighborhood: 12.076 sec
    Search block 639983: 0.09 sec
    Cleanup: 0.010 sec

Switched the code to work in 'chunks'. This breaks the array of point numbers
into small sequential ranges. The algorithm wins with these smaller sections.
The other win is that all the array offsets now only index the chunk. By making
the chunk size smaller we can fit the indices into uint8 or a uint16. Cuts a
good 40% off of runtime and really drops memory. We could use still significantly
less memory by using some form of stack to store the results in, currently makes
a size_t array of numPoints size, which means a good 40MB.

Notice the sizes of results is varying here, so there is some problem which could
strike these results as irrelevent. I am highly dubious since these are getting
about half of the results as the unchunked.

Wow, look at the time needed to build the hood. Super fast, and that is (?) valid time.
(Hulk Hogan eat your heart out!)

    -- with 10k chunks in size_t index
    Create neighborhood: 3.332 sec
    Search complete 640842: 0.066 sec
    Cleanup: 0.008 sec

    -- with 10k chunks in uint16 index
    Create neighborhood: 2.942 sec
    Search complete 592354: 0.054 sec
    Cleanup: 0.010 sec

    -- with 255 chunks in uint8 index
    Create neighborhood: 0.887 sec
    Search complete 332549: 0.053 sec
    Cleanup: 0.008 sec

Memory needed to create the 10mill array of points (baseline). So hood and results
are amount of memory over this.