I would want to check why the results are slightly different. Is precision being lost, and does that matter? If so, how much, and is it significant? Is a different type being used somewhere (long double or float)? It could be that something is being cast incorrectly and losing precision, which might matter for some applications, even if it is only a tiny bit.
Some precision is being lost, but not a lot: e.g. 38.71999999999999 != 38.72. Whether that could be significant, I don't know. Looking more closely, the second number (from the Python code path) is, more precisely, 38.719999999999998863131622783839702606201171875.
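One quick way to size the discrepancy, and to check the cast hypothesis raised above, is to compare it with the error that a round trip through float would introduce. This is a minimal sketch with made-up helper names, assuming IEEE 754 doubles and floats (as on mainstream platforms):

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Error introduced by squeezing a double through float and back.
 * Precondition: x must fit in a float, otherwise the cast is undefined. */
static double float_roundtrip_error(double x)
{
    assert(isfinite(x) && fabs(x) <= FLT_MAX);
    return fabs((double)(float)x - x);
}

/* Gap between the two results quoted above, parsed back into doubles.
 * The operands are within a factor of two of each other, so the
 * subtraction itself is exact. */
static double observed_discrepancy(void)
{
    return fabs(38.72 - 38.71999999999999);
}
```

Under that assumption, observed_discrepancy() comes out at exactly 2^-47 (about 7.1e-15): the two results are adjacent doubles, one ulp apart, and the long expansion quoted above is simply the exact value of the double nearest 38.72. A round trip through float moves 38.72 by roughly 1.2e-6, so a stray float cast would show up as a far larger error than the one reported.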
It appears the C code is losing slightly more precision than the equivalent Python code.
Looking at this line in _vector_distance_helper:
tmp is a double, and ...AsDouble should return a double, but is it possible that self->coords[i] is losing precision? The relevant line at vector creation is vec->coords = PyMem_New(double, vec->dim);.
The precision is being lost in the accumulator. So we should just add special cases for the vector lengths we support, plus a fallback to the current code for other lengths. I think the dot implementation has the same problem. I haven't tested this, but I reckon that is it.
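The suggestion above could look roughly like this: explicit cases for the supported lengths, with the existing loop kept as the fallback. The function name, its signature and the choice of 2 and 3 as the supported lengths are assumptions for illustration, not the project's actual code. Note that the written-out cases perform the same additions in the same order as the loop, so whether they change the rounding still depends on what the compiler does with them; like the suggestion itself, this is untested against the reported case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: squared distance between two coordinate arrays of
 * length dim. The 2- and 3-component cases are written out explicitly
 * (assumed to be the supported lengths); any other length falls back to
 * the generic accumulating loop. */
static double distance_squared(const double *a, const double *b, size_t dim)
{
    assert(dim == 0 || (a != NULL && b != NULL));

    switch (dim) {
    case 2: {
        double dx = a[0] - b[0];
        double dy = a[1] - b[1];
        return dx * dx + dy * dy;
    }
    case 3: {
        double dx = a[0] - b[0];
        double dy = a[1] - b[1];
        double dz = a[2] - b[2];
        return dx * dx + dy * dy + dz * dz;
    }
    default: {
        /* Generic fallback: running sum of squared differences. */
        double acc = 0.0;
        for (size_t i = 0; i < dim; ++i) {
            double tmp = a[i] - b[i];
            acc += tmp * tmp;
        }
        return acc;
    }
    }
}
```

The same pattern (a switch on the length with a loop in the default branch) would apply to the dot product.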
On the general discussion about precision:
I did not pay attention to the numeric stability of the code. It is a pretty straightforward, naive implementation.
I hoped that for the typical use case (games) it wouldn't become an issue, but I guess Murphy's Law will strike at some point.
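If the loss does turn out to be in the accumulator, one standard remedy for a naive running sum (not something proposed in this thread) is compensated summation. The sketch below swaps in Neumaier's variant of Kahan summation; the function names are hypothetical, and it only compensates the rounding of the additions, not the rounding of each squared term. Aggressive optimization flags such as -ffast-math may let the compiler simplify the compensation away.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Naive running sum of squared differences, as in a straightforward loop. */
static double sum_sq_diff_naive(const double *a, const double *b, size_t dim)
{
    assert(dim == 0 || (a != NULL && b != NULL));
    double acc = 0.0;
    for (size_t i = 0; i < dim; ++i) {
        double d = a[i] - b[i];
        acc += d * d;
    }
    return acc;
}

/* Same sum with Neumaier compensation: comp collects the low-order bits
 * that each addition to sum would otherwise round away. */
static double sum_sq_diff_compensated(const double *a, const double *b, size_t dim)
{
    assert(dim == 0 || (a != NULL && b != NULL));
    double sum = 0.0, comp = 0.0;
    for (size_t i = 0; i < dim; ++i) {
        double d = a[i] - b[i];
        double term = d * d;          /* this product is still rounded */
        double t = sum + term;
        if (fabs(sum) >= fabs(term))
            comp += (sum - t) + term; /* low bits of term were lost */
        else
            comp += (term - t) + sum; /* low bits of sum were lost */
        sum = t;
    }
    return sum + comp;
}
```

With one large term (1e8 squared) followed by two 1.0 terms, the naive loop returns 1e16 because each 1.0 is rounded away, while the compensated version returns 1e16 + 2. For two or three components the naive sum is already accurate to within a few ulps, so this matters mainly for longer vectors or components of widely different magnitude.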