PearNorm inconsistency (Distancer)

Issue #24 (resolved), created by Qian Zhu

In the CPCL::Distance() function in pcl.cpp:

In rare cases, Pearson(x,y) gives a different result from Pearson(y,x). For example, take this line:

    Dat.Set(i, j, (float) pMeasure->Measure(adOne, PCL.GetExperiments(), PCL.Get(iTwo), PCL.GetExperiments(), eMap, adWeights, adWeights));

Swapping the order of the two vectors:

    Dat.Set(i, j, (float) pMeasure->Measure(PCL.Get(iTwo), PCL.GetExperiments(), adOne, PCL.GetExperiments(), eMap, adWeights, adWeights));

gives a different result for the PCL file GSE3788.GPL96.pcl (attached) when X=5743 and Y=916 (Entrez gene IDs).

The results are 4.90 for Pearson(x,y) and 4.60 for Pearson(y,x).

I haven't figured out the exact lines causing this bug, but for now, use caution with the Pearson or PearNorm measures in Distancer.
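Scores of 4.90 and 4.60 lie outside [-1, 1], so they are presumably transformed values rather than raw correlations. As a rough illustration only, the sketch below assumes the transform behaves like a Fisher z-transform (atanh); the report does not state this, and FisherZ and FisherZSensitivity are hypothetical helper names, not Sleipnir functions. It shows how strongly such a transform amplifies small differences when the raw correlation is close to 1.

```cpp
#include <cmath>

// Fisher z-transform of a correlation r in (-1, 1).
// Assumed here for illustration; not taken from the Sleipnir source.
double FisherZ(double r) {
    return std::atanh(r);
}

// Derivative dz/dr = 1 / (1 - r^2): how much a small change in r
// is magnified in z. It grows without bound as |r| approaches 1.
double FisherZSensitivity(double r) {
    return 1.0 / (1.0 - r * r);
}
```

Under this assumption, FisherZ(0.9999) is about 4.95 and FisherZ(0.9998) is about 4.61, so a raw difference near 0.0001 would be enough to produce a gap of the reported size. Any further normalization applied after the transform would change these numbers.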

Comments (4)

  1. Casey Greene

    PearNorm just calls Pearson and then transforms the result, so the bug has to be in Pearson itself.

    It looks like you're looking at quite extreme values (4.9 and 4.6 must be normalized Pearson scores). What are the raw correlation values?

    Perhaps there are differences in these floating point calculations:

            else {
                    if (dDX)
                            dRet /= sqrt(dDX);
                    if (dDY)
                            dRet /= sqrt(dDY);

    Given that

     if (!dDX || !dDY)

    was just checked, can you convert those lines to

            else {
                    dRet /= ( sqrt(dDX) * sqrt( dDY ) );

    to see if that fixes the problem?
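The suggestion above can be tried in isolation with a small sketch. NormalizeSequential, NormalizeProduct and CountAsymmetricPairs are hypothetical names introduced for this example; they only mimic the quoted normalization step and are not part of Sleipnir. The sequential form rounds after each division, so its result may depend on which variance is divided out first. The product form multiplies the two square roots first, and since floating-point multiplication is commutative, swapping dDX and dDY cannot change its result.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the two-step division quoted above.
double NormalizeSequential(double dRet, double dDX, double dDY) {
    if (dDX) dRet /= std::sqrt(dDX);
    if (dDY) dRet /= std::sqrt(dDY);
    return dRet;
}

// Hypothetical stand-in for the suggested single division.
// Assumes dDX and dDY are both nonzero, as the earlier check guarantees.
double NormalizeProduct(double dRet, double dDX, double dDY) {
    return dRet / (std::sqrt(dDX) * std::sqrt(dDY));
}

// Counts the (dDX, dDY) pairs for which swapping the two arguments
// changes the normalized result.
template <typename Fn>
std::size_t CountAsymmetricPairs(Fn normalize, const std::vector<double>& values) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < values.size(); ++i)
        for (std::size_t j = i + 1; j < values.size(); ++j)
            if (normalize(1.0, values[i], values[j]) !=
                normalize(1.0, values[j], values[i]))
                ++count;
    return count;
}
```

Calling CountAsymmetricPairs(NormalizeProduct, values) should return 0 for any set of positive values, while the same call with NormalizeSequential may return a nonzero count. Differences of this kind affect only the last bits of the result, so whether they account for the reported gap would still need to be confirmed on the attached data.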
