Matrix Operations: Softmax Implementation

Issue #217 wontfix
Bita HashemiNezhad created an issue

Hello, I get a different result when I calculate the softmax of a matrix by the definition found at: https://en.wikipedia.org/wiki/Softmax_function

So I wrote my own softmax function in Python, and for the example that you provided I get:

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 1.0, 2.0],
              [3.0, 4.0, 1.0]])

def softmax(x, axis=-1):
    # subtract the maximum along the axis for numerical stability
    y = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return y / np.sum(y, axis=axis, keepdims=True)

softmax(A)

array([[ 0.09003057,  0.24472847,  0.66524096],
       [ 0.84379473,  0.04201007,  0.1141952 ],
       [ 0.25949646,  0.70538451,  0.03511903]])

This is the same example using Blaze:

#include <iostream>
#include <blaze/Math.h>
int main()
{
    blaze::StaticMatrix<double, 3UL, 3UL> A{ { 1.0, 2.0, 3.0 }
                                     , { 4.0, 1.0, 2.0 }
                                     , { 3.0, 4.0, 1.0 } };
    blaze::StaticMatrix<double, 3UL, 3UL> B;
    B = blaze::softmax(A);     
    std::cout << B << "\n";
    return 0;
}

// ( 0.0157764 0.0428847 0.116573  )
// ( 0.316878  0.0157764 0.0428847 )
// ( 0.116573  0.316878  0.0157764 )

This result matches the one shown at https://bitbucket.org/blaze-lib/blaze/wiki/Matrix%20Operations#!softmax

Therefore, I checked TensorFlow's result:

import tensorflow as tf
import numpy as np
a = tf.constant(np.array([[1.0, 2.0, 3.0],
     [4.0, 1.0, 2.0], 
     [3.0, 4.0, 1.0]]))
with tf.Session() as s:
    print(s.run(tf.nn.softmax(a)))

[[ 0.09003057  0.24472847  0.66524096]
 [ 0.84379473  0.04201007  0.1141952 ]
 [ 0.25949646  0.70538451  0.03511903]]

I think the denominator should be a vector of per-row sums, not a single scalar as in the vector implementation:

template< typename VT, bool TF >
VT softmax( const blaze::Vector<VT,TF>& v )
{
   VT tmp( exp( ~v ) );
   const auto scalar( sum( ~tmp ) );
   tmp /= scalar;
   return tmp;
}

Would you please explain what I am missing?
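To make the difference concrete, the two conventions can be compared directly in NumPy (a sketch; `axis=None` reproduces Blaze's whole-matrix behaviour, `axis=-1` the NumPy/TensorFlow convention):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 1.0, 2.0],
              [3.0, 4.0, 1.0]])

def softmax(x, axis=None):
    # axis=None: one normalization over all elements (what Blaze does);
    # axis=-1: each row is normalized separately (NumPy/TensorFlow default).
    # Subtracting the maximum keeps exp() from overflowing.
    y = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return y / np.sum(y, axis=axis, keepdims=True)

whole = softmax(A)             # all nine entries sum to 1
rowwise = softmax(A, axis=-1)  # each row sums to 1
```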

Comments (3)

  1. Mikhail Katliar

    Hello @taless474 , it looks like Blaze is computing softmax across all elements of its matrix argument, while your code and TensorFlow compute it row-wise (along the last axis).

    From blaze/math/dense/DenseMatrix.h:

    template< typename MT  // Type of the dense matrix
            , bool SO >    // Storage order
    auto softmax( const DenseMatrix<MT,SO>& dm )
    {
       auto tmp( evaluate( exp( ~dm ) ) );
       const auto scalar( sum( ~tmp ) );
       tmp /= scalar;
       return tmp;
    }
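    To double-check, the whole-matrix normalization in the snippet above can be reproduced in NumPy (a sketch; variable names are mine), and it gives exactly the numbers Blaze prints:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 1.0, 2.0],
              [3.0, 4.0, 1.0]])

# One shared denominator over all nine elements, as in Blaze's softmax()
tmp = np.exp(A)
B = tmp / tmp.sum()
print(B[0])  # ≈ [0.0157764  0.0428847  0.116573]
```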
    
  2. Klaus Iglberger

    Hi Bita!

    Thanks for raising this issue. @mkotlyar is correct: the Blaze softmax() function computes the result from all matrix elements, so the sum of all elements of the resulting matrix is 1, as defined in the first paragraph of the Wikipedia article. From my point of view, Wikipedia doesn't provide a formal definition of how to handle matrices or higher-dimensional data structures, and unfortunately there is also no example of a softmax() evaluation on a matrix.

    The Blaze implementation tries to provide you with all options: You can compute the softmax() from all matrix elements (i.e. just call softmax() on the matrix) or compute softmax() row- or column-wise (i.e. call softmax() on each row or column of the matrix). For instance:

    StaticMatrix<double,3UL,3UL> A{ { 1.0, 2.0, 3.0 }
                                  , { 4.0, 1.0, 2.0 }
                                  , { 3.0, 4.0, 1.0 } };
    
    StaticMatrix<double,3UL,3UL> B;
    B = softmax( A );  // Computing softmax for the complete matrix, sum( B ) == 1
    
    StaticMatrix<double,3UL,3UL> C;
    for( size_t i=0UL; i<3UL; ++i ) {
       row( C, i ) = softmax( row( A, i ) );  // Computing softmax row-wise, sum( C ) == 3
    }
    
    StaticMatrix<double,3UL,3UL> D;
    for( size_t i=0UL; i<3UL; ++i ) {
       column( D, i ) = softmax( column( A, i ) );  // Computing softmax column-wise, sum( D ) == 3
    }
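    For comparison, the three variants can be mirrored in NumPy (a sketch; the variable names mirror the Blaze snippet above, and the max-subtraction trick is omitted since the inputs are small):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 1.0, 2.0],
              [3.0, 4.0, 1.0]])

E = np.exp(A)
B = E / E.sum()                       # whole matrix:  B.sum() == 1
C = E / E.sum(axis=1, keepdims=True)  # row-wise:      C.sum() == 3
D = E / E.sum(axis=0, keepdims=True)  # column-wise:   D.sum() == 3
```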
    

    However, I agree that Blaze should provide a more convenient way to get the same results as TensorFlow. Thus we will extend Blaze with an overload of softmax() that allows computing a row-wise or column-wise softmax() (see issue #218). Until the implementation is finished, you can use the following softmax() overload:

    namespace blaze {
    
    template< bool RF      // Reduction flag
            , typename MT  // Type of the dense matrix
            , bool SO >    // Storage order
    auto softmax( const DenseMatrix<MT,SO>& dm )
    {
       auto tmp( evaluate( exp( ~dm ) ) );
    
       if( RF == rowwise ) {
          for( size_t i=0UL; i<tmp.rows(); ++i ) {
             auto r = row( tmp, i, unchecked );
             const auto scalar( sum( r ) );
             r /= scalar;
          }
       }
       else {
          for( size_t j=0UL; j<tmp.columns(); ++j ) {
             auto c = column( tmp, j, unchecked );
             const auto scalar( sum( c ) );
             c /= scalar;
          }
       }
    
       return tmp;
    }
    
    } // namespace blaze
    

    Example:

    StaticMatrix<double,3UL,3UL> B;
    B = softmax<rowwise>( A );  // A as defined above; each row of B sums to 1
    

    Please note, though, that this implementation hasn't been tested yet. I hope this helps!

    Best regards,

    Klaus!

  3. Bita HashemiNezhad reporter

    Thank you for your comprehensive response. I will follow issue #218 and use the examples you provided in the meantime.

    Bita
