Matrix Operations: Softmax Implementation
Hello, I get a different result when I compute the softmax of a matrix according to the definition at https://en.wikipedia.org/wiki/Softmax_function.

I wrote my own softmax function in Python, and for the example that you provided I get:
import numpy as np

A = [[1.0, 2.0, 3.0],
     [4.0, 1.0, 2.0],
     [3.0, 4.0, 1.0]]

def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability
    y = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return y / np.sum(y, axis=axis, keepdims=True)

softmax(np.array(A))
array([[ 0.09003057,  0.24472847,  0.66524096],
       [ 0.84379473,  0.04201007,  0.1141952 ],
       [ 0.25949646,  0.70538451,  0.03511903]])
Here is the same example using Blaze:
#include <iostream>
#include <blaze/Math.h>

int main()
{
   blaze::StaticMatrix<double, 3UL, 3UL> A{ { 1.0, 2.0, 3.0 }
                                          , { 4.0, 1.0, 2.0 }
                                          , { 3.0, 4.0, 1.0 } };

   blaze::StaticMatrix<double, 3UL, 3UL> B;
   B = blaze::softmax(A);

   std::cout << B << "\n";

   return 0;
}
// ( 0.0157764  0.0428847  0.116573  )
// ( 0.316878   0.0157764  0.0428847 )
// ( 0.116573   0.316878   0.0157764 )
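Note that these nine entries together sum to 1, i.e. Blaze normalizes over the whole matrix rather than per row. A quick check (appended to the program above):

std::cout << blaze::sum( B ) << "\n";  // prints 1: the total over all nine entries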
This matches the result shown at https://bitbucket.org/blaze-lib/blaze/wiki/Matrix%20Operations#!softmax

I therefore also checked Tensorflow's result:
import tensorflow as tf
import numpy as np

a = tf.constant(np.array([[1.0, 2.0, 3.0],
                          [4.0, 1.0, 2.0],
                          [3.0, 4.0, 1.0]]))

with tf.Session() as s:
    print(s.run(tf.nn.softmax(a)))
[[ 0.09003057  0.24472847  0.66524096]
 [ 0.84379473  0.04201007  0.1141952 ]
 [ 0.25949646  0.70538451  0.03511903]]
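The two outputs differ only in the normalization. For the first row (1, 2, 3), the exponentials are roughly (2.718, 7.389, 20.086), which sum to about 30.19; dividing by 30.19 gives (0.0900, 0.2447, 0.6652), the NumPy/Tensorflow values. Dividing instead by the sum of all nine exponentials (about 172.3) gives (0.0158, 0.0429, 0.1166), the Blaze values.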
I think that for a matrix the denominator should be a vector (one sum per row), not a single scalar as in the vector overload:
template< typename VT    // Type of the vector
        , bool TF >      // Transpose flag
VT softmax( const blaze::Vector<VT,TF>& v )
{
   VT tmp( exp( ~v ) );               // element-wise exponential
   const auto scalar( sum( ~tmp ) );  // single scalar sum
   tmp /= scalar;
   return tmp;
}
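For what it's worth, applying this vector overload to a single row does produce the expected values (the expected output in the comment is taken from the Tensorflow result above):

blaze::StaticVector<double, 3UL, blaze::rowVector> v{ 1.0, 2.0, 3.0 };
std::cout << blaze::softmax( v ) << "\n";  // 0.0900306  0.244728  0.665241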
Would you please explain what I am missing?
Comments (3)

-

Hello @taless474 , it looks like Blaze is computing softmax across all elements of its matrix argument, while your code and Tensorflow compute it row-wise.

From blaze/math/dense/DenseMatrix.h:
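(The quoted overload presumably mirrors the vector overload shown above, normalizing by a single scalar sum over all elements; a sketch along those lines, not the verbatim Blaze source:)

template< typename MT    // Type of the dense matrix
        , bool SO >      // Storage order
auto softmax( const DenseMatrix<MT,SO>& dm )
{
   auto tmp( evaluate( exp( ~dm ) ) );
   const auto scalar( sum( ~tmp ) );  // one scalar sum over ALL elements
   tmp /= scalar;
   return tmp;
}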
- changed status to wontfix
Hi Bita!
Thanks for raising this issue. @mkotlyar is correct, the Blaze softmax() function computes the result from all matrix elements. Thus the sum of all elements of the resulting matrix is 1, as defined in the first paragraph at Wikipedia. From my point of view, Wikipedia doesn't seem to provide a formal definition of how to deal with matrices or higher-dimensional data structures. Unfortunately, there is also no example of a softmax() evaluation on a matrix.

The Blaze implementation tries to provide you with all options: you can compute softmax() from all matrix elements (i.e. just call softmax() on the matrix), or compute softmax() row- or column-wise (i.e. call softmax() on each row or column of the matrix). For instance:

StaticMatrix<double,3UL,3UL> A{ { 1.0, 2.0, 3.0 }
                              , { 4.0, 1.0, 2.0 }
                              , { 3.0, 4.0, 1.0 } };

StaticMatrix<double,3UL,3UL> B;
B = softmax( A );  // Computing softmax for the complete matrix, sum( B ) == 1

StaticMatrix<double,3UL,3UL> C;
for( size_t i=0UL; i<3UL; ++i ) {
   row( C, i ) = softmax( row( A, i ) );  // Computing softmax row-wise, sum( C ) == 3
}

StaticMatrix<double,3UL,3UL> D;
for( size_t i=0UL; i<3UL; ++i ) {
   column( D, i ) = softmax( column( A, i ) );  // Computing softmax column-wise, sum( D ) == 3
}
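Printing the row-wise result C should reproduce the Tensorflow/NumPy values from above (expected values shown as comments):

std::cout << C << "\n";
// ( 0.0900306  0.244728   0.665241  )
// ( 0.843795   0.0420101  0.114195  )
// ( 0.259496   0.705385   0.0351190 )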
However, I agree that Blaze should provide a more convenient way to get the same results as Tensorflow. Thus we will extend Blaze with an overload for softmax(), which allows computing a row- or column-wise softmax() (see issue #218). Until we have finished the implementation, you can use the following softmax() overload:

namespace blaze {

template< bool RF        // Reduction flag
        , typename MT    // Type of the dense matrix
        , bool SO >      // Storage order
auto softmax( const DenseMatrix<MT,SO>& dm )
{
   auto tmp( evaluate( exp( ~dm ) ) );

   if( RF == rowwise ) {
      for( size_t i=0UL; i<tmp.rows(); ++i ) {
         auto r = row( tmp, i, unchecked );
         const auto scalar( sum( r ) );
         r /= scalar;
      }
   }
   else {
      for( size_t j=0UL; j<tmp.columns(); ++j ) {
         auto c = column( tmp, j, unchecked );
         const auto scalar( sum( c ) );
         c /= scalar;
      }
   }

   return tmp;
}

} // namespace blaze
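A remark on this workaround: the unchecked flag passed to row() and column() skips the runtime bounds checks when creating the views, which is safe here because the loop indices are bounded by tmp.rows() and tmp.columns().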
Example:

StaticMatrix<double, 3UL, 3UL> B;
B = softmax<rowwise>( A );  // row-wise softmax of the matrix A from above; each row of B sums to 1
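The column-wise variant should work analogously (assuming the blaze::columnwise flag, which the overload above falls back to for RF != rowwise):

StaticMatrix<double, 3UL, 3UL> E;
E = softmax<columnwise>( A );  // each column of E sums to 1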
Please note, though, that this implementation hasn't been tested yet! I hope this helps,
Best regards,
Klaus!
-

reporter: Thank you for your comprehensive response. I will follow issue #218 and use the code you provided.

Bita