Commits

tiedeman  committed 4f8f6a7

fixed bug in identify_file

  • Parent commits 1ca0db9

Files changed (2)

File Lingua-Identify-Blacklists/Changes

 	- read from files with line-length-limit
 	- integrated general-purpose language identifier (Lingua::Identify::CLD)
 
+0.04
+	- fixed a bug in identify file (classification was based on 64k only)

File Lingua-Identify-Blacklists/lib/Lingua/Identify/Blacklists.pm

 	}
 
 	# prepare the data for blacklist classification
-	# (TODO: we do not run blacklists all the time - 
-	#        schould we process the text later when needed?)
+	# (TODO: is this cheaper than keeping the text in memory and
+	#        processing it later when needed?)
 	chomp $line;
 	&process_string($line,\%dic,$total);
 	if ($options{text_size}){        # use only a certain number of words
 
     # finally: classify with blacklists
     if (exists $options{langs}){
-	# finally: process the text and classify
-	&process_string( $text, \%dic, $total );
 	return &classify( \%dic, %options );
     }
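
Note on the change: the diff moves the process_string call from the end of the function, where it ran on the buffered $text, into the line-reading loop. Going by the Changes entry, the buffer apparently held only the first 64k of the file, so the blacklist classifier never saw the rest. The following is a minimal sketch of the per-line pattern, not the module's actual code; count_tokens and count_stream are hypothetical stand-ins, and splitting on whitespace is an assumption about what process_string does.

```perl
use strict;
use warnings;

# Hypothetical stand-in for process_string: add the whitespace-separated
# tokens of one line to a shared frequency table and a running total.
sub count_tokens {
    my ($line, $dic, $total_ref) = @_;
    for my $token (split ' ', $line) {
        $dic->{$token}++;
        $$total_ref++;
    }
    return;
}

# Read a filehandle line by line and update the table for every line,
# so the counts cover the whole input rather than a size-limited buffer.
sub count_stream {
    my ($fh) = @_;
    my %dic;
    my $total = 0;
    while (my $line = <$fh>) {
        chomp $line;
        count_tokens($line, \%dic, \$total);
    }
    return (\%dic, $total);
}

# Usage with an in-memory filehandle (sample data made up for illustration).
my $sample = "a b a\nc a\n";
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my ($dic, $total) = count_stream($fh);
close $fh;
print "$total tokens, a=$dic->{a}\n";   # prints "5 tokens, a=3"
```

Counting per line keeps memory use proportional to the vocabulary rather than the file size, which is the trade-off the new TODO comment in the diff asks about.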