Calculate TF-IDF weight.
Lucene 4.6's DefaultSimilarity TF-IDF calculation uses the formula:
sqrt(termFreq) * (log(numDocs / (docFreq + 1)) + 1.0)
Note: this is consistent with the MapReduce seq2sparse implementation of TF-IDF weights. It differs slightly from Spark MLlib's TF-IDF calculation, which is implemented as:
termFreq * log((numDocs + 1.0) / (docFreq + 1.0))
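term frequency (number of times the term occurs in the document)
document frequency (number of documents that contain the term)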
length of the document (unused in this calculation)
the total number of documents in the corpus
The TF-IDF weight as calculated by Lucene 4.6's DefaultSimilarity
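To make the difference between the two formulas concrete, here is a minimal Python sketch of both. The function names `lucene_tfidf` and `mllib_tfidf` are hypothetical, chosen only for illustration, and the sketch assumes the natural logarithm in both formulas. The unused document-length parameter is left out.

```python
import math


def lucene_tfidf(term_freq, doc_freq, num_docs):
    """sqrt(termFreq) * (log(numDocs / (docFreq + 1)) + 1.0)

    Hypothetical helper mirroring the Lucene-style formula quoted above.
    """
    tf = math.sqrt(term_freq)
    idf = math.log(num_docs / (doc_freq + 1)) + 1.0
    return tf * idf


def mllib_tfidf(term_freq, doc_freq, num_docs):
    """termFreq * log((numDocs + 1.0) / (docFreq + 1.0))

    Hypothetical helper mirroring the MLlib-style formula quoted above.
    """
    return term_freq * math.log((num_docs + 1.0) / (doc_freq + 1.0))


if __name__ == "__main__":
    # Same term statistics, two different weights.
    print(lucene_tfidf(4, 9, 100))
    print(mllib_tfidf(4, 9, 100))
```

Two differences stand out in this sketch: the Lucene-style variant dampens the term frequency with a square root and adds 1.0 to the IDF, while the MLlib-style variant uses the raw term frequency and yields a weight of zero for a term that appears in every document.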