Similarity Methods Explained
Jaccard Similarity
Jaccard similarity measures the overlap between two sets, here the sets of unique words in each text. It is calculated as (size of intersection) / (size of union), so it ranges from 0 (no shared words) to 1 (identical word sets). Learn more on Wikipedia
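The formula can be sketched in a few lines of Python. This is a minimal illustration, not necessarily how this tool computes it: the function name `jaccard_similarity`, the lowercase-and-split-on-whitespace tokenization, and the handling of two empty texts are all assumptions.

```python
def jaccard_similarity(text_a: str, text_b: str) -> float:
    """Jaccard similarity between the sets of unique words in two texts."""
    # Assumption: words are lowercased and split on whitespace.
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    union = words_a | words_b
    if not union:
        # Assumption: two empty texts are treated as identical.
        return 1.0
    return len(words_a & words_b) / len(union)


# 2 shared words ("the", "cat") out of 4 unique words overall.
print(jaccard_similarity("the cat sat", "the cat ran"))  # 0.5
```

Because only unique words count, repeating a word does not change the score.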
Cosine Similarity
Cosine similarity measures the cosine of the angle between two term frequency vectors. A cosine of 1 means the vectors point in the same direction (the same terms in the same proportions), and 0 means no shared terms. Learn more on Wikipedia
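As a rough sketch, the cosine can be computed from word counts as the dot product divided by the product of the vector lengths. The function name `cosine_similarity`, the whitespace tokenization, and returning 0 for an empty text are assumptions made for this example.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between the term frequency vectors of two texts."""
    # Assumption: words are lowercased and split on whitespace.
    freq_a = Counter(text_a.lower().split())
    freq_b = Counter(text_b.lower().split())
    # Only terms present in both texts contribute to the dot product.
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        # Assumption: an empty text has no direction, so report 0.
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity("a b", "c d"))  # 0.0, no shared terms
```

Note that "a b" and "a b a b" score approximately 1 even though the texts differ, because their term frequencies are proportional.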
Levenshtein Distance
The Levenshtein distance (edit distance) is the minimum number of single-character edits (insertions, deletions, or substitutions) needed to transform one string into another. It is displayed here as a normalized percentage similarity. Learn more on Wikipedia
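The distance is usually computed with dynamic programming, as in the sketch below. The normalization shown, dividing by the length of the longer string, is one common choice and only an assumption about how the percentage here is derived. Both function names are hypothetical.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Percentage similarity; assumes normalization by the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        # Assumption: two empty strings are treated as identical.
        return 100.0
    return 100.0 * (1 - levenshtein_distance(a, b) / longest)


print(levenshtein_distance("kitten", "sitting"))  # 3
```

With this normalization, "kitten" and "sitting" (3 edits, longer length 7) come out at roughly 57% similar.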
Manhattan Distance
The Manhattan distance (city block distance) is the sum of the absolute differences between corresponding vector coordinates. Here, the coordinates are term frequencies, so identical word counts give a distance of 0. Learn more on Wikipedia
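A minimal sketch over term frequencies might look like the following. The function name `manhattan_distance` and the whitespace tokenization are assumptions, and the example returns the raw distance rather than any score derived from it.

```python
from collections import Counter


def manhattan_distance(text_a: str, text_b: str) -> int:
    """Sum of absolute differences between the term frequencies of two texts."""
    # Assumption: words are lowercased and split on whitespace.
    freq_a = Counter(text_a.lower().split())
    freq_b = Counter(text_b.lower().split())
    # A Counter returns 0 for a term it has not seen, so terms that
    # appear in only one text contribute their full count.
    return sum(abs(freq_a[t] - freq_b[t]) for t in freq_a.keys() | freq_b.keys())


# "sat" and "ran" each differ by 1; "the" and "cat" match.
print(manhattan_distance("the cat sat", "the cat ran"))  # 2
```

Unlike the similarity measures above, a larger value here means the texts are less alike, and the value grows with text length.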