When we build, train, and fine-tune language models, we need a way of knowing how well these models ultimately perform. In an ideal world, we would have enough time to evaluate our machine learning system's predictions by hand to get a good understanding of its capabilities. And it does make sense to check a subsample of answers manually, even if only to get a feel for the system. But it's clearly beyond our capacity to evaluate hundreds or thousands of results each time we want to retrain a model. That's why we rely on metrics to tell us how well - or how poorly - a model is doing.

In our recent post on evaluating a question answering model, we discussed the most commonly used metrics for evaluating the Reader node's performance: Exact Match (EM) and F1, which balances precision against recall. EM is a binary metric that returns 1 if two strings (including their positions in a document) are identical and 0 if they aren't. Both EM and F1 measure performance in terms of lexical overlap. However, both metrics sometimes fall short when evaluating semantic search systems. That's why we're excited to introduce a new metric: Semantic Answer Similarity (SAS).

We first introduced SAS in August of 2021 with a paper that was accepted at the Conference on Empirical Methods in Natural Language Processing (EMNLP). Like the language models that we employ in question answering and other NLP tasks, the SAS metric builds upon Transformers. Rather than measuring lexical overlap, it compares two answer strings based on their semantic similarity, allowing it to better approximate human judgment than both EM and F1. In this blog post, we'll show you how to use SAS in Haystack and provide some interpretation guidelines.
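Before we get to SAS, it helps to see what the lexical metrics actually compute. The snippet below is a minimal, simplified sketch of EM and token-level F1 for a single answer pair; the function names, whitespace tokenization, and lowercasing are our own simplifications, not the official SQuAD or Haystack evaluation code, which applies additional normalization and counts token multiplicities.

```python
def exact_match(prediction: str, gold: str) -> int:
    """Return 1 only if the two answer strings are identical, else 0."""
    return int(prediction.strip().lower() == gold.strip().lower())


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1: harmonic mean of precision and recall over shared tokens."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    common = set(pred_tokens) & set(gold_tokens)
    if not common:
        return 0.0
    precision = len(common) / len(pred_tokens)
    recall = len(common) / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Berlin", "the city of Berlin"))  # 0: the strings are not identical
print(f1_score("Berlin", "the city of Berlin"))     # 0.4: only partial token overlap
```

As the example shows, a prediction that any human would accept as correct can still score 0 on EM and poorly on F1, simply because the surface forms differ.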
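SAS, by contrast, scores the semantic similarity of the predicted and gold answers with a Transformer model. The sketch below illustrates the underlying idea using a cross-encoder from the sentence-transformers library; it is not the Haystack evaluation API itself, and the model name is just an example of a publicly available STS cross-encoder, not necessarily the model Haystack uses by default.

```python
# Rough sketch of the idea behind SAS: let a Transformer-based cross-encoder
# judge how similar a predicted answer and a gold answer are in meaning.
from sentence_transformers import CrossEncoder

# Example STS cross-encoder, used here purely for illustration.
model = CrossEncoder("cross-encoder/stsb-roberta-large")

prediction = "Berlin"
gold_answer = "the city of Berlin"

# The cross-encoder reads both strings jointly and returns a similarity score
# (roughly between 0 and 1 for STS-trained models). Lexically different but
# semantically equivalent answers score high, unlike with EM and F1.
score = model.predict([(prediction, gold_answer)])[0]
print(f"SAS-style similarity: {score:.2f}")
```

In an actual evaluation run you would compute such a score for every question in your test set and aggregate the results, which is what the SAS metric in Haystack does for you.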