Semantic Textual Similarity using Machine Learning and Conceptual Relatedness
9 Pages. Posted: 17 Apr 2020
Date Written: April 15, 2020
Abstract
A large amount of data is available today, more than can practically be stored on physical devices. This data contains a huge amount of redundant information that could be grouped together and categorized. We present a system that measures the degree of equivalence between two statements, i.e., Semantic Textual Similarity (STS). Given two textual fragments, the goal of the system is to determine their semantic similarity, i.e., how similar they are in meaning. Our system uses four measures of text similarity: (1) word n-gram overlap, (2) character n-gram overlap, (3) semantic overlap, and (4) conceptual overlap. Using these measures as features, it trains a support vector regression model on SemEval STS data. Evaluation is done using the Pearson correlation coefficient.
Keywords: Semantic Textual Similarity, N-gram, Conceptual Similarity
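The two n-gram features and the evaluation metric named in the abstract can be illustrated with a minimal sketch. The abstract does not give the overlap formula, so Jaccard overlap is assumed here, and all function names are hypothetical. The semantic and conceptual overlap features and the support vector regression step are left out because they depend on external resources the abstract does not specify.

```python
from math import sqrt


def ngrams(items, n):
    """Return the set of contiguous n-grams over a sequence (tokens or characters)."""
    return {tuple(items[i:i + n]) for i in range(len(items) - n + 1)}


def overlap(a, b):
    """Jaccard overlap of two n-gram sets (an assumed formula); 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def word_ngram_overlap(s1, s2, n=1):
    """Overlap of word n-grams after lowercasing and whitespace tokenization."""
    return overlap(ngrams(s1.lower().split(), n), ngrams(s2.lower().split(), n))


def char_ngram_overlap(s1, s2, n=3):
    """Overlap of character n-grams after lowercasing."""
    return overlap(ngrams(s1.lower(), n), ngrams(s2.lower(), n))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


if __name__ == "__main__":
    # Illustrative sentence pair, not taken from the SemEval STS data.
    s1 = "A man is playing a guitar"
    s2 = "A man plays the guitar"
    features = [
        word_ngram_overlap(s1, s2, n=1),
        word_ngram_overlap(s1, s2, n=2),
        char_ngram_overlap(s1, s2, n=3),
    ]
    print(features)
```

In a full pipeline, one such feature vector per sentence pair would be passed to the regression model, and `pearson` would compare the predicted scores with the human-annotated ones.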