Add `similarity` option to the `disambiguate` subcommand
Description
The purpose of this merge request is to add the similarity strategy to the `dico disambiguate` subcommand. This strategy compares the definition of the token to disambiguate with the definition of each candidate symbol, then chooses the `symbol_id` of the symbol with the strongest similarity. A weight option was added along with this strategy. It lets the user control how much the similarity comparison counts relative to the order of symbols in the dictionary, which is also taken into account. A weight of 0 is therefore equivalent to the first strategy.
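The following minimal Python sketch makes the interaction between the similarity strategy and the weight option concrete. It is not the code from `src/nlp.py`. Several parts are assumptions made for illustration: the function names, the linear blend `weight * similarity + (1 - weight) * order_score`, the 0 to 1 range for the weight, and the bag-of-words cosine used in place of the spaCy models. It also assumes that the first strategy picks the earliest candidate in dictionary order. Under that assumption, the sketch reproduces the property stated above: a weight of 0 falls back to the first strategy.

```python
import math
import re
from collections import Counter


def bag_of_words_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between word-count vectors of two definitions.

    Hypothetical stand-in for the spaCy-based comparison, used only to keep
    this sketch free of external dependencies.
    """
    counts_a = Counter(re.findall(r"\w+", text_a.lower()))
    counts_b = Counter(re.findall(r"\w+", text_b.lower()))
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a)
    norm = math.sqrt(sum(c * c for c in counts_a.values())) * math.sqrt(
        sum(c * c for c in counts_b.values())
    )
    return dot / norm if norm else 0.0


def disambiguate_by_similarity(token_definition, candidates, weight=1.0):
    """Return the symbol_id of the best candidate (hypothetical helper).

    `candidates` is a list of (symbol_id, definition) pairs in dictionary
    order. Assumed scoring: weight * similarity + (1 - weight) * order_score.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    n = len(candidates)
    best_id, best_score = None, -math.inf
    for rank, (symbol_id, definition) in enumerate(candidates):
        order_score = 1.0 - rank / n  # 1.0 for the first symbol, lower afterwards
        similarity = bag_of_words_similarity(token_definition, definition)
        score = weight * similarity + (1.0 - weight) * order_score
        if score > best_score:  # strict '>' keeps the earlier symbol on ties
            best_id, best_score = symbol_id, score
    return best_id


# Illustrative data, not taken from the project's sample files.
candidates = [
    ("bank_1", "the land alongside a river"),
    ("bank_2", "an institution that keeps money"),
]
token_definition = "an institution that lends money"

print(disambiguate_by_similarity(token_definition, candidates, weight=0.0))  # bank_1
print(disambiguate_by_similarity(token_definition, candidates, weight=1.0))  # bank_2
print(disambiguate_by_similarity(token_definition, candidates, weight=0.5))  # bank_2
```

With spaCy installed, `bag_of_words_similarity(a, b)` could be swapped for `nlp(a).similarity(nlp(b))`, where `nlp` is a loaded pipeline that ships word vectors. The rest of the scoring would stay unchanged.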
While making these changes, some disambiguation functions were moved from `dictionary.py` to `nlp.py` so that the `Dico` class keeps only its essential parts.
Modified files
- `src/disambiguate.py`: adds the strategy and option to the parser
- `src/dictionary.py`: modifies the general disambiguation function of the `Dico` class
- `src/nlp.py`: adds the disambiguation functions for the two strategies
- `README.md`: adds the weight option to the example
- `bats/test_dico_disambiguate.bats`: adds tests for the new strategy and option
- `src/requirements.txt`: adds the spaCy models used for the similarity comparison
Added files
- `sample/json/small_lemma_other_language.json`: sample data for the test