
Add `similarity` option to disambiguate subcommand

Hébert-Pinard, Charlie requested to merge dico-disambiguate into master

Description

This merge request adds the similarity strategy to the dico disambiguate subcommand. The strategy compares two definitions (that of the token to disambiguate and that of each candidate symbol) and chooses the symbol_id of the symbol with the strongest similarity. Alongside this strategy, a weight option has been added. It lets the user set how much weight the similarity comparison carries relative to the order of the symbols in the dictionary. A weight of 0 is then equivalent to the first strategy.
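For reviewers, here is a minimal sketch of how the weight could blend similarity with dictionary order. It is not the code from this merge request. The function names, the linear blending formula, the 0 to 1 weight range, and the assumption that the first strategy picks the first symbol in dictionary order are all hypothetical. A simple token-overlap (Jaccard) score stands in for the spaCy similarity so that the example runs without a model.

```python
from typing import List, Tuple


def jaccard_similarity(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for the spaCy-based comparison."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def pick_symbol(
    token_definition: str,
    candidates: List[Tuple[str, str]],  # (symbol_id, definition), in dictionary order
    weight: float = 1.0,
) -> str:
    """Return the symbol_id with the best blended score (hypothetical formula)."""
    if not candidates:
        raise ValueError("at least one candidate symbol is required")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight is assumed to lie in [0, 1]")

    n = len(candidates)
    best_id, best_score = candidates[0][0], float("-inf")
    for index, (symbol_id, definition) in enumerate(candidates):
        # Earlier symbols get a higher order score: 1.0 for the first one.
        order_score = 1.0 - index / n
        similarity = jaccard_similarity(token_definition, definition)
        score = weight * similarity + (1.0 - weight) * order_score
        if score > best_score:  # strict comparison keeps the earliest on ties
            best_id, best_score = symbol_id, score
    return best_id


# Hypothetical data for illustration only.
candidates = [
    ("bank.1", "land along the side of a river"),
    ("bank.2", "institution that keeps and lends money"),
]
definition = "a financial institution that lends money"

print(pick_symbol(definition, candidates, weight=0.0))  # bank.1 (order only)
print(pick_symbol(definition, candidates, weight=1.0))  # bank.2 (similarity only)
```

With this formula, a weight of 0 ignores similarity entirely and falls back to dictionary order, which matches the behaviour described above under the stated assumption about the first strategy.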

While making these changes, some disambiguation functions were moved from dictionary.py to nlp.py so that the Dico class keeps only its essential parts.

Modified files

  • src/disambiguate.py to add the strategy and option to the parser
  • src/dictionary.py to modify the general disambiguation function of the Dico class
  • src/nlp.py to add the disambiguation functions for the two strategies
  • README.md to add the weight option in the example
  • bats/test_dico_disambiguate.bats to add tests for the new strategy and option
  • src/requirements.txt to add spaCy models for similarity comparison

Added files

  • sample/json/small_lemma_other_language.json for the test
