Add subcommand 'learnable' (!41) · Merge requests · dictionary / dico

Medina Cardenas, Lorena Giovanna requested to merge learnable into master Feb 14, 2022

Description

The 'learnable' subcommand computes a set of words that can be learned through dictionary lookup. The process starts from a given list of words and builds the output set over 0 to n levels, as follows:

Level 0: The symbols found in the dictionary for the input list of words. (Some input words may be absent from the dictionary.)
Level 1: We add the symbols whose complete definition contains words from the set created at level 0.
Level 2..n: Same as level 1, but using the set created at the previous level.
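The levels above can be sketched as follows. This is only an illustration, not the code in this merge request: the function name `learnable_ids`, the `lemma_ids` mapping (lemma to symbol_id) and the `definitions` mapping (symbol_id to the set of symbol_id values used in its definition) are hypothetical, and the toy data is made up. It also assumes that a definition "contains words in the set" when it uses at least one of them, which is how the complete example further down reads.

```python
from typing import Dict, Iterable, Set


def learnable_ids(
    lemma_ids: Dict[str, int],
    definitions: Dict[int, Set[int]],
    words: Iterable[str],
    level: int,
) -> Set[int]:
    """Return the symbol ids reachable from `words` in `level` expansion steps."""
    # Level 0: keep only the input words that exist in the dictionary.
    current = {lemma_ids[w] for w in words if w in lemma_ids}
    for _ in range(level):
        # Assumption: a symbol is added as soon as its definition uses
        # at least one symbol from the set built one level back.
        added = {sid for sid, used in definitions.items() if used & current}
        if added <= current:
            break  # nothing new, further levels would not change the set
        current |= added
    return current


# Toy dictionary (hypothetical data, not the 386_dictionary).
toy_lemma_ids = {"thing": 1, "state": 2, "condition": 3, "general": 4}
toy_definitions = {1: set(), 2: {1}, 3: {2}, 4: set()}

print(learnable_ids(toy_lemma_ids, toy_definitions, ["thing", "missing"], 1))
```

With this toy data, the word "missing" is ignored at level 0 because it has no entry, and each extra level pulls in the symbols defined with anything already in the set.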

Subcommand

1. The subcommand syntax could be:

$ ./bin/dico learnable -h
usage: dico learnable [-h] [-i PATH] [-o PATH] [-w PATH] [-v] [-k LEVEL]

generate an learnable set of words using a given input word set 

optional arguments:
  -h, --help            show this help message and exit
  -i PATH, --in PATH    input file (default: stdin)
  -o PATH, --out PATH   output file (default: stdout)
  -w PATH, --set PATH   input file for a set of vocables(default: stdin)
  -v, --verbose         show more verbose messages
  -k LEVEL, --level LEVEL
                        level of the search

2. Output

The output could be a set of symbol_id values (see the complete example below):

{34, 3, 6, 8, 10, 11, 14, 19, 20, 26, 27, 30}

Other possibilities are:

a dictionary in JSON format with the symbols and definitions arrays, like 386_dictionary
a set of vocables (lemmas) instead of symbol_id values, such as:

{"thing", "condition", "do", "general"}

Which one is best?

3. The input set

It could be a file in JSON format, such as:

$ cat sample/json/learnable_input_vocables.json
{"vocables":["thing","important","particular"]}

If we use JSON for the input, the set is easier to parse and manipulate.
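As a rough sketch of why JSON is convenient here, the input set can be read with the standard `json` module in a few lines. The helper name `load_vocables` is hypothetical and is not part of this merge request; only the "vocables" key comes from the sample file above.

```python
import json
from typing import Set


def load_vocables(text: str) -> Set[str]:
    """Parse a JSON document of the form {"vocables": [...]} into a set of words."""
    data = json.loads(text)
    vocables = data.get("vocables") if isinstance(data, dict) else None
    # Reject anything that is not a list of strings under the "vocables" key.
    if not isinstance(vocables, list) or not all(isinstance(v, str) for v in vocables):
        raise ValueError('expected a JSON object with a "vocables" list of strings')
    return set(vocables)


print(sorted(load_vocables('{"vocables":["thing","important","particular"]}')))
```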

4. Complete example

With the 386_dictionary, the input {"vocables":["thing","important","particular"]}, and level 1, we get:

$ ./bin/dico learnable -i sample/json/386_thing.json -w sample/json/learnable_input_vocables.json -v -k 1
# input file: 'sample/json/386_thing.json'
# output file: '/dev/stdout'
# loaded digraph of 35 vertices and 120 arcs
{34, 3, 6, 8, 10, 11, 14, 19, 20, 26, 27, 30}
# learnable: End of processing

Explanation:

Input symbol_id values: "important": 11, "particular": 19, "thing": 34.

The output contains these symbol_id values plus every symbol_id whose definition contains one of them.

For example, the definition of symbol_id 3 contains "particular":

        "symbol_id": 3,
        "raw": "the particular state that something is in",
