An application of Data-Oriented Parsing to Esperanto. Combines a
syntax and a morphology corpus.
cnf(tree)
make sure all terminals have POS tags; invent one if necessary
("parent_word")
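The tag-invention step can be sketched as follows. This is only an illustration under stated assumptions, not the module's actual code: trees are modelled as nested tuples `(label, child, ...)` with plain strings as words, the name `ensure_pos_tags` is hypothetical, and the invented tag is assumed to be the parent label and the word joined by an underscore, which is one possible reading of "parent_word".

```python
def ensure_pos_tags(tree):
    """Return a copy of `tree` in which every word sits under its own tag.

    Assumption: a tree is a tuple (label, child, ...), a word is a str.
    """
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return tree  # already a (tag, word) preterminal
    fixed = []
    for child in children:
        if isinstance(child, str):
            # Bare word among other children: invent a tag "parent_word".
            fixed.append((f"{label}_{child}", child))
        else:
            fixed.append(ensure_pos_tags(child))
    return (label, *fixed)


example = ("S", ("NP", ("N", "hundo")), "kaj", ("VP", ("V", "kuras")))
tagged = ensure_pos_tags(example)
```

In this example only the bare word "kaj" receives an invented tag; the existing preterminals are left unchanged.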
stripfunc(tree)
strip all function labels from a tree with labels of the form
"function:form"
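A minimal sketch of stripping function labels, as described for stripfunc(tree). It assumes trees are nested tuples `(label, child, ...)` with strings as words, and that the part after the first colon is the form to keep. The name `strip_function_labels` and the example labels are hypothetical and not taken from the module.

```python
def strip_function_labels(tree):
    """Reduce every 'function:form' label in `tree` to 'form'."""
    if isinstance(tree, str):
        return tree  # a word: nothing to strip
    label, *children = tree
    # Keep what follows the first colon; labels without a colon are unchanged.
    form = label.split(":", 1)[-1]
    return (form, *[strip_function_labels(child) for child in children])


stripped = strip_function_labels(
    ("S", ("SUBJ:NP", ("N", "hundo")), ("PRED:VP", ("V", "kuras")))
)
```

Here `stripped` has the labels S, NP, N, VP and V, with the words untouched.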
dos(words)
'Data-Oriented Segmentation 1': given a sequence of segmented words
(i.e., a sequence of morphemes), produce a dictionary with
extrapolated segmentations (mapping words to sequences of morphemes).
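To make the input and output shapes concrete, here is a rough sketch of one way segmentations could be extrapolated. How the module actually extrapolates is not stated here, so the strategy below is an assumption: every observed stem (all morphemes but the last) is recombined with every observed final morpheme. The name `extrapolate_segmentations` is hypothetical.

```python
def extrapolate_segmentations(segmented_words):
    """Map words to morpheme tuples, including unseen stem + ending pairs.

    `segmented_words` is a sequence of morpheme sequences,
    e.g. [("hund", "o"), ("kat", "oj")].
    """
    observed = {"".join(m): tuple(m) for m in segmented_words}
    stems = {tuple(m[:-1]) for m in segmented_words if len(m) > 1}
    endings = {m[-1] for m in segmented_words if len(m) > 1}
    result = {}
    for stem in stems:
        for ending in endings:
            segmentation = stem + (ending,)
            result.setdefault("".join(segmentation), segmentation)
    result.update(observed)  # attested segmentations take precedence
    return result


segmentations = extrapolate_segmentations([("hund", "o"), ("kat", "oj")])
```

With these two training words the dictionary also covers the unseen combinations "hundoj" and "kato".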
dos1(words)
'Data-Oriented Segmentation 2': given a sequence of segmented words
(i.e., a sequence of morphemes), produce a dictionary with
extrapolated segmentations (mapping words to sequences of morphemes).
segmentor(segmentd)
wrap a segmentation dictionary in a naive unknown-word segmentation
function with some heuristics (phonological rules could probably
improve this further)
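The wrapping idea can be sketched as a closure: look the word up in the dictionary, and fall back to a heuristic for unknown words. The heuristic below is an assumption chosen for illustration (peel the accusative -n, then the plural -j, then one grammatical ending); the module's own heuristics are not documented here. The names `make_segmentor` and `min_stem` are hypothetical.

```python
def make_segmentor(segment_dict, min_stem=2):
    """Return a function mapping a word to a tuple of morphemes."""
    # Peeled in this order: accusative, plural, then one grammatical ending.
    ending_groups = [
        ("n",),
        ("j",),
        ("as", "is", "os", "us", "o", "a", "e", "i", "u"),
    ]

    def segment(word):
        if word in segment_dict:
            return tuple(segment_dict[word])  # known word: trust the dictionary
        stem, tail = word, []
        for group in ending_groups:
            for ending in group:
                # Only peel an ending if a plausible stem remains.
                if stem.endswith(ending) and len(stem) - len(ending) >= min_stem:
                    stem = stem[: -len(ending)]
                    tail.insert(0, ending)
                    break
        return (stem, *tail)

    return segment


segment = make_segmentor({"hundo": ("hund", "o")})
```

A fallback this naive will mis-segment uninflected words that happen to end in one of these letters, which is where phonological rules or a larger dictionary would help.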
morphmerge(tree, md, segmented)
merge morphology into phrase structure tree
morphology(train)
an interactive interface to the toy corpus
monato()
produce the Goodman reduction of the full Monato corpus
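As background, Goodman's reduction turns a DOP treebank into a compact PCFG whose rule weights are built from the number of fragments rooted at each node. The sketch below computes only that count, not the full reduction, and is not taken from the module: trees are assumed to be nested tuples `(label, child, ...)` with strings as words, and `count_fragments` is a hypothetical name.

```python
from math import prod


def count_fragments(tree):
    """Number of DOP fragments rooted at the top node of `tree`."""
    _label, *children = tree
    # A word child leaves no choice (factor 1). A nonterminal child can be
    # cut off as a frontier node or expanded with any of its own fragments,
    # which gives count + 1 options.
    return prod(
        1 if isinstance(child, str) else count_fragments(child) + 1
        for child in children
    )


n_fragments = count_fragments(
    ("S", ("NP", ("D", "la"), ("N", "hundo")), ("VP", ("V", "kuras")))
)
```

For this tree the NP node heads (1 + 1)(1 + 1) = 4 fragments, the VP node heads 2, and the S node heads (4 + 1)(2 + 1) = 15.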