__init__(self, weightedrules=None, lexicon=None, rootsymbol=None, unknownwords=None, openclassdfsa=None, cleanup=True, n=10, name='')
(Constructor)
Interface to the bitpar chart parser. Expects a list of weighted
productions with frequencies (not probabilities).
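Because bitpar receives raw frequencies, rule probabilities are derived from them rather than supplied. For illustration only, the standard relative-frequency estimate P(A -> b) = f(A -> b) / (sum of f(A -> g) over all rules with left-hand side A) can be computed directly from rules in the weightedrules format. This is a minimal sketch: relative_frequencies is a hypothetical helper, and bitpar's own estimation and smoothing are not reproduced here.

```python
from collections import defaultdict

def relative_frequencies(weightedrules):
    """Hypothetical helper: turn (rule, frequency) pairs into
    (rule, probability) pairs by relative-frequency estimation.

    Each rule is a tab-separated string whose first field is the lhs.
    """
    # Total frequency of all rules sharing the same left-hand side.
    lhs_totals = defaultdict(int)
    for rule, freq in weightedrules:
        lhs_totals[rule.split("\t")[0]] += freq
    # Each rule's share of its lhs total; float() keeps Python 2 happy.
    return [(rule, freq / float(lhs_totals[rule.split("\t")[0]]))
            for rule, freq in weightedrules]

rules = [("S\tNP\tVP", 3), ("S\tVP", 1), ("NP\tmary", 2)]
print(relative_frequencies(rules))
# [('S\tNP\tVP', 0.75), ('S\tVP', 0.25), ('NP\tmary', 1.0)]
```

The reverse direction is not possible in general: once frequencies are normalized, the counts that smoothing relies on are gone.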
- Parameters:
weightedrules - sequence of (rule, frequency) tuples, where rule is a string
with the lhs and the rhs symbols separated by tabs, e.g. "S\tNP\tVP". This
format is used because it is close to bitpar's own file format; converting
a weighted grammar with probabilities to frequencies would be a detour, and
bitpar needs frequencies so that it can do smoothing.
lexicon - set of strings belonging to the lexicon (i.e., the set of
terminals)
rootsymbol - start symbol of the grammar
unknownwords - a file with a list of open-class POS tags and their frequencies
openclassdfsa - a deterministic finite state automaton; refer to the bitpar
manpage.
cleanup - boolean; when True, the grammar files are removed when the
BitParChartParser object is deleted.
name - filename for the grammar files, in case you want to export them; if
not given, it defaults to a unique identifier
n - the number of best parse trees (n-best) to request
>>> wrules = (("S\tNP\tVP", 1),
...           ("NP\tmary", 1),
...           ("VP\twalks", 1))
>>> p = BitParChartParser(wrules, set(("mary", "walks")))
>>> tree = p.parse("mary walks".split())
>>> print tree
(S (NP mary) (VP walks)) (p=1.0)
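The wrules tuple above is small enough to write by hand. For a treebank, arguments in the same shape can be obtained by counting productions. The following is a minimal sketch that assumes the trees are given as labelled bracketed strings. read_tree and extract_grammar are hypothetical helpers, not part of this module. Like the doctest, the sketch keeps lexical rules such as "NP\tmary" in weightedrules and also returns the set of terminals.

```python
import re
from collections import Counter

def read_tree(s):
    """Parse a bracketed tree like "(S (NP mary) (VP walks))" into
    nested (label, children) tuples; leaves are plain strings."""
    tokens = iter(re.findall(r"\(|\)|[^\s()]+", s))

    def parse():
        # Called just after an opening bracket: the next token is the label.
        label = next(tokens)
        children = []
        for tok in tokens:  # shares the iterator with recursive calls
            if tok == "(":
                children.append(parse())
            elif tok == ")":
                return (label, children)
            else:
                children.append(tok)
        raise ValueError("unbalanced brackets in %r" % s)

    if next(tokens) != "(":
        raise ValueError("tree must start with '('")
    return parse()

def extract_grammar(treestrings):
    """Count productions in the given trees.

    Returns (weightedrules, lexicon): a sorted list of
    ("lhs\\trhs1\\trhs2...", frequency) tuples and the set of terminals.
    """
    counts = Counter()
    lexicon = set()

    def visit(node):
        label, children = node
        # Right-hand side: child labels for subtrees, the word itself for leaves.
        rhs = [c if isinstance(c, str) else c[0] for c in children]
        counts["\t".join([label] + rhs)] += 1
        for c in children:
            if isinstance(c, str):
                lexicon.add(c)
            else:
                visit(c)

    for s in treestrings:
        visit(read_tree(s))
    return sorted(counts.items()), lexicon

wrules, lexicon = extract_grammar(["(S (NP mary) (VP walks))"])
print(wrules)  # [('NP\tmary', 1), ('S\tNP\tVP', 1), ('VP\twalks', 1)]
```

The resulting pair matches the shape of the first two arguments in the doctest, BitParChartParser(wrules, lexicon). Frequencies accumulate across trees, so a rule seen in two sentences gets frequency 2.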
>>> from dopg import GoodmanDOP
>>> d = GoodmanDOP([tree], parser=InsideChartParser)
>>> d.parser.parse("mary walks".split())
ProbabilisticTree('S', [ProbabilisticTree('NP@1', ['mary'])
(p=1.0), ProbabilisticTree('VP@2', ['walks']) (p=1.0)])
(p=0.444444444444)
>>> d.parser.nbest_parse("mary walks".split(), 10)
[ProbabilisticTree('S', [ProbabilisticTree('NP@1', ['mary']) (p=1.0),
ProbabilisticTree('VP@2', ['walks']) (p=1.0)]) (p=0.444444444444),
ProbabilisticTree('S', [ProbabilisticTree('NP', ['mary']) (p=1.0),
ProbabilisticTree('VP@2', ['walks']) (p=1.0)]) (p=0.222222222222),
ProbabilisticTree('S', [ProbabilisticTree('NP@1', ['mary']) (p=1.0),
ProbabilisticTree('VP', ['walks']) (p=1.0)]) (p=0.222222222222),
ProbabilisticTree('S', [ProbabilisticTree('NP', ['mary']) (p=1.0),
ProbabilisticTree('VP', ['walks']) (p=1.0)]) (p=0.111111111111)]
>>> d = GoodmanDOP([tree], parser=BitParChartParser)
writing grammar
>>> d.parser.parse("mary walks".split())
ProbabilisticTree('S', [Tree('NP@1', ['mary']), Tree('VP@2', ['walks'])]) (p=0.444444)
>>> list(d.parser.nbest_parse("mary walks".split()))
[ProbabilisticTree('S', [Tree('NP@1', ['mary']), Tree('VP@2', ['walks'])])
(p=0.444444),
ProbabilisticTree('S', [Tree('NP', ['mary']), Tree('VP@2', ['walks'])])
(p=0.222222),
ProbabilisticTree('S', [Tree('NP@1', ['mary']), Tree('VP', ['walks'])])
(p=0.222222),
ProbabilisticTree('S', [Tree('NP', ['mary']), Tree('VP', ['walks'])])
(p=0.111111)]
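The four n-best results above are not four different trees. They are derivations of the same tree in the reduced grammar, and they differ only in which nodes keep their address (NP@1 versus NP, VP@2 versus VP). Their probabilities add up to 4/9 + 2/9 + 2/9 + 1/9 = 1. In DOP the probability of a parse tree is the sum over its derivations, so per-tree probabilities can be recovered by removing the addresses and adding up. The sketch below assumes the trees are available as bracketed strings in which addresses are written as "@" followed by digits, as in the output shown, and that no word contains such a sequence. sum_derivations is a hypothetical helper and does not operate on ProbabilisticTree objects directly.

```python
import re
from collections import defaultdict

def sum_derivations(nbest):
    """Hypothetical helper: group (treestring, probability) derivations by
    their tree with "@N" node addresses removed, and sum the probabilities."""
    totals = defaultdict(float)
    for treestring, prob in nbest:
        # Assumption: node addresses are written as "@" plus digits.
        totals[re.sub(r"@\d+", "", treestring)] += prob
    # Most probable tree first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

nbest = [("(S (NP@1 mary) (VP@2 walks))", 4 / 9.0),
         ("(S (NP mary) (VP@2 walks))", 2 / 9.0),
         ("(S (NP@1 mary) (VP walks))", 2 / 9.0),
         ("(S (NP mary) (VP walks))", 1 / 9.0)]
print(sum_derivations(nbest))
```

This prints a single entry for (S (NP mary) (VP walks)) with a probability of (up to rounding) 1.0. Only the n best derivations are returned (n=10 by default), so for longer sentences such a sum is in general a lower bound on the true tree probability.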
TODO: parse bitpar's chart output / parse forest