Module bitpar

Class BitParChartParser


Instance Methods
 
__init__(self, weightedrules=None, lexicon=None, rootsymbol=None, unknownwords=None, openclassdfsa=None, cleanup=True, n=10, name='')
Interface to bitpar chart parser.
 
__del__(self)
 
start(self)
 
stop(self)
 
parse(self, sent)
 
nbest_parse(self, sent, n_will_be_ignored=None)
n has to be specified in the constructor because it is passed to bitpar as a command-line parameter; allowing it here would require potentially expensive restarts of bitpar.
 
writegrammar(self, f, l)
Write a grammar to files f and l in a format that bitpar understands.
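The nbest_parse entry explains that n is fixed in the constructor because it is a command-line parameter of a long-running bitpar process. A minimal sketch of that design follows, assuming a persistent child process that reads one sentence per line and answers with n lines followed by a blank line. FAKE_PARSER is a hypothetical Python stand-in for bitpar (the real bitpar command line and output format are not reproduced), and PersistentParser is an illustrative name, not part of this module.

```python
import subprocess
import sys

# HYPOTHETICAL stand-in for the bitpar binary: reads one sentence per line,
# writes n numbered "parses" of it, then a blank line as a terminator.
FAKE_PARSER = "\n".join([
    "import sys",
    "n = int(sys.argv[1])",
    "while True:",
    "    line = sys.stdin.readline()",
    "    if not line:",
    "        break",
    "    for i in range(n):",
    "        sys.stdout.write('%d: %s\\n' % (i, line.strip()))",
    "    sys.stdout.write('\\n')",
    "    sys.stdout.flush()",
])


class PersistentParser:
    """Sketch: keep one child process alive; n is fixed when it starts."""

    def __init__(self, n=10):
        self.n = n
        self.proc = None

    def start(self):
        # n becomes part of the command line, so changing it later would
        # require stopping and restarting the child process.
        self.proc = subprocess.Popen(
            [sys.executable, "-c", FAKE_PARSER, str(self.n)],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            universal_newlines=True,
        )

    def stop(self):
        if self.proc is not None:
            self.proc.stdin.close()  # EOF makes the child loop exit
            self.proc.wait()
            self.proc = None

    def nbest_parse(self, sent, n_will_be_ignored=None):
        # Any n passed here is ignored; the child keeps the n it started with.
        if self.proc is None:
            self.start()
        self.proc.stdin.write(" ".join(sent) + "\n")
        self.proc.stdin.flush()
        results = []
        while True:
            line = self.proc.stdout.readline()
            if line in ("\n", ""):  # blank line ends one answer; "" means EOF
                break
            results.append(line.rstrip("\n"))
        return results


if __name__ == "__main__":
    p = PersistentParser(n=3)
    print(p.nbest_parse("mary walks".split(), 10))
    p.stop()
```

Because n is baked into the command line at start(), honouring a different n on each call would mean a stop/start cycle per call, which is the cost the nbest_parse note avoids.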
Method Details

__init__(self, weightedrules=None, lexicon=None, rootsymbol=None, unknownwords=None, openclassdfsa=None, cleanup=True, n=10, name='')
(Constructor)


Interface to the bitpar chart parser. Expects a sequence of weighted productions with frequencies (not probabilities).
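Since the constructor takes frequencies rather than probabilities, it may help to see how probabilities relate to the same weightedrules format. A minimal sketch of plain relative-frequency estimation, P(A -> alpha) = f(A -> alpha) / sum of f(A -> beta) over all rules with left-hand side A; the function name relative_frequencies is hypothetical, and bitpar's own estimation (including its smoothing) is not reproduced here.

```python
from collections import defaultdict


def relative_frequencies(weightedrules):
    """Map (lhs, rhs_tuple) -> probability from (tab-separated rule, freq) pairs.

    Plain relative-frequency estimation; no smoothing is applied.
    """
    rule_freqs = defaultdict(int)
    lhs_totals = defaultdict(int)
    for rule, freq in weightedrules:
        lhs, *rhs = rule.split("\t")  # "S\tNP\tVP" -> "S", ["NP", "VP"]
        rule_freqs[(lhs, tuple(rhs))] += freq  # repeated rules are summed
        lhs_totals[lhs] += freq
    # Normalise each rule by the total frequency of its left-hand side.
    return {r: f / lhs_totals[r[0]] for r, f in rule_freqs.items()}


if __name__ == "__main__":
    wrules = (("S\tNP\tVP", 2), ("NP\tmary", 1), ("NP\tjohn", 3), ("VP\twalks", 1))
    for rule, prob in sorted(relative_frequencies(wrules).items()):
        print(rule, prob)
```

Going in this direction is a simple division, whereas the reverse (recovering counts from probabilities) loses information; that asymmetry is one reason to hand bitpar frequencies directly.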

Parameters:
  • weightedrules - sequence of tuples, each pairing a rule string (lhs and rhs separated by tabs, e.g. "S\tNP\tVP") with its frequency. We use this format because it is close to bitpar's file format; converting a weighted grammar with probabilities to frequencies would be a detour, and bitpar wants frequencies so it can do smoothing.
  • lexicon - set of strings belonging to the lexicon (i.e., the set of terminals)
  • rootsymbol - starting symbol for the grammar
  • unknownwords - a file with a list of open-class POS tags with frequencies
  • openclassdfsa - a deterministic finite state automaton; see the bitpar manpage.
  • cleanup - boolean; when True, the grammar files are removed when the BitParChartParser object is deleted.
  • name - filename for the grammar files, in case you want to export them; if not given, defaults to a unique identifier
  • n - number of best parse trees to request

    >>> wrules = ( ("S\tNP\tVP", 1), ("NP\tmary", 1), ("VP\twalks", 1) )
    >>> p = BitParChartParser(wrules, set(("mary","walks")))
    >>> tree = p.parse("mary walks".split())
    >>> print tree
    (S (NP mary) (VP walks)) (p=1.0)
    >>> from dopg import GoodmanDOP
    >>> from nltk.parse import InsideChartParser
    >>> d = GoodmanDOP([tree], parser=InsideChartParser)
    >>> d.parser.parse("mary walks".split())
    ProbabilisticTree('S', [ProbabilisticTree('NP@1', ['mary'])
    (p=1.0), ProbabilisticTree('VP@2', ['walks']) (p=1.0)])
    (p=0.444444444444)
    >>> d.parser.nbest_parse("mary walks".split(), 10)
    [ProbabilisticTree('S', [ProbabilisticTree('NP@1', ['mary']) (p=1.0),
            ProbabilisticTree('VP@2', ['walks']) (p=1.0)]) (p=0.444444444444),
    ProbabilisticTree('S', [ProbabilisticTree('NP', ['mary']) (p=1.0),
            ProbabilisticTree('VP@2', ['walks']) (p=1.0)]) (p=0.222222222222),
    ProbabilisticTree('S', [ProbabilisticTree('NP@1', ['mary']) (p=1.0),
            ProbabilisticTree('VP', ['walks']) (p=1.0)]) (p=0.222222222222),
    ProbabilisticTree('S', [ProbabilisticTree('NP', ['mary']) (p=1.0),
            ProbabilisticTree('VP', ['walks']) (p=1.0)]) (p=0.111111111111)]
    >>> d = GoodmanDOP([tree], parser=BitParChartParser)
        writing grammar
    >>> d.parser.parse("mary walks".split())
    ProbabilisticTree('S', [Tree('NP@1', ['mary']), Tree('VP@2', ['walks'])]) (p=0.444444)
    >>> list(d.parser.nbest_parse("mary walks".split()))
    [ProbabilisticTree('S', [Tree('NP@1', ['mary']), Tree('VP@2', ['walks'])]) 
    (p=0.444444),
    ProbabilisticTree('S', [Tree('NP', ['mary']), Tree('VP@2', ['walks'])])
    (p=0.222222),
    ProbabilisticTree('S', [Tree('NP@1', ['mary']), Tree('VP', ['walks'])])
    (p=0.222222), 
    ProbabilisticTree('S', [Tree('NP', ['mary']), Tree('VP', ['walks'])])
    (p=0.111111)]

    TODO: parse bitpar's chart output / parse forest
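The cleanup and name parameters imply that the grammar is written to files on disk and deleted when the parser object goes away. A minimal sketch of that bookkeeping, assuming file names derived from name or, failing that, a random UUID; the class GrammarFiles and the .gram/.lex suffixes are hypothetical, not this module's actual naming scheme.

```python
import os
import tempfile
import uuid


class GrammarFiles:
    """HYPOTHETICAL helper owning a grammar file and a lexicon file on disk."""

    def __init__(self, name="", cleanup=True, directory=None):
        # Fall back to a unique identifier when no name is given.
        base = name or uuid.uuid4().hex
        directory = directory or tempfile.gettempdir()
        # The .gram/.lex suffixes are illustrative assumptions.
        self.grammar = os.path.join(directory, base + ".gram")
        self.lexicon = os.path.join(directory, base + ".lex")
        self.cleanup = cleanup

    def write(self, grammar_text, lexicon_text):
        with open(self.grammar, "w") as f:
            f.write(grammar_text)
        with open(self.lexicon, "w") as f:
            f.write(lexicon_text)

    def remove(self):
        # Delete the files only if cleanup was requested; otherwise keep
        # them so the grammar can be exported under the given name.
        if self.cleanup:
            for path in (self.grammar, self.lexicon):
                if os.path.exists(path):
                    os.remove(path)


if __name__ == "__main__":
    files = GrammarFiles(name="demo")
    files.write("1\tS\tNP\tVP\n", "mary\tNP 1\n")
    print(files.grammar, os.path.exists(files.grammar))
    files.remove()
    print(os.path.exists(files.grammar))
```

The documentation ties removal to deleting the BitParChartParser object; calling remove() explicitly, as here, avoids depending on when the garbage collector happens to run __del__.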

writegrammar(self, f, l)


Write a grammar to files f and l in a format that bitpar understands: f will contain the grammar rules, l the lexicon with POS tags.
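A minimal sketch of the split that writegrammar performs, assuming that a rule whose right-hand side is a single word from the lexicon belongs in the lexicon file and everything else in the grammar file. The line layouts used here (grammar: frequency, lhs, rhs symbols separated by tabs; lexicon: word, then tab-separated "TAG freq" entries) are ASSUMPTIONS to be checked against the bitpar manpage, and format_grammar is a hypothetical name; it returns strings rather than writing to f and l.

```python
from collections import defaultdict


def format_grammar(weightedrules, lexicon):
    """Split (tab-separated rule, freq) pairs into grammar and lexicon text.

    ASSUMED layouts (verify against the bitpar manpage):
      grammar line: freq TAB lhs TAB rhs1 TAB rhs2 ...
      lexicon line: word TAB "tag freq" TAB "tag freq" ...
    """
    grammar_lines = []
    tags_for_word = defaultdict(list)
    for rule, freq in weightedrules:
        lhs, *rhs = rule.split("\t")
        if len(rhs) == 1 and rhs[0] in lexicon:
            # Preterminal rule (e.g. NP -> mary): goes to the lexicon.
            tags_for_word[rhs[0]].append("%s %s" % (lhs, freq))
        else:
            grammar_lines.append("\t".join([str(freq), lhs] + rhs))
    lexicon_lines = ["\t".join([word] + tags)
                     for word, tags in sorted(tags_for_word.items())]
    return "\n".join(grammar_lines) + "\n", "\n".join(lexicon_lines) + "\n"


if __name__ == "__main__":
    wrules = (("S\tNP\tVP", 1), ("NP\tmary", 1), ("VP\twalks", 1))
    grammar_text, lexicon_text = format_grammar(wrules, {"mary", "walks"})
    print(grammar_text, end="")
    print(lexicon_text, end="")
```

Writing the two returned strings to f and l respectively would give the two-file layout described above.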