Trees | Indices | Help |
---|
Initialize a DOP model given a treebank. Uses the Goodman reduction of an STSG to a PCFG. After initialization, self.parser will contain an InsideChartParser.

>>> tree = Tree("(S (NP mary) (VP walks))")
>>> d = GoodmanDOP([tree])
>>> print d.grammar
Grammar with 12 productions (start state = S)
    NP -> 'mary' [1.0]
    NP@1 -> 'mary' [1.0]
    S -> NP VP [0.25]
    S -> NP VP@2 [0.25]
    S -> NP@1 VP [0.25]
    S -> NP@1 VP@2 [0.25]
    S@0 -> NP VP [0.25]
    S@0 -> NP VP@2 [0.25]
    S@0 -> NP@1 VP [0.25]
    S@0 -> NP@1 VP@2 [0.25]
    VP -> 'walks' [1.0]
    VP@2 -> 'walks' [1.0]
>>> print d.parser.parse("mary walks".split())
(S (NP mary) (VP@2 walks)) (p=0.25)

@param treebank: a list of Tree objects. Caveat lector: terminals may not have (non-terminals as) siblings.
@param wrap: boolean specifying whether to add the start symbol to each tree.
@param parser: a class which will be instantiated with the DOP model as its grammar. Supports BitParChartParser.

Instance variables:
- self.grammar: a WeightedGrammar containing the PCFG reduction
- self.fcfg: a list of strings containing the PCFG reduction with frequencies instead of probabilities
- self.parser: an InsideChartParser object
- self.exemplars: dictionary of known parse trees (memoization)
|
Add unique identifiers to each non-terminal of a tree.

>>> tree = Tree("(S (NP mary) (VP walks))")
>>> d = GoodmanDOP([tree])
>>> d.decorate_with_ids(tree, count())
Tree('S@0', [Tree('NP@1', ['mary']), Tree('VP@2', ['walks'])])
|
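The identifier decoration described above can be sketched without NLTK. This is a minimal illustration, not the module's own code: it assumes a stand-in tree representation of `(label, children)` tuples with plain strings as terminals, and the function here is a free-standing rewrite rather than the `GoodmanDOP` method.

```python
from itertools import count


def decorate_with_ids(tree, ids):
    """Return a copy of `tree` with '@<n>' appended to every non-terminal label.

    `tree` is a (label, children) tuple; terminals are plain strings
    (an assumed stand-in for nltk's Tree class).
    """
    if isinstance(tree, str):
        return tree  # terminals are left untouched
    label, children = tree
    # The parent takes its number before its children do (pre-order numbering).
    new_label = "%s@%d" % (label, next(ids))
    return (new_label, [decorate_with_ids(child, ids) for child in children])


tree = ("S", [("NP", ["mary"]), ("VP", ["walks"])])
decorated = decorate_with_ids(tree, count())
print(decorated)
# ('S@0', [('NP@1', ['mary']), ('VP@2', ['walks'])])
```

Passing the counter in as an argument lets one counter number the nodes of a whole treebank, so identifiers stay unique across trees.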
Count frequencies of nodes by calculating the number of subtrees headed by each node. Updates "nonterminalfd" as a side effect.

>>> fd = FreqDist()
>>> tree = Tree("(S (NP mary) (VP walks))")
>>> d = GoodmanDOP([tree])
>>> d.nodefreq(tree, fd)
4
>>> fd.items()
[('S', 4), ('NP', 1), ('VP', 1)]
#[('S', 9), ('NP', 2), ('VP', 2), ('mary', 1), ('walks', 1)]
|
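The counts in the doctest above are consistent with the usual recursion for the number of subtrees headed by a node: the product, over the node's children, of (the child's count + 1), with terminals counting as 0. A minimal sketch under that assumption, using `(label, children)` tuples and `collections.Counter` as stand-ins for nltk's `Tree` and `FreqDist`:

```python
from collections import Counter


def nodefreq(tree, freqs):
    """Return the number of subtrees headed by the root of `tree`.

    As a side effect, add each node's count to `freqs` under the node's label.
    """
    if isinstance(tree, str):
        return 0  # a terminal heads no subtree
    label, children = tree
    n = 1
    for child in children:
        # Each child is either cut off (1 way) or expanded (its own count).
        n *= nodefreq(child, freqs) + 1
    freqs[label] += n
    return n


tree = ("S", [("NP", ["mary"]), ("VP", ["walks"])])
freqs = Counter()
root_count = nodefreq(tree, freqs)
print(root_count)             # 4
print(sorted(freqs.items()))  # [('NP', 1), ('S', 4), ('VP', 1)]
```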
Given a parse tree from a treebank, yield a Goodman reduction of eight rules per node (in the case of a binary tree).

>>> tree = Tree("(S (NP mary) (VP walks))")
>>> d = GoodmanDOP([tree])
>>> utree = d.decorate_with_ids(tree, count())
>>> sorted(d.goodman(tree, utree, False))
[(NP, ('mary',)), (NP@1, ('mary',)), (S, (NP, VP)), (S, (NP, VP@2)), (S, (NP@1, VP)), (S, (NP@1, VP@2)), (S@0, (NP, VP)), (S@0, (NP, VP@2)), (S@0, (NP@1, VP)), (S@0, (NP@1, VP@2)), (VP, ('walks',)), (VP@2, ('walks',))]
|
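The "eight rules per node" follow from taking every combination of the plain and the decorated label for the parent and for each non-terminal child (2 x 2 x 2 for a binary node). The sketch below reproduces that rule set on an already decorated tree. It is an illustration only: the tuple-based tree, the `goodman_rules` name, and its single-argument signature are assumptions and differ from the `goodman(tree, utree, ...)` method documented above.

```python
from itertools import product


def strip_id(label):
    """'NP@1' -> 'NP'."""
    return label.split("@")[0]


def goodman_rules(utree):
    """Yield (lhs, rhs) pairs for every node of a decorated tree."""
    if isinstance(utree, str):
        return
    ulabel, children = utree
    # A terminal child has one form; a non-terminal child has two
    # (plain label and decorated label).
    rhs_options = [
        (child,) if isinstance(child, str) else (strip_id(child[0]), child[0])
        for child in children
    ]
    for lhs in (strip_id(ulabel), ulabel):
        for rhs in product(*rhs_options):
            yield (lhs, rhs)
    for child in children:
        yield from goodman_rules(child)


utree = ("S@0", [("NP@1", ["mary"]), ("VP@2", ["walks"])])
rules = sorted(goodman_rules(utree))
print(len(rules))  # 12
for lhs, rhs in rules:
    print(lhs, "->", " ".join(rhs))
```

The binary root contributes eight rules and each of the two preterminals contributes two, which matches the 12 productions in the grammar listing.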
Merge the CFG and the frequency distribution into a PCFG with the right probabilities.
|
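As a worked illustration of what "the right probabilities" could mean, the sketch below uses the weights of Goodman's (1996) reduction as I understand them: a rule's weight is the product of the subtree counts of its decorated children (undecorated children contribute 1), divided by the subtree count of its left-hand side, where a plain label uses the total over all nodes carrying that label. All function names and the tuple-based tree are hypothetical; this is not the module's implementation.

```python
from collections import Counter, defaultdict
from itertools import product


def strip_id(label):
    return label.split("@")[0]


def subtree_counts(utree, counts):
    """Store subtree counts per decorated node and summed per plain label."""
    if isinstance(utree, str):
        return 0
    ulabel, children = utree
    n = 1
    for child in children:
        n *= subtree_counts(child, counts) + 1
    counts[ulabel] = n             # a_j for this particular node
    counts[strip_id(ulabel)] += n  # a = sum of a_j over nodes with this label
    return n


def weighted_rules(utree, counts):
    """Yield (lhs, rhs, weight) for every rule of the reduction."""
    if isinstance(utree, str):
        return
    ulabel, children = utree
    options = []
    for child in children:
        if isinstance(child, str):
            options.append([(child, 1)])
        else:
            options.append([(strip_id(child[0]), 1), (child[0], counts[child[0]])])
    for lhs in (strip_id(ulabel), ulabel):
        for combo in product(*options):
            numerator = 1
            for _, factor in combo:
                numerator *= factor
            yield lhs, tuple(sym for sym, _ in combo), numerator / counts[lhs]
    for child in children:
        yield from weighted_rules(child, counts)


utree = ("S@0", [("NP@1", ["mary"]), ("VP@2", ["walks"])])
counts = Counter()
subtree_counts(utree, counts)

# Sum weights per rule: with several trees the same plain rule can recur.
probs = defaultdict(float)
for lhs, rhs, weight in weighted_rules(utree, counts):
    probs[(lhs, rhs)] += weight

for (lhs, rhs), p in sorted(probs.items()):
    print("%s -> %s [%s]" % (lhs, " ".join(rhs), p))
```

On the single example tree this gives 0.25 for each rule headed by S or S@0 and 1.0 for the lexical rules, in line with the grammar listing shown for the constructor.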
Merge the CFG and the frequency distribution into a list of weighted productions with frequencies as weights (as expected by bitpar).
|
Memoize parse trees. TODO: maybe add an option to add every parse tree to the set of exemplars, i.e., incremental learning. This uses the most probable derivation (not very good). |
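The memoization described above can be sketched as a dictionary keyed by the sentence. The class name `MemoizedParser` and the `parse_function` argument are hypothetical; only the idea of an `exemplars` dictionary of known parse trees comes from the documentation.

```python
class MemoizedParser:
    """Cache parse results per sentence (hypothetical wrapper)."""

    def __init__(self, parse_function):
        self.parse_function = parse_function
        self.exemplars = {}  # tuple of words -> previously computed parse

    def parse(self, words):
        key = tuple(words)  # lists are unhashable, tuples are not
        if key not in self.exemplars:
            self.exemplars[key] = self.parse_function(words)
        return self.exemplars[key]


calls = []


def dummy_parse(words):
    """Stand-in for an expensive parser; records each real invocation."""
    calls.append(list(words))
    return ("S", list(words))


parser = MemoizedParser(dummy_parse)
first = parser.parse("mary walks".split())
second = parser.parse("mary walks".split())
print(len(calls))  # 1
```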
Warning: this problem is NP-complete. Using an unsorted chart parser avoids unnecessary sorting (since all derivations are needed anyway).
|
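One reading of why all derivations are needed: several derivations of the reduced PCFG can correspond to the same parse tree once the `@` identifiers are removed, so a tree's probability is the sum over those derivations. The sketch below shows that summation under this assumption. The function names and the `(tree, probability)` input format are hypothetical, and the derivation list is written out by hand rather than produced by a parser.

```python
from collections import defaultdict


def strip_ids(tree):
    """Remove '@<n>' suffixes; return a hashable tree with tuple children."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    return (label.split("@")[0], tuple(strip_ids(child) for child in children))


def most_probable_parse(derivations):
    """Sum derivation probabilities per undecorated tree and return the best."""
    totals = defaultdict(float)
    for tree, prob in derivations:
        totals[strip_ids(tree)] += prob
    return max(totals.items(), key=lambda item: item[1])


# The four S-rooted derivations of "mary walks" under the example grammar,
# each with probability 0.25 (rule weight 0.25 times two lexical rules at 1.0).
derivations = [
    (("S", [("NP", ["mary"]), ("VP", ["walks"])]), 0.25),
    (("S", [("NP", ["mary"]), ("VP@2", ["walks"])]), 0.25),
    (("S", [("NP@1", ["mary"]), ("VP", ["walks"])]), 0.25),
    (("S", [("NP@1", ["mary"]), ("VP@2", ["walks"])]), 0.25),
]
best_tree, best_prob = most_probable_parse(derivations)
print(best_tree, best_prob)
```

Enumerating every derivation is exponential in the worst case, which is consistent with the NP-completeness warning.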
Not working yet. Almost verbatim translation of Goodman's (1996) most constituents correct parsing algorithm, except for Python's zero-based indexing. Needs to be modified to return the actual parse tree. Expects a PCFG in the form of a dictionary from productions to probabilities. |
Generated by Epydoc 3.0.1 on Wed Jun 16 13:03:58 2010 | http://epydoc.sourceforge.net |