Treebank conversion script; expects no arguments and uses stdin and stdout. Input is the VISL horizontal tree format (see http://beta.visl.sdu.dk/treebanks.html#The_source_format). Output is an s-expression, i.e., a tree in bracket notation. TODO: turn this into an nltk.Corpus reader.
example =
example2 =
example3 =
relinelev = re.compile(r'
reclean = re.compile(r'\s
__package__ = None
Parse a horizontal tree into an s-expression (i.e., WSJ format). Defaults to stripping morphology information. Parentheses in the input are converted to braces.

    >>> print example
    X:np
    =H:n("konsekvenco" <*> <ac> P NOM) Konsekvencoj
    =DN:pp
    ==H:prp("de") de
    ==DP:np
    ===DN:adj("ekonomia" <Deco> P NOM) ekonomiaj
    ===H:n("transformo" P NOM) transformoj
    >>> parse(example.splitlines())
    '(X:np (H:n Konsekvencoj) (DN:pp (H:prp de) (DP:np (DN:adj ekonomiaj) (H:n transformoj))))'
    >>> print example2
    STA:fcl
    =S:np
    ==DN:pron-dem("tia" <*> <Dem> <Du> <dem> DET P NOM) Tiaj
    ==H:n("akuzo" <act> <sd> P NOM) akuzoj
    =fA:adv("certe") certe
    =P:v-fin("dauxri" <va+TEMP> <mv> FUT VFIN) dauxros
    >>> parse(example2.splitlines())
    '(STA:fcl (S:np (DN:pron-dem Tiaj) (H:n akuzoj)) (fA:adv certe) (P:v-fin dauxros))'
    >>> parse(example3.splitlines())
    '(STA:par (CJT:fcl (fA:adv Krome) (,) (S:np (DN:art la) (H:n savo) (DN:pp (H:prp de) (DP:np (H:n konkuranto)))) (P:v-fin helpos (((DN:prop Microsoft))))) CJT:icl (P:v-pcp2 refuti) (Od:np (H:n akuzojn) (DN:pp (H:prp pri) (DP:n monopolismo))))'
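To make the line format in the doctests concrete, here is a minimal sketch of how a single line might be split into its depth, node label, and surface word. It is written for Python 3 and is not the module's own code: the pattern `LINE` and the helper `split_line` are hypothetical names, and the choice of `{`/`}` as replacement characters is an assumption based on the phrase "converted to braces".

```python
import re

# Hypothetical pattern (not the module's `relinelev`):
#   group 1: leading '=' signs, one per level of depth
#   group 2: node label, e.g. 'H:n'
#   group 3: optional morphology inside parentheses (discarded below)
#   group 4: optional surface word
LINE = re.compile(r'^(=*)([^\s(]+)(?:\((.*)\))?\s*(.*)$')

def split_line(line):
    """Return (depth, label, word) for one line of a horizontal tree."""
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError('not a tree line: %r' % line)
    depth, label, _morph, word = m.groups()
    # Parentheses in the word would clash with bracket notation,
    # so replace them with braces (assumed replacement characters).
    word = word.replace('(', '{').replace(')', '}')
    return len(depth), label, word

print(split_line('==H:prp("de") de'))
```

A line without a word, such as `X:np`, yields an empty string in the third position, which marks it as a non-terminal node.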
The following code was contributed by Alex Martelli at StackOverflow: http://stackoverflow.com/questions/2815020/converting-a-treebank-of-vertical-trees-to-s-expressions

Parse a horizontal tree into an s-expression (i.e., WSJ format). Defaults to stripping morphology information. Parentheses in the input are converted to braces.

    >>> reparse(example.splitlines())
    '(X:np (H:n Konsekvencoj) (DN:pp (H:prp de) (DP:np (DN:adj ekonomiaj) (H:n transformoj))))'
    >>> reparse(example2.splitlines())
    '(STA:fcl (S:np (DN:pron-dem Tiaj) (H:n akuzoj)) (fA:adv certe) (P:v-fin dauxros))'
    >>> reparse(example3.splitlines())
    '(STA:par (CJT:fcl (fA:adv Krome) (,) (S:np (DN:art la) (H:n savo) (DN:pp (H:prp de) (DP:np (H:n konkuranto)))) (P:v-fin helpos (DN:prop Microsoft))) (CJT:icl (P:v-pcp2 refuti) (Od:np (H:n akuzojn) (DN:pp (H:prp pri) (DP:n monopolismo)))))'
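One way to turn depth-annotated nodes into bracket notation is to keep a stack of the depths that are still open and close brackets whenever the depth stops increasing. The sketch below illustrates that idea only; it is not the contributed StackOverflow code, and `to_sexpr` is a hypothetical name. It assumes each node has already been reduced to a `(depth, label, word)` tuple.

```python
def to_sexpr(nodes):
    """Build a bracketed string from (depth, label, word) tuples."""
    out = []
    open_depths = []  # depths of nodes whose bracket is still open
    for depth, label, word in nodes:
        # Close every open node that is not an ancestor of this one.
        while open_depths and open_depths[-1] >= depth:
            open_depths.pop()
            out.append(')')
        if out:
            out.append(' ')
        out.append('(' + label)
        if word:
            out.append(' ' + word)
        open_depths.append(depth)
    # Close whatever is still open at the end of the tree.
    out.append(')' * len(open_depths))
    return ''.join(out)

nodes = [
    (0, 'X:np', ''),
    (1, 'H:n', 'Konsekvencoj'),
    (1, 'DN:pp', ''),
    (2, 'H:prp', 'de'),
    (2, 'DP:np', ''),
    (3, 'DN:adj', 'ekonomiaj'),
    (3, 'H:n', 'transformoj'),
]
print(to_sexpr(nodes))
```

For the first example tree this prints the same string as the doctest: `(X:np (H:n Konsekvencoj) (DN:pp (H:prp de) (DP:np (DN:adj ekonomiaj) (H:n transformoj))))`.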
Take a treebank from stdin in horizontal tree format and output it in s-expression format (i.e., bracket notation, WSJ format). Checks whether the original sentence and the leaves of the tree match, and discards the tree if they don't. Also removes trees marked as problematic with the tag "CAVE" in the comments.

Example input:

    <s_id=812>
    SOURCE: id=812
    ID=812 Necesus adapti la metodon por iuj alilandaj klavaroj.
    A1
    STA:fcl
    =P:v-fin("necesi" <*> <mv> COND VFIN) Necesus
    =S:icl
    ==P:v-inf("adapti" <mv>) adapti
    ==Od:np
    ===DN:art("la") la
    ===H:n("metodo" <ac> S ACC) metodon
    ===DN:pp
    ====H:prp("por" <aquant>) por
    ====DP:np
    =====DN:pron("iu" <quant> DET P NOM) iuj
    =====DN:adj("alilanda" P NOM) alilandaj
    =====H:n("klavaro" <cc-h> <tool-mus> P NOM) klavaroj
    .
    </s>
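The filtering step described above could look roughly like the following Python 3 sketch. The module's actual matching criterion is not shown on this page, so the comparison here (ignore whitespace and punctuation, compare the remaining characters) is an assumption, as are the helper names `leaves`, `normalise`, and `keep_tree`.

```python
import re

def leaves(sexpr):
    """Return the surface words of a bracketed tree, in order."""
    tokens = re.findall(r'[()]|[^\s()]+', sexpr)
    # The token right after '(' is a node label; other non-bracket
    # tokens are words.
    return [tok for prev, tok in zip(tokens, tokens[1:])
            if prev != '(' and tok not in ('(', ')')]

def normalise(text):
    """Drop whitespace and punctuation so only word characters remain."""
    return ''.join(re.findall(r'\w+', text))

def keep_tree(sentence, sexpr, comments=''):
    """Decide whether a converted tree should be written to the output."""
    if 'CAVE' in comments:  # tree was flagged as problematic
        return False
    return normalise(sentence) == normalise(' '.join(leaves(sexpr)))

tree = ('(STA:fcl (S:np (DN:pron-dem Tiaj) (H:n akuzoj)) '
        '(fA:adv certe) (P:v-fin dauxros))')
print(leaves(tree))
print(keep_tree('Tiaj akuzoj certe dauxros.', tree))
```

Comparing normalised strings rather than token lists sidesteps the question of how punctuation is tokenised, since punctuation appears as separate nodes in the tree but is attached to words in the sentence line.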
Generated by Epydoc 3.0.1 on Wed Jun 16 13:03:58 2010 | http://epydoc.sourceforge.net |