Syntax:

1) DOPparser `grammarFromTreebank' treebankfile testset_unlabeled testset_goldstandard [ovis_style_brackets] [postag] [unlabeled_p_r] [export_grammar_file] [branching=bf] [beam=bm] [print_parses]

2) parse `grammarFromFile' grammarfile testset_unlabeled testset_goldstandard [postag] [unlabeled_p_r] [export_grammar_file] [branching=bf] [beam=bm] [print_parses]

Many additional options are available, but those that are not listed here must be set in the source code.

Note that the algorithm converts unlabeled text to lower case, so John and john are indistinguishable.

grammarFromTreebank: tells the program to read off the (non-)terminals, the rules and the probabilities from a treebank file.

treebankfile: a file containing labeled sentences in either WSJ style or OVIS style.

grammarFromFile: tells the program to read in the rules and probabilities from a user-specified grammar file.

grammarfile: a file containing the terminals, non-terminals, and production rules. The following format is assumed (this format is automatically exported by the BMM algorithm):

    TERMINALS
        followed by a list of all the terminals
    NONTERMINALS
        followed by a list of all the non-terminals
    PRODUCTION RULES
        followed by a list of rules, organized per left-hand-side non-terminal.
        The sublist for each non-terminal is headed by a title in the format
            RULESOFNONTERMINAL NP
        Below the title, the corresponding right-hand sides of the rules are
        listed in the format rhs(1)*rhs(2)*...rhs(k)*#rule_probability, for example:
            _PRP*_VBD*#0.1329

testset_unlabeled (obligatory): full path to a file containing unlabeled sentences. Every sentence should end with a space followed by a period (" .").

testset_goldstandard (obligatory): full path to a file containing the gold standard parses corresponding to the unlabeled sentences (used for the evaluation).

Options may be added in any order, and are not case sensitive:

    ovis_style_brackets: set to true if the bracket style of the treebankfile is OVIS. default = false.

    postag: indicates that the sentences in the input file are postag sequences. default = false.

    dop_parser: indicates the use of the (Goodman reduction of the) DOP parser. default = false.

    unlabeled_p_r: indicates that evaluation should be done on unlabeled precision and recall, rather than labeled. default = false.

    export_grammar_file: if set to true, the grammar that was extracted from the treebank is exported as grammar.txt to the "Output" directory, in a format that can be read by the parser in subsequent runs (saves time).

    branching=bf: sets the branching factor. When reading out the parse from the chart, multiple options for expanding the derivation are sometimes available. The branching factor sets the maximum number of branches that are followed, resulting in multiple parses. default = 2.

    beam=bm: sets the beam width, which is the maximum number of derivations considered when reading out the parse from the chart. The parse probability is the average over the derivation probabilities of the beam. default = 1000.

A typical command line would look like:

    parse `grammarFromTreebank' ./Input/WSJ10_treebank.txt ./Input/WSJ10_section22_unlabeled.txt ./Input/WSJ10_section22_labeled.txt dop_parser postag branching=3 beam=100

A directory "Output" should be created under the current directory. By default, the standard output, consisting of LaTeX versions of the gold standard parse and the computed parse together with the PARSEVAL measures, is written to the file output_complete.txt in the "Output" directory. When the option print_parses is selected, a file parses.txt is created, which contains a list of the computed parses in WSJ format. If the option export_grammar_file is selected, the file grammar.txt is created. It can be imported in subsequent runs that use the same grammar, avoiding the need to read off the grammar from the treebank.
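
For readers who want to inspect or post-process an exported grammar.txt outside the parser, the grammar file format described above can be read with a few lines of Python. This is only a sketch and not part of the parser itself. It assumes that each terminal, non-terminal, section header and rule sits on its own line, which the description above does not state explicitly, and that no symbol contains the characters * or #. The function name parse_grammar and the sample terminals are made up for illustration. Only the rule line _PRP*_VBD*#0.1329 comes from the format description.

```python
from collections import defaultdict

SECTION_HEADERS = ("TERMINALS", "NONTERMINALS", "PRODUCTION RULES")


def parse_grammar(text):
    """Return (terminals, nonterminals, rules) read from grammar-file text.

    Assumption: one item per line. rules maps each left-hand side to a
    list of (right-hand-side tuple, probability) pairs.
    """
    terminals, nonterminals = [], []
    rules = defaultdict(list)
    section, lhs = None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line in SECTION_HEADERS:
            section, lhs = line, None
        elif line.startswith("RULESOFNONTERMINAL"):
            parts = line.split(None, 1)
            if len(parts) != 2:
                raise ValueError("missing non-terminal name: %r" % line)
            lhs = parts[1]
        elif section == "TERMINALS":
            terminals.append(line)
        elif section == "NONTERMINALS":
            nonterminals.append(line)
        elif section == "PRODUCTION RULES" and lhs is not None:
            # Rule lines look like rhs(1)*rhs(2)*...rhs(k)*#probability
            rhs_part, prob = line.rsplit("#", 1)
            rhs = tuple(sym for sym in rhs_part.split("*") if sym)
            rules[lhs].append((rhs, float(prob)))
        else:
            raise ValueError("unexpected line: %r" % line)
    return terminals, nonterminals, dict(rules)


# Hypothetical sample; only the rule line is taken from the description.
SAMPLE = """\
TERMINALS
john
walked
NONTERMINALS
NP
_PRP
_VBD
PRODUCTION RULES
RULESOFNONTERMINAL NP
_PRP*_VBD*#0.1329
"""

if __name__ == "__main__":
    terminals, nonterminals, rules = parse_grammar(SAMPLE)
    print(rules["NP"])  # [(('_PRP', '_VBD'), 0.1329)]
```

If the real grammar.txt separates items differently (for example, several terminals per line), the two list branches would need to split each line accordingly.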