0440949 Andreas van Cranenburgh

1)
(S (NP Mary) (VP (V hates) (VP (V visiting) (NP )))) o (NP bees)
(S (NP Mary) (VP (V hates) (VP (V visiting) (NP )))) o (NP (V buzzing) (NP bees))

2)
(S (NP ) (VP )) o (NP Mary) o (VP (V ) (VP )) o (V hates) o (VP (V ) (NP )) o (V visiting) o (NP bees)
(S (NP ) (VP )) o (NP Mary) o (VP (V ) (VP )) o (V hates) o (VP (V ) (NP )) o (V visiting) o (NP (V ) (NP )) o (V buzzing) o (NP bees)

These shortest derivations are not unique, because the order of substitution can be changed, e.g. a rightmost derivation instead of a leftmost one. Wherever there is more than one empty non-terminal, multiple shortest derivations obtain.

3)
max depth: 4
(S (NP Mary) (VP (V hates) (VP (V visiting) (NP )))) o (NP bees)
(S (NP Mary) (VP (V hates) (VP (V visiting) (NP )))) o (NP (V buzzing) (NP bees))

max depth: 3
(S (NP Mary) (VP (V hates) (VP ))) o (VP (V visiting) (NP )) o (NP bees)
(S (NP Mary) (VP (V hates) (VP ))) o (VP (V visiting) (NP )) o (NP (V buzzing) (NP bees))

max depth: 2
(S (NP Mary) (VP )) o (VP (V hates) (VP )) o (VP (V visiting) (NP )) o (NP bees)
(S (NP Mary) (VP )) o (VP (V hates) (VP )) o (VP (V visiting) (NP )) o (NP (V buzzing) (NP bees))

4)
John hates buzzing relatives
(S (NP John) (VP (V hates) (NP (V buzzing) (NP relatives))))

This sentence is constructed by combining the odd words from the first sentence with the even words from the second sentence.

(S (NP John) (VP )) o (VP (V ) (NP )) o (V hates) o (NP (V buzzing) (NP )) o (NP relatives)

Because the order of substitution is not fixed, the constituent (V hates) can be added at the third, fourth or fifth step.

5)
The shortest derivation has a bias towards using the largest chunks available, whereas cognitively it is more plausible that the most frequently occurring chunks are the ones that are remembered well. The larger chunks are more numerous and sparser; consequently, it is harder both to retain them all and to generalize from them.
Also, the shortest derivation criterion might not be strong enough to resolve ambiguities: when multiple derivations with different trees share the same length, frequencies are still needed. Perhaps it is useful to combine the derivation length with the subtree frequencies, e.g. by multiplying the probability of a derivation by 1/N, where N is the number of chunks used. I would expect that there are "basic level subtrees", analogous to basic level categories: subtrees that are neither too specific nor too general, and that are preferred in derivations because they maximize reusability.
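To make the composition operation "o" used in answers 1 to 4 concrete, here is a minimal Python sketch. It assumes that "o" is leftmost substitution (the leftmost open non-terminal of the tree on the left is replaced by a subtree with the same root label), that an empty constituent such as (NP ) marks an open substitution site, and that depth counts the edges from the root down to the deepest word or open site, which reproduces the depths in answer 3. All function names are hypothetical, and length_penalized_score is only one possible reading of the 1/N proposal, assuming a DOP1-style derivation probability (the product of the subtree probabilities).

```python
import re
from math import prod

# A tree is a (label, children) tuple; children is a list of subtrees and
# words, or None for an open substitution site such as "(NP )".

def parse(s):
    """Parse a bracketed tree such as "(S (NP Mary) (VP ))"."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:  # a word
                children.append(tokens[pos])
                pos += 1
        pos += 1  # skip the closing ")"
        return (label, children or None)  # no children: open site

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after tree")
    return tree

def show(t):
    """Inverse of parse: render a tree in the same bracket notation."""
    if isinstance(t, str):
        return t
    label, children = t
    if children is None:
        return f"({label} )"
    return f"({label} " + " ".join(show(c) for c in children) + ")"

def substitute(tree, sub):
    """t o u: replace the leftmost open site of tree by sub (labels must match)."""
    done = False

    def walk(t):
        nonlocal done
        if isinstance(t, str) or done:
            return t
        label, children = t
        if children is None:  # first open site in left-to-right order
            if label != sub[0]:
                raise ValueError(f"cannot substitute {sub[0]} at {label}")
            done = True
            return sub
        return (label, [walk(c) for c in children])

    result = walk(tree)
    if not done:
        raise ValueError("no open substitution site left")
    return result

def derive(*subtrees):
    """Left-associative composition t1 o t2 o ... o tn of bracketed strings."""
    tree = parse(subtrees[0])
    for s in subtrees[1:]:
        tree = substitute(tree, parse(s))
    return tree

def depth(t):
    """Edges on the longest path from the root down to a word or open site."""
    if isinstance(t, str) or t[1] is None:
        return 0
    return 1 + max(depth(c) for c in t[1])

def length_penalized_score(subtree_probs):
    """Hypothetical reading of the 1/N idea: DOP1-style derivation
    probability (product of subtree probabilities) divided by N."""
    return prod(subtree_probs) / len(subtree_probs)

if __name__ == "__main__":
    shortest = derive("(S (NP Mary) (VP (V hates) (VP (V visiting) (NP ))))",
                      "(NP (V buzzing) (NP bees))")
    longest = derive("(S (NP ) (VP ))", "(NP Mary)", "(VP (V ) (VP ))",
                     "(V hates)", "(VP (V ) (NP ))", "(V visiting)",
                     "(NP (V ) (NP ))", "(V buzzing)", "(NP bees)")
    print(show(shortest))
    print(shortest == longest)  # both derivations yield the same tree
```

Running the file composes the second derivation of 1) and the second derivation of 2) and confirms that they yield the same tree. Applied to the initial subtrees in 3), depth returns 4, 3 and 2 respectively. Substituting a subtree whose root label does not match the leftmost open site raises an error, which mirrors the label-matching condition on "o".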