0440949 Andreas van Cranenburgh

1)

(S (NP Mary) (VP (V hates) (VP (V visiting) (NP ))))
 o (NP bees)

(S (NP Mary) (VP (V hates) (VP (V visiting) (NP ))))
 o (NP (V buzzing) (NP bees))
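
The composition operator o can be sketched in Python. This assumes the usual DOP convention that t o u substitutes fragment u at the leftmost open substitution site of t, which is a non-terminal with no children such as (NP ), and that the labels must match. The helpers parse, to_str, compose and derive are hypothetical names used only for this sketch.

```python
import copy
import re
from functools import reduce


def parse(s):
    """Parse '(S (NP Mary) (VP ))' into nested [label, children] lists.
    Words are plain strings; a node without children, like '(VP )',
    is an open substitution site."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        label = tokens[pos + 1]  # tokens[pos] is "("
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # skip ")"
        return [label, children]

    return node()


def to_str(tree):
    """Render a tree back into the bracket notation used in these answers."""
    label, children = tree
    parts = [c if isinstance(c, str) else to_str(c) for c in children]
    return "(" + label + " " + " ".join(parts) + ")"


def open_sites(tree, path=()):
    """Yield paths (tuples of child indices) to open sites, left to right."""
    if not tree[1]:
        yield path
    for i, child in enumerate(tree[1]):
        if isinstance(child, list):
            yield from open_sites(child, path + (i,))


def compose(t, u):
    """t o u: substitute fragment u at the leftmost open site of t."""
    path = next(open_sites(t), None)
    if path is None:
        raise ValueError("no open site in " + to_str(t))
    result = copy.deepcopy(t)
    site = result
    for i in path:
        site = site[1][i]
    if site[0] != u[0]:
        raise ValueError(f"cannot substitute {u[0]} at a {site[0]} site")
    site[1] = copy.deepcopy(u[1])
    return result


def derive(*fragments):
    """Evaluate f1 o f2 o ... o fn from left to right."""
    trees = [parse(f) for f in fragments]
    return to_str(reduce(compose, trees[1:], trees[0]))


print(derive("(S (NP Mary) (VP (V hates) (VP (V visiting) (NP ))))",
             "(NP (V buzzing) (NP bees))"))
# prints (S (NP Mary) (VP (V hates) (VP (V visiting) (NP (V buzzing) (NP bees)))))
```

Passing (NP bees) as the second argument instead gives the tree of the first derivation.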

2)

(S (NP ) (VP ))
 o (NP Mary) 
 o (VP (V ) (VP ))
 o (V hates)
 o (VP (V ) (NP ))
 o (V visiting)
 o (NP bees)

(S (NP ) (VP ))
 o (NP Mary) 
 o (VP (V ) (VP ))
 o (V hates)
 o (VP (V ) (NP ))
 o (V visiting)
 o (NP (V ) (NP ))
 o (V buzzing)
 o (NP bees)

These shortest derivations are not unique, because the order of substitution
can be changed, e.g. a rightmost derivation instead of the leftmost one.
Whenever more than one empty non-terminal is open at some step, multiple
shortest derivations are possible.
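
To make the non-uniqueness concrete, the sketch below enumerates every order in which the fragments of the first derivation above can rebuild the same tree. It assumes that substitution may target any open site with a matching label, not only the leftmost one. All helper names are hypothetical.

```python
import copy
import re


def parse(s):
    """Parse '(S (NP Mary) (VP ))' into nested [label, children] lists;
    a node without children, like '(VP )', is an open substitution site."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        label = tokens[pos + 1]  # tokens[pos] is "("
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # skip ")"
        return [label, children]

    return node()


def open_sites(tree, path=()):
    """Yield the paths (child indices) of all open sites in tree."""
    if not tree[1]:
        yield path
    for i, child in enumerate(tree[1]):
        if isinstance(child, list):
            yield from open_sites(child, path + (i,))


def node_at(tree, path):
    for i in path:
        tree = tree[1][i]
    return tree


def substitute(tree, path, fragment):
    """Return a copy of tree with fragment substituted at the site at path."""
    result = copy.deepcopy(tree)
    node_at(result, path)[1] = copy.deepcopy(fragment[1])
    return result


def orders(start, fragments, target):
    """Yield every order of fragments that rebuilds target from start when
    each fragment may fill any open site with a matching label."""
    def extend(tree, remaining, seq):
        if not remaining:
            if tree == target:
                yield seq
            return
        for i, frag in enumerate(remaining):
            parsed = parse(frag)
            for path in open_sites(tree):
                if node_at(tree, path)[0] == parsed[0]:
                    yield from extend(substitute(tree, path, parsed),
                                      remaining[:i] + remaining[i + 1:],
                                      seq + [frag])

    yield from extend(parse(start), list(fragments), [start])


START = "(S (NP ) (VP ))"
REST = ["(NP Mary)", "(VP (V ) (VP ))", "(V hates)",
        "(VP (V ) (NP ))", "(V visiting)", "(NP bees)"]
TARGET = parse("(S (NP Mary) (VP (V hates) (VP (V visiting) (NP bees))))")

found = list(orders(START, REST, TARGET))
print(len(found))  # 48
print([START] + REST in found)  # True: the leftmost order listed above
```

Each fragment only has to follow the fragment whose open site it fills, so the count can also be obtained by hand as 7!/(7*5*3) = 48, where 7, 5 and 3 are the numbers of fragments hanging below the S, upper VP and lower VP fragments (themselves included). The rightmost order is one of these 48.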


3)

max depth: 4
(S (NP Mary) (VP (V hates) (VP (V visiting) (NP ))))
 o (NP bees)

(S (NP Mary) (VP (V hates) (VP (V visiting) (NP ))))
 o (NP (V buzzing) (NP bees))

max depth: 3
(S (NP Mary) (VP (V hates) (VP ))) 
 o (VP (V visiting) (NP )) 
 o (NP bees)

(S (NP Mary) (VP (V hates) (VP ))) 
 o (VP (V visiting) (NP )) 
 o (NP (V buzzing) (NP bees))

max depth: 2
(S (NP Mary) (VP ))
 o (VP (V hates) (VP ))
 o (VP (V visiting) (NP ))
 o (NP bees)

(S (NP Mary) (VP ))
 o (VP (V hates) (VP ))
 o (VP (V visiting) (NP ))
 o (NP (V buzzing) (NP bees))
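
The depth bounds can be checked mechanically. This sketch assumes depth is the number of edges on the longest path from a fragment's root to a word or open site. Under that convention each listing above reaches exactly its stated maximum. The names parse, depth and SECTION_3 are hypothetical.

```python
import re


def parse(s):
    """Parse '(S (NP Mary) (VP ))' into nested [label, children] lists."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        label = tokens[pos + 1]  # tokens[pos] is "("
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # skip ")"
        return [label, children]

    return node()


def depth(node):
    """Edges on the longest path from node down to a word or open site."""
    if isinstance(node, str) or not node[1]:
        return 0  # a word, or an open substitution site such as (NP )
    return 1 + max(depth(child) for child in node[1])


# The derivations listed above, keyed by their maximum fragment depth.
SECTION_3 = {
    4: [["(S (NP Mary) (VP (V hates) (VP (V visiting) (NP ))))", "(NP bees)"],
        ["(S (NP Mary) (VP (V hates) (VP (V visiting) (NP ))))",
         "(NP (V buzzing) (NP bees))"]],
    3: [["(S (NP Mary) (VP (V hates) (VP )))", "(VP (V visiting) (NP ))",
         "(NP bees)"],
        ["(S (NP Mary) (VP (V hates) (VP )))", "(VP (V visiting) (NP ))",
         "(NP (V buzzing) (NP bees))"]],
    2: [["(S (NP Mary) (VP ))", "(VP (V hates) (VP ))",
         "(VP (V visiting) (NP ))", "(NP bees)"],
        ["(S (NP Mary) (VP ))", "(VP (V hates) (VP ))",
         "(VP (V visiting) (NP ))", "(NP (V buzzing) (NP bees))"]],
}

for limit, derivations in SECTION_3.items():
    for fragments in derivations:
        depths = [depth(parse(f)) for f in fragments]
        print(limit, depths, max(depths) == limit)
```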

4)
John hates buzzing relatives
(S (NP John) (VP (V hates) (NP (V buzzing) (NP relatives))))

This sentence is constructed by combining the odd-numbered words of the first
sentence with the even-numbered words of the second.

(S (NP John) (VP ))
 o (VP (V ) (NP ))
 o (V hates)
 o (NP (V buzzing) (NP ))
 o (NP relatives)

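The constituent (V hates) can be substituted at the third, fourth or fifth
step, since the only requirement is that (VP (V ) (NP )), which provides its
substitution site, has already been added.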
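
This can be verified by enumerating the admissible orders. The sketch assumes that a fragment may be substituted at any open site, so the only constraint is that each fragment comes after the fragment whose open site it fills. The parent indices in the hypothetical FRAGMENTS table were read off the derivation above by hand.

```python
from itertools import permutations

# Hypothetical table: each fragment of the derivation above with the index
# of the fragment whose open site it fills (None for the initial fragment).
FRAGMENTS = [
    ("(S (NP John) (VP ))", None),
    ("(VP (V ) (NP ))", 0),
    ("(V hates)", 1),
    ("(NP (V buzzing) (NP ))", 1),
    ("(NP relatives)", 3),
]


def valid_orders(fragments):
    """Yield index orders in which every fragment follows its parent."""
    for order in permutations(range(len(fragments))):
        placed = set()
        for i in order:
            parent = fragments[i][1]
            if parent is not None and parent not in placed:
                break  # its substitution site does not exist yet
            placed.add(i)
        else:
            yield order


HATES = 2  # index of (V hates) in FRAGMENTS
steps = sorted({order.index(HATES) + 1 for order in valid_orders(FRAGMENTS)})
print(steps)  # [3, 4, 5]
```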


5) The shortest derivation criterion is biased towards using the largest chunks
available, while cognitively it is more plausible that the most frequently
occurring chunks are the ones that are remembered well. Larger chunks are more
numerous and each occurs more rarely, so it is harder both to retain them all
and to obtain generalizations from them.
Also, the shortest derivation criterion might not be strong enough to resolve
ambiguities: when multiple derivations yielding different trees have the same
length, frequencies are still needed to choose between them.

Perhaps it is useful to combine the derivation length with the subtree
frequencies, e.g. by multiplying the probability of a derivation by 1/N, where
N is the number of chunks used.
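
A minimal sketch of this combination follows. It assumes DOP1-style derivation probabilities, where each fragment's count is divided by the total count of fragments with the same root label and these ratios are multiplied over the derivation. The counts in COUNTS are invented purely for illustration and do not come from any treebank. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from math import prod

# Made-up fragment counts, for illustration only.
COUNTS = {
    "(S (NP Mary) (VP (V hates) (VP (V visiting) (NP ))))": 1,
    "(S (NP ) (VP ))": 3,
    "(NP Mary)": 2,
    "(NP bees)": 2,
    "(VP (V ) (VP ))": 1,
    "(VP (V ) (NP ))": 3,
    "(V hates)": 1,
    "(V visiting)": 1,
}


def root(fragment):
    """Root label of a bracketed fragment, e.g. '(NP bees)' -> 'NP'."""
    return fragment.split()[0].lstrip("(")


def derivation_probability(derivation, counts):
    """Product of count(f) / total count of fragments with f's root label."""
    totals = Counter()
    for fragment, count in counts.items():
        totals[root(fragment)] += count
    return prod(Fraction(counts[f], totals[root(f)]) for f in derivation)


def length_weighted(derivation, counts):
    """P(d) multiplied by 1/N, where N is the number of fragments used."""
    return derivation_probability(derivation, counts) / len(derivation)


SHORT = ["(S (NP Mary) (VP (V hates) (VP (V visiting) (NP ))))", "(NP bees)"]
LONG = ["(S (NP ) (VP ))", "(NP Mary)", "(VP (V ) (VP ))", "(V hates)",
        "(VP (V ) (NP ))", "(V visiting)", "(NP bees)"]
print(length_weighted(SHORT, COUNTS))  # 1/16
print(length_weighted(LONG, COUNTS))   # 9/7168
```

With these made-up counts the two-fragment derivation of question 1 scores 1/16 and the seven-fragment derivation of question 2 scores 9/7168. Exact fractions are used only to keep the comparison free of rounding.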

I would expect there to be "basic level subtrees" analogous to basic level
categories: subtrees that are neither too specific nor too general, and that
are preferred in derivations because they maximize reusability.