# Using the text corpus to create a comprehensive list of phrasal verbs

01/2002

ABSTRACT

The paper describes the extraction of Estonian multi-word verbs (MWVs) from text corpora using SENVA, a language- and task-specific software tool based on the statistical, language-independent tool SENTA (Dias et al., 2000). The outcome is a comprehensive list of 16,000 phrasal verbs. We describe the extraction tool and the principles of manual post-editing, and evaluate the outcome in terms of precision and recall, comparing the results with manually compiled electronic dictionaries and with the results of a manual extraction experiment on a subset of the MWVs.

1 We use the term phrasal verb here to denote what English grammars call a multi-word lexical verb; we use the latter term in the rest of the paper for clarity.

• "In (Kaalep, Muischnek 2002) we reported on an experiment involving the creation of a database of Estonian MWVs, based on both human-oriented dictionaries and various text corpora. The current paper takes a closer look at one subtask of the experiment: finding new MWVs in a corpus."
