January 2004
This paper presents an approach to normalizing documents in constrained domains. The approach reuses resources developed for controlled document authoring and proceeds in three phases. First, candidate content representations for an input document are built automatically. Then, the content representation that best corresponds to the document, according to an expert on the class of documents, is identified. Finally, this content representation is used to generate the normalized version of the document. The current version of our prototype system is presented, and its limitations are discussed.
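The three phases can be illustrated with a minimal sketch. It is not the prototype described in the paper: the toy schema, the template, the function names, and the overlap-based scorer are all hypothetical, and the scorer merely stands in for the expert's judgment. As a further simplification, the candidates are enumerated from the schema alone rather than built from the input document.

```python
from itertools import product
from typing import Callable, Dict, List

Content = Dict[str, str]

# Hypothetical content model for a toy constrained domain: each slot has a
# closed set of allowed values, as a controlled authoring resource might define.
SCHEMA: Dict[str, List[str]] = {
    "drug_form": ["tablet", "syrup"],
    "frequency": ["once daily", "twice daily"],
}

# Hypothetical canonical wording used to realize a content representation.
TEMPLATE = "Take the {drug_form} {frequency}."


def build_candidates(schema: Dict[str, List[str]]) -> List[Content]:
    """Phase 1 (simplified): enumerate every content representation the schema allows."""
    slots = list(schema)
    return [dict(zip(slots, values)) for values in product(*schema.values())]


def overlap_score(document: str, content: Content) -> float:
    """Toy stand-in for the expert: fraction of slot values that occur in the text."""
    text = document.lower()
    hits = sum(1 for value in content.values() if value in text)
    return hits / len(content)


def select_best(
    document: str,
    candidates: List[Content],
    score: Callable[[str, Content], float],
) -> Content:
    """Phase 2: keep the candidate that the scorer rates highest for this document."""
    return max(candidates, key=lambda candidate: score(document, candidate))


def generate(content: Content) -> str:
    """Phase 3: realize the selected content representation in canonical wording."""
    return TEMPLATE.format(**content)


def normalize(
    document: str,
    score: Callable[[str, Content], float] = overlap_score,
) -> str:
    """Run the three phases in sequence on one input document."""
    candidates = build_candidates(SCHEMA)
    best = select_best(document, candidates, score)
    return generate(best)


if __name__ == "__main__":
    print(normalize("The syrup should be given twice daily after meals."))
```

Passing the scorer as a parameter keeps the selection step separate from candidate construction and generation, so a different model of the expert's judgment could be substituted without changing the other two phases.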