The AMI Meeting Corpus contains 100 hours of meetings captured using many synchronized recording devices, and is designed to support
work in speech and video processing, language engineering, corpus linguistics, and organizational psychology. It has been
transcribed orthographically, with annotated subsets for everything from named entities, dialogue acts, and summaries to simple
gaze and head movement. In this written version of an LREC conference keynote address, I describe the data and how it was
created. If this is “killer” data, that presupposes a platform that it will “sell”; in this case, that is the NITE XML Toolkit,
which allows a distributed set of users to create, store, browse, and search annotations for the same base data that are both
time-aligned against signal and related to each other structurally.