Multimodal human-computer interaction that offers users flexible, "natural" means of expression, such as speech and gestures, has so far motivated few ergonomic studies. Reliable empirical data on users' preferences, behaviours and strategies with respect to this new form of multimodality are crucial for defining adequate user models, which could contribute significantly to the efficiency and usability of future multimodal user interfaces. We describe a prospective ergonomic study that aims to provide empirical data on how users will use unconstrained speech and gestures to interact with standard computer applications. Despite recent scientific advances in natural language interpretation, continuous speech recognition and gesture analysis, operational interaction systems capable of accurately interpreting complex multimodal utterances are not yet available. Therefore, to study the forms of multimodal expression that users develop spontaneously when interacting with software systems capable of "understanding" speech and two-dimensional gestures on the screen, we designed a realistic simulation of such a system using the Wizard of Oz experimental paradigm. Eight subjects performed various design tasks with this simulated user interface over three weekly sessions. Analyses of the audio-visual protocols suggest large inter-individual differences in subjects' styles of expression and in their strategies for combining speech with gestures. Styles of expression fall into two broad classes according to the nature of the relationship between the subject and the system: communication with a machine interlocutor, or manipulation of a graphical representation of the application. After a short initial adaptation phase, confined to the first scenario or the first session, the styles of most subjects do not evolve further.
The few instances of later evolution that we observed point to inter-individual differences in the ability to transfer competences acquired in the context of human communication to a human-computer interaction environment.