Taking the action, rather than the utterance or the text, as the unit of analysis, this article isolates different modes and investigates their interdependent relationships. It illustrates that the visual mode of gesture can take up a hierarchically equal or superordinate position in relation to the mode of spoken language, in addition to the commonly understood subordinate position. Building on McNeill, Birdwhistell, Eco, and Ekman and Friesen, and using a multimodal interaction analytical approach (Norris), I analyse in detail three separate everyday (inter)actions in which the social actor performing a deictic gesture also uses spoken language. With these examples, I build on previous work in multimodal analysis of texts and multimodal interaction analysis in three ways: illustrating that the verbal is not necessarily more important than the visual (Kress and Van Leeuwen; Norris; Scollon), demonstrating that verbal and visual modes can be utilized together to (co)produce one message (Van Leeuwen), and showing that a mode utilized by a social actor to produce a higher-level discourse structure hierarchically supersedes other modes in interaction (Norris).