This article investigates the interactional relevance of weak cesuras in multimodal transitions into enactments. Previous research has pointed out that enactments are multimodally accomplished phenomena: they consist not only of a quotation but usually also involve changes in prosody and bodily conduct. Furthermore, it has been noted that an upcoming quotation may be projected by phonetic cues in the preceding talk. There is, however, little research on the precise multimodal realization of such transitions and their possible interactional relevance. Taking this as a starting point, we analyze a collection of co-enactments. First, we show that quotations are projected not only by phonetic but also by bodily cues, which often build up gradually in the preceding talk. These smooth transitions into enactment are analyzed as “cesural areas.” Second, we argue that such cesural areas and the accumulation of multimodal projections open up an opportunity space in the sense of Lerner (1991), in which a joint enactment involving co-participants, i.e., a co-enactment, becomes possible. Third, we show that participants jointly develop the meaning of the enactment in this space, mutually taking up and elaborating on each other's prior contributions. The data are taken from a corpus of collaborative storytellings in German.