The prominent success of music streaming services has brought increasingly complex challenges for music recommendation. In particular, in a streaming setting, songs are consumed sequentially within a listening session, which should cater not only to the user's historical preferences, but also to possible preference drifts triggered by a sudden change in the user's context. In this paper, we propose a novel online learning-to-rank approach for music recommendation that aims to learn continuously from the user's listening feedback. In contrast to existing online learning approaches for music recommendation, we leverage implicit feedback as the only signal of the user's preference. Moreover, to adapt rapidly to preference drifts over millions of songs, we represent each song in a lower-dimensional feature space and explore multiple directions in this space as duels of candidate recommendation models. Our thorough evaluation using listening sessions from Last.fm demonstrates the effectiveness of our approach at learning faster and better than state-of-the-art online learning approaches.