Opportunistic affect sensing offers unprecedented potential for capturing spontaneous affect, eliminating the biases inherent in controlled settings. Facial expression and voice are two major affective displays; however, most affect sensing systems on smartphones avoid them because of their high power requirements. Encouragingly, with the recent advent of low-power DSP (Digital Signal Processing) co-processors and GPU (Graphics Processing Unit) technology, audio and video sensing are becoming more feasible on smartphones. To make use of opportunistically captured facial expressions and voice, it is also important to gather contextual information about the dynamic audio-visual stimuli. This paper discusses recent advances in affect sensing on smartphones and identifies the key barriers and potential solutions for implementing opportunistic and context-aware affect sensing on smartphone platforms. In addition to the technical challenges (privacy, battery life and robust algorithms), the challenges of recruiting and retaining mental health patients are also considered, since experimentation with these patients is difficult but crucial to demonstrating the importance and effectiveness of smartphone-centred affect sensing technology.