We address self-perception and object discovery by integrating multimodal tactile, proprioceptive, and visual cues. Treating sensory signals as the robot's only source of information about the environment, we enable a humanoid robot to infer potentially usable objects by relating visual self-detection to tactile cues. We combine hierarchical Bayesian models with signal processing and proto-object artificial attention to tackle the problem. Results show that the robot is able to: (1) discern between in-body and out-of-body sources without using markers or simplified segmentation; (2) accurately discover objects in its reaching space; and (3) discriminate real objects from visual artefacts, aiding scene understanding. Furthermore, this approach reveals that several layers of abstraction are needed to achieve agency and causality, owing to the inherent ambiguity of the sensory cues.
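The core in-body/out-of-body discrimination can be illustrated with a minimal Bayesian sketch: a visual motion source is classified as belonging to the robot's body when the observed motion agrees with the motion predicted from proprioception. The model, likelihoods, and parameters below (`noise_std`, the uniform out-of-body likelihood, the equal prior) are illustrative assumptions, not the paper's actual hierarchical model.

```python
import numpy as np

def self_posterior(visual_motion, predicted_motion, prior_self=0.5, noise_std=0.1):
    """Posterior probability that a visual motion source is the robot's own
    body ("in-body"), given the motion predicted from proprioception.

    Illustrative assumptions:
    - in-body: visual motion ~ Normal(predicted_motion, noise_std) per dimension
    - out-of-body: visual motion ~ Uniform over [-1, 1] per dimension
    """
    resid = np.asarray(visual_motion) - np.asarray(predicted_motion)
    # Gaussian likelihood of the residuals under the "self" hypothesis
    lik_self = np.prod(np.exp(-0.5 * (resid / noise_std) ** 2)
                       / (noise_std * np.sqrt(2 * np.pi)))
    # Flat likelihood under "other": density 0.5 per dimension on [-1, 1]
    lik_other = 0.5 ** resid.size
    # Bayes rule over the two hypotheses
    return lik_self * prior_self / (lik_self * prior_self
                                    + lik_other * (1 - prior_self))

# Observed optical flow closely tracks the proprioceptive prediction -> self
pred = np.array([0.20, -0.10, 0.05])
print(self_posterior(pred + 0.02, pred))                  # near 1 (in-body)
# Motion independent of the arm command (e.g. an object) -> other
print(self_posterior(np.array([0.9, -0.8, 0.7]), pred))   # near 0 (out-of-body)
```

In practice, a single cue of this kind is ambiguous (hence the paper's argument for several layers of abstraction), but it conveys the basic idea of contingency-based self-detection.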