We use representations and expectations formed during lifelong learning to support attentional allocation and perception. Unlike memory formation in traditional laboratory investigations, real-world memory formation usually occurs without explicit instruction, on the fly, as a by-product of natural interactions with our environment. Understanding this process and the quality of naturally formed representations is critical to understanding how memory is used to guide attention and perception. Using immersive, navigable, and realistic virtual environments, we investigated incidentally generated memory representations by comparing them to memories for items that were explicitly memorized. Participants either searched for objects embedded in realistic indoor environments or explicitly memorized them for follow-up identity and location memory tests. We show for the first time that memory for the identity of naturalistic objects and their location in 3D space is better after incidental encoding than after explicit memorization, even though the subsequent memory tests came as a surprise to participants. Relating gaze behavior to memory performance revealed that encoding time was more predictive of subsequent memory when participants explicitly memorized an item than when they encoded it incidentally. Our results suggest that the active nature of guiding attentional allocation during proactive behavior allows for behaviorally optimal formation and utilization of representations. This highlights the importance of investigating cognition under ecologically valid conditions and shows that understanding the most natural processes for encoding and maintaining information is critical for understanding adaptive behavior.