Evoked potentials (EPs) were used to help identify the timing, location, and intensity of the information-processing stages applied to faces and words in humans. EP generators were localized using intracranial recordings in 33 patients with depth electrodes implanted in order to direct surgical treatment of drug-resistant epilepsy. While awaiting spontaneous seizure onset, the patients, who had given their fully informed consent, performed cognitive tasks. Depth recordings were obtained from 1198 sites in the occipital, temporal and parietal cortices, and in the limbic system (amygdala, hippocampal formation and posterior cingulate gyrus). Twenty-three patients received a declarative memory recognition task in which faces of previously unfamiliar young adults without verbalizable distinguishing features were exposed for 300 ms every 3 s; 25 patients received an analogous task using words. For component identification, some patients also received simple auditory (21 patients) or visual (12 patients) discrimination tasks. Eight successive EP stages preceding the behavioral response (at about 600 ms) could be distinguished by latency, and each of 14 anatomical structures was found to participate in 2–8 of these stages. The earliest response, an N75-P105, focal in the most medial and posterior of the leads implanted in the occipital lobe (lingual g), was probably generated in visual cortical areas 17 and 18. These components were not visible in response to words, presumably because words were presented foveally. A focal evoked alpha rhythm to both words and faces was also noted in the lingual g. This was followed by an N130-P180-N240 complex, focal and polarity-inverting in the basal occipitotemporal cortex (fusiform g, probably areas 19 and 37). In most cases, the P180 was evoked only by faces, and not by words, letters or symbols.
Although largest in the fusiform g, this sequence of potentials (especially the N240) was also observed in the supramarginal g, posterior superior and middle temporal g, posterior cingulate g, and posterior hippocampal formation. The N130, but not later components of this complex, was observed in the anterior hippocampus and amygdala. Faces alone also evoked longer-latency potentials up to 600 ms in the right fusiform g. Words alone evoked a series of potentials beginning at 190 ms and extending to 600 ms in the fusiform g and near the angular g (especially left). Both words and faces evoked an N150-P200-N260 in the lingual g, and posterior inferior and middle temporal g. An N310-N430-P630 sequence to words and faces was largest and polarity-inverted in the hippocampal formation and amygdala, but was also probably locally generated in many sites including the lingual g, lateral occipitotemporal cortex, middle and superior temporal g, temporal pole, supramarginal g, and posterior cingulate g. The P630 had the same distribution as has been noted for the P3b to rare target simple auditory and visual stimuli in ‘oddball’ tasks, with inversions in the hippocampus. In several sites, the N310 and N430 were smaller to repeated faces, and the P630 was larger. Putative information-processing functions were tentatively assigned to successive EP components based upon their cognitive correlates, as well as the functions and connections of their generating structures. For the N75-P105, this putative function is simple feature detection in primary visual cortex (V1 and V2). The N130-P180-N240 may embody structural face encoding in posterobasal inferotemporal cortex (homologous to V4?), with the results being spread widely to inferotemporal, multimodal and paralimbic cortices. For words, similar visual-form encoding (in fusiform g) or visual-phonemic encoding (in angular g) may occur between 150 and 280 ms.
During the N310, faces and words may be multiply encoded for form and identity (inferotemporal), emotional (amygdala), recent declarative mnestic (hippocampal formation), and semantic (supramarginal and superior temporal sulcal supramodal cortices) characteristics. These multiple characteristics may be contextually integrated across inferotemporal, supramodal association, and limbic cortices during the N430, with cognitive closure following in the P630. In sum, visual information arrives at area 17 by about 75 ms, and is structurally encoded in occipito-temporal cortex during the next 110 ms. By 150–200 ms after stimulus onset, activation has spread to parietal, lateral temporal, and limbic cortices, all of which continue to participate with the more posterior areas for the next 500 ms of event-encoding. Thus, face and word processing is serial in the sense that it can be divided into successive temporal stages, but highly parallel in that (after the initial stages where visual primitives are extracted) multiple anatomical areas with distinct perceptual, mnestic and emotional functions are engaged simultaneously. Consequently, declarative memory and emotional encoding can participate in early stages of perceptual integration as well as in later stages of cognitive integration. Conversely, occipitotemporal cortex is involved both early in processing (immediately after V1) and later, during the N430. That is, most stages of face and word processing appear to take advantage of the rich ‘upstream’ and ‘downstream’ anatomical connections in the ventral visual processing stream to link the more strictly perceptual networks with semantic, emotional, and mnestic networks.