As I see it, your question actually combines two different issues, which are hierarchically related:
(a) Is there a modality effect observable when assessing working memory (WM), and
(b) is the n-Back task a valid representation of the WM construct.
To come to grips with these issues, it may help to consider the general concept of WM, which is still debated in the literature.
1. Hypothesized constructs like WM are typically defined by a set of “mechanisms” or “cognitive processes”. These processes are understood to be reflected in individuals’ behavior on specific experimental paradigms, each represented by a set of similar tasks. Since no task (or even paradigm) is “process-pure”, characterizing and defining a construct like WM eventually requires
(a) using different tasks as well as paradigms, and
(b) checking whether, and to what extent, individual performance differences on different tasks assumed to index the same construct correlate with each other.
2. Accordingly, a number of researchers (though not all of them) seem to have agreed that the following paradigms contribute to the WM construct:
(a) complex span tasks (always applied in a dual-task context),
(b) memory updating,
(c) sorting span tasks,
(d) n-Back tasks (cf. Schmiedek et al., 2014).
These paradigms all have in common that they require simultaneous storage and processing, which is how WM is commonly defined.
They differ, however, with respect to
(a) applicability of different strategies on the subjects’ side,
(b) the degree to which familiarity information might be used,
(c) the degree to which shifting the focus of attention is required, and
(d) the involvement of retrieval processes from long-term memory.
This all adds up to considerable variance in the respective data.
Note that some authors also include the complex span, updating, and Recall-n-Back paradigms, but replace sorting span with binding tasks (cf. Wilhelm et al., 2013). To these authors (for instance, Oberauer’s group), working memory is first and foremost a system for building, maintaining, and rapidly updating (i.e., storage and processing again) arbitrary bindings (e.g., among list positions, locations in space, propositional schemata). They argue that the capability for rapid formation of temporary bindings enables the system to construct and maintain new structures, such as random lists, spatial arrays, or mental models. Working memory is thus thought to be important for reasoning, because reasoning requires the construction and manipulation of representations of novel structures.
3. Moreover, even within each paradigm, further sources of variance are introduced by varying task content (e.g., numerical n-Back as compared to visual-spatial n-Back). This variance depends on
(a) differing expertise in essential basic skills (e.g., mental calculation),
(b) differential knowledge (e.g., placing objects on a specific dimension, for instance “size”), and
(c) the applicability of specific strategies (e.g., “visualization” techniques).
(d) Also, different tasks/paradigms might not work equally well for different groups.
4. From all this it follows that observed correlations between two single tasks (e.g., numerical n-Back and complex span, or numerical n-Back and visuo-spatial n-Back) need not be high for both tasks to be valid indicators of WM.
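To make this point concrete, here is a minimal sketch under a simple single-factor assumption: if two standardized task scores share one latent WM factor and have otherwise independent task-specific variance, their observed correlation equals the product of their loadings. The loadings 0.7 and 0.6 and all function names are purely illustrative assumptions of mine; they are not estimates from Schmiedek et al. (2014) or Wilhelm et al. (2013).

```python
import math
import random


def implied_correlation(loading_a, loading_b):
    """Correlation implied for two standardized indicators of one common factor."""
    return loading_a * loading_b


def pearson(xs, ys):
    """Plain Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_correlation(loading_a, loading_b, n=20000, seed=1):
    """Simulate two task scores driven by one latent factor plus independent noise."""
    rng = random.Random(seed)
    # Residual standard deviations chosen so that each score has unit variance.
    resid_a = math.sqrt(1.0 - loading_a ** 2)
    resid_b = math.sqrt(1.0 - loading_b ** 2)
    task_a, task_b = [], []
    for _ in range(n):
        wm = rng.gauss(0.0, 1.0)  # hypothetical latent WM ability of one person
        task_a.append(loading_a * wm + resid_a * rng.gauss(0.0, 1.0))
        task_b.append(loading_b * wm + resid_b * rng.gauss(0.0, 1.0))
    return pearson(task_a, task_b)


if __name__ == "__main__":
    # Hypothetical loadings for, say, a numerical n-Back and a complex span task.
    print("implied:  ", round(implied_correlation(0.7, 0.6), 3))
    print("simulated:", round(simulate_correlation(0.7, 0.6), 3))
```

With these made-up loadings, both tasks reflect the latent factor quite well, yet the implied correlation between them is only 0.42; task-specific variance attenuates the observed correlation even when both tasks are reasonable indicators of the same construct.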
Therefore, it is necessary to disentangle the influence of using
(a) different paradigms, and
(b) different task contents within paradigms
on the size of correlations between tasks.
Some research groups have tried to do this, advocating structural equation modeling and latent variable analysis. According to Schmiedek et al. (2014),
(a) in younger adults the WM factor is more strongly defined by complex span and memory updating than by n-Back and sorting span, while
(b) in older adults WM is measured equally well with all four paradigms mentioned above.
(c) Also, typically across all paradigms, verbal/numerical and visual/spatial task contents are more or less dissociated, possibly resulting in modality effects (see also Wilhelm et al., 2013).
5. To conclude:
(a) Latent correlations among n-Back, memory updating, and complex span tasks of WM turn out to be high (Schmiedek et al., 2014; cf. also Wilhelm et al., 2013).
(b) The latent factors of both complex span and n-Back load highly on a general WM factor, which also comprises the factors for the memory updating and sorting span paradigms.
Thus, across all age groups, all of these paradigms appear to be good operational definitions of WM, though not to the same degree:
(i) in younger adults, complex span and memory updating are near-perfect indicators of the general WM factor, whereas n-Back and sorting span have considerably lower loadings;
(ii) for older adults, no significant differences between standardized factor loadings have been found.
(c) While, for good practical or theoretical reasons, preference might be given in certain experimental settings to choosing only one specific paradigm or even task to operationalize WM functions, researchers should be cautioned against equating a particular paradigm/task with the construct it is supposed to measure.
(d) Last but not least, there are indeed “modality” effects, given the different loadings of verbal/numerical factors as compared to spatial/figural factors across all paradigms used in the aforementioned latent factor analyses of WM.
Interestingly, as far as I can see, these differences seem to be larger for the complex span tasks than for the n-Back tasks.
Two recommended references:
Wilhelm, O., Hildebrandt, A., & Oberauer, K. (2013). What is working memory capacity, and how can we measure it? Frontiers in Psychology, 4, Article 433. doi: 10.3389/fpsyg.2013.00433
Schmiedek, F., Lövdén, M., & Lindenberger, U. (2014). A task is a task is a task: putting complex span, n-back, and other working memory indicators in psychometric context. Frontiers in Psychology, 5, Article 1475. doi: 10.3389/fpsyg.2014.01475