Theoretical background

Our theoretical work has its roots in a Theory of Visual Attention (TVA), first proposed by Bundesen (1990) and later developed into a Neural Theory of Visual Attention (NTVA; Bundesen, Habekost, & Kyllingsbæk, 2005; see also Bundesen & Habekost, 2008). According to TVA, visual perception of the world around us may be thought of as a race between multiple mental hypotheses about the objects in our visual field. The few winners of the race are encoded into a limited visual short-term memory (VSTM), where they become available for consciousness and action. The storage capacity of VSTM is limited to K elements, and K is one of the basic parameters of the model (typically around four elements).

The encoding process is conceived as a two-stage process: During the first stage an attentional weight is computed for every element in the display. This weight represents the strength of the sensory evidence that the element is relevant. The second stage of processing is the race between elements for encoding into VSTM. At this stage the total processing capacity of the visual system is assumed to be constant, and the value of this constant, C elements/s, is the second basic parameter of the model. The total processing capacity is distributed among the elements in direct proportion to their weights. The allocated processing capacity determines how fast an element can be encoded into VSTM: The encoding time follows an exponential distribution with a rate parameter that equals the allocated processing capacity. The elements actually selected are the ones whose encoding processes are finished before the stimulus presentation is terminated and before VSTM has been filled up (see Shibuya & Bundesen, 1988).
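The race described above can be illustrated with a small simulation. This is a minimal sketch rather than a fitted TVA model: the function name `simulate_trial`, its arguments, and all numerical values are illustrative assumptions. Each element receives a share of the total capacity C in proportion to its weight, its encoding time is drawn from an exponential distribution with that rate, and the first K elements that finish before the exposure ends are selected.

```python
import random


def simulate_trial(weights, C, K, exposure, rng):
    """Simulate one trial of the race for encoding into VSTM.

    weights  -- attentional weight of each element (hypothetical values)
    C        -- total processing capacity in elements/s
    K        -- VSTM storage capacity in elements
    exposure -- exposure duration in seconds
    rng      -- a random.Random instance
    Returns the indices of the encoded elements in order of finishing.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    finish_times = []
    for index, weight in enumerate(weights):
        # Capacity is distributed in direct proportion to the weights.
        rate = C * weight / total
        if rate > 0:
            # Exponentially distributed encoding time with this rate.
            finish_times.append((rng.expovariate(rate), index))

    finish_times.sort()
    # Only elements finishing before stimulus offset can be encoded,
    # and only until VSTM has been filled up (K elements).
    in_time = [index for time, index in finish_times if time <= exposure]
    return in_time[:K]


if __name__ == "__main__":
    rng = random.Random(1)
    # Six equally weighted elements, C = 50 elements/s, K = 4, 100 ms exposure.
    print(simulate_trial([1.0] * 6, C=50.0, K=4, exposure=0.1, rng=rng))
```

Averaging the number of selected elements over many simulated trials at different exposure durations gives a curve that rises quickly at first and levels off at K, which is the qualitative pattern the two parameters C and K are meant to capture.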

At a more detailed level, TVA describes both visual identification and selection of elements in the visual field as the making of perceptual categorizations. A perceptual categorization has the form "element x belongs to category i", where x is an element in the visual field and i is a perceptual category (e.g., a color). The rate of a perceptual categorization, v(x, i), is defined in the rate equation:

$$v(x, i) = \eta(x, i) \, \beta_i \, \frac{w_x}{\sum_{z \in S} w_z}$$

where v(x, i) equals the strength of the sensory evidence that element x belongs to category i, η(x, i), multiplied by a perceptual bias related to category i, βi, multiplied by the relative weight of element x (i.e., the weight of element x, wx, divided by the sum of the weights of all elements in the visual field S). Attentional weights are derived from pertinence values, which represent the importance of noticing elements belonging to certain categories. The weight of element x is given by the weight equation:

$$w_x = \sum_{j \in R} \eta(x, j) \, \pi_j$$

where R is the set of all perceptual categories, η(x, j) is the strength of the sensory evidence that element x belongs to category j, and πj is the pertinence value of category j. The elements that become encoded into VSTM are the ones that first finish processing with respect to some categorization.
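As a worked illustration of the rate and weight equations, the following sketch computes attentional weights and categorization rates for a two-element display. It assumes the standard TVA forms of the two equations (Bundesen, 1990): the weight of an element is the sum over categories of η times π, and the rate is η times β times the element's weight relative to the summed weights of all elements in the display. The function names and all numerical values are hypothetical.

```python
def attentional_weights(eta, pi):
    """Weight equation: w_x = sum over categories j of eta(x, j) * pi_j.

    eta -- dict mapping element -> dict mapping category -> sensory evidence
    pi  -- dict mapping category -> pertinence value
    """
    return {
        x: sum(evidence * pi.get(j, 0.0) for j, evidence in eta_x.items())
        for x, eta_x in eta.items()
    }


def categorization_rates(eta, beta, pi):
    """Rate equation: v(x, i) = eta(x, i) * beta_i * w_x / sum of all weights.

    beta -- dict mapping category -> perceptual bias
    Returns a dict mapping (element, category) -> rate v(x, i).
    """
    weights = attentional_weights(eta, pi)
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("the summed attentional weight must be positive")
    return {
        (x, i): evidence * beta.get(i, 0.0) * weights[x] / total
        for x, eta_x in eta.items()
        for i, evidence in eta_x.items()
    }


if __name__ == "__main__":
    # Hypothetical display: one red target and one green distractor.
    eta = {
        "target": {"red": 1.0, "green": 0.0},
        "distractor": {"red": 0.0, "green": 1.0},
    }
    pi = {"red": 1.0, "green": 0.25}   # red is the more pertinent category
    beta = {"red": 0.8, "green": 0.8}  # equal bias for both categorizations
    for key, v in categorization_rates(eta, beta, pi).items():
        print(key, round(v, 3))
```

With these made-up values the target receives a weight of 1.0 and the distractor 0.25, so the categorization "target is red" proceeds at about 0.64 and "distractor is green" at about 0.16: raising the pertinence of a category speeds up the processing of every element that carries it.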


Figure 1. A possible anatomical implementation of the NTVA model. In the first wave of processing, visual signals travel from the eyes through the lateral geniculate nucleus (LGN) of the thalamus to the visual cortex. Here, attentional weights are computed in accordance with TVA's weight equation (η * π) and stored in a map of the visual field that is located in the pulvinar (Pul) nucleus of the thalamus. The weights are used for distributing attentional capacity (i.e., cortical neurons), by way of signal gating, before the second wave of processing occurs. In the second wave, the visual categorizations supported by the strongest firing activity (i.e., high firing rates in many neurons = large v values, in accordance with TVA's rate equation) tend to establish feedback loops with neurons in the thalamic reticular nucleus (TRN), which in turn reactivate cells in the LGN. In this way, the feedback loops sustain the cortical activity that represents the categorizations; this corresponds to VSTM encoding at the psychological level.


Bundesen, Habekost, and Kyllingsbæk (2005) proposed a direct interpretation of TVA's rate and weight equations at the level of single neurons in the visual system (see Figure 1). According to this model, NTVA, the attentional weight of an object corresponds to the number of neurons that respond to its properties. This number can be varied by dynamic remapping of receptive fields (Moran & Desimone, 1985) such that signals from attended objects have a higher probability of being gated to a given neuron (and thus of controlling its response). By a complementary mechanism, the perceptual bias for making certain categorizations corresponds to up- or downscaling the activity in neurons specialized for making those categorizations. We used this simple model to account in detail for sixteen central studies in the single-cell literature, covering the major findings in the field. Thus, NTVA provides a mathematical framework that unifies psychology and neurobiology. Recently, we have used TVA-based assessment procedures to study individual differences in attentional function in brain-damaged patients compared to healthy subjects (e.g., Bublak et al., 2005; Dubois et al., 2010; Duncan et al., 1999, 2003; Finke et al., 2005; Kyllingsbæk, 2006).