
5.3 Vision


Learning Objectives

By the end of this section, you will be able to:

  • Describe the basic anatomy of the visual system
  • Discuss how rods and cones contribute to different aspects of vision
  • Describe how monocular and binocular cues are used in the perception of depth


   The visual system constructs a mental representation of the world around us (figure below). This contributes to our ability to successfully navigate through physical space and interact with important individuals and objects in our environments. This section will provide an overview of the basic anatomy and function of the visual system. In addition, we will explore our ability to perceive color and depth.


Several photographs of people’s eyes are shown.

Our eyes take in sensory information that helps us understand the world around us. (credit “top left”: modification of work by “rajkumar1220”/Flickr; credit “top right”: modification of work by Thomas Leuthard; credit “middle left”: modification of work by Demietrich Baker; credit “middle right”: modification of work by “kaybee07”/Flickr; credit “bottom left”: modification of work by “Isengardt”/Flickr; credit “bottom right”: modification of work by Willem Heerbaart)


   The eye is the major sensory organ involved in vision (figure below). Light waves are transmitted across the cornea and enter the eye through the pupil. The cornea is the transparent covering over the eye. It serves as a barrier between the inner eye and the outside world, and it is involved in focusing light waves that enter the eye. The pupil is the small opening in the eye through which light passes, and the size of the pupil can change as a function of light levels as well as emotional arousal. When light levels are low, the pupil will become dilated, or expanded, to allow more light to enter the eye. When light levels are high, the pupil will constrict, or become smaller, to reduce the amount of light that enters the eye. The pupil’s size is controlled by muscles that are connected to the iris, which is the colored portion of the eye; the pupillary response involves cranial nerves II and III.


Different parts of the eye are labeled in this illustration. The cornea, pupil, iris, and lens are situated toward the front of the eye, and at the back are the optic nerve, fovea, and retina.

The anatomy of the human eye.


   After passing through the pupil, light crosses the lens, a curved, transparent structure that serves to provide additional focus. The lens is attached to muscles that can change its shape to aid in focusing light that is reflected from near or far objects. In a normal-sighted individual, the lens will focus images perfectly on a small indentation in the back of the eye known as the fovea, which is part of the retina, the light-sensitive lining of the eye. The fovea contains densely packed specialized photoreceptor cells (figure below). These photoreceptor cells, known as cones, are light-detecting cells. Cones are much less sensitive to light than rods and make no contribution to night vision. However, cones are much faster than rods at detecting changes in the daytime environment, are very sensitive to fine detail, and provide tremendous spatial resolution. They also are directly involved in our ability to perceive color. The retinal center of gaze is specialized for daytime vision because the fovea, the area of the retina used for visual acuity, is densely packed with cone cells. At night or during times of darkness, this center of cones provides little to no help in navigating the environment. Ancient astronomers documented this when they noticed that, in order to see faint distant stars, they had to look at the area around a star instead of directly at it. Similarly, while walking through the forest on a dark night, it would be more beneficial to look to one side or the other of an unfamiliar sound in order to visually identify the potential threat as an aggressive predator, an inquisitive Sasquatch, or a friendly fellow adventurer.

While cones are concentrated in the fovea, where images tend to be focused, rods, another type of photoreceptor, are located throughout the remainder of the retina. Rods are specialized photoreceptors that work well in low light conditions, and while they lack the spatial resolution and color function of the cones, they are involved in our vision in dimly lit environments as well as in our perception of movement on the periphery of our visual field. The astronomers discussed above were using their rods when they averted their gaze to an area beside a distant star, as were the nighttime hikers who looked to one side of a potential threat. As their name suggests, rods are long cylindrical cells made up of stacks of disks that are highly sensitive to light. As the light level in a room or the environment increases, the electrical response of the rods becomes saturated and the cells cease to respond to variations in light intensity. Humans and other primates have only one type of rod, but three types of cones (as discussed in terms of the trichromatic perception of color).


This illustration shows light reaching the optic nerve, beneath which are ganglion cells, and then rods and cones.

The two types of photoreceptors are shown in this image. Cones are colored green and rods are blue.


   We have all experienced the different sensitivities of rods and cones when making the transition from a brightly lit environment to a dimly lit environment. Imagine going to see a blockbuster movie on a clear summer day. As you walk from the brightly lit lobby into the dark theater, you notice that you immediately have difficulty seeing much of anything. After a few minutes, you begin to adjust to the darkness and can see the interior of the theater. In the bright environment, your vision was dominated primarily by cone activity. As you move to the dark environment, rod activity dominates, but there is a delay in transitioning between the phases. If your rods do not transform light into nerve impulses as easily and efficiently as they should, you will have difficulty seeing in dim light, a condition known as night blindness.

Rods and cones are connected (via several interneurons) to retinal ganglion cells. Axons from the retinal ganglion cells converge and exit through the back of the eye to form the optic nerve. The optic nerve carries visual information from the retina to the brain. There is a point in the visual field called the blind spot where we are not able to perceive or take in any information. It corresponds to the optic disc, the location where the bundled axons of the retinal ganglion cells exit the retina and where there are therefore no receptors. We are not consciously aware of our blind spots for two reasons: First, each eye gets a slightly different view of the visual field; therefore, the blind spots do not overlap. Second, our visual system fills in the blind spot so that although we cannot respond to visual information that occurs in that portion of the visual field, we are also not aware that information is missing.

The optic nerve from each eye merges just below the brain at a point called the optic chiasm. As the figure below shows, the optic chiasm is an X-shaped structure that sits just below the cerebral cortex at the front of the brain. At the point of the optic chiasm, most of the information from the right visual field (which comes from both eyes) is sent to the left side of the brain, and most of the information from the left visual field is sent to the right side of the brain. It is important to notice in the figure below that information from the outer (temporal) half of each retina maintains an ipsilateral (same side) path on its way to the thalamus, where it is processed in the lateral geniculate nucleus (LGN) and the pulvinar, and finally to the primary visual processing areas in the occipital lobe. Information from the inner half of each retina (closer to the nose) crosses at the optic chiasm to the contralateral (opposite) side, where it is processed by the contralateral LGN and pulvinar before being sent to the contralateral primary visual processing area of the occipital lobe.


An illustration shows the location of the occipital lobe, optic chiasm, optic nerve, and the eyes in relation to their position in the brain and head.

This illustration shows the optic chiasm at the front of the brain and the pathways to the occipital lobe at the back of the brain, where visual sensations are processed into meaningful perceptions.


   Once inside the brain, visual information is sent via a number of structures to the occipital lobe at the back of the brain for processing. Visual information is processed through two pathways, which can generally be described as the “what” or ventral pathway and the “where/how” or dorsal pathway. The ventral pathway is involved in object recognition and identification. Activity spreading through the ventral pathway travels from the primary visual area in the back of the occipital cortex through the inferior temporal lobe and along the bottom of the temporal cortex toward the anterior inferotemporal cortex. As the activation travels from the primary visual cortex of the occipital lobes to later areas of the inferior temporal cortex, the receptive fields of neurons increase in size, in latency of activation, and in the complexity of the information they are tuned to respond to. The ventral pathway also plays an important role in the classification and conceptualization of elements in the visual world, and it is recruited while judging the significance or relevance of the information in focus (Mishkin, Ungerleider & Macko, 1983). Interruption of this pathway has been shown to disrupt object discrimination without affecting perception of spatial relations between objects (Jeannerod & Jeannerod, 1997). The dorsal pathway, by contrast, progresses along the central superior (top) portions of the occipital lobe through the posterior parietal cortex and is involved with location in space and how one might interact with a particular visual stimulus (Milner & Goodale, 2008; Ungerleider & Haxby, 1994). For example, when you see a ball rolling down the street, the “what pathway” identifies what the object is, and the “where/how pathway” identifies its location or movement in space. The dorsal stream functions to provide a detailed map of the visual environment and also to detect and analyze movements.
Areas of the parietal cortex to which the dorsal stream projects are essential for the perception and interpretation of spatial relationships between yourself and the environment, for maintaining body image and a sense of personal space, and for learning tasks that involve coordinating the body in the environment (Paradiso, Bear & Connors, 2007). Interruption of the dorsal pathway creates visual spatial disorientation characterized not only by misperception of the relative positions of spatial landmarks, but also by locating deficits during object-oriented action (Ungerleider & Mishkin, 1982).



Functional areas related to the processing of spatial and semantic information. Based on Goodale and Milner (1992).



   We do not see the world in black and white; neither do we see it as two-dimensional (2-D) or flat (just height and width, no depth). Let’s look at how color vision works and how we perceive three dimensions (height, width, and depth).

Color Vision

   Normal-sighted humans and apes have three different types of cones that mediate color vision. Each of these cone types is maximally sensitive to a slightly different wavelength of light, which gives us a trichromatic system of color perception built from combinations of red, green, and blue. A receptor that is maximally sensitive to a specific wavelength is referred to as being tuned to that wavelength or its associated color. The tuning curves for the three types of receptors show each receptor’s range of sensitivity, that is, the minimum stimulus intensity at which the receptor is activated. For example, blue cones are most sensitive to reflected light of 437 nm; for this reason, these receptors are referred to as S or short-wavelength receptors. Cones that code for green respond maximally to 533 nm and are referred to as M or medium-wavelength receptors, and cones that code for red respond maximally to 564 nm and are referred to as L or long-wavelength receptors. As the curves on the graph below show, the three receptors do respond to wavelengths other than those they are maximally tuned to, but the signal transmitted from the cones is weaker in this case. Photoreceptors have a graded sensitivity, meaning each rod and cone responds to a wide variety of wavelengths but signals a specific wavelength by the amplitude of its response to the light. We are therefore able to perceive a vast array of hues through varying activation of all three cone types, together with activation from rods, as these signals mix while light is transduced by the cones and rods and passed to the bipolar cells and then to the retinal ganglion cells, whose axons bundle together and enter the brain by way of the optic nerve through the optic disc (blind spot).


A graph is shown with “sensitivity” plotted on the y-axis and “Wavelength” in nanometers plotted along the x-axis with measurements of 400, 500, 600, and 700. Three lines in different colors move from the base to the peak of the y-axis, and back to the base. The blue line begins at 400 nm and hits its peak of sensitivity around 455 nanometers, before the sensitivity drops off at roughly the same rate at which it increased, returning to the lowest sensitivity around 530 nm. The green line begins at 400 nm and reaches its peak of sensitivity around 535 nanometers. Its sensitivity then decreases at roughly the same rate at which it increased, returning to the lowest sensitivity around 650 nm. The red line follows the same pattern as the first two, beginning at 400 nm, increasing and decreasing at the same rate, and it hits its height of sensitivity around 580 nanometers. Below this graph is a horizontal bar showing the colors of the visible spectrum.

This figure illustrates the different sensitivities for the three cone types found in normal-sighted humans. (credit: modification of work by Vanessa Ezekowitz)
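As a rough illustration of graded sensitivity, the sketch below models each cone type's tuning curve as a bell-shaped (Gaussian) curve centered on the peak wavelengths given in the text (437, 533, and 564 nm). The Gaussian shape, the 40 nm width, and the names `CONE_PEAKS_NM`, `TUNING_WIDTH_NM`, and `cone_responses` are assumptions made for illustration; they are not measured cone data.

```python
import math

# Peak sensitivities (nm) taken from the text above.
CONE_PEAKS_NM = {"S": 437.0, "M": 533.0, "L": 564.0}

# ASSUMPTION: every cone type has a symmetric Gaussian tuning curve with a
# 40 nm standard deviation. Real tuning curves are asymmetric and vary in width.
TUNING_WIDTH_NM = 40.0


def cone_responses(wavelength_nm):
    """Return the relative response (0 to 1) of each cone type to one wavelength."""
    return {
        name: math.exp(-((wavelength_nm - peak) ** 2) / (2 * TUNING_WIDTH_NM ** 2))
        for name, peak in CONE_PEAKS_NM.items()
    }


if __name__ == "__main__":
    # Each wavelength activates all three cone types, but to different degrees.
    for wavelength in (450, 533, 600):
        responses = cone_responses(wavelength)
        summary = ", ".join(f"{name}={value:.2f}" for name, value in responses.items())
        print(f"{wavelength} nm -> {summary}")
```

Under these assumptions, a single wavelength produces a pattern of three response strengths rather than activity in one cone type alone, which is the idea behind perceiving many hues from only three receptor types.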


   While the trichromatic theory describes the receptor types we have and how color perception arises from the combined activation of the three cone types, the opponent-process model describes the processes that the receptor and bipolar cells use to create different signals representing the variety of hues we are able to classify. Opponent process describes an antagonistic pattern of response by cones in which color is coded in opponent pairs: black-white, yellow-blue, and green-red. The basic idea is that some cells of the visual system are excited by one of the opponent colors and inhibited by the other. So, a cell that was excited by wavelengths associated with green would be inhibited by wavelengths associated with red, and vice versa. This opponent process is thought to occur when the information is transferred from the cones to the bipolar cells, where hues are coded for ganglion cells and sent through the optic nerve. One of the implications of opponent processing is that we do not experience greenish-reds or yellowish-blues as colors. Another implication is that this leads to the experience of negative afterimages. An afterimage describes the continuation of a visual sensation after removal of the stimulus. For example, when you stare briefly at the sun and then look away from it, you may still perceive a spot of light although the stimulus (the sun) has been removed. When color is involved in the stimulus, the color pairings identified in the opponent-process theory lead to a negative afterimage. You therefore end up seeing the colors that are opponent to the colors you were looking at before the stimulus was removed from sight. You can test this concept using the flag in the figure below.


An illustration shows a green flag with thick, black-bordered yellow lines meeting slightly to the left of the center. A small white dot sits within the yellow space in the exact center of the flag.

Stare at the white dot for 30–60 seconds and then move your eyes to a blank piece of white paper. What do you see? This is known as a negative afterimage, and it provides empirical support for the opponent-process theory of color vision.
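The opponent pairs listed above can be written as a simple lookup, which gives a way to predict the colors of a negative afterimage. This is only a sketch of the pairing rule; the names `OPPONENT_PAIRS`, `OPPONENT_OF`, and `predicted_afterimage` are hypothetical and chosen for illustration.

```python
# Opponent pairs as listed in the text: black-white, yellow-blue, green-red.
OPPONENT_PAIRS = [("black", "white"), ("yellow", "blue"), ("green", "red")]

# Build a two-way lookup so that each color maps to its opponent.
OPPONENT_OF = {}
for first, second in OPPONENT_PAIRS:
    OPPONENT_OF[first] = second
    OPPONENT_OF[second] = first


def predicted_afterimage(colors):
    """Map each stimulus color to the color predicted for its negative afterimage."""
    return [OPPONENT_OF[color] for color in colors]


if __name__ == "__main__":
    flag_colors = ["green", "yellow", "black", "white"]
    for seen, after in zip(flag_colors, predicted_afterimage(flag_colors)):
        print(f"{seen} -> {after}")
```

Applied to the flag, the lookup predicts that the green field is followed by red, the yellow lines by blue, the black borders by white, and the white dot by black.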


   But these two theories, the trichromatic theory of color vision and the opponent-process theory, are not mutually exclusive. Opponent processing works to create hues based on the amount of information the receptors are translating. Opponent processing of color vision has been demonstrated in species other than humans, such as monkeys (De Valois, 1960) and birds (Maturana & Varela, 1982), allowing for a better understanding of how the visual world appears to other species. Cones are maximally responsive to three different wavelengths that represent red, blue, and green (trichromatic signaling), and once the signal moves from the cones through the bipolar cells and into the brain by way of the ganglion bundles in the optic nerve, the cells respond in a way consistent with opponent-process theory (Land, 1959; Kaiser, 1997).

Depth Perception

   Our ability to perceive spatial relationships in three-dimensional (3-D) space is known as depth perception. With depth perception, we can describe things as being in front, behind, above, below, or to the side of other things. Gibson and Walk (1960) demonstrated that infants as young as 6 months hesitate when coaxed to crawl over a visual cliff. This suggests that humans develop depth perception as the visual system matures, and therefore that depth perception is not an acquired or learned capability.

Our world is three-dimensional, so it makes sense that our mental representation of the world has three-dimensional properties. We use a variety of cues in a visual scene to establish our sense of depth. Some of these are binocular cues, which means that they rely on the use of both eyes. One example of a binocular depth cue is binocular disparity, the slightly different view of the world that each of our eyes receives. To experience this slightly different view, do this simple exercise: extend your arm fully and extend one of your fingers and focus on that finger. Now, close your left eye without moving your head, then open your left eye and close your right eye without moving your head. You will notice that your finger seems to shift as you alternate between the two eyes because of the slightly different view each eye has of your finger. A 3-D movie works on the same principle: the special glasses you wear allow the two slightly different images projected onto the screen to be seen separately by your left and your right eye. As your brain processes these images, you have the illusion that the leaping animal or running person is coming right toward you.



Celebrities don 3D glasses during the Grammys in 2010.


Another binocular cue, convergence, is the brain’s interpretation of eye muscle contraction: objects are perceived as closer when both eyes turn inward to focus on stimuli nearer the nose than when they focus on stimuli farther away. The perception of distance, in contrast to the perception of depth discussed above, is based on experience with objects and stimuli we have encountered in our lives, through which we learn associations between estimated distances and differences in eye muscle tension.
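The geometry behind both binocular cues can be sketched with a little trigonometry. For a point straight ahead, the angle between the two lines of sight grows as the point gets closer (convergence), and the difference between that angle for a near point and a far point is one common way to express binocular disparity. The eye separation of 6.3 cm and the function names below are assumptions chosen for illustration, and the model ignores everything about the eye except its position.

```python
import math

# ASSUMPTION: distance between the two eyes in centimeters (illustrative value).
INTEROCULAR_DISTANCE_CM = 6.3


def convergence_angle_deg(distance_cm, eye_separation_cm=INTEROCULAR_DISTANCE_CM):
    """Angle between the two lines of sight when both eyes fixate a point straight ahead."""
    if distance_cm <= 0:
        raise ValueError("distance_cm must be positive")
    return math.degrees(2 * math.atan((eye_separation_cm / 2) / distance_cm))


def relative_disparity_deg(near_cm, far_cm):
    """Difference in convergence angle between a nearer point and a farther point."""
    return convergence_angle_deg(near_cm) - convergence_angle_deg(far_cm)


if __name__ == "__main__":
    # A finger held close, a finger at arm's length, and a wall across the room.
    for distance in (25, 60, 600):
        print(f"{distance} cm -> {convergence_angle_deg(distance):.2f} degrees")
    print(f"finger (60 cm) vs. wall (600 cm): {relative_disparity_deg(60, 600):.2f} degrees")
```

In this sketch the angle shrinks quickly with distance, which is consistent with binocular cues being most informative for objects that are relatively close.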

Although we rely on binocular cues to experience depth in our 3-D world, we can also perceive depth in 2-D arrays. Think about all the paintings and photographs you have seen. Generally, you pick up on depth in these images even though the visual stimulus is 2-D. When we do this, we are relying on a number of monocular cues, or cues that require only one eye. If you think you can’t see depth with one eye, note that you don’t bump into things when using only one eye while walking—and, in fact, we have more monocular cues than binocular cues.

An example of a monocular cue would be what is known as linear perspective. Linear perspective refers to the fact that we perceive depth when we see two parallel lines that seem to converge in an image (figure below). Some other monocular depth cues are interposition, the partial overlap of objects, and the relative size and closeness of images to the horizon.


A photograph shows an empty road that continues toward the horizon.

We perceive depth in a two-dimensional figure like this one through the use of monocular cues such as linear perspective, seen here in the parallel lines that converge as the road narrows in the distance. (credit: Marc Dalmulder)
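Why parallel road edges appear to converge can be sketched with a simple pinhole projection, in which the size of an object in the image is proportional to its real size divided by its distance. The pinhole model, the focal length of 1.0, the 6 m road width, and the function name `projected_road_width` are all assumptions made for illustration.

```python
def projected_road_width(road_width_m, distance_m, focal_length=1.0):
    """Width of the road in the image under a simple pinhole projection.

    ASSUMPTION: image size = focal_length * real size / distance.
    """
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    return focal_length * road_width_m / distance_m


if __name__ == "__main__":
    # The road keeps the same real width, but its image narrows with distance.
    for distance in (5, 10, 50, 100, 1000):
        width = projected_road_width(6.0, distance)
        print(f"{distance:>5} m away -> image width {width:.3f}")
```

Because the projected width keeps shrinking as distance grows, the two edges of the road approach a single point in the image near the horizon, even though they never meet in the world.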


All these methods of determining where something is and how large it is work well while we are standing still, but when we start moving around, how do we adapt to our changing environment? Perceptual constancy means that even when important pieces of information such as the lighting, shape, and size of what we see change, we still perceive the objects in the environment as unchanging. A door, for example, is (usually) rectangular when it is closed, and when it is propped open, we still perceive the same door as rectangular, a process known as shape constancy. We rely on shape constancy and size constancy to maintain the assumption that although our view of the world changes, the objects within the environment remain unchanged.


   Dr. Bruce Bridgeman (a neuroscientist from the University of California, Santa Cruz) was born with an extreme case of lazy eye that resulted in him being stereoblind, or unable to respond to binocular cues of depth. He relied heavily on monocular depth cues, but he never had a true appreciation of the 3-D nature of the world around him. If a bird were to jump out from a tree, those around Bridgeman would react with surprise at the approaching animal, while Dr. Bridgeman perceived the bird as just another aspect of the background. This all changed one night in 2012 while Dr. Bridgeman was seeing Martin Scorsese’s 3-D family adventure with his wife. Even though he thought it was a waste of money to purchase the 3-D glasses given his known vision issues, he paid for the glasses when he purchased his ticket. As soon as the film began, he put on the glasses and experienced something completely new. For the first time in his life he appreciated the true depth of the world around him. Remarkably, his ability to perceive depth persisted outside of the movie theater. For the first time, Dr. Bridgeman saw a lamppost standing out from the background. Trees, cars and people looked more alive and more vivid than ever. And, remarkably, he’s seen the world in 3-D ever since that day. “Riding to work on my bike, I look into a forest beside the road and see a riot of depth, every tree standing out from all the others,” he says. A similar vision epiphany was recorded in neurologist Oliver Sacks’s patient known as “Stereo Sue”. Sue Barry (“Stereo Sue”) was also stereoblind until her mid-forties, when vision therapy allowed her to perceive objects in the environment as three-dimensional rather than as one continuous two-dimensional background. In both cases, the eyes were not used to converging on the same point in space. Dr. Bridgeman and Sue Barry both had congenital conditions that affected the focus of both their eyes (Dr. Bridgeman was born with alternating exotropic strabismus, often called “lazy eye”, and Sue Barry was born with congenital strabismus, also known as being cross-eyed). The glasses in the case of Dr. Bridgeman, and vision therapy in the case of Sue Barry, were able to train eyes that had previously focused on different points in space to converge on the same spot, creating a visual epiphany of stereoscopic vision. There are cells in the nervous system that respond to binocular depth cues at specific critical periods. Normally, these cells require activation during early development in order to persist, so experts familiar with Dr. Bridgeman’s case (and others like his) assume that at some point in his development, Dr. Bridgeman must have experienced at least a fleeting moment of binocular vision. It was enough to ensure the survival of the cells in the visual system tuned to binocular cues. The mystery now is why it took Bruce nearly 70 years to have these cells activated (Peck, BBC Future, 2012; Sacks, 2006).


   Light waves cross the cornea and enter the eye at the pupil. The eye’s lens focuses this light so that the image is focused on a region of the retina known as the fovea. The fovea contains cones that possess high levels of visual acuity and operate best in bright light conditions. Rods are located throughout the retina and operate best under dim light conditions. Visual information is translated by the rods and cones and transmitted to the bipolar cells and on to the ganglion cells, whose axons bundle together and leave the eye via the optic nerve. Information from each visual field is sent to the opposite side of the brain at the optic chiasm. Visual information then moves through a number of brain sites before reaching the occipital lobe, where it is processed.

Two theories explain color perception. The trichromatic theory asserts that three distinct cone groups are tuned to slightly different wavelengths of light, and it is the combination of activity across these cone types that results in our perception of all the colors we see. The opponent-process theory of color vision asserts that color is processed in opponent pairs and accounts for the interesting phenomenon of a negative afterimage. We perceive depth through a combination of monocular and binocular depth cues.



Openstax Psychology text by Kathryn Dumper, William Jenkins, Arlene Lacombe, Marilyn Lovett and Marion Perlmutter licensed under CC BY v4.0. https://openstax.org/details/books/psychology




Review Questions:

1. The ________ is a small indentation of the retina that contains cones.

a. optic chiasm

b. optic nerve

c. fovea

d. iris


2. ________ operate best under bright light conditions.

a. cones

b. rods

c. retinal ganglion cells

d. striate cortex


3. ________ depth cues require the use of both eyes.

a. monocular

b. binocular

c. linear perspective

d. accommodating


4. If you were to stare at a green dot for a relatively long period of time and then shift your gaze to a blank white screen, you would see a ________ negative afterimage.

a. blue

b. yellow

c. black

d. red


Critical Thinking Question:

1. Compare the two theories of color perception. Are they completely different?

2. Color is not a physical property of our environment. What function (if any) do you think color vision serves?


Personal Application Question:

1. Take a look at a few of your photos or personal works of art. Can you find examples of linear perspective as a potential depth cue?





Answers to Exercises

Review Questions:

1. C

2. A

3. B

4. D


Critical Thinking Question:

1. The trichromatic theory of color vision and the opponent-process theory are not mutually exclusive. Research has shown they apply to different levels of the nervous system. For visual processing on the retina, trichromatic theory applies: the cones are responsive to three different wavelengths that represent red, blue, and green. But once the signal moves past the retina on its way to the brain, the cells respond in a way consistent with opponent-process theory.

2. Color vision probably serves multiple adaptive purposes. One popular hypothesis suggests that seeing in color allowed our ancestors to differentiate ripened fruits and vegetables more easily.



afterimage: continuation of a visual sensation after removal of the stimulus

binocular cue: cue that relies on the use of both eyes

binocular disparity: slightly different view of the world that each eye receives

blind spot: point where we cannot respond to visual information in that portion of the visual field

cone: specialized photoreceptor that works best in bright light conditions and detects color

cornea: transparent covering over the eye

depth perception: ability to perceive depth

fovea: small indentation in the retina that contains cones

iris: colored portion of the eye

lens: curved, transparent structure that provides additional focus for light entering the eye

linear perspective: perception of depth in an image when two parallel lines seem to converge

monocular cue: cue that requires only one eye

opponent-process theory of color perception: color is coded in opponent pairs: black-white, yellow-blue, and red-green

optic chiasm: X-shaped structure that sits just below the brain’s ventral surface; represents the merging of the optic nerves from the two eyes and the separation of information from the two sides of the visual field to the opposite side of the brain

optic nerve: carries visual information from the retina to the brain

photoreceptor: light-detecting cell

pupil: small opening in the eye through which light passes

retina: light-sensitive lining of the eye

rod: specialized photoreceptor that works well in low light conditions

trichromatic theory of color perception: color vision is mediated by the activity across the three groups of cones