Seeing my seeing and the world as external memory
What is it like to see? Oddly, it can be hard to pin down exactly what seeing feels like. Part of the reason is that my understanding of my own mind’s nature turns out to be capable of changing drastically. That change is what I want to write about today.
When I started researching eye movements at the beginning of the summer, I was basically starting from square one. My supervisor faced a major challenge: figuring out the best way to introduce eye movements to two undergraduate psychology students with relatively limited biological training. She made an interesting choice: a 2003 book titled Active Vision: The Psychology of Looking and Seeing dominated the reading list. This book would bring about the first truly major shift in my understanding of my mind.
Importantly, the book didn’t just show me that I was wrong about how vision works; it also planted the seeds for new intuitions that give me insight into my own visual system. In a way, it paved the way for reducing my subjective experience of vision to the neural substrate that produces it. That is, it allowed me to experience my vision simply as the neural process of vision.
OK. Back to active vision. The thesis of active vision is relatively simple. Basically, it proposes that vision isn’t simply a passive process but rather an active (shocker) process that makes room for eye movements and expectations to influence our visual experiences. In general, our vision serves us well, allowing us to recognize our friends on the street, quickly notice a car coming towards us and figure out how tall the steps we’re climbing are. Our vision works so well that we rarely have to think about it at all. Because we have so little need to understand how our vision works, most people don’t hold much of a view on the subject at all. I bet that if you asked the average person on the street, they would agree with the idea that our visual systems work like cameras: light comes into our eyes, something happens, then we see the output as if we were looking at the display on a camera. This camera view is more or less what's meant by a passive view of vision.
This thinking, that vision is like a camera, comes with several hidden assumptions. One is that the visual processing in our brains is hierarchical: we begin by extracting the most basic features of what we see, and the further along the pathway we go, the more high-level the processing becomes (Figure 1). Another assumption concerns how much detail we see. If you asked people how much of what they see they actually see in detail, they might look at you like you’re crazy. Obviously, again like a camera, we can see whatever light comes into our eyes! Just as you can see the book in front of you, you can see the person sitting to your left. If you told them to look straight ahead and then held a book off to the side of their line of sight, though, they would likely struggle and might not be able to read a single sentence of it! What is going on here?

Figure 1. An illustrative figure depicting hierarchical processing. At each successive stage, visual processing becomes more and more specific until it reaches consciousness as something resembling a photo from a camera.
The crux of this blog post is this idea: our visual experience at any given time has much less detail than we would think. The 20/20 vision you may be so proud of only applies to a tiny area of what you can see. In the retina (the part of the eye that converts light into neural signals our brains can understand), detail drops off quickly once you leave its center, the fovea. When you look at the primary visual cortex in the back of the brain, a similar pattern emerges: vast swaths of neural real estate are devoted to processing input from the center of the retina, where detail is high, while much smaller amounts are devoted to processing the periphery.
So why do we have such a strong sense that what we see is complete? The two assumptions of the passive view I identified above, that visual processing happens hierarchically and that detail is relatively constant across the visual field, do some heavy lifting here. First, we know that vision isn’t just a hierarchical process. We’re really good at forming expectations about what we will see, and those expectations affect what we ultimately do see (Figure 2). If we expect to run into a friend, it takes much less time to recognize them. Likewise, if we expect to see a cat in our backyard but instead cross paths with a skunk, it can take time for us to overwrite our expectation of a cat with the incoming visual information that says we’re looking at a skunk. So, at the edges of our vision, where detail is poor, we’re constantly making predictions about what we see, and those predictions may have a bigger impact on what we see than the visual information coming from our retinas.
Figure 2. An illustrative figure of non-hierarchical processing. In addition to lower-level areas influencing higher-level visual areas, higher-level areas influence processing within the lower-level areas. Expectations of what we will see also influence processing in the brain: they can "prime" our visual areas to identify something as a certain object. As long as the incoming visual information agrees with the expectation, this lets us explore the world more efficiently!
One HUGE factor ties all this together with a bow, though: eye movements. Eye movements run our visual world. Our eyes are almost always moving, whether in the tiny jumps of microsaccades or the smooth rotation of smooth pursuit. Our eyes move more frequently than our hearts beat! When we move our eyes, we can point the small, highly detailed center of the retina at objects of interest. Essentially, the only limits on this ability are the short time each movement takes and the need to pause briefly between movements to process visual information. So, when we want to see information in the world, we move our eyes to collect it. If you know a picture frame is to your left, it doesn’t matter that only a hazy rectangle is visible in your peripheral vision: your brain knows the frame is there, so you can just “see” what you know it looks like rather than the blurry rectangle your retina is actually sending. If we’re ever unsure that the blurry rectangle really is the picture frame, we can just make an eye movement to see it in all its glorious detail. Because our predictions are often pretty accurate, the detailed view we get from an eye movement tends not to violate our expectations, so we get the sense that we saw that area in detail all along.
An important idea grows out of this: we never actually form a final, camera-like image that we then look at. Instead, we’re always in the process of seeing. Seeing isn’t getting a final picture and analyzing it. It’s making predictions about the world, comparing those predictions to the information our retinas are sending to the brain, and taking steps to resolve any conflicts between the two. If we need more detailed retinal information about an object, we can just move our eyes to the area of uncertainty in our peripheral vision. We don’t need a camera mechanism to form a final image because the world already serves that purpose for us. All the visual information we need is out in the world, and our efficient predictions and eye movements let us draw on it as needed. Our sense of visual completeness doesn’t arise from our visual systems generating a complete image of the world; we know the conclusions they draw can be spotty. Instead, it comes from the steadiness of the world: knowing that we can quickly get visual information from the world relieves the anxiety of not knowing what’s out there. Seeing is something I do, not just something that happens.
This is my very first actual blog post! It’s difficult to write these, but it’s certainly an interesting practice. The biggest struggle is figuring out who my intended audience is. As of now, nobody but me reads this, but I do think it’s good practice to write in a way most people can understand while staying mindful that the topics I’m discussing can require a good amount of background knowledge.
References
Churchland, P. S., Ramachandran, V. S., & Sejnowski, T. J. (1994). A critique of pure vision. In Large-scale neuronal theories of the brain (pp. 23–60). The MIT Press.
Findlay, J. M., & Gilchrist, I. D. (2003). Active vision: The psychology of looking and seeing. Oxford University Press. https://doi.org/10.1093/acprof:oso/9780198524793.001.0001
O’Regan, J. K. (1992). Solving the “real” mysteries of visual perception: The world as an outside memory. Canadian Journal of Psychology / Revue Canadienne de Psychologie, 46(3), 461–488. https://doi.org/10.1037/h0084327
Further reading
Churchland, P. S. (2013). Touching a nerve: Our brains, our selves. W. W. Norton & Company.