Seeing my seeing and the world as external memory

What is it like to see? Weirdly, exactly what it feels like to see can be hard to pin down. It turns out that my understanding of my own mind can change drastically. This is what I want to write about today.

When I started researching eye movements at the beginning of the summer, I was basically starting from scratch. My supervisor had a major challenge: figuring out the best way to introduce eye movements to two undergraduate psychology students with relatively limited biological training. She made an interesting choice: a 2003 book titled Active Vision: The Psychology of Looking and Seeing dominated the reading list. This book would bring about the first major shift in my understanding of my mind.

Importantly, the book didn’t just show me that I was wrong about vision; it also planted the seeds for new intuitions that give me insight into my visual system. In a way, this book paved the way for a reduction of my subjective experience of vision to the neural substrate that produces it. That is, it allowed me to experience my vision as just the neural process of vision.

OK. Back to active vision. The thesis of active vision is relatively simple. Basically, it proposes that vision isn’t simply a passive process but rather an active (shocker) process that makes room for eye movements and expectations to influence our visual experiences. In general, our vision serves us well, allowing us to recognize our friends on the street, quickly notice a car coming towards us, and figure out how tall the steps we’re climbing are. Our vision works so well that we rarely have to think about it at all. Because we have so little need to understand how our vision works, most people don’t hold much of a view on the subject at all. I bet that if you asked the average person on the street, they would agree with the idea that our visual systems work like cameras: light comes into our eyes, something happens, then we see the output as if we were looking at the display on a camera. This camera view is more or less what’s meant by a passive view of vision.

This thinking, that vision is like a camera, comes with several hidden assumptions. One is that the visual processing that goes on in our brains is hierarchical: we begin by extracting the most basic aspects of what we see, and the further along the processing goes, the more high-level it becomes (Figure 1). Another assumption has to do with the detail of what we see. If you asked a person how much of what they see they actually see in detail, they might look at you like you’re crazy. Obviously, again like a camera, we can see whatever light comes into our eyes! Just as you can see the book in front of you, you can see the person sitting to your left. If you told them to look straight ahead and then held a book away from the center of their gaze, though, they would likely struggle greatly and might be unable to read a single sentence of the book! What is going on here?

Figure 1. An illustrative figure depicting hierarchical processing. At each successive stage, visual processing becomes more and more specific until the point of reaching consciousness in a state similar to a photo from a camera.

The crux of this blog post is this idea: our visual experience at any given time has much less detail than we would think. The reality is that the 20/20 vision you may be so proud of only applies to a tiny area of what you can see. In the retina (the part of the eye that converts light to neural signals our brains can understand), detail drops off quickly once you leave its center. When you look at the primary visual cortex in the back of the brain, a similar pattern emerges: vast swaths of neural real estate are devoted to processing visual input from the center of the retina, where detail is great, while much smaller amounts are devoted to processing the periphery.

So why do we have such a sense of completeness in what we see? Revisiting both assumptions of the passive view that I identified, that visual processing happens hierarchically and that detail is relatively constant across the visual field, does some heavy lifting here. First, we know that vision isn’t just a hierarchical process. We’re really good at forming expectations about what we will see, and those expectations affect what we ultimately do see (Figure 2). If we expect to run into a friend, it will take much less time to recognize them. Likewise, if we expect to see a cat in our backyard but instead cross paths with a skunk, it can take time for us to overwrite our expectation of a cat with the incoming visual information that suggests we’re looking at a skunk. So, at the edges of our vision, we’re constantly making predictions about what we see. Those predictions may have a bigger impact on what we see than the visual information coming from our retinas.

Figure 2. An illustrative figure of non-hierarchical processing. In addition to lower-level areas influencing higher-level visual areas, higher-level areas influence processing within the lower-level areas. Additionally, expectations of what we will see influence processing in the brain. Our expectations can “prime” our visual areas to identify something as a certain object. As long as the visual information agrees, this allows us to more efficiently explore the world!

One HUGE factor ties all this together with a bow, though: eye movements. Eye movements run our visual world. Our eyes are almost always moving, whether through the tiny jumps of microsaccades or the steady rotation of smooth pursuit eye movements. Our eyes move more frequently than our hearts beat! When we move our eyes, we can focus the small, detailed parts of our retinas on objects of interest. Essentially, the only limits on our ability to make eye movements are the short time each one takes and the need to briefly pause between them to process visual information. So, when we want to see information in the world, we move our eyes to collect it. If you know a picture frame is to your left, the fact that only a hazy rectangle is visible in your peripheral vision doesn’t matter, because your brain knows it’s there, and so you can just “see” what you know the picture frame looks like rather than the actual blurry rectangle your retina is sending. If we’re ever unsure that the blurry rectangle actually is the picture frame, we can just make an eye movement to see it in all its glorious detail. Because our predictions are often pretty accurate, the detailed view we get by making an eye movement tends not to violate our expectations, and so we get the sense that we saw that area in detail all along.

An important idea grows out of this: we never actually form a final image, like the one on a camera’s display, that we then look at. Instead, we’re always in the process of seeing. Seeing isn’t getting a final picture and analyzing it. Instead, it’s making predictions about the world, comparing those predictions to the information our retinas are dispatching to the brain, and working to resolve any conflicts between the two. If we need more detailed retinal information about an object, we can just move our eyes to the area of uncertainty in our peripheral vision. We don’t need a camera mechanism to form a final image because the world already does that for us. All the visual information we need is in the world, and our efficient predictions and eye movements allow us to draw on it as we need. Our sense of visual completeness doesn’t arise from our visual systems generating a complete image of the world. We know that the conclusions drawn by our visual systems can be spotty. Instead, the sense of visual completeness comes from the steadiness of the world: the knowledge that we can quickly get visual information from the world alleviates the anxiety of not knowing what’s out there. Seeing is something I do, not just something that happens.
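For readers who think in code, here is a toy sketch of that predict, compare, and look loop. It is only an illustration, not a model from the literature: the function names, the locations, and the COARSE_SHAPE table (my stand-in for how little detail the periphery carries) are all made up for this example.

```python
# Toy illustration only. Every name here is a hypothetical stand-in,
# not something taken from the active vision literature.

# Assumption: the periphery reports only a coarse shape, not the object itself.
COARSE_SHAPE = {
    "picture frame": "rectangle",
    "window": "rectangle",
    "cat": "small animal",
    "skunk": "small animal",
    "lamp": "tall blob",
}


def peripheral_view(world, location):
    """Low-detail signal from the periphery: only a coarse shape survives."""
    return COARSE_SHAPE[world[location]]


def foveal_view(world, location):
    """High-detail signal after an eye movement: the object itself."""
    return world[location]


def see(world, expectations):
    """Return what is 'seen' at each location and where the eyes had to move."""
    percept, saccades = {}, []
    for location, expected in expectations.items():
        if COARSE_SHAPE[expected] == peripheral_view(world, location):
            # The prediction agrees with the blurry input, so keep the prediction.
            percept[location] = expected
        else:
            # Conflict: move the eyes and sample the world in detail.
            percept[location] = foveal_view(world, location)
            saccades.append(location)
    return percept, saccades


# The world itself is the only place the full detail is stored.
world = {"left": "picture frame", "ahead": "lamp", "yard": "skunk"}
expectations = {"left": "picture frame", "ahead": "window", "yard": "cat"}

percept, saccades = see(world, expectations)
print(percept)
print(saccades)
```

In this toy run the eyes only move to "ahead", where the blurry input clashes with the expected window. The picture frame is "seen" without ever being looked at, and the skunk in the yard stays a "cat" because the coarse peripheral signal never contradicts that expectation. Notice too that nothing in the sketch ever stores a finished picture: the detail stays out in the world, and the loop only fetches it when a prediction fails.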

This is my very first actual blog post! It’s difficult to write these, but it’s certainly an interesting practice. The biggest struggle is trying to determine who my intended audience is. As of now, it’s nobody, but I do think it’s good practice to write in a way most people can understand while still being mindful that the topics I’m discussing can require a good amount of background knowledge.

References

Churchland, P. S., Ramachandran, V. S., & Sejnowski, T. J. (1994). A critique of pure vision. In Large-scale neuronal theories of the brain (pp. 23–60). The MIT Press.

Findlay, J. M., & Gilchrist, I. D. (2003). Active Vision. Oxford University Press. https://doi.org/10.1093/acprof:oso/9780198524793.001.0001

O’Regan, J. K. (1992). Solving the “real” mysteries of visual perception: The world as an outside memory. Canadian Journal of Psychology / Revue Canadienne de Psychologie, 46(3), 461–488. https://doi.org/10.1037/h0084327
