Henri Breuil and Alfred Yarbus Walk into a Bar…A Primer on Pictorial Composition. (Part III)

Note: It is highly recommended that you read So What’s With Jane already? A Primer on Pictorial Composition. (Part I) and “To the makers of music – all worlds, all times.” A Primer on Pictorial Composition. (Part II) before embarking on this installment.

“Science shows us truth and beauty and fills each day with a fresh wonder of the exquisite order which governs our world.” -Polykarp Kusch

Thus far we have established a clear definition of pictorial composition and how it is that our biology determines its success. Hopefully, at this point, you are starting to think about composition more in terms of biology instead of prescriptive geometry.

I would like to build on the last two installments by looking at two additional issues concerning our biology. First, I would like to explore whether the ability to elicit meaning from an image is innate or “learned”. Quite a bit of research has been carried out on this topic, and the data should prove insightful for our continuing quest. Secondly, I would like to examine how we visually “experience” a picture in more depth. The results of exploring these two issues should further strengthen our understanding of pictorial composition and improve our ability to assess the functionality of current approaches to it.

In my last installment, I introduced a story from Henri Breuil, a French Catholic priest and amateur archaeologist, which described a Turkish officer who was incapable of recognizing a drawing of a horse, “because he could not move round it.” Being a strict Muslim, the officer was entirely unfamiliar with depictive art and as such, he could not garner meaning from the image. It is easy to see how such stories might lead many to conclude that eliciting meaning from a two-dimensional representation is not an innate human ability. But would that conclusion be correct?

Studies into innate form perception and pictorial perception might suggest otherwise:

“Clearly some degree of form perception is innate. This, however, does not dispose of the role of physiological growth or of learning in the further development of visual behavior. Accordingly we turned our attention to the influence of these factors.

…We tested infants with three flat objects the size and shape of a head. On one we painted a stylized face in black on a pink background, on the second we rearranged the features in a scrambled pattern, and on the third we painted a solid patch of black at one end with an area equal to that covered by all the features. We made the features large enough to be perceived by the youngest baby, so acuity of vision was not a factor. The three objects, paired in all possible combinations, were shown to 49 infants from four days to six months old. The results were about the same for all age levels: the infants looked mostly at the “real” face, somewhat less often at the scrambled face, and largely ignored the control pattern. The degree of preference for the “real” face to the other one was not large, but it was consistent among individual infants, especially the younger ones. The experiment suggested that there is an unlearned, primitive meaning in the form perception of infants as well as of chicks.

…The last experiment to be considered is a dramatic demonstration of the interest in pattern in comparison to color and brightness. This time, there were six test objects: flat disks six inches in diameter. Three were patterned-a face, a bull’s-eye and a patch of printed matter. Three were plain-red, fluorescent yellow and white. We presented them, against a blue background, one at a time in varied sequence and timed the length of the first glance at each. The face pattern was overwhelmingly the most interesting, followed by the printing and the bull’s-eye. The three brightly colored plain circles trailed far behind and received no first choices. There was no indication that the interest in pattern was secondary or acquired.” -Fantz, R., ‘The origin of form perception’, Scientific American, 1961, 204, pp. 66–72

“While a picture is not totally arbitrary, it does involve a good deal of conventionalization in its production, and learning is involved in its interpretation. However, the learning that is involved is often rapid and “instantly generalized.”” -Knowlton, James Q. “On the definition of “picture”.” AV Communication Review 14.2 (1966): 157-183.

An often-cited paper by Julian Hochberg and Virginia Brooks of Cornell reports the case of a 19-month-old who had been raised in a pictureless environment. This child could appropriately name pictures of all the familiar objects whose names he had previously learned, and he could do this upon his first exposure to these pictures. “It seems clear from the results that at least one human child is capable of recognizing pictorial representations of solid objects (including bare outline-drawings) without specific training or instruction. This ability necessarily includes a certain amount of what we normally expect to occur in the way of figure-ground segregation and contour-formation. At the very least, we must infer that there is an unlearned propensity to respond to certain formal features of lines-on-paper in the same ways as one has learned to respond to the same features when displayed by the edges of surfaces.” -Hochberg, Julian, and Virginia Brooks. “Pictorial recognition as an unlearned ability: A study of one child’s performance.” The American journal of psychology 75.4 (1962): 624-628.

However, some cross-cultural studies suggest that the difficulty lies more in perceiving pictorial depth than in recognizing representations of simple objects. While many anecdotal reports, like that of the abovementioned Henri Breuil, suggest that learning is required to recognize pictures in general, research with communities that have little experience with pictures indicates that the greatest difficulty arises in perceiving depth in pictorial material. Subjects who encounter such difficulty often show a strong preference for “split-type drawings,” which depict the essential characteristics of an object without pictorial depth.

“Data collected among the Baganda of Uganda indicates that pictorial perceptual skills are positively and significantly related to relative amounts of exposure to Western culture. Both urban and relatively more acculturated rural residents make overall more correct identifications of pictorial objects and more consistent use of cues to pictorial depth than more traditional Baganda. These results offer support for the proposition that visual perceptual skills are related to culturally constituted experience.” -Kilbride, Philip L., and Michael C. Robbins. “Pictorial depth perception and acculturation among the Baganda.” American Anthropologist 71.2 (1969): 293-301.

Reports of difficulty in pictorial perception by members of remote, illiterate tribes have periodically been made by missionaries, explorers, and anthropologists. Robert Laws, a Scottish missionary active in Nyasaland (now Malawi) at the end of the 19th century, reported: “Take a picture in black and white and the natives cannot see it. You may tell the natives, ‘This is a picture of an ox and a dog,’ and the people will look at it and look at you and that look says that they consider you a liar. Perhaps you say again, ‘Yes, this is a picture of an ox and a dog.’ Well, perhaps they will tell you what they think this time. If there are a few boys about, you say: ‘This is really a picture of an ox and a dog. Look at the horn of the ox, and there is his tail!’ And the boy will say: ‘Oh! Yes and there is the dog’s nose and eyes and ears!’ Then the old people will look again and clap their hands and say, ‘Oh! Yes, it is a dog.’ When a man has seen a picture for the first time, his book education has begun.”

Mrs. Donald Fraser, who taught health care to Africans in the 1920s, had similar experiences. This is her description of an African woman slowly discovering that a picture she was looking at portrayed a human head in profile: “She discovered in turn the nose, the mouth, the eye, but where was the other eye? I tried by turning my profile to explain why she could only see one eye, but she hopped round to my other side to point out that I possessed a second eye which the other lacked.” There were also, however, reports of vivid and instant responses to pictures: “When all the people were quickly seated, the first picture flashed on the sheet was that of an elephant. The wildest excitement immediately prevailed, many of the people jumping up and shouting, fearing the beast must be alive while those nearest to the sheet sprang up and fled. The chief himself crept stealthily forward and peeked behind the sheet to see if the animal had a body, and when he discovered that the animal’s body was only the thickness of the sheet, a great roar broke the stillness of the night.” -Deregowski, Jan B. “Pictorial perception and culture.” Scientific American (1972) Nov;227(5):82-88.

With this in mind, we can perhaps better understand why the universal icons that we find in many corners of the world do not seem to contain strong depth cues. This type of information would serve us well should we find ourselves designing imagery meant to reach the very young, or simply as many members of the species as possible.

Now some may be quick to counter that pictures containing depth cues must be inherently more complex than simple representations that do not require them and, as such, would be more difficult to process. While in some cases this may indeed be true, there are studies that demonstrate a similar dynamic in performance for extremely simple line configurations.

[Figure: the standard Ponzo illusion]

The seemingly simple line configuration presented by Italian psychologist Mario Ponzo in 1911 is an effective demonstration of perception at odds with the physical world. The standard Ponzo illusion is configured so that a horizontal line or other figure nearer to the interior apex of two converging lines tends to be perceived as greater in length or size than an identical line or figure that lies within the converging lines but farther from the apex. If the standard Ponzo figure is interpreted as an abstraction of a distance or linear perspective cue, then an observer will read the “inducing lines” of the configuration as parallel lines receding into the distance, converging in accordance with the effects of linear perspective. In this context, it would be appropriate to assume that two similar objects at different distances can provide equal-sized retinal images only if the more distant object is larger than the nearer.

[Figure: a variation on the Ponzo illusion using two circles]

Variations on the illusion demonstrate similar effects. In the above variation, we can see that the circle on the right appears larger than the one on the left. As with the standard illusion, both shapes are identical in size.
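The size-distance reasoning behind the perspective reading of the Ponzo figure can be checked with a little arithmetic. The following is only a minimal sketch, assuming the standard visual-angle formula (angle = 2·atan(size / (2·distance))); the function names and the example sizes and distances are hypothetical and are not taken from any of the studies cited here.

```python
import math


def visual_angle_deg(size, distance):
    """Visual angle, in degrees, subtended by an object of a given
    physical size viewed frontally at a given distance (same units)."""
    return math.degrees(2 * math.atan(size / (2 * distance)))


def size_for_same_angle(near_size, near_distance, far_distance):
    """Physical size a more distant object would need in order to
    subtend the same visual angle as the nearer object."""
    return near_size * far_distance / near_distance


# Hypothetical example: a 1-unit bar seen from 2 units away.
near_angle = visual_angle_deg(1.0, 2.0)

# The same bar moved to 6 units away projects a smaller image...
same_bar_far_angle = visual_angle_deg(1.0, 6.0)

# ...so an equal-sized retinal image at 6 units implies a larger object.
far_size = size_for_same_angle(1.0, 2.0, 6.0)
far_angle = visual_angle_deg(far_size, 6.0)

print(f"near bar: {near_angle:.2f} degrees")
print(f"same bar, farther away: {same_bar_far_angle:.2f} degrees")
print(f"size needed at the far distance: {far_size:.1f} units")
```

If the converging lines are read as receding into the distance, the upper bar is treated as the farther of two objects with equal retinal images, which on this arithmetic would make it the physically larger one.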

While some experiments in the past have manipulated Ponzo line configurations and other geometric “illusions” to downplay the contributions of linear perspective (e.g., Coren & Girgus 1978; Yamagami 1978), many other tests have confirmed the impact of depth cues on Ponzo effect judgments (e.g., Gogel, 1975; Kilbride & Leibowitz, 1975; Leibowitz, Brislin, Perlmutter, & Hennessy, 1969; Miller, 1997; Newman & Newman, 1974; Patterson & Fox, 1983; Schiller & Wiener, 1962). I submit that the alterations to the Ponzo configuration that purport to confound intuitive explanations involving linear perspective do not refute the contributions of perspective cues. Rather, they seem to reinforce the connection by demonstrating a significant diminishment of the effect as distance and perspective cues are further abstracted.

Further support for the idea that the magnitude of the Ponzo effect relies on contextual distance and perspective cues can be found in cross-cultural experiments with the illusion in Uganda (Leibowitz & Pick 1972). Reactions to the geometric configurations varied between study groups who were accustomed to “industrialized” environments and groups living in more natural, rural environments. Students from a local university responded to the illusion very similarly to U.S. university students, while the rural villagers saw no illusion at all.

Furthermore, regarding depth cues and size: when I first read Deregowski’s Scientific American article, I remember taking a particular interest in the report of how the people reacted when the picture of the elephant was projected onto a sheet. When the chief approached the sheet, it seemed that he was surprised at the thickness, possibly implying that the scale of the projection must have been reasonably similar to the size of an actual elephant. Considering this, I began to suspect that the problems with pictorial depth cues might be related to size constancy and the way in which we use size to communicate pictorial depth. In a 2011 paper, Stephen E. Palmer et al. write: “In earlier research, Konkle and Oliva (in press) found that the preferred visual size of a picture of an object is proportional to the logarithm of its known physical size. They showed that, when viewing pictures of objects of different physical sizes within a frame, smaller sizes within the frame were preferred for smaller objects in the real world (e.g., strawberries or a key), whereas larger sizes in the frame were preferred for larger real-world objects (e.g., a piano or chair). They called these effects `canonical size’ in analogy with Palmer et al’s (1981) `canonical perspective’ effects, showing that people systematically prefer some perspective views of objects over others.

Overall, the findings support a clear bias toward canonical size in aesthetic preferences for framed 2-D images. This bias seems to be conceptually related to another ecological bias reported by Sammartino and Palmer (submitted) for objects that are characteristically located above the viewer in the world to be located high in the picture frame (e.g., ceiling-mounted light fixtures and flying eagles) and for objects that are characteristically located below the viewer in the world to be located lower in the picture frame (e.g., bowls on tables and swimming stingrays). We call these effects `ecological’ because they appear to be driven by people preferring images in which the spatial properties of the image of the depicted object within its frame fit the ecological properties of the physical object relative to the viewer. Canonical-size effects on aesthetic judgments thus indicate that people tend to prefer images in which the size of the object’s image within its frame fits their knowledge of its actual physical size.” -Linsen, S., Leyssen, M. H. R., Gardner, J. S., & Palmer, S. E. (2011). Aesthetic preferences in the size of images of real-world objects. Perception. 40 (3), 291-298.
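To make the logarithmic relationship in the passage above more concrete, here is a rough sketch of what “preferred visual size proportional to the logarithm of known physical size” implies. Everything numeric in it is an assumption made purely for illustration: the slope, the intercept, and the object sizes are hypothetical and do not come from Konkle and Oliva or from Linsen et al.

```python
import math


def preferred_frame_fraction(physical_size_cm, slope=0.2, intercept=0.2):
    """Illustrative log-linear model of canonical size.

    Returns the fraction of the frame an object's image would occupy if
    preference grew linearly with log10 of the object's physical size.
    The slope and intercept are hypothetical placeholders, not fitted values.
    """
    fraction = intercept + slope * math.log10(physical_size_cm)
    # Keep the result inside the frame.
    return min(max(fraction, 0.0), 1.0)


# Hypothetical physical sizes in centimetres.
objects = {"strawberry": 3, "chair": 90, "piano": 150}

for name, size_cm in objects.items():
    print(f"{name}: {preferred_frame_fraction(size_cm):.2f} of the frame")
```

Under a model of this shape, each tenfold increase in physical size adds the same fixed increment to the preferred size within the frame, so a piano is given more of the frame than a strawberry, but nowhere near fifty times more.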

For an even better look at many of these studies I recommend the following paper as it covers many of the ones listed here: Bovet, Dalila, and Jacques Vauclair. “Picture recognition in animals and humans.” Behavioral brain research 109.2 (2000): 143-165.

At this point, I would like to move on to how exactly we interact with a picture, or a “complex stimulus,” and I cannot think of a better place to start than with the work of Alfred Yarbus.

Alfred Lukyanovich Yarbus was a Russian psychologist who studied eye movements in the 1950s and 1960s. He pioneered the study of saccadic exploration of complex images by recording the eye movements of observers as they viewed natural objects and scenes. In this very influential work, Yarbus showed that the trajectories followed by the gaze depended on the task that the observer had to perform. The gaze tends to jump back and forth between the same parts of the scene, for example, the eyes and mouth in the picture of a face. If an observer was asked specific questions about the images, his or her eyes would concentrate on the areas of the images relevant to the questions. His book Eye Movements and Vision, published in Russian in 1965 and translated into English by Basil Haigh in 1967, has had a profound influence on recent approaches to the study of eye movements and vision.

While Eye Movements and Vision is fascinating from cover to cover, chapter seven–Eye Movements during Perception of Complex Objects–is especially insightful for those of us studying pictorial composition.

While I wish I could quote this chapter in its entirety, I will limit myself to a few key bits that offer the most bang for the buck. I encourage everyone reading this paper to try to read this entire chapter (again, chapter seven), if not the entire book for yourself. It is incredibly insightful and may significantly alter your notions regarding how we interact with pictures.


Examples of Yarbus’ eye-tracking data from studies conducted using a classic painting by Russian artist Ilya Efimovich Repin. Painted in 1884 in support of social reform, the image depicts a soldier returning home from exile in Siberia, greeted by his mother as his wife shyly lingers behind the door.

Yarbus states: “Analysis of the eye-movement records shows that the elements attracting attention contain, or in the observer’s opinion, may contain, information useful and essential for perception. Elements on which the eye does not fixate, either in fact or in the observer’s opinion, do not contain such information.”

Yarbus goes further to state that detail, brightness factors, or even a favorite color will not determine the degree of attention unless those elements “give essential and useful information” within their context. In addition, “Analysis shows that the outlines have no effect on the character of the eye movements. In the movements of the eye, we have no analogy with the movements of the hand of a blind person, tracing the outlines and contours. Outlines and contours are important for the appearance of the visual image, but when the image has appeared and is seen continuously, the observer has no need to concern himself especially with borders and contours. Borders and contours are only elements from which, together with other no less important elements, our perception is composed, and the object recognized.”

“…Records of eye movements show that the observer’s attention is usually held only by certain elements of the picture. As already noted, the study of these elements shows that they give information allowing the meaning of the picture to be obtained. Eye movements reflect the human thought process; so the observer’s thought may be followed to some extent from records of eye movements (the thought accompanying the examination of the particular object). It is easy to determine from these records which elements attract the observer’s eye (and, consequently, his thought), in what order, and how often.”

However, it should be noted that “The observer’s attention is frequently drawn to elements which do not give important information but which, in his opinion, may do so. Often an observer will focus his attention on elements that are unusual in the particular circumstances, unfamiliar, incomprehensible, and so on.”

“…In conclusion, I must stress once again that the distribution of the points of fixation on an object, the order in which the observer’s attention moves from one point of fixation to another, the duration of fixations, the distinctive cyclic pattern of examination, and so on are determined by the nature of the object and the problem facing the observer at the moment of perception.” -Yarbus, A. (1967). Eye movements and vision (B. Haigh & L. A. Riggs, Trans.). New York: Plenum Press

Now, while Yarbus’ work with eye tracking is extremely insightful, current research and technologies allow us to look much deeper at how we experience a picture. Such study offers us a glimpse not only into how we might garner meaning from complex stimuli, but also into how we might be influenced by an image’s aesthetic qualities.

It is important to note that our understanding of the neural underpinnings of perception is largely built upon studies employing 2-dimensional images. Percept surrogates have been used for many years to study cortical regions along the ventral and dorsal visual processing streams. Even simplified monochrome shapes, silhouettes, and line drawings can be shown to elicit significant responses in regions of the occipital and temporal cortex that respond more strongly to intact object images (object-selective cortex).

Studies of more specific areas of the brain go further to help us understand why certain spatial preferences might arise (Battaglia et al., 2011). Such research explores how observers of a still image of an action may extract dynamic information by extrapolating future position from the motion implied by the photograph (Kourtzi and Kanwisher, 2000). This concept is something that we will revisit a bit later when discussing center and inward bias (Palmer et al., 2008).

We will also look at the fruits of a newly emerging sub-discipline of empirical aesthetics dubbed “Neuroaesthetics”. This new branch of investigation takes a scientific approach to the study of aesthetic perceptions of art, music, or any object that can give rise to aesthetic judgments. Neuroaesthetics uses neuroscience to explain and understand the aesthetic experiences at the neurological level. It is a popular area of research and has been steadily gaining multidisciplinary interest and contributions from neuroscientists, art historians, artists, and psychologists.

As this installment is already quite lengthy, I will refrain from going into all of this in detail now. In closing, allow me to stress once again the importance of considering our biology in the role of “picture building.” It may be initially difficult to put aside the many prescriptive geometric heuristics that have been deployed by so many artists in the past, but I believe that we can achieve more efficient and effective results in the here and now by embracing the fruits of so many scientific disciplines. It is an exciting time for both science and art.

In the next installment, we will look at many historical devices “used” in pictorial composition, assess the claims that surround their use, and examine whether current research confirms their effectiveness.

PS—Feel free to use the contact link above or the comment section below to share any questions or suggestions regarding this ongoing series.

A special thank you to Leah Waichulis for her help with this installment.


Battaglia, Fortunato, Sarah H. Lisanby, and David Freedberg. “Corticomotor excitability during observation and imagination of a work of art.” Frontiers in human neuroscience 5 (2011): 79.

Bovet, Dalila, and Jacques Vauclair. “Picture recognition in animals and humans.” Behavioral brain research 109.2 (2000): 143-165.

Coren, Stanley, and Joan S. Girgus. Seeing is deceiving: The psychology of visual illusions. Lawrence Erlbaum, 1978.

Deregowski, Jan B. “Pictorial perception and culture.” Scientific American (1972) Nov;227(5):82-88.

Fantz, R., ‘The origin of form perception’, Scientific American, (1961), 204, pp. 66–72.

Knowlton, James Q. “On the definition of “picture”.” AV Communication Review 14.2 (1966): 157-183.

Gogel, Walter C., and Robert E. Newton. “Depth adjacency and the rod-and-frame illusion.” Perception & Psychophysics 18.2 (1975): 163-171.

Hochberg, Julian, and Virginia Brooks. “Pictorial recognition as an unlearned ability: A study of one child’s performance.” The American journal of psychology 75.4 (1962): 624-628.

Kawabata, Nobuo, Kiyoshi Yamagami, and Morikazu Noakl. “Visual fixation points and depth perception.” Vision Research 18.7 (1978): 853-854.

Kilbride, Philip L., and Herschel W. Leibowitz. “Factors affecting the magnitude of the Ponzo perspective illusion among the Baganda.” Perception & Psychophysics 17.6 (1975): 543-548.

Kilbride, Philip L., and Michael C. Robbins. “Pictorial depth perception and acculturation among the Baganda.” American Anthropologist 71.2 (1969): 293-301.

Kourtzi, Zoe, and Nancy Kanwisher. “Activation in human MT/MST by static images with implied motion.” Journal of cognitive neuroscience 12.1 (2000): 48-55.

Leibowitz, Herschel W., and Herbert A. Pick. “Cross-cultural and educational aspects of the Ponzo perspective illusion.” Perception & Psychophysics 12.5 (1972): 430-432.

Linsen, S., Leyssen, M. H. R., Gardner, J. S., & Palmer, S. E. (2011). Aesthetic preferences in the size of images of real-world objects. Perception. 40 (3), 291-298.

Miller, R. J. “Pictorial depth cue orientation influences the magnitude of perceived depth.” Visual Arts Research (1997): 97-124.

Newman, Colin V., and Barbara M. Newman. “The Ponzo illusion in pictures with and without suggested depth.” The American journal of psychology (1974): 511-516.

Palmer, S. E., & Gardner, J. S. (2008) Aesthetic issues in spatial composition: Effects of position and direction on framing single objects. Spatial Vision, 21, 421-449.

Patterson, Robert, and Robert Fox. “Depth separation and the Ponzo illusion.” Perception & psychophysics 34.1 (1983): 25-28.

Schiller, Peter, and Wiener, Morton. “Binocular and stereoscopic viewing of geometric illusions.” Perceptual and motor skills 15.3 (1962): 739-747.

Yarbus, A. (1967). Eye movements and vision (B. Haigh & L. A. Riggs, Trans.). New York: Plenum Press.

5 Comments

  1. Mary

    Regarding the story of Breuil and the Turkish officer. I’d like to add a couple of thoughts. The story may be true or it may be apocryphal or it may be a mis-read true event. I use the word apocryphal deliberately because there are issues of religious belief in play, and beyond that it is a beautifully descriptive term.

    Assuming the event occurred, the reasons for the inability of the officer to “see” the horse in the painting could be rather different than a simple unfamiliarity with 2d visual representation. For a start, Muslims are not universally devoid of familiarity with visual images, and the proscription on creating images of animals and people only emerged at certain points in Islamic history, in certain sectors of the religion, and has not been held to universally. The objection to making images comes to Islam from the Jewish tradition and is actually a variously twisted and turned proscription against making an image of something on earth or in heaven and worshiping it. The aspect of worshiping the image is often dropped off the end of the statement in various discussions outside of the religions concerned, but it is actually crucial, if not the only salient point when it comes to the actual issue of images in Judeo-Christian-Islamic religions.

    That brings me to my primary point, that it is likely the officer refused to allow himself to recognise the image in his mind for fear of falling into the “sin” of worshiping an image. Along the lines of a cognitive hands over the eyes or ears, we could add the Officer to our classic “three monkeys” – see no evil, hear no evil, speak no evil, and think no evil. This view is made more likely by the fact that the object presented to the officer – the very large horse portrait, was obviously an object of veneration, the size, prominence of its display, and presumably very ornate and expensive frame etc, all communicated in an instant (context of visual communication here) that the object was venerated, which is verging too close for comfort to worship. So to be on the safe side the Officer “does not recognise”, or does not acknowledge the recognition of the image.

    Hope that makes sense, and I haven’t run on too long. Thought I could offer a perspective on the religious matters at hand, I’m not Moslem, but have had a long association with Christian religious issues, including the ridiculous battle around images in some sectors of the Christian Church.

    A secondary point is that a proscription against the making of images of animals and people, or gods, and worshiping them does not proscribe the making of images of plants, buildings, rivers, rocks etc etc, all of which are found in beautiful frescoes and mosaics in some of the world’s most beautiful buildings – middle eastern mosques. The point here is even if the ability to read a 2d image as representing a physical object needs to be learnt and is not biological, then surely learning to read the images of certain objects in 2d will suffice to equip one to read the images of others, even if one is unfamiliar with the images of the second category?

    I don’t think that the proliferation of extremely ancient stone age rock art and carvings is indicative of a lack of biological ability to read 2d and 3d images and objects as representation of real objects, but rather supports the view it is innate, or instinctive in humans.

    1. Anthony Waichulis

      Thank you for your thoughts here Mary. The point of the story in this context was not the religious aspects of the tale, but the idea that the ability to understand a two-dimensional image as a biological percept surrogate may not be innate. If you read further in the series you will see that there are indeed a number of actual studies (cited) that support this idea. This tale of the Turkish officer was not meant as substantiation of the phenomena. 🙂

      1. Mary Muss

        Thanks Anthony. I realised the point is (clearly) examining the innate or learnt nature of our visual perception. It is just that I’ve come across this story a number of times (including in psych unit at University) and found it perplexing as I see the (mentioned) issues around it. I’ll look at the other studies if I can find them. It’s intriguing to think about visual images perhaps not being instinctively recognisable by humans. The learnt aspect of almost all other sensory perceptions seems by comparison to me to be fairly obvious, but I had assumed that visual recognition was a basic survival skill, provided one had functioning sight, a perception that could be developed but that was existent in a basic state innately, but apparently not! I have assumed likewise that auditory perception whilst clearly able to be developed to a high level, is basic and innate – as in we all recognise a human cry, an animal’s cry, or a threatening sound for what they are, but again perhaps that is not so either! Thanks again for your articles I read them eagerly!

