Gestalt theory: Visual and Sonic Gestalt

Alex McLean, MSc Arts Computing

1st November 2005

Introduction

Gestalt theory puts forward the primacy of the whole in perception, not as an accumulation of perceptions of its parts but in something more. Our perceptions are subconsciously constructed towards the end of creating a perception as stable, or as "good" as possible. Our conscious mind is then presented with this organised percept of the whole.

Gestalt theory is weak when it attempts to justify itself through physiological explanation. We simply do not yet know enough about the brain to explain perception in this way, and so to get at the truth we isolate figures which evoke Gestalt percepts, that is percepts showing features not in the parts but in our perceptual construction of the whole.

The focus of Gestalt theory has largely been upon visual phenomena, but good research is being done on sonic and musical Gestalt theory too. The essay will explore the principles of Gestalt theory using both visual and aural examples. It will then look at some of the applications of Gestalt theory within the study of music, and conclude with an analysis of some of the criticisms of the theory.

Commonalities between Visual and Sonic Gestalt

From one perspective, visual and sonic perception could not be more different: one is spatial perception, the other temporal. However, it is easy to see how each contains elements of the other. Visual perception also changes over time when we look at moving or changing forms, and even when we see a static image our eyes move across it in meaningful patterns. Further, our two ears allow us to detect distance and direction, and our musical sensibilities perceive movement in a space defined by such dimensions as timbre, pitch, duration, distortion, resonance and so on.

Gjerdingen applies the Grossberg-Rudd neural model of apparent movement in vision to music with favourable results. He goes on to highlight striking analogues between visual and aural perception, such as luminance/amplitude and colour/timbre. Yet the overall experiences of seeing and hearing seem so different. Gjerdingen offers a compelling explanation for this: while vision and sound may be only weakly analogous at the level of high-level cognition, their low-level neural processes show striking similarities. That is, even though sound and light are very different media, the brain may process them in very similar ways.

With this low-level commonality in mind, I have explored Gestalt principles by constructing both visual and audio examples in the following section.

Sonic vs Visual Gestalt - examples

The following examples aim to illustrate, in a straightforward manner, the presence of Gestalt principles across visual and aural perception. Note that the audio examples are of my own devising, and have not met with rigorous scientific testing. The principles exemplified are based on those seen in lectures by Professor Leymarie, Goldsmiths College, 2005.

When viewed with a Java-enabled browser, the visual examples will be clickable to reveal or highlight a Gestalt effect. They were made using the Processing Java environment. The audio examples were rendered using Perl, in some cases using my feedback.pl environment triggering synthesis by my datadirt software. Sourcecode for all examples is linked.


figure 1. Proximity

proximity
Static version (java unavailable)
Click the image to move the circles together
sourcecode

Proximity / Contiguity

In the audio example we hear identical sounds played, but group them into sets of four and three sounds because of their proximity on the musical surface of the time line. This is analogous to the grouping of visual properties by proximity, for example in the two dimensional space of figure 1.

audio: ( ogg | mp3 | sourcecode )
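As a rough illustration of grouping by temporal proximity, the following Python sketch splits a list of onset times into groups wherever the silence before a sound is clearly longer than usual. The onset values, the function name group_by_proximity and the gap threshold are all assumptions made for illustration; they are not taken from the sourcecode of the audio example.

```python
def group_by_proximity(onsets, max_gap):
    """Split sorted onset times into groups, starting a new group
    whenever the gap since the previous sound exceeds max_gap."""
    groups = []
    for onset in onsets:
        if groups and onset - groups[-1][-1] <= max_gap:
            groups[-1].append(onset)   # close enough: same group
        else:
            groups.append([onset])     # long gap (or first sound): new group
    return groups


# Hypothetical onsets in seconds: four sounds, a longer pause, then three.
onsets = [0.0, 0.25, 0.5, 0.75, 1.5, 1.75, 2.0]
groups = group_by_proximity(onsets, max_gap=0.3)
print([len(g) for g in groups])
```

The sounds themselves are treated as identical here; only their spacing on the time line decides the grouping, which is the point of the proximity principle.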

figure 2. Similarity

similarity
Static version (java unavailable)
Click the image to move the circles together and cause grouping principle conflict
sourcecode

Similarity

This audio sample demonstrates how we group sounds by similarity. The sounds are the same distance apart temporally, but are pitched differently. We see an analogous effect using shading in figure 2.

audio: ( ogg | mp3 | sourcecode )

Proximity and similarity in conflict

In the following audio sample you can hear differences in pitch suggesting groupings in conflict with those suggested by difference in temporal distance. You can see the visual equivalent effect by clicking upon figure 2.

audio: ( ogg | mp3 | sourcecode )

Gestalt principles lead us towards certain grouping judgements, but as we have just seen and heard, they often conflict. This leads to an uncertain judgement, or in some extreme cases an unstable one, where we flip between two judgements (as in figure 6). However, usually one principle overrules the other, and these relative strengths may be measured through controlled testing. Some of the tests performed in the study of Lerdahl & Jackendoff's Grouping Preference Rules (GPR) are explored later in this essay.
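One way to picture such a conflict is to let each principle propose its own group boundaries and then compare them. The Python sketch below is a minimal illustration under stated assumptions: the event data, the thresholds and the function name boundaries are hypothetical, and it makes no attempt to weigh one principle against the other.

```python
def boundaries(values, threshold):
    """Indices i at which a new group would start, because the change
    from event i-1 to event i is larger than threshold."""
    return {i for i in range(1, len(values))
            if abs(values[i] - values[i - 1]) > threshold}


# Hypothetical events: onset times in seconds and MIDI-style pitch numbers.
onsets = [0.0, 0.25, 0.5, 1.25, 1.5, 1.75]
pitches = [60, 60, 72, 72, 72, 72]

by_proximity = boundaries(onsets, threshold=0.3)   # long gaps in time
by_similarity = boundaries(pitches, threshold=2)   # large jumps in pitch

agreed = by_proximity & by_similarity      # boundaries both principles support
contested = by_proximity ^ by_similarity   # boundaries only one supports
print(sorted(by_proximity), sorted(by_similarity), sorted(contested))
```

In this made-up sequence the pitch changes one event before the pause, so the two principles suggest different boundaries and none is agreed.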

figure 3. Closure

triangle
Static version (java unavailable)
Click the image to remove the circles and the super-imposed triangle (and the phantom shading around it) goes too.
sourcecode

Closure / Good continuation

figure 4. Continuation

continuation
Static version (java unavailable)
Click the image to reveal alternate interpretation
sourcecode

This audio example contains a number of evenly spaced sounds of fixed duration, with silence in between. However, together the sounds are likely to be perceived as a continuous movement. The effect is still present when listening on closed-backed headphones, which rules out environmental reverb as a factor in joining the sounds.

audio: ( ogg | mp3 | sourcecode )

Figures 3 and 4 show equivalent visual effects, showing a distinction between closure where lines are joined together to find the simplest form, and continuation where we prefer the simplest path when perceiving lines. After clicking on figure 3 we realise that the perceived white triangle is invisible, and clicking on figure 4 reveals an alternate, less preferable grouping of the lines than we would normally see.

Figure 3 is a visual illusion that fools us into perceiving what is not there. So strong are the clues to a super-imposed white triangle that we are likely to see a phantom, illusory shading around it. Equivalent sonic illusions exist too; Diana Deutsch researches sonic and musical illusion, finding arrangements that listeners tend to perceive as different from their reality. For example, her "octave illusion" causes people to hear different tones in different ears. What is heard differs from person to person but rarely reflects the actual pitch and panning of the sounds.


figure 5. Area

area
Static version (java unavailable)
Click the image to change the relative areas
sourcecode

Area / smallness

Figure 5 shows that area has a role to play in perception: we tend to perceive the smaller area as figure and the bigger one as ground, the figure being an object and the ground being the background on which it is placed.

I attempt to reproduce the visual effect of figure 5 in audio by contrasting bursts of white noise with bursts of silence (if you can have a burst of silence). I chose this contrast because being devoid of any features, white noise is structurally similar to silence. Indeed in many situations the brain interprets noise as silence, for example the background hiss of a radio or tape recording. [1]

The duration of the noise and silence is varied as an analog to varying area in figure 5. I suggest it is easier to interpret the shorter silence as the figure against the ground of the noise in audio example one, and the shorter noise as figure against the ground of silence in example two.

Example 1:
audio: ( ogg | mp3 | sourcecode )
Example 2:
audio: ( ogg | mp3 | sourcecode )

figure 6. Figure-ground

figure_ground
Static version (java unavailable)
Click the image to switch colours between vase and faces
sourcecode

Figure and ground

We have already encountered figure and ground in the previous example. Such interrelationships and overlaps are found between many of the Gestalt principles. Indeed, area could also have been used to illustrate the similarity rule, commonly sized shapes being more likely to be perceived as a group.

Figure 6 shows an unstable figure-ground relationship. We can either see two symmetrical faces or a vase, but probably not both at the same time. For the audio example, I chose a different technique to explore figure and ground relationships: polyrhythm.

audio: ( ogg | mp3 | sourcecode )

This polyrhythm has four repeating sounds, each spaced differently. One sound plays every third time unit, one every fourth, one every tenth and one every fourteenth. This creates a pattern that repeats every 420 time units (the least common multiple of 3, 4, 10 and 14), but it would seem likely that we would perceive a looping sooner than that. I perceive it repeating every twelfth unit. This perceived loop point could be considered the ground over which the other features create combinations as varying figures. Our choice of ground and figure could be a product of features of the sounds themselves such as volume and attack time, as well as the mathematical interactions between the different spacings.
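The arithmetic behind these loop points can be checked with a short Python sketch. The shortest exact repeat of the combined pattern is the least common multiple of the four spacings (any multiple of that value is also a repeat point), and the two fastest layers realign on their own much sooner. The sketch assumes all four sounds start together at time unit zero; the names spacings, pattern and is_period are illustrative only and are not taken from the sourcecode of the audio example.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


spacings = [3, 4, 10, 14]           # one sound every 3rd, 4th, 10th, 14th unit
full_cycle = reduce(lcm, spacings)  # shortest exact repeat of all four layers
fast_pair = lcm(3, 4)               # where the two fastest layers realign


def pattern(t):
    """Which of the four layers sound at time unit t."""
    return tuple(t % s == 0 for s in spacings)


def is_period(p):
    """True if shifting the whole pattern by p units leaves it unchanged."""
    return all(pattern(t) == pattern(t + p) for t in range(full_cycle))


# Brute-force confirmation that no shorter shift reproduces the pattern.
shortest = next(p for p in range(1, full_cycle + 1) if is_period(p))
print(full_cycle, shortest, fast_pair)
```

The twelve-unit realignment of the two fastest layers matches the perceived loop point described above, which is consistent with the idea that the most frequent sounds supply the ground.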

Musical Gestalt

Western musical theory considers Gestalt theory in two main areas: grouping/segmentation and expectation, which I shall explore separately.

Grouping and Segmentation

In "A Generative Theory of Tonal Music", Fred Lerdahl and Ray Jackendoff define sets of rules for analysing music. For the purposes of this essay, we focus on the grouping rules, which are separated into two sets: first the five Grouping Well-Formedness Rules (GWFR) and second the seven Grouping Preference Rules (GPR). The GWFR define some structural constraints. Simply put:

So what we are left with is a simple tree structure, where groups contain groups, the biggest group being a whole musical piece, and the smallest group being a note or drum beat that cannot be partitioned further without going beyond the limits of classical musical notation. Between those lie groupings such as motives, themes, phrases, periods, theme-groups and sections.
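Such a tree can be sketched in a few lines of Python. The example below assumes that the structural constraints amount to contiguity, strict nesting and exhaustive partition of any subdivided group, which is how the well-formedness rules are usually summarised. The class name Group, the function is_well_formed and the event indices are hypothetical.

```python
class Group:
    """A group spanning events start..end-1, optionally split into subgroups."""

    def __init__(self, start, end, children=None):
        self.start = start
        self.end = end
        self.children = children or []


def is_well_formed(group):
    """True if every subdivided group is split into contiguous,
    non-overlapping children that together cover it exactly."""
    if group.end <= group.start:
        return False                  # a group must contain at least one event
    if not group.children:
        return True                   # an undivided group is fine as it is
    position = group.start
    for child in group.children:
        if child.start != position or not is_well_formed(child):
            return False              # gap, overlap, or malformed subgroup
        position = child.end
    return position == group.end      # children must reach the group's end


# A hypothetical eight-event piece: two phrases, the first split into two motives.
piece = Group(0, 8, [Group(0, 4, [Group(0, 2), Group(2, 4)]), Group(4, 8)])
overlapping = Group(0, 8, [Group(0, 5), Group(4, 8)])
print(is_well_formed(piece), is_well_formed(overlapping))
```

The check says nothing about which of the many well-formed trees a listener would choose.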

Their GPR are the rules for how these groups might be chosen. They name seven, which I summarise;

The relative strengths of their rules are largely unspecified and untested by Lerdahl & Jackendoff, but they explicitly invite others to research further. Here follows an exploration of some of those who accepted this invitation.

Frankland & Cohen identified the GPR 2 proximity rule and the GPR 3 change rule as being "local grouping rules" most fundamental to low level grouping structure within the Generative Theory of Tonal Music. They tested and quantified the Rest and Attack Point aspects of GPR 2 and the Register and Length aspects of GPR 3. The results broadly supported the overall theory, but found the Attack Point rule to be by far the strongest of those they measured, and suggested use of the other rules to be tempered in light of this.

Through his study of the Local Boundary Detection Model (LBDM), Emilios Cambouropoulos extends GPR 3 to suggest that change is not always on grouping boundaries, but between the two events that precede a boundary. In experiments with expert performances of the work of Chopin, he found that while less than 50% of notes on boundaries were lengthened, 92% of notes preceding them were lengthened. These figures could be arrived at because Cambouropoulos chose a 20-bar section of a piece where boundaries were unambiguous and uncontentious.

Cambouropoulos proposes one hypothesis that might explain this tendency for the penultimate note in a group to be emphasised - "When a note IOI is long in relation to its surrounding notes, further lengthening should be quite significant in absolute terms for it to be perceptible whereas a much smaller lengthening of a preceding short note (delay of long note) is more effective." This is an interesting idea that requires further research and testing.

The tests of Deliège [1987] backed up an interesting assertion by Lerdahl & Jackendoff that groupings are more easily identified by more experienced listeners, which Deliège clarified as trained musicians. However she found non-musicians also mostly made segmentations in line with the rules, and so concludes that these grouping preference rules may be applied broadly.

Deliège and Frankland & Cohen were in agreement in suggesting that the GPR are incomplete. The results of Deliège's tests suggest a need for rules based on changes in harmony, and she goes on to suggest additional rules might be based on sound density.


Expectancy

The other major area of influence of Gestalt over music theory is on expectation.

Wertheimer and Koffka pointed out that perception is not merely a product of the environment. Although our influence over what we perceive is largely unconscious, it is nonetheless active. What we have perceived before has influence over what we will see again. Supporting evidence is popularly found in the photograph of a Dalmatian by R. C. James (figure 7).

figure 7. Veridical expectation

Dalmatian, by R. C. James

This helps explain expectation as a cultural phenomenon. We learn to expect things based on what we have experienced before. It is much easier to parse the dalmatian image having already seen it, even when many years have passed in the meantime - our memory in some way tells us what to expect. But what happens when that expectation is unfulfilled?

figure 8. Schematic expectation

No Dalmatian, originally by R. C. James, scrambled by Alex McLean

Figure 8 shows an attempt to scramble many of the identifying elements that form the percept of the dalmatian. Having seen her so many times, we expect to do so again in this altered image. We quickly realise that she has gone, but nonetheless the expectation of her presence remains in her place. [2]

This phenomenon is observable in the world of music too. Even in a well-known and loved piece, unfulfilled expectation remains an important feature of the music. How can we continue to expect something that we know is simply not there?

The answer may be found by dividing expectation into two types, schematic and veridical. Schematic expectation is based on cultural experiences of the world of music and veridical expectation is based on previous listens of the piece in question. Schematic expectation is what we are particularly interested in here: an automatic expectation based on cultural experience of music in general, an unsuppressable expectation even when we know it will go unfulfilled. A certain sound is still implied by the sounds that preceded it, even though we know from previous listens that it will not occur.

What kind of expectancies arise in music? Conventional knowledge in western music says that a skip, or large leap between successive notes, gives expectancy of a reversal, or change in direction. This effect is attributed by Eugene Narmour (1989) to two principles: registral return and registral direction.

According to Narmour's first principle of registral return, the listener expects an interval to be opposite in direction, but similar in size to the one that preceded it. Narmour attributes this expectation to the Gestalt rule of similarity.

Narmour's second principle of registral direction is governed by the size of intervals between notes. It predicts that a small interval creates the expectation of motion in the same direction, and a larger interval creates expectation of a change in direction. These expectations can be attributed to the Gestalt rules of good continuation and symmetry respectively.
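The two principles can be written down as simple rules over signed intervals, as in the Python sketch below. This is an illustration, not Narmour's own formalisation: intervals are measured in semitones (positive for upward motion), and both the five-semitone limit for a "small" interval and the two-semitone tolerance for "similar in size" are assumptions chosen for the example.

```python
def sign(n):
    """Return 1 for upward motion, -1 for downward motion, 0 for a repeated note."""
    return (n > 0) - (n < 0)


def registral_direction(interval, small_max=5):
    """Expected direction of the next interval (+1 up, -1 down, 0 none).
    A small interval implies continuation; a larger one implies reversal."""
    if abs(interval) <= small_max:
        return sign(interval)
    return -sign(interval)


def fulfils_registral_return(previous, following, tolerance=2):
    """True if the following interval is opposite in direction to the
    previous one and similar in size (within tolerance semitones)."""
    opposite = sign(previous) * sign(following) == -1
    similar = abs(abs(previous) - abs(following)) <= tolerance
    return opposite and similar


# A rising major second implies further ascent; a rising major sixth implies descent.
print(registral_direction(2), registral_direction(9))
# Up a fifth then down a tritone counts as a return; up a fifth then down a tone does not.
print(fulfils_registral_return(7, -6), fulfils_registral_return(7, -2))
```

Treating the boundary between small and large intervals as a single threshold is a simplification; a graded model would be needed to compare the strength of the two expectations.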

Bharucha (1994) also wrote of his research in expectation in music, but took a different approach to testing the theory. Instead of using human subjects, he constructed artificial neural networks. He puts forward the idea that if you can get such a model to demonstrate learning of expectations, you can suggest that human senses might follow a similar model.

Bharucha's approach seems frustrating on one level - whatever the results, it only allows a starting point for suggesting a theory. We do not know a great deal about the brain, but can be sure that artificial neural networks are not an accurate model of its functioning. So Bharucha does not prove or disprove anything beyond the success of an artificial model. Nonetheless, Bharucha relates that his network is indeed able to learn a musical schema and exhibit schematic expectation. As long as we do not take the neural network too literally, the results are of interest to us.


Conclusion

Like any theory, Gestalt has its critics. It is often argued that the large number of Gestalt principles are so flexible that they could fit any pattern of test results. Pomerantz (1986) argues that "the laws ... are distinctly disorganized ... there is considerable variation in the names the laws are given, their description, how they are grouped, and how many of them there are." He also points out that the number of Gestalt laws varies from 114 to a single, all-encompassing law. However, Pomerantz then goes on to propose a "neo-Gestalt" psychology that addresses many of these concerns, with good reference to the spirit of the original Gestalt claims. So his argument is not with the spirit of Gestalt theory, but with its original implementation.

With this kind of criticism in mind it is certainly worth being careful while researching around this subject. The tendency should be towards scientific analysis, using Gestalt theory as a starting point towards a more rigorous corpus of rules. Any vagueness in the theory can therefore be countered with testing, measuring and refinement. Part of this process may involve de-emphasising parts of the theory. As Narmour suggests with reference to Pomerantz, perhaps we should focus on the bottom-up Gestalt principles such as similarity and proximity, principles that are "measurable, formalizable and thus open to empirical testing."

However, to discard principles simply because they're unformalisable would be a mistake. In the practice of composing, formal considerations may only be a starting point, and subjective judgements may become central. Paul Klee (tr. 1953), a painter who is known to have followed Gestalt theory, puts it this way: "Already at the very beginning of the productive act, shortly after the initial motion to create, occurs the first counter motion, the initial movement of receptivity. This means: the creator controls whether what he has produced so far is good." This implies that between creative acts on their work-in-progress, the artist constructs the "goodness." Klee's choice of the word control here is perhaps revealing: it implies that in the midst of a creative act, responsibility for defining "good" is taken away from theory and placed in the hands of the artist. If this is the case, that would explain why Gestalt principles can be vague: the detail of the theory is remade during the construction of every artwork.

Whatever Klee meant, it's clear that we shouldn't attempt to over-constrain artistic practice, and likewise shouldn't encourage wild, ungrounded scientific research. But then, perhaps Gestalt theory as a whole can provide language to allow both sides to meet, converse and converge.

Notes

  1. According to John Cage, true silence is impossible to achieve. Even when sitting in an anechoic chamber, Cage heard sounds, which he attributed to the low pitched sound of blood rushing through his veins and the high-pitched sound of his nervous system (perhaps that was just tinnitus, though).
  2. Although it is perhaps interesting to note that having stared at the scrambled image for a while, it seems to become easier to not see the dalmatian in the original image.

Bibliography