July-August 2008 American Scientist Magazine
The Psychoacoustics of Harmony Perception [abbr; -commentary(*2)]

Centuries after three-part harmony entered Western music, research is starting to clarify why different chords sound tense or resolved, cheerful or melancholy
Norman D. Cook(*2) , Takefumi Hayashi

Sing your favorite college fight song or the United States national anthem to a suitable instrumental accompaniment, and the chances are that you will hear lots of stirring major chords. The Star-Spangled Banner is a perfect example: When you sing "Oh say, can you see?" you are singing the three notes (one of them raised an octave) of a major chord.
   Now think of a wistful, pensive song, and there is a good chance that the mood will be set by minor chords. For example, in the Beatles' Yesterday, when Paul McCartney intones "Why she had to go, I don't know, she wouldn't say," the notes "why-had-go" form a minor triad.
   Music theorists were, of course, aware of the different emotional resonance of major and minor chords long before Sir Paul wrote his opus. Jean-Philippe Rameau, the French composer and author of an influential book on harmony, wrote in 1722: "The major mode is suitable for songs of mirth and rejoicing," sometimes "tempests and furies," and sometimes "tender and gay songs," as well as "grandeur and magnificence." The minor mode, on the other hand, is suitable for "sweetness or tenderness, plaints, and mournful songs."
   The major/minor distinction entered Western music during the Renaissance era, as composers moved away from the monophonic melodies and two-part harmonies used, for instance, in Gregorian chants and embraced harmony based on three-tone chords (or triads). Composers found that triadic harmony allowed them to tap a deeper range of emotions, of conflict and resolution. That is why, to the modern ear accustomed to chords, Gregorian chants sound curiously monotonous and emotionally flat.
   Major and minor chords remain absolutely central to Western music, as well as to non-Western traditions in which three-tone chords are not used, but short melodic sequences often imply major or minor modes. And yet the psychological effect remains unexplained. Today, this question has somehow become an embarrassment to theorists. For example, in a book on music psychology, John Sloboda makes brief reference to research indicating that the major and minor modes elicit positive and negative emotions in both adults and children as young as three years, but neglects to discuss this remarkable fact (Exploring the Musical Mind, 2005). In David Huron's Sweet Anticipation (2006), the entire issue is relegated to a single footnote. Most theorists are adamant that the association of major keys with positive emotions, and minor keys with negative emotions, is a learned response. It is simply the "Western idiom," and pointless to explain in the same way that it is pointless to explain the conventions of English spelling or grammar.
   We believe, however, that the different emotional responses to minor and major have a biological basis. But before we venture into such controversial territory, we propose to answer a simpler question first: Why do some chords sound stable and resolved, and give a sense of musical finality, whereas other chords leave us in the air and expecting some sort of resolution?
   Psychophysical research has provided part of the answer. More than a century ago, Hermann Helmholtz identified the acoustic basis of musical dissonance. There is more going on in a triad than mere dissonance or consonance, however; some relatively consonant chords nevertheless feel unresolved. We have therefore developed an acoustical model of harmony perception that explains harmony in terms of the relative positions of three pitches. In particular, we have identified two qualities that we call tension and valence, which together explain the perception of "stability" and explain how major chords differ acoustically from minor chords. This model will give us a basis for speculating on the reasons for their different emotional connotations.

Upper Partials
The scientific explanation of music begins with the wave structure of tones. Even a single isolated tone is more complex than it appears, due to the presence of so-called upper partials (or higher harmonics). This one fact of physical acoustics was unknown to Renaissance theorists, but is easily studied today with a laptop computer and appropriate software. The effects of the upper partials underlie many of the subtler phenomena of musical harmony.

Figure 3 - Frequencies of tone

The basic pitch of an isolated tone can be described in terms of its "fundamental frequency" (denoted F0, and expressed in terms of cycles per second, or hertz [Hz]). The F0 can be illustrated as a sine wave, as in Figure 3. Associated with the F0 are several upper partials—F1, F2, F3 and so on—which are sound waves that vibrate at multiples of the fundamental frequency. For example, if the F0 is middle-C (261 Hz), then F1 is 522 Hz, F2 is 783 Hz and so on.
   Any musical sound (other than a pure sine wave) will necessarily be a combination of these partials. The number and strength of the various partials give each note its unique timbre, and make a middle-C on a piano, for example, sound different from the same note played on a saxophone. In general, the upper partials become weaker and weaker and can eventually be ignored, but at least the first five or six partials have a significant effect on our perception.
   The "upper partial story" would be easy if all of the partials were separated by octaves, but that is not the case, because pitch perception scales logarithmically. That is, although the first upper partial falls one octave higher than the fundamental frequency, further multiples of the F0 fall at gradually smaller and smaller intervals above that (Figure 3b). Thus, if the fundamental frequency is middle-C, then F1 is an octave above middle-C (written C'). However, the next partial, F2, is between one and two octaves above middle-C, because its frequency is only 3/2 the frequency of F1. In Western music, this tone is called G'. Thus, as illustrated in Figure 3, the middle-C on a piano comprises a mixture of tones: C, C', G', C", E", and so on. This surprising fact makes the phenomenon of harmony more complex, but at the same time far more musically interesting.
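The arithmetic behind this list of tones can be checked with a short Python sketch. It is an illustration added here, not something taken from the article: the function name `partial_table` and the note-name table are assumptions, and the note names are only meaningful when the fundamental is a C. The distance of each partial above the fundamental, in semitones, is 12 times the base-2 logarithm of its frequency multiple.

```python
import math

# Note names for the 12 semitones of an octave, starting from C (assumed here).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def partial_table(f0_hz=261.0, n_partials=5):
    """Return (frequency in Hz, semitones above F0, nearest note name) per partial.

    Partial k (k = 0 is the fundamental, F0) vibrates at (k + 1) * f0_hz.
    """
    rows = []
    for k in range(n_partials):
        multiple = k + 1
        # Pitch distance is logarithmic in frequency: one octave = 12 semitones.
        semitones = 12 * math.log2(multiple)
        nearest = round(semitones)
        # One apostrophe per octave above the fundamental, as in the text.
        name = NOTE_NAMES[nearest % 12] + "'" * (nearest // 12)
        rows.append((multiple * f0_hz, semitones, name))
    return rows


for freq, semis, name in partial_table():
    print(f"{freq:7.1f} Hz  {semis:6.2f} semitones  {name}")
```

For a 261 Hz fundamental, the five partials lie about 0, 12, 19, 24 and 28 semitones above it, which correspond to C, C', G', C'' and E''.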

Consonance and Dissonance
Like isolated tones, two-tone intervals are normally described in terms of their fundamental tones. But when a piano player strikes two notes on the keyboard, a smorgasbord of upper partials enters into the listener's ears (see Figure 3c).

Figure 4 - Chord dissonance

Beginning with Hermann Helmholtz in 1877, several generations of experimentalists have studied the perception of consonance or dissonance of different intervals. They have consistently found that normal listeners hear an "unpleasant," "grating" or "unsettled" sonority whenever two tones are one or two semitones apart. (One semitone is the interval between two adjacent notes, white or black, on the keyboard.) In addition, two tones separated by 11 semitones are also notably dissonant, despite the fact that they do not lie close to one another on a keyboard, and an interval of 6 semitones is perceived as mildly dissonant (see Figure 4a).
   In 1965, psychologists Reinier Plomp and Willem Levelt explained the experimental perception of dissonance by using a theoretical curve (see Figure 4b) to represent the dissonance between two pure sine waves. This curve does not explain the dissonance of large intervals such as 6 or 11 semitones. However, when Plomp and Levelt added more and more upper partials, the "total dissonance" gradually came to resemble the empirical curve very closely. As shown in Figure 4c, the model of Plomp and Levelt predicts small decreases in dissonance at or near many of the intervals of the diatonic scales (3, 4, 5, 7, 9 and 12 semitones).
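The procedure described above, a pure-tone curve summed over pairs of partials, can be sketched in Python. The article does not give the curve in closed form, so this sketch substitutes a parameterization of the Plomp and Levelt curve that is usually credited to William Sethares; the constants, the 1/n amplitude rolloff and the function names are all assumptions made for illustration, not the authors' model.

```python
import math


def pure_tone_dissonance(f1, f2):
    """Dissonance of two sine waves (assumed Sethares-style parameterization)."""
    lo, hi = min(f1, f2), max(f1, f2)
    # The critical bandwidth widens with frequency, so the curve is rescaled.
    s = 0.24 / (0.0207 * lo + 18.96)
    x = s * (hi - lo)
    return math.exp(-3.51 * x) - math.exp(-5.75 * x)


def interval_dissonance(semitones, f0_hz=261.0, n_partials=6):
    """Sum the pure-tone curve over every partial of one tone against every
    partial of the other. Amplitudes fall off as 1/n (an assumption)."""
    lower = [(f0_hz * n, 1.0 / n) for n in range(1, n_partials + 1)]
    ratio = 2 ** (semitones / 12)
    upper = [(f * ratio, a) for f, a in lower]
    return sum(a1 * a2 * pure_tone_dissonance(fa, fb)
               for fa, a1 in lower
               for fb, a2 in upper)


for st in range(0, 13):
    print(st, round(interval_dissonance(st), 3))
```

Under these assumptions the sine-wave curve alone is close to zero at 11 semitones, while the version with six partials rates 11 semitones as more dissonant than the fifth (7 semitones), which is the qualitative effect the paragraph describes.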
   The match between the minima of dissonance and the tones of the most common musical scales means that the spacing of the tones in scales is not an arbitrary invention. On the contrary, it is a consequence of the way that the human auditory system works, and it is no surprise to see the same intervals used in different musical cultures around the world. Some tone combinations have lesser dissonance, and music that is constructed with these less dissonant intervals is more pleasing to the human ear. Of course, the creation of "pleasant music" requires much more than simply avoiding dissonance. Indeed, some musical traditions or styles may actually encourage dissonance. Nevertheless, the amount of consonance or dissonance employed will always be an important factor in how the music is perceived.

The perception of chords—whether they are 3-tone triads, 4-tone tetrads or more complex chords and cadences—is likewise influenced by upper partials. In a triadic chord, as in a 2-tone interval, the frequencies with the greatest amplitude are usually those of the fundamentals, the three distinct notes that are written in the musical score. The upper partials usually have smaller amplitudes, but give the chord a rich feeling that we might call its overall "sonority." On rare occasions—such as in barbershop quartet singing—the upper partials may reinforce each other to such an extent that they are almost as strong as the fundamentals, and this creates the much-coveted illusion of a "fifth voice."

For simplicity, though, let us begin the discussion of triadic harmony by considering only the fundamental frequencies. The three pitches can be plotted on a "triadic grid," as shown in Figure 5, with the size of the lower interval shown on the vertical axis and the size of the higher interval on the horizontal axis. (As before, these interval widths are expressed in semitones.) For example, a major chord in "root position" has a lower interval of 4 semitones and an upper interval of 3 semitones (grid position 4–3). Any other triad in Western music can also be specified by its location on the triadic grid. Other musical cultures employ different scales, and may thus have chords that lie in the gaps of this grid. (For example, Arabic and Turkish music use a scale with 24 tones in an octave, compared to only 12 in Western music, and thus enjoy a greater variety of possible harmonies.)
   Figure 5 shows various inversions of the major and minor triads, in which one or two notes are raised by an octave. The six types of chords shown in this figure provide the harmonic framework for nearly all Western classical and popular music. The other locations on the triadic grid include many other chords of varying utility and beauty, as well as certain chords that are simply avoided in most types of music.
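The grid positions of the six common chords can be derived mechanically. The following Python sketch is an illustration added here (the function names `grid_position` and `inversions` are assumptions): it represents a triad as pitches in semitones, reads off the lower and upper intervals, and forms each inversion by raising the lowest note an octave.

```python
def grid_position(pitches):
    """Return (lower interval, upper interval) in semitones for a 3-tone chord."""
    low, mid, high = sorted(pitches)
    return (mid - low, high - mid)


def inversions(root_position):
    """Grid positions of a triad in root position and its two inversions.

    Each inversion raises the current lowest note by one octave (12 semitones).
    """
    chord = sorted(root_position)
    result = []
    for _ in range(3):
        result.append(grid_position(chord))
        chord = sorted(chord[1:] + [chord[0] + 12])
    return result


print("major:", inversions([0, 4, 7]))   # C-E-G
print("minor:", inversions([0, 3, 7]))   # C-Eb-G
```

The major triad yields grid positions 4–3, 3–5 and 5–4, and the minor triad yields 3–4, 4–5 and 5–3.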
   The triadic grid provides a useful framework for studying how the inclusion of the upper partials affects the harmonic sonority of a 3-tone chord. This framework will enable us to address the two main questions we referred to in the introduction: Why are certain triads perceived as more or less stable, and how can we account for the commonly perceived positive and negative emotional valence of the major and minor chords?

Dissonance in Triads
Structurally, each triad contains three distinct intervals, so the obvious first step in trying to explain their sonority is to add up the dissonance of these intervals to obtain the total dissonance. Figure 6a illustrates the total dissonance of all the triads on the triadic grid, taking into account only the fundamental frequencies. The figure shows two strips of relatively strong dissonance, corresponding to triads that contain an interval of one or two semitones. An oblique view of the graph shows the dissonance even more clearly. We can see an extremely steep peak of dissonance when both intervals are one semitone in magnitude, and two high ridges of dissonance when one of the intervals is less than two semitones. The remainder of the triadic grid is a valley of consonance—and this is where all of the common triads lie.
   When we add one set of upper partials to the calculation of total dissonance, the "valley of consonance" splits into two regions (Figure 6b). As we add more upper partials, the fine structure of the maps gradually gets more complicated, but the general pattern remains more or less the same (Figure 6c). That is, there are regions of strong dissonance (when either interval is small) and expanses of relatively strong consonance (where all of the common triads lie).
   Clearly, an explanation of harmony in general cannot rely solely on the total dissonance of triads, because such a view would imply that all of the commonly used triads have more or less the same sonority. Perceptually, that is simply not true. Major and minor chords are commonly described as stable, final and resolved. Other triads, even those that do not contain any 1- or 2-semitone intervals, are heard as tense or unresolved. A study published in 1986 by Linda Roberts, an expert in auditory perception at Bell Laboratories, showed that these perceptions were consistent among musicians and non-musicians; others have tested children and adults, and people from the West and Far East with similar results. Thus, factors other than dissonance must be involved in the sonority of a chord.
   Realizing that "sensory dissonance" can explain only so much, music psychologists such as Sloboda, Huron, David Temperley and Klaus Scherer have assumed that normal listeners become "brainwashed" to hear the major and minor chords as stable and resolved, simply because they are so frequently employed in all kinds of popular music. Because the other chords are used less often, they maintain, listeners hear them as unfamiliar, and therefore as ambiguous, unresolved and "musically dissonant," even though they are not acoustically dissonant.
   In essence, these theorists invite us to consider all aspects of music perception as being social constructs, and to believe in the overpowering influence of learning and culture. The antidote to such ideas is simply to play, for example, an augmented chord (C-E-G#) followed by a major (C-E-G) or minor (C#-E-G#) chord … and listen. Even though all of the intervals in the augmented chord are consonant, there is something inherently tense, unsettled and unstable about it—something unmistakable in its acoustical structure that even people with minimal exposure to music hear and feel. It is that common perception that music psychologists would like to explain on an acoustical basis.

Tension in Chords
If 2-tone dissonance does not completely explain the sonority of a triad, the next step is to examine the 3-tone configurations of chords. In his classic book, Emotion and Meaning in Music, psychologist Leonard Meyer suggested that the tension in certain chords arises from the equivalence of the two intervals contained within them (for example, 3–3, 4–4 or 5–5 semitone spacing). According to Meyer, when a chord or short melody of three notes contains two neighboring intervals of precisely the same size (in semitone units), the tonal focus becomes ambiguous and the music takes on an unsettled character. In other words, the listener perceives tension because it is unclear how to group the equally spaced tones. In contrast, when a 3-tone combination has two unequal intervals and no dissonance (that is, 3–4, 4–5, 5–3, 4–3, 3–5, 5–4 semitone spacing), the listener hears stability.
   Of course, most people do not consciously think about the relative spacing or "grouping" of tones. Nevertheless, the human auditory system has evolved the ability to notice it subconsciously. In the conclusion of this article, we will discuss one possible reason why.

Inspired by Plomp and Levelt's approach to dissonance, we developed a psychophysical model of Meyer's theory by defining an abstract tension curve for triads (Figure 7). The curve has a peak when the two intervals in the triad are equivalent. When one interval is greater than the other by at least a full semitone, breaking the symmetry, the tension drops to zero.
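A curve with this shape is easy to write down. The article does not state the formula behind Figure 7, so the Python sketch below simply assumes a Gaussian in the difference between the two intervals; the Gaussian form, the width of 0.6 semitones and the function name `tension` are illustrative assumptions.

```python
import math


def tension(lower, upper, width=0.6):
    """Tension of a 3-tone configuration with the given intervals (semitones).

    Peaks at 1.0 when the intervals are equal and falls to nearly zero once
    they differ by a full semitone. The Gaussian shape and width are assumed.
    """
    diff = upper - lower
    return math.exp(-(diff / width) ** 2)


print(tension(4, 4))  # augmented chord: symmetric, maximum tension
print(tension(4, 3))  # major chord: asymmetric, low tension
print(tension(3, 4))  # minor chord: asymmetric, low tension
```

With this choice the tension is exactly 1 for the symmetric 3–3, 4–4 and 5–5 chords and about 0.06 for a difference of one semitone, so it only approximates the "drops to zero" behavior described above.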
   The original version of Meyer's theory seems to have one flaw in it. Besides the augmented chord shown in Figure 7, many other triads also have an unsettled, tense character—for example the so-called inversions of the diminished chords. Yet these triads do not appear to satisfy Meyer's description of intervallic equivalence.

The resolution to this puzzle begins to become apparent when we bring the upper partials into consideration. In Figure 8, we have plotted the "total tension" in each chord, using the theoretical tension curve from Figure 7. When we compute the total tension using only the fundamental frequencies, as in Figure 8a, we see a ridge of high tension that corresponds to all the symmetric chords. The augmented chord (A), one of the diminished chords (d) and one of the suspended fourth chords (S) lie on this ridge; on the other hand, inversions of the diminished and suspended fourth lie in the blue valley of low tension. This figure is, in essence, a visual representation of Meyer's argument from 1956, including its apparent flaw.
   Figures 8b and 8c show how the addition of upper partials vindicates Meyer's theory. Even with only one upper partial, as in Figure 8b, all of the diminished and suspended-fourth chords lie on ridges of high tension. When we add more partials, as in Figure 8c, we find that the major and minor chords continue to lie in blue valleys of stability.
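The effect of the partials on tension can be reproduced in a small Python sketch. Everything in it is an assumption made for illustration: the Gaussian tension curve, the equal weighting of partials, and the choice to sum over every three-pitch combination drawn from the pooled partials. It is meant to show why an inverted diminished chord (3–6) acquires tension once the octave partial is included, not to reproduce Figure 8.

```python
import math
from itertools import combinations


def tension(lower, upper, width=0.6):
    """Assumed Gaussian tension curve: 1.0 when the two intervals are equal."""
    return math.exp(-((upper - lower) / width) ** 2)


def total_tension(lower, upper, n_partials=1):
    """Sum tension over all 3-pitch combinations of the chord's partials.

    Pitches are in semitones; partial n of a tone lies 12*log2(n) above it.
    n_partials=1 means fundamentals only. Partials are weighted equally
    (an assumption).
    """
    chord = (0, lower, lower + upper)
    pool = sorted(p + 12 * math.log2(n)
                  for p in chord
                  for n in range(1, n_partials + 1))
    return sum(tension(b - a, c - b) for a, b, c in combinations(pool, 3))


for name, (lo, up) in {"major 4-3": (4, 3),
                       "diminished inversion 3-6": (3, 6),
                       "suspended fourth 5-2": (5, 2)}.items():
    print(name, round(total_tension(lo, up, 1), 3),
          round(total_tension(lo, up, 2), 3))
```

With fundamentals only, the 3–6 and 5–2 chords show almost no tension. Adding the octave partial creates symmetric triples such as 3, 9, 15 (intervals 6–6) in the 3–6 chord and 5, 12, 19 (intervals 7–7) in the 5–2 chord, so their totals rise well above that of the major chord.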

The Instability of Triads
The tension model indicates that the diminished, augmented and suspended-fourth triads have high tension—in all of their inversions and when played over one or two octaves. Thus, the total harmonic "instability" of triads is a consequence of two independent acoustical factors. The first is interval dissonance and has been acknowledged to be an important part of music perception at least since Helmholtz's experimental work in the 19th century. The second factor is triadic tension, which is explicitly a three-tone effect (Meyer, 1956).

Going one step further, we can estimate the total harmonic instability of any triad by adding together the dissonance and tension factors, while gradually including the effects of more and more of the upper partials. Figure 9 shows the results for 3-tone chords with up to four partials. Again, we see that the major and minor chords lie in regions of relative stability, in all of their inversions and when played over one or two octaves. The other, less commonly used chords lie on ridges or peaks of instability.
   This model implies that the Renaissance musicians of the 14th century were not simply the lucky inventors of a musical idiom that proved to be popular. On the contrary, they were discoverers—musicians who were sensitive to the symmetry or asymmetry in the acoustical patterns of three-tone configurations, whereas their medieval predecessors had remained enthralled by lower-level interval effects.
   Questions concerning the relative influence of intervals and chords in music are still debated passionately today, but it is clearly a misunderstanding to maintain that either effect alone explains harmony. When music employs intervals, consonance is the most important issue, and the tuning should seek the sweetest, most consonant combination of the two tones. But when music includes triads, the tuning of the chord as a chord becomes the primary perceptual event. It is then the relative spacing of the intervals, not the location of the tones relative to the tonic, which becomes of central concern. Thus, the Renaissance discoverers of harmony achieved a shift in focus—away from the "perfection" or "imperfection" of intervals and toward the symmetry or asymmetry of 3-tone configurations.

The Affective Valence of Triads
If interval dissonance and triad tension were the only factors determining the sonority of triads, we should expect that all of the major and minor chords would sound rather similar. Yet there is ample evidence that they do not. Children as young as three years old will associate pieces in a minor mode with a sad face, and pieces in a major mode with a smiling face. Casino operators fill their casinos with slot machines that play tones in C-major—hoping to create a comfortable, reassuring acoustic environment for gamblers. NBC-TV's signature three-tone cadence forms a major chord. Even the labels "major" and "minor" suggest something perceptually distinct about these two classes of chords. In English, French and Italian, the major/minor distinction suggests differences in size and strength. In German, Dur and Moll mean hard (durable) and soft (mollify).
   The emotional valence of major and minor chords can of course be suppressed and even reversed through rhythms, timbres or lyrics that tell a different story. However, if all else is held constant, major triads will be heard as "positive," whereas minor chords have a "negative" affect. That difference is one of the longest-standing puzzles of Western harmony. It is also one of the most important, because the emotions evoked by major and minor harmonies help give music its meaning. They distinguish music from the unfocused meandering of birdsong or the cacophony of a city street.
   We have seen that the relative size of the two intervals was the key to understanding triadic tension. Moreover, from a state of "intervallic equivalence" (with its inherent perceptual tension), there are only two directions of pitch movement that can reduce the tension. Either the lower interval can be made greater than the upper interval, in which case the chord resolves to a major triad, or the lower interval can be made smaller, which corresponds to a minor triad.
   This reasoning suggests that we should reformulate the tension model so that the direction of motion away from symmetry indicates the degree of "majorishness" or "minorishness" of any 3-tone chord.

Thus, in Figure 10a, we propose a modality curve to distinguish the two types of resolution. The horizontal axis shows the difference between the two intervals in the chord, in semitone units. When there is no difference, the chord is ambiguous, and its "valence score" on the vertical axis is zero. The valence score rises or falls to a maximum or minimum when the difference between the intervals is exactly 1.0 or –1.0 semitone (points a and b). The valence score falls to zero again if the difference is two or more semitones.
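A function with the properties just listed can be sketched in Python. The article does not give the formula behind Figure 10a, so the form below is an assumption chosen only to match the description: it is zero for equal intervals, reaches +1 and -1 when the lower interval exceeds or falls short of the upper by exactly one semitone, and is close to zero again at a difference of two semitones.

```python
import math


def valence(lower, upper):
    """Assumed modality curve for a 3-tone configuration (intervals in semitones).

    Positive means 'major-like' (lower interval larger), negative 'minor-like'.
    With d = lower - upper, the function d * exp((1 - d**4) / 4) has its
    extremes at d = +1 and d = -1, because its derivative is proportional
    to (1 - d**4).
    """
    diff = lower - upper
    return diff * math.exp((1 - diff ** 4) / 4)


print(valence(4, 3))  # major, root position
print(valence(3, 4))  # minor, root position
print(valence(4, 4))  # augmented: ambiguous
print(valence(3, 5))  # major inversion, misjudged from fundamentals alone
```

Note that the 3–5 chord, a major triad in inversion, scores close to zero (slightly negative) when only the fundamentals are considered.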
   As in the dissonance and tension models, considering only the fundamental frequencies leads to an overly simple picture that misclassifies certain chords. In Figures 10b and 10c, the triads composed of a 3-semitone interval and a 5-semitone interval (either 3–5 or 5–3) are located in regions that are neither orange nor blue—neither major nor minor, which contradicts what we know from musical experience.
   When we include the upper partials, however, the total valence scores are remarkably consistent with our perceptions of major and minor triads. Even with only the first set of upper partials (Figure 10d), we find peninsulas of positive (orange) and negative (blue) valence at all of the major and minor triads. The tension chords (d, A, S), on the other hand, fall in between the regions of positive or negative modality, as would be expected from traditional harmony theory.
   Thus, among the upper partials of all of the major chords, there is a predominance of triadic structures where the lower interval is one semitone larger than the upper interval. Minor chords show the opposite structural feature. The brain could, in theory, identify the major or minor nature of a chord simply by summing the valences of all the possible three-tone combinations of partials.
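The summation described in this paragraph can be tried out directly. The Python sketch below is a simplified illustration under stated assumptions: the valence curve is an assumed form rather than the authors' own, only the fundamental and the octave partial are used by default, all partials are weighted equally, and every three-pitch combination of the pooled partials is counted.

```python
import math
from itertools import combinations


def valence(lower, upper):
    """Assumed modality curve: +1 / -1 at a one-semitone interval difference."""
    diff = lower - upper
    return diff * math.exp((1 - diff ** 4) / 4)


def total_valence(lower, upper, n_partials=2):
    """Sum valence over all 3-pitch combinations of the chord's partials.

    Partial n of a tone lies 12*log2(n) semitones above its fundamental.
    Equal weighting of partials is an assumption of this sketch.
    """
    chord = (0, lower, lower + upper)
    pool = sorted(p + 12 * math.log2(n)
                  for p in chord
                  for n in range(1, n_partials + 1))
    return sum(valence(b - a, c - b) for a, b, c in combinations(pool, 3))


chords = {"major": [(4, 3), (3, 5), (5, 4)],
          "minor": [(3, 4), (4, 5), (5, 3)],
          "tension": [(3, 3), (4, 4), (5, 5)]}
for kind, positions in chords.items():
    for lo, up in positions:
        print(kind, (lo, up), round(total_valence(lo, up), 3))
```

Under these assumptions all three major positions come out clearly positive, all three minor positions clearly negative, and the symmetric tension chords sum to zero, including the 3–5 and 5–3 chords that the fundamentals alone fail to classify.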

Now that we have a model of how listeners identify a chord as major or minor, we may take the final step and speculate as to why the acoustical valence carries an emotional valence as well.
   We contend that the emotional symbolism of major and minor chords has a biological basis. Across the animal kingdom, vocalizations with a descending pitch are used to signal social strength, aggression or dominance. Similarly, vocalizations with a rising pitch connote social weakness, defeat or submission. Of course, animals convey these messages in other ways as well, with facial expressions, body posture and so on—but all else being equal, changes in the fundamental frequency of the voice have intrinsic meaning.
   This same frequency code has been absorbed, though attenuated, in human speech patterns: A rising inflection is commonly used to denote questions, politeness or deference, whereas a falling inflection signals commands, statements or dominance. How might this translate to a musical context? If we start with a tense, ambiguous chord (for example, the augmented chord containing two 4-semitone intervals) and decrease any one of the three fundamentals by one semitone, the chord will resolve into a major chord. It will then have a 5–4, 3–5, or 4–3 semitone structure. Conversely, if we resolve the ambiguous chord by raising any one of the three fundamentals by a semitone, we will obtain a minor chord. The universal emotional response to these chords stems, we believe, directly from an instinctive, pre-verbal understanding of the frequency code in nature. One of us (Cook) has explored this in more detail (see the bibliography).
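The claim about resolving the augmented chord can be verified by enumeration. In this Python sketch (the helper names are assumptions introduced for illustration), the chord is written as pitches in semitones, each tone in turn is moved by one semitone, and the result is classified by its grid position.

```python
def grid_position(pitches):
    """Return (lower interval, upper interval) in semitones for a 3-tone chord."""
    low, mid, high = sorted(pitches)
    return (mid - low, high - mid)


# Grid positions of the major and minor triads in all inversions.
MAJOR = {(4, 3), (3, 5), (5, 4)}
MINOR = {(3, 4), (4, 5), (5, 3)}


def classify(pitches):
    position = grid_position(pitches)
    if position in MAJOR:
        return "major"
    if position in MINOR:
        return "minor"
    return "other"


def shift_each_tone(chord, step):
    """Move each tone of the chord by `step` semitones, one tone at a time."""
    results = []
    for i in range(len(chord)):
        moved = list(chord)
        moved[i] += step
        results.append((grid_position(moved), classify(moved)))
    return results


augmented = (0, 4, 8)  # for example C-E-G#
print("lowered:", shift_each_tone(augmented, -1))
print("raised: ", shift_each_tone(augmented, +1))
```

Lowering any one tone gives 5–4, 3–5 or 4–3, all major; raising any one tone gives 3–4, 5–3 or 4–5, all minor.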
   Individual tastes and musical styles vary widely. In the West, music has changed over the centuries from styles that employed predominantly the resolved major and minor chords to styles that include more and more dissonant intervals and unresolved chords. Inevitably, some composers have taken this historical trend to its logical extreme, and produced music that fanatically avoids all indications of consonance or harmonic resolution. Such surprisingly colorless "chromatic" music is intellectually interesting, but notably lacking in the ebb and flow of tension and resolution that most popular music employs, and that most listeners crave. Whatever one's own personal preferences may be for dissonance and unresolved harmonies, some kind of balance between consonance and dissonance, and between harmonic tension and resolution, seems to be essential—genre by genre, and individual by individual—to assure the emotional ups and downs that make music satisfying.
Acknowledgment (omitted)

Subject: Re: [rr] The Psychoacoustics of Harmony Perception -American Scientist -July/August2008
From: "Cook" 
Date: Sat, 5 Jul 2008 09:59:04 +0900
To: "Perry Bezanis" 

Dear Perry,

from the perspective of human psychology, I think you are on to [below] the 
MAIN ISSUE!... The terminology is still debated and uncertain - and worth 
spending our time on - but it is certainly higher-order relations of some kind. 
What is amazing to me is how so many of the hard-headed scientists in 
psychology are reluctant to face the higher-order topics... and want (it seems 
to me, rather desperately) to cling to quite low-level phenomena in trying to 
explain higher-level phenomena. I sympathize with that approach, in general, 
but am convinced that specifically 3-body (3-cue) effects need to be addressed -
 and not pretend that 3-body effects can be explained as the sum of multiple 
2-body effects. I think we have shown that in the realm of harmony perception, 
but there are many other topics of higher-order perception/cognition that still 
need to be addressed in a similar manner.

I will spend some time reading your homepage and get back to you!


----- Original Message ----- From: "Perry Bezanis" 
To: "Norman D. Cook" 
Cc: "Marco Iacoboni" ; "Erik Jay" ; 
"Frederick L. Coolidge" ; "Dave Scholler" 
; "Kay Stenberg" 
Sent: Saturday, July 05, 2008 6:01 AM
Subject: [rr] The Psychoacoustics of Harmony Perception -American Scientist -

> To: Norman D. Cook 
> and Takefumi Hayashi
> Others: FYI
> Your excellent article helped _immensely_ in my own arguments regarding the 
idea of relationals(* -below) and their successively higher-order role in 
evolving man's speech _and music_. As briefly as I can-
> 1 - Virtually all 'successively higher-order evolution' has been marked by 
'successively higher-order capability for response to successively higher-order 
relational properties of the environment'(*).
> 2 - Frequency and 'rhythm' then, were principals of such relationals in the 
eventual registration and 'use' of sound as generally manifest from first cold-
blooded vertebrates 'upward'.
> 3 - The successively evolving, frequency-sensitive cochlea (therefore) was 
principal to its co-evolution of 'meaning' -howls, purrs, grunts et cetera 
> 4 - The registration and 'use' of 'the higher-order relationals attaching 
birdsong' (for example) identifies 'successively higher-order response to 
relationals in _complexes_ of sound'.
> 5 - In hominid evolution then -marked by the evolution of distinctly human 
_deliberative capability_(*), it was only a matter of time (for music) before 
'successively higher-order response to relationals in sound' was manifest in 
being able to 'hold a note' -and other individuals holding different notes too 
at the same time!
> 6 - These 'still-higher-order relationals' then -discordant or otherwise, are 
precisely the half-tones, triads et cetera you identify in your excellent article.
> Response would be nice :-)
> Perry Bezanis
> San Pedro CA
> ~~~~~~~~~~~~
> * - _relational_ (noun): a second-or-higher-order property which qualifies in 
a generally comparative way the relationship of a primitive or primary property 
common to two (or more) 'elements' of the configuration space: (eg) left-
right/up-down/front-back/ness or in/outside-ness of one thing with respect to 
another: difference/sameness, more/less-ness, absence (vs presence) of 
material/body, force, color, speed, _sound_, taste, smell, texture, dry/wetness 
et cetera; at 'higher levels of vertebrate development', geometric pattern or 
shape, repetition, temperament in 'anger', 'attention' et cetera, and time: now-
 then- and next-ness for example.
> ~~~~~~~~~~~~
> 'response to successively higher-order relationals' and 'deliberative 
capability' are discussed as two of the properties _kernel_ to human evolution 
in the _short_ essay-
>  Human Nature and Continuing Human Existence
> -best read from hard copy and 'on-line' at the same time for more detailed