Gary Tomlinson is John Hay Whitney Professor of Music & the Humanities and director of the Whitney Humanities Center at Yale University. He is the author of A Million Years of Music: The Emergence of Human Modernity.
Damon Krukowski: Something I love about A Million Years of Music is this idea of deep time. How did you move from studies of Monteverdi and opera to prehistory?
Gary Tomlinson: There are a couple of ways this happened. One is that it’s a return to my past, because, though I’ve been a musician from my childhood, I went to university thinking I was going to become a biochemist and spent my first three years working toward a biochemistry major. Then I came under the influence of a wonderful music teacher. I was playing in an orchestra and ensembles—mostly classical, with a little bit of acoustic rock and roll on the side. And suddenly I said, “Why am I in science when what I really want to be doing is thinking about music?” And so I went off to graduate school in musicology at UC Berkeley.
My interest in music history also was always anthropological in a general sense. It was the placement of music in culture, and in cultures of the past, that fascinated me, and I approached other cultures of the past in some ways like an anthropological fieldworker. And my sense of that anthropological purchase was not just to place music in a context but to understand how music helps to make the context that it’s a part of, so that there’s a real mutuality and reciprocal kind of interaction; I never saw those as separate things. The anthropological stuff took me off toward social theory and poststructuralist theory and cultural theories of various sorts. And it gradually turned toward Foucauldian work. The trajectory for me was a smooth one, in a way—even though my books seem to be on very different subjects: from Monteverdi as a part of the context of late Renaissance Italian culture, through opera as a manifestation of fundamentally shifting conceptions of the voice and its powers over four hundred years (in Metaphysical Song), to Aztec and Inca song (in The Singing of the New World)—an attempt to understand the really different ways in which cultures can come to appreciate the powers of music and voice. And the next stretch was in a way just leaping back and saying, “Well, I always was interested in evolutionary theory—how the hell did humans come to be armed with the capacities to do all these things in the first place?” So that’s the short answer. [Laughs]
DK: That makes a lot of sense—it’s an embedding of the voice in our culture and in our technology. What I also got from A Million Years of Music was this idea of the voice in our language making and in our music making as complementary but also quite distinct. A line near the end of your book, “the musical absences at the heart of language,” is so beautiful to me. I know instinctually that song is different than speech, but it’s a funny thing—it’s not something that we really spend a lot of time formulating. You investigate how that distinction relates to our existence as humans and to our technology.
GT: I look at music and language in their deep histories, reaching back to a point before there was any music or language in their modern forms. So we’re talking, say, a 500,000-year stretch, perhaps all the way back to Homo heidelbergensis. I see the antecedents of these things falling into place along parallel tracks that overlap one another but are not the same track, and I follow the parallelism and the distinctness of those tracks from a very deep period. Which is to say that what we are left with as human beings in the world today, as the product of those tracks, is in fact a set of overlapping yet distinct capacities, functions, and capabilities in dealing with our world and our environment and in our social interactions with each other.
And so these things are loaded into both language and music in very complex but different ways, in my view. I borrow from Robbins Burling and other anthropologists the notion of gesture calls and the paralinguistic things in our communication that don’t have to do exactly with syntax or grammar but instead have to do with the expressive halo around syntax and grammar that is absolutely essential to linguistic expression. Linguistics professors these days, in the wake of fifty years of Chomskyanism, tend to focus on syntax and grammar, and they often leave out these paralinguistic aspects in their thinking about the evolution of language. If you do that, of course you’re going to leave out exactly the overlapping part that is so important, the overlap with the parallel track that would coalesce eventually into musicking. And so then you miss the places in modern human capabilities that are fossils of that parallel evolution, the biocultural evolution that I’m trying to lay out in the book.
Whenever people think about the origins of music, they stack it up against language. Automatically they start with, “Well, what’s its relationship with the origin of language?” And to pry those two things apart was of course a very important agenda in my book, because when they’re put together either music is made to piggyback on language as something subservient to the origin of language, something that came along as a result of language—this is Steven Pinker’s view of music as “auditory cheesecake”—or else music is made into a romanticized, ur-emotional language from which we finally came to speak propositional notions, while the heart of music remained something emotional. I think both of those views of music are wrong, I think they’re incomplete, I think they’re silly in some ways. Music the language of emotions and language the language of propositions—this is so drastic a simplification of what we do as humans with both music and language.
DK: That prying apart makes me think of what’s going on technologically right now, because of the perceptual coding of speech and of music—particularly of speech, for example in the technology we’re using together right now (we’re speaking over FaceTime) to encode our language, send it over the internet, and make it perceivable at the other end of this great distance. One of the ways I understand that engineers have accomplished that is by prying apart the musical aspects of language from what we might call the aspects of it as just purely speech. But the nonverbal qualities of our voices tend to be lost in that coding because they’re not considered crucial. So my words are reaching you and your words are reaching me. But whether the fullness of our voices is reaching each other is a question. Does that relate to your sense of the relationship between music and speech? The musical aspects of speech are very deep for us—it’s how we read all kinds of gestures into one another’s language.
GT: There is no question that that’s right. Linguists, when they do turn to talk itself rather than the structure of language, turn away from syntax and grammar and think about other things—they even talk about the melodies of speech, the tunes of speech. They immediately rely on some simple musical terms in order to turn that way.
And yet I want to go back to something else that you were asking about, about the newest technologies and even earlier voice-encoding technologies, all the way back to Alexander Graham Bell. There’s so much of speech left out of these. It’s interesting to think about what is absent from what gets reproduced across these technological systems, of which we have so many these days.
So one side of that is to wonder about what’s left out. But the other side is the miraculous way in which human beings fill it in. I’m hearing you across FaceTime. There’s a lot that is captured of what you are saying and how you say it across the technology. There’s no doubt other stuff that isn’t there. And yet I’m projecting into it somehow. I’m hearing it, I’m reconstructing you in a way, as you reconstruct me. That’s an extraordinary capacity. Even if you go all the way back to primitive telephone technology, we were always able to recognize individual voices almost instantaneously, even though we were getting a small portion of what those voices would give us if we were standing face-to-face with each other.
DK: This projective idea is fascinating to me because it’s one of the problems I think we’re facing now in digital communications—we’re always having to reconstruct, to some degree, because there are these gaps in the information. Yet we’re still trained to hear the fullness of voices, whether genetically or evolutionarily. Also just from practice: we’re sung to by our mothers and fathers, and we learn a voice in a very full way long before we hit this technology.
Another thing that struck me in your book is your use of the terms analog and digital—including the idea that digital dates back, what, 100,000 years? [Laughs]
GT: I talk about Neanderthal digitalization. That’s a slightly whimsical use of the term, but of course when I’m talking about analog, digital, way back there, I am in some ways talking about technology. Because what are digitalized in a sense in Neanderthal technology are the operational sequences whereby complex tools were constructed—sequences that fall into discrete parts that need to be put together in certain ways or else they’re not going to get the tool they’re hoping for. This is true in an even simpler way when you think about composite tools like a spear with a hafted stone point on it; the fact that it is made of discrete parts is what I refer to as Neanderthal digitalization. The importance of the analog-digital distinction is that it helps us think about how human cognition was changing and evolving so as to be able to master these hierarchically organized sequences of operations, for example in the making of a tool.
It also helps us to think about hierarchically ordered cognition that fell into place at some point, for example in things like discrete-pitch perception in music. The sensing of discrete pitches seems to me one of the basic human capacities that stands behind human musicking in general. It doesn’t have to be there in all musicking—some musicking is not pitched in that sense. But when musicking is pitched, this capacity comes online, it is accessed by the music, and we perceive according to discrete pitches. There’s a lot of evidence that this perception of discrete pitch doesn’t extend very far beyond humans in the world today. There are many animals that seem close to us, like chimpanzees and gorillas, but don’t seem to perceive discrete pitch in anything like the way we do. So how did such a perception—again, I’m thinking of it as a perception that comes online in each human lifetime automatically as a partly genetic and partly socialized and cultural phenomenon, just as language is partly genetic and partly socialized and cultural—how did such a capacity come to be a universal thing? This is a fundamental question. And for me it has to do in some ways with the constraints of analog vocal expression or analog auditory perception. A part of our auditory perception takes the octave as a fundamental interval and then begins to parse it into smaller intervals, all of which seem to be related to one another in simple integer ratios, and so on. How did that all come about? That’s what I use the analog and discrete terminology to try to get at.
DK: Is it too much of an exaggeration to say that your point is that language is essentially analog and music is digital?
GT: That’s not quite right. Because the musical aspects of language, the prosody, the tunes of my speech—those are analog. They are not pitched in the same way that music is pitched. That’s true even for those many languages across the world that are tonal languages; they use tonality and discrete pitches in a very different way than music does. However, there’s the other aspect of language, the Chomskyan aspect, the combinatorial aspect of words and morphemes and phonemes and so on. That’s the digital aspect of language. Language is both digital and analog. Music has both its analog and digital sides also, but they are different digital and analog sides from those of language.
DK: So tell me about the different sides of music. What is music’s analog component?
GT: For example, when we hear a melody, we hear larger forms of coherence to that melody, whether it’s the four-bar phrases of a Mozartian melody or something much more complex or something much simpler. We hear that melody in two ways—and they’re automatic ways; this takes a musical socialization but not a musical education for us to grow into. We hear on the one hand phrase structures, the ups and downs of the general shape of a melodic phrase. And on the other hand we hear the discrete pitches. The first of those is an analog aspect, and it’s very closely related to the tunes of the sentences that I’m saying, to the prosody, to the ups and downs of my phrases. The general shapes of melodies and music, the way we parse those and the way we cognize those, are very closely related to that aspect of language. Both of them, it seems to me, are analog. But for the discrete-pitch aspect of music and the metrical aspects of music—you as a drummer of course are a specialist in this sort of thing—there are digital elements as well. And these digital elements have to do with our ability to take in complex auditory stimuli and break them down; this is how we create in our minds discrete pitch. Keep in mind that there’s no such thing as discrete pitch in the world. This is a percept that human brains make out of auditory stimuli that come to us from the world. And we make it on the basis of a digital cognition of certain phenomena that we pick out of the auditory stimuli that come to us.
DK: So the combinatorial capacity or skill that we develop is the digital aspect in both language and music.
GT: Exactly right. Even some of the most basic aspects of musicking are also in some overlapping way involved a little bit in language. We don’t stop very often to think about how complicated it is that we perceive pitch at all. Why is it that certain things come to us as noise and other things come to us as pitch? This is not, again, something that is simply given by the world. This is something that our brains do. We perceive the auditory stimuli that come to us with their overtones arrayed in simple integer ratios as something different from those that come to us with their overtones in a complete jumble of very complex ratios. And the first of those, the simple-integer ratios, is what we call pitch. That’s how we perceive it. It’s a real phenomenon in the world, but the notion of a brain perceiving it as different from another slightly different phenomenon in the world—that’s something that all human brains do. And it’s absolutely miraculous and fascinating to think about how we came to do that. Of course that’s part of the story of A Million Years of Music.
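To make those “simple integer ratios” concrete, here is a minimal sketch (an illustration added for this edit, not taken from the book): the overtones of a pitched sound sit at whole-number multiples of a fundamental frequency, so neighboring partials relate by ratios like 2:1 (the octave), 3:2 (the fifth), and 4:3 (the fourth). The 110 Hz fundamental is an arbitrary choice.

```python
# Illustrative sketch (not from the book): the partials of a pitched sound fall
# at whole-number multiples of its fundamental, so neighboring partials form
# simple integer ratios, the regularity our pitch perception latches onto.

fundamental = 110.0  # Hz; an arbitrary example (the pitch A2)

# First six partials of a harmonic (pitched) sound: 1f, 2f, 3f, 4f, 5f, 6f
partials = [n * fundamental for n in range(1, 7)]
print(partials)  # [110.0, 220.0, 330.0, 440.0, 550.0, 660.0]

# Ratios between neighboring partials: 2/1 (octave), 3/2 (fifth), 4/3 (fourth)...
for low, high in zip(partials, partials[1:]):
    print(f"{high:.0f} Hz / {low:.0f} Hz = {high / low:.3f}")
```

A sound whose partials are jumbled rather than whole-number multiples is what we hear as noise rather than pitch, which is the distinction Tomlinson draws here.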
DK: As you mentioned, I’m a drummer, and one of the other wonderful things that I took from A Million Years of Music is how—you know, everyone in music makes fun of drummers. [GT laughs] They’re supposed to be the stupidest in the band. There are lots of jokes…
GT: Well, listen, you’re not alone in this—everybody in orchestras, they all make fun of oboists. I don’t know quite why.
DK: But you make this amazing point about beat-based processing and entrainment, that tasks and toolmaking actually follow or are developed together with this ability of ours to follow a beat. We can organize our instructions to one another socially through this basic skill of being able to follow one another in rhythm.
GT: When I go back a million years, I’m going back probably 900,000 years before anything like modern human musicking was being made by Homo sapiens. The reason I go back so far is because to lay the groundwork, to understand some of the perceptual capacities that hominin brains eventually came to have, I think we need to go back that far. And one of the oldest is the capacity for humans to come together in ways that are organized synchronously at least in some loose sense, alternating back and forth even in the way that you and I are alternating back and forth with answers and questions, the way human discourse does. But the only traces that we have of such things from the Middle Paleolithic period are the kinds of teaching and imitation that we think would’ve been necessary in order for them to pass along traditions of toolmaking. So that’s why I talk a lot about toolmaking in those early chapters in the book.
But entrainment, then, is a focusing of that back-and-forth exchange, and human entrainment is really quite focused. It’s another amazing capacity: it is the ability to follow, to mark, the regularity of aural stimuli that come to us, whether they’re pulses generated by some beat processor or whether they’re much more complex beats that we imagine as the grid behind most of the music that we listen to, the grid of meter. Most of the music we hear, and the default position for musicking across the world, tends to be metrical in some way, and that meter is precisely a measure of our brains’ capacity to take very complex stimuli and understand that there is a grid of regularity somehow governing them. How we came to do that and what the brain does in doing that is wonderful to think about. It is also deeply mysterious still.
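As a toy illustration of that grid (a sketch of the idea added for this edit, not a model from the book): layers of pulses whose rates stand in simple integer ratios, one marking the bar, one at twice that rate, one at four times, one at eight, line up into the nested regularity we hear as meter.

```python
# Toy illustration (not a model from the book): pulse layers whose rates stand
# in simple integer ratios (1x, 2x, 4x, 8x) align into a nested metrical grid.

steps_per_bar = 16
layers = {"bar": 1, "half": 2, "beat": 4, "subdivision": 8}  # pulses per bar

for name, pulses in layers.items():
    interval = steps_per_bar // pulses  # steps between pulses at this layer
    row = "".join("x" if step % interval == 0 else "." for step in range(steps_per_bar))
    print(f"{name:>12}  {row}")
```

Every layer coincides on the downbeat and subdivides evenly within the bar, which is the kind of regularity beat-based processing is thought to track.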
There’s growing evidence that our beat-based processing, the ability to follow a beat, has to do again with simple integer ratios—maybe something as simple as neural networks in which some of the neurons are firing at twice the rate of other neurons, some are firing at four times the rate of other neurons, and so on. It could be an emergent phenomenon from a neural substrate. But we don’t understand it at all well. What we do know about human entrainment is that it’s another of those capacities that seems not to go very far out beyond humans in the world today. I’m the last person to diminish the immense and fascinating complexity of the things that animals do and the complexity of animals’ communication. But the ability of animals to follow a beat is hugely limited when we get past humans. There’s no sign of it in chimpanzees or gorillas, our closest relatives living in the world today; even with some complex prompting, they tend not to do it. There’s certainly no sign that they do it in the wild. A few years ago there was a white cockatoo named Snowball that would dance to the Backstreet Boys on YouTube. [DK laughs] And it seemed pretty compelling that this particular bird—birds do amazing things of all sorts, of course—seemed to be entraining to the beat of the Backstreet Boys. Well, maybe. But the next question is: Why this particular bird? Do other birds do it? Not very generally, as far as we can make out, and this bird was an elaborately domesticated bird. Many kinds of birds are intensely creative, imitative creatures, and domesticated birds can do all kinds of interesting things; Winston Churchill’s parrot used to swear at everybody who came into his office.
The point of all this is that entrainment is another of those capacities that we all grow into automatically. When people come and tell me that they are simply nonmusical or tone-deaf, I always laugh at them and say, “No, you’re not. That’s a notion that’s been passed on that you decided to believe, but it’s just not true.” There are very few human beings who are not musical beings. It’s just that in our culture we tend to think of musicians only as those people who are elaborately trained in music rather than as people who are simply socialized into being the musical creatures that we all are. But when they come to me and say they can’t follow a beat, I say, “No, unless there’s a basic incapacity of some sort that we’re not thinking about, you’re probably wrong.”
DK: What I also understand about entrainment from the book is that this is really a solution to the question, “How did humans without language, pre-language, pass on information that they developed—like how to make a tool?”
GT: It’s not so much that entrainment was absolutely necessary—and certainly not entrainment in any focused form—for these early toolmaking cultures. It’s not that, I think. It’s that the toolmaking cultures could have been passed on only through complex mimesis and imitation, and as the mimesis and imitation of gesture became a fundamental part of the cultures that were being passed on, the entrainment it involved came to be selected for, finally becoming a part of the genome shaped in the back and forth between culture and biology, the biocultural evolution of hominins. What eventually came about, long down the road, then, was the capacity of our brains to focus the synchronization that we enact in looser ways all the time in our social interactions with other human beings. And it’s the focusing of that ability into something that’s much more precise that beat-based following of music is all about.
DK: How did Van Morrison end up singing “That’s Entrainment”? That’s just such a funny thing.
GT: Isn’t that something? I only heard about that Van Morrison song after I had written A Million Years of Music; somebody put me onto it once when I gave a lecture. I’m not sure where Van Morrison came across the idea, but he’s been into interesting things for fifty years now, after all. [Laughs]
DK: Exactly—blowing apart another myth about musicians: they do read.
GT: [Laughs] Yeah, indeed.
DK: Dylan too. He comes up with the most obscure references sometimes—he’s obviously traveling around the world with a huge library of some kind.
GT: Dylan is always fascinating in this regard, and he was sopping up stuff even before he made it to New York City in 1961—it was unbelievable. You see it in all those early songs, all the lyrics; they’re filled with stuff that you wouldn’t have expected him to have quite absorbed the way he did.
DK: Maybe it’s entrainment—a sort of preternatural ability [laughs] to entrain.
GT: Well, there’s certainly something that’s preternatural about Bob Dylan.
DK: So let me ask you about the term indexing and its importance in the distinction between language and music. Is it that indexing is musical and symbolic thinking is linguistic?
GT: The index, the symbol, and the icon make up the famous triumvirate of different kinds of signs. They’re terms that were given to us by Charles Sanders Peirce back in the late nineteenth century and are basic terms for the kind of semiotic or sign-based understanding of things that I bring to a good deal of my work. My new book, which is going to be out next spring from Chicago, is called Culture and the Course of Human Evolution. It’s not specifically about music but works at a broader mapping of the model that I began to develop in A Million Years of Music for the last 200,000 years of our coming to be modern humans, and there’s a lot of Peircean semiotics in it.
Let’s just talk about index and symbol. Index is a sign that has a causal relation with its object, points to its object, or is proximate to its object. The most famous index of all is smoke as an index of fire: where there’s smoke, there’s fire. The smoke is caused by the fire; there’s a causal relationship. Smoke, then, is an index of a fire even if we don’t know where the fire is—we haven’t seen it. Smoke is an index that points in that indexical way to fire. A symbol is a different kind of sign, and it’s a more complicated one. Symbols are basic to all human language in the world today. Words are symbols. Symbols function as symbols not only by virtue of pointing to things that they symbolize but by virtue of their relationship to one another in sets of symbols, in systems of symbols. The words in human language are such a set of symbols, and they only function fully as symbols—to point even to the things that they name or objects and actions that they are signs of—by virtue of the complexities of their relationships to other words in the language. Symbols are absolutely associated with language; they are not associated with music. I once had somebody yell at me at the end of a lecture because I had said such a thing: “What do you mean there’s no symbolism in music? Of course there’s symbolism in music!” So let me stand back and say, yes, there’s no human musicking that is not surrounded by symbols, because humans are deeply symbolic creatures. We can’t help but surround all the activities we do with symbols.
But when we get down to the way in which music itself functions from note to note, from measure to measure, from moment to moment, and the way we process music, it is fundamentally not about symbols. It’s about the relationship of indexes with one another. It is about what this tonal gesture, or this metrical moment, or this rhythmic gesture leads us to expect. It’s about a sign pointing toward something that might be in fact forthcoming. Or it might not be—we might be denied it, and that sense of expectation not fulfilled is hugely important in musicking. People like to talk about music as a language, but it’s not a language that works with symbols; it is a language that works in highly elaborated, hugely complex ways with indexes.
DK: That explains so much to me about why people find music abstract. To go back to people saying, “Well, I’m not musical”—sometimes they’re saying, “I can’t appreciate the abstraction of music.” And yet this is a human skill. That’s what you argue in your book so convincingly. We can’t exist without that kind of abstract thinking. Our language is based on symbolic thinking, but then there’s this indexical thinking that is based on a different kind of intellectualizing of what we’re hearing. And that relates, as I understand it in your argument, to entrainment, and also to the somatic versus the semiotic. We’re being led along a path where we start to expect what follows next without anybody saying, “This means this, this means that.”
GT: That’s a beautiful way of putting it, Damon. It’s exactly what is happening to us in music. We’re led along a path. Sometimes the path doesn’t take the turn that we were led to expect it would. This is of course what makes music so exciting: the sense not only of expectation fulfilled, the great satisfaction of that, but of expectation not fulfilled. But all of those moments—from the tiniest moments, note to note, out to the biggest moments, such as an expectation of how a thirty-minute movement in a Mahler symphony is going to end—from the smallest to the largest, those are indexical aspects. They are not symbolic aspects. Now don’t get me wrong, language is both symbolic and highly indexically complicated—all those things we were talking about before, the paralinguistic tunes of language, the prosody, and all the things that are not about Chomskyan grammar and syntax: those are indexical aspects. But what differentiates language from music is a whole symbolic array that is fundamental to it. Music gets cast in the middle of human symbol systems but in and of itself has no symbolic array. This is why music seems so abstract. Think of music that is not associated with language, instrumental music of one sort or another. This seems to many people who try to think about it—and even to musicologists and music theorists who spend their lives analyzing and thinking about it—an immensely abstract thing. The reason it seems abstract is because all of our analysis and discussion of it is taking place in a symbolic medium that is distinct from its own medium, which is an indexical medium. And that distinction—between language and music, again—makes it seem abstract.