2018 AANS Annual Scientific Meeting
613. The Roles of Rhythm and Prediction in Speech Perception
Video Transcription
Our next paper is The Roles of Rhythm and Prediction in Speech Perception, to be presented by Dr. Nitin Tandon and discussed by Dr. Daniel Yoshor.

Good morning. I'd like to thank the Scientific Program Committee for giving us an opportunity to present this work. I'm interested in understanding how language works in the brain. One important part of language is how, when I communicate, your brain anticipates what I'm going to say and then checks that what I actually said matches its expectation. Neural computations, we know, are not just passive stimulus-driven responses; the anticipated outcome is what drives behavior. But where is this behavior embodied in the neural networks subserving speech perception and comprehension? And are these mechanisms similar for acoustic rhythms and for speech perception per se?

We know that both of these types of signals are rhythmic. Here, for example, is a piece of music that is particularly full of rhythm: the first theme of the first movement of Rachmaninoff's Piano Concerto No. 1. As you can hear, the individual melodies are like syllables in words, the harmony is reflected by the words themselves, and the phrases of the music could be thought of as sentences in speech. This is actually played by Kiefer Forseth, who is a co-author on this work.

So how does the brain allow us to predict what rhythm will follow next? Our understanding of how this might happen rests on a concept we refer to as entrainment: synchronization between an external, environmentally driven input and what happens in the brain in response to it. Such entrainment would allow us to pay attention to what we want to; for example, you can attend to me rather than to your neighbor who wants to have another conversation. You can reduce your reaction times by knowing what I'm going to say next. And in an environment where, for example, you have a poor cellular signal, you can still understand most of what I'm saying, far better than a machine can.

So our task here is to evaluate how this happens in the brain. How long does it take before something I say, or an oscillation from the outside, entrains your brain to what to expect next? How does this entrainment continue even after the stimulus is over? And, perhaps most importantly, are these generalizable brain mechanisms?

To evaluate this, we studied a group of 31 patients who were given a rhythmic white noise task. White noise sounds just as its name suggests, and here it is modulated into an oscillation of noise. It sounds like this. Very quickly, the brain entrains to something like this, and we want to understand where that happens. We make sure that patients are attending to the stimulus by having them detect tones presented well after the oscillation is over and respond to those tones with the press of a button.
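To make the stimulus concrete, here is a minimal Python sketch of rhythmic white noise of the kind described: broadband noise whose amplitude is modulated at a fixed rate. The 3 Hz rate and 3-second duration are assumptions taken from values mentioned later in the session, not confirmed task parameters.

```python
import numpy as np

def rhythmic_white_noise(duration_s=3.0, fs=44100, mod_hz=3.0, seed=0):
    """Generate white noise whose amplitude rises and falls at a fixed rhythm.

    The 3 Hz modulation rate and 3 s duration are assumptions based on
    values mentioned elsewhere in this session; the actual task may differ.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(int(duration_s * fs)) / fs
    carrier = rng.standard_normal(t.size)                      # broadband white noise
    envelope = 0.5 * (1.0 + np.sin(2.0 * np.pi * mod_hz * t))  # rhythm in [0, 1]
    return t, envelope * carrier

t, stimulus = rhythmic_white_noise()
```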
We use patients in whom electrodes are implanted. Most of our patients now undergo SEEG (stereo-electroencephalography) for localization of seizures when the seizure focus is not obvious on imaging. This goes back, of course, to the work of Jean Talairach, but the distinction between how Talairach did his work, with orthogonal placement of electrodes driven by arteriography, and neo-stereo-electroencephalography, the term we like to use for how we do this today in North America, is that the placement is no longer orthogonal. It is very much azimuth-based, open to truly three-dimensional implantations, and guided by imaging as much as by the semiology of the seizures. We use a robot to place these probes. This is an example of a single individual with electrodes implanted; as you can see, they follow various trajectories and are quite oblique compared to the traditional orthogonal implantations.

This is the study population: 31 individuals, most of whom had SEEG and a few of whom had grids, with a total of about 3,500 electrodes implanted in language-dominant cortex. They underwent about 2,000 trials of rhythmic white noise, like you heard, and also about 2,000 trials of natural human language perception.

This is a representation of all of the implanted electrodes. Many of these electrodes are involved in seizures; they are noisy, they spike, so we exclude them from this analysis because we want the physiological data to be as clean as possible. That leaves us with about 2,000 good electrodes, represented here in this coverage map. As you can see, most seizures in our population arise in the temporal lobe; that is where the electrodes end up, and that in turn allows us to study language.

This is a brief overview of how the ECoG gets processed. The signal is transformed into an analytic signal that allows us to extract power in the high gamma band, between 60 and 120 hertz. This gamma activity is thought by most to reflect local processes, what is happening in that neuronal population. The lower-frequency signals, between 2 and 15 hertz in our case, reflect what is happening across the region and how inter-regional synchronization is occurring. These two components of the signal give us related but slightly different pieces of information.

This is a set of nine patients showing these oblique implantations in the superior temporal gyrus, which are very useful for localizing where language lives in the STG and serve a major clinical role in determining how much of the superior temporal gyrus can be resected.

This is an example from a single individual. Very quickly after the initial peak at the onset of the first wave of the oscillation, you see an oscillation that follows the rhythm very closely, with almost no perceptible lag, suggesting that very soon after the first wave of information comes in, the brain already expects the same rhythm to follow. That entrainment occurs, in this case, within 300 milliseconds of the onset of the initial stimulus.

This is the distribution of all of the electrodes in a single individual from front to back, and it makes another point: the electrodes in the back have a very prominent onset response, much more so than the electrodes in the front, while the electrodes in the middle have the most prominent entrainment response, more so than the electrodes in either the front or the back. Looking at the low-frequency activity in the same electrode, you see a high degree of coherence that forms at about 300 milliseconds and persists throughout the duration of the stimulus. This is a representation of coherence for all of the channels, and clearly this particular electrode and its neighbors are the best at modeling the input signal.
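As a rough illustration of the processing pipeline described above, the sketch below band-pass filters a channel and takes its analytic signal via the Hilbert transform, yielding high-gamma (60 to 120 Hz) power and low-frequency (2 to 15 Hz) phase. The filter design is a generic assumption; the talk does not specify the actual implementation.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def analytic_band(x, fs, lo, hi, order=4):
    """Band-pass filter one channel and return its analytic signal."""
    sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return hilbert(sosfiltfilt(sos, x))

fs = 1000                        # assumed sampling rate (Hz)
x = np.random.randn(10 * fs)     # placeholder for one recorded ECoG channel

# High gamma (60-120 Hz): instantaneous power, reflecting local population activity
high_gamma_power = np.abs(analytic_band(x, fs, 60, 120)) ** 2

# Low frequency (2-15 Hz): instantaneous phase, used to assess inter-regional synchronization
low_freq_phase = np.angle(analytic_band(x, fs, 2, 15))
```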
So here is a representation of all of the patients, with all of the brain activity in the superior temporal gyrus. As you can see, there are waves of activation that spread from the medial to the lateral temporal surface and from back to front. We represent this here as onset electrodes in the planum temporale and entrainment electrodes in Heschl's gyrus, anterior to the onset electrodes. This can also be done in a completely unsupervised, machine-based way, in which we simply give all of the electrodes to our classifier. It places the onset electrodes more posteriorly, by about five centimeters, compared to the entrained electrodes, which lie more anterior in the superior temporal plane. This is true both for the low-frequency phase organization and for the high-frequency power organization.

What about prediction? It turns out that when the stimulus ends, here at three seconds, you still see the phase locking persisting, even though at this point you are listening to just white noise. The brain, even though it no longer receives an oscillation, still expects one to occur. This is what we characterize as prediction, or lasting entrainment. When we represent all of these patients together, here are the onset electrodes, here are the entrained electrodes, and here are the electrodes with both responses; you can see this rostro-caudal organization quite clearly.

The same is true for natural language. As I said, the same individuals were also presented with language stimuli. Here is an example of a stimulus, with the black line tracing the acoustic envelope and the acoustic edge reflected in red. The onset response is shown here and the entrainment response here, demonstrating that the onset and entrainment electrodes identified with the rhythmic oscillation are the same electrodes engaged in the language task. This is summarized again in another figure.

To summarize, the entrainment process drives a set of cascaded neural oscillations for speech perception, and multi-scale acoustic information is encoded by these varied cortical oscillations. I'd like to thank our funding sources and, of course, Kiefer and Greg Hickok, my collaborators on this project. Thank you for your time. Thank you.
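One standard way to quantify the lasting entrainment just described is the inter-trial phase locking value: if the low-frequency phase stays aligned across trials after the stimulus ends, the brain is still "expecting" the rhythm. The sketch below is a generic illustration under that assumption, not the authors' analysis code; the 2 to 15 Hz band is taken from the talk.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def phase_locking_value(trials, fs, lo=2.0, hi=15.0):
    """Inter-trial phase locking value (PLV) at each time point.

    trials: (n_trials, n_samples) array of one ECoG channel epoched around
    the stimulus. PLV near 1 means the low-frequency phase is consistent
    across trials; sustained PLV after stimulus offset is one way to
    quantify the 'lasting entrainment' described in the talk.
    """
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    phase = np.angle(hilbert(sosfiltfilt(sos, trials, axis=-1), axis=-1))
    return np.abs(np.mean(np.exp(1j * phase), axis=0))
```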
Thank you for the opportunity to comment on this work. These images show the location of human auditory cortex on the dorsal surface of the temporal lobe, buried within the sylvian fissure. Using an approach pioneered by Matt Howard and others, Forseth and Tandon recorded from electrodes in auditory-responsive areas while presenting subjects with auditory stimuli. In this presentation, the investigators focused on a specific property of sound: its rhythm. Rhythm is important because it can enhance the perception of sounds and because it is hardwired into our brains.

Auditory cortex is organized along two dimensions, tonotopy and periodotopy. Most of us are familiar with the tonotopic organization that begins in the cochlea and is carried forward to auditory cortex, where neurons are organized according to the sound frequency to which they respond best, from low to high. The center of the map is where low-frequency sounds are encoded, and this progresses radially to high frequencies at the periphery. The second dimension in auditory cortex is periodotopy, the rhythm of sound, which can be slow, ch-ch-ch, or fast, ch-ch-ch-ch.

In their study, Forseth and Tandon used intracranial electrodes to measure ECoG responses to two distinct rhythmic auditory stimuli: a three-hertz white noise stimulus and spoken sentences. They found sites with selective responses both to the simple rhythmic sounds and to the spoken sentences, which of course contain a rhythm of their own; this is consistent with the periodotopic organization known to exist in auditory cortex, as characterized in cats and in humans with fMRI. These responses have two components uniquely revealed by the superior temporal resolution of ECoG: a broadband high-frequency response, characterized by an increase in power, and a low-frequency response, characterized by a phase reset at the onset of the auditory stimulus.

This work is important because our understanding of auditory cortex and language processing remains rudimentary. Using ECoG, with its superior temporal resolution, to measure responses to both noise and language stimuli provides valuable insights into the organization and function of auditory cortex, and Tandon has shown that low-frequency phase reset and gamma power modulation in early auditory cortex may play significant roles in speech perception. Thank you.
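To close the loop on the discussant's summary: one simple way to screen for sites that "model the input signal" is coherence between the acoustic envelope and each channel, which should peak near the stimulus rhythm (around 3 Hz here). This is a sketch under generic assumptions (Hilbert envelope, Welch-based coherence), not the authors' pipeline; `audio` and `ecog` are hypothetical arrays assumed to share the sampling rate `fs`.

```python
import numpy as np
from scipy.signal import coherence, hilbert

def envelope_channel_coherence(audio, ecog, fs):
    """Magnitude-squared coherence between the stimulus envelope and one channel.

    audio and ecog are hypothetical 1-D arrays of equal length, assumed to be
    resampled to a common rate fs. A peak near the stimulus rhythm (about
    3 Hz) would mark a site whose low-frequency activity tracks the input.
    """
    envelope = np.abs(hilbert(audio))  # acoustic envelope of the stimulus
    f, cxy = coherence(envelope, ecog, fs=fs, nperseg=int(2 * fs))
    return f, cxy
```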
Video Summary
In this video, Dr. Nitin Tandon discusses the roles of rhythm and prediction in speech perception. He explains that language relies on the brain's ability to anticipate what will be said and match it against expectation. This concept is referred to as entrainment, which involves synchronization between external stimuli and the brain's response. Dr. Tandon conducted a study of 31 patients who performed a rhythmic white noise task. Using electrodes implanted in their brains, he observed how the brain quickly entrained to the rhythm and continued to do so even after the stimulus ended. The study found that specific areas in the superior temporal gyrus were involved in this entrainment and play a role in speech perception. Dr. Daniel Yoshor provides commentary on the research, emphasizing the importance of understanding auditory cortex and language processing.
Asset Caption
Nitin Tandon, MD, FAANS, Discussant - Daniel Yoshor, MD, FAANS
Keywords
speech perception
rhythm
prediction
entrainment
superior temporal gyrus