Overview

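Background: Hearing is not merely the detection of pressure changes but an active process that builds an internal model of the outside world. The human auditory system detects pressure changes over a wide range, from about 0.00002 Pa to more than 100 Pa, and recognizes familiar melodies and voices with constancy across very different acoustic environments. Compared with vision, hearing is finely tuned to temporal information (rapid frequency changes, the dots and dashes of Morse code) but is worse at localizing objects in space.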

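Key methods:

Cochlea: a fluid-filled inner-ear structure that separates frequencies spatially by exploiting differences in the mechanical properties along the basilar membrane (von Bekesy, 1960).

Single-cell recording: reveals a hierarchical processing structure in which neurons respond to increasingly complex acoustic features from the core to the belt to the parabelt region (Kaas et al., 1999).

fMRI sparse scanning (Hall et al., 1999): stimuli are presented during silent gaps in scanning to get around the scanner noise (about 130 dB).

Mismatch negativity (MMN): an ERP component used to measure auditory discrimination that occurs even without attention (Näätänen et al., 1978).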

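Main contributions:

Dual “what”/“where” pathways (Rauschecker & Tian, 2000): the anterior belt region responds selectively to monkey calls (what), and the posterior belt region to spatial location (where). In humans, it has been proposed that the dorsal pathway may additionally branch into a “how” pathway for action and imitation (Hickok & Poeppel, 2004).

Music cognition model (Peretz & Coltheart, 2003): proposes pitch organization (contour → interval → tonal encoding) and temporal organization (rhythm, meter) as separate processing stages.

The motor theory revisited: after the discovery of mirror neurons, the hypothesis that speech perception is mediated by representations of articulatory movements was revived (Liberman & Mattingly, 1985; Rizzolatti & Craighero, 2004).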

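Findings:

Neuron counts: 90,000 in the cochlear nucleus, 500,000 in the medial geniculate nucleus, and 100,000,000 in the auditory cortex (Worden, 1971); the number of neurons rises steeply at each higher stage of the pathway.

Congenital amusia: found in up to 4% of the population and associated with abnormal white and gray matter density in the right auditory cortex and right inferior frontal gyrus (Ayotte et al., 2002; Hyde et al., 2007).

McGurk illusion: auditory “ba” + visual “ga” → perceived “da”; TMS over the STS temporarily reduces susceptibility to the illusion (Beauchamp et al., 2010).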

Categorical perception of phonemes (Eimas, 1963): on a 0–80 ms voice onset time continuum, even intermediate values (e.g., 30 ms) are perceived categorically as either “da” or “ta”.
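A minimal Python sketch of what “categorical” means here, under loud assumptions: identification along the voice onset time (VOT) continuum is modeled as a logistic function, and BOUNDARY_MS = 30 and SLOPE_MS = 4 are hypothetical values, not figures from Eimas (1963). Discrimination is predicted purely from labeling probabilities with the classic labeling-based formula P(correct ABX) = 0.5 + 0.5·(pA - pB)², so pairs that straddle the boundary are predicted to be easy to tell apart and pairs within one category are not.

```python
import math

BOUNDARY_MS = 30.0  # hypothetical "da"/"ta" boundary on the VOT continuum
SLOPE_MS = 4.0      # hypothetical steepness; smaller values = more categorical


def p_ta(vot_ms: float) -> float:
    """Probability of labeling a token with this voice onset time as "ta"."""
    return 1.0 / (1.0 + math.exp(-(vot_ms - BOUNDARY_MS) / SLOPE_MS))


def label(vot_ms: float) -> str:
    """Most likely category label for a token."""
    return "ta" if p_ta(vot_ms) >= 0.5 else "da"


def predicted_abx(vot_a: float, vot_b: float) -> float:
    """ABX accuracy predicted purely from labeling (0.5 = chance)."""
    d = p_ta(vot_a) - p_ta(vot_b)
    return 0.5 + 0.5 * d * d


if __name__ == "__main__":
    for vot in range(0, 81, 10):
        print(f"VOT {vot:2d} ms -> {label(vot)} (p_ta = {p_ta(vot):.3f})")
    print("within 'da' (0 vs 20 ms):", round(predicted_abx(0, 20), 3))
    print("across boundary (20 vs 40 ms):", round(predicted_abx(20, 40), 3))
```

Both pairs differ by the same 20 ms, yet under this model only the cross-boundary pair is predicted to be discriminable; that asymmetry is the signature of categorical perception.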

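Significance and limitations:

Auditory processing is not a simple sensory function but a multi-system process tightly coupled with memory (MMN), attention (the cocktail party problem), and action (McGurk, motor theory).

The evolutionary function of music remains unresolved, with competing hypotheses: mate attraction (Darwin, 1871), social bonding (Huron, 2001), a precursor to language (Mithen, 2005), or mere “auditory cheesecake” (Pinker, 1997).

The strong form of the motor theory (damage to motor areas → severe impairment of speech perception) is not supported by patient data; a weaker form, in which motor representations play a supporting role only when the acoustic signal is ambiguous, is more plausible.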

📋 Contents

Chapter structure

Chapter 10 The hearing brain: introduction (hearing as active model building; hearing compared with vision)
The Nature of Sound: physical and psychological properties of sound
From Ear to Brain: anatomy of the outer, middle, and inner ear and the auditory pathway
Basic Processing of Auditory Information
- 4.1 Feature processing in the auditory cortex
- 4.2 “What” versus “where”
- 4.3 Auditory memory and auditory stream segregation
Music Perception
- 5.1 Memory for tunes
- 5.2 Rhythm
- 5.3 Pitch (including congenital amusia)
- 5.4 Melody and musical syntax
- 5.5 Timbre
- 5.6 Music and emotion
Voice Perception
Speech Perception
- 7.1 The nature of the speech signal
- 7.2 Categorical perception (including the McGurk illusion)
- 7.3 The motor theory of speech perception
- 7.4 Auditory ventral and dorsal routes for “what” and “how”

Chapter 10 The hearing brain

Summary

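The purpose of hearing is not a literal depiction of the outside world but the construction of an internal model that can be interpreted and acted upon. The human auditory system detects an enormous range of pressure changes, from 0.00002 Pa to more than 100 Pa, and can extract constancy: it recognizes a familiar tune played in a different key, and a familiar voice heard over the telephone or through a loudspeaker. Hearing is finely tuned to temporal information but is worse than vision at localizing objects in space.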

Sound originates from the motion or vibration of an object; for example, the vibration of the vocal cords, the plucking of a violin string, or the passing of an overhead aircraft. This manifests itself in the surrounding medium, normally air, as changes in pressure in which molecules are alternately squeezed together and stretched apart. The human auditory system is capable of detecting a huge range of changes in air pressure, from around 0.00002 to more than 100 Pascals. However, the role of the hearing brain is not merely to detect such changes. As with vision and other perceptual systems, the goal of hearing is not to create a literal depiction of the outside world, but rather to construct an internal model of the world that can be interpreted and acted upon.
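The pressure range quoted above spans a ratio of about five million, which is why sound levels are usually expressed in decibels. A minimal Python sketch, assuming the conventional dB SPL reference pressure of 20 µPa (a standard acoustics convention, not stated in the text): the quiet end of the range maps to 0 dB SPL and the loud end to roughly 134 dB SPL.

```python
import math

P_REF = 20e-6  # Pa; conventional reference pressure for dB SPL (assumed convention)


def pa_to_db_spl(pressure_pa: float) -> float:
    """Convert an RMS sound pressure in pascals to decibels sound pressure level."""
    if pressure_pa <= 0:
        raise ValueError("pressure must be positive")
    return 20.0 * math.log10(pressure_pa / P_REF)


def db_spl_to_pa(level_db: float) -> float:
    """Inverse conversion: dB SPL back to pascals."""
    return P_REF * 10 ** (level_db / 20.0)


if __name__ == "__main__":
    lo, hi = 0.00002, 100.0
    print(f"{lo} Pa -> {pa_to_db_spl(lo):.1f} dB SPL")
    print(f"{hi} Pa -> {pa_to_db_spl(hi):.1f} dB SPL")
    print(f"pressure ratio: {hi / lo:,.0f}")
    print(f"130 dB SPL = {db_spl_to_pa(130):.0f} Pa")
```

The same conversion puts the roughly 130 dB of MRI scanner noise discussed under sparse scanning at about 63 Pa.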

This model is constructed not only from ongoing sensory information but also from previous sensory experiences. The hearing brain is also concerned with extracting “constancy” out of an infinitely varying array of sensory input. For example, we recognize a familiar tune when it is presented in a different key, and we can recognize a familiar voice in a wide range of acoustic environments. If one listens to a familiar song, such as The Rolling Stones’ “Satisfaction,” with gaps of 2–5 s inserted into it, auditory cortical areas are more active during the gaps than during gaps in unfamiliar songs (Kraemer et al., 2005). Our musical and lyrical knowledge can fill in silent gaps in heard songs.

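📊 Figure description

A schematic of how the auditory cortex organizes several sound sources arriving simultaneously in a noisy environment into separate “streams.” A guitar, footsteps, and a barking dog all enter a single ear, but the brain uses both the incoming sensory information and learned knowledge (melodies, the pitch range of voices) to infer how many sources there are and what they are. The diagram illustrates the core principle of auditory scene analysis.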

One difference that exists between the auditory and visual senses is their sensitivity to temporal and spatial information. The auditory system is exquisitely tuned to detect temporal information, such as rapid changes in frequency that characterize certain speech sounds. The different time intervals associated with “dots” and “dashes” in Morse Code are much easier to process when heard than seen (Saenz & Koch, 2008). In contrast, it is generally much easier to locate an object in space with vision than with hearing (Bertelson & Aschersleben, 1998).

Key Terms

Pure tones

Sounds with a sinusoid waveform (when pressure change is plotted against time).

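**Pure tones** are the simplest sounds: when pressure change is plotted against time, they form a sinusoid (sine wave). They are rarely heard in everyday life but are widely used as controlled stimuli in hearing experiments, and each has a single frequency and a single amplitude.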

Pitch

The perceived property of sounds that enables them to be ordered from low to high.

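**Pitch** is the psychological property that allows sounds to be ordered from low to high. It is closely related to physical frequency (Hz) but not identical to it: a tone of a given frequency can be perceived as having a different pitch depending on its intensity (Stevens, 1935).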

Loudness

The perceived intensity of the sound.

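**Loudness** is the subjective perception of sound intensity. It is linked to the amplitude of the sound wave (a physical quantity) but can be dissociated from it, and, as with pitch, the brain does not process it entirely independently of other sound properties. The relation between pitch and frequency in hearing parallels the relation between color and wavelength in vision.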

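Caution

Do not confuse physical properties (frequency, intensity) with psychological properties (pitch, loudness). Just as wavelength (physical) and color (psychological) are separable in vision, frequency and pitch in hearing are usually strongly related but not identical. In vision, color can be seen without the corresponding wavelength (after-images) and wavelength can be processed without color (cerebral achromatopsia); hearing has comparable dissociations, such as the missing fundamental phenomenon.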

The Nature of Sound

Summary

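Pure tones are simple sounds consisting of a single sinusoidal frequency, but they are rare in natural environments; most natural sounds are complex tones made up of superimposed sinusoids at several frequencies. A piano note of 220 Hz can be decomposed into sinusoids at 220, 440, 660 Hz, and so on, and the lowest component, the **fundamental frequency (f₀)**, determines the perceived pitch. The **missing fundamental phenomenon**, in which the pitch is preserved even when the fundamental is removed, shows that hearing involves active inference rather than simple frequency detection.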

One of the simplest sounds has a sinusoid waveform and these sounds are termed pure tones. Pure tones have a characteristic pitch that is related to the frequency of the sound wave (measured in Hertz, i.e. vibrations per second). The human auditory system responds to sound frequencies between 20 Hz and 20,000 Hz. The intensity of the sound is related to the subjective experience of loudness. Pitch and loudness are regarded as psychological features of sounds, whereas frequency and intensity are physical properties.

In everyday life, pure tones are seldom heard. However, many sounds can be described in terms of combinations of superimposed sinusoids of different frequencies, intensities and phases. For example, musical notes typically contain a series of regularly spaced sinusoids. Thus, a piano note of 220 Hz can be described in terms of sinusoids at 220 Hz, 440 Hz, 660 Hz, and so on. The lowest component (in this example 220 Hz), termed the fundamental frequency (f₀), typically determines the perceived pitch of a musical note. However, if the fundamental frequency is missing from the series, then the pitch is still perceived as equivalent to 220 Hz. This is termed the missing fundamental phenomenon and is an example of pitch constancy.
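One way to see that pitch need not depend on energy at f₀ is to synthesize the missing-fundamental stimulus and pass it through an autocorrelation pitch estimator. The Python sketch below is an illustration under stated assumptions: autocorrelation is one classic computational model of pitch, swapped in for demonstration, not a claim about the mechanism the brain uses; the sample rate FS = 22,000 Hz (chosen so that a 220 Hz period is exactly 100 samples) and the 100–1,000 Hz search range are arbitrary choices.

```python
import math

FS = 22_000  # Hz; chosen so that one 220 Hz period is exactly 100 samples


def complex_tone(freqs, duration=0.1, fs=FS):
    """Sum of equal-amplitude cosines at the given frequencies."""
    n = round(duration * fs)
    return [sum(math.cos(2 * math.pi * f * i / fs) for f in freqs) for i in range(n)]


def amplitude_at(signal, freq, fs=FS):
    """Normalized DFT magnitude of `signal` at a single frequency."""
    re = sum(x * math.cos(2 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    return math.hypot(re, im) / len(signal)


def autocorr_pitch(signal, fs=FS, fmin=100.0, fmax=1000.0):
    """Pitch estimate = fs / (lag with the largest autocorrelation in the search range)."""
    best_lag, best_r = None, -math.inf
    for lag in range(int(fs / fmax), int(fs / fmin) + 1):
        r = sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag


if __name__ == "__main__":
    full = complex_tone([220, 440, 660, 880])
    missing = complex_tone([440, 660, 880])  # fundamental removed
    print("energy at 220 Hz:", round(amplitude_at(full, 220), 3), round(amplitude_at(missing, 220), 3))
    print("pitch estimates:", autocorr_pitch(full), autocorr_pitch(missing))
```

Both estimates come out at 220 Hz even though the second signal has essentially no energy at 220 Hz: every remaining harmonic completes a whole number of cycles every 1/220 s, so the waveform's period, and hence the estimated pitch, is unchanged.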

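📊 Figure description

Top: the time-pressure graph of a single sinusoid (pure tone), marked with its single frequency. Bottom: a natural sound (a musical note) composed of a set of sinusoids at different frequencies, with the perceived pitch tied to the lowest frequency (the fundamental frequency, f₀). The figure illustrates the physical basis of the missing fundamental phenomenon.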

Key Terms

Fundamental frequency

The lowest frequency component of a complex sound that determines the perceived pitch.

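**Fundamental frequency (f₀)** is the lowest frequency component among the sinusoids that make up a complex sound and is the key variable determining perceived pitch. For a 220 Hz piano note, f₀ = 220 Hz, and its integer multiples (440, 660 Hz, ...) form the higher harmonics.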

Missing fundamental phenomenon

If the fundamental frequency of a complex sound is removed, then the pitch is not perceived to change (the brain reinstates it).

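**Missing fundamental phenomenon**: a complex tone from which the fundamental has been removed (e.g., 440, 660, 880 Hz) is still perceived with the original 220 Hz pitch. It shows that the brain actively reconstructs the missing fundamental from the harmonic structure and is a classic example of **pitch constancy**.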

Timbre

The perceptual quality of a sound that enables us to distinguish between different musical instruments.

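**Timbre** is the perceptual quality that lets us tell apart a cello and a saxophone playing a note of the same pitch and loudness. It is determined by how the note evolves over time (attack and decay) and by the relative intensity of its frequency components, and it can be impaired by damage to the right temporal lobe independently of some aspects of pitch perception, such as melody (Samson & Zatorre, 1994).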

From Ear to Brain

Summary

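The pathway from ear to brain runs through the outer, middle, and inner ear and reaches the auditory cortex via four or five synapses. The pinna of the outer ear contributes to sound localization, and the malleus, incus, and stapes of the middle ear convert airborne vibrations into liquid-borne vibrations. The cochlea of the inner ear separates frequencies spatially through the mechanical properties of the basilar membrane, which vary along its length: the end near the oval window is narrow and stiff and responds to high frequencies, whereas the end at the center of the spiral is wide and elastic and responds to low frequencies. The number of neurons along the auditory pathway increases steeply at higher stages: 90,000 in the cochlear nucleus, 500,000 in the medial geniculate nucleus, and 100,000,000 in the auditory cortex (Worden, 1971).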

The ear contains three main parts: the outer, middle, and inner ear. The outer ear contains the pinna and the auditory canal. The middle ear converts airborne vibrations to liquid-borne vibrations with minimal loss of energy. A series of three tiny bones (malleus, incus, and stapes) transfers the mechanical pressure on the eardrum to a smaller membrane, called the oval window, in the fluid-filled cochlea. The cochlea converts liquid-borne sound into neural impulses. A membrane within the cochlea, termed the basilar membrane, contains tiny hair cells linked to receptors. The basilar membrane is not uniform but has different mechanical properties at either end (von Bekesy, 1960). The end nearest the oval window is narrower and stiffer, and shows a maximal deflection to high-frequency sounds. The end nearest the center of its spiral shape is wider and more elastic and shows a maximal deflection to low frequency sounds.
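The place-to-frequency mapping described above can be made quantitative with the Greenwood function, which the text does not mention and which is swapped in here as an assumption. The constants A = 165.4, a = 2.1, k = 0.88 are the values commonly quoted for the human cochlea, with position x measured as the fraction of basilar-membrane length from the apex (the center of the spiral, x = 0) to the base near the oval window (x = 1). Treat the numbers as an approximate sketch.

```python
import math

# Greenwood place-frequency function with commonly quoted human parameters (assumed).
A_HZ, ALPHA, K = 165.4, 2.1, 0.88


def place_to_freq(x: float) -> float:
    """Best frequency (Hz) at relative position x: 0 = apex (spiral center), 1 = base (oval window)."""
    return A_HZ * (10 ** (ALPHA * x) - K)


def freq_to_place(f: float) -> float:
    """Inverse mapping: relative position that responds best to frequency f (Hz)."""
    return math.log10(f / A_HZ + K) / ALPHA


if __name__ == "__main__":
    for x in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"x = {x:.2f} -> {place_to_freq(x):8.0f} Hz")
    print(f"220 Hz peaks at x = {freq_to_place(220):.2f}; 4 kHz at x = {freq_to_place(4000):.2f}")
```

The endpoints come out at roughly 20 Hz and 20.7 kHz, close to the 20–20,000 Hz range of human hearing given in the text, and over most of the membrane equal steps in position correspond to roughly equal frequency ratios rather than equal steps in Hz.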

There are four or five synapses in the auditory pathway from the ear to the brain, starting with projections from the auditory nerve to the cochlear nuclei in the brainstem, and ending with projections from the medial geniculate nucleus to the primary auditory cortex, also called A1. The primary auditory cortex is located in Heschl’s gyrus in the temporal lobes and is surrounded by adjacent secondary auditory cortical areas called the belt and parabelt regions (Kaas et al., 1999). Damage to the primary auditory cortex does not produce complete deafness but does lead to problems in identifying and locating sounds (Musiek et al., 2007). The number of neurons also increases greatly at successive stages of the pathway: while the cochlear nucleus has 90,000 neurons, the medial geniculate nucleus has 500,000 and the auditory cortex has 100,000,000 (Worden, 1971). In addition, there are descending (top-down) pathways that reach as far back as the cochlea itself (Rasmussen, 1953).
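The neuron counts quoted above (Worden, 1971) translate into fold increases by simple division; a tiny Python sketch using only those three figures:

```python
# Neuron counts along the ascending auditory pathway, as quoted in the text (Worden, 1971).
STAGES = [
    ("cochlear nucleus", 90_000),
    ("medial geniculate nucleus", 500_000),
    ("auditory cortex", 100_000_000),
]


def fold_increases(stages):
    """Ratio of neuron counts between each pair of successive stages."""
    return [(a[0], b[0], b[1] / a[1]) for a, b in zip(stages, stages[1:])]


if __name__ == "__main__":
    for src, dst, ratio in fold_increases(STAGES):
        print(f"{src} -> {dst}: x{ratio:.1f}")
    print(f"overall: x{STAGES[-1][1] / STAGES[0][1]:.0f}")
```

That is about a 5.6-fold increase from the cochlear nucleus to the medial geniculate nucleus, a 200-fold increase from there to the cortex, and more than 1,000-fold overall.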

📊 Figure description

Cross-section of the outer, middle, and inner ear. The pinna and external auditory canal make up the outer ear; the malleus, incus, stapes, and eardrum make up the middle ear; and the cochlea, semicircular canals, and auditory nerve make up the inner ear. The stapes vibrates the oval window, transferring energy to the fluid inside the cochlea.

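📊 Figure description

Schematic of the ascending auditory pathway: auditory nerve → dorsal and ventral cochlear nuclei → superior olivary nucleus → inferior colliculus → medial geniculate nucleus → auditory cortex. Information projects to the auditory cortex of both hemispheres, so input from the two ears (binaural information) is integrated. The figure emphasizes that each stage actively extracts information rather than simply relaying it.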

Key Terms

Cochlea

Part of the inner ear that converts liquid-borne sound into neural impulses.

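The **cochlea** is a fluid-filled spiral structure in the inner ear that converts the mechanical energy of sound waves into neural impulses. The basilar membrane and hair cells inside it are the key transduction apparatus, and information is passed to the auditory cortex via the cochlear nucleus, superior olivary nucleus, inferior colliculus, and medial geniculate nucleus.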

Basilar membrane

A membrane within the cochlea containing tiny hair cells linked to neural receptors.

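The **basilar membrane** is the membrane inside the cochlea to which the hair cells are attached. The end near the oval window is narrow and stiff and deflects maximally to high frequencies, whereas the end toward the center of the spiral is wide and elastic and responds to low frequencies. These position-dependent mechanical differences form the physical basis of the spatial separation of frequency (tonotopy).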

Primary auditory cortex

The main cortical area to receive auditory-based thalamic input.

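The **primary auditory cortex (A1)** lies in Heschl’s gyrus in the temporal lobe and is the core region that receives thalamocortical input directly from the medial geniculate nucleus. Damage to it does not cause complete deafness but impairs the identification and localization of sounds.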

Belt region

Part of secondary auditory cortex, with many projections from primary auditory cortex.

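The **belt region** is secondary auditory cortex surrounding the core and receives many projections from it. Its neurons respond more strongly to complex acoustic features (monkey vocalizations, abrupt frequency changes such as those at speech-sound onsets) than to simple frequencies (Rauschecker et al., 1995).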

Parabelt region

Part of secondary auditory cortex, receiving projections from the adjacent belt region.

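The **parabelt region** is a higher-level auditory area that receives projections from the belt region and forms the third stage of the auditory hierarchy. It is a starting point for processing complex auditory objects and meaning.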

Tonotopic organization

The principle that sounds close to each other in frequency are represented by neurons that are spatially close to each other in the brain.

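**Tonotopic organization** is the principle that neighboring frequencies are represented by neighboring neurons. The frequency selectivity along the basilar membrane is preserved through the auditory nerve (Kiang et al., 1965) up to the primary auditory cortex (Formisano et al., 2003; Merzenich et al., 1973), analogous to retinotopic organization in the visual system.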

Sparse scanning

In fMRI, a short break in scanning to enable sounds to be presented in relative silence.

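**Sparse scanning** is an fMRI technique (Hall et al., 1999) that addresses the problem of MRI scanner noise (about 130 dB, comparable to a jet taking off) masking auditory stimuli. Scanning is paused briefly, the stimulus is presented against a silent background, and scanning then resumes, exploiting the slow hemodynamic response, which peaks about 6 s after stimulus onset.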
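The timing logic of sparse scanning can be sketched as a schedule in which each sound is played in silence and a single volume is acquired around the expected peak of the hemodynamic response. This is a hypothetical Python sketch: the stimulus duration (2 s), acquisition duration (2 s), and inter-trial gap (1 s) are illustrative assumptions; only the roughly 6 s peak latency comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    stim_onset: float  # s; sound starts while the scanner is silent
    stim_offset: float
    acq_start: float   # s; volume acquisition (scanner noise) starts
    acq_end: float


def sparse_schedule(n_trials, stim_dur=2.0, hrf_peak=6.0, acq_dur=2.0, gap=1.0):
    """One acquisition per trial, centered on the expected hemodynamic peak."""
    trials, t = [], 0.0
    for _ in range(n_trials):
        acq_start = t + hrf_peak - acq_dur / 2
        trials.append(Trial(t, t + stim_dur, acq_start, acq_start + acq_dur))
        t = acq_start + acq_dur + gap  # next sound starts after the scanner falls silent
    return trials


def stimuli_overlap_noise(trials):
    """True if any stimulus overlaps any acquisition window."""
    return any(s.stim_onset < a.acq_end and a.acq_start < s.stim_offset
               for s in trials for a in trials)


if __name__ == "__main__":
    for tr in sparse_schedule(3):
        print(tr)
    print("overlap:", stimuli_overlap_noise(sparse_schedule(3)))
```

With these values each trial lasts 8 s. The overlap check becomes true as soon as the acquisition is moved close to the stimulus (for example `hrf_peak=1.0`), which is exactly the masking problem sparse scanning avoids.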

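Exam tip

To remember the core / belt / parabelt hierarchy, read C → B → P as “Center → Border → Periphery.” The core responds to narrow frequencies (e.g., 200 Hz), the belt to broader bands (200–300 Hz), and the parabelt to more complex auditory objects. It is easier to remember if you treat it as parallel to the simple-cell/complex-cell hierarchy in vision.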

Basic Processing of Auditory Information

Summary

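Basic auditory processing proceeds from extracting auditory features (pitch, loudness, temporal patterns) to forming auditory objects, and the regions recruited depend on the stimulus content (speech, music, environmental sounds) and the task context (comprehension, identification, localization). The structure is hierarchical: the core processes simple frequencies, the belt complex features, and the parabelt object-level information.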

Beyond the early auditory cortical areas, there are many other routes and regions of the brain involved in auditory processing. The precise network of regions used depends on the stimulus content (e.g. human speech, voices, music, environmental noises) and the current context (e.g. whether one needs to understand speech, identify a speaker or locate a sound source).

Feature processing in the auditory cortex

Summary

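Feature processing in the auditory cortex relies on multiple neural populations coding pitch, loudness, and space. The core responds to narrow frequencies (e.g., a single 200 Hz tone) and the belt to broader bands (e.g., 200–300 Hz noise), consistent with hierarchical integration (Kosaki et al., 1997). Outside primary auditory cortex there is a region whose neurons respond to perceived pitch rather than physical frequency, as shown with missing-fundamental stimuli that produce the same perceived pitch from different frequency components (Bendor & Wang, 2005).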

Just as visual perception involves the processing of different features (color, shape, movement, texture), so too does auditory perception, although the features differ (e.g. pitch, loudness, tempo). As with vision, there is some evidence of hierarchical processing of auditory feature information such that earlier cortical regions (e.g. the “core” region containing the primary auditory cortex) code for simpler features and later cortical regions (e.g. the belt and parabelt) code more complex information.

Single-cell recordings in primates show that the neurons in the core region respond to narrowly defined frequencies (e.g. responding maximally to a pure tone of 200 Hz), whereas cells in the belt region respond to a broader band of frequencies (e.g. responding to noise between 200 Hz and 300 Hz; Kosaki et al., 1997). More recently, cells have been documented in primary auditory cortex that possess something akin to center-surround properties (Tian et al., 2013). Neurons in the belt region will also respond to other more complex tones, such as vocalizations, more vigorously than to pure tones (Rauschecker et al., 1995). Indeed, some neurons do not respond to fixed frequencies but only to changes in frequency and even the direction of change of frequency (Kajikawa et al., 2008; Whitfield & Evans, 1965).

Clarey et al. (1994) recorded from neurons in the cat primary auditory cortex using noise bursts that varied in loudness and location. More than a third of the neurons responded to particular loudness levels and particular locations; for example, a neuron might respond maximally only when the sound was both between 30 and 50 dB and located between 20 and 40 degrees on a particular side of space.

“What” versus “where”

Summary

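The what/where dual-pathway organization also exists in auditory cortex. The anterior belt region responds selectively to monkey calls (content), and the posterior belt region responds selectively to spatial location (Rauschecker & Tian, 2000). The anterior pathway runs along the temporal lobe and supports sound identification, while the posterior pathway runs toward the parietal lobe and supports sound localization. In humans, the dorsal pathway may additionally branch into a “how” pathway linked to motor representations (Isenberg et al., 2012). Sound localization relies on two cues: (1) inter-aural time and intensity differences, and (2) **distortion of the sound wave by the head and pinnae (HRTF)**.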

Within the auditory cortical areas, there is some degree of specialization for “what” versus “where.” That is, some neurons/regions are relatively specialized for coding the content of the sound, and other neurons/regions are relatively specialized for coding where the sound is coming from. Rauschecker and Tian (2000) found that neural responses in the anterior belt region showed a high degree of specialization for monkey calls (irrespective of their location), whereas the posterior belt region showed greatest spatial selectivity. They speculated that this may form the starting point for two routes: a dorsal route involving the parietal lobes that is concerned with locating sounds, and a ventral route along the temporal lobes concerned with identifying sounds. Functional imaging evidence from humans is largely consistent with this view (Barrett & Hall, 2006).

There are two broad solutions for identifying where a sound is located.

Inter-aural differences: If a sound is lateralized, it will tend to arrive at one ear before the other (inter-aural time difference) and will be less intense at the farther ear because that ear lies in the “shadow” of the head (inter-aural intensity difference). Frequency-selective neurons in the core and belt regions adjust their responsiveness according to these inter-aural differences (Brugge & Merzenich, 1973).

Distortions of the sound wave by the head and pinnae: Batteau (1967) placed microphones into the “ear canal” of casts of actual pinnae while playing sounds to these artificial ears from different locations. When participants listened to these recordings through headphones, they were able to localize the sounds; they could not do so if the recordings were made without the artificial ears attached. Performance is better if sounds are recorded from participants’ own ear shapes rather than from a generic ear (Wenzel et al., 1993). The brain develops an internal model of how sounds get distorted by the unique shape of one’s own ears and head, called a head-related transfer function (HRTF). Griffiths and Warren (2002) propose that a region called the planum temporale, lying posterior to the primary auditory cortex, integrates the sensory input with the learned head-related transfer function for different parts of space.
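The two inter-aural cues can be estimated from a pair of ear signals. The Python sketch below uses cross-correlation for the time difference and an RMS ratio for the intensity difference; these are standard signal-processing estimators swapped in for illustration, not a description of how core and belt neurons compute the cues. The noise burst, the 20-sample delay, the 0.5 head-shadow gain, and the 44.1 kHz sample rate are all hypothetical.

```python
import math
import random


def make_binaural(n=2000, delay=20, far_gain=0.5, seed=0):
    """Toy input: a noise burst reaches the left ear first and the right ear
    `delay` samples later, attenuated by the head's shadow (`far_gain`)."""
    rng = random.Random(seed)
    src = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    right = [0.0] * delay + [far_gain * x for x in src[: n - delay]]
    return src, right


def estimate_itd(left, right, fs, max_lag):
    """Inter-aural time difference (s) = lag maximizing sum(left[i] * right[i + lag]).
    Positive values mean the sound reached the left ear first."""
    n = len(left)
    best_lag, best_c = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        c = sum(left[i] * right[i + lag] for i in range(max(0, -lag), min(n, n - lag)))
        if c > best_c:
            best_lag, best_c = lag, c
    return best_lag / fs


def estimate_ild_db(left, right):
    """Inter-aural intensity difference in dB (positive = louder at the left ear)."""
    def rms(s):
        return math.sqrt(sum(x * x for x in s) / len(s))
    return 20 * math.log10(rms(left) / rms(right))


if __name__ == "__main__":
    fs = 44_100
    left, right = make_binaural()
    print(f"ITD = {estimate_itd(left, right, fs, max_lag=40) * 1e3:.3f} ms")
    print(f"ILD = {estimate_ild_db(left, right):.1f} dB")
```

The recovered time difference is the 20-sample delay (about 0.45 ms at 44.1 kHz), and the level difference is about 6 dB, the value expected when the amplitude at the far ear is halved.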

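📊 Figure description

When a sound source is on the left, the left ear receives the sound earlier (inter-aural time difference) and louder (inter-aural intensity difference). The schematic gives a simplified view of the two binaural cues used for localization.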

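📊 Figure description

How the head-related transfer function (HRTF) works: the original sound pattern reaches the auditory cortex distorted by the shape of the ears and head, and the planum temporale compares the input with a learned internal model (the HRTF) to infer the location of the source. Inter-aural differences provide only left-right position, whereas the HRTF also determines elevation (up-down position).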

Key Terms

Head-related transfer function (HRTF)

An internal model of how sounds get distorted by the unique shape of one’s own ears and head.

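The **head-related transfer function (HRTF)** is a learned internal model of how the shape of one’s own ears and head distorts sound waves. Whereas inter-aural time and intensity differences provide only left-right position, the HRTF also makes it possible to localize sounds in elevation (Batteau, 1967).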

Planum temporale

A part of auditory cortex (posterior to primary auditory cortex) that integrates auditory information with non-auditory information, for example to enable sounds to be separated in space.

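The **planum temporale** lies posterior to the primary auditory cortex and integrates auditory input with the learned HRTF to compute the location of a sound source. It responds more strongly to sounds perceived as located outside the head in space than to sounds perceived as “inside the head,” as in ordinary headphone listening (Hunter et al., 2003).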

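Exam tip

What vs. where pathways: the ventral (what) / dorsal (where) split in vision also applies to hearing. Ventral (“what”) = runs into the temporal lobe → sound identification. Dorsal (“where”) = runs into the parietal lobe → localization. In humans the dorsal route may additionally branch into a “how” pathway, linking speech to motor (articulatory) representations.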

Auditory memory and auditory stream segregation

Summary

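**Auditory stream segregation** is the process of decomposing a complex auditory scene, such as a cocktail party or an orchestra, into separate objects on the basis of pitch, melody, instrument, and spatial location. Auditory memory plays a central role in stream segregation and is measured with the **mismatch negativity (MMN)** (Näätänen et al., 2001). The MMN occurs about 100–200 ms after the onset of a deviant stimulus that breaks a sequence of standard stimuli, and it is generated near the primary auditory cortex (Alho, 1995).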

Visual objects generally extend through time and are available for reinspection. Auditory objects tend not to hang around to be reinspected. Most models of hearing postulate an important role of a sensory memory store to integrate auditory information over brief time intervals. Perhaps the best developed model of auditory memory is that proposed by Näätänen and colleagues (Näätänen et al., 2001), who regard the primary function of this memory system to lie in early auditory stream segregation.

Much of the evidence in this area comes from studies of a human ERP component termed the mismatch negativity (MMN). The mismatch negativity occurs when an auditory stimulus deviates from previously presented auditory stimuli (Näätänen et al., 1978). It occurs between 100 and 200 ms after the onset of the deviant sound. The simplest example is a sequence of tones in which one tone has a deviant pitch (e.g. A-A-A-A-B where A = 1,000 Hz, B > 1,000 Hz). In one sense, the MMN can be considered a “low level” phenomenon, because it occurs in the absence of attention. It is found in some comatose patients several days before waking (Kane et al., 1993) and when the stimulus is presented to the unattended ear of healthy participants (Alho et al., 1994). However, the MMN is also found for more complex auditory patterns, suggesting a more sophisticated underlying mechanism.
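The oddball logic behind the MMN can be sketched in Python: a sequence in which rare deviants interrupt runs of standards, and the deviant-minus-standard difference wave whose mean amplitude in the 100–200 ms window indexes the MMN. The 1,000 Hz standard comes from the text and the 1,032 Hz deviant from the accompanying figure; the deviant probability, the minimum run of standards, the epoch format, and the toy traces in the demo are assumptions, not real EEG data.

```python
import random


def oddball_sequence(n_trials, standard_hz=1000.0, deviant_hz=1032.0,
                     p_deviant=0.2, min_standards=2, seed=0):
    """Tone frequencies for an oddball run: rare deviants, each preceded by
    at least `min_standards` standards (e.g. A-A-A-A-B)."""
    rng = random.Random(seed)
    seq, run = [], 0  # run = standards since the last deviant
    for _ in range(n_trials):
        if run >= min_standards and rng.random() < p_deviant:
            seq.append(deviant_hz)
            run = 0
        else:
            seq.append(standard_hz)
            run += 1
    return seq


def mismatch_negativity(epochs, seq, standard_hz, fs, window=(0.100, 0.200)):
    """Deviant-minus-standard difference wave and its mean amplitude in `window` (s).
    epochs[i] is one EEG channel for trial i, sampled at fs Hz, time 0 = tone onset."""
    def average(traces):
        return [sum(vals) / len(traces) for vals in zip(*traces)]

    standard = average([e for e, f in zip(epochs, seq) if f == standard_hz])
    deviant = average([e for e, f in zip(epochs, seq) if f != standard_hz])
    diff = [d - s for d, s in zip(deviant, standard)]
    start, stop = round(window[0] * fs), round(window[1] * fs)
    return diff, sum(diff[start:stop]) / (stop - start)


if __name__ == "__main__":
    seq = oddball_sequence(200)
    fs = 1000
    # Toy epochs (not real EEG): deviants carry a -2 µV deflection from 100 to 200 ms.
    epochs = [[-2.0 if f != 1000.0 and 100 <= t < 200 else 0.0 for t in range(300)] for f in seq]
    diff, amp = mismatch_negativity(epochs, seq, 1000.0, fs)
    print(f"{seq.count(1032.0)} deviants; mean 100-200 ms difference = {amp:.1f} µV")
```

A negative mean amplitude in the window corresponds to the MMN. Because standards dominate the sequence, the standard average rests on many more trials than the deviant average.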

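📊 Figure description

ERPs recorded with EEG when deviant tones (1004, 1008, 1016, 1032 Hz; green) occur among repeated standard tones of 1,000 Hz (purple). The MMN is detected at the scalp about 100–200 ms after the deviant. It is interpreted as an index of auditory memory and also occurs for complex auditory patterns.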

Auditory stream segregation is unlikely to be limited to the auditory cortex; parietal regions may be important too. Cusack (2005) used a perceptually ambiguous auditory stimulus of two alternating tones of different frequency that could be interpreted either as a single stream or as two streams. Perceiving two streams, relative to one, was associated with activity in the right intraparietal sulcus. The parietal lobes are also likely to play an important role in solving the classic cocktail party problem, in which a single stream must be attended among competing streams. Kerlin et al. (2010) used EEG to show that selectively attending to speech in a multi-talker environment is linked to increased power of low-frequency neural oscillations from the auditory cortex, in addition to oscillatory changes in the alpha range over parietal sites.

Key Terms

Auditory stream segregation

The division of a complex auditory signal into different sources or auditory objects.

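**Auditory stream segregation** is the process of dividing a complex auditory signal into objects corresponding to different sources. It uses cues such as pitch, melody, instrument, and spatial location, and it is a prerequisite for focusing attention on one speaker in a cocktail party setting.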

Mismatch negativity (MMN)

An ERP component that occurs when an auditory stimulus deviates from previously presented auditory stimuli.

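The **mismatch negativity (MMN)** is an ERP component that occurs about 100–200 ms after a deviant stimulus appears in a repeated stream of standard stimuli (Näätänen et al., 1978). Because it occurs without attention, it is used as an index of auditory discrimination in the absence of attention, and it can be detected in comatose patients several days before they wake (Kane et al., 1993).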

Cocktail party problem

The problem of attending to a single auditory stream in the presence of competing streams—for instance, attending to one person’s voice in a noisy room of other voices.

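The **cocktail party problem** is the problem of selectively attending to a single stream (e.g., one speaker’s voice) among many competing sources. It involves both an increase in low-frequency neural oscillations in the auditory cortex and changes in alpha oscillations over parietal sites (Kerlin et al., 2010).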

Music Perception

Summary

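Music perception is not simply a product of cultural learning; it has a biological basis, being universal across human cultures and emerging early in life without formal training (Peretz, 2006). Musical systems share core features such as (1) a discrete set of pitches, (2) the formation of perceptible groupings and patterns, and (3) relations of consonance and dissonance (Dowling & Harwood, 1986). A model of music processing proposes pitch organization (contour → interval → tonal encoding) and temporal organization (rhythm, meter) as separate stages (Peretz & Coltheart, 2003).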

Although music can rightfully be described as a form of art, this does not mean that it is purely a product of cultural learning. Many aspects of music perception have a biological basis and can be said to be “innate” in the same way as some argue language to be innate (Peretz, 2006). Namely, it is a universal phenomenon and it emerges early in life without formal training.

Peretz and Coltheart (2003) outlined a basic cognitive model of music processing that emphasizes different components of musical processing. The first distinction that they make is between processes that are shared between music and speech and those that are potentially specific to music. Thus, listening to someone singing “Happy Birthday” would evoke at least two routes: one concerned with the words and one concerned with the music. Within the domain of music, they then make a distinction between pitch organization (which includes pitch relations between notes) and temporal organization, including rhythm (the tempo of beats) and meter (the way beats are grouped).

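📊 Figure description

Schematic of the Peretz and Coltheart (2003) model of music cognition. Starting from acoustic analysis, processing branches into pitch organization (tonal encoding, interval analysis, contour analysis) and temporal organization (rhythm analysis, meter analysis), which are integrated with components such as the musical lexicon and emotion expression analysis. Lyrics follow a separate route that reaches the phonological lexicon via acoustic-to-phonological conversion, visualizing the partial separation of music and language.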

Memory for tunes

Summary

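Memory for melodies can be dissociated from pitch perception and from lyric processing. Patient CN, with bilateral temporal lobe damage, had some difficulty with pitch perception, but her most striking deficit was in identifying familiar melodies, which was interpreted as damage to the memory component corresponding to the “musical lexicon” in the model (Peretz, 1996). Conversely, there are cases of patients who lost the ability to identify words but retained melody identification (Mendez, 2001). Familiar tunes are stored as part of semantic rather than episodic memory, and the difficulty that patients with semantic dementia have in recognizing familiar tunes is related to the extent of damage to the right anterior temporal lobe (Hsieh et al., 2011).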

Some brain-damaged patients are unable to recognize previously familiar melodies despite being able to recognize songs from spoken lyrics. For example, case CN was a non-musician who suffered bilateral temporal lobe damage (Peretz, 1996). Although she had some difficulties with pitch perception, her most profound difficulty was in identifying previously familiar tunes and, as such, her damage was attributed to a memory component of music. There is evidence that memory for familiar tunes is stored as part of semantic memory rather than episodic memory. Patients with semantic dementia, who have general impairments in semantic memory, have difficulty in recognizing previously familiar tunes and the degree of impairment is linked to the amount of damage in the right anterior temporal lobes (Hsieh et al., 2011).

Rhythm

Summary

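Rhythm processing can be dissociated from pitch processing. A patient with acquired amusia processed pitch-based melody normally but could not identify rhythms from auditory input, although he could from visual input (Di Pietro et al., 2004). Members of the KE family, who have a congenital speech disorder, show deficits in rhythm production and perception but normal production and perception of pitch-based melody, along with structural abnormalities of the basal ganglia (Alcock et al., 2000a). Rhythm perception depends on interactions between the auditory and motor systems: merely listening to regular rhythms, relative to irregular ones, activates the premotor cortex, supplementary motor area, and cerebellum (Bengtsson et al., 2009).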

Disorders of rhythm can occur independently of disorders of pitch. Di Pietro et al. (2004) report a patient with acquired amusia who could process pitch-based melody but could not identify rhythm from auditory input. He could do so from visual input, suggesting that the problem was not in general time perception. Members of the KE family with a congenital speech disorder also have problems in rhythm production and rhythm perception but perform as well as controls in pitch-based melody production and melody perception (Alcock et al., 2000a). The KE family is known to have structural abnormalities within the basal ganglia.

Evidence from functional imaging of normal listeners implicates interactions between the auditory system and the motor system in both rhythm perception and production. Passive listening to regular rhythms, relative to irregular ones, is linked to activity in the premotor cortex, supplementary motor area and the cerebellum (Bengtsson et al., 2009). Activity in the basal ganglia is greatest when participants have to maintain a beat relative to the initial finding of the beat (Grahn & Rowe, 2013).

Pitch

Summary

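**Congenital amusia (tone-deafness)** is a developmental disorder in which pitch relations are hard to perceive without any neurological cause such as brain damage; it is observed in up to 4% of the population (Ayotte et al., 2002). It is associated with abnormal white and gray matter density in the right auditory cortex and right inferior frontal gyrus (Hyde et al., 2007), and affected people cannot detect a pitch deviation in a five-note sequence but can detect a timing deviation (Hyde & Peretz, 2004). Mandarin Chinese speakers with congenital amusia also have difficulty discriminating lexical tones, suggesting that music and language partly share pitch mechanisms (Nan et al., 2010).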

Some people have good perception and production of rhythm but are impaired on pitch-based aspects of music. One recently studied group is those individuals who are said to be “tone deaf” or have so-called congenital amusia, because there is no known neurological cause such as brain damage. This can occur in up to 4 percent of the population and is not associated with difficulties in other domains, such as general intelligence (Ayotte et al., 2002). It is associated with right-hemisphere abnormalities in white and gray matter density, both in the right auditory cortex and the right inferior frontal gyrus (Hyde et al., 2007).

Zatorre and Baum (2012) argue that while music and speech share common mechanisms in pitch processing, there are important differences too. In speech, pitch is processed on a continuous scale and relative changes in pitch are important. In music, pitch is arranged into discrete notes and a small change of the pitch of a note in a melody can be perceived as “wrong” even if the relative pitch contour of the music is the same. They claim that fine-grained pitch changes are more dependent on the right hemisphere network and this tends to be selectively impaired in congenital amusia.

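📊 Figure description

Scatter plots of cortical thickness residuals in the right inferior frontal gyrus (IFG) and right superior temporal gyrus (STG) for a congenital amusia group (blue) and a control group (purple) (Hyde et al., 2007). In the amusic group, greater gray matter is associated with lower composite music scores, suggesting that structural abnormalities in a right-hemisphere frontotemporal network are a neural basis of amusia.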

Key Terms

Amusia

An auditory agnosia in which music perception is affected more than the perception of other sounds.

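**Amusia** is an auditory agnosia in which music perception is selectively impaired relative to the perception of other sounds. It can be acquired (after brain damage) or congenital (developmental), and different musical features such as pitch, rhythm, and timbre can be impaired separately.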

Tone-deafness (or congenital amusia)

A developmental difficulty in perceiving pitch relationships.

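**Tone-deafness** (congenital amusia) is a condition in which the perception of pitch relations is impaired during development. It occurs in up to 4% of the population and is unrelated to general intelligence (Ayotte et al., 2002). It is associated with abnormal white and gray matter density in the right auditory cortex and inferior frontal gyrus.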

Melody and musical syntax

Summary

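Melody is processed in stages: contour analysis → interval analysis → tonal encoding. **Musical syntax** refers to the regularity whereby certain notes are more likely to occur than others (Koelsch & Siebel, 2005). Violations of musical syntax are associated with activation of inferior frontal regions in both hemispheres (stronger on the right), including Broca’s area on the left (Maess et al., 2001), and are measured in EEG as the early right anterior negativity (ERAN) (Sammler et al., 2011).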

The model of Peretz and Coltheart (2003) contains different stages of pitch processing in music: contour analysis (general up–down structure), interval analysis (precise relationship between successive notes), and tonal encoding (construction of melody). Tonal encoding exploits the fact that, within a musical key, certain notes are more likely to occur than others. This rule-like aspect of music has been referred to as musical syntax (Koelsch & Siebel, 2005). Whereas both random pitch sequences and tonal melodies activate the bilateral auditory cortex and surrounding temporal regions (Patterson et al., 2002), musical syntactic deviations are associated with activation of inferior frontal regions (Maess et al., 2001). This activation tends to be bilateral and stronger on the right but includes Broca’s area on the left, which has traditionally been considered specific to language. Brain lesions in this area disrupt an event-related potential component measured using EEG (the ERAN, early right anterior negativity) that is linked to the processing of musical syntactic deviations (Sammler et al., 2011).
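The three levels of pitch analysis can be illustrated on note sequences. A minimal Python sketch under simplifying assumptions: notes are MIDI numbers (one step = one semitone, A4 = 69), contour is reduced to up/down/same, and tonal encoding is reduced to membership in a major scale, which is far cruder than the model. The opening of “Happy Birthday” is the example: transposing it preserves contour and intervals (the key-independent constancy described at the start of the chapter), whereas a single wrong note can keep the contour yet change an interval and fall outside the key.

```python
import math

MAJOR_SCALE = {0, 2, 4, 5, 7, 9, 11}  # semitone offsets from the tonic


def contour(notes):
    """+1 (up), -1 (down) or 0 (repeat) for each pair of successive notes."""
    return [(b > a) - (b < a) for a, b in zip(notes, notes[1:])]


def intervals(notes):
    """Signed size in semitones of each step between successive notes."""
    return [b - a for a, b in zip(notes, notes[1:])]


def in_key(notes, tonic_pc):
    """Whether each note's pitch class belongs to the major key on tonic_pc (0 = C)."""
    return [(n - tonic_pc) % 12 in MAJOR_SCALE for n in notes]


def hz_to_midi(freq):
    """Convert frequency to a (possibly fractional) MIDI note number; A4 = 440 Hz = 69."""
    return 69 + 12 * math.log2(freq / 440.0)


if __name__ == "__main__":
    tune = [67, 67, 69, 67, 72, 71]    # G G A G C B: "Happy birth-day to you"
    up_a_tone = [n + 2 for n in tune]  # the same tune in D major
    wrong = tune[:-1] + [70]           # last note B flattened to B-flat
    for name, notes, tonic in [("C", tune, 0), ("D", up_a_tone, 2), ("wrong", wrong, 0)]:
        print(name, contour(notes), intervals(notes), all(in_key(notes, tonic)))
```

The wrong-note version has exactly the same contour as the original, which mirrors the point that a small pitch change can sound “wrong” in music even when the relative contour is preserved.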

Key Terms

Melody

Patterns of pitch over time.

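A **melody** is a pattern of pitch over time. It can be decomposed into three levels: contour (overall rise and fall), interval (the exact distance between successive notes), and tonality (the set of permissible notes), each with a partly separate neural basis.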

Timbre

Summary

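Timbre is the feature that distinguishes the same note played on a cello and a saxophone; it is determined by how the note evolves over time (attack and decay) and by the relative intensity of its frequency components. Damage to the right temporal lobe can selectively impair timbre perception while sparing pitch-based melody perception (Samson & Zatorre, 1994).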

One notable omission from the model of Peretz and Coltheart (2003) is timbre. This perceptual quality of a sound enables us to distinguish between different musical instruments. The same note played on a cello and a saxophone will sound very different even if they are matched for pitch and loudness. Different instruments can be distinguished partly on the basis of how the note evolves over time (e.g. the attack and decay of the note) and partly on the basis of the relative intensity of the different frequency components of the note. Timbre perception is particularly affected by lesions of the right temporal lobe and can be dissociated from some aspects of pitch-related perception such as melody (Samson & Zatorre, 1994).
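The two timbre cues named above, the temporal envelope and the relative strength of the harmonics, can be varied independently of pitch in a small synthesis sketch. In this Python example the harmonic amplitudes and attack/decay times of the two “instruments” are hypothetical, and the spectral centroid (the amplitude-weighted mean frequency, a common correlate of perceived brightness) is swapped in as a simple numerical summary; it is not a measure used in the text.

```python
import math


def spectral_centroid(f0, harmonic_amps):
    """Amplitude-weighted mean frequency of a harmonic tone (harmonic n sits at n * f0)."""
    weighted = sum((n + 1) * f0 * a for n, a in enumerate(harmonic_amps))
    return weighted / sum(harmonic_amps)


def envelope(t, attack, decay):
    """Linear rise over `attack` seconds, then exponential decay with time constant `decay`."""
    if t < 0:
        return 0.0
    if t < attack:
        return t / attack
    return math.exp(-(t - attack) / decay)


def render(f0, harmonic_amps, attack, decay, duration=0.5, fs=16_000):
    """Samples of a tone with fundamental f0; amps and envelope change timbre, not pitch."""
    return [
        envelope(i / fs, attack, decay)
        * sum(a * math.sin(2 * math.pi * (n + 1) * f0 * i / fs) for n, a in enumerate(harmonic_amps))
        for i in range(round(duration * fs))
    ]


if __name__ == "__main__":
    mellow = {"amps": [1.0, 0.5, 0.25, 0.125], "attack": 0.08, "decay": 0.40}  # hypothetical
    bright = {"amps": [1.0, 1.0, 1.0, 1.0], "attack": 0.01, "decay": 0.15}     # hypothetical
    for name, inst in [("mellow", mellow), ("bright", bright)]:
        samples = render(220, inst["amps"], inst["attack"], inst["decay"])
        print(name, "f0 = 220 Hz, centroid =", round(spectral_centroid(220, inst["amps"]), 1),
              "Hz,", len(samples), "samples")
```

Both tones share f₀ = 220 Hz and hence the same pitch, but their centroids differ (about 381 Hz versus 550 Hz) and their onsets differ in steepness, the kind of difference that distinguishes instruments.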

Music and emotion

Summary

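The link between music and emotion relies partly on musical conventions: fast tempo and major keys for happiness, slow tempo and minor keys for sadness, dissonance for tension, and fast, regular patterns (e.g. the Jaws theme) for fear. The Mafa of Africa, who have had no exposure to Western music, can nonetheless recognize happy, sad and scary Western music (Fritz et al., 2009). Emotional music activates the same circuitry as other emotional stimuli, including the reward circuitry (Blood & Zatorre, 2001), and patients with amygdala damage who cannot recognize fearful faces also show deficits in recognizing scary music (Gosselin et al., 2007).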

Music has a special ability to tap into our emotional processes. This may rely on certain musical conventions: happy music tends to have a faster tempo than sad music; happy music tends to be in major keys and sad music in minor keys; dissonance between notes creates tension; and fast, regular patterns create scary music. A native African group, the Mafa, have been shown to recognize happy, sad and scary Western music despite having no cultural exposure to these musical styles (Fritz et al., 2009).

Functional imaging shows that emotional music activates the same circuitry as other emotional stimuli and even the brain’s reward circuitry (Blood & Zatorre, 2001; Koelsch et al., 2006). Patients with acquired difficulties in emotion processing, such as in recognizing fearful faces, may show comparable deficits in recognizing scary music (Gosselin et al., 2007).

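Exam tip

Hypotheses about the evolutionary function of music:

Darwin (1871): mating signal, a sexually selected origin similar to birdsong.

Huron (2001): social bonding, a survival advantage through group cohesion.

Mithen (2005) The Singing Neanderthals: precursor to language, with music evolving before language.

Pinker (1997): “auditory cheesecake”, a by-product with no adaptive function that simply stimulates several parts of the brain pleasurably.
For the exam, the key point is that music is universal but its evolutionary function remains unclear.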

Voice Perception

Summary

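Voices, like faces, convey rich social information: listeners can infer a speaker's sex, size, age and emotional state from the voice alone, and can recognize emotion even in an unfamiliar language (Scherer et al., 2001). Adult men have longer vocal folds (17–25 mm) than adult women (12.5–17.5 mm), which gives them lower-pitched voices. Belin et al. (2000) found regions in the bilateral superior temporal sulcus (STS) that respond more strongly to vocal sounds (speech and laughter) than to non-vocal sounds. The right superior temporal region anterior to auditory cortex appears important for speaker identity, and TMS over this region disrupts detection of a briefly heard voice (Bestelmeyer et al., 2011).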

Voices, like faces, convey a large amount of socially relevant information about the people around us. It is possible to infer someone’s sex, size, age and mood from their voice. Physical changes related to sex, size and age affect the vocal apparatus in systematic ways. Larger bodies have longer vocal tracts, and longer tracts have lower, more closely spaced resonant (formant) frequencies. Adult men have longer vocal folds (17–25 mm) than adult women (12.5–17.5 mm), resulting in a lower-pitched male voice. One can also infer the current emotional state from a voice even in an unfamiliar language (Scherer et al., 2001). Familiar people can also be recognized from their voice, but this is generally more difficult than recognizing them from their face (Hanley et al., 1998).
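The link between body size, vocal tract length and formants can be illustrated with the textbook idealization of the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances are F_n = (2n − 1)c / 4L. The speed of sound (350 m/s) and the two tract lengths below are assumed illustrative values, not figures from this chapter, and `formants` and `formant_dispersion` are hypothetical helpers.

```python
SPEED_OF_SOUND_CM_S = 35_000.0  # assumed ~350 m/s, typical of warm, humid vocal-tract air

def formants(tract_length_cm, n=3):
    """First n resonances (Hz) of a uniform tube closed at one end, open at the other."""
    return [(2 * k - 1) * SPEED_OF_SOUND_CM_S / (4 * tract_length_cm) for k in range(1, n + 1)]

def formant_dispersion(tract_length_cm):
    """Spacing (Hz) between adjacent formants in the uniform-tube model: c / (2L)."""
    return SPEED_OF_SOUND_CM_S / (2 * tract_length_cm)

for label, length_cm in [("longer tract", 17.5), ("shorter tract", 14.0)]:
    print(label, length_cm, "cm:", formants(length_cm), "spacing", formant_dispersion(length_cm))
# longer tract 17.5 cm: [500.0, 1500.0, 2500.0] spacing 1000.0
# shorter tract 14.0 cm: [625.0, 1875.0, 3125.0] spacing 1250.0
```

In this idealized model a longer tract lowers every formant and packs the formants closer together (smaller dispersion), which is one acoustic route by which a listener could estimate body size. Real vocal tracts are not uniform tubes, so the numbers are only a first approximation.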

Belin et al. (2000) claimed to have identified a voice-selective area in the human brain. They found three regions in the bilateral superior temporal sulcus that respond to vocal sounds (speech and non-speech such as laughs) more than non-vocal sounds of comparable acoustic complexity. In particular, the right superior temporal region anterior to auditory cortex appears to be important for speaker identity (Belin & Zatorre, 2003; Warren et al., 2006). TMS over this region disrupts the ability to detect the presence of a briefly heard voice, but not loudness judgments of the same stimuli (Bestelmeyer et al., 2011). A recent fMRI study with macaque monkeys has identified a homologous region that responds not only to vocalizations from their own species but is also affected by changes in identity between different vocalizers (Petkov et al., 2008).

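📊 Figure description

Location of the voice-selective region in the right temporal lobe of humans (left) and macaques (right), with fMRI BOLD response graphs. The signal change is larger when the speaker changes while the stimulus stays the same (green) than when the stimulus changes while the speaker stays the same (purple), indicating that this region encodes speaker identity.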

Speech Perception

Summary

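The central question in speech perception is at which processing stage left-hemisphere lateralization emerges. Primary auditory cortex responds equally to speech and non-speech sounds (Binder et al., 2000), but the left hemisphere becomes dominant along the ventral temporal “what” route, where intelligible speech activates left temporal regions more strongly than unintelligible speech (Scott et al., 2000). After left-hemisphere damage, patients with pure word deafness can identify environmental sounds and music but not speech (Takahashi et al., 1992).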

At what stage of processing, if any, does the brain treat speech sounds differently from other kinds of auditory stimuli? This question often reduces to identifying the stage in speech processing that is left lateralized. Functional imaging studies have shown that the primary auditory cortex of both left and right hemispheres responds equally to speech and other types of auditory stimuli (Binder et al., 2000). This suggests divergence at a later cortical stage. For example, Scott et al. (2000) report increased activity in a left temporal region in intelligible relative to unintelligible speech of comparable acoustic complexity. The right hemisphere homologue did not show this preference but was more responsive to dynamic pitch variation (Zatorre et al., 2002). Moreover, a specific type of acquired auditory agnosia called pure word deafness is found following damage to the left hemisphere (Takahashi et al., 1992). These patients are able to identify environmental sounds and music but not speech.

Key Terms

Pure word deafness

Type of auditory agnosia in which patients are able to identify environmental sounds and music but not speech.

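**Pure word deafness** is an auditory agnosia in which identification of environmental sounds and music is preserved but speech cannot be identified. It follows left-hemisphere damage; patients can still produce speech but report that heard speech seems “too fast” or “distorted” (Takahashi et al., 1992).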

The nature of the speech signal

Summary

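The speech signal has no physical gaps between words (“I scream” vs “ice-cream”); the silent intervals that do appear reflect the articulation of certain consonants, not word boundaries. Fewer than 100 **phonemes** describe all the world's languages, and English uses about 44. Acoustic variants of the same phoneme (e.g. aspirated and unaspirated English “p”) are **allophones**, which in other languages such as Thai can be separate phonemes. Vowels are produced with a relatively free flow of air and appear as formant stripes, whereas consonants are distinguished by obstruction of the airflow and by **voicing**.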

To appreciate the difficulties faced by the auditory system during speech perception, consider a typical spectrogram for the sentence “Joe took father’s shoe bench out.” A spectrogram plots how the frequency of sound (on the vertical y axis) changes over time (on the horizontal x axis), with the intensity of the sound represented by the level of darkness. The gaps that do appear in the spectrogram typically correspond to the articulation of certain consonants (e.g. “t”, “b”, “f”) rather than to boundaries between words. We are used to seeing gaps between words in written language, but they do not exist in speech (one famous example being “I scream” versus “ice-cream”).
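A spectrogram like the one described above is usually computed with a short-time Fourier transform: slide a window along the signal, take the magnitude spectrum of each frame, and stack the frames as columns. The sketch below assumes NumPy; the frame length, hop size and the synthetic test signal (two pure tones separated by a silent gap standing in for a consonant closure) are illustrative choices, and `spectrogram` is a hypothetical helper rather than a library function.

```python
import numpy as np

def spectrogram(signal, sample_rate, frame_len=256, hop=128):
    """Magnitude STFT in dB: rows are frequency bins, columns are time frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)])
    magnitudes = np.abs(np.fft.rfft(frames, axis=1)).T  # shape (freq_bins, time_frames)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate  # frame centres (s)
    db = 20 * np.log10(magnitudes + 1e-12)  # higher dB would be drawn darker
    return freqs, times, db

# Synthetic "utterance": 0.5 s at 500 Hz, 0.1 s of silence, 0.4 s at 1500 Hz.
sr = 8000
tone_a = np.sin(2 * np.pi * 500 * np.arange(int(0.5 * sr)) / sr)
gap = np.zeros(int(0.1 * sr))
tone_b = np.sin(2 * np.pi * 1500 * np.arange(int(0.4 * sr)) / sr)
signal = np.concatenate([tone_a, gap, tone_b])

freqs, times, db = spectrogram(signal, sr)
peak_hz = freqs[db.argmax(axis=0)]  # strongest frequency in each time frame
print(db.shape, peak_hz[10], peak_hz[50])
```

Plotting `db` as an image with time on the x axis and frequency on the y axis gives the familiar spectrogram; the silent gap shows up as a blank column band, much like a stop closure in real speech.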

The basic segments of speech are called phonemes and, perhaps surprisingly, fewer than 100 phonemes describe all the languages of the world. The International Phonetic Alphabet (IPA) contains one written symbol for each phoneme; English contains around 44 phonemes. Phonemes are formally defined as the minimal contrastive units of spoken language. The “p” of “pin” is produced with a stronger outward expulsion of air (called aspiration) than the “p” of “spin”. These are two allophones of the single “p” phoneme. In Thai, “paa” with an aspirated “p” means “to split”, whereas “paa” with an unaspirated “p” means “forest.” These are separate phonemes in Thai, but allophonic variants in English.
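The definition of phonemes as minimal contrastive units can be made operational with a minimal-pair test: two sounds contrast if swapping one for the other, in an otherwise identical word, changes its meaning. The two toy lexicons below (Thai “paa” words and an English pair such as pin [pʰɪn] and spin [spɪn]) are hypothetical illustrations with tone marks omitted; `minimal_pairs` and `is_contrastive` are hypothetical helpers. Failing to find a minimal pair in a two-word lexicon is not proof of allophony, which in practice also requires checking the whole lexicon and the sounds' distribution.

```python
def minimal_pairs(lexicon, a, b):
    """Meaning pairs of words that differ only by sound a -> sound b in one position."""
    pairs = []
    for w1, m1 in lexicon.items():
        for w2, m2 in lexicon.items():
            if len(w1) != len(w2):
                continue
            diffs = [i for i, (x, y) in enumerate(zip(w1, w2)) if x != y]
            if len(diffs) == 1 and w1[diffs[0]] == a and w2[diffs[0]] == b:
                pairs.append((m1, m2))
    return pairs

def is_contrastive(lexicon, a, b):
    """True if at least one minimal pair distinguishes a from b."""
    return bool(minimal_pairs(lexicon, a, b))

# Words are tuples of phones so that "pʰ" counts as a single segment.
THAI = {("pʰ", "aː"): "to split", ("p", "aː"): "forest"}
ENGLISH = {("pʰ", "ɪ", "n"): "pin", ("s", "p", "ɪ", "n"): "spin"}

print(is_contrastive(THAI, "pʰ", "p"))     # True: aspiration changes meaning
print(is_contrastive(ENGLISH, "pʰ", "p"))  # False: no word differs only in aspiration
```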

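📊 Figure description

Spectrogram of the sentence “Joe took father’s shoe bench out.” The horizontal axis is time, the vertical axis is frequency, and darkness indicates intensity. Vowels appear as horizontal stripes (formants). There are no gaps between words, but consonants such as “t”, “b” and “f” produce brief gaps where the airflow is obstructed. The figure illustrates the continuity of the speech signal and why segmenting it into words is difficult.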

Key Terms

Spectrogram

Plots the frequency of sound (on the y-axis) over time (on the x-axis) with the intensity of the sound represented by how dark it is.

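A **spectrogram** displays the frequency, time course and intensity of a sound in a single graph. Acoustic features such as vowel formants, consonant closures and the vertical striations of voiced consonants are visible at a glance, which makes it a standard tool in speech perception research.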

Allophones

Different spoken/acoustic renditions of the same phoneme.

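**Allophones** are different acoustic realizations of the same phoneme. The aspirated and unaspirated “p” of English are treated cognitively as one phoneme, but in some languages, such as Thai, they are separate phonemes that distinguish meaning.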

Formants

Horizontal stripes on the spectrogram produced with a relatively free flow of air (e.g. by vowels).

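**Formants** are horizontal stripes on the spectrogram produced by resonances of the vocal tract when air flows relatively freely, as in vowels. The pattern of formant frequencies determines vowel identity and reflects the speaker's size (vocal tract length).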

Voicing

Vibration of the vocal cords that characterizes the production of some consonants.

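**Voicing** is the vibration of the vocal cords that accompanies the production of some consonants. “zzzz” is voiced and “ssss” is unvoiced; on a spectrogram, voicing appears as closely spaced vertical lines. Voice onset time (VOT) is a key dimension in categorical perception.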

Categorical perception

Summary

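**Categorical perception** is the mapping of continuous acoustic changes onto discrete perceptual categories. “da” (voiced) and “ta” (unvoiced) differ in voice onset time (VOT), 0 ms versus 80 ms, and even an intermediate value such as 30 ms is always perceived as one or the other (Eimas, 1963). Categorical perception is one way of dealing with the acoustic variability caused by **co-articulation**. The McGurk illusion (McGurk & MacDonald, 1976), in which a heard “ba” combined with a seen “ga” is perceived as “da”, shows that multisensory integration plays a crucial role in speech perception.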

One way in which the brain deals with variability in the acoustic input is categorical perception: continuous changes in the input are mapped onto discrete percepts. For example, the syllables “da” and “ta” are identical except that “t” is unvoiced whereas “d” is voiced. It is possible to experimentally manipulate the onset of voicing along a continuum from 0 ms (perceived as “da”) to 80 ms (perceived as “ta”). But what happens at intermediate values such as 30 ms? Listeners will always perceive it as one phoneme or the other, albeit with varying degrees of certainty (Eimas, 1963). Categorical perception also provides one way of dealing with variability in the acoustic signal due to co-articulation, which refers to the fact that the production of a phoneme is influenced by the preceding and following phonemes.
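The identification and discrimination pattern behind categorical perception can be sketched with a steep logistic identification curve along the VOT continuum. The boundary (35 ms) and slope are hypothetical values, not fits to Eimas's data, and the discrimination formula is the classic labeling-only idealization, in which listeners can tell two sounds apart only by comparing the category labels they assign. `p_ta`, `label` and `predicted_abx` are hypothetical helpers.

```python
import math

BOUNDARY_MS = 35.0  # hypothetical VOT at which "da" and "ta" responses are equally likely
SLOPE_MS = 4.0      # hypothetical steepness; smaller values give a more categorical curve

def p_ta(vot_ms):
    """Probability of labeling a syllable with this VOT as "ta" (logistic curve)."""
    return 1.0 / (1.0 + math.exp(-(vot_ms - BOUNDARY_MS) / SLOPE_MS))

def label(vot_ms):
    """The category a listener would most likely report."""
    return "ta" if p_ta(vot_ms) >= 0.5 else "da"

def predicted_abx(vot_a, vot_b):
    """Labeling-only prediction of ABX accuracy: 0.5 is chance, 1.0 is perfect."""
    return 0.5 * (1.0 + (p_ta(vot_a) - p_ta(vot_b)) ** 2)

for a, b in [(0, 20), (20, 40), (40, 60), (60, 80)]:
    print(f"{a:>2}-{b:<2} ms  {label(a)}/{label(b)}  predicted ABX {predicted_abx(a, b):.2f}")
```

All four pairs differ by the same 20 ms, yet the model predicts near-chance discrimination within a category and good discrimination only for the pair that straddles the boundary, which is the signature of categorical perception.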

Although we may not think of ourselves as good lip-readers, we are all capable of using visual information to supplement what we hear. Visual cues from lip-reading are particularly important when the auditory input becomes less reliable, such as in noisy settings (Sumby & Pollack, 1954). One striking example is the so-called McGurk illusion (McGurk & MacDonald, 1976). To create the illusion, one dubs an auditory stream saying one thing (e.g. “baba”) onto visual lip-movements saying another (e.g. “gaga”). Participants often report hearing a third syllable, in this example “dada.” Applying TMS to the left posterior superior temporal region temporarily reduces susceptibility to the illusion (Beauchamp et al., 2010), and people who are particularly prone to the illusion show greater activity in this region to mismatching audio-visual stimuli during fMRI (Nath & Beauchamp, 2012). Using fMRI, Skipper et al. (2007) found that the activity evoked in motor regions by an illusory “da” resembles that evoked by a real “da”.
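The idea that conflicting auditory and visual evidence is fused into a single best guess can be sketched with a multiplicative cue-combination rule in the spirit of Massaro's fuzzy logical model of perception (FLMP), a modeling approach that this chapter does not itself discuss. The support values below are invented for illustration, and `combine` is a hypothetical helper.

```python
def combine(auditory, visual):
    """Multiply auditory and visual support for each syllable, then normalize."""
    products = {s: auditory[s] * visual[s] for s in auditory}
    total = sum(products.values())
    return {s: p / total for s, p in products.items()}

# Hypothetical support (0-1) on a McGurk trial: heard "ba", seen "ga".
AUDITORY = {"ba": 0.70, "da": 0.25, "ga": 0.05}
# A "ba" requires visibly closed lips, so an open-mouthed "ga" video gives it little support.
VISUAL = {"ba": 0.02, "da": 0.40, "ga": 0.58}

percept = combine(AUDITORY, VISUAL)
best = max(percept, key=percept.get)
print(best, {s: round(p, 2) for s, p in percept.items()})
```

Neither channel favors “da” on its own, but it is the only syllable that neither channel strongly rules out, so it wins after combination. With uninformative (uniform) visual support, the same rule simply returns the auditory winner, “ba”.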

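📊 Figure description

Schematic of the McGurk illusion: when “ba” is presented auditorily and “ga” visually at the same time, listeners perceive “da”. It shows how the auditory and visual systems fuse conflicting inputs into a “best guess” at the multisensory integration stage. TMS over the left posterior STS weakens the illusion, suggesting that this region is an integration site.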

Key Terms

Co-articulation

The production of one phoneme is influenced by the preceding and following phonemes.

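**Co-articulation** is the influence of neighboring phonemes on how a phoneme is produced. Because the same phoneme varies acoustically with context, the auditory system handles this variability through categorical perception and multisensory integration.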

McGurk illusion

An auditory percept derived from a fusion of mismatching heard speech and seen speech.

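The **McGurk illusion** is the perception of a third speech sound created by fusing conflicting auditory and visual speech information (McGurk & MacDonald, 1976). It is strong evidence that speech perception is a multisensory integration process rather than a purely auditory one.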

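Note

Categorical perception versus the McGurk illusion: categorical perception maps a continuum within a single auditory channel onto discrete categories (e.g. VOT 0 → 80 ms). The McGurk illusion is a multisensory integration phenomenon in which conflicting auditory and visual inputs fuse into a new sound present in neither channel (e.g. a third syllable, “da”). Both show that speech perception is active inference rather than a simple reproduction of the input, but they operate at different levels.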

The motor theory of speech perception

Summary

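The **motor theory** holds that speech perception matches the auditory signal onto motor representations used to produce one's own speech (Liberman & Mattingly, 1985). It was revived after the discovery of mirror neurons (Rizzolatti & Craighero, 2004), but its strong form is contradicted by the finding that patients with damage to these motor regions show only the mildest speech perception deficits (Hickok et al., 2011). A weaker form, in which motor representations contribute only when the auditory signal is ambiguous, is better supported (D’Ausilio et al., 2012).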

The motor theory of speech perception proposes that the auditory signal is matched on to motor representations for producing one’s own speech rather than matching to an acoustic template (Liberman & Mattingly, 1985; Liberman & Whalen, 2000). In this account, phonemes are recognized by inferring the articulatory movements that would have been necessary to produce these sounds. The motor theory has enjoyed a renaissance in recent years owing to the discovery of mirror neurons in the premotor and inferior frontal cortices, including parts of Broca’s area (Rizzolatti & Craighero, 2004).

The strongest form of the motor theory of speech perception would predict that damage to these motor/mirror regions in humans would result in severe difficulties in speech perception. However, this is not the case. Patients with lesions in this area have the mildest of impairments in speech perception as assessed by tasks such as syllable discrimination (Hickok et al., 2011). Virtual lesions using TMS suggest that the premotor region only contributes to speech perception when the auditory signal is hard to disambiguate (D’Ausilio et al., 2012). Similarly, there is evidence from fMRI that the motor/mirror system tends to be more activated when a phoneme is perceived correctly relative to when it is misperceived (Callan et al., 2010). In such cases the motor system appears to make contact with the auditory system via the dorsal, rather than ventral, auditory route (Chevillet et al., 2013).

Auditory ventral and dorsal routes for “what” and “how”

Summary

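For speech, a “how” route is thought to branch off the auditory dorsal pathway, alongside the ventral “what” route. The ventral route runs anteriorly along the temporal lobe, and the more intelligible the speech, the more anterior the activation (Scott & Wise, 2004). The “how” route runs posteriorly along the superior temporal lobe and the inferior parietal lobe (including the angular gyrus) and links speech sounds to motor representations (Hickok & Poeppel, 2004). Damage to this route spares auditory comprehension but impairs repetition (Baldo et al., 2012), and the route has been proposed as the neural basis of Baddeley's **articulatory (phonological) loop**.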

The general distinction between an auditory ventral route (“what”) and an auditory dorsal route (“where”) was introduced earlier in the chapter. One further claim is that, for speech sounds, there is a further branch within the dorsal pathway that comprises a “how” route that links speech sounds with motor representations for producing speech (Hickok & Poeppel, 2004; Rauschecker & Scott, 2009).

The “what” stream runs anteriorly along the temporal lobe and the more speech-like (or intelligible) the auditory stimulus is, the more anterior the activity tends to be when measured with fMRI (Scott & Wise, 2004). The “how” stream runs posteriorly along the superior temporal lobe and the inferior parietal lobe (including the angular gyrus). The parietal and frontal parts of this pathway are assumed to be connected by the white matter tract known as the arcuate fasciculus.

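📊 Figure description

Schematic of the dual routes for speech perception and repetition. Key processing regions are labeled: semantic knowledge in the temporal pole, speech recognition in the anterior STS, and primary auditory cortex in Heschl's gyrus. The ventral “what” route handles lexical-semantic processing, while the dorsal “how” route forms a sensory-motor speech loop that includes the angular gyrus (phonological buffer) and connects to Broca's area (speech production planning) via the arcuate fasciculus.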

Hickok and Poeppel (2004) have suggested that the how route may be the neuroanatomical basis for the articulatory loop (or phonological loop) proposed by Baddeley (1986; Baddeley et al., 1984). This system is a short-term memory store for verbal material and the information in the store is refreshed by subvocal articulation, as in the example of retaining a phone number between looking it up and dialing. Repetition of speech places significant demands on verbal working memory and, as such, seems to depend heavily on the “how” route. Lesions along the “how” pathway, particularly in the posterior STS and angular gyrus, tend to result in deficits in repetition but good auditory comprehension (Baldo et al., 2012; Kuemmerer et al., 2013).
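The refresh-by-rehearsal idea in the phone-number example can be made concrete with a deliberately crude toy model. The decay time (about 2 s) and the speaking rate below are assumptions chosen for illustration, not parameters reported in this chapter, and `survives` is a hypothetical helper.

```python
DECAY_S = 2.0  # assumed: an unrehearsed verbal trace fades after about 2 s

def survives(n_items, delay_s, seconds_per_item=None, decay_s=DECAY_S):
    """Toy articulatory loop: can n_items be held for delay_s seconds?

    seconds_per_item=None means no subvocal rehearsal is possible.
    With rehearsal, items are refreshed one after another, so each trace is
    renewed once per cycle lasting n_items * seconds_per_item seconds.
    """
    if delay_s <= decay_s:
        return True  # nothing has had time to fade yet
    if seconds_per_item is None:
        return False  # no refreshing: every trace fades
    cycle_s = n_items * seconds_per_item
    return cycle_s <= decay_s  # each item is refreshed before it fades

# Phone number: 7 digits, assuming about 0.25 s to say each digit silently.
print(survives(7, 10.0, 0.25))  # True: the 1.75 s cycle beats the 2 s decay
print(survives(7, 10.0))        # False: without rehearsal the traces fade
```

In this sketch, anything that slows subvocal articulation lengthens the rehearsal cycle and so reduces what can be held, which is why the model ties the store to the articulatory system rather than to hearing alone.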

Key Terms

Arcuate fasciculus

A white matter bundle that connects the temporoparietal region to the frontal lobes.

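The **arcuate fasciculus** is a white matter bundle connecting the temporoparietal region with the frontal lobe. It supports information exchange between speech perception and production and forms part of the neural circuitry for phonological short-term memory; damage to it impairs word repetition.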

Articulatory loop

A short-term memory store for verbal material that is refreshed by subvocal articulation.

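The **articulatory loop** (or phonological loop) is a short-term store for verbal material whose contents are refreshed by silent subvocal articulation (Baddeley, 1986). The auditory dorsal “how” route has been proposed as its neural implementation, with left parietal regions supporting the storage component (Buchsbaum et al., 2011).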

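Exam tip

Remembering the three auditory routes: What = ventral = anterior temporal lobe = meaning / Where = dorsal = parietal lobe = space / How = dorsal branch = temporal–parietal–frontal (arcuate fasciculus) = movement. The key point is that hearing adds its own “how” route to the two visual routes (what/where). Damage to the how route → comprehension intact, repetition impaired (resembling conduction aphasia).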

Summary and Key Points of the Chapter

Summary

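Like the visual system, the auditory system proceeds hierarchically from feature extraction (pitch, loudness) to object segregation (stream segregation) to semantic analysis. Some cells in secondary auditory cortex are selective for sound content (“what”) or location (“where”), which are the starting points of the dorsal parietal and ventral temporal routes. Music perception comprises partly separable mechanisms for rhythm, pitch, melody, timbre and emotion, and there is some evidence that (mainly right) temporal regions are specialized for voice recognition. Speech perception extracts categorical information from endlessly variable input and draws on both acoustic processing (matching to stored auditory templates) and motor processing (matching to stored articulatory templates). Speech recognition and repetition use both the ventral “what” route (via meaning) and the dorsal “how” route (for unfamiliar words and verbatim repetition).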

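As with visual perception, hearing involves extracting features (e.g. loudness, pitch) from the sensory signal and segregating them into distinct “objects” (e.g. separating one talker from others in a noisy room).
Cells in secondary auditory cortex may be specialized to different degrees for the content of a sound (“what”) and its location (“where”). These are the starting points of an auditory dorsal/“where” route to the parietal lobe and a ventral/“what” route along the temporal lobe (left-lateralized for speech).
Music perception involves several mechanisms, including rhythm and timing, pitch perception and melody (the perception of pitch patterns). fMRI and lesion studies show that these components have partly separable neural bases.
There is some evidence for a region specialized for voice recognition in the (mainly right) temporal lobe.
Speech recognition extracts categorical information from endlessly variable sensory input (differences in pitch, intonation and articulation across speakers). This can be achieved through acoustic processing (matching to stored auditory templates) and through motor processing (matching to stored articulatory templates).
Speech recognition (and repetition) may depend both on a ventral “what” route via meaning and on a dorsal “how” route for unfamiliar words and verbatim repetition (possibly using the articulatory loop).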

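Check your understanding

Use the following questions to check your understanding:

How are the challenges faced by the auditory system similar to, and different from, those faced by the visual system?

What have single-cell recording studies taught us about how the brain represents auditory information?

What is the evidence for “what”, “where” and “how” routes in hearing?

Does music perception depend on different brain mechanisms from the perception of other auditory stimuli?

In what ways, and why, does speech perception differ from music perception?

What is the evidence for a motor component in speech perception?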
