MiniTest 4 Lecture Summary

Table entries that have not yet been adjusted for this year's particular class sequence are in yellow.

Date Lecture topics Equations
Mon 10/29
  • Relationships between pitches are described by musical intervals: two notes with a specific frequency ratio.
    • This matches Fechner's Law and logarithmic sense of pitch.
  • The most basic musical interval is an octave. Two such notes sound very similar to us, so much so that in music they are given the same name. The frequency of the higher note is double the frequency of the lower note.
  • Two other basic musical intervals are a semitone (white to black key on piano) and a whole tone.
    • Going up (down) by a semitone means multiplying (dividing) the frequency by 1.05946.
    • To go up 2 semitones (1 whole tone), you multiply twice, so the frequency ratio for a whole tone interval is 1.05946^2 = 1.12246.
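The interval arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, assuming equal temperament, where the semitone ratio is the twelfth root of 2 (this matches the quoted 1.05946). The function name `shift_by_semitones` is made up for this example.

```python
# "Adding" musical intervals corresponds to multiplying frequency ratios.
# Assumption: equal temperament, so one semitone is the twelfth root of 2.
SEMITONE = 2 ** (1 / 12)

def shift_by_semitones(frequency_hz, semitones):
    """Return the frequency reached by moving `semitones` up (negative = down)."""
    return frequency_hz * SEMITONE ** semitones

whole_tone_ratio = SEMITONE ** 2   # two semitones
octave_ratio = SEMITONE ** 12      # twelve semitones

print(round(SEMITONE, 5))          # 1.05946
print(round(whole_tone_ratio, 5))  # 1.12246
print(round(octave_ratio, 5))      # 2.0
```

Going down uses a negative count: `shift_by_semitones(880.0, -12)` divides by 2 and lands on 440 Hz.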

octave: f1/f2 = 2

semitone: f1/f2 = 1.05946

whole tone: f1/f2 = 1.12246

Fri 11/2
  • "Adding" musical intervals always corresponds to multiplying frequencies.
    • Changing pitch by some number of intervals corresponds to multiplying by a power of the appropriate ratio.
  • Two tones that are close in pitch (or frequency):
    • To distinguish them when played separately, they need to be different enough so that the peaks of the cochlear excitations are significantly different. This minimum difference is the frequency just noticeable difference, or f-jnd.
      • Figure 7.2 shows how this depends on the frequency. It is small, as little as 1/12 of a semitone.
      • f-jnd is often expressed as a frequency difference, but it is best treated as a musical interval, that is, a ratio of frequencies.
        • In-class example: at 1000 Hz, f-jnd is 5 Hz, so the f-jnd ratio is 1005 Hz/1000 Hz = 1.005
      • Don't confuse with the loudness jnd (from several lectures ago).  That's for two sounds with the same pitch but slightly different intensities.  The loudness jnd is about 1dB SIL.
    • Notes that can be distinguished when played separately may not be distinguishable when played simultaneously. If the two cochlear excitations overlap too much, then together they will seem like just one excitation.
      • Apparent frequency is the average of the two actual frequencies.
      • If the simultaneous tones are less than 15 Hz apart, we hear beats (covered earlier).
      • Beyond a 15 Hz difference, we can't hear the beats. However, the two tones may still be indistinguishable until the frequency difference reaches the fusion frequency difference. At this point, the cochlear excitations are far enough apart to distinguish the two peaks.
        • Thus, the fusion frequency is always less than the critical band.
        • The fusion frequency difference is roughly between a semitone and 2 semitones.
    • Sometimes when two close pitches are played together, you can hear a very low-pitch difference tone (also called a Tartini tone).
      • The difference tone frequency formula is the same as the beat frequency formula.
      • Can't hear a difference tone below 20Hz, since that is the low end of the audible range.
      • There are also other, similar combination tones.
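As a rough numerical companion to the points above, the following Python sketch collects the quantities for two pure tones played together. The function name and dictionary keys are invented for illustration. The 15 Hz and 20 Hz limits are the approximate figures quoted in these notes, not sharp perceptual boundaries.

```python
import math

def describe_two_tones(f1_hz, f2_hz):
    """Summarize what these notes predict for two simultaneous pure tones."""
    difference = abs(f1_hz - f2_hz)      # beat and difference-tone frequency
    apparent = 0.5 * (f1_hz + f2_hz)     # apparent frequency while the tones fuse
    ratio = max(f1_hz, f2_hz) / min(f1_hz, f2_hz)
    return {
        "apparent_hz": apparent,
        "difference_hz": difference,
        "beats_audible": 0 < difference < 15,         # limit quoted in the notes
        "difference_tone_possible": difference >= 20, # low end of audible range
        "interval_semitones": 12 * math.log2(ratio),  # interval as a ratio measure
    }

result = describe_two_tones(1000.0, 1005.0)
print(result["apparent_hz"])     # 1002.5
print(result["difference_hz"])   # 5.0
print(result["beats_audible"])   # True
```

For the in-class f-jnd example (1000 Hz and 1005 Hz), `interval_semitones` comes out near 0.086, which is close to the 1/12 of a semitone mentioned above.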

Structure of the Ear I: Periodicity Theory of Pitch

  • The fact that we can hear difference tones defies the place theory of pitch.  Another part of the story is that there is a periodicity theory of pitch.
    • Ears do detect sound phase, and will perceive pitches based on repetition of patterns.
 
Mon 11/5
  • A complex sound can have a period T and a frequency f=1/T, but have a spectrum with no peak at that frequency f.
    • In this situation, the frequency f will be perceived anyhow, and called a virtual pitch.
    • Such a spectrum will have peaks at multiples of f.  Thus, it appears to be harmonic (equally spaced peaks) except with no fundamental.  It is said to have a missing fundamental.
    • In such a spectrum, the difference tones from the existing peaks match the missing fundamental frequency.  This is one possible explanation for the perception of virtual pitches.
  • A sound which reaches an ear via two different-length paths can result in a pitch perception even if the sound is noise (no spectrum peaks at all).
    • Any feature in the time graph of the sound will arrive at the ear twice, with a time delay Δt determined by the difference in path.
    • A pitch will be perceived at a frequency 1/Δt, called a repetition pitch. That is, the brain interprets the time delay as a period.
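Both pitch effects above reduce to short calculations, sketched here in Python. The function names are hypothetical. The speed of sound (340 m/s) is the value used elsewhere in these notes. Treating the missing fundamental as the greatest common divisor of whole-number peak frequencies is a simplifying assumption made for this example.

```python
from functools import reduce
from math import gcd

SPEED_OF_SOUND = 340.0  # m/s, value used in these notes

def repetition_pitch(path_difference_m, speed=SPEED_OF_SOUND):
    """Pitch heard when a sound arrives twice via paths of different length."""
    delay_s = path_difference_m / speed  # delay = extra distance / speed
    return 1.0 / delay_s                 # f = 1/(time delay)

def virtual_pitch(peak_frequencies_hz):
    """Missing fundamental of whole-number harmonic peaks (assumed to be their GCD)."""
    return reduce(gcd, peak_frequencies_hz)

peaks = [400, 600, 800]                  # harmonic spectrum with no 200 Hz peak
print(virtual_pitch(peaks))              # 200
print(peaks[1] - peaks[0])               # 200, a difference tone at the same frequency
print(round(repetition_pitch(1.7)))      # 200, from a 1.7 m path difference
```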

Vocal Tract Structure

  • The human voice is produced in the vocal tract, for which you should know the following parts: vocal folds, larynx, pharynx, oral cavity, and nasal cavity.
  • We will focus only on continuous vocal sounds, not the ones that last a short time.  For the most part, this means only vowel sounds.
  • Vocal folds (sometimes called vocal cords) are not cords! They are more like lips.
    • Held part way closed, air makes a whooshing sound (noise: no peaks in spectrum). This is the beginning of a whisper.
    • Vocal folds vibrating open and closed makes periodic puffs of air. Versus time, the airflow comes in triangular peaks. The spectrum of this is harmonic with lots of overtones. This is the beginning of regular speech or singing.
  • The spectrum of sound from the vocal folds can only be adjusted to a very limited extent.  The formation of different vowel sounds results from the way the various cavities (pharynx, oral cavity, and nasal cavity) change that sound.

Modifying Timbre: Transducers and Amplifiers

  • As sound signals are moved around (including when we preserve them in recordings or play them on electronic systems), they are modified into different forms.
  • A signal is information that varies with time. Thus it can be represented as a graph of (something) vs. time. We often deal with signals being transmitted.
  • During signal transmission, any piece of equipment that changes the form of the signal (i.e., the vertical axis on the vs. time graph) is a transducer.
  • During signal transmission, any component that keeps the signal in the same form, but changes the vertical size of the graph is an amplifier.
  • For both objects, the response of the piece of equipment is defined as how large the output is relative to the input, expressed as a ratio.
  • We reviewed the relationship between Intensity and Power.
  • There are many different possible units for response, depending on how you measure the input and output. We will simplify things by always measuring either the power or the intensity. That way, the response ratio is unitless (see equation).
  • Response is sometimes also called...
    • efficiency in the context of transducers.  Ideally their response would be 1 (no power loss). But in reality, it is always less.  Efficiency is often expressed as a percentage.
    • sensitivity in the context of microphones.
    • gain in the context of amplifiers, because their response would usually be larger than 1 (power increased).  However, there may be times when gain is less than 1.
  • The symbol we'll use for response will be g, from gain.
  • Ideally, response of transducers and amplifiers would not depend on the shape of the input. In reality, not true. How can we characterize how far from ideal some component is? There are too many possible inputs to try them all. First and most important method: consider pure tones (sine waves) of all different frequencies.
  • Thus, we want to look at the response/gain as a function of frequency. A graph of this is a response curve. The ideal for accurate sound reproduction is to have a flat response curve.
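The idea of building a response curve from pure-tone tests can be sketched as follows. The power values are made-up illustrative numbers, not measurements. The 5% flatness tolerance in `is_flat` is an arbitrary assumption chosen for the example.

```python
def response_curve(input_power_w, output_power_w):
    """Response g = Wout/Win at each pure-tone test frequency."""
    return {f: output_power_w[f] / input_power_w[f] for f in input_power_w}

def is_flat(curve, tolerance=0.05):
    """True if all responses agree to within `tolerance` (fraction of the largest)."""
    gains = list(curve.values())
    return (max(gains) - min(gains)) <= tolerance * max(gains)

# Hypothetical test tones: frequency in Hz -> power in watts (illustrative only).
power_in = {100: 1.0, 1000: 1.0, 10000: 1.0}
power_out = {100: 0.01, 1000: 0.02, 10000: 0.02}

curve = response_curve(power_in, power_out)
efficiency_percent = {f: 100 * g for f, g in curve.items()}

print(curve[100])       # 0.01, i.e. 1% efficiency at 100 Hz
print(is_flat(curve))   # False: the response drops at low frequency
```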

f = 1/Δt (repetition pitch)

I = W/A

g=Wout/Win

 

Fri 11/9 After MiniTest 3...

An Interlude: Speakers and Resonators

  • The part of a speaker that vibrates to make sound is a diaphragm.  For many speakers, the diaphragm is shaped like a very wide cone.
  • A small speaker usually has a response curve that emphasizes high frequencies.  A large speaker usually has a response curve that emphasizes low frequencies.
  • By placing a baffle (a flat surface) around a small speaker, we can both significantly increase the sound output, and improve its performance at lower frequencies.
    • The reason is that the baffle prevents air from moving between the front and back of the speaker.  That kind of air movement allows the speaker diaphragm to move without producing any sound.
  • Acoustic suspension speakers take the baffle idea to an extreme: the back of the diaphragm is completely enclosed in a sealed box, so that front-to-back air movement is completely impossible.
    • The downside: to make the diaphragm move, the inside air must be compressed and expanded.  This requires a lot of work, but does not produce sound.  Thus, acoustic suspension speakers have very low efficiencies.
  • A Helmholtz resonator is a cavity full of air, with a cylindrical neck opening to the outside air.
    • This is really an example of a mass and spring.  The plug of air in the neck has some (very small) mass, and can oscillate by moving in and out of the neck.  The cavity of air acts as a spring, pushing or pulling the neck-plug-air.
    • Therefore, the Helmholtz resonator has a natural frequency of vibration.  If a sound of the right frequency passes the opening, it will cause the (air in the neck of the) resonator to vibrate.
      • Recall "driven oscillations," such as shaking a mass on a spring. The amplitude of motion is biggest when the shaking is at the natural frequency of the mass & spring. We saw this when talking about the structure of the ear and called it resonance.
    • By adding a small hole opposite the neck, you can hold the resonator to your ear and hear sounds as filtered by the resonator.  The response curve has one strong peak, at the resonator's natural frequency.
    • This is how the frequencies in sounds were analyzed before the advent of microphones, oscilloscopes, and FFTs.
  • High quality speaker cabinets will typically combine several speakers of different sizes, each responsible for the frequency ranges where they perform best.
  • A bass reflex speaker takes advantage of the Helmholtz resonator idea to improve upon the acoustic suspension design.
    • A tube is inserted through a hole in the speaker cabinet, called a port.  This allows air in and out of the cabinet, so that efficiency is improved.
    • The port acts like the neck of a Helmholtz resonator, while the speaker cabinet acts like the cavity.  The natural frequency of this combination is tuned to be very low, thus enhancing the response of the speaker where it otherwise would be dropping off.  See Rossing Figure 19.17.
    • In order to have the low resonance frequency, the speaker cabinet has to be fairly large.
  • The speaker we looked at also had a big piece of fiberglass insulation in it.  Why?
    • Speakers have lots of masses on springs:
      • Bass reflex speakers have the air mass-and-spring of the port and cavity.
      • All speaker diaphragms have mass, and must be supported by something springy.
    • Once set to vibrating, masses on springs tend to keep vibrating.  But when a note in music stops, you need your speaker to stop too.
    • The fiberglass is there to absorb energy, so that the speaker stops vibrating when it is supposed to.
    • This is one reason that speakers are intrinsically low efficiency.  In order to have changing sound, you continually have to absorb and discard the energy from the previous sound.
 
Mon 11/12 And now, back to...

Modifying Timbre: Transducers and Amplifiers

  • Characterizing anything that transmits, amplifies, or transduces a sound signal...
    • For complex sounds (not pure tones), use Fourier's theorem to conceptually break inputs into many pure tones (sine waves).  If the component is linear, then we can consider each partial tone to be coming through independently.  So we can use the response curve idea.
    • Any difference between input and output that is worse than a frequency-dependent response is called distortion.
    • While a non-flat frequency response curve can be corrected for (with an equalizer), distortion is extremely difficult or impossible to remove, once introduced to a signal.
  • A sound signal can be characterized by a power spectrum.  This is exactly the same as an intensity spectrum, except that all vertical-axis numbers have been multiplied by some area (for example, the area of a microphone, or ear, or mouth).
  • Given an input power spectrum and a response curve, we can find the output power spectrum.  Each input partial is simply multiplied by the response at the appropriate frequency.  output=input*gain
  • Response, especially when it is called gain, is often expressed in dB instead of a ratio. Unfortunately, it is called the same thing ("gain") whether it is measured as a ratio or a dB difference — you have to figure out which is meant by context.
    • Sound power level (LW) has a definition extremely similar to sound intensity level.
    • In fact, for a given receiver (e.g., microphone, ear) or sound producer (mouth, speaker), power level and intensity level only differ by a constant: LW = LI+constant.
    • Gain or response are expressed in dB using an equation that looks very much like the "comparing intensities" decibel equation.
    • If input and gain are both specified in dB, then output = input + gain
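The two ways of applying gain (multiply ratios, or add decibels) can be checked against each other with a short Python sketch. The helper names are invented for this example, and the input spectrum and the constant gain of 2 are illustrative values only. The reference power is the W0 = 10^-12 W used in these notes.

```python
import math

W0 = 1e-12  # reference power in watts

def ratio_to_db(gain_ratio):
    """g[dB] = (10 dB) log(g)."""
    return 10 * math.log10(gain_ratio)

def db_to_ratio(gain_db):
    """g = 10^(g[dB] / 10 dB)."""
    return 10 ** (gain_db / 10)

def power_level_db(power_w):
    """LW = (10 dB) log(W / W0)."""
    return 10 * math.log10(power_w / W0)

def apply_response(input_spectrum_w, response):
    """Multiply each partial's power by the response at its frequency."""
    return {f: w * response(f) for f, w in input_spectrum_w.items()}

spectrum_in = {200: 1e-6, 400: 5e-7}                   # Hz -> watts (illustrative)
spectrum_out = apply_response(spectrum_in, lambda f: 2.0)

# The dB route gives the same answer: output level = input level + gain in dB.
level_out = power_level_db(spectrum_in[200]) + ratio_to_db(2.0)
print(round(level_out, 2))                             # 63.01
print(round(power_level_db(spectrum_out[200]), 2))     # 63.01
```

A response that varies with frequency works the same way: pass any function of frequency as `response`.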

Vocal Tract Function

  • Each cavity in the vocal tract (pharynx, oral cavity, nasal cavity) is just like a Helmholtz resonator.
    • They each have a resonant frequency.  We can control that resonant frequency by changing the volume (especially in the mouth).
    • Thus each cavity has a response curve with one peak.  These resonance peaks are rather wide and not very tall. They are called formants.
  • When combined, the response curve of the whole vocal tract has all three peaks.
    • The vocal fold signal passes through the cavities, and the spectrum is shaped by this response curve.
    • The effect of this is exactly as with the response curves discussed before: the output sound spectrum has peaks with frequencies determined by the vocal fold vibration frequency, but heights mainly determined by the formants of the response curve.
  • The placement (on the frequency axis) and relationship between the formants determines what vowel sound is being made. We looked at a graph showing the formants of the different vowels for men, women, and children.

Modifying Sounds

  • Four main characteristics of a continuous sound are Duration, Loudness, Pitch, and Timbre.  How would you change each one while not affecting the others?
  • Changing Duration:
    • On a displacement-time graph, you can't just stretch the sound out to longer times, because that also changes the period.
    • Instead, need to change the number of cycles.
  • Changing Loudness:
    • Scale the vertical size of a graph, either displacement (on time graph) or amplitude (in a spectrum)
    • Must scale uniformly, so that the shape of either type of graph is the same.
  • Changing Timbre:
    • On a spectrum, change the relative heights of the peaks.
    • To keep the same overall intensity, the sum of all the intensities of the partials should stay the same.
  • Changing Pitch (the hard part is keeping the timbre the same):
    • Must change the frequency of the fundamental.
    • Generally, should "stretch along the frequency axis", that is, multiply the frequency of all peaks by the same factor.
    • However, only stretching along the frequency axis does NOT preserve the same timbre.  Doing that to human voices makes the "Alvin and the Chipmunks" sound.
    • Time-averaged spectrum: average the spectra of many pitches from the same source.  This represents the timbre of the source without regard to pitch.
    • Roughly, a specific sound from a musical source will have a spectrum with peaks whose frequencies are determined by the pitch, and whose heights are determined by the time averaged spectrum.
    • So to vary pitch while preserving timbre, you need to know the time averaged spectrum as well.
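The difference between a naive pitch shift and a timbre-preserving one can be illustrated with a toy model in Python. Everything here is a sketch under stated assumptions: `example_envelope` is a hypothetical time-averaged spectrum (a single broad bump centred on 1000 Hz), not data from any real instrument or voice, and the function names are invented for the example.

```python
def example_envelope(frequency_hz):
    """Hypothetical time-averaged spectrum: one broad bump centred on 1000 Hz."""
    return 1.0 / (1.0 + ((frequency_hz - 1000.0) / 500.0) ** 2)

def harmonic_spectrum(fundamental_hz, envelope, n_partials=5):
    """Peaks at multiples of the fundamental, heights read off the envelope."""
    return {n * fundamental_hz: envelope(n * fundamental_hz)
            for n in range(1, n_partials + 1)}

def stretch_spectrum(spectrum, factor):
    """Naive pitch shift: multiply every peak frequency by the same factor."""
    return {f * factor: height for f, height in spectrum.items()}

original = harmonic_spectrum(200.0, example_envelope)   # strongest peak at 1000 Hz
naive = stretch_spectrum(original, 2.0)                 # up one octave, bump moves too
preserved = harmonic_spectrum(400.0, example_envelope)  # up one octave, bump stays put

print(max(original, key=original.get))   # 1000.0
print(max(naive, key=naive.get))         # 2000.0
```

In the naive version the strongest partial moves from 1000 Hz to 2000 Hz, which is the "chipmunk" effect. In the preserved version the strongest partials stay near 1000 Hz even though the fundamental has doubled.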

LW=(10dB) log(W/W0)

W= W0 10^(LW/10 dB)

W0 = 10^-12 W

LW = LI+constant

g[dB] =(10dB)log(Wout/Win) =(10dB) log(g)

g= 10^(g[dB]/10 dB)

LW,out = LW,in +g[dB]

Fri 11/16

General Properties of Waves

  • Sound travels through the air as a wave. One example showing this is the way that 2 sounds can cancel each other, rather than reinforcing one another in loudness.
  • A wave is a disturbance, normally of some stuff, which persists over long periods of time but moves the stuff very little, if at all. The stuff is called the medium.
    • An oscillation (for example, a bouncing ball) is not a wave.  A wave is something that has different displacements at different places.
    • Besides sound, specific examples of interest to us are a Slinky wave and a rope wave.
    • The medium needs to have an equilibrium state, and a restoring force (e.g., rope tension, air pressure)
  • There are many ways to categorize waves:
    • traveling waves versus standing waves
    • longitudinal/compression waves versus transverse waves (versus torsional waves versus circular waves versus ...)
    • wave pulse versus continuous wave/wave train, and everything in between such as wave packets
    • one dimensional versus two dimensional versus three dimensional medium.
  • If all wave shapes move at the same speed, the medium is said to be non-dispersive. This implies that the wave speed depends only on the medium.
    • Example of a dispersive medium: surface waves on deep water, where waves of different wavelengths travel at different speeds (long swells outrun shorter waves).
    • Nondispersive is very common, and will be all we do in this class.
  • The equations expressing the speed of waves in media always have the general structure sqrt((restoring force)/(massiveness)). This is also very similar to the equation for frequency of SHM.
    • For the specific case of a transverse wave on a rope/string/spring, the restoring force part is the tension force in the medium, and the massiveness part is the linear mass density, denoted by μ, which is the mass per unit length.
    • Often, you must convert g to kg when calculating linear mass densities.
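A short worked example of the string wave speed, including the gram-to-kilogram conversion, might look like this in Python. The function names and the numbers (a 5 g, 2 m string under 10 N of tension) are illustrative assumptions.

```python
import math

def linear_mass_density(mass_g, length_m):
    """Linear mass density = m/L in kg/m, converting grams to kilograms first."""
    return (mass_g / 1000.0) / length_m

def wave_speed(tension_n, mu_kg_per_m):
    """Speed of a transverse wave on a string: sqrt(tension / linear mass density)."""
    return math.sqrt(tension_n / mu_kg_per_m)

mu = linear_mass_density(mass_g=5.0, length_m=2.0)   # 0.0025 kg/m
v = wave_speed(tension_n=10.0, mu_kg_per_m=mu)       # sqrt(4000) m/s

print(mu)            # 0.0025
print(round(v, 1))   # 63.2
```

Forgetting the conversion (using 5 instead of 0.005 kg) would make the speed too small by a factor of sqrt(1000), about 32.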

v = sqrt(FT/μ)

μ=m/L

Equations

HINT: It is more effective to memorize these as relations between concepts.  It is less effective to memorize these as strings of letters.

To memorize:
  • s = d/t
  • ssound=340 m/s
  • f = 1/T
  • Δ(anything)=(thing)final-(thing)start
  • savg = d/t
  • vavg = Δx/Δt
  • aavg = Δv/Δt
  • F = m a
  • A = App/2
  • Δφ∝Δt
  • fn = n f1
  • fheard = 0.5 (f1 +f2)
  • fbeat = |f1 -f2|
  • W = E/t
  • I = W/A
  • IN = NI1
  • A ∝ d^2
  • circle: A = πr^2
  • I0 = 10^-12 W/m^2
  • {E, W, I} ∝ {A, App, vmax}^2
  • (SIL individual threshold) = (SIL normal threshold) + HL
  • octave: f1/f2 = 2
  • f = 1/Δt (repetition pitch)
  • W0 = 10^-12 W
  • LW = LI+constant
  • LW,out = LW,in +g[dB]
On Equation Sheet:
  • π = 3.1415
  • g = 9.81 m/s^2
  • f = N/t
  • F = –k Δx
  • Δx = A cos(φ)
  • Δx = A cos[(360°/T) t + φ0]
  • vavg,p-p = 4A /T
  • vmax = (π/2) vavg,p-p= 2π A /T
  • f = 1/(2π) sqrt(k/m)
  • Acomb= A1 +A2  (in phase case)
  • Acomb= |A1 -A2|  (out of phase case)
  • PE = 0.5 k (Δx)^2
  • KE = 0.5 m v^2
  • E = PEmax = 0.5 k A^2
  • sphere: A = 4πr^2
  • hemisphere: A = 2πr^2
  • I ∝ 1/r^2
  • LI = (10 dB) log(I/I0)
  • I = I0 10^(LI/10 dB)
  • ΔLI = LI1-LI2 = (10 dB)log(I1/I2)
  • I1/I2 = 10^(ΔLI/10 dB)
  • semitone: f1/f2 = 1.05946
  • whole tone: f1/f2 = 1.12246
  • g=Wout/Win
  • LW=(10dB) log(W/W0)
  • W= W0 10^(LW/10 dB)
  • g[dB] =(10dB)log(Wout/Win) =(10dB) log(g)
  • g= 10^(g[dB]/10 dB)
  • v = sqrt(FT/μ)
  • μ=m/L