Pitch detection algorithm


A pitch detection algorithm (PDA) is an algorithm designed to estimate the pitch or fundamental frequency of a quasiperiodic or oscillating signal, usually a digital recording of speech or a musical note or tone. This can be done in the time domain, the frequency domain, or both.
PDAs are used in various contexts and so there may be different demands placed upon the algorithm. There is as yet no single ideal PDA, so a variety of algorithms exist, most falling broadly into the classes given below.
A PDA typically estimates the period of a quasiperiodic signal, then inverts that value to give the frequency.

General approaches

One simple approach is to measure the distance between zero-crossing points of the signal. However, this does not work well with noisy data or with complicated waveforms composed of multiple sine waves with differing periods. Nevertheless, there are cases in which zero-crossing can be a useful measure, e.g. in some speech applications where a single source is assumed. The algorithm's simplicity makes it "cheap" to implement.
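As an illustration only, the zero-crossing idea can be sketched in Python as follows. The function name, the use of rising crossings only, and the linear interpolation between samples are assumptions made for this sketch, not part of any particular published method.

```python
import math

def zero_crossing_pitch(samples, sample_rate):
    """Estimate frequency in Hz from the mean spacing of rising zero crossings.

    Returns None when fewer than two crossings are found.
    """
    crossings = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if prev < 0.0 <= cur:
            # Linearly interpolate the sub-sample position of the crossing.
            frac = -prev / (cur - prev)
            crossings.append(i - 1 + frac)
    if len(crossings) < 2:
        return None
    # Mean period in samples, then inverted to give a frequency.
    mean_period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sample_rate / mean_period

# Usage: a clean 220 Hz sine sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 220 * n / sample_rate + 0.3) for n in range(2000)]
zc_estimate = zero_crossing_pitch(tone, sample_rate)
```

On a clean single tone such as this one the estimate lands very close to 220 Hz; adding noise or strong harmonics introduces extra crossings and the estimate degrades, which is the weakness described above.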
More sophisticated approaches compare segments of the signal with other segments offset by a trial period to find a match. AMDF (average magnitude difference function), ASMDF (average squared mean difference function), and other similar autocorrelation algorithms work this way. These algorithms can give quite accurate results for highly periodic signals. However, they have false detection problems (often "octave errors"), can sometimes cope badly with noisy signals, and, in their basic implementations, do not deal well with polyphonic sounds.
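A minimal sketch of the trial-period comparison, using the average magnitude difference function, might look like the following. The search range (f_min, f_max) and the rule used to pick the lag (take the first lag that falls below a tolerance, then walk down to the local minimum) are assumptions of this sketch; the selection rule is added only so that a multiple of the true period is not chosen by accident.

```python
import math

def amdf_pitch(samples, sample_rate, f_min=80.0, f_max=500.0, tolerance=0.1):
    """Estimate frequency in Hz with the average magnitude difference function."""
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    n = len(samples) - max_lag  # number of sample pairs compared per lag
    if n <= 0:
        raise ValueError("frame is too short for the requested f_min")

    # d(lag) = mean |x[i] - x[i + lag]|; small when lag matches the period.
    diffs = []
    for lag in range(min_lag, max_lag + 1):
        total = sum(abs(samples[i] - samples[i + lag]) for i in range(n))
        diffs.append(total / n)

    # Assumed selection rule: first lag under the threshold, then descend
    # to the nearest local minimum.
    lo, hi = min(diffs), max(diffs)
    threshold = lo + tolerance * (hi - lo)
    k = next(i for i, d in enumerate(diffs) if d <= threshold)
    while k + 1 < len(diffs) and diffs[k + 1] < diffs[k]:
        k += 1
    return sample_rate / (min_lag + k)

# Usage: a 200 Hz tone with two overtones, sampled at 8 kHz (period = 40 samples).
sample_rate = 8000
frame = [
    math.sin(2 * math.pi * 200 * t / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 400 * t / sample_rate)
    + 0.25 * math.sin(2 * math.pi * 600 * t / sample_rate)
    for t in range(800)
]
amdf_estimate = amdf_pitch(frame, sample_rate)
```

Because the function is near zero at every multiple of the period (here lags 40 and 80), a plain global minimum could just as easily return half the true frequency, which is one source of the false detections noted above.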
Current time-domain pitch detection algorithms tend to build upon the basic methods mentioned above, with additional refinements to bring the performance more in line with a human assessment of pitch. For example, the YIN algorithm and the MPM algorithm are both based upon autocorrelation.

Frequency-domain approaches

In the frequency domain, polyphonic detection is possible, usually utilizing the periodogram to convert the signal to an estimate of the frequency spectrum. This requires more processing power as the desired accuracy increases, although the well-known efficiency of the FFT, a key part of the periodogram algorithm, makes it suitably efficient for many purposes.
Popular frequency domain algorithms include: the harmonic product spectrum; cepstral analysis; maximum likelihood, which attempts to match the frequency domain characteristics to pre-defined frequency maps; and the detection of peaks due to harmonic series.
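Of the algorithms listed, the harmonic product spectrum is the simplest to sketch: the magnitude spectrum is multiplied by copies of itself compressed by factors 2, 3, and so on, so that the harmonics line up at the fundamental. The sketch below is illustrative only. It uses a naive O(N²) DFT to stay dependency-free (a practical implementation would use an FFT), and the Hann window, the number of harmonics, and all function names are assumptions.

```python
import cmath
import math

def hann_magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 of a Hann-windowed frame (naive DFT)."""
    n = len(frame)
    windowed = [
        x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
        for i, x in enumerate(frame)
    ]
    return [
        abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

def hps_pitch(frame, sample_rate, num_harmonics=3):
    """Estimate frequency in Hz as the bin maximizing the harmonic product."""
    spectrum = hann_magnitude_spectrum(frame)
    limit = (len(spectrum) - 1) // num_harmonics  # keep k * h inside the spectrum
    best_bin, best_val = 1, 0.0
    for k in range(1, limit + 1):
        product = 1.0
        for h in range(1, num_harmonics + 1):
            product *= spectrum[k * h]
        if product > best_val:
            best_bin, best_val = k, product
    return best_bin * sample_rate / len(frame)

# Usage: 128 Hz tone with four harmonics, 512 samples at 4096 Hz (8 Hz per bin).
sample_rate = 4096
amplitudes = [1.0, 0.5, 0.33, 0.25]
frame = [
    sum(a * math.sin(2 * math.pi * 128 * (h + 1) * t / sample_rate)
        for h, a in enumerate(amplitudes))
    for t in range(512)
]
hps_estimate = hps_pitch(frame, sample_rate)
```

The result is quantized to the bin spacing (sample_rate / N), so on its own this method cannot resolve pitch more finely than the DFT bins allow.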
To improve on the pitch estimate derived from the discrete Fourier spectrum, techniques such as spectral reassignment (phase based) or Grandke interpolation (magnitude based) can be used to go beyond the precision provided by the FFT bins. Another phase-based approach is offered by Brown and Puckette.
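A magnitude-based refinement of this kind can be sketched as follows. It assumes a Hann window and uses the ratio formula commonly attributed to Grandke: with alpha the ratio of the larger neighbouring bin magnitude to the peak bin magnitude, the fractional bin offset is (2·alpha − 1) / (alpha + 1). The function names and the naive DFT are assumptions made to keep the example self-contained.

```python
import cmath
import math

def hann_magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 of a Hann-windowed frame (naive DFT)."""
    n = len(frame)
    windowed = [
        x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
        for i, x in enumerate(frame)
    ]
    return [
        abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

def interpolated_peak_frequency(frame, sample_rate):
    """Refine the strongest spectral peak using the two-bin magnitude ratio."""
    mags = hann_magnitude_spectrum(frame)
    k = max(range(1, len(mags) - 1), key=lambda i: mags[i])
    if mags[k] == 0.0:
        return None  # silent frame: no peak to refine
    if mags[k + 1] >= mags[k - 1]:
        alpha = mags[k + 1] / mags[k]
        delta = (2 * alpha - 1) / (alpha + 1)   # true peak lies above bin k
    else:
        alpha = mags[k - 1] / mags[k]
        delta = -(2 * alpha - 1) / (alpha + 1)  # true peak lies below bin k
    return (k + delta) * sample_rate / len(frame)

# Usage: 130 Hz sine, 512 samples at 4096 Hz, so the tone sits at bin 16.25.
sample_rate = 4096
frame = [math.sin(2 * math.pi * 130 * t / sample_rate + 0.4) for t in range(512)]
refined_estimate = interpolated_peak_frequency(frame, sample_rate)
```

Here the nearest bin alone would report 128 Hz, whereas the interpolated value recovers a frequency much closer to 130 Hz from the same spectrum.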

Spectral/temporal approaches

Spectral/temporal pitch detection algorithms, such as the YAAPT pitch tracking algorithm, are based upon a combination of time domain processing using an autocorrelation function such as normalized cross-correlation, and frequency domain processing utilizing spectral information to identify the pitch. Then, among the candidates estimated from the two domains, a final pitch track can be computed using dynamic programming. The advantage of these approaches is that the tracking error in one domain can be reduced by the process in the other domain.
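The final dynamic programming step can be illustrated with a generic Viterbi-style search over per-frame candidates. This is a sketch under stated assumptions and not YAAPT's actual cost design: the candidate format (frequency, local cost), the octave-distance transition cost, and the name track_pitch are all hypothetical.

```python
import math

def track_pitch(candidates, transition_weight=1.0):
    """Pick one candidate per frame, minimizing local plus transition cost.

    candidates: list of frames, each a list of (frequency_hz, local_cost)
    pairs, e.g. pooled from a time-domain and a frequency-domain estimator.
    """
    # best[i][j] = (lowest total cost of a path ending at candidate j, back-pointer)
    best = [[(cost, None) for _, cost in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for freq, cost in candidates[i]:
            options = []
            for j, (prev_freq, _) in enumerate(candidates[i - 1]):
                # Assumed transition cost: size of the pitch jump in octaves.
                jump = abs(math.log2(freq / prev_freq))
                options.append((best[i - 1][j][0] + transition_weight * jump, j))
            total, back = min(options)
            row.append((total + cost, back))
        best.append(row)

    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda idx: best[-1][idx][0])
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j][0])
        j = best[i][j][1]
    path.reverse()
    return path

# Usage: in the middle frame the octave-error candidate (204 Hz) has the lower
# local cost, but the transition cost keeps the track near 100 Hz.
frames = [
    [(100.0, 0.2), (200.0, 0.3)],
    [(102.0, 0.3), (204.0, 0.25)],
    [(101.0, 0.2), (50.5, 0.4)],
]
pitch_track = track_pitch(frames)
```

Choosing the cheapest candidate in each frame independently would jump to 204 Hz in the second frame; penalizing large jumps between frames is what lets evidence from neighbouring frames correct it.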

Speech pitch detection

The fundamental frequency of speech can vary from 40 Hz for low-pitched male voices to 600 Hz for children or high-pitched female voices.
Autocorrelation methods need at least two pitch periods to detect pitch. This means that in order to detect a fundamental frequency of 40 Hz, at least 50 milliseconds of the speech signal must be analyzed. However, during 50 ms, speech with higher fundamental frequencies may not necessarily have the same fundamental frequency throughout the window.
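The arithmetic behind these figures can be checked with a short calculation; the helper names below are hypothetical and the two-period requirement is taken from the text above.

```python
def min_window_ms(f0_hz, periods=2):
    """Shortest analysis window in milliseconds that spans `periods` pitch periods."""
    return 1000.0 * periods / f0_hz

def periods_in_window(f0_hz, window_ms):
    """Number of pitch periods of f0_hz that fit in a window of window_ms."""
    return f0_hz * window_ms / 1000.0

window_for_40_hz = min_window_ms(40.0)                    # 50.0 ms
periods_at_600_hz = periods_in_window(600.0, 50.0)        # 30.0 periods
```

A window long enough for a 40 Hz voice therefore spans 30 periods of a 600 Hz voice, which leaves ample room for the pitch to drift within a single analysis frame.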