The Yule–Simon distribution arose originally as the limiting distribution of a particular stochastic process studied by Yule as a model for the distribution of biological taxa and subtaxa. Simon dubbed this process the "Yule process", but it is more commonly known today as a preferential attachment process. The preferential attachment process is an urn process in which balls are added to a growing number of urns, each ball being allocated to an urn with probability linear in the number of balls the urn already contains.

The distribution also arises as a compound distribution, in which the parameter of a geometric distribution is treated as a function of a random variable having an exponential distribution. Specifically, assume that $W$ follows an exponential distribution with scale $1/\rho$ or rate $\rho$:

$$W \sim \operatorname{Exponential}(\rho),$$

with density

$$h(w;\rho) = \rho \exp(-\rho w).$$

Then a Yule–Simon distributed variable $K$ has the following geometric distribution conditional on $W$:

$$K \sim \operatorname{Geometric}(\exp(-W)).$$

The pmf of a geometric distribution is

$$g(k;p) = p(1-p)^{k-1}$$

for $k \in \{1, 2, \dots\}$. The Yule–Simon pmf is then the following exponential-geometric compound distribution:

$$f(k;\rho) = \int_0^{\infty} g(k;\exp(-w))\, h(w;\rho)\, dw.$$

The maximum likelihood estimator for the parameter $\rho$ given the observations $k_1, k_2, \dots, k_N$ is the solution to the fixed point equation

$$\rho^{(t+1)} = \frac{N + a - 1}{b + \sum_{i=1}^{N} \sum_{j=1}^{k_i} \frac{1}{\rho^{(t)} + j}},$$

where $b = 0$ and $a = 1$ are the rate and shape parameters of the gamma distribution prior on $\rho$.

This algorithm is derived by Garcia by directly optimizing the likelihood. Roberts and Roberts generalize the algorithm to Bayesian settings using the compound geometric formulation described above. They also use the expectation-maximisation (EM) framework to show convergence of the fixed point algorithm, derive the sub-linearity of its convergence rate, and use the EM formulation to give two alternative derivations of the standard error of the estimator obtained from the fixed point equation. The variance of the estimator $\hat{\rho}$ is

$$\operatorname{Var}(\hat{\rho}) = \frac{1}{\dfrac{N}{\hat{\rho}^{2}} - \sum_{i=1}^{N} \sum_{j=1}^{k_i} \dfrac{1}{(\hat{\rho} + j)^{2}}},$$

and the standard error is the square root of the quantity of this estimate divided by $N$.
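The compound construction and the fixed point estimator can be sketched together in a few lines of Python. This is a minimal illustration, not a reference implementation. It draws W from an exponential distribution with rate rho and then K from a geometric distribution with success probability exp(-W). It then repeats the update rho <- (N + a - 1) / (b + sum_i sum_{j=1..k_i} 1/(rho + j)) until the value stops changing. The function names, the starting value `rho0`, the tolerance, and the iteration cap are assumptions made for the example and do not come from the sources cited above.

```python
import math
import random


def sample_yule_simon(rho, rng):
    """Draw one variate via the exponential-geometric compound (illustrative)."""
    w = rng.expovariate(rho)      # W ~ Exponential(rate=rho)
    p = math.exp(-w)              # success probability of the geometric
    if p >= 1.0:                  # W == 0 would make log1p(-p) undefined
        return 1
    # Inverse-CDF draw of K ~ Geometric(p) on {1, 2, ...}.
    # Assumes rho is not so small that exp(-w) underflows to zero.
    u = 1.0 - rng.random()        # u lies in (0, 1]
    return max(1, math.ceil(math.log(u) / math.log1p(-p)))


def fit_rho(ks, a=1.0, b=0.0, rho0=1.0, tol=1e-10, max_iter=1000):
    """Iterate the fixed point update; a=1, b=0 gives the maximum likelihood case."""
    n = len(ks)
    rho = rho0
    for _ in range(max_iter):
        s = sum(1.0 / (rho + j) for k in ks for j in range(1, k + 1))
        new_rho = (n + a - 1.0) / (b + s)
        if abs(new_rho - rho) < tol:
            return new_rho
        rho = new_rho
    return rho


def variance_of_estimate(rho_hat, ks):
    """Inverse of N / rho^2 - sum_i sum_j 1 / (rho + j)^2, evaluated at rho_hat."""
    n = len(ks)
    curvature = sum(1.0 / (rho_hat + j) ** 2 for k in ks for j in range(1, k + 1))
    return 1.0 / (n / rho_hat ** 2 - curvature)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [sample_yule_simon(2.0, rng) for _ in range(2000)]
    rho_hat = fit_rho(data)
    print("estimate:", rho_hat, "variance:", variance_of_estimate(rho_hat, data))
```

The double sum is recomputed from scratch on every iteration, which is fine for a few thousand observations. For larger samples, counting how many observations satisfy k_i >= j once and reusing those counts would avoid the repeated inner loop.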
Generalizations
The two-parameter generalization of the original Yule distribution replaces the beta function with an incomplete beta function. The probability mass function of the generalized Yule–Simon($\rho$, $\alpha$) distribution is defined as

$$f(k;\rho,\alpha) = \frac{\rho}{1-\alpha^{\rho}}\, \mathrm{B}_{1-\alpha}(k, \rho+1),$$

with $0 \le \alpha < 1$. For $\alpha = 0$ the ordinary Yule–Simon($\rho$) distribution is obtained as a special case. The use of the incomplete beta function has the effect of introducing an exponential cutoff in the upper tail.
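As a rough numerical check of the generalized pmf, the sketch below evaluates rho / (1 - alpha^rho) * B_{1-alpha}(k, rho + 1), where B_x(a, b) is the integral of t^(a-1) (1-t)^(b-1) from 0 to x. The incomplete beta function is approximated here with a composite Simpson rule so that only the standard library is needed. The function names and the number of subintervals are assumptions for illustration, and a dedicated special-function routine would normally be preferable.

```python
def incomplete_beta(x, a, b, n=2000):
    """Approximate B_x(a, b) with composite Simpson; assumes a >= 1, b >= 1, n even."""
    if x <= 0.0:
        return 0.0
    h = x / n

    def f(t):
        return t ** (a - 1) * (1.0 - t) ** (b - 1)

    total = f(0.0) + f(x)
    for i in range(1, n):
        # Simpson weights: 4 on odd interior nodes, 2 on even interior nodes.
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0


def generalized_yule_simon_pmf(k, rho, alpha):
    """pmf of the two-parameter form for integer k >= 1, rho > 0, 0 <= alpha < 1."""
    return rho / (1.0 - alpha ** rho) * incomplete_beta(1.0 - alpha, k, rho + 1.0)


if __name__ == "__main__":
    # With alpha > 0 the tail is cut off quickly, so a few hundred terms
    # should capture essentially all of the probability mass.
    total = sum(generalized_yule_simon_pmf(k, 2.0, 0.3) for k in range(1, 201))
    print("sum of pmf over k = 1..200:", total)
```

Setting `alpha` to 0 makes the upper limit of integration equal to 1, so the function reduces to rho times the complete beta function B(k, rho + 1), which is the ordinary Yule–Simon case.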