The likelihood principle was formalized by Birnbaum in his famous theorem:

  • Theorem
    The formal likelihood principle follows from the weak
    conditionality principle and the sufficiency principle.
    

There, Birnbaum defines a symbol \(Ev \left( E,x \right)\) that gives a more formal view of the evidential meaning of a specified sample \(\left( E, x \right)\). With this symbol he was able to state the essential properties of the evidence carried by an outcome \(x\) of an experiment \(E\).

He then defined an equivalence between samples from different experiments, written \(Ev\left(E, x\right) = Ev\left(E', y\right)\). This allowed him to state the principle of sufficiency.
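To see what this equivalence means in practice, here is the classic numeric sketch (the counts are an assumed example, not Birnbaum's notation): observing 3 successes in 12 Bernoulli trials under a fixed-sample-size design, or sampling until the 3rd success and needing 12 trials, produces likelihood functions that differ only by a constant factor in \(\theta\), which is what \(Ev\left(E, x\right) = Ev\left(E', y\right)\) asserts.

```python
import numpy as np
from scipy.stats import binom, nbinom

# Two experiments that both end with 3 successes in 12 trials:
#   E : binomial          -- fix n = 12 trials, count k = 3 successes
#   E': negative binomial -- sample until r = 3 successes, needing 12 trials
theta = np.linspace(0.01, 0.99, 5)

lik_E = binom.pmf(3, n=12, p=theta)            # L_E(theta)
lik_Eprime = nbinom.pmf(12 - 3, n=3, p=theta)  # L_E'(theta): 9 failures before the 3rd success

# The ratio is constant in theta, so both samples carry
# the same evidential meaning about theta.
print(lik_E / lik_Eprime)  # [4. 4. 4. 4. 4.]
```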

  • The Principle of Sufficiency

    Given an experiment \(E\) with outcomes \(x\) and a sufficient statistic \(t = t\left( x \right)\) summarizing the basic properties of the samples \(x\), and if \(E'\) is the experiment, derived from \(E\), which uses \(t\) to represent any outcome \(x\) of \(E\), then \(Ev \left( E,x \right) = Ev \left( E',t \right)\) for all \(x\) (a numeric sketch follows below).

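A minimal sketch of the principle, under an assumed Bernoulli experiment where \(t\left(x\right) = \sum_i x_i\) is sufficient: the likelihood computed from the full sample \(x\) and the likelihood computed from \(t\) alone, the derived experiment \(E'\), are proportional in \(\theta\), which is the content of \(Ev \left( E,x \right) = Ev \left( E',t \right)\).

```python
import numpy as np
from scipy.stats import binom

# E : observe a Bernoulli(theta) sample x of size n
# E': the derived experiment that records only t = sum(x)
x = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
n, t = len(x), x.sum()

theta = np.linspace(0.01, 0.99, 5)

# Likelihood under E: depends on x only through t
lik_E = theta**t * (1 - theta)**(n - t)

# Likelihood under E': the binomial pmf of t
lik_Eprime = binom.pmf(t, n=n, p=theta)

# Constant ratio comb(10, 6) = 210, hence Ev(E, x) = Ev(E', t)
print(lik_Eprime / lik_E)  # [210. 210. 210. 210. 210.]
```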
An interesting part of the principle of sufficiency is the word "derived", because it can cover many possible ways of deriving an experiment. A sufficient statistic can thus be seen as a connection between any of the experiments that could be generated from it. If we decide to go Bayesian, with \(p_{E'}\) a distribution based on \(E'\), we could say that the likelihoods of \(x\) and of \(t\) agree, and hence so do the posteriors:

\[p_{E'}\left(x \mid \theta'\right) = p_{E'}\left(t \mid \theta'\right) \quad\Longrightarrow\quad p_{E'}\left(\theta' \mid x\right) = p_{E'}\left(\theta' \mid t\right)\]

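To make the Bayesian reading concrete, a small grid sketch (the Beta(2, 2) prior and the sample are illustrative assumptions): the posterior computed from the full sample and the posterior computed from \(t\) alone coincide.

```python
import numpy as np
from scipy.stats import binom

# Grid over theta and an (assumed) Beta(2, 2) prior
theta = np.linspace(0.001, 0.999, 999)
prior = theta * (1 - theta)  # Beta(2, 2) kernel

x = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
n, t = len(x), x.sum()

# Posterior from the full sample x
post_x = prior * theta**t * (1 - theta)**(n - t)
post_x /= post_x.sum()

# Posterior from the sufficient statistic t alone
post_t = prior * binom.pmf(t, n=n, p=theta)
post_t /= post_t.sum()

# Conditioning on x or on t gives the same inference
print(np.allclose(post_x, post_t))  # True
```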
Thus, if such a \(t\) is found in the experiment \(E\), is there any interest in extending it? After all, once we know \(t\) we have a good summary of the data, so the interest in extending \(E\) diminishes greatly given the cost of such experiments. For example, you could be a company storing data about manufacturing errors, and you could decide to build an estimator of those errors. You could be quite happy, believing that such an estimator is sufficient. However, somebody, as always, comments: given that the production lines do not repeat models, how can such a sufficient statistic exist? Yes, it exists for the past and present data samples, but not for the future ones. This situation is quite problematic, given that the Holy Grail of estimation, from the Gaussian point of view, is to be able to summarize the data.

Actually, the Pitman–Koopman–Darmois theorem points to exactly this problem: among models whose support does not depend on the parameter, only the exponential family admits a sufficient statistic whose dimension stays bounded as the sample size grows. But as anybody can testify, chaotic events tend to point to a non-Gaussian universe, i.e. the variance simply is not bounded. A natural question arises: under such conditions, how well can the likelihood principle work? Is there a way to measure this? A fascinating problem, to say the least.
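As a hedged numeric illustration of that limitation: for a Cauchy location model, which lies outside the exponential family, two assumed samples sharing the same mean produce log-likelihoods whose difference varies with \(\theta\), so no fixed low-dimensional summary can stand in for the full sample.

```python
import numpy as np
from scipy.stats import cauchy

theta = np.linspace(-3, 3, 7)

# Two assumed samples with the same sample mean (0.0)
x1 = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
x2 = np.array([-4.0, -0.5, 0.0, 0.5, 4.0])

def cauchy_loglik(sample, grid):
    # Log-likelihood of a location-Cauchy model over a grid of theta
    return np.array([cauchy.logpdf(sample, loc=th).sum() for th in grid])

diff = cauchy_loglik(x1, theta) - cauchy_loglik(x2, theta)

# If the mean were sufficient, diff would be constant in theta; it is not.
print(diff)
```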