Question: [28] The model class of computable probability mass functions (probability models) consists of the set of functions P : {0, 1}∗ → [0, 1] with ∑_x P(x) = 1. 'Computable' means here that there is a Turing machine T_P that, given x and a positive rational ε, computes P(x) within precision ε. The prefix complexity K(P) of a computable (possibly partial) function P is defined by K(P) = min_i {K(i) : Turing machine T_i computes P}. A string x is typical for a distribution P if the randomness deficiency δ(x|P) = log 1/P(x) − K(x|P) is small. The conditional complexity K(x|P) is defined as follows. Say that a function A approximates P if |A(y, ε) − P(y)| < ε for every y and every positive rational ε. Then K(x|P) is the minimum length of a program that, given any function A approximating P as an oracle, prints x. Similarly, P is c-optimal for x if K(P) + log 1/P(x) ≤ K(x) + c.
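The 'computable' requirement above can be made concrete with a toy example (ours, not part of the exercise): the sketch below fixes the probability mass function P(x) = 2^(−2l(x)−1) over binary strings, which sums to 1, and exposes the interface from the definition, a routine that on input x and a positive rational ε returns P(x) within precision ε. For this particular P the value is an exact rational, so the precision requirement is trivially met; in general only the ε-approximation is required. The names P and approx_P are illustrative.

    from fractions import Fraction

    def P(x: str) -> Fraction:
        # Toy pmf over binary strings: P(x) = 2^(-2*l(x)-1).
        # Normalization: summing over lengths n gives 2^n * 2^(-2n-1) = 2^(-n-1), and these sum to 1.
        return Fraction(1, 2 ** (2 * len(x) + 1))

    def approx_P(x: str, eps: Fraction) -> Fraction:
        # The interface from the definition: return a rational within eps of P(x).
        # Here P(x) is itself rational, so the exact value is returned.
        assert eps > 0
        return P(x)

    # Under this toy model the data-to-model code length log 1/P(x) for x = '0110'
    # is 2*4 + 1 = 9 bits.
    print(approx_P('0110', Fraction(1, 1024)))   # 1/512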

(a) Show that, for every x and every finite set S ∋ x, there is a computable probability mass function P with log 1/P(x) = log d(S), δ(x|P) = δ(x|S) + O(1), and K(P) = K(S) + O(1). (One candidate construction is sketched after Item (c) below.)

(b) Show that there is a constant c such that for every string x the following holds: for every computable probability mass function P there is a finite set S ∋ x such that log d(S) < log 1/P(x) + 1, δ(x|S) ≤ δ(x|P) + K(log 1/P(x)) + c, and K(S) ≤ K(P, log 1/P(x)) + c.

(c) Show that we can restrict consideration to models P such that log 1/P(x) ≤ l(x) + 1, in which case the additive terms in Item (b)
are O(log l(x)).
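As a complement to the hints below (this construction is not spelled out in the excerpt, but it is the natural candidate for Item (a)): take P to be the uniform distribution on S, that is, P(y) = 1/d(S) for y ∈ S and P(y) = 0 otherwise. Then log 1/P(x) = log d(S) exactly. A list of S suffices to compute P, and from any approximation oracle for P one can recover a list of S (its elements are exactly the strings of probability 1/d(S)), so K(P) = K(S) + O(1) and K(x|P) = K(x|S) + O(1). Hence δ(x|P) = log 1/P(x) − K(x|P) = log d(S) − K(x|S) + O(1) = δ(x|S) + O(1).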
Comments. Hint for Item (b): Let m = ⌊log 1/P(x)⌋, that is, 2^(−m−1) < P(x) ≤ 2^(−m). Define S = {y : P(y) > 2^(−m−1)}. Then d(S) < 2^(m+1) ≤ 2/P(x), which implies the claimed value for log d(S). To list S it suffices to compute all consecutive values of P(y) to sufficient precision until the combined probabilities exceed 1 − 2^(−m−1) (a sketch of this listing procedure appears after this paragraph). That is, K(S) ≤ K(P, m) + O(1). Finally, δ(x|S) = log d(S) − K(x|S) < log 1/P(x) − K(x|S) + 1 = δ(x|P) + K(x|P) − K(x|S) + 1 ≤ δ(x|P) + K(S|P) + O(1). The term K(S|P) can be upper bounded by K(m) + O(1), which implies the claimed bound for δ(x|S).
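The following is a minimal sketch of that listing procedure (ours, not from the source), assuming a hypothetical oracle approx_P(y, eps) that returns a rational within eps of P(y). It scans strings in length-lexicographic order with a shrinking precision budget, adds y only when the approximation certifies P(y) > 2^(−m−1), and stops once the certified mass exceeds 1 − 2^(−m−1); termination uses ∑_y P(y) = 1. Strings with P(y) barely above 2^(−m−1) may be omitted at finite precision, which is harmless: x is always included because P(x) > 2^(−m), and every member still has P(y) > 2^(−m−1), so d(S) < 2^(m+1). The routine needs only a program for P and the number m, which is what makes K(S) ≤ K(P, m) + O(1) concrete.

    from fractions import Fraction
    from itertools import count, product

    def list_S(approx_P, m):
        # approx_P(y, eps): hypothetical oracle returning a rational within eps of P(y).
        threshold = Fraction(1, 2 ** (m + 1))            # 2^(-m-1)
        S, certified_mass, i = [], Fraction(0), 0
        for n in count(0):                               # scan strings in length-lexicographic order
            for bits in product('01', repeat=n):
                y = ''.join(bits)
                eps = Fraction(1, 2 ** (m + 4 + i))      # shrinking precision; total budget below 2^(-m-3)
                i += 1
                p = approx_P(y, eps)                     # |p - P(y)| <= eps
                if p - eps > threshold:                  # certifies P(y) > 2^(-m-1)
                    S.append(y)
                certified_mass += max(p - eps, Fraction(0))   # certified lower bound on the mass seen so far
                if certified_mass > 1 - threshold:
                    # The unseen strings carry total mass < 2^(-m-1), so none of them
                    # can pass the certification test; the listing is complete.
                    return S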
Hint for Item (c): for every n, every string x of length n, and every P, consider P′ defined by P′(x) = (P(x) + 2^(−n))/2 (the arithmetic mean of P and the uniform distribution on strings of length n). The model P′ is not worse than P as an explanation for x with respect to the considered parameters complexity, deficiency, and likelihood (a short verification appears after the Source note below).

All results that hold for finite-set models extend, up to a logarithmic additive term, to computable probability models. Since the results for the finite-set models hold only up to an additive logarithmic term anyway, this means that all of them hold equivalently for the model class of computable probability mass function models. Instead of the data-to-model code length log d(S) for finite-set models, we consider the data-to-model code length log 1/P(x). The value log 1/P(x) also measures how likely x is under the hypothesis P. The mapping x ↦ P_min, where P_min minimizes log 1/P(x) over P with K(P) ≤ i, is a constrained ML estimator. This justifies naming the function h_x(i) the ML estimator, in particular its probability-model version h_x(i) = min_P {log 1/P(x) : P(x) > 0, K(P) ≤ i}, with P a computable probability mass function. The results thus imply that the ML estimator always returns a hypothesis with almost minimum randomness deficiency. In classical statistics, unconstrained maximum likelihood is known to perform badly for model selection, because it tends to select the most complex models possible. This is closely reflected in our approach: unconstrained maximization results in the computable probability distribution of complexity about K(x) that concentrates all probability on x. But the structure function h_x(i) tells us the stochastic properties of the data x at all model complexities.

Source: These and related results occur in [A.K. Shen, Soviet Math. Dokl., 28:1(1983), 295–299; The Comput. J., 42:4(1999), 340–342; V.V. Vyugin, SIAM Theory Probab. Appl., 32(1987), 508–512; N.K. Vereshchagin and P.M.B. Vitányi, IEEE Trans. Inform. Theory, 50:12(2004), 3265–3290].
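Verification of the hint for Item (c) (a routine calculation, not quoted from the source): for the given x of length n, P′(x) = (P(x) + 2^(−n))/2 ≥ 2^(−n−1), so log 1/P′(x) ≤ n + 1 = l(x) + 1, which is exactly the restriction allowed in Item (c). Likelihood is essentially preserved: P′(x) ≥ P(x)/2, so log 1/P′(x) ≤ log 1/P(x) + 1. Complexity is essentially preserved: P′ is computable from P and n, so K(P′) ≤ K(P) + O(log n); and since P is in turn computable from P′ and n, K(x|P′) ≥ K(x|P) − O(log n), whence δ(x|P′) = log 1/P′(x) − K(x|P′) ≤ δ(x|P) + O(log n). So replacing P by P′ changes complexity, deficiency, and likelihood only by the O(log l(x)) additive terms mentioned in Item (c).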
