Question: [28] The model class of computable probability mass functions (probability models) consists of the set of functions P : {0, 1}∗ → [0, 1] with ∑_x P(x) = 1. 'Computable' means here that there is a Turing machine T_P that, given x and a positive rational ε, computes P(x) within precision ε. The prefix complexity K(P) of a computable (possibly partial) function P is defined by K(P) = min_i {K(i) : Turing machine T_i computes P}. A string x is typical for a distribution P if the randomness deficiency δ(x|P) = log 1/P(x) − K(x|P) is small. The conditional complexity K(x|P) is defined as follows. Say that a function A approximates P if |A(y, ε) − P(y)| < ε for every y and every positive rational ε. Then K(x|P) is the minimum length of a program that, given any function A approximating P as an oracle, prints x. Similarly, P is c-optimal for x if K(P) + log 1/P(x) ≤ K(x) + c.
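The 'computable' requirement above can be made concrete with a toy example (ours, not part of the exercise): the sketch below fixes the probability mass function P(x) = 2^(−2l(x)−1) over binary strings, which sums to 1, and exposes the interface from the definition, a routine that on input x and a positive rational ε returns P(x) within precision ε. For this particular P the value is an exact rational, so the precision requirement is trivially met; in general only the ε-approximation is required. The names P and approx_P are illustrative.

    from fractions import Fraction

    def P(x: str) -> Fraction:
        # Toy pmf over binary strings: P(x) = 2^(-2*l(x)-1).
        # Normalization: summing over lengths n gives 2^n * 2^(-2n-1) = 2^(-n-1), and these sum to 1.
        return Fraction(1, 2 ** (2 * len(x) + 1))

    def approx_P(x: str, eps: Fraction) -> Fraction:
        # The interface from the definition: return a rational within eps of P(x).
        # Here P(x) is itself rational, so the exact value is returned.
        assert eps > 0
        return P(x)

    # Under this toy model the data-to-model code length log 1/P(x) for x = '0110'
    # is 2*4 + 1 = 9 bits.
    print(approx_P('0110', Fraction(1, 1024)))   # 1/512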

(a) Show that, for every x and every finite set S ∋ x, there is a computable probability mass function P with log 1/P(x) = log d(S), δ(x|P) = δ(x|S) + O(1), and K(P) = K(S) + O(1). (One candidate construction is sketched after Item (c) below.)

(b) Show that there is a constant c such that for every string x the following holds: for every computable probability mass function P there is a finite set S ∋ x such that log d(S) < log 1/P(x) + 1, δ(x|S) ≤ δ(x|P) + K(log 1/P(x)) + c, and K(S) ≤ K(P, log 1/P(x)) + c.

(c) Show that we can restrict consideration to models P such that log 1/P(x) ≤ l(x) + 1, in which case the additive terms in Item (b)
are O(log l(x)).
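As a complement to the hints below (this construction is not spelled out in the excerpt, but it is the natural candidate for Item (a)): take P to be the uniform distribution on S, that is, P(y) = 1/d(S) for y ∈ S and P(y) = 0 otherwise. Then log 1/P(x) = log d(S) exactly. A list of S suffices to compute P, and from any approximation oracle for P one can recover a list of S (its elements are exactly the strings of probability 1/d(S)), so K(P) = K(S) + O(1) and K(x|P) = K(x|S) + O(1). Hence δ(x|P) = log 1/P(x) − K(x|P) = log d(S) − K(x|S) + O(1) = δ(x|S) + O(1).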
Comments. Hint for Item (b): Let m = ⌊log 1/P(x)⌋, that is, 2^(−m−1) < P(x) ≤ 2^(−m). Define S = {y : P(y) > 2^(−m−1)}. Then d(S) < 2^(m+1) ≤ 2/P(x), which implies the claimed value for log d(S). To list S it suffices to compute all consecutive values of P(y) to sufficient precision until the combined probabilities exceed 1 − 2^(−m−1) (a sketch of this listing procedure appears after this paragraph). That is, K(S) ≤ K(P, m) + O(1). Finally, δ(x|S) = log d(S) − K(x|S) < log 1/P(x) − K(x|S) + 1 = δ(x|P) + K(x|P) − K(x|S) + 1 ≤ δ(x|P) + K(S|P) + O(1). The term K(S|P) can be upper bounded by K(m) + O(1), which implies the claimed bound for δ(x|S).
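The following is a minimal sketch of that listing procedure (ours, not from the source), assuming a hypothetical oracle approx_P(y, eps) that returns a rational within eps of P(y). It scans strings in length-lexicographic order with a shrinking precision budget, adds y only when the approximation certifies P(y) > 2^(−m−1), and stops once the certified mass exceeds 1 − 2^(−m−1); termination uses ∑_y P(y) = 1. Strings with P(y) barely above 2^(−m−1) may be omitted at finite precision, which is harmless: x is always included because P(x) > 2^(−m), and every member still has P(y) > 2^(−m−1), so d(S) < 2^(m+1). The routine needs only a program for P and the number m, which is what makes K(S) ≤ K(P, m) + O(1) concrete.

    from fractions import Fraction
    from itertools import count, product

    def list_S(approx_P, m):
        # approx_P(y, eps): hypothetical oracle returning a rational within eps of P(y).
        threshold = Fraction(1, 2 ** (m + 1))            # 2^(-m-1)
        S, certified_mass, i = [], Fraction(0), 0
        for n in count(0):                               # scan strings in length-lexicographic order
            for bits in product('01', repeat=n):
                y = ''.join(bits)
                eps = Fraction(1, 2 ** (m + 4 + i))      # shrinking precision; total budget below 2^(-m-3)
                i += 1
                p = approx_P(y, eps)                     # |p - P(y)| <= eps
                if p - eps > threshold:                  # certifies P(y) > 2^(-m-1)
                    S.append(y)
                certified_mass += max(p - eps, Fraction(0))   # certified lower bound on the mass seen so far
                if certified_mass > 1 - threshold:
                    # The unseen strings carry total mass < 2^(-m-1), so none of them
                    # can pass the certification test; the listing is complete.
                    return S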
Hint for Item (c): for every n, every string x of length n, and every P, consider P′ defined by P′(x) = (P(x) + 2^(−n))/2 (the arithmetic mean of P and the uniform distribution on strings of length n). The model P′ is not worse than P as an explanation for x with respect to the considered parameters complexity, deficiency, and likelihood (a short verification appears after the Source note below).

All results that hold for finite-set models extend, up to a logarithmic additive term, to computable probability models. Since the results for the finite-set models hold only up to an additive logarithmic term anyway, this means that all of them hold equivalently for the model class of computable probability mass function models. Instead of the data-to-model code length log d(S) for finite-set models, we consider the data-to-model code length log 1/P(x). The value log 1/P(x) also measures how likely x is under the hypothesis P. The mapping x ↦ P_min, where P_min minimizes log 1/P(x) over P with K(P) ≤ i, is a constrained ML estimator. This justifies naming the function h_x(i) the ML estimator, in particular its probability-model version h_x(i) = min_P {log 1/P(x) : P(x) > 0, K(P) ≤ i}, with P a computable probability mass function. The results thus imply that the ML estimator always returns a hypothesis with almost minimum randomness deficiency. In classical statistics, unconstrained maximum likelihood is known to perform badly for model selection, because it tends to select the most complex models possible. This is closely reflected in our approach: unconstrained maximization results in the computable probability distribution of complexity about K(x) that concentrates all probability on x. But the structure function h_x(i) tells us the stochastic properties of the data x at all model complexities.

Source: These and related results occur in [A.K. Shen, Soviet Math. Dokl., 28:1(1983), 295–299; The Comput. J., 42:4(1999), 340–342; V.V. Vyugin, SIAM Theory Probab. Appl., 32(1987), 508–512; N.K. Vereshchagin and P.M.B. Vitányi, IEEE Trans. Inform. Theory, 50:12(2004), 3265–3290].
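Verification of the hint for Item (c) (a routine calculation, not quoted from the source): for the given x of length n, P′(x) = (P(x) + 2^(−n))/2 ≥ 2^(−n−1), so log 1/P′(x) ≤ n + 1 = l(x) + 1, which is exactly the restriction allowed in Item (c). Likelihood is essentially preserved: P′(x) ≥ P(x)/2, so log 1/P′(x) ≤ log 1/P(x) + 1. Complexity is essentially preserved: P′ is computable from P and n, so K(P′) ≤ K(P) + O(log n); and since P is in turn computable from P′ and n, K(x|P′) ≥ K(x|P) − O(log n), whence δ(x|P′) = log 1/P′(x) − K(x|P′) ≤ δ(x|P) + O(log n). So replacing P by P′ changes complexity, deficiency, and likelihood only by the O(log l(x)) additive terms mentioned in Item (c).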
