An Introduction to Kolmogorov Complexity and Its Applications, 4th Edition, by Ming Li and Paul Vitányi - Solutions
[44] Show that simulating a linear-time 2-tape deterministic Turing machine with one-way input by a 1-tape nondeterministic Turing machine with one-way input requires Ω(n²/((log n)² log log n)) time. Comments. Hint: Let S be a sequence of numbers from {0, ..., k − 1}, where k = 2^l for some l.
[38] A k-pushdown store machine is similar to a k-tape Turing machine with one-way input except that the k work tapes are replaced by k pushdown stores. Prove: simulating a linear-time 2-pushdown store deterministic machine with one-way input by a 1-tape nondeterministic Turing machine with one-way
[42] Extend the proof of Theorem 6.10.1 to prove the following: Simulating a linear-time 2-tape deterministic Turing machine by a 1-tape deterministic Turing machine requires Ω(n²) time. (Both machines are in the one-way input model.) Comments. Hint: Set the block size for xᵢ to be a large constant, and
[33] Consider the 1-tape Turing machine as in Section 6.1.1, page 450. Let the input be n/log n integers, each of size O(log n), separated by # signs. The element-distinctness problem is to decide whether all these integers are distinct. Prove that the element-distinctness problem requires Ω(n²/
[31] Let I be an index structure supporting text search in O(l(P))-bit probes to find pattern P in text T as a substring. (a) If each query requires the location of P, then the size of I is Ω(l(T)). (b) Even if each query asks only whether a substring P is in T, the size of I is still Ω(l(T
[32] Consider a singly linked list L of n items, where the ith item has a pointer pointing to the (i + 1)st item, with the last pointer being nil. Let ε > 0; prove: (a) Every sequence of t(n) ≥ n steps of going backward on L can be done in O(t(n)n^ε) steps, without modifying L or using extra memory
[35] (a) A k-pass DFA is just like a usual DFA except that the input head reads the input k times, from the first symbol to the last symbol, moving right only during each pass. Use incompressibility to show that a k-pass DFA is exponentially more succinct than a (k − 1)-pass DFA. In other words,
[38] A k-head PDA (k-PDA) is similar to a pushdown automaton except that it has k input heads. Prove that k + 1 heads are better than k heads for PDAs. That is, prove that there is a language that is accepted by a (k + 1)-PDA but not by any k-PDA. Comments. Conjectured by M.A. Harrison and O.H.
[40/O45] Refer to Exercise 6.9.1 for the definition of k-DFAs, and prove the following. Let L = {x#y : x is a substring of y}. (a) No 2-DFA can do string matching; that is, no 2-DFA accepts L. (b) No 3-DFA accepts L. (c) No k-DFA accepts L, for any integer k. (d) [Open] No k-DFA with sensing heads accepts
[28] A k-head deterministic finite automaton, abbreviated kDFA, is similar to a deterministic finite automaton except that it has k, rather than one, one-way read-only input heads. In each step, depending on the current state and the k symbols read by the k heads, the machine changes its state and
[19] The probability that the universal prefix machine U halts on self-delimiting binary input p, randomly supplied by tosses of a fair coin, is Ω (0 < Ω < 1). Let v₁, v₂, ... be an effective enumeration without repetitions of Σ∗. Define L ⊆ Σ∗ such that vᵢ ∈ L iff Ωᵢ = 1. Section 3.5.2
[23] Assume the terminology in Exercise 6.8.7. Consider χ as defined in the proof of Item (ii) of Barzdins’s lemma, Theorem 2.7.2. Essentially, χᵢ = 1 if the ith bit of U(i) is 0 (provided U(i) < ∞), and χᵢ = 0 otherwise. Here U is the reference universal Turing machine of Theorem 2.1.1. Let A be the language with
[35] We have characterized the regular languages using Kolmogorov complexity. It is immediately obvious how to characterize computable languages in terms of Kolmogorov complexity. If L ⊆ Σ∗ and Σ∗ = {v₁, v₂, ...} is an effective enumeration, then we define the characteristic sequence χ =
[37] A deterministic CFL (DCFL) is a language that is accepted by a deterministic pushdown automaton. (a) Show that {xx^R : x ∈ Σ∗} and {xx : x ∈ Σ∗} are not DCFL languages, using an incompressibility argument. (b) Similar to Lemma 6.8.1, the following is a criterion separating DCFL
[20] Prove that L = {x#y#z : xy = z} is not regular.
[20] Prove that L = {x#y : at least half of x is a substring in y} is not regular.
[18] Prove that L = {x#y : x appears (possibly nonconsecutively) in y} is not regular.
[10] Prove that {0ⁿ1ᵐ : m > n} is not regular.
[10] The KC-regularity lemma can be generalized in several ways. Prove the following version. Let L be regular and Lₓ = {y : xy ∈ L}. Let φ be a partial computable function depending only on L that enumerates strings in Σ∗. For each x, if y is the nth string in the complement of Lₓ enumerated
[39] Consider the SCS problem defined in Section 6.7. Prove by incompressibility the following: Let S ⊆ Σ∗ be a set of n sequences of length n, and let δ = √2/2 ≈ 0.707. Let scs(S) be the length of an SCS of S. The following algorithm majority-merge produces a common supersequence of
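The statement of the majority-merge algorithm is cut off above. For orientation only, here is a minimal Python sketch of the greedy heuristic usually given under that name (repeatedly emit the symbol occurring most often at the fronts of the remaining sequences, then strip it from those fronts); the function name and tie-breaking details are choices made here, not the book's.

from collections import Counter

def majority_merge(sequences):
    # Greedy SCS heuristic: emit the majority symbol among the fronts
    # of the remaining sequences, then delete it from those fronts.
    seqs = [s for s in sequences if s]
    out = []
    while seqs:
        counts = Counter(s[0] for s in seqs)
        a = counts.most_common(1)[0][0]      # majority front symbol
        out.append(a)
        seqs = [s[1:] if s[0] == a else s for s in seqs]
        seqs = [s for s in seqs if s]        # drop exhausted sequences
    return "".join(out)

# majority_merge(["ab", "ba", "aa"]) returns "aba",
# a common supersequence of all three inputs.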
[35/O41] (a) Prove that the expected length of the longest common subsequence of two random binary sequences of length n is bounded above by 0.867n. (b) Open: Obtain tight bounds on the expected length of the longest common subsequence of two random binary sequences of length n. Comments. Hint: Use the
[22] Consider two variants of p-pass Shellsort. In each pass, instead of fully sorting every sublist, we make only one pass of Bubblesort, or two such passes in opposite directions, for every sublist. In both cases the sequence may stay unsorted, even if the last increment is 1. A final phase, a
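As a point of reference for the first variant, here is a minimal Python sketch, assuming "one pass of Bubblesort" on a sublist means a single left-to-right sweep of compare-exchanges between elements h apart; the increment sequence is a parameter.

def shellsort_bubble_variant(a, increments):
    # p-pass Shellsort variant: per increment h, perform one bubble
    # sweep over every h-separated sublist (all h sublists are handled
    # by a single interleaved left-to-right scan).
    for h in increments:
        for i in range(len(a) - h):
            if a[i] > a[i + h]:
                a[i], a[i + h] = a[i + h], a[i]
    return a

# With the last increment equal to 1 the array may still be unsorted,
# which is why the exercise appends a final phase.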
[36/O39] Consider the following algorithm.
QuickSort(Array π[1..n]):
If n = 1 then return π;
p := π[1];
π_L := (x ∈ π : x < p) in stable order;
π_R := (x ∈ π : x > p) in stable order;
QuickSort(π_L); QuickSort(π_R);
π := π_L p π_R.
(a) Use the incompressibility method to show that the average
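A direct Python transcription of this pseudocode may help; a sketch only, assuming the keys are distinct (as for a permutation), since elements equal to the pivot other than the pivot itself would otherwise be dropped.

def quicksort(pi):
    # Partition around the first element, keeping the relative
    # ("stable") order inside each part, exactly as in the pseudocode.
    if len(pi) <= 1:
        return pi
    p = pi[0]
    pi_left = [x for x in pi if x < p]    # stable order preserved
    pi_right = [x for x in pi if x > p]
    return quicksort(pi_left) + [p] + quicksort(pi_right)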
[22/O46] Sorting by stacks. The input is a permutation of n integers. These integers, one at a time, pass through a sequence of m first-in-last-out stacks S₁, ..., Sₘ, from S₁ to Sₘ. If an integer k is to be pushed on Sᵢ, then this stack can pop some integers from the top down, pushing them on Sᵢ₊₁
[10] Use the idea in the proof of Theorem 6.6.2 to obtain Ω(n²) average-case lower bounds for Bubblesort, Selection sort, and Insertion sort.
[17] Set h₀ = n. Use Theorem 6.6.3 to show that the average-case time complexity of p-pass Shellsort for the increment sequence (a) h₁ = n^(1/3) and h₂ = 1 (p = 2) is Ω(n^(5/3)); (b) h₁ = n^(7/15), h₂ = n^(1/5), and h₃ = 1 (p = 3) is Ω(n^(23/15)); together with the known upper bound O(n^(23/15)) this shows T =
[40] Prove Theorem 6.6.3. Comments. Source: [P.M.B. Vitányi, Ibid.]. Hint: The proof uses the fact that most permutations of n keys have high Kolmogorov complexity. Since the number of inversions in the Shellsort process is not easily amenable to analysis, a simpler process is analyzed. The lower
[O48] (a) Prove or disprove that there is a number of passes p and an increment sequence such that Shellsort has average-case time complexity O(n log n). (b) Find a better lower bound on the average-case time complexity of Shellsort than Theorem 6.6.3 on page 490 (if there is one); give a good or optimal
[41] Show that the worst-case time complexity of p-pass Shellsort of n items is at least Ω(n log² n/(log log n)²) for every number p of passes and every increment sequence. Comments. This shows that the best possible average-case time complexity of Shellsort for any number of passes and all
[25] Improve the log n! − 5n bound in Equation 6.7, page 486, by reducing 5n via a better encoding and more precise calculation.
[40] In computational biology, evolutionary trees are represented by unrooted unordered binary trees with uniquely labeled leaves and unlabeled internal nodes. Measuring the distance between such trees is useful in biology. A nearest neighbor interchange (nni) operation swaps two subtrees that are
[25] Consider the following game. Carole chooses a number from {1, 2, ..., n}. Paul has to guess the secret number using only “yes/no” questions. Prove the following lower bounds on the number of questions needed for Paul to determine the number: (i) log n if Carole answers every question
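The log n bound in case (i) is matched by ordinary binary search. A minimal sketch, where greater_than stands in for Carole's truthful answers (a name chosen here for illustration):

def guess(n, greater_than):
    # Paul's strategy against a truthful Carole: binary search over
    # {1, ..., n} with ceil(log2 n) questions "is your number > m?".
    lo, hi = 1, n
    while lo < hi:
        mid = (lo + hi) // 2
        if greater_than(mid):
            lo = mid + 1
        else:
            hi = mid
    return lo

# guess(100, lambda m: 37 > m) returns 37 after at most 7 questions.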
Consider a computer network consisting of n computers connected in a ring by bidirectional communication channels. Message transmission takes unknown time, but messages do not overtake each other. The computers are anonymous; that is, they do not have unique identities. To be able to discuss
Consider routing schemes for n-node graphs G = (V, E), V = {1, ..., n}, with maximal node degree d. Choose the most convenient labeling to facilitate compact routing schemes. (a) Show that for every d ≥ 3 there are networks for which any shortest-path routing scheme requires a total of Ω(n²/log n)
[30] In interval routing on a graph G = (V,E), V = {1,...,n}, each node i has for each incident edge e a (possibly empty) set of pairs of node labels representing disjoint intervals with wraparound. Each pair indicates the initial edge on a shortest path from i to any node in the interval, and for
[29] In a full-information shortest-path routing scheme, the routing function in u must, for every destination v, return all edges incident to u on shortest paths from u to v. These schemes allow alternative shortest paths to be taken whenever an outgoing link is down. Show that for
[34] Show that for shortest-path routing in graphs that are o(n)-random, if the neighbors are not known, then the complete routing scheme requires at least n²/32 − o(n²) bits to be stored. This holds also under a slightly weaker model.
[31] Prove the following: For shortest-path routing on c log n-random graphs, if nodes know their neighbors and nodes may be relabeled by arbitrary identifiers (which therefore can code information), then with labels of size at most (1 + (c + 3) log n) log n bits the local routing functions can be
[22] (a) Show that routing with any stretch factor > 1 in c log n-random graphs can be done with n − 1 − (c + 3) log n nodes with local routing functions stored in at most log(n + 1) bits per node, and 1 + (c + 3) log n nodes with local routing functions stored in 6n bits per node (hence the
[19] Show that there exist labeled graphs on n nodes such that each local routing function must be stored in at least ½n log ½n − O(n) bits per node (hence the complete routing scheme requires at least (n²/2) log ½n − O(n²) bits to be stored). Comments. Source: [H.M. Buhrman, J.H. Hoepman,
[26] Show that almost every labeled tree on n nodes has maximum degree of O(log n/log log n). Comments. Hint: represent a labeled tree by a binary sequence of length (n − 2) log n (the Prüfer code). Prove a one-to-one correspondence between labeled trees and binary sequences of such length. Use
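To make the hint concrete, here is a minimal Python sketch of the Prüfer encoding: a labeled tree on nodes 1, ..., n maps to a sequence of n − 2 node labels (about (n − 2) log n bits), and a node of degree d appears exactly d − 1 times in the sequence, which is what links the code to the maximum-degree bound.

def pruefer_code(adj):
    # adj: dict mapping each node in {1, ..., n} to the set of its
    # neighbors in the tree. Repeatedly remove the smallest-labeled
    # leaf and record its neighbor; stop when two nodes remain.
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    code = []
    for _ in range(len(adj) - 2):
        leaf = min(v for v, nbrs in adj.items() if len(nbrs) == 1)
        parent = next(iter(adj[leaf]))
        code.append(parent)
        adj[parent].discard(leaf)
        del adj[leaf]
    return code

# The star on {1, 2, 3, 4} centered at 2 encodes to [2, 2]:
# pruefer_code({1: {2}, 2: {1, 3, 4}, 3: {2}, 4: {2}}) == [2, 2]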
[20] In Section 2.6 we investigated up to which length l all blocks of length l occur at least once in every δ(n)-random string of length n. Let δ(n) = 2^(√(2 log n)/2)/(4 log n) and let G be a δ(n)-random graph on n nodes. Show that for sufficiently large n, the graph G contains all subgraphs on √(2 log
[27] Use Exercise 6.4.1 to prove Theorem 6.4.1. Comments. Hint: similar to the proof of Theorem 2.6.1, with the labeled graph G taking the part of the overall string, and cover elements (subsets of labeled nodes inducing subgraphs) taking the part of the blocks. Source: [H.M. Buhrman, M. Li, J.T. Tromp,
[M40] Use the terminology of Theorem 6.4.1. A cover of G is a set C = {S₁, ..., S_N} with N = n/k, where the Sᵢ's are pairwise disjoint subsets of V. Consider N distinct covers of G, every cover consisting of N = n/k disjoint subsets; that is, every subset of k nodes of V belongs to precisely one
[25] An (n, d, α, c) OR-concentrator is a bipartite graph G(L + R, E) on the independent vertex sets L and R with d(L) = d(R) = n such that (i) every vertex in L has degree d, and (ii) every subset S ⊆ L with d(S) ≤ αn is connected to at least cn neighbors (in R). Show that there exist (n, 9.48, 1
[37] An (n, d, m)-graph is a bipartite multigraph with n vertices on the left side and m vertices on the right side, with every vertex on the left having degree d, and every vertex on the right having degree dn/m (assuming m | dn). An (n, d, m)-graph is (α, β)-expanding if every subset S of αn
[39] Let L ⊂ {0, 1}²ⁿ be a language to be recognized by two parties P and Q with unlimited computation power. Party P knows the first n bits of the input and party Q knows the last n bits. P and Q exchange messages to recognize L according to some bounded-error two-way probabilistic protocol. An
[35] Given an n-dimensional cube and a permutation π of its nodes, each node v wants to send an information packet to node π(v) as fast as possible. Label every edge in the cube with its dimension from {1, ..., n}. A route (v₁ → v₂ → ··· → v_k) is ascending if (vᵢ, vᵢ₊₁) has higher dimension
[36] From among the (n choose 3) triangles with vertices chosen from n points in the unit square, let Tₙ be the one with the smallest area, and let Aₙ be the area of Tₙ. Heilbronn’s triangle problem asks for the maximum value Δₙ assumed by Aₙ over all choices of n points. We consider the average case: Show
[25] Consider a random directed graph whose n² nodes are on the intersections of a two-dimensional n by n grid. All vertical edges (the grid edges) are present and directed upward. For every pair of horizontally neighboring nodes, we flip a three-sided coin; with probability p < ½ we add an edge
[36] Let K(N) denote the complete undirected graph on n nodes N = {1, ..., n}. If A and B are disjoint subsets of N, then K(A, B) denotes the complete bipartite graph on sets A and B. A set C = (K(A₁, B₁), ..., K(A_j, B_j)) is called a covering family of K(N) if for every edge {u, v} ∈ K(N) there
[17] Let G = (V, E) with V = {1, ..., n} be an undirected graph on n nodes with C(G|n, p) ≥ n(n − 1)/2, where p is a fixed program to be used to reconstruct G. A clique of a graph is a complete subgraph of that graph. Show that G does not contain a clique on more than 1 + 2 log n nodes. Comments.
[25] Let T be a tournament on N = {1,...,n}. Define a ranking R as an ordering of N. For (i, j) ∈ T , if R(i) < R(j), we say that R agrees with (i, j). Otherwise, it disagrees with that edge. We are interested in a ranking that is most consistent with T , that is, such that the number of edges
[17] Let w(n) be the largest integer such that for every tournament T on N = {1, ..., n} there exist disjoint sets A and B, each of cardinality w(n), in N such that A × B ⊆ T. Prove w(n) ≤ 2 log n. Comments. Hint: add 2w(n) log n bits to describe nodes, and save w(n)² bits on edges. Source of
[15] Give a simple algorithm that multiplies two n × n Boolean matrices in O(n²) average time under the uniform distribution. Use an incompressibility argument to show the time complexity. Comments. Source: The original proof, without incompressibility, is given in [P.E. O’Neil and E.J. O’Neil,
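One simple algorithm fitting this exercise (a sketch; not necessarily the O’Neils’ original formulation) scans the inner index only until the first witness. For uniformly random matrices each position is a witness with probability 1/4, so the expected scan length per entry is O(1) and the total expected time is O(n²).

def boolean_matmul(A, B):
    # C[i][j] = OR over k of (A[i][k] AND B[k][j]); stop the inner
    # scan at the first witness k. Only the rare zero entries of the
    # product force full-length scans.
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if A[i][k] and B[k][j]:
                    C[i][j] = 1
                    break
    return C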
[10] (Converting NFA to DFA) A DFA A has a finite number of states, including a distinguished start state and some distinguished accepting states. At every step, A reads the next input symbol and changes its state according to the current state and the input symbol. If A has more than one
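The construction this exercise refers to is the classic subset construction. A minimal Python sketch (no epsilon-moves; delta maps a (state, symbol) pair to a set of successor states, a representation chosen here for illustration); the exponential blowup in the number of DFA states is the point of the exercise.

def nfa_to_dfa(alphabet, delta, start, accepting):
    # Each DFA state is a frozenset of NFA states, built breadth-first
    # from the start set; up to 2^|Q| subsets can appear.
    start_set = frozenset([start])
    dfa_delta, seen, todo = {}, {start_set}, [start_set]
    while todo:
        S = todo.pop()
        for a in alphabet:
            T = frozenset(q for s in S for q in delta.get((s, a), ()))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    dfa_accepting = {S for S in seen if S & set(accepting)}
    return seen, dfa_delta, start_set, dfa_accepting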
[26/M30] Let the Turing machine in Section 6.1.1 be probabilistic, which means that the machine can flip a fair coin to determine its next move. (a) Assume that the machine is not allowed to err. Prove that such a machine still requires on average order n² steps to accept the palindrome language L =
• [27] Construct an example of candidate explanations (p₀, S₀) and (p₁, S₁) for data x, with pᵢ a program computing a set Sᵢ ∋ x (i = 0, 1), such that (i) the two-part MDL code l(p₁) + log d(S₁) is shorter than l(p₀) + log d(S₀), while (ii) the randomness deficiency satisfies δ(x|S₁) > δ(x|S₀). Comments. The example shows that shorter MDL code does not necessarily mean a better
[25] In many cases, such as the case of grammatical inference, the data are not a single string but a set of strings, say a subset of {0, 1}ⁿ for some given n. Develop the theory in Section 5.5 for this case of multiple data, and indicate the differences from the case of singleton data. Comments.
[25] Consider the model class of total computable functions of Exercise 5.5.21. (a) Define the structure functions βₓ, hₓ, λₓ, the sufficient statistic, sufficiency line, and minimal sufficient statistic for a string x in this setting. (b) The prefix complexity of the minimal sufficient statistic
• [27] The model class of total computable functions consists of the set of total computable functions p : {0, 1}∗ → {0, 1}∗. The (prefix-)complexity K(p) of a total computable function p is defined by K(p) = min_i {K(i) : Turing machine Tᵢ computes p}. In place of log d(S) for finite set
• [28] The model class of computable probability mass functions (probability models) consists of the set of functions P : {0, 1}∗ → [0, 1] with Σₓ P(x) = 1. ‘Computable’ means here that there is a Turing machine T_P that, given x and a positive rational ε, computes P(x) within precision ε. The
[O39] It is unknown whether there is an algorithm that, for every x, lower semicomputes a nonincreasing function f(i) that follows the shape of βₓ(i) with error O(log n) (or even o(n)), in the sense of Definition 5.5.8 on page 415. Comments. The analogous question concerning upper semicomputability
[35] (a) Prove that the function βₓ(i) is not upper semicomputable to within precision l(x)/4 (there is no upper semicomputable function f(i) such that |f(i) − βₓ(i)| ≤ l(x)/4). (b) Prove that there is no algorithm that for every n and every x of length n upper semicomputes a nonincreasing
[42] Consider βₓ(i) as a two-argument function as in Example 5.5.12. (a) Show that the function βₓ(i) is computable from x, i given an oracle for the halting problem. (b) Show that the function βₓ(i) is upper semicomputable from x, i, K(x) up to a logarithmic error. (c) Show that the set {(x, S, β)
Show that it is impossible to approximate the complexity of the minimal sufficient statistic of x, even if we are given both x and K(x). Comments. Source: [N.K. Vereshchagin and P.M.B. Vitányi, Ibid.].
[43] Consider λₓ(i) as a two-argument function as in Example 5.5.12. (a) Show that λₓ(i) is upper semicomputable, but not computable. (b) Show that λₓ is not computable, given x, K(x), even in an approximate sense: There is no function λ that is computable given x, K(x), such that λₓ(i) follows
[20] Consider strings x of length n. Define the conditional variant of Definition 5.5.6 on page 413 as hₓ(i|y) = min_S {log d(S) : S ∋ x, d(S) < ∞, K(S|y) ≤ i}. Since S₁ = {0, 1}ⁿ is a set containing x and can be described by O(1) bits (given n), we find that hₓ(i|n) ≤ n for i = K(S₁|n) = O(1).
[35] Give a general uniform construction of the finite sets S_{i,l} witnessing the structure functions λₓ(i) and βₓ(i), at each argument i, in terms of indexes of x in the enumeration of strings of given complexity. That is, for every x there is a sequence l₁ ≤ l₂ ≤ ··· ≤ l_{K(x)} ≤ n + O(log
[37] (a) Show that there are strings x of length n such that the algorithmic minimal sufficient statistic is essentially the singleton set consisting of the string itself. Formally, there are constants c, C such that for every given k
[34] Let x be a string and α, β natural numbers. Kolmogorov called a string (α, β)-stochastic if there is a finite set A ⊆ N with x ∈ A such that K(A) ≤ α and K(x|A) ≥ log d(A) − β. The first inequality (with α not too large) means that A is sufficiently simple. The second
[34] Prove Theorem 5.5.2. Comments. A less precise result of the same nature is given in Theorem 8.1.6 on page 641, and its proof is deferred to Exercise 8.1.8 on page 647, for arbitrary distortion measures. In the terminology used there, the Kolmogorov structure function is the distortion-rate
[23] Use the terminology of Section 5.5.1. Apart from parameters i, β, γ, there is a fourth important parameter, K(S|x∗), reflecting the determinacy of model S by the data x. Prove the equality log d(S) + K(S) − K(x) = K(S|x∗) + δ(x|S) + O(log nΛ(S)). Conclude that the central result
[31] (a) Show that there is a string x of length n and complexity about ½n for which βₓ(O(log n)) = ¼n + O(log n). (b) Show that for the set A₀ = {0, 1}ⁿ we have K(A₀) = O(log n) and K(x|A₀) = ½n + O(log n), and therefore K(A₀) + K(x|A₀) = ½n + O(log n) is minimal up to a term O(log n). (c)
[35] Let x be a string. A prediction strategy P is a mapping from the set of strings of length less than l(x) into the set of rational numbers in the segment [0, 1]. The value P(x₁ ... xᵢ) (i
• [25] (a) Show that if we recode data x by its shortest program x∗, then this can change the structure functions. (b) Let f be a computable permutation of the set of strings (one-to-one, total, and onto). Show that the graph of h_{f(x)} follows the shape of hₓ with error at most K(f) +
[39] (a) Show that for every set A ∋ x there is a set S ∋ x with K(S) ≤ K(A) + O(log Λ(A)) and log d(S) = log d(A) − K(A|x) + O(log Λ(A)). (b) Show that for every set A ∋ x there is a set S ∋ x with K(S) ≤ K(A) − K(A|x) + O(log Λ(A)) and log d(S) = log d(A). Recall that Λ(A) = K(A) + log
[39] The complexity profile of a string x is the set of positive integer pairs Pₓ = {(m, l) : there is a set A ∋ x with C(A) ≤ m and log |A| ≤ l}. A string x of length n and C(x) = k is ε-nonstochastic if for all (m, l) ∈ Pₓ either m > k − ε or m + l > n − ε. (a) Show there exist nonstochastic strings for each n and
[30] Prove the difficult side of Lemma 5.5.2 on page 415. Comments. Source: [N.K. Vereshchagin and P.M.B. Vitányi, IEEE Trans. Inform. Theory, 50:12(2004), 3265–3290].
This is far less than the number of functions h available for them (potential structure functions hₓ) as computed in Item (a).
[21] (a) Compute the number of different integer functions h defined on 0, 1, ..., k for some k ≤ n − log n satisfying h(0) ≤ n and h(i) + i is nonincreasing. (b) Conclude that the number in Item (a) is far greater than the number of x's of length n and complexity k ≤ n − log n. Comments.
[25] Let x be a string and let S be a finite set containing x. Define S to be strongly typical for x if log d(S) − K(x|S∗) = O(1), where S∗ is the first shortest program to print S in lexicographic length-increasing order. (The standard Definition 5.5.3 on page 410 is slightly weaker since it
This prior is an objective and computably invariant form of Occam’s razor: A simple hypothesis H (with K(H) ≪ l(H)) has high m-probability, and a complex or random hypothesis H (with K(H) ≈ l(H)) has low m-probability 2^−l(H). Note that all hypotheses are random with respect to the distribution
[26] Let α(P, H) in Theorem 5.4.1 be small (for example, a constant) and prior P := m. Show that Theorem 5.4.1 is satisfied iff the data sample D is Pr(·|H_mdl)-random. Show also that this has probability going to one for the binary length n of the data increasing unboundedly (and the lim sup of
[23] Show that if log 1/Pr(D|H) + log 1/P(H) = K(D|H) + K(H) + O(1), then H is P-random up to K(Pr(·|H)) − K(P) + O(1), and D is Pr(·|H)-random up to K(P) − K(Pr(·|H)) + O(1). (Negative randomness deficiencies correspond to 0.)
Show that this probability goes to one as m and n grow unboundedly. Moreover, the lim sup of that probability exceeds 1 − O(1/min{m, n}).
[27] Show that the probability that, for data of binary length n, the hypotheses of binary length m that are selected by the Bayesian maximum a posteriori and MDL principles, respectively, are close in the sense of satisfying the relations of Theorem
[20] Let R(n) denote the fraction of binary strings of length n that are sufficiently Pr(·|H)-random as in Equation 5.14. Show that R(n) = 1 − O(1/2^K(H,n)) and goes to 1 for n → ∞. Moreover, lim sup_{n→∞} R(n) = 1 − O(1/n). Comments. Source for this and the next three exercises: [P.M.B. Vitányi
[M36] Show that the continuous concept class C defined in Example 5.3.2 is pac-learnable under all simple measures but not pac-learnable (that is, under all measures). Comments. Source: [M. Li and P.M.B. Vitányi, SIAM J. Comput., 20:5(1991), 911–935].
[32] Show that the class of deterministic finite-state automata (DFA) whose canonical representations have logarithmic Kolmogorov complexity is polynomially pac-learnable under m. Comments. It is known that both exact and approximate (in the pac sense) identification of DFA is NP-hard. Source: [R.
[35] A Boolean formula is monotone if no literal in it is negated. A k-term DNF is a DNF consisting of at most k monomials. (a) Show that pac-learning monotone k-term DNF requires more than polynomial time unless RP = NP. (b) Show that the class of monotone k-term DNF is polynomially pac-learnable under
[O35] Are any of log-DNF, simple DNF, log-decision list polynomially pac-learnable?
[33] A k-decision list over n variables is a list of pairs L = (m₁, b₁), ..., (m_s, b_s), where mᵢ is a monomial of at most k variables and bᵢ ∈ {0, 1}, for 1 ≤ i ≤ s, except that always m_s = 1. A decision list L represents a Boolean function f_L defined as follows: For each example v ∈ {0, 1}ⁿ,
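The semantics of f_L can be stated in a few lines of Python. A sketch with one representation choice made here: a monomial is a set of (variable index, required bit) pairs, and the mandatory final monomial m_s = 1 is the empty set, which every example satisfies.

def eval_decision_list(dlist, v):
    # Scan the pairs (monomial, bit) in order and output the bit of
    # the first monomial that example v satisfies.
    for monomial, bit in dlist:
        if all(v[i] == required for i, required in monomial):
            return bit
    raise ValueError("the last monomial must be the constant 1")

# f(v) = 1 iff v[0] = 1 and v[1] = 0, else 0:
# eval_decision_list([({(0, 1), (1, 0)}, 1), (set(), 0)], [1, 0, 1]) == 1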
[25] Show that the log-DNF of Example 5.3.1 neither contains nor is contained in the simple DNF of Exercise 5.3.3.
[30] Consider DNF over n variables. A DNF formula f is simple if for each term m of f, there is a vector v_m ∈ {0, 1}ⁿ that satisfies m but does not satisfy any other monomial of f, even after changing one bit, and K(v_m) = O(log n). Simple DNFs can contain many high-prefix-complexity terms, as opposed to
[24] Prove Equation 5.9. Can you improve this bound?
[27] Consider discrete distributions. Show that there is a simple distribution that is not lower semicomputable, and there is a distribution that is not simple.Comments. Therefore, the simple distributions properly contain the lower semicomputable distributions but do not include all distributions.
[26] We apply Lemma 5.2.4 on page 372 to obtain insight into the relation between the number k of mistakes in the first m predictions and the complexity of the m predicted target-function values. Define x = f(1) ... f(m). (a) Show that log (m choose k) + K(m, k) + O(1) = k log (m/k) + m(1 − k/m) log (1/(1 − k/m)) + O(log
[27] We continue Exercise 5.2.11. Let f₁, f₂, ... be the standard enumeration of the partial computable functions. (a) Give the implicit effective enumeration of the hypotheses for the prior P defined by P(Hᵢ) = 1/(i(i + 1)), and give the implicit incomputable enumeration according to the universal prior
[14] Recall Equation 5.6 on page 370. Show that as the number m of examples D = e₁, ..., e_m grows, the inferred probability Pr(H_k|D) is either monotonically nondecreasing to a limit or it suddenly falls to 0 and stays there thereafter, for every k. Argue why this process is called ‘learning in the