Artificial Intelligence: Foundations of Computational Agents, 3rd Edition, David L. Poole, Alan K. Mackworth - Solutions
The aim of this question is to get practice writing simple logic programs.
(a) Write a relation remove(E, L, R) that is true if R is the list resulting from removing one instance of E from list L. The relation is false if E is not a member of L.
(b) Give all of the answers to the following queries: ask …
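The relation in part (a) succeeds once per occurrence of E, the way backtracking would enumerate answers. As a rough Python sketch of the same behaviour (the function name is hypothetical, not from the text):

```python
def remove_one(e, lst):
    """All results of removing one instance of e from lst.

    Mirrors the relation remove(E, L, R): one result per occurrence
    of e; no results (failure) if e is not a member of lst.
    """
    return [lst[:i] + lst[i + 1:] for i, x in enumerate(lst) if x == e]

# Each occurrence of 'b' yields one answer, as backtracking would.
print(remove_one('b', ['a', 'b', 'b', 'c']))  # [['a', 'b', 'c'], ['a', 'b', 'c']]
print(remove_one('z', ['a', 'b']))            # [] -- the relation fails
```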
Consider the following logic program:
ap(emp, L, L).
ap(c(H, T), L, c(H, R)) ← ap(T, L, R).
adj(A, B, L) ← ap(F, c(A, c(B, E)), L).
(a) Give a top-down derivation (including all substitutions) for one answer to the query ask adj(b, Y, c(a, c(b, c(b, c(a, emp))))).
(b) Are there any other answers? If so, …
Consider the following logic program:
rd(cons(H, cons(H, T)), T).
rd(cons(H, T), cons(H, R)) ← rd(T, R).
Give a top-down derivation, showing all substitutions, for the query ask rd(cons(a, cons(cons(a, X), cons(B, cons(c, Z)))), W). What is the answer corresponding to this derivation? Is there a second …
List all of the ground atomic logical consequences of the following knowledge base:
q(Y) ← s(Y, Z) ∧ r(Z).
p(X) ← q(f(X)).
s(f(a), b). s(f(b), b). s(c, b). r(b).
For each of the following pairs of atoms, either give a most general unifier or explain why one does not exist:
(a) p(X, Y, a, b, W) and p(E, c, F, G, F)
(b) p(Y, a, b, Y) and p(c, F, G, F)
(c) foo(Z, [a, z|X], X) and foo([a, m|W], W, [i, n, g])
(d) ap(F0, c(b, c(B0, L0)), c(a, c(b, c(a, emp)))) and ap(c(H1, …
Give a most general unifier of the following pairs of expressions:
(a) p(f(X), g(g(b))) and p(Z, g(Y))
(b) g(f(X), r(X), t) and g(W, r(Q), Q)
(c) bar(val(X, bb), Z) and bar(P, P)
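Answers like part (a) can be checked mechanically with a minimal unification sketch in Python. The term encoding here is an assumption of the sketch (compound terms as tuples, uppercase strings as variables), and the occurs check is omitted for brevity:

```python
def is_var(t):
    # Convention assumed by this sketch: variables are strings
    # starting with an uppercase letter, e.g. 'X', 'Z'.
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    # Follow variable bindings in substitution s.
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(t1, t2, s=None):
    """Return a most general unifier (a dict) or None if none exists.

    No occurs check, so cyclic cases like part (d)-style puzzles need care.
    """
    if s is None:
        s = {}
    t1, t2 = walk(t1, s), walk(t2, s)
    if t1 == t2:
        return s
    if is_var(t1):
        return {**s, t1: t2}
    if is_var(t2):
        return {**s, t2: t1}
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        for a, b in zip(t1[1:], t2[1:]):
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None

# Part (a): p(f(X), g(g(b))) and p(Z, g(Y))
print(unify(('p', ('f', 'X'), ('g', ('g', 'b'))),
            ('p', 'Z', ('g', 'Y'))))  # {'Z': ('f', 'X'), 'Y': ('g', 'b')}
```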
What is the result of the following applications of substitutions?
(a) f(A, X, Y, X, Y){A/X, Z/b, Y/c}.
(b) yes(F, L) ← append(F, c(L, nil), c(l, c(i, c(s, c(t, nil))))) {F/c(l, X1), Y1/c(L, nil), A1/l, Z1/c(i, c(s, c(t, nil)))}.
(c) append(c(A1, X1), Y1, c(A1, Z1)) ← append(X1, Y1, Z1) {F/c(l, X1), Y1/c(L, …
Consider the following knowledge base:
has access(X, library) ← student(X).
has access(X, library) ← faculty(X).
has access(X, library) ← has access(Y, library) ∧ parent(Y, X).
has access(X, office) ← has keys(X).
faculty(diane). faculty(ming). student(william). student(mary). parent(diane, …
Consider the following knowledge base:
r(a). r(e). p(c). q(b).
s(a, b). s(d, b). s(e, d).
p(X) ← q(X) ∧ r(X).
q(X) ← s(X, Y) ∧ q(Y).
Show the set of ground atomic consequences derivable from this knowledge base. Use the bottom-up proof procedure (page 662) assuming, at each iteration, the first …
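The bottom-up procedure for a ground-able knowledge base like this one is just iteration to a least fixed point. A small Python sketch (atom encoding is an assumption of this sketch) computes the consequences of the knowledge base above:

```python
# Facts from the knowledge base above, with atoms encoded as tuples.
facts = {('r', 'a'), ('r', 'e'), ('p', 'c'), ('q', 'b'),
         ('s', 'a', 'b'), ('s', 'd', 'b'), ('s', 'e', 'd')}

def step(kb):
    """Apply every rule once to the current set of derived atoms."""
    new = set(kb)
    for atom in kb:
        # p(X) <- q(X) & r(X)
        if atom[0] == 'q' and ('r', atom[1]) in kb:
            new.add(('p', atom[1]))
        # q(X) <- s(X, Y) & q(Y)
        if atom[0] == 's' and ('q', atom[2]) in kb:
            new.add(('q', atom[1]))
    return new

kb = facts
while step(kb) != kb:   # iterate until nothing new is derivable
    kb = step(kb)
derived = sorted(kb - facts)
print(derived)  # [('p', 'a'), ('p', 'e'), ('q', 'a'), ('q', 'd'), ('q', 'e')]
```

So beyond the given facts, the procedure derives q(a), q(d), q(e) (via the s/q rule) and then p(a), p(e).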
Consider a domain with two individuals (✂ and ☎), two predicate symbols (p and q), and three constants (a, b, and c). The knowledge base KB is defined by
p(X) ← q(X).
q(a).
(a) Give one interpretation that is a model of KB.
(b) Give one interpretation that is not a model of KB.
(c) How many …
The stochastic policy iteration algorithm of Figure 14.10 (page 634) is based on SARSA (page 596). How could it be modified to be off-policy, as in Q-learning (page 590)? [Hint: Q-learning updates using the best action, SARSA updates by the one using the policy, and stochastic policy iteration …
Consider the following alternative ways to update the probability P in the stochastic policy iteration algorithm of Figure 14.10 (page 634).
(i) Make more recent experiences have more weight by multiplying the counts in P by (1 − β), for small β (such as 0.01), before adding 1 to the best …
In Example 14.12 (page 624), what is the Nash equilibrium with randomized strategies? What is the expected value for each agent in this equilibrium?
For the hawk–dove game of Example 14.11 (page 624), where D > 0 and R > 0, each agent is trying to maximize its utility. Is there a Nash equilibrium with a randomized strategy? What are the probabilities? What is the expected payoff to each agent? (These should be expressed as functions of R and
Consider the game of Tic-Tac-Toe (also called noughts and crosses), which is played by two players, an “X” player and an “O” player who alternate putting their symbol in a blank space on a 3 × 3 game board. A player’s goal is to win by placing three symbols in a row, column, or diagonal;
Modify Figure 14.5 (page 617) to include nature moves. Test it on a (simple) perfect-information game that includes randomized moves (e.g., a coin toss or the roll of a die). Recall (page 612) that in an extensive form of a game, each internal node labeled with nature has a probability distribution over …
In Example 13.6 (page 601), some of the features are perfectly correlated (e.g., F6 and F7). Does having such correlated features affect which functions can be represented? Does it help or hurt the speed at which learning occurs? Test this empirically on some examples.
In SARSA with linear function approximation, using linear regression to minimize r + γQw(s′, a′) − Qw(s, a) gives a different algorithm than Figure 13.8 (page 602). Explain what you get and why what is described in the text may be preferable (or not). [Hint: what should the weights be adjusted to …
The grid game of Example 13.6 (page 601) included features for the x-distance to the current treasure and the y-distance to the current treasure. Chris thought that these were not useful, as they do not depend on the action. Do these features make a difference? Explain why they might or might …
The model-based reinforcement learner allows for a different form of optimism in the face of uncertainty. The algorithm can be started with each state having a transition to a “nirvana” state, which has very high Q-value (but which will never be reached in practice, and so the probability will
Consider four different ways to derive the value of αk from k in Q-learning (note that for Q-learning with varying αk, there must be a different count k for each state–action pair).
(i) Let αk = 1/k.
(ii) Let αk = 10/(9 + k).
(iii) Let αk = 0.1.
(iv) Let αk = 0.1 for the first 10,000 steps, αk …
For the following reinforcement learning algorithms:
(i) Q-learning with fixed α and 80% exploitation.
(ii) Q-learning with fixed αk = 1/k and 80% exploitation.
(iii) Q-learning with αk = 1/k and 100% exploitation.
(iv) SARSA learning with αk = 1/k and 80% exploitation.
(v) SARSA learning with αk = …
Compare the different parameter settings for Q-learning for the game of Example 13.2 (page 585) (the “monster game” in AIPython (aipython.org)). In particular, compare the following situations:
(i) step size(c) = 1/c and the Q-values are initialized to 0.0.
(ii) step size(c) = 10/(9 + c) varies, and …
For the plot of the total reward as a function of time, as in Figure 13.4 (page 594), the minimum and zero crossing are only meaningful statistics when balancing positive and negative rewards is reasonable behavior. Suggest what should replace these statistics when zero reward is not an appropriate …
Suppose a Q-learning agent, with fixed α and discount γ, was in state 34, did action 7, received reward 3, and ended up in state 65. What value(s) get updated? Give an expression for the new value. (Be as specific as possible.)
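The shape of the standard Q-learning update can be sketched in Python with the question's transition and some made-up numbers (α, γ, and the table entries below are assumptions for illustration only):

```python
alpha, gamma = 0.1, 0.9   # hypothetical values; the question leaves them symbolic

def q_update(Q, s, a, r, s_next):
    """One Q-learning step: only the single entry Q[s][a] changes."""
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

# s=34, a=7, r=3, s'=65, with illustrative current Q-values.
Q = {34: {7: 2.0}, 65: {0: 5.0, 1: 4.0}}
q_update(Q, 34, 7, 3, 65)
print(Q[34][7])  # 2.0 + 0.1 * (3 + 0.9 * 5.0 - 2.0) = 2.55
```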
Explain how Q-learning fits in with the agent architecture of Section 2.1.1 (page 53). Suppose that the Q-learning agent has discount factor γ, a step size of α, and is carrying out an ε-greedy exploration strategy.
(a) What are the components of the belief state of the Q-learning agent?
(b) What …
How can variable elimination for decision networks, shown in Figure 12.14 (page 546), be modified to include additive discounted rewards? That is, there can be multiple utility (reward) nodes, to be added and discounted. Assume that the variables to be eliminated are eliminated from the latest time
In a decision network, suppose that there are multiple utility nodes, where the values must be added. This lets us represent a generalized additive utility function. How can the VE for decision networks algorithm, shown in Figure 12.14 (page 546), be altered to include such utilities?
Consider the MDP of Example 12.31 (page 557).
(a) As the discount varies between 0 and 1, how does the optimal policy change? Give an example of a discount that produces each different policy that can be obtained by varying the discount.
(b) How can the MDP and/or discount be changed so that the …
Explain why we often use discounting of future rewards in MDPs. How would an agent act differently if the discount factor was 0.6 as opposed to 0.9?
What is the main difference between asynchronous value iteration and standard value iteration? Why does asynchronous value iteration often work better than standard value iteration?
Consider the following decision network:
(a) What are the initial factors? (Give the variables in the scope of each factor, and specify any associated meaning of each factor.)
(b) Give a legal splitting order, and the order that variables can be evaluated (similar to Example 12.20 (page 543)).
(c) …
Consider the belief network of … (page 455). When an alarm is observed, a decision is made whether or not to shut down the reactor. Shutting down the reactor has a cost cs associated with it (independent of whether the core was overheating), whereas not shutting down an overheated core incurs a cost cm that is much higher than …
This exercise is to compare variable elimination and conditioning for the decision network of Example 12.16 (page 539).
(a) For the inverse of the variable ordering for search used in Example 12.20 (page 543) (i.e., from Leaving to Report), show the sequence of factors removed and created for variable …
In Example 12.16 (page 539), suppose that the fire sensor was noisy in that it had a 20% false positive rate, P(see smoke | report ∧ ¬smoke) = 0.2, and a 15% false negative rate, P(see smoke | report ∧ smoke) = 0.85. Is it still worthwhile to check for smoke?
How sensitive are the answers from the decision network of Example 12.16 (page 539) to the probabilities? Test the program with different conditional probabilities and see what effect this has on the answers produced. Discuss the sensitivity both to the optimal policy and to the expected value of
Suppose that, in a decision network, there were arcs from random variables “contaminated specimen” and “positive test” to the decision variable “discard sample.” You solved the decision network and discovered that there was a unique optimal policy:
Contaminated Specimen  Positive Test  …
Suppose that, in a decision network, the decision variable Run has parents Look and See. Suppose you are using VE to find an optimal policy and, after eliminating all of the other variables, you are left with a single factor:
Look  See    Run  Value
true  true   yes  23
true  true   no   8
true  false  yes  37
true  …
Some students choose to cheat on exams, and instructors want to make sure that cheating does not pay. A rational model would specify that the decision of whether to cheat depends on the costs and the benefits. Here we will develop and critique such a model. Consider the decision network of Figure …
Students have to make decisions about how much to study for each course. The aim of this question is to investigate how to use decision networks to help them make such decisions. Suppose students first have to decide how much to study for the midterm. They can study a lot, study a little, or not …
One of the decisions we must make in real life is whether to accept an invitation even though we are not sure we can or want to go to an event. Figure 12.24 gives a decision network for such a problem. Suppose that all of the decision and random variables are Boolean (i.e., have domain {true,
Consider the following two alternatives:
(i) In addition to what you currently own, you have been given $1000. You are now asked to choose one of these options: a 50% chance to win $1000, or $500 for sure.
(ii) In addition to what you currently own, you have been given $2000. You are now asked to …
Prove that the completeness and/or transitivity axioms (page 519) imply the following statements. What axiom(s) do your proofs rely on?
(a) o2 ⋡ o1 is equivalent to o1 ≻ o2
(b) if o1 ≻ o2 and o2 ≻ o3 then o1 ≻ o3
(c) if o1 ≻ o2 and o2 ⪰ o3 then o1 ≻ o3
(d) if o1 ⪰ o2 and o2 ⪰ o3 then o1 ⪰ o3.
Consider a two-variable causal network with Boolean variables A and B, where A is a parent of B, and the following conditional probabilities:
P(a) = 0.2, P(b | a) = 0.9, P(b | ¬a) = 0.3.
Consider the counterfactual: “B is observed to be true; what is the probability of B if A was false?” Draw the …
Suppose someone provided the source code for a recursive conditioning (page 409) program that computes conditional probabilities in belief networks. Your job is to use it to build a program that also works for interventions, that is, for queries of the form P(Y | do(a1, ..., ak), b1, ..., bm). Explain …
Bickel et al. [1975] report on gender biases for graduate admissions at UC Berkeley. This example is based on that case, but the numbers are fictional. There are two departments, which we will call dept#1 and dept#2 (so Dept is a random variable with values dept#1 and dept#2), which students can …
Consider the causal network of Figure 11.12. The following can be answered intuitively or using the do-calculus. Explain your reasoning:
(a) Does P(I | B) = P(I | do(B))?
(b) Does P(I | G) = P(J | do(G))?
(c) Does P(I | G, B) = P(J | do(G), B)?
(d) Does P(B | I) = P(B | do(I))?
Consider the causal network of Figure 11.12 (page 513). For each part, explain why the independence holds or doesn’t hold, using the definition of d-separation. The independence asked needs to hold for all probability distributions (which is what d-separation tells us).
(a) Is J independent of A …
Exercise 9.2 (page 451) asked to intuitively explore independence in Figure 9.37. For parts (c), (d), and (e) of Exercise 9.2, express the question in terms of conditional independence, and use d-separation (page 495) to infer the answer. Show your working.
Suppose Kim has a camper van (a mobile home), likes to keep it at a comfortable temperature, and noticed that the energy use depended on the elevation. Kim knows that the elevation affects the outside temperature. Kim likes the camper warmer at higher elevation. Note that not all of the variables …
Consider the code for decision trees in Example 10.7 (page 472), and the Bayesian information criteria (BIC) (page 473) for decision trees. Consider the three cases: the BIC, the decision tree code with a 32-bit representation for probabilities, and the decision tree code that uses log2(|Es|) bits
To initialize the EM algorithm in Figure 10.8 (page 480), consider two alternatives:
(a) allow P to return a random distribution the first time through the loop
(b) initialize cc and fc to random values.
By running the algorithm on some datasets, determine which, if any, of these alternatives is better …
Suppose the k-means algorithm is run for an increasing sequence of values for k, and that it is run for a number of times for each k to find the assignment with a global minimum error. Is it possible that a number of values of k exist for which the error plateaus and then has a large improvement
Consider the unsupervised data of Figure 10.5 (page 477).
(a) How many different stable assignments of examples to classes does the k-means algorithm find when k = 2? [Hint: Try running the algorithm on the data with a number of different starting points, but also think about what assignments of …
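Experimenting with different starting points, as the hint suggests, only needs a few lines. A plain k-means sketch on made-up one-dimensional data (the data below is illustrative, not Figure 10.5's):

```python
import random

def kmeans(points, k, iters=100):
    """Plain k-means on 1-D data; returns the final centres, sorted."""
    centres = random.sample(points, k)      # random starting centres
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest centre
            i = min(range(k), key=lambda j: (p - centres[j]) ** 2)
            clusters[i].append(p)
        # move each centre to the mean of its cluster (keep it if empty)
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return sorted(centres)

random.seed(0)
data = [1.0, 1.2, 0.8, 10.0, 10.3, 9.7]     # two well-separated clusters
print(kmeans(data, 2))  # [1.0, 10.0]
```

Rerunning with different seeds is one way to probe how many distinct stable assignments exist for a given dataset and k.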
Suppose you have designed a help system based on Example 10.5 (page 469) and much of the underlying system which the help pages are about has changed. You are now very unsure about which help pages will be requested, but you may have a good model of which words will be used given the help page. How …
Consider the help system of Example 10.5 (page 469).
(a) Using the c1 and wij counts in that example, give the probability of P(H | q), the distribution over help pages given q, the set of words in a user query. Note that this probability needs to also depend on the words not in q.
(b) How can this be …
Consider designing a help system based on Example 10.5 (page 469). Discuss how your implementation can handle the following issues, and if it cannot, whether it is a major problem.
(a) What should be the initial uij counts? Where might this information be obtained?
(b) What if the most likely page is …
Try to construct an artificial example where a naive Bayes classifier can give a divide-by-zero error in test cases when using empirical frequencies as probabilities. Specify the network and the (non-empty) training examples. [Hint: You can do it with two features, say A and B, and a binary …
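One way such a zero can arise (a sketch, not necessarily the intended construction): if every test feature value is unseen with every class, all unnormalised scores are 0 and normalising divides by zero. Pseudocounts avoid it. All names and data below are made up for illustration; the denominator assumes binary feature values:

```python
def nb_score(cls, features, counts, class_counts, pseudo=0.0):
    """Unnormalised naive Bayes score P(cls) * prod_i P(f_i | cls).

    With pseudo=0 (pure empirical frequencies), an unseen
    feature-value/class combination contributes probability 0.
    """
    total = sum(class_counts.values())
    score = class_counts[cls] / total
    for f, v in features.items():
        n = class_counts[cls]
        # "+ 2 * pseudo" assumes each feature is binary.
        score *= (counts[cls].get((f, v), 0) + pseudo) / (n + 2 * pseudo)
    return score

# Training data: class 1 always has A=1, B=0; class 0 always has A=0, B=1.
class_counts = {0: 2, 1: 2}
counts = {0: {('A', 0): 2, ('B', 1): 2},
          1: {('A', 1): 2, ('B', 0): 2}}
test = {'A': 1, 'B': 1}   # a combination never seen with either class

raw = [nb_score(c, test, counts, class_counts) for c in (0, 1)]
print(raw, sum(raw))      # [0.0, 0.0] 0.0 -> normalising would divide by zero
smoothed = [nb_score(c, test, counts, class_counts, pseudo=1) for c in (0, 1)]
print([s / sum(smoothed) for s in smoothed])  # well defined with pseudocounts
```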
How well does particle filtering work for Example 9.48 (page 449)? Try to construct an example where Gibbs sampling works much better than particle filtering. [Hint: Consider unlikely observations after a sequence of variable assignments.]
Which of the following algorithms suffers from underflow (real numbers that are too small to be represented using double-precision floats): rejection sampling, importance sampling, particle filtering? Explain why. How could underflow be avoided?
Exercise 9.16: (a) What are the independence assumptions …
Consider the problem of filtering in HMMs (page 426).
(a) Give a formula for the probability of some variable Xj given future and past observations. You can base this on Equation (9.6) (page 426). This should involve obtaining a factor from the previous state and a factor from the next state and …
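One standard decomposition for part (a) is the forward-backward factorisation, stated here in generic HMM notation (not necessarily the notation of Equation (9.6)):

```latex
P(X_j \mid o_{1:T}) \;\propto\;
    \underbrace{P(X_j,\, o_{1:j})}_{\text{factor from the past}}
    \;\times\;
    \underbrace{P(o_{j+1:T} \mid X_j)}_{\text{factor from the future}}
```

The first factor is what filtering already computes up to time j; the second is computed by a symmetric recursion running backward from time T.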
Suppose Sam built a robot with five sensors and wanted to keep track of the location of the robot, and built a hidden Markov model (HMM) with the following structure (which repeats to the right): …
(a) What probabilities does Sam need to provide? You should label a copy of the diagram, if that …
Extend Example 9.30 (page 420) so that it includes the state of the animal, which is either sleeping, foraging, or agitated. If the animal is sleeping at any time, it does not make a noise, does not move, and at the next time point it is sleeping with probability 0.8 or foraging or agitated with …
This exercise continues … (page 228).
(a) Explain what knowledge (about physics and about students) a belief-network model requires.
(b) What is the main advantage of using belief networks over using abductive diagnosis or consistency-based diagnosis in this domain?
(c) What is the main advantage of using abductive diagnosis …
In a nuclear research submarine, a sensor measures the temperature of the reactor core. An alarm is triggered (A = true) if the sensor reading is abnormally high (S = true), indicating an overheating of the core (C = true).The alarm and/or the sensor could be defective (S ok = false, A ok = false),
Explain how to extend VE to allow for more general observations and queries. In particular, answer the following:
(a) How can the VE algorithm be extended to allow observations that are disjunctions of values for a variable (e.g., of the form X = a ∨ X = b)?
(b) How can the VE algorithm be extended …
Sam suggested that the recursive conditioning algorithm only needs to cache answers resulting from forgetting, rather than all answers. Is Sam’s suggestion better (in terms of space or search space reduced) than the given code for a single query? What about for multiple queries that share a
Consider the following belief network:
with Boolean variables (A = true is written as a and A = false as ¬a, and similarly for the other variables) and the following conditional probabilities:
P(a) = 0.9, P(b) = 0.2,
P(c | a, b) = 0.1, P(c | a, ¬b) = 0.8, P(c | ¬a, b) = 0.7, P(c | ¬a, ¬b) = 0.4,
P(d | b) = …
In this question, you will build a belief-network representation of the Deep Space 1 (DS1) spacecraft considered in … (page 225). Figure 5.14 (page 226) depicts a part of the actual DS1 engine design. Consider the following scenario:
• Valves are open or closed.
• A valve can be ok, in which case the gas will flow if the valve is open and not if it is closed; broken, in which case gas never flows; stuck, in which …
Represent the same scenario as in … (page 225) using a belief network. Show the network structure. Give all of the initial factors, making reasonable assumptions about the conditional probabilities (they should follow the story given in that exercise, but allow some noise). Give a qualitative explanation of why the patient has spots …
Kahneman [2011, p. 166] gives the following example. A cab was involved in a hit-and-run accident at night. Two cab companies, Green and Blue, operate in the city. You are given the following data:
• 85% of the cabs in the city are Green and 15% are Blue.
• A witness identified the cab as Blue. …
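The preview above is truncated before the witness's reliability is stated; the canonical version of this example assumes the witness is correct 80% of the time, and that figure is treated as an assumption below, not part of the source text. With it, Bayes' rule gives:

```python
# Base rates from the question; witness reliability is an ASSUMED 80%.
p_blue = 0.15
p_say_blue_given_blue = 0.80    # assumed: witness correct 80% of the time
p_say_blue_given_green = 0.20   # assumed: witness wrong 20% of the time

num = p_blue * p_say_blue_given_blue
den = num + (1 - p_blue) * p_say_blue_given_green
print(num / den)  # ~0.414: the cab is still more likely Green than Blue
```

The low base rate of Blue cabs dominates the fairly reliable testimony, which is the point of the example.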
Consider the belief network of Figure 9.38 (page 453), which extends the electrical domain to include an overhead projector. Answer the following questions about how knowledge of the values of some variables would affect the probability of another variable.(a) Can knowledge of the value of
Consider the belief network of Figure 9.37. This is the “Simple diagnostic example” in AIPython (aipython.org). For each of the following, first predict the answer based on your intuition, then run the belief network to check it. Explain the result you found by carrying out the inference.(a)
Using only the axioms of probability and the definition of conditional independence, prove Proposition 9.2 (page 384).
Take the text of some classic work, such as can be found on gutenberg.org. Repeat the experiment of Example 8.11 (page 359) with that text. Increase the number of hidden nodes from 512 to 2048 and double the number of epochs. Is the performance better? What evidence can you provide to show it is
In … (page 359), the LSTM was character-based, and there were about 3.5 million parameters.
(a) How many parameters would there be in an LSTM if it was word-based with a vocabulary of 1000 words and a hidden state of size 1000?
(b) How many parameters would there be if the vocabulary had 10,000 words and …
Give the pseudocode for Conv1D, for one-dimensional convolutions (the one-dimensional version of Figure 8.9). What hyperparameters are required? This pseudocode does not include all of the hyperparameters of Keras or PyTorch. For two of the hyperparameters of one of these, show how the pseudocode can …
The Conv2D code of Figure 8.9 does not include a stride (page 349). Show how a stride can be incorporated into the pseudocode, where the stride is a pair of numbers, one for each dimension. Implement it in AIPython (aipython.org).
Adam (page 340) was described as a combination of momentum and RMS-Prop. Using AIPython (aipython.org), Keras, or PyTorch (see Appendix B.2), find two datasets and compare the following:
(a) How does Adam with β1 = β2 = 0 differ from plain stochastic gradient descent without momentum? [Hint: How …
Run the AIPython (aipython.org) neural network code or another learner on the “Mail reading” data of Figure 7.1 (page 268) with a single hidden layer with two hidden units.
(a) Suppose that you decide to use any predicted value from the neural network greater than 0.5 as true, and any value …
… (page 323) and Example 8.1 (page 330). You need to think about how many units need to be in the hidden layer.]
Give the weights and structure of a neural network with a sigmoid output activation and one hidden layer with a ReLU activation, that can represent the exclusive-or function (⊕) of two Booleans, which is true when the inputs have different truth values; see Figure 7.13 (page 293). Assume true is …
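One well-known construction fits the shape this question asks for (offered as an illustration, not necessarily the intended answer): two ReLU hidden units computing h1 = ReLU(x1 + x2) and h2 = ReLU(x1 + x2 − 1), so that h1 − 2·h2 equals x1 ⊕ x2, followed by a steep sigmoid:

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def xor_net(x1, x2):
    """h1 - 2*h2 is exactly x1 XOR x2 on {0,1} inputs; the sigmoid
    with weight 10 and bias -5 squashes that to ~0 or ~1."""
    h1 = relu(x1 + x2)          # hidden unit 1: weights (1, 1), bias 0
    h2 = relu(x1 + x2 - 1)      # hidden unit 2: weights (1, 1), bias -1
    return sigmoid(10 * (h1 - 2 * h2) - 5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(xor_net(a, b)))  # 0, 1, 1, 0
```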
Given the parameterizations of Example 7.22 (page 316):
(a) When the features are a, b, and c, what decision tree will the decision-tree learning algorithm find to represent t (assuming it maximizes information gain and only stops when all examples at a leaf agree)?
(b) When the features are a, b, and c, …
It is possible to define a regularizer to minimize ∑e (loss(Ŷ(e), Y(e)) + λ ∗ regularizer(Ŷ)) rather than formula (7.5) (page 303). How is this different than the existing regularizer? [Hint: Think about how this affects multiple datasets or cross validation.] Suppose λ is set by k-fold …
Consider how to estimate the quality of a restaurant from ratings of 1 to 5 stars, as in Example 7.18 (page 301).
(a) What would this predict for a restaurant that has two 5-star ratings? How would you test from the data whether this is a reasonable prediction?
(b) Suppose you wanted not to …
Suppose you want to optimize the mean squared loss (page 270) for the sigmoid of a linear function.
(a) Modify the algorithm of Figure 7.12 (page 292) so that the update is proportional to the gradient of the squared error. Note that this question assumes you know differential calculus, in …
Consider minimizing Equation (7.1) (page 289), which gives the error of a linear prediction. This can be solved by finding the zero(s) of its derivative. The general case involves solving linear equations, for which there are many techniques, but it is instructive to do a simple case by hand.
(a) …
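For the one-input case, setting the two partial derivatives of the squared error to zero gives the familiar closed form. A sketch under the usual least-squares setup (not necessarily the sub-case the question's parts ask for):

```python
def fit_line(xs, ys):
    """Minimise sum_e (y_e - (w * x_e + b))**2 by solving the two
    equations d/dw = 0 and d/db = 0 (the one-input normal equations)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # d/db = 0 gives b = my - w * mx; substituting into d/dw = 0 gives:
    w = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - w * mx
    return w, b

print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0): data lie on y = 2x + 1
```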
Show how gradient descent can be used for learning a linear function that minimizes the absolute error. [Hint: Do a case analysis of the error; for each example the absolute value is either the positive or the negative of the value.What is appropriate when the value is zero?]
Give the weights for a logistic regression model that can approximate the following logical operations. Assume true is represented as 1, and false as 0. Assume that sigmoid(5) is a close enough approximation to 1, and sigmoid(−5) is close enough to 0.
(a) and: x1 ∧ x2
(b) or: x1 ∨ x2
(c) negation: …
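One possible weight assignment per gate can be checked numerically, using the question's own convention that sigmoid(5) ≈ 1 and sigmoid(−5) ≈ 0 (these particular weights are an illustration; many others work):

```python
import math

def lr(w, b, xs):
    """Logistic regression prediction sigmoid(b + w . xs)."""
    return 1 / (1 + math.exp(-(b + sum(wi * xi for wi, xi in zip(w, xs)))))

AND = ([10, 10], -15)   # pre-activation is +5 only when both inputs are 1
OR  = ([10, 10], -5)    # pre-activation is -5 only when both inputs are 0
NOT = ([-10], 5)        # +5 for input 0, -5 for input 1

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(lr(*AND, [x1, x2])), round(lr(*OR, [x1, x2])))
print(round(lr(*NOT, [0])), round(lr(*NOT, [1])))  # 1 0
```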
Implement a decision tree learner that handles input features with ordered domains. You can assume that any numerical feature is ordered. The condition should be a cut on a single variable, such as X ≤ v, which partitions the training examples according to the value v. A cut-value can be chosen
The aim of this exercise is to determine the size of the space of decision trees. Suppose there are n binary features in a learning problem. How many different decision trees are there? How many different functions are represented by these decision trees? Is it possible that two different decision
In the decision tree learner of Figure 7.9 (page 284), it is possible to mix the leaf predictions (what is returned by leaf value) and which loss is used in sum loss. For each loss in the set {0–1 loss, absolute loss, squared loss, log loss}, and for each leaf choice in {empirical distribution,
Consider the decision tree learning algorithm of Figure 7.9 (page 284) and the data of Figure 7.1 (page 268). Suppose, for this question, the stopping criterion is that all of the examples have the same classification. The tree of Figure …
Example  Comedy  Doctors  Lawyers  Guns  Likes
e1       false   true     false    …
Suppose you need to define a system that, given data about a person’s TV-watching likes, recommends other TV shows the person may like. Each show has features specifying whether it is a comedy, whether it features doctors, whether it features lawyers, and whether it has guns. You are given the
In the context of a point estimate of a feature with domain {0, 1}with no inputs, it is possible for an agent to make a stochastic prediction with a parameter p ∈ [0, 1] such that the agent predicts 1 with probability p and predicts 0 otherwise. For each of the following error measures, give the
Showing 200 - 300 of 4756