Solutions for Artificial Intelligence: Foundations of Computational Agents, 2nd Edition, by David L. Poole and Alan K. Mackworth
4. Consider the unsupervised data of Figure 10.3.(a) How many different stable assignments of examples to classes does the k-means algorithm find when k = 2? [Hint: Try running the algorithm on the data with a number of different starting points, but also think about what assignments of examples to
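The data of Figure 10.3 is not reproduced here, but the multiple-stable-assignment behavior the hint points at can be reproduced on any small one-dimensional data set. A minimal sketch, with four invented data points:

```python
def kmeans(points, centers, iters=100):
    """Run k-means (k = 2) on 1-D points until the centers stop moving."""
    for _ in range(iters):
        clusters = [[], []]
        for p in points:
            # assign each point to its nearest center (ties broken by index)
            clusters[min((abs(p - c), i) for i, c in enumerate(centers))[1]].append(p)
        new = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new == centers:       # stable: assignments no longer change
            break
        centers = new
    return centers, clusters

points = [0, 4, 5, 9]
# Two different starting points converge to two different stable assignments:
print(kmeans(points, [0.0, 4.0]))   # clusters [0] and [4, 5, 9]
print(kmeans(points, [0.0, 9.0]))   # clusters [0, 4] and [5, 9]
```

Both runs terminate because no point changes cluster, yet the partitions differ, which is exactly what the exercise asks you to count for the book's data.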
3. Suppose you have designed a help system based on Example 10.5 and much of the underlying system which the help pages are about has changed. You are now very unsure about which help pages will be requested, but you may have a good model of which words will be used given the help page. How can the
2. Consider designing a help system based on Example 10.5. Discuss how your implementation can handle the following issues, and if it cannot whether it is a major problem.(a) What should be the initial uij counts? Where might this information be obtained?(b) What if the most likely page is not the
1. Try to construct an artificial example where a naive Bayes classifier can give divide-by-zero error in test cases when using empirical frequencies as probabilities. Specify the network and the(non-empty) training examples. [Hint: You can do it with two features, say A and B, and a binary
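One way to see the failure mode the exercise asks for: with empirical frequencies, a test case can get probability zero under every class, so normalizing divides zero by zero. A hypothetical two-feature construction (the training set below is invented to force this):

```python
# Every Class=True example has B=False, and every Class=False example has
# A=False, so the test case A=True, B=True scores 0 under *both* classes.
train = [(True, False, True), (True, False, True),
         (False, True, False), (False, True, False)]

def empirical_nb(train, a, b):
    """Naive Bayes with empirical frequencies as probabilities (no smoothing)."""
    score = {}
    for cls in (True, False):
        rows = [r for r in train if r[2] == cls]
        p_cls = len(rows) / len(train)
        p_a = sum(r[0] == a for r in rows) / len(rows)
        p_b = sum(r[1] == b for r in rows) / len(rows)
        score[cls] = p_cls * p_a * p_b
    total = score[True] + score[False]
    return score[True] / total      # 0 / 0 for the test case above

try:
    empirical_nb(train, True, True)
except ZeroDivisionError:
    print("divide-by-zero: both class scores are 0")
```

Pseudocounts (Laplace smoothing) are the standard fix, since they keep every conditional probability strictly positive.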
• Bayesian learning replaces making a prediction from the best model with finding a prediction by averaging over all of the models conditioned on the data.
• Missing values in examples are often not missing at random. Why they are missing is often important to determine.
• The probabilities and the structure of belief networks can be learned from complete data. The probabilities can be derived from counts. The structure can be learned by searching for the best model given the data.
• EM and k-means are iterative methods to learn the parameters of models with hidden variables(including the case in which the classification is hidden).
• Bayes’ rule provides a way to incorporate prior knowledge into learning and a way to trade off fit-to-data and model complexity.
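The counts-to-probabilities bullet above connects to the prior-knowledge bullet: a pseudocount acts as a simple prior that keeps zero counts from producing zero probabilities. A sketch on invented complete data over a parent A and child B:

```python
from collections import Counter

# Complete data as (parent value, child value) pairs; counts are made up.
data = [("a", "b"), ("a", "b"), ("a", "nb"),
        ("na", "b"), ("na", "nb"), ("na", "nb")]

def cpt(data, pseudo=1):
    """P(B | A) from counts, with a pseudocount to avoid zero probabilities."""
    joint = Counter(data)
    parents = Counter(a for a, _ in data)
    vals = sorted({b for _, b in data})
    return {a: {b: (joint[(a, b)] + pseudo) / (parents[a] + pseudo * len(vals))
                for b in vals}
            for a in parents}

table = cpt(data)
print(table)   # e.g. P(b | a) = (2 + 1) / (3 + 2) = 0.6
```

With `pseudo=0` this reduces to pure empirical frequencies; larger pseudocounts pull the estimates toward uniform, which is the fit-versus-prior trade-off the summary mentions.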
18. How can variable elimination for decision networks, shown in Figure 9.13, be modified to include additive discounted rewards? That is, there can be multiple utility (reward) nodes, to be added and discounted. Assume that the variables to be eliminated are eliminated from the latest time step
17. In a decision network, suppose that there are multiple utility nodes, where the values must be added. This lets us represent a generalized additive utility function. How can the VE for decision networks algorithm, shown in Figure 9.13, be altered to include such utilities?
16. Consider a 5 × 5 grid game similar to the game of the previous question. The agent can be at one of the 25 locations, and there can be a treasure at one of the corners or no treasure. Assume the “up” action has the same dynamics as in the previous question. That is, the agent goes up with
15. Consider a game world:The robot can be at any one of the 25 locations on the grid. There can be a treasure on one of the circles at the corners. When the robot reaches the corner where the treasure is, it collects a reward of 10, and the treasure disappears. When there is no treasure, at each
14. Consider the MDP of Example 9.29.(a) As the discount varies between 0 and 1, how does the optimal policy change? Give an example of a discount that produces each different policy that can be obtained by varying the discount.(b) How can the MDP and/or discount be changed so that the optimal
13. Explain why we often use discounting of future rewards in MDPs. How would an agent act differently if the discount factor was 0.6 as opposed to 0.9?
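As a concrete illustration of the discount factor's effect (the reward sequences below are invented, not from the book): a smaller discount makes the agent prefer a small immediate reward over a larger delayed one.

```python
def discounted_return(rewards, gamma):
    # sum over t of gamma^t * r_t
    return sum(gamma ** t * r for t, r in enumerate(rewards))

now   = [1, 0, 0, 0]   # reward of 1 immediately
later = [0, 0, 0, 4]   # reward of 4 three steps later

for gamma in (0.6, 0.9):
    print(gamma, discounted_return(now, gamma), discounted_return(later, gamma))
```

With gamma = 0.6 the delayed reward is worth 4 x 0.6^3 = 0.864 < 1, so the impatient agent takes the immediate reward; with gamma = 0.9 it is worth 4 x 0.9^3 = 2.916 > 1, so the patient agent waits.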
12. What is the main difference between asynchronous value iteration and standard value iteration? Why does asynchronous value iteration often work better than standard value iteration?
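For reference, standard value iteration recomputes every state's value from the previous sweep's values, while the asynchronous variant updates one state at a time in place. A sketch of the synchronous version on an invented two-state MDP (all numbers are made up for illustration):

```python
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {"stay": [(1.0, 0, 0)], "go": [(0.9, 1, 0), (0.1, 0, 0)]},
    1: {"stay": [(1.0, 1, 1)], "go": [(1.0, 0, 0)]},
}
gamma = 0.9

def value_iteration(P, gamma, eps=1e-6):
    V = {s: 0.0 for s in P}
    while True:
        # synchronous: every state is backed up from the *old* values V
        newV = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                       for a in P[s])
                for s in P}
        if max(abs(newV[s] - V[s]) for s in P) < eps:
            return newV
        V = newV

V = value_iteration(P, gamma)
print(V)   # state 1 converges to about 10 = 1 / (1 - 0.9)
```

The asynchronous variant would replace `newV` with in-place updates to `V`, so later backups in the same sweep already see the improved values, which is why it often converges in fewer backups.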
11. Consider the following decision network:(a) What are the initial factors? (Give the variables in the scope of each factor, and specify any associated meaning of each factor.)(b) Show what factors are created when optimizing the decision function and computing the expected value, for one of the
10. Consider the belief network of Exercise 12. When an alarm is observed, a decision is made whether or not to shut down the reactor. Shutting down the reactor has a cost cs associated with it (independent of whether the core was overheating), whereas not shutting down an overheated core incurs a
9. In Example 9.15, suppose that the fire sensor was noisy in that it had a 20% false positive rate, P(see_smoke | report ∧ ¬smoke) = 0.2, and a 15% false negative rate, P(see_smoke | report ∧ smoke) = 0.85. Is it still worthwhile to check for smoke?
8. How sensitive are the answers from the decision network of Example 9.15 to the probabilities?Test the program with different conditional probabilities and see what effect this has on the answers produced. Discuss the sensitivity both to the optimal policy and to the expected value of the optimal
7. Suppose that, in a decision network, there were arcs from random variables “contaminated specimen” and “positive test” to the decision variable “discard sample.” You solved the decision network and discovered that there was a unique optimal policy:What can you say about the value of
6. Suppose that, in a decision network, the decision variable Run has parents Look and See.Suppose you are using VE to find an optimal policy and, after eliminating all of the other variables, you are left with a single factor:(a) What is the resulting factor after eliminating Run? [Hint: You do
5. Some students choose to cheat on exams, and instructors want to make sure that cheating does not pay. A rational model would specify that the decision of whether to cheat depends on the costs and the benefits. Here we will develop and critique such a model.Consider the decision network of Figure
4. Students have to make decisions about how much to study for each course. The aim of this question is to investigate how to use decision networks to help them make such decisions.Suppose students first have to decide how much to study for the midterm. They can study a lot, study a little, or not
3. One of the decisions we must make in real life is whether to accept an invitation even though we are not sure we can or want to go to an event. Figure 9.23 gives a decision network for such a problem.Suppose that all of the decision and random variables are Boolean (i.e., have domain {true,
2. Consider the following two alternatives: (a) In addition to what you currently own, you have been given $1000. You are now asked to choose one of these options: 50% chance to win $1000 or get $500 for sure. (b) In addition to what you currently own, you have been given $2000. You are now asked to
1. Prove that transitivity of ⪰ implies transitivity of ≻ (even when only one of the premises involves ⪰ and the other involves ≻ ). Does your proof rely on other axioms?
• A dynamic decision network allows for the representation of an MDP in terms of features.
• A fully observable MDP can be solved with value iteration or policy iteration.
• An MDP can represent an infinite stage or indefinite stage sequential decision problem in terms of states.
• A decision network can represent a finite stage partially observable sequential decision problem in terms of features.
• Utility is a measure of preference that combines with probability.
21. How well does particle filtering work for Example 8.46? Try to construct an example where Gibbs sampling works much better than particle filtering. [Hint: Consider unlikely observations after a sequence of variable assignments.]
20. Suppose you get a job where the boss is interested in localization of a robot that is carrying a camera around a factory. The boss has heard of variable elimination, rejection sampling, and particle filtering and wants to know which would be most suitable for this task. You must write a report
19. (a) What are the independence assumptions made in the naive Bayes classifier for the help system of Example 8.35? (b) Are these independence assumptions reasonable? Explain why or why not. (c) Suppose we have a topic-model network like the one of Figure 8.26, but where all of the topics are
18. Which of the following algorithms suffers from underflow (real numbers that are too small to be represented using double precision floats): rejection sampling, importance sampling, particle filtering? Explain why. How could underflow be avoided?
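The underflow itself is easy to reproduce: multiplying many small likelihoods drops below the smallest representable double, while summing their logarithms stays finite. The per-observation likelihood value below is invented:

```python
import math

likelihoods = [1e-20] * 30   # hypothetical per-observation likelihoods

direct = 1.0
for p in likelihoods:
    direct *= p              # 1e-600 underflows double precision to 0.0

log_weight = sum(math.log(p) for p in likelihoods)   # about -1381.6, finite

print(direct, log_weight)
```

Working in log space is the usual fix; normalizing a set of log weights can then be done stably by subtracting the maximum before exponentiating (the log-sum-exp trick).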
17. Consider the problem of filtering in HMMs.(a) Give a formula for the probability of some variable Xj given future and past observations. You can base this on Equation 8.2. This should involve obtaining a factor from the previous state and a factor from the next state and combining them to
16. Suppose Sam built a robot with five sensors and wanted to keep track of the location of the robot, and built a hidden Markov model (HMM) with the following structure (which repeats to the right):(a) What probabilities does Sam need to provide? You should label a copy of the diagram, if that
15. The aim of this exercise is to extend Example 8.29. Suppose the animal is either sleeping, foraging or agitated.If the animal is sleeping at any time, it does not make a noise, does not move and at the next time point it is sleeping with probability 0.8 or foraging or agitated with probability
14. Suppose Kim has a camper van (a mobile home) and likes to keep it at a comfortable temperature and noticed that the energy use depended on the elevation. Kim knows that the elevation affects the outside temperature. Kim likes the camper warmer at higher elevation. Note that not all of the
13. In this exercise, we continue Exercise 14.(a) Explain what knowledge (about physics and about students) a belief-network model requires.(b) What is the main advantage of using belief networks over using abductive diagnosis or consistency-based diagnosis in this domain?(c) What is the main
12. In a nuclear research submarine, a sensor measures the temperature of the reactor core. An alarm is triggered (A = true) if the sensor reading is abnormally high (S = true), indicating an overheating of the core (C = true). The alarm and/or the sensor could be defective (S_ok = false, A_ok =
11. Explain how to extend VE to allow for more general observations and queries. In particular, answer the following.(a) How can the VE algorithm be extended to allow observations that are disjunctions of values for a variable (e.g., of the form X = a ∨ X = b)?(b) How can the VE algorithm be
10. Consider the following belief network:with Boolean variables (we write A = true as a and A = false as ¬a) and the following conditional probabilities:P(a) = 0.9 P(b) = 0.2 P(c ∣a, b) = 0.1 P(c ∣a, ¬b) = 0.8 P(c ∣ ¬a,b) = 0.7 P(c ∣ ¬a, ¬b) = 0.4 P(d ∣b) = 0.1 P(d ∣ ¬b) = 0.8
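As a sanity check on these numbers, the prior marginal P(c) can be computed by summing out A and B with the conditional probabilities given above:

```python
p_a = 0.9
p_b = 0.2
p_c = {(True, True): 0.1, (True, False): 0.8,
       (False, True): 0.7, (False, False): 0.4}

# P(c) = sum over a, b of P(a) P(b) P(c | a, b)
P_c = sum((p_a if a else 1 - p_a) * (p_b if b else 1 - p_b) * p_c[(a, b)]
          for a in (True, False) for b in (True, False))
print(P_c)   # 0.018 + 0.576 + 0.014 + 0.032 = 0.64
```

The same enumeration pattern (with renormalization after fixing observed values) answers the conditional queries the exercise goes on to ask.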
9. In this question, you will build a belief network representation of the Deep Space 1 (DS1) spacecraft considered in Exercise 10. Figure 5.14 depicts a part of the actual DS1 engine design. Consider the following scenario.
• Valves are open or closed.
• A valve can be ok, in which case the gas will flow if the valve is open and not if it is closed; broken, in which case gas never flows; stuck, in which case gas flows independently of whether the valve is open or closed; or leaking, in which case gas flowing into the valve leaks out instead of
• There are three gas sensors that can detect whether gas is leaking (but not which gas); the first gas sensor detects gas from the rightmost valves (v1…v4), the second gas sensor detects gas from the center valves (v5…v12), and the third gas sensor detects gas from the leftmost valves
8. Suppose you want to diagnose the errors school students make when adding multidigit binary numbers. Consider adding two two-digit numbers to form a three-digit number.That is, the problem is of the form:A1A0 + B1B0 C2C1C0 where Ai, Bi, and Ci are all binary digits.(a) Suppose you want to model
7. Represent the same scenario as in Exercise 8 using a belief network. Show the network structure. Give all of the initial factors, making reasonable assumptions about the conditional probabilities (they should follow the story given in that exercise, but allow some noise). Give a qualitative
6. Kahneman [2011, p. 166] gives the following example.A cab was involved in a hit-and-run accident at night. Two cab companies, Green and Blue, operate in the city. You are given the following data:• 85% of the cabs in the city are Green and 15% are Blue.• A witness identified the cab as Blue.
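The witness-accuracy figure is cut off above, so the Bayes' rule computation is sketched with the reliability left as a parameter; the 0.8 used in the call is an assumed value, not taken from the truncated text.

```python
def posterior_blue(base_blue, reliability):
    """P(cab is Blue | witness says Blue) by Bayes' rule.

    reliability is P(witness says Blue | Blue) = P(says Green | Green);
    the actual figure is elided in the excerpt, so it stays a parameter.
    """
    p_say_blue = reliability * base_blue + (1 - reliability) * (1 - base_blue)
    return reliability * base_blue / p_say_blue

print(posterior_blue(0.15, 0.8))   # 0.12 / 0.29, about 0.41
```

The point of the example survives any reasonable reliability: because Blue cabs are rare, the posterior stays well below the witness's accuracy.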
5. Consider the belief network of Figure 8.35, which extends the electrical domain to include an overhead projector.Answer the following questions about how knowledge of the values of some variables would affect the probability of another variable.(a) Can knowledge of the value of
3. Using only the axioms of probability and the definition of conditional independence, prove Proposition 8.5. Consider the belief network of Figure 8.34. This is the “Simple diagnostic example” in the AIspace belief network tool at http://www.aispace.org/bayes/. For each of the following, first
2. Prove Proposition 8.1 for finitely many worlds, namely that the axioms of probability (Section 8.1.2) are sound and complete with respect to the semantics of probability. [Hint: For soundness, show that each of the axioms is true based on the semantics. For completeness, construct a probability
1. Bickel et al. [1975] report on gender biases for graduate admissions at UC Berkeley. This example is based on that case, but the numbers are fictional.There are two departments, which we will call dept#1 and dept#2 (so Dept is a random variable with values dept#1 and dept#2) which students can
21. Implement a nearest-neighbor learning system that stores the training examples in a kd-tree and uses the neighbors that differ in the fewest number of features, weighted evenly. How well does this work in practice?
20.(a) Draw a kd-tree for the data of Figure 7.1. The topmost feature to split on should be the one that most divides the examples into two equal classes. (Do not split on UserAction.) Show which training examples are at which leaf nodes.(b) Show the locations in this tree that contain the closest
19. How does a neural network with hidden units as rectified linear units (f(z) = max(0, z))compare to a neural network with sigmoid hidden units? This should be tested on more than one data set. Make sure that the output unit is appropriate for the data set(s).
18. In the neural net learning algorithm, the parameters are updated for each example. To compute the derivative accurately, the parameters should be updated only after all examples have been seen. Implement such a learning algorithm and compare it to the incremental algorithm, with respect to both
17. Run the AIspace.org neural network learner on the data of Figure 7.1.(a) Suppose that you decide to use any predicted value from the neural network greater than 0.5 as true, and any value less than 0.5 as false. How many examples are misclassified initially? How many examples are misclassified
16. Consider the parameters learned for the neural network in Example 7.20. Give a logical formula(or a decision tree) representing the Boolean function that is the value for the hidden units and the output units. This formula should not refer to any real numbers. [Suppose that, in the output of a
15. It is possible to define a regularizer to minimize ∑e (errorh(e) + λ * regularizerh) rather than Formula 7.4. How is this different than the existing regularizer? [Hint: Think about how this affects multiple data sets or for cross validation.] Suppose λ is set by k-fold cross validation, and
14. Suggest how the update for the L1 regularizer could be carried out once per example rather than after all examples have been considered. Which do you think would work better, and why?Does this work better or worse in practice than updating once per example?
13. Consider the update step for the L2 regularizer for linear or logistic regression. It is possible to update the weights due to the regularization inside the “for each” loops in Figure 7.8. Does this actually minimize the ridge regression formula? How must λ be modified? Does this work
12. Consider how to estimate the quality of restaurant from the ratings of 1 to 5 stars as in Example 7.17.(a) What would this predict for a restaurant that has two 5-star ratings? How would you test from the data whether this is a reasonable prediction?(b) Suppose you wanted not to optimize just
11. Suppose you want to optimize the sum-of-squares error for the sigmoid of a linear function. (a) Modify the algorithm of Figure 7.8 so that the update is proportional to the gradient of the sum-of-squares error. Note that this question assumes you know differential calculus, in particular, the
10. Consider Equation 7.1, which gives the error of a linear prediction. (a) Give a formula for the weights that minimize the error for the case where n = 2 (i.e., when there are only two input features). [Hint: For each weight, differentiate with respect to that weight and set to zero.] (b) Why is it
9. Show how gradient descent can be used for learning a linear function that minimizes the absolute error. [Hint: Do a case analysis of the error; for each example the absolute value is either the positive or the negative of the value. What is appropriate when the value is zero?]
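A sketch of the case analysis the hint suggests: the derivative of |y − (wx + b)| with respect to the prediction is −1 below the target and +1 above it, and can be taken as 0 when the residual is exactly zero. The training data below is invented (three points on the line y = 2x + 1):

```python
def train_abs(examples, lr=0.01, epochs=1000):
    """Per-example subgradient descent on absolute error for y = w*x + b."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in examples:
            err = y - (w * x + b)
            sign = (err > 0) - (err < 0)   # subgradient: 0 at a zero residual
            w += lr * sign * x
            b += lr * sign
    return w, b

examples = [(0, 1), (1, 3), (2, 5)]
w, b = train_abs(examples)
print(w, b)   # hovers near w = 2, b = 1
```

Unlike the squared-error gradient, the step size here does not shrink as the residual shrinks, so with a constant learning rate the weights oscillate in a small neighborhood of the minimizer rather than settling exactly.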
8. The decision tree learning algorithm of Figure 7.7 has to stop if it runs out of features and not all examples agree.Suppose that you are building a decision tree for a Boolean target feature and you have come to the stage where there are no remaining input features to split on and there are
7. Implement a decision tree learner that handles input features with ordered domains. You can assume that any numerical feature is ordered. The condition should be a cut on a single variable, such as X ≤ v, which partitions the training examples according to the value v. A cut-value can be
6. Extend the decision tree learning algorithm of Figure 7.7 to allow for multiway splits for discrete variables. Assume that the inputs are the input features, the target feature and the training examples. A split is on the values of a feature.One problem that must be overcome is when no examples
5. The aim of this exercise is to determine the size of the space of decision trees. Suppose there are n binary features in a learning problem. How many different decision trees are there? How many different functions are represented by these decision trees? Is it possible that two different
4. Consider the decision tree learning algorithm of Figure 7.7 and the data of Figure 7.1. Suppose, for this question, the stopping criterion is that all of the examples have the same classification.The tree of Figure 7.6 was built by selecting a feature that gives the maximum information gain.This
3. Suppose we have a system that observes a person’s TV watching habits in order to recommend other TV shows the person may like. Suppose that we have characterized each show by whether it is a comedy, whether it features doctors, whether it features lawyers, and whether it has guns.Suppose we
2. In the context of a point estimate of a feature with domain {0, 1} with no inputs, it is possible for an agent to make a stochastic prediction with a parameter p ∈ [0, 1] such that the agent predicts 1 with probability p and predicts 0 otherwise. For each of the following error measures, give
1. The aim of this exercise is to prove and extend the table of Figure 7.5.(a) Prove the optimal predictions for training data of Figure 7.5. To do this, find the minimum value of the absolute error, the sum-of-squares error, the entropy, and the value that gives the maximum likelihood. The maximum
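Two rows of that table, that the mean minimizes sum-of-squares error and a median minimizes absolute error, can be checked numerically on a small invented sample before attempting the proof:

```python
data = [1, 2, 2, 7]   # mean 3.0, median 2.0

def sse(v):
    return sum((x - v) ** 2 for x in data)

def abs_err(v):
    return sum(abs(x - v) for x in data)

# brute-force search over a fine grid of candidate predictions
candidates = [i / 100 for i in range(0, 801)]
best_sse = min(candidates, key=sse)
best_abs = min(candidates, key=abs_err)
print(best_sse, best_abs)   # 3.0 (the mean) and 2.0 (a median)
```

The proof then amounts to differentiating the sum-of-squares error and doing the sign case analysis for the absolute error, mirroring what the grid search finds.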
• Overfitting occurs when a prediction fits the training set well but does not fit the test set or future predictions.
• Linear classifiers and decision tree classifiers are representations which are the basis for more sophisticated models.
• Given some training examples, an agent builds a representation that can be used for new predictions.
• Supervised learning is the problem of predicting the target of a new input, given a set of input–target pairs.
• Learning is the ability of an agent to improve its behavior based on experience.
14. The SNLP algorithm is the same as the partial-order planner presented here but, in the protect procedure, the condition is A ≠ A0 and A ≠ A1 and (A deletes G or A achieves G) .This enforces systematicity, which means that for every linear plan there is a unique partialordered plan. Explain
13. The selection algorithm used in the partial-order planner is not very sophisticated. It may be sensible to order the selected subgoals. For example, in the robot world, the robot should try to achieve a carrying subgoal before an at subgoal because it may be sensible for the robot to try to
12. To implement the function add_constraint(A0 < A1, Constraints) used in the partial-order planner, you have to choose a representation for a partial ordering. Implement the following as different representations for a partial ordering.(a) Represent a partial ordering as a set of less-than
11. Give a condition for the CSP planner that, when arc consistency with search fails at some horizon, implies there can be no solutions for any longer horizon. [Hint: Think about a very long horizon where the forward search and the backward search do not influence each other.]Implement it.
10. Explain how multiple-path pruning can be incorporated into a regression planner. When can a node be pruned? See the discussion earlier.
9. For the delivery robot domain, give a nontrivial admissible heuristic function for the regression planner. A nontrivial heuristic function is nonzero for some nodes, and always nonnegative. Does it satisfy the monotone restriction?
8. Explain how the regression planner can be extended to include maintenance goals, for the STRIPS representation. Assume a maintenance goal is a disjunction of assignments of values to variables.
7. In a forward planner, you can represent a state in terms of the sequence of actions that lead to that state.(a) Explain how to check whether the precondition of an action is satisfied, given such a representation.(b) Explain how to do cycle pruning in such a representation. You can assume that
6. Suppose you have a STRIPS representation for actions a1 and a2, and you want to define the STRIPS representation for the composite action a1; a2, which means that you do a1 then do a2.(a) What are the effects for this composite action?(b) When is the composite action impossible? (That is, when
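Under the usual STRIPS assumption that an action's add and delete lists are disjoint, the composition can be sketched with sets. The blocks-world actions used to exercise it are illustrative, not from the book:

```python
def compose(a1, a2):
    """STRIPS representation of a1;a2, each action a (pre, add, delete) of sets."""
    pre1, add1, del1 = a1
    pre2, add2, del2 = a2
    if pre2 & del1:                    # a1 destroys a precondition of a2
        return None                    # the composite action is impossible
    pre = pre1 | (pre2 - add1)         # a2's needs not already supplied by a1
    add = add2 | (add1 - del2)         # a1's adds survive unless a2 deletes them
    delete = del2 | (del1 - add2)      # a1's deletes survive unless a2 re-adds
    return pre, add, delete

pickup = ({"handempty", "clear_b"}, {"holding_b"}, {"handempty", "clear_b"})
putdown = ({"holding_b"}, {"handempty", "ontable_b"}, {"holding_b"})

print(compose(pickup, putdown))
print(compose(putdown, putdown))   # None: putdown deletes its own precondition
```

This matches the exercise's two parts: the effects are the merged add/delete lists, and the composite is impossible exactly when a1 deletes something a2's precondition requires (and, given disjoint add/delete lists, does not re-add).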
5. Suppose we must solve planning problems for cleaning a house. Various rooms can be dusted (making the room dust-free) or swept (making the room have a clean floor), but the robot can only sweep or dust a room if it is in that room. Sweeping causes a room to become dusty (i.e., not dust-free). The
• Lr_dusty is true when the living room is dusty.
• Gar_dusty is true when the garage is dusty.
• Lr_dirty_floor is true when the living room floor is dirty.
• Gar_dirty_floor is true when the garage floor is dirty.
• Dustcloth_clean is true when the dust cloth is clean.
• Rob_loc is the location of the robot, with values {garage, lr}. Suppose the robot can do one of the following actions at any time:
• move: move to the other room
• dust: dust the room the robot is in, as long as the room is dusty and the dustcloth is clean
• sweep: sweep the floor the robot is in. (a) Give the STRIPS representation for dust. [Hint: because STRIPS cannot represent conditional effects, you may need to use two separate actions that depend on the robot’s location.] (b) Give the feature-based representation for lr_dusty (c) Suppose that
4. This exercise involves designing a heuristic function that is better than the heuristic of Example 6.10. (a) For each of the forward and regression planners, test how effective each of the individual parts of the heuristic for Example 6.10 is, as well as the maximum. Explain why the results you