Question:
In this problem, you will use PCA for face recognition. The task is to reproduce the results in the article by Adrian Tam, available at https://machinelearningmastery.com/face-recognition-using-principal-component-analysis/

The dataset is the ORL Database of Faces, which is fairly old but can be downloaded from Kaggle: https://www.kaggle.com/kasikrit/att-database-of-faces/download The file is a zip archive of around 4 MB. It contains pictures of 40 persons, with 10 pictures per person, for a total of 400 pictures. When you submit your work to D2L, don't forget to include the data files.

To illustrate the capability of eigenfaces for recognition, we hold out some of the pictures before generating the eigenfaces. We hold out all the pictures of one person, as well as one picture of another person, as our test set (in this part, you are required to hold out pictures of different persons from those in the article, so everyone may use different pictures to test). The remaining pictures are vectorized and converted into a 2D numpy array.

The author used Python to build the model. You can complete the task using R as well; if you use R, some references can be found online, such as https://rpubs.com/danaecarrerasgarcia/529117.

Based on your work and your training data, answer the following questions:
(a) (1 point) How many PCs are needed to explain 95% of the variance?
(b) (1 point) What is the rank of the data matrix?
(c) (1 point) What is the rank of the covariance matrix?
(d) (1 point) Given the rank of the covariance matrix, what is the upper bound on the number of non-zero eigenvalues?
(e) (4 points) Discuss how you choose the best model and whether your model can correctly identify a person. You can set a threshold for the L2 distance described in the article. If the best match's distance is less than the threshold, you consider the face recognized as the same person. If the distance is above the threshold, you claim the picture is of someone we have never seen, even if a best match can be found numerically.
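The threshold rule in part (e) can be sketched in a few lines of numpy. This is only an illustration under stated assumptions: the function name `recognize`, the toy 2D weight vectors, the labels, and the threshold value are all hypothetical and do not come from the article or the dataset. In practice the weights would be the projections of each face onto the chosen eigenfaces.

```python
import numpy as np

def recognize(query_weights, train_weights, train_labels, threshold):
    """Nearest-neighbour match in eigenface-weight space with a rejection threshold.

    Returns (label, distance). The label is None when even the best match
    is not closer than `threshold`, i.e. the face is treated as unseen.
    """
    # L2 distance from the query to every training face
    dists = np.linalg.norm(train_weights - query_weights, axis=1)
    best = int(np.argmin(dists))
    if dists[best] < threshold:
        return train_labels[best], float(dists[best])
    return None, float(dists[best])

# Toy data (hypothetical): two training faces described by 2 weights each
train_weights = np.array([[0.0, 0.0], [10.0, 10.0]])
train_labels = ["s1", "s2"]

known = recognize(np.array([0.5, 0.0]), train_weights, train_labels, threshold=2.0)
unknown = recognize(np.array([5.0, 5.0]), train_weights, train_labels, threshold=2.0)
print(known)    # close to the first training face, so it is accepted
print(unknown)  # a best match exists numerically, but it is rejected
```

A reasonable way to pick the threshold is to compare the best-match distances for the held-out picture of a known person against those for the fully held-out person and choose a value that separates the two groups, if such a value exists.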
Expert Answer:
Solution (a): The number of PCs necessary to explain 95% of the variance can be found by calculating the ...
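Since the posted answer is cut off, here is a minimal sketch of one way parts (a) through (d) might be computed. It is not the original answer. The function name `pca_summary` is hypothetical, and the data is a random matrix standing in for the vectorized training faces, so the numbers it produces say nothing about the ORL images. The sketch relies on two standard facts: the eigenvalues of the sample covariance matrix are the squared singular values of the mean-centered data divided by n - 1, and the covariance matrix has the same rank as the centered data, which is at most n - 1 for n samples. The number of non-zero eigenvalues equals that rank.

```python
import numpy as np

def pca_summary(X, target=0.95):
    """Return (k, rank_X, rank_cov) for X of shape (n_samples, n_features).

    k        : smallest number of PCs whose cumulative explained variance >= target
    rank_X   : numerical rank of the raw data matrix
    rank_cov : numerical rank of the covariance matrix (same as the centered data)
    """
    Xc = X - X.mean(axis=0)                      # mean-center each feature (pixel)
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values, descending
    eigvals = s**2 / (X.shape[0] - 1)            # eigenvalues of the covariance matrix
    ratio = np.cumsum(eigvals) / eigvals.sum()   # cumulative explained-variance ratio
    k = int(np.searchsorted(ratio, target) + 1)  # first position reaching the target
    rank_X = int(np.linalg.matrix_rank(X))
    rank_cov = int(np.linalg.matrix_rank(Xc))    # rank(Xc^T Xc) == rank(Xc)
    return k, rank_X, rank_cov

# Synthetic stand-in (assumption): 20 "images" with 50 "pixels" each
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 50))
k, rank_X, rank_cov = pca_summary(X_demo)
print(k, rank_X, rank_cov)
```

Applied to the real training set, `X` would be the 2D array of vectorized training pictures. Holding out 11 of the 400 pictures leaves 389 rows, so the covariance rank, and with it the number of non-zero eigenvalues, could be at most 388, provided each picture has more than 389 pixels.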