Question: Assessment Brief - Assessment 3 - Using MapReduce for processing big data. Unit Code / Description: ICT313 Big Data for
Assessment Brief

Assessment: Using MapReduce for processing big data
Unit Code/Description: ICT Big Data for Software Development
Course/Subject: Bachelor of Information Technology
Semester: S

Unit Learning Outcomes Addressed
ULO: Critically assess and implement advanced data preprocessing and analytics strategies in a software development context, focusing on tasks like data cleansing, transformation, and feature selection.
ULO: Design, develop, and evaluate big data solutions using programming models like MapReduce and technologies like Hadoop, tailored specifically to address software development needs such as DevOps integration and quality assurance.

Assessment Objective
The objective of this assessment is to assess students' knowledge and practical skills in working with large-scale datasets and leveraging Hadoop ecosystem tools and technologies for data processing and analysis.

Assessment Title/Type: Assessment : Using MapReduce for processing big data (Group Assignment)
Due Date: Week Friday, PM
Weighting:
Instructions to Students: See the assignment description below.
Format/Structure: MS Word or PDF for the report, dataset and code files
Word/Page limit: length of words for the report, font Calibri
Referencing Style: American Psychological Association (APA)

Submission Guidelines
All work must be submitted on Moodle by the due date.
Only one member of each group needs to submit.
A PDF or MS Word file must be submitted which includes all required steps, discussion and evidence of completion of tasks.
Students must present a demo of the project to their lecturer in week; otherwise, they will receive no mark for their submission.

Plagiarism and Academic Integrity
At CIHE, we take academic integrity seriously and expect all students to maintain the highest standards of honesty and ethical behaviour in their academic work.
As a student, it is your responsibility to ensure that all your academic endeavors are conducted with integrity and in accordance with the principles of honesty, fairness, and respect for intellectual property. Please refer to the CIHE Student Academic Integrity and Honesty Policy on Moodle for details.

Late Submission Policy
An assessment item submitted after the assessment due date, without an approved extension or without approved mitigating circumstances, will be penalised. The standard penalty is the reduction of the mark allocated to the assessment item by of the total mark applicable for the assessment item, for each day or part day that the item is late. Assessment items submitted more than ten days after the assessment due date are awarded zero marks.

Assignment Description
Total marks

Supporting Materials
All supporting materials for this assessment can be found in the Hadoop Files folder on Moodle. A virtual machine has been prepared for you on which Ubuntu and Hadoop have been installed and configured.

Hadoop Virtual Machine: All files related to the virtual machine can be found in the zip file HadoopVMzip. You need to download the zip file and open it on your computer's hard drive. Then, you need to install VMware Player on your computer and open the virtual machine file.

Virtual Machine Tutorial Part and is a tutorial video on how to use the virtual machine. It shows step by step how you can start Hadoop and run a WordCount example.

Hadoop Tutorial.PDF also provides you with detailed instructions on how to start Hadoop and run the WordCount example.

Instructions
The following file contains user ratings for Amazon products:
https:wwwkaggle.comdatasetssauravamazonproductreviews?resourcedownload
Each user has rated at least one product. The data file is in CSV format and contains four columns: User ID, Product ID, Rating, Timestamp. Rating is from to. The timestamps are Unix seconds since UTC.
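To make the four-column record format concrete, here is a minimal Python sketch of how one line could be parsed and its Unix timestamp decoded. The `Rating` type, the `parse_rating_line` helper and the sample record are hypothetical names and values chosen for illustration; they are not taken from the dataset or the brief.

```python
import csv
from datetime import datetime, timezone
from typing import NamedTuple


class Rating(NamedTuple):
    """One record of the ratings file (hypothetical helper type)."""
    user_id: str
    product_id: str
    rating: float
    rated_at: datetime


def parse_rating_line(line: str) -> Rating:
    """Split one CSV record into its four columns and decode the timestamp."""
    user_id, product_id, rating, timestamp = next(csv.reader([line]))
    return Rating(
        user_id=user_id,
        product_id=product_id,
        rating=float(rating),
        # Unix seconds are interpreted as UTC, as the brief describes.
        rated_at=datetime.fromtimestamp(int(timestamp), tz=timezone.utc),
    )


# Hypothetical record, not a real line from the dataset.
sample = parse_rating_line("USER_A,PRODUCT_X,4.0,86400")
print(sample.product_id, sample.rating, sample.rated_at.isoformat())
```

The timestamp is decoded in UTC here; converting it to a local zone such as Australian Eastern Daylight Time would be a separate step.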
For example, the following line of the file
AAWRMCKV BYWZM
is interpreted as follows: User AAWRMCKV has rated product BYWZM at time Tuesday Nov :: Australian Eastern Daylight Time.

Your task is to use MapReduce programming to find the following information for each product: the average rating and the number of users who rated this product. Here is an example of the output:
Product ID    Average Rating    Number of Users Rated
You can choose the output format. However, the required information must be included in the output. You need to include the output file in your submission.

Deliverable
You need to submit an MS Word or a PDF file which includes the following items:
A complete Team Contribution Declaration (see the template on the next page)
The source code for the map and reduce functions, copied and pasted into the MS Word or PDF file; no separate file is needed
The output file
Enough screenshots on
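The per-product average and count described in the task can be sketched as a map step and a reduce step. This is a minimal local simulation in Python, not the brief's prescribed solution: the function names and the sample records are hypothetical, the sort stands in for Hadoop's shuffle phase, and the count assumes each user rates a given product at most once, so the number of records equals the number of users.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, List, Tuple


def map_ratings(lines: Iterable[str]) -> Iterator[Tuple[str, float]]:
    """Map step: emit (product_id, rating) for each well-formed record."""
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        _user_id, product_id, rating_text, _timestamp = fields
        try:
            rating = float(rating_text)
        except ValueError:
            continue  # skip a header row or a non-numeric rating
        yield product_id, rating


def reduce_ratings(
    pairs: Iterable[Tuple[str, float]]
) -> Iterator[Tuple[str, float, int]]:
    """Reduce step: pairs must arrive grouped by product_id."""
    for product_id, group in groupby(pairs, key=itemgetter(0)):
        ratings = [rating for _, rating in group]
        yield product_id, sum(ratings) / len(ratings), len(ratings)


def run_locally(lines: Iterable[str]) -> List[Tuple[str, float, int]]:
    """Simulate map, shuffle (sort by key) and reduce in one process."""
    shuffled = sorted(map_ratings(lines), key=itemgetter(0))
    return list(reduce_ratings(shuffled))


# Hypothetical records, not real lines from the dataset.
sample_lines = [
    "USER_A,PRODUCT_X,5.0,86400",
    "USER_B,PRODUCT_X,4.0,86401",
    "USER_A,PRODUCT_Y,3.0,86402",
]
for product_id, average, count in run_locally(sample_lines):
    print(f"{product_id}\t{average:.2f}\t{count}")
```

On a real cluster the same logic would typically be split into separate mapper and reducer programs (for example, in Java, or as scripts run through Hadoop Streaming), with Hadoop performing the grouping between them. If a user could rate the same product more than once, the reducer would need to count distinct user IDs instead of records.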
