Question: Assessment Brief - Assessment 3 - Using Map-Reduce for processing big data Unit Code/Description ICT313 Big Data for

Assessment Brief - Assessment 3 - Using Map-Reduce for processing big data

Unit Code/Description: ICT313 Big Data for Software Development
Course/Subject: Bachelor of Information Technology
Semester: S1-2024

Unit Learning Outcomes Addressed:
ULO3: Critically assess and implement advanced data pre-processing and analytics strategies in a software development context, focusing on tasks like data cleansing, transformation, and feature selection.
ULO4: Design, develop, and evaluate big data solutions using programming models like Map-Reduce and technologies like Hadoop, tailored specifically to address software development needs such as DevOps integration and quality assurance.

Assessment Objective: The objective of this assessment is to assess students' knowledge and practical skills in working with large-scale datasets and leveraging Hadoop ecosystem tools and technologies for data processing and analysis.
Assessment Title/Type: Assessment 3: Using Map-Reduce for processing big data (Group Assignment)
Due Date: Week 11, Friday, 5.00 PM
Weighting: 20%
Instructions to Students: See the assignment description below
Format/Structure: MS Word or PDF for the report, dataset and code files
Word/Page limit: length of 500 words for the report, font Calibri 12
Referencing Style: American Psychological Association (APA)

Submission Guidelines:
- All work must be submitted on Moodle by the due date.
- Only one member of each group needs to submit.
- A PDF or MS Word file must be submitted which includes all required steps, discussion and evidence of completion of tasks.
- Students must present a demo of the project to their lecturer in week 12; otherwise, they will receive no mark for their submission.

Plagiarism and Academic Integrity
At CIHE, we take academic integrity seriously and expect all students to maintain the highest standards of honesty and ethical behaviour in their academic work.
As a student, it is your responsibility to ensure that all your academic endeavors are conducted with integrity and in accordance with the principles of honesty, fairness, and respect for intellectual property. Please refer to the CIHE Student Academic Integrity and Honesty Policy on Moodle for details.

Late Submission Policy
An assessment item submitted after the assessment due date, without an approved extension or without approved mitigating circumstances, will be penalised. The standard penalty is the reduction of the mark allocated to the assessment item by 10% of the total mark applicable for the assessment item, for each day or part day that the item is late. Assessment items submitted more than ten days after the assessment due date are awarded zero marks.

Assignment Description (Total marks 20)

Supporting Materials
All supporting materials for this assessment can be found in the Hadoop Files folder in Moodle:
1- A virtual machine has been prepared for you on which Ubuntu and Hadoop have been installed and configured (Hadoop Virtual Machine). All files related to the virtual machine can be found in the zip file Hadoop-VM.zip. You need to download the zip file and open it on your computer's hard drive. Then, you need to install VMWare Player on your computer and open the virtual machine file.
2- Virtual Machine Tutorial (Part 1 and 2) is a tutorial video on how to use the virtual machine. It shows step by step how you can start Hadoop and run a WordCount example.
3- Hadoop Tutorial.PDF also provides you with detailed instructions on how to start Hadoop and run the WordCount example.

Instructions
The following file contains user ratings for Amazon products:
https://www.kaggle.com/datasets/saurav9786/amazon-product-reviews?resource=download
Each user has rated at least one product. The data file is in CSV format and contains four columns: User ID, Product ID, Rating, Timestamp. Rating is from 1 to 5. The timestamps are Unix seconds since 1/1/1970 UTC.
For example, the following line of the file

A000681618A3WRMCK53V,B0002Y5WZM,2,1383609600

is interpreted as follows: User A000681618A3WRMCK53V has rated product B0002Y5WZM 2/5 at time 1383609600 (Tuesday, Nov 05 2013 11:00:00, Australian Eastern Daylight Time).

Your task is to use MapReduce programming and find the following information for each product: the average rating and the number of users who rated this product. Here is an example of the output:

Product ID    Average Rating    Number of Users Rated
0321732944    3.4               6
0439886341    4.5               9

You can choose the output format. However, the required information must be included in the output. You need to include the output file in your submission.

Deliverable
You need to submit an MS Word or a PDF file which includes the following items:
- A complete Team Contribution Declaration (see the template on the next page)
- The source code for the map and reduce functions (copied/pasted into the MS Word or PDF file; no separate file is needed).
- The output file.
- Enough screenshots on
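The per-product aggregation asked for in the task can be sketched as a map step and a reduce step. The following is a minimal local simulation in Python, not the brief's required solution: the function names (map_line, reduce_product, run_job) are hypothetical, the input is assumed to be comma-separated with the four columns in the order given above, and "number of users" is assumed to mean distinct user IDs. The in-memory sort only stands in for Hadoop's shuffle/sort phase.

```python
from itertools import groupby


def map_line(line):
    """Map step: turn one CSV record into a (product_id, (user_id, rating)) pair."""
    fields = line.strip().split(",")
    if len(fields) != 4:
        return None  # skip blank or malformed lines
    user_id, product_id, rating, _timestamp = fields
    try:
        return product_id, (user_id, float(rating))
    except ValueError:
        return None  # skip a header row or a non-numeric rating


def reduce_product(product_id, values):
    """Reduce step: average rating and number of distinct users for one product."""
    users = set()
    total = 0.0
    count = 0
    for user_id, rating in values:
        users.add(user_id)
        total += rating
        count += 1
    return product_id, total / count, len(users)


def run_job(lines):
    """Simulate map, shuffle/sort and reduce locally on an iterable of lines."""
    pairs = [p for p in map(map_line, lines) if p is not None]
    pairs.sort(key=lambda kv: kv[0])  # stands in for Hadoop's shuffle/sort
    results = []
    for product_id, group in groupby(pairs, key=lambda kv: kv[0]):
        results.append(reduce_product(product_id, (v for _, v in group)))
    return results
```

Calling run_job on a list of record strings returns one (product ID, average rating, number of users) tuple per product. On a real cluster the same logic would be split into separate map and reduce programs, for example Java Mapper and Reducer classes as in the WordCount tutorial, or Hadoop Streaming scripts that read lines from standard input and write tab-separated key/value pairs.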
