Question: A large meteorological data organisation regularly collects various meteorological data from various locations spread throughout the nation, applies some analytics and provides district-wise weather outlook


A large meteorological data organisation regularly collects various meteorological data from locations spread throughout the nation, applies some analytics, and provides a district-wise weather outlook for the next 6, 12, 24 and 48 hours. Historical data for each district, past prediction details and newly collected district data are used together to generate the forecasts. The data is fetched hourly, and the details are updated every hour. For each district, the whole fetching process takes about 15 minutes and the core analytics work takes about 30 minutes. Because the organisation still uses a legacy network, any parallelisation incurs a 5-minute communication overhead. Assume there are 650 districts and a cluster of 65,000 nodes, and that 80% of the code can be parallelised.

i. How much speedup is theoretically achievable given the high communication overhead?

ii. The organisation has been using a reduced data size so that it can complete the work in time, at the cost of reduced accuracy for the larger time windows. If it uses the full data, the time increases by a factor of 4. Would the company be able to process the full data if the communication overhead were reduced to zero? Justify with relevant computation.
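The arithmetic both parts ask for can be sketched with Amdahl's law plus an additive overhead term. This is a hedged sketch under assumptions the question does not state: the districts are independent, so the 65,000 nodes are split evenly (100 nodes per district) and all districts run concurrently; the 80% parallel fraction applies to the whole 45-minute per-district job (15 min fetch + 30 min analytics); the 5-minute overhead is added once to the parallel run time; and the work must finish within the 60-minute hourly cycle. The function and constant names are illustrative only.

```python
# Sketch: Amdahl's-law estimate for the forecasting pipeline in the question.
# ASSUMPTIONS (not stated in the question, chosen for illustration):
#   * districts are independent, so the 65,000 nodes are split evenly,
#     giving 100 nodes per district, and all districts run concurrently;
#   * the 80% parallel fraction applies to the whole 45-minute
#     per-district job (15 min fetch + 30 min analytics);
#   * the communication overhead is added once to the parallel run time;
#   * the deadline is 60 minutes, because data is fetched every hour.

FETCH_MIN = 15.0
ANALYTICS_MIN = 30.0
DISTRICTS = 650
NODES = 65_000
PARALLEL_FRACTION = 0.8
DEADLINE_MIN = 60.0


def parallel_time(serial_time: float, p: float, n: int, overhead: float) -> float:
    """Amdahl's-law run time on n nodes, plus an additive communication overhead."""
    return (1.0 - p) * serial_time + p * serial_time / n + overhead


def speedup(serial_time: float, p: float, n: int, overhead: float) -> float:
    """Serial time divided by the overhead-adjusted parallel time."""
    return serial_time / parallel_time(serial_time, p, n, overhead)


nodes_per_district = NODES // DISTRICTS          # 100 nodes per district
t_serial = FETCH_MIN + ANALYTICS_MIN             # 45 min per district

# Part i: reduced data, 5-minute communication overhead.
t_i = parallel_time(t_serial, PARALLEL_FRACTION, nodes_per_district, 5.0)
s_i = speedup(t_serial, PARALLEL_FRACTION, nodes_per_district, 5.0)

# Part ii: full data (4x the work), communication overhead reduced to zero.
t_full = 4 * t_serial                            # 180 min per district
t_ii = parallel_time(t_full, PARALLEL_FRACTION, nodes_per_district, 0.0)

# Amdahl upper bound on speedup (unlimited nodes, no overhead).
s_limit = 1.0 / (1.0 - PARALLEL_FRACTION)

print(f"i.  parallel time {t_i:.2f} min, speedup {s_i:.2f}x (limit {s_limit:.1f}x)")
print(f"ii. full-data time {t_ii:.2f} min, fits hourly cycle: {t_ii <= DEADLINE_MIN}")
```

Under these assumptions, part i gives 0.2 × 45 + 0.8 × 45 / 100 + 5 = 14.36 minutes per district, a speedup of about 3.13x, well below the Amdahl limit of 1 / (1 − 0.8) = 5x. For part ii, 0.2 × 180 + 0.8 × 180 / 100 = 37.44 minutes, which fits inside the hourly cycle. A different reading of the problem, for example running the serial 20% of all 650 districts one after another, would not fit in an hour, so the per-district assumption matters.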

