Question: A large meteorological data organisation regularly collects various meteorological data from various locations spread throughout the nation, applies some analytics and provides district-wise weather outlook

 A large meteorological data organisation regularly collects various meteorological data from

A large meteorological data organisation regularly collects various meteorological data from various locations spread throughout the nation, applies some analytics and provides district-wise weather outlook for the next 6 hours, 12 hours, 24 hours and 48 hours. It is known that historical data for each district, past prediction details and new district data collected are together used to generate the forecasts. The data is fetched hourly, and details are updated every hour. For each district, the whole fetching process takes about 15 minutes and the core analytics work takes about 30 minutes. Since they still use a legacy network, any parallelization will result in a 5 min communication overhead. Assume there are 650 districts, and a cluster of 65,000 nodes. 80% of code can be parallelised. i.How much speedup is theoretically achievable given the high communication overhead? ii. The organisation was using reduced data size so that they could complete work in time at the cost of reduced accuracy for larger time windows. If it uses full data, then time will increase by a factor of 4. Will the company be able to execute for full data if communication overhead was reduced to zero? Justify with relevant computation. A large meteorological data organisation regularly collects various meteorological data from various locations spread throughout the nation, applies some analytics and provides district-wise weather outlook for the next 6 hours, 12 hours, 24 hours and 48 hours. It is known that historical data for each district, past prediction details and new district data collected are together used to generate the forecasts. The data is fetched hourly, and details are updated every hour. For each district, the whole fetching process takes about 15 minutes and the core analytics work takes about 30 minutes. Since they still use a legacy network, any parallelization will result in a 5 min communication overhead. Assume there are 650 districts, and a cluster of 65,000 nodes. 80% of code can be parallelised. i.How much speedup is theoretically achievable given the high communication overhead? ii. The organisation was using reduced data size so that they could complete work in time at the cost of reduced accuracy for larger time windows. If it uses full data, then time will increase by a factor of 4. Will the company be able to execute for full data if communication overhead was reduced to zero? Justify with relevant computation

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!