Question: Please explain further:
Big Data workloads

Workload management as it pertains to Big Data is completely different from traditional data and its management. The major areas where workload definitions are important to understand for design and processing efficiency include:

Data is file based for acquisition and storage: whether you choose Hadoop, NoSQL, or any other technique, most Big Data is file based. The underlying reason for choosing file-based management is the ease of managing and replicating files and the ability to store data of any format for processing.

Data processing happens in three steps:
1. Discovery: in this step the data is analyzed and categorized.
2. Analysis: in this step the data is associated with master data and metadata.
3. Analytics: in this step the data is converted to metrics and structured.

Each of these steps brings a workload characteristic:
- Discovery mandates interrogation of the data by users. The data needs to be processed where it is and not moved across the network, because of the size and complexity of the data itself; this requirement is a design goal for Big Data architecture: compute and process data at the storage layer.
- Analysis mandates parsing of the data with data visualization tools. This requires minimal transformation and minimal movement of data across the network.
- Analytics requires converting the data to a structured format and extracting it for processing in the data warehouse or analytical engines.

Big Data workloads are drastically different from traditional workloads because no database is involved in processing Big Data. This removes a large scalability constraint but adds complexity in maintaining file system-driven consistency. Another key factor to remember is that Big Data involves data processing rather than transaction processing. These factors are the design considerations when building a Big Data system, which we will discuss in Chapters 10 and 11.

From an analytical perspective, Big Data workloads are very similar to adding new data to the data warehouse. The key difference is that the tables being added are of the narrow/narrow type, but the impact on the analytical model can be that of a wide/narrow table that becomes wide/wide.

Big Data query workloads are largely program execution of MapReduce code, which is the opposite of executing SQL and optimizing for SQL performance. The major difference in Big Data workload management is that tuning the data processing bottlenecks results in linear scalability and instant outcomes, as opposed to the traditional RDBMS world of data management. This is due to the file-based processing of the data, the self-contained nature of the data, and the maturity of the algorithms on the infrastructure itself.
Thank you.
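To make the contrast with SQL concrete, below is a minimal sketch of the MapReduce programming model referenced in the passage: a word count written as a mapper and a reducer pair in Python, Hadoop Streaming style. The word-count task, the function names, and the local simulation are illustrative assumptions rather than anything from the original text; on a real cluster the framework splits the input files, runs the map code next to the data blocks, and shuffles the intermediate pairs to the reducers.

    #!/usr/bin/env python3
    # Minimal word-count sketch of the MapReduce model (illustrative only).
    import sys
    from itertools import groupby
    from operator import itemgetter

    def mapper(lines):
        # Map phase: runs where the file blocks are stored; emits (key, value) pairs.
        for line in lines:
            for word in line.strip().split():
                yield word.lower(), 1

    def reducer(pairs):
        # Reduce phase: after the shuffle/sort, aggregate all values for each key.
        for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
            yield key, sum(count for _, count in group)

    if __name__ == "__main__":
        # Local simulation over stdin; a cluster would run many mapper and reducer
        # tasks in parallel over the file splits, which is where the linear
        # scalability described in the passage comes from.
        for word, total in reducer(mapper(sys.stdin)):
            print(word + "\t" + str(total))

Running it locally, for example as cat somefile.txt | python3 wordcount.py (file name hypothetical), mimics a single map and reduce task. The point the passage makes is that scaling comes from adding nodes that run this same code over more file splits, not from tuning a query optimizer.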
