Modern Database Management 13th Global Edition Jeff Hoffer, Ramesh Venkataraman, Heikki Topi - Solutions
. Describe the key steps to improve data quality in an organization.
. Explain four reasons why the quality of data is poor in many organizations.
. Define the eight characteristics of quality data.
. What are the four basic facilities for the backup and recovery of a database?
. What are four reasons why data quality is important to an organization?
. How can fuzzy logic, pattern matching, and expert systems be used to improve data quality?
. How can the data capture process be improved?
. Briefly describe four threats to high data availability and at least one measure that can be taken to counter each of these threats.
. What changes can be made in data administration at each stage of the traditional database development life cycle to deliver high-quality, robust systems more quickly?
. Briefly describe four database administration trends that are emerging today.
. What factors must be considered when deciding on an open-source DBMS?
. What functions require the input and involvement of both the data administrator and the database administrator?
. Why are data administrators required to maintain an information repository?
. Indicate whether data administration or database administration is typically responsible for each of the following functions: a. Managing the data repository b. Installing and upgrading the DBMS c. Conceptual data modeling d. Managing data security and privacy e. Database planning f. Tuning database
. Contrast the following terms: a. chief data officer; DBA b. data administration; database administration c. open source DBMS; commercial DBMS d. ETL; MDM
. Match the following terms and definitions: data administration, database administration, master data management, data steward, open source DBMS. a. oversees data quality for a particular data subject b. a database management system available for free (typically including source code) c.
. Define each of the following terms: a. database administration b. data administration c. chief data officer d. master data management e. open source DBMS
. Another tool featured on TUN is called Tableau. Tableau offers a student version of its product for free (www.tableausoftware.com/academic/students), and TUN offers several assignments and exercises with which you can explore its features. Compare the capabilities of Tableau with those of SAS
. Teradata University Network (TUN) offers access to a tool called SAS Visual Analytics (VA). Use TUN to access SAS VA and complete some of the exercises associated with the tool to become familiar with it. What are your impressions regarding the usefulness of this type of analytical tool?
. Read an SAS white paper (www.sas.com/resources/whitepaper/wp_56343.pdf) on the use of telematics in car insurance. If Fitchwood started to use one of these technologies, what consequences would it have for its IT infrastructure needs? Problems and Exercises 11-29 and 11-30 are based on the use of
. Fitchwood is a relatively small company (annual premium revenues less than $1 billion per year) that insures slightly more than 500,000 automobiles and about 200,000 homes. For what types of purposes might Fitchwood want to use big data technologies (i.e., either Hadoop or one of the NoSQL
. Text mining is an increasingly important subcategory of data mining. Can you identify potential uses of text mining in the context of an insurance company?
. Do you see any opportunities for data mining using the Fitchwood data mart? Research data mining tools and recommend one or two for use with the data mart.
. Using a drawing tool such as Microsoft PowerPoint, design a simple prototype of a top-management dashboard for Fitchwood Insurance Company.
. Suggest some visualization options that Fitchwood managers might want to use to support their decision making.
. Fitchwood management would like to use the data mart for drill-down online reporting. For example, a sales manager might want to view a report of total sales for an agent by month and then drill down into the individual types of policies to see how sales are broken down by type of policy. What
. Review the white paper that has been used as a source for Figure 10-33. Which of the following tasks is the responsibility of data platform, integrated data warehouse, and integrated discovery platform, respectively? a. Finding new, previously unknown relationships within the data. b. Storing very
. For each scenario listed below, identify the following: the type of business analytics, the era of BI&A, the goal of data mining (if applicable), and whether and how big data and analytics have the potential to bring about change in the listed scenario. a. A firm experiencing low sales in a
. Identify six broad categories of implications of big data analytics and decision making.
. How are data quality and management vital in realizing the full potential of big data and analytics?
. Describe the core idea underlying in-database analytics.
. Describe the core idea underlying in-memory DBMSs.
. Describe the mechanism through which prescriptive analytics is dependent on descriptive and predictive analytics.
. How is KNIME used as a predictive analytics tool?
. Discuss why data mining applications are growing rapidly in business.
. Illustrate the goals of data mining and how they answer fundamental business questions.
. Discuss the different types of dashboards and their role in business performance management.
. How does Apache Spark differ from Hadoop?
. Compare and contrast R and Python as computational environments for analytics.
. Briefly describe three types of operations that can easily be performed with OLAP tools.
. Discuss the role of OLAP in the context of descriptive analytics.
. Explain the different tools for querying and analyzing data in traditional data warehouses and marts that enable various forms of descriptive analytics.
. Explain the three different generations of business intelligence and analytics.
. Explain the progression from DSS to analytics through business intelligence.
. Contrast the following terms: a. Data mining; text mining b. ROLAP; MOLAP c. R; Python
. Match the following terms to the appropriate definitions: text mining, data mining, descriptive analytics, analytics, predictive analytics, prescriptive analytics. a. knowledge discovery using a variety of statistical and computational techniques b. converting textual data into useful information c. form
. Define each of the following terms: a. data mining b. online analytical processing c. business intelligence d. predictive analytics e. Apache Spark
. Consider the customer table created in Figure 10-24 and populated with data as shown in Figure 10-27. Write the Hive script that will display the age-groups that exist in the data set and their average incomes.
. Use the Internet to browse the features and offerings of Big Data platforms such as HAVEn and Aster. Prepare a report of your findings.
. Write two Hive queries, the first to create a PRODUCT table with fields ProdID, Name, Seller, Price; the second to load data into the table from file ProductInfo.csv. Make all necessary assumptions.
. For each situation presented below, illustrate a document as depicted in Figures 10-4 and 10-5 and specify whether it contains an array, an embedded subdocument, relationships, or collections. Use hypothetical data and make necessary assumptions. a. A document containing Books details: Title,
. Review Figure 10-15 and answer the following questions based on it. a. What has happened between Input and Input’? b. Assume that the values associated with each of the keys (k1, k2, and so forth) are counts. What is the purpose of the Shuffle stage? c. If the overall goal is to count the number
. Figure 10-14 describes a simple Hadoop architecture. If a real-world system is implemented using this approach, it will suffer from a specific weakness. Identify what this weakness is and find out what the latest versions of Hadoop have done to address it.
. Assume that the following data regarding Students need to be stored—Name: First Name and Last Name, Roll Number, and Mobile Number. Illustrate with figures how it would be stored in different NoSQL database models.
. Review Figure 10-5 (a). Write a MongoDB query to display all products with review ratings greater than 3 stars and suppress the fields “height” and “width” in the output using the subset of fields text boxes.
. Review Figure 10-3. For each of the formats, identify the elements that are data values and those that are labels describing the data.
. Compare the JSON and XML representations of a record in Figure 10-1. What is the primary difference between these? Can you identify any advantages of one compared to the other?
. HBase and Cassandra share a common purpose. What is it? What is their relationship to HDFS and Google BigTable?
. Explain the implementation of MapReduce on HDFS clusters.
. How does HDFS aid in coping with hardware failure?
. Describe and explain the two main components of MapReduce.
. What is the role of YARN in the management of highly distributed systems?
. List the purposes Hadoop is used for.
. Discuss the features of NoSQL DBMS that ensure high availability but do not guarantee consistency.
. What is the format that can be used to describe database schema besides JSON?
. What is the difference between a wide-column store and a graph-oriented database?
. What is the trade-off one needs to consider while using a NoSQL database management system?
. What is the difference between the explanatory and exploratory goals of data mining?
. Identify the differences between Hadoop and NoSQL technologies.
. What are the two challenges faced in visualizing big data?
. Identify and briefly describe the five Vs that are often used to define big data.
. Contrast the following terms: a. data lake; data warehouse b. Pig; Hive c. volume; velocity d. NoSQL; SQL
. Match the following terms to the appropriate definitions: Hive, big data, data lake, Pig, analytics. a. data exist in large volumes and variety and need to be processed at a very high speed b. a language that is used to extract, load and transform data c. tool that provides an SQL-like interface for managing
. Define each of the following terms: a. Hadoop b. MapReduce c. HDFS d. NoSQL e. Pig
. Visit www.teradatauniversitynetwork.com and use the various business intelligence software products available on this site. Compare the different products based on the types of business intelligence problems for which they are most appropriate. Also, search the content of this Web site for
. Many organizations are now offering cloud-based data warehousing services such as IBM’s dashDB, Amazon’s Redshift, and Microsoft Azure. Pick any three such firms and, using the Internet, compare them based on the factors listed. Prepare a report based on your findings: • Features • Pricing
. Visit an organization that has implemented information systems on a data warehouse, and interview managers to discuss the following issues: a. Does increased data collection lead to any information gaps for managers? b. Do they receive information from diverse sources, and how do they increase the
. GROUP BY by itself creates subtotals by category, and the ROLLUP extension to GROUP BY creates even more categories for subtotals. Using all the orders, do a rollup to get total order amounts by product, sales region, and month and all combinations, including a grand total. Display the results
. Because data warehouses and even data marts can become very large, it may be sufficient to work with a subset of data for some analyses. Create a sample of orders from 2004 using the SAMPLE SQL command (which is standard SQL); put a randomized allocation of 10 percent of the rows into the sample.
. Using the MDIFF “ordered analytical function” in Teradata SQL (see the Functions and Operators manual), show the differences (label the difference CHANGE) in TOTAL (which you calculated in the previous Problem and Exercise) from quarter to quarter. Hint: You will likely create a derived table
. Take the query you scrapped from Problem and Exercise 9-58 and modify it to show only the U.S. region grouped by each quarter, not just for 2005 but for all years available, in order by quarter. Label the total orders by quarter with the heading TOTAL and the region ID simply as ID in the result.
. The database you are using was developed by MicroStrategy, a leading business intelligence software vendor. The MicroStrategy software is also available on TUN. Most business intelligence tools generate SQL to retrieve the data they need to produce the reports and charts and to run the models
. Review the metadata file for the db_samwh database and the definitions of the database tables. (You can use SHOW TABLE commands to display the DDL for tables.) Are dimension tables conformed in this data mart? Explain.
. Review the metadata file for the db_samwh database and the definitions of the database tables. (You can use SHOW TABLE commands to display the DDL for tables.) Explain what dimension data, if any, are maintained to support slowly changing dimensions. If there are slowly changing dimension data,
. Review the metadata file for the db_samwh database and the definitions of the database tables. (You can use SHOW TABLE commands to display the DDL for tables.) Explain the methods used in this database for modeling hierarchies. Are hierarchies modeled as described in this chapter?
. After some further analysis, you discover that the Commission field in the Policies table is updated yearly to reflect changes in the annual commission paid to agents on existing policies. Would knowing this information change the way in which you extract and load data into the data mart from the
. What types of data transformations might be needed in order to build the Fitchwood data mart?
. Research some tools that perform data scrubbing. What tool would you recommend for the Fitchwood Insurance Company?
. What types of data pollution/cleansing problems might occur with the Fitchwood OLTP system data?
. The OLTP system data for the Fitchwood Insurance Company is in a series of flat files. What process do you envision would be needed in order to extract the data and create the ERD shown in Figure 9-24? How often should the extraction process be performed? Should it be a static extract or an
. Customers may have relationships with one another (e.g., spouses or parents and children). Redesign your answer to Problem and Exercise 9-48 to accommodate these relationships.
. Agents change territories over time. If necessary, redesign your answer to Problem and Exercise 9-47 to handle this changing dimensional data.
. Would you prefer to normalize (snowflake) the star schema of your answer to Problem and Exercise 9-38? If so, how and why? Redesign the star schema to accommodate your recommended changes.
. Create a star schema for this case study. How did you handle the time dimension?
. Sales and marketing is interested in viewing all sales data by territory, effective date, type of policy, and face value. In addition, the data mart should be able to provide reporting by individual agent on sales as well as commissions earned. Occasionally, the sales territories are
. Pine Valley Furniture wants you to help design a data mart for analysis of sales. The subjects of the data mart are as follows: Salesperson Attributes: SalespersonID, Years with PVFC, SalespersonName, and SupervisorRating. Product Attributes: ProductID, Category, Weight, and
. A firm wants to reduce fluid drilling costs substantially by increasing drilling fluid efficiency. Research finds that both fluid drilling speed and cost are significantly influenced by Time, Geography, Drilling fluid type, Formation, and Well type. Geography refers to Country, Oil field, Block,
. A pharmaceutical retail store manages its current sales, procurement and materials availability at the store through Excel sheets. Owing to the increase in the number of branches in the city, the store manager is now finding this process of data maintenance tedious. She is now banking on the
. A university gathers student admission data from three different sources: through forms filled manually at university desks, by registering at the university Web site, or by registering on the department’s Web site. All the three sources have disparate form structures. Two databases are
. Employees working in IT organizations are assigned different projects for a specific duration, such as a few months or years. The duration is specified by the project start date and end date in the database. The project location is different for each project, so employee location changes with