Questions and Answers of Principles of Database Management
Which of the following is not a characteristic of a data warehouse?
a. Subject-oriented.
b. Integrated.
c. Time-variant.
d. Volatile.
How is a data warehouse defined according to Bill Inmon? Elaborate on each of the characteristics and illustrate with examples.
In terms of data manipulation, a data warehouse focuses on…
a. Insert/Update/Delete/Select statements.
b. Insert/Select statements.
c. Select/Update statements.
d. Delete statements.
Discuss and contrast each of the following data warehouse schemas:
• Star schema;
• Snowflake schema;
• Fact constellation.
What are surrogate keys? Why would you use them in a data warehouse instead of using the business keys from the operational systems?
Which statement is correct?
a. A star schema has one large central dimension table which is connected to various smaller fact tables.
b. The dimension tables of a star schema contain the criteria for
Discuss four approaches to deal with slowly changing dimensions in a data warehouse. Can any of these approaches be used to deal with rapidly changing dimensions?
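As a minimal sketch of one commonly described approach (adding a new row per change, often called Type 2), the following Python code keeps a customer dimension as a list of dicts. The column names (`customer_key`, `customer_id`, `segment`, `valid_from`, `valid_to`, `current`) and the function `scd2_update` are hypothetical, chosen only for illustration. It also shows why a surrogate key helps: the business key `customer_id` repeats across versions, while each version gets its own `customer_key`.

```python
from datetime import date


def scd2_update(dimension, customer_id, new_attrs, change_date, next_key):
    """Type 2 change (hypothetical helper): close the current row of
    `customer_id` and append a new version with a fresh surrogate key.
    Returns the next free surrogate key."""
    for row in dimension:
        if row["customer_id"] == customer_id and row["current"]:
            break
    else:
        raise KeyError(customer_id)
    if all(row[k] == v for k, v in new_attrs.items()):
        return next_key  # no attribute actually changed: keep history as is
    # Close the old version...
    row["valid_to"] = change_date
    row["current"] = False
    # ...and append the new version under a new surrogate key.
    dimension.append({**row, **new_attrs, "customer_key": next_key,
                      "valid_from": change_date, "valid_to": None,
                      "current": True})
    return next_key + 1


dim = [{"customer_key": 1, "customer_id": "C42", "segment": "retail",
        "valid_from": date(2020, 1, 1), "valid_to": None, "current": True}]
next_key = scd2_update(dim, "C42", {"segment": "corporate"}, date(2023, 6, 1), 2)
```

Because every change adds a row, this approach keeps full history but can make the dimension grow quickly when attributes change often.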
Which statement is not correct?
a. A snowflake schema normalizes the fact table of a star schema.
b. A fact constellation schema has more than one fact table which can share dimension
Consider the following OLAP cube:
• Give an example of a…
  • Roll-up operation;
  • Drill-down operation;
  • Slicing operation;
  • Dicing operation.
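Since the cube itself is not reproduced here, the following Python sketch uses a small hypothetical sales cube with dimensions product, region, and quarter to show what the four operations do to the data. The data and the function names are assumptions for illustration, not part of any OLAP tool.

```python
from collections import defaultdict

# Hypothetical cube cells: (product, region, quarter) -> sales
cube = {
    ("laptop", "EU", "Q1"): 100,
    ("laptop", "EU", "Q2"): 120,
    ("laptop", "US", "Q1"): 90,
    ("phone", "EU", "Q1"): 60,
    ("phone", "US", "Q2"): 80,
}
DIMS = ("product", "region", "quarter")


def roll_up(cube, keep):
    """Sum sales over every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cube.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)


def slice_(cube, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}


def dice(cube, **allowed):
    """Dice: keep cells whose coordinates fall in the given value sets."""
    return {k: v for k, v in cube.items()
            if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())}


by_product = roll_up(cube, ["product"])                   # roll-up
by_product_quarter = roll_up(cube, ["product", "quarter"])  # drill-down from by_product
eu = slice_(cube, "region", "EU")                         # slice
sub = dice(cube, product={"laptop"}, quarter={"Q1", "Q2"})  # dice
```

Drill-down is simply the reverse direction of roll-up: going from totals per product back to the finer product-by-quarter level.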
Explain and illustrate the following concepts:
• Independent data mart;
• Virtual data warehouse;
• Operational data store;
• Data lake.
What is windowing? Illustrate a query with windowing using the above table.
Given the following table:
Consider the following queries:
What is the output of the above queries? Can you reformulate each
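The table referred to above is not reproduced, so this sketch assumes a hypothetical `sales(region, quarter, amount)` table and runs a windowed query through Python's built-in `sqlite3` module (it assumes the bundled SQLite is 3.25 or newer, the version that added window functions). Unlike `GROUP BY`, the window keeps every input row and adds a running total computed over the rows of the same region up to the current quarter.

```python
import sqlite3

# Hypothetical table; the question's own table is not available here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("EU", 1, 100), ("EU", 2, 120), ("EU", 3, 90),
     ("US", 1, 80), ("US", 2, 60)],
)

# Running total per region: the window is all rows of the same region
# ordered by quarter, from the first row up to the current one.
rows = conn.execute(
    """
    SELECT region, quarter, amount,
           SUM(amount) OVER (PARTITION BY region ORDER BY quarter
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
               AS running_total
    FROM sales
    ORDER BY region, quarter
    """
).fetchall()
for row in rows:
    print(row)
```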
Which statement is not correct?
a. Junk dimensions can be defined to efficiently accommodate low-cardinality attribute types such as flags or indicators.
b. An outrigger table can be defined to store
Which statement about ETL is not correct?
a. Some estimates state that the ETL step can consume up to 80% of all efforts needed to set up a data warehouse.
b. To decrease the burden on both the
Which statement is not correct?
a. A data mart is a scaled-down version of a data warehouse aimed at meeting the information needs of a homogeneous small group of end users such as a department or
Which statement is correct?
a. A key distinguishing property of a data lake is that it stores raw data in its native format, which could be structured, unstructured, or semistructured.
b. A data lake
Which statement is not correct?
a. Query and reporting tools are an essential component of a comprehensive business intelligence solution.
b. A pivot or cross-table is a popular data summarization
Which statement is not correct?
a. Multidimensional OLAP (MOLAP) stores the multidimensional data using a multidimensional DBMS (MDBMS) whereby the data are stored in a multidimensional array-based
Which statement is correct?
a. Roll-up (or drill-up) refers to aggregating the current set of fact values within or across one or more dimensions.
b. Roll-down (or drill-down) de-aggregates the data
Ideally, data integration should include…
a. Only data.
b. Only processes.
c. Both processes and data.
Give some examples of operational business intelligence.
Which statement is not correct?
a. Analytics techniques are more and more used at the operational level as well by front-line employees.
b. Analytics for tactical/strategic decision-making
Conduct an illustrated SWOT analysis of data consolidation versus data integration versus data propagation.
Which statement is not correct?
a. The essence of data consolidation as a data integration pattern is to capture the data from multiple, heterogeneous source systems and integrate it into a single
What is data virtualization and what can it be used for? How does it differ from data consolidation, data federation, and data propagation?
The federation pattern typically follows…
a. A pull approach.
b. A push approach.
What is meant by “Data as a Service”? How does this relate to cloud computing? What kind of data-related services can be hosted in the cloud? Illustrate with examples.
Enterprise information integration (EII) is an example of…
a. Data consolidation.
b. Data integration.
c. Data propagation.
d. Data replication.
Discuss two types of dependencies that should be appropriately managed to guarantee the successful overall process execution. What patterns can be used to manage these dependencies?
Enterprise application integration (EAI) and enterprise data replication (EDR) are examples of…
a. Data consolidation.
b. Data federation.
c. Data propagation.
d. Data virtualization.
Discuss and contrast the following three service types: workflow services, activity services, and data services. Illustrate with an example.
Which statement is not correct?
a. Data virtualization isolates applications and users from the actual (combinations of) data integration patterns used.
b. Data virtualization extensively uses data
Discuss how different data services can be realized according to different data integration patterns.
Which statement is not correct?
a. Process integration is to integrate and harmonize the various business processes in an organization as much as possible.
b. The control flow perspective of a
How can full-text documents be indexed? Illustrate with an example.
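A minimal sketch of the usual answer, an inverted index, in Python: each term maps to the sorted list of document ids containing it, and a Boolean AND query intersects those posting lists. The three sample documents and the function names are hypothetical; real systems typically add stemming, stop-word removal, term positions, and relevance weights on top of this.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lower-cased term to the sorted ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_and(index, query):
    """Return ids of documents containing every query term (Boolean AND)."""
    postings = [set(index.get(t, [])) for t in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []


docs = {
    1: "Data warehouses store integrated data.",
    2: "A data lake stores raw data.",
    3: "Search engines use an inverted index.",
}
index = build_inverted_index(docs)
```

A query such as `search_and(index, "data stores")` only has to look up two posting lists instead of scanning every document.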
Process execution languages such as WS-BPEL aim at managing…
a. Only the control flow.
b. Only the data flow.
c. Both the control and data flow.
How do web search engines work? Illustrate in the case of Google.
The choreography pattern to manage sequence and data dependencies is a…
a. Centralized approach.
b. Decentralized approach.
Discuss the impact of data lineage on data quality. Illustrate with examples.
Which statement is correct?
a. The prevalent approach for indexing full-text documents is an inverted index.
b. SQL is well suited to query structured collections of records as well as unstructured
What is data governance and why is it important?
Which statement is not correct?
a. Master data management (MDM) comprises a series of processes, policies, standards, and tools to help organizations define and provide multiple points of reference
Discuss and contrast the following data governance frameworks: Total Data Quality Management (TDQM); Capability Maturity Model Integration (CMMI); Data Management Body of Knowledge (DMBOK); Control
What do the 5 Vs of Big Data stand for?
a. Volume, variety, velocity, veracity, value.
b. Volume, visualization, velocity, variety, value.
c. Volume, variety, velocity, variability, value.
d. Volume,
Discuss some application areas where the usage of streaming analytics (such as provided by Spark Streaming) might be valuable. Consider Twitter, but also other contexts.
Which of the following statements is not correct?
a. Velocity in Big Data refers to data “in movement”.
b. Volume in Big Data refers to data “at rest”.
c. Veracity in Big Data refers to data
Think about some examples of Big Data in industry. Try to focus on Vs other than the volume aspect of Big Data. Why do you think these examples qualify as Big Data?
Which components does the base Hadoop stack include?
a. NDFS, MapReduce, and YARN.
b. HDFS, MapReduce, and YARN.
c. HDFS, Map, and Reduce.
d. HDFS, Spark, and YARN.
Both Hortonworks (Hortonworks Hadoop Sandbox) and Cloudera (Cloudera QuickStart VM) offer virtual instances (for Docker, VirtualBox, and VMWare) providing a full Hadoop stack you can easily run
Which of the following statements is correct?
a. DataNodes in HDFS store a registry of metadata.
b. The HDFS NameNode sends regular heartbeat messages to its DataNodes.
c. HDFS is composed of a
Some analysts have argued that Big Data is fundamentally about data “plumbing”, and not about insights or deriving interesting patterns. It is argued that value (the fifth V) can just as easily
Which of the following statements is not correct?
a. A mapper in Hadoop maps each element in a collection to one or more output elements.
b. A reducer in Hadoop reduces a collection of elements to
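To make the mapper and reducer roles concrete, here is a single-process Python simulation of the classic word-count job. It only mimics the map, shuffle (group by key), and reduce phases in memory; it does not use Hadoop's actual Java API, and the function names are illustrative.

```python
from collections import defaultdict


def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the input line.
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    # Reduce phase: collapse all values of one key into a single pair.
    return word, sum(counts)


def run_mapreduce(lines):
    grouped = defaultdict(list)  # shuffle phase: group mapper output by key
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    return dict(reducer(k, v) for k, v in sorted(grouped.items()))


result = run_mapreduce(["big data is big", "data is data"])
```

In a real cluster, the mappers run in parallel on HDFS blocks and the framework performs the shuffle across machines before the reducers run.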
If Spark’s GraphX library provides a number of interesting algorithms for graph-based analysis, do you think that graph-based NoSQL databases are still necessary? Why? If you’re interested, try
Which of the following statements is not correct?
a. Apart from handling MapReduce programs, YARN can also be used to manage other types of applications.
b. YARN’s JobHistoryServer keeps a log of
Which of the following commands are not a part of HBase?
a. Place.
b. Put.
c. Get.
d. Describe.
Which of the following statements is correct?
a. HBase can be considered as a NoSQL database.
b. HBase offers an SQL engine to query its data.
c. MapReduce programs cannot be used with HBase. Data
Pig is…
a. A programming language that can be used to query HDFS data.
b. A project offering a programming language to provide more user-friendliness compared to MapReduce programs.
c. A database
Which of the following statements is not correct?
a. Hive offers an SQL engine to query Hadoop data.
b. Hive’s query language is not as feature-complete as the full SQL standard.
c. Hive offers a
Which of the following schema-handling methods does Hive apply?
a. Schema on write.
b. Schema on load.
c. Schema on read.
d. Schema on query.
Which of the following statements is not correct?
a. RDDs allow for two forms of operations: transformations and actions.
b. RDDs represent an abstract, immutable data structure.
c. RDDs are
Which of the following is not one of the reasons why Spark programs are generally faster than MapReduce operations?
a. Because Spark tries to keep its RDDs in memory as long as possible.
b. Because
Which of the following statements is not correct?
a. Spark SQL exposes DataFrame and Dataset APIs which underlyingly use RDDs together with a performant SQL query engine.
b. Spark SQL can be used
Which of the following statements is correct?
a. One of the disadvantages of Spark is that it does not support streaming data.
b. One of the disadvantages of Spark is that its streaming and machine
OLAP (on-line analytical processing) can help in which of the following steps of the analytics process?
a. Data collection.
b. Data visualization.
c. Data transformation.
d. Data denormalization.
Discuss the key activities when pre-processing data for credit scoring. Remember, credit scoring aims at distinguishing good payers from bad payers using application characteristics such as age,
The GIGO principle mainly relates to which aspect of the analytics process?
a. Data selection.
b. Data transformation.
c. Data cleaning.
d. All of the above.
Consider the following dataset of predicted scores and actual target values (you can assume higher scores should be assigned to the goods).
• Calculate the classification accuracy, sensitivity, and
What are the key differences between logistic regression and decision trees? Give examples of when to prefer one above the other.
Which of the following statements is correct?
a. Missing values should always be replaced or removed.
b. Outliers should always be replaced or removed.
c. Missing values and outliers can potentially
Which of the following strategies can be used to deal with missing values?
a. Keep.
b. Delete.
c. Replace/impute.
d. All of the above.
Discuss how association and sequence rules can be used to build recommender systems such as the ones adopted by Amazon, eBay, and Netflix. How would you evaluate the performance of a recommender
Explain k-means clustering using a small (artificial) dataset. What is the impact of k? What pre-processing steps are needed?
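A minimal k-means sketch in Python on a small artificial dataset of six 2-D points forming two well-separated groups. It alternates the two classic steps: assign each point to its nearest centroid (Euclidean distance), then move each centroid to the mean of its assigned points, stopping when the centroids no longer change. The data, the seeded random initialization, and the iteration cap are assumptions for illustration.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct data points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return centroids, clusters


data = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(data, k=2)
```

The choice of k directly determines how many groups are found; running the sketch with k=3 on the same data would force one natural group to split. Because the distance is Euclidean, variables on different scales should usually be standardized first, and outliers handled, before clustering.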
Outlying observations which represent erroneous data are treated using…
a. Missing value procedures.
b. Truncation or capping.
Examine the following decision tree:
[Decision tree figure not reproduced: splits on Income > $50,000 and High Debt (Yes/No), with Good Risk and Bad Risk leaves]
According to the decision tree, an applicant with Income > $50,000 and High Debt = Yes is classified as:
a. Good risk.
b. Bad risk.
Discuss an example of social network analytics. How is it different from classical predictive or descriptive analytics?
The Internet of Things (IoT) refers to the network of interconnected things such as electronic devices, sensors, software, and IT infrastructure that create and add value by exchanging data with
Decision trees can be used in the following applications:
a. Credit risk scoring.
b. Credit risk scoring and churn prediction.
c. Credit risk scoring, churn prediction, and customer profile
Many companies nowadays are investing in analytics. Also, for universities, there are plenty of opportunities to use analytics for streamlining and/or optimizing processes. Examples of applications
Consider a dataset with a multiclass target variable as follows: 25% bad payers, 25% poor payers, 25% medium payers, and 25% good payers. In this case, the entropy will be…
a. Minimal.
b. Maximal.
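A short Python check of the entropy formula, -Σ p·log₂(p), for the four-class case above compared with a pure node. With four equally likely classes the entropy equals log₂(4) = 2 bits, the largest value possible for four classes; a node containing a single class has entropy 0.

```python
import math


def entropy(proportions):
    """Shannon entropy in bits; zero proportions contribute nothing."""
    return sum(-p * math.log2(p) for p in proportions if p > 0)


uniform = entropy([0.25, 0.25, 0.25, 0.25])  # four equally likely classes
pure = entropy([1.0, 0.0, 0.0, 0.0])         # only one class present
```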
Which of the following measures cannot be used to make the splitting decision in a regression tree?
a. Mean squared error (MSE).
b. ANOVA/F-test.
c. Entropy.
Bootstrapping refers to…
a. Drawing samples with replacement.
b. Drawing samples without replacement.
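A minimal Python sketch of drawing one bootstrap sample: n observations are drawn from a dataset of size n with replacement (here via `random.Random.choices`), so some observations appear several times and others not at all. The unseen ones are often called out-of-bag observations; for large n, roughly 63.2% (1 - 1/e) of the distinct observations are expected to appear in a bootstrap sample. The toy data and seed are arbitrary.

```python
import random

rng = random.Random(42)  # fixed seed so the draw is reproducible
data = list(range(10))

# Draw len(data) observations uniformly *with* replacement.
sample = rng.choices(data, k=len(data))

# Observations that were never drawn form the out-of-bag set.
out_of_bag = sorted(set(data) - set(sample))
```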
Clustering, association rules, and sequence rules are examples of…
a. Predictive analytics.
b. Descriptive analytics.
Given the following five transactions:
T1 {K, A, D, B}
T2 {D, A, C, E, B}
T3 {C, A, B, D}
T4 {B, A, E}
T5 {B, E, D}
Consider the association rule R: A ➔ BD.
Which statement is correct?
a. The support of
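The support and confidence of R can be computed directly from the five transactions. Support is the fraction of all transactions containing A, B, and D together (T1, T2, T3, so 3/5 = 0.6); confidence is that count divided by the number of transactions containing the antecedent A (T1 to T4, so 3/4 = 0.75). The helper names below are illustrative.

```python
transactions = {
    "T1": {"K", "A", "D", "B"},
    "T2": {"D", "A", "C", "E", "B"},
    "T3": {"C", "A", "B", "D"},
    "T4": {"B", "A", "E"},
    "T5": {"B", "E", "D"},
}


def count(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions.values())


def support(itemset):
    return count(itemset) / len(transactions)


def confidence(antecedent, consequent):
    """count(X ∪ Y) / count(X) for the rule X ➔ Y."""
    return count(antecedent | consequent) / count(antecedent)


rule_support = support({"A", "B", "D"})          # support of A ➔ BD
rule_confidence = confidence({"A"}, {"B", "D"})  # confidence of A ➔ BD
```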
The aim of clustering is to come up with clusters such that the…
a. Homogeneity within a cluster is minimized and the heterogeneity between clusters is maximized.
b. Homogeneity within a cluster is
Which statement about the adjacency matrix representing a social network is not correct?
a. It is a symmetric matrix.
b. It is sparse since it contains a lot of non-zero elements.
c. It can include
Which statement is correct?
a. The geodesic represents the longest path between two nodes.
b. The betweenness counts the number of the times that a node or edge occurs in the geodesics of the
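A geodesic is a shortest path between two nodes. The Python sketch below counts geodesics with breadth-first search and computes a node's betweenness as the sum, over all pairs of other nodes, of the share of their geodesics that pass through it. The four-node square network is a hypothetical example; it is brute force and only meant for small graphs.

```python
from collections import deque
from itertools import combinations


def bfs_counts(graph, source):
    """Geodesic distance and number of geodesics from source to each reachable node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:  # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(graph, v):
    """Sum over pairs (s, t) of the fraction of s-t geodesics passing through v."""
    info = {n: bfs_counts(graph, n) for n in graph}
    dist_v, sigma_v = info[v]
    total = 0.0
    for s, t in combinations([n for n in graph if n != v], 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s or v not in dist_s or t not in dist_v:
            continue
        # v lies on an s-t geodesic iff d(s, v) + d(v, t) == d(s, t)
        if dist_s[v] + dist_v[t] == dist_s[t]:
            total += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return total


# Undirected square A-B-D-C-A: A and D are linked by two geodesics.
square = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
dist_a, sigma_a = bfs_counts(square, "A")
```

Here `betweenness(square, "B")` is 0.5: of the two geodesics between A and D, only one passes through B.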
Featurization refers to…
a. Selecting the most predictive features.
b. Adding more local features to the dataset.
c. Making features (= inputs) out of the network characteristics.
d. Adding more
Which of the following activities are part of the post-processing step?
a. Model interpretation and validation.
b. Sensitivity analysis.
c. Model representation.
d. All of the above.
Is the following statement true or false? “All given success factors of an analytical model, i.e., relevance, performance, interpretability, efficiency, economical cost, and regulatory compliance,
Which role does a database designer have according to the RACI matrix?
a. Responsible.
b. Accountable.
c. Support.
d. Consulted.
e. Informed.
Which of the following costs should be included in a total cost of ownership (TCO) analysis?
a. Acquisition costs.
b. Ownership and operation costs.
c. Post-ownership costs.
d. All of the above.
Which of the following statements is not correct?
a. ROI analysis offers a common firm-wide language to compare multiple investment opportunities and decide which one(s) to go for.
b. For companies
Which of the following is not a risk when outsourcing analytics?
a. The fact that all analytical activities need to be outsourced.
b. The exchange of confidential information.
c. Continuity of the
Which of the following is not an advantage of open-source software for analytics?
a. It is available for free.
b. A worldwide network of developers can work on it.
c. It has been thoroughly
Which of the following statements is correct?
a. When using on-premises solutions, maintenance or upgrade projects may even go by unnoticed.
b. An important advantage of cloud-based solutions
Which of the following are interesting data sources to consider to boost the performance of analytical models?
a. Network data.
b. External data.
c. Unstructured data such as text data and multimedia
Which of the following statements is correct?
a. Quality of data is key to the success of any analytical exercise since it has a direct and measurable impact on the quality of the analytical model
To guarantee maximum independence and organizational impact of analytics, it is important that…
a. The chief data officer (CDO) or chief analytics officer (CAO) reports to the CIO or CFO.
b. The
What is the correct ranking of the following analytics applications in terms of maturity?
a. Marketing analytics (most mature), risk analytics (medium mature), HR analytics (least mature).
b. Risk