Comprehensive Guide to Data Analysis Concepts and Techniques

Computer Science - Computer Graphics

Created by user_striner 9 months ago

Cards in this deck (89)
Observations or measurements represented as text, numbers, or multimedia are known as _____.
A structured collection of data associated with a unique body of work is called a _____.
An organized collection of data stored as multiple datasets is referred to as a _____.
In a data table, a given row is known as an _____.
The entire collection of data in a table is referred to as a _____.
A column in a data table that varies from person to person, such as gender or name, is called a _____.
The two types of variables are 1) numerical and 2) _____.
Types of numerical variables include 1) continuous and 2) _____.
Types of categorical variables include 1) nominal and 2) _____.
A measurement such as time, weight, or distance that can be a fraction is known as a _____.
A count, such as the number of cars or students, that cannot be a fraction is called a _____.
A variable related to name, where you cannot say which one is more or less, is a _____.
A variable that can be ordered, such as from smallest to largest, is called an _____.
An attribute differs from a variable in that it must tell something specific and unique about the data, such as _____.
A way of showing a variable on the x-axis using shading or stacking to represent count is called _____.
True or False: Dot plots only work well with small datasets because they show the exact value of every observation.
A graphical representation that provides a view of data density using bins and has no gaps between bars is called a _____.
Bar plots are used for what type of data? _____.
Dot plots are used for what type of data? _____.
Histograms are used for what type of data? _____.
The factor that determines the density and shape of a histogram, including bin width and number of bins, is called _____.
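
To see how bin width can change the apparent shape of a histogram, here is a minimal Python sketch. The helper `bin_counts` and the `ages` sample are hypothetical, chosen only for illustration; a plotting library would normally do this counting for you.

```python
from collections import Counter

def bin_counts(data, bin_width):
    """Count observations per bin [k*w, (k+1)*w), keyed by the bin's left edge."""
    counts = Counter((x // bin_width) * bin_width for x in data)
    return dict(sorted(counts.items()))

# Hypothetical sample of ages (assumed data, not from the deck).
ages = [21, 22, 23, 25, 29, 31, 34, 35, 38, 47]

narrow = bin_counts(ages, 5)   # more, narrower bins: finer detail
wide = bin_counts(ages, 10)    # fewer, wider bins: smoother outline

print(narrow)
print(wide)
```

The same ten values produce five bars with the narrow width and three with the wide one, so the choice of bins is part of how the distribution ends up looking.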
The measure of asymmetry that tells you the shape of distribution, indicating where the tail is, is known as _____.
A distribution that is right skewed, meaning most of the data is on the left, has a _____.
A distribution that is left skewed, meaning most of the data is on the right, has a _____.
A distribution with equal tailing in both directions, where most of the data is in the center, is called _____.
Two measures that show the shape of a histogram or distribution are _____.
The measure that represents the prominent peak of the distribution is called the _____.
True or False: If a distribution is left or right skewed, it means the distribution is unimodal.
A distribution with one prominent peak is described as _____.
A distribution with two prominent peaks is described as _____.
A distribution with no prominent peaks is described as _____.
A distribution with more than two prominent peaks is described as _____.
Statistics that summarize and provide information about your sample data, including the mean and skewness, are called _____.
The three major categories of summary statistics are 1) measures of location/central tendency, 2) measures of spread, and 3) _____.
Types of 'measures of location/central tendency' include 1) mean, 2) median, and 3) _____.
Types of 'measures of spread' include 1) range, 2) quartiles, and 3) _____. (Skewness describes shape, not spread.)
Types of 'graphs and charts' include 1) histograms, 2) box plots, and 3) _____.
The central value of the data, calculated as the sum of all data points divided by the number of data points, is called the _____.
If the data is ordered from smallest to largest, the observation right in the middle is called the _____.
True or False: The mean and the median can disagree about where the center of the data is.
Is the mean robust? Is the median? _____.
If the histogram is skewed, then mean _____ median.
If the histogram is close to symmetric, then mean and median are _____.
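
The cards on mean and median can be explored with a small Python sketch using the standard `statistics` module. The sample values below are hypothetical; the point is only to compare how each measure reacts when one extreme value is added.

```python
from statistics import mean, median

# Hypothetical sample: five typical values, then the same values plus one extreme value.
typical = [2, 3, 3, 4, 5]
with_outlier = typical + [40]

mean_typical = mean(typical)             # sum of values / number of values
median_typical = median(typical)         # middle value of the sorted data
mean_outlier = mean(with_outlier)
median_outlier = median(with_outlier)    # even count: average of the two middle values

print("without extreme value:", mean_typical, median_typical)
print("with extreme value:   ", mean_outlier, median_outlier)
```

Comparing the two printed lines shows which measure shifts more when a single distant value enters the data.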
What is the significance of a variable in relation to quantile and more? _____.
The measures of spread are 1) range, 2) quartiles, 3) interquartile range, 4) variance, and 5) _____. (Skewness describes shape, not spread.)
The term that shows whether the values are tightly clustered or dispersed is called _____.
The median is the _____ or _____% quantile.
Interpreting the 0.5 or 50% quantile means that 50% of the data are above and 50% are _____.
A more general term that describes splitting the data into groups that contain the same number of data points is called a _____.
A quantile that divides the data into four equally sized groups, even when there are not 100 observations, is called a _____.
The 0.75 quantile says that _____% of the observations are below that value.
The 50th percentile means that _____% of the data is above that value and 50% is below it.
The 0th percentile or 0% quantile occurs when there are no values that are _____ that given value.
To calculate the percentile of a value, use the formula: (number of observations below that value) / (total number of observations in the _____), multiplied by 100.
Q1/Q2/Q3 divides the data into _____.
Q1, the first quartile, is the 0.25 quantile or 25th percentile, below which _____% of the data falls.
Q2, the second quartile, is the 0.50 quantile or 50th percentile, below which _____% of the data falls.
Q3, the third quartile, is the 0.75 quantile or 75th percentile, below which _____% of the data falls.
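
The percentile and quartile cards can be tried out in a few lines of Python. This is a sketch under stated assumptions: `percentile_of` is a hypothetical helper that counts observations strictly below a value, and `scores` is made-up data. Quartile conventions differ between textbooks and tools; the "inclusive" method of `statistics.quantiles` is just one of them, so other software may report slightly different cut points.

```python
from statistics import quantiles

def percentile_of(value, data):
    """Percent of observations strictly below `value` (count below / total count * 100)."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)

# Hypothetical exam scores (assumed data, not from the deck).
scores = [55, 60, 65, 70, 75, 80, 85, 90]

p_min = percentile_of(55, scores)   # nothing lies below the smallest value
p_75 = percentile_of(75, scores)    # four of the eight scores lie below 75

# Three cut points that split the data into four groups.
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")

print(p_min, p_75)
print(q1, q2, q3)
```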
Quantiles are a broader category encompassing quartiles and _____.
The highest minus the lowest value, which tells you about dispersion, is called the _____.
A limitation of range is that it only considers two extreme values and is too sensitive to _____.
The IQR can be used to find extreme values or _____.
A value that is very distant from the other values in the dataset is called an _____.
The lower outlier gate is calculated as Q1 - 1.5(IQR), which tells you about _____ extreme values.
The upper outlier gate is calculated as Q3 + 1.5(IQR), which tells you about _____ extreme values.
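
The two gate formulas, Q1 - 1.5(IQR) and Q3 + 1.5(IQR), can be applied directly in Python. The sketch below uses hypothetical data and assumes the "inclusive" quartile method from the standard `statistics` module; a different quartile convention would move the gates slightly.

```python
from statistics import quantiles

# Hypothetical measurements; 45 sits far from the rest.
data = [10, 12, 13, 14, 15, 16, 18, 45]

q1, _, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                      # interquartile range

lower_gate = q1 - 1.5 * iqr        # values below this are flagged
upper_gate = q3 + 1.5 * iqr        # values above this are flagged

outliers = [x for x in data if x < lower_gate or x > upper_gate]

print(lower_gate, upper_gate, outliers)
```

Only values outside the two gates are flagged, which is why the IQR rule is less affected by extreme values than the range.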
A box plot summarizes the data set using five statistics and also plots unusual observations, known as _____.
The upper whisker in a box plot is also known as the _____.
The lower whisker in a box plot is also known as the _____.
The IQR is a robust measure like the median, whereas the range is _____.
Variance is calculated by taking the average of squared deviations from the mean and tells you the degree of _____.
Measures of variability include 1) variance and 2) _____.
A larger variance means there is more _____.
Standard deviation is calculated by taking the square root of the variance and measures the _____.
A small standard deviation indicates that there is some spread in the data but most of it is still clustered fairly tightly around the _____.
When the standard deviation increases, it means the data points are spreading farther from the _____.
The population standard deviation formula has n in the denominator, which means that when it is applied to a sample it will _____ the actual value of the variance and SD.
The sample standard deviation formula has 'n-1' in the denominator, which means it will give an _____ estimate.
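
The difference between the n and n-1 denominators is easy to check numerically. The sketch below uses the standard `statistics` module, where `pvariance` and `pstdev` divide by n and `variance` and `stdev` divide by n-1; the `sample` values are hypothetical.

```python
from statistics import pvariance, variance, pstdev, stdev

# Hypothetical sample with mean 5; squared deviations from the mean sum to 32.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = pvariance(sample)    # 32 / n       (n = 8)
samp_var = variance(sample)    # 32 / (n - 1) (n - 1 = 7)

pop_sd = pstdev(sample)        # square root of pop_var
samp_sd = stdev(sample)        # square root of samp_var

print(pop_var, samp_var)
print(pop_sd, samp_sd)
```

Because the numerator is the same in both formulas, comparing the two printed pairs shows directly how the choice of denominator moves the estimate.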
True or False: Box plots show standard deviation.
For approximately normal data, about _____% of the data will be within +/- 1 standard deviation from the mean.
For approximately normal data, about _____% of the data will be within +/- 2 standard deviations from the mean.
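
One way to check these two cards yourself is a quick simulation. The sketch below assumes the data are drawn from a normal distribution (the rule does not hold for arbitrary shapes); the seed, the sample size, and the helper `share_within` are arbitrary choices made for illustration.

```python
import random
from statistics import fmean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

# Simulated, approximately normal data (an assumption of this sketch).
values = [random.gauss(0, 1) for _ in range(20_000)]
mu = fmean(values)
sigma = pstdev(values)

def share_within(k):
    """Fraction of values lying within k standard deviations of the mean."""
    return sum(1 for v in values if abs(v - mu) <= k * sigma) / len(values)

within_1 = share_within(1)
within_2 = share_within(2)

print(f"within 1 SD: {within_1:.1%}")
print(f"within 2 SD: {within_2:.1%}")
```

Running the same check on strongly skewed data would give different shares, which is why the rule is tied to bell-shaped distributions.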
Robust measures include IQR and median, whereas mean and range are _____.
The correlation coefficient (r) measures the strength of the linear relationship between two numerical variables and ranges from _____.
Two elements for describing the relationship of two variables in a scatter plot are 1) strength and 2) _____.
A very low correlation value, such as r = 0.08, indicates that the variables are _____.
True or False: When the relationship is not linear, we don't use r to describe it.
True or False: We use linear regression only when there is a strong correlation.
A variable that causally affects another variable is called the _____.
The variable that is affected by the explanatory variable is called the _____.
The equation for linear regression is y = B0 + B1(x) + e, where e is _____.
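
The correlation and regression cards can be tied together with a short least-squares sketch in plain Python. The `x` and `y` values are hypothetical, and the variable names `b0` and `b1` simply mirror B0 and B1 in the equation above. The `residuals` list holds the differences between each observed y and the fitted line, which is the sample counterpart of the e term.

```python
from math import sqrt

# Hypothetical paired observations (assumed data, not from the deck).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products around the means.
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

r = sxy / sqrt(sxx * syy)      # correlation coefficient
b1 = sxy / sxx                 # least-squares slope
b0 = mean_y - b1 * mean_x      # least-squares intercept

# What is left of each y after subtracting the fitted line b0 + b1 * x.
residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]

print(f"r = {r:.4f}, b0 = {b0:.2f}, b1 = {b1:.2f}")
print([round(e, 2) for e in residuals])
```

Note that r and the slope b1 share the same numerator, so they always have the same sign; r describes strength and direction, while b1 describes how much y changes per unit of x.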