What is the main assumption for linear regression, and what are two of the three assumptions for the residuals?
Residuals are independent and identically distributed; mean of residuals is zero
Residuals are normally distributed; residuals have a mean of zero
Residuals have constant variance; residuals are independent of each other
Conditional mean of Y is linear in X; residuals are normally distributed and have constant variance

Conditional mean of Y is linear in X; residuals are normally distributed and have constant variance

In the context of linear regression, how is a unit increase in \[ x_1 \] reflected in Y?
Y will decrease by the amount \[ \beta_1 \]
Y will be multiplied by \[ \beta_1 \]
Y will remain unchanged
Y will be increased by the amount \[ \beta_1 \]

Y will be increased by the amount \[ \beta_1 \]
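
A minimal sketch of this interpretation, assuming a hypothetical two-predictor model whose coefficient values are made up for illustration. Raising x_1 by one unit while holding x_2 fixed changes the prediction by the coefficient on x_1, up to floating-point rounding.

```python
# Hypothetical coefficients, chosen only for illustration.
BETA_0, BETA_1, BETA_2 = 1.5, 0.8, -2.0


def predict(x1: float, x2: float) -> float:
    """Linear model: E[Y | x1, x2] = beta_0 + beta_1 * x1 + beta_2 * x2."""
    return BETA_0 + BETA_1 * x1 + BETA_2 * x2


# Increase x1 by one unit while holding x2 fixed.
delta = predict(4.0, 3.0) - predict(3.0, 3.0)
print(delta)
```

The difference does not depend on the starting values of x_1 or x_2, which is what makes the coefficient directly interpretable in a linear model.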

Define the collinearity problem and name two ways this affects our coefficient estimates.
Parameter estimates are unaffected; does not impact model performance
Parameter estimates are stable; improves model accuracy
Parameter estimates have high uncertainty; complicates interpretation and inflates standard errors
Parameter estimates are precise; reduces model complexity

Parameter estimates have high uncertainty; complicates interpretation and inflates standard errors
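
One common way to quantify this uncertainty is the variance inflation factor, VIF = 1 / (1 - R^2), where R^2 comes from regressing one predictor on the others. The sketch below is an illustration under assumptions, not part of the deck: it covers only the two-predictor case (where R^2 is the squared correlation), and the data and function names are made up.

```python
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with two predictors, R^2 is the squared correlation."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2_collinear = [2.1, 3.9, 6.2, 7.8, 10.1]   # roughly 2 * x1 (made-up values)
x2_unrelated = [1.0, -1.0, 0.0, -1.0, 1.0]  # zero correlation with x1

vif_collinear = vif_two_predictors(x1, x2_collinear)
vif_unrelated = vif_two_predictors(x1, x2_unrelated)
print(vif_collinear, vif_unrelated)
```

A VIF near 1 means the predictor's coefficient variance is not inflated; the nearly collinear pair gives a VIF in the hundreds.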

How is the Maximum Likelihood Estimate related to Ordinary Least Squares?
OLS minimizes the variance of the residuals
OLS is unrelated to Maximum Likelihood Estimation
Minimizing MSE on the training data yields the Maximum Likelihood Estimate solution (assuming Gaussian noise)
Maximizing the likelihood function gives the OLS solution

Minimizing MSE on the training data yields the Maximum Likelihood Estimate solution (assuming Gaussian noise)
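
The link can be sketched in a few lines, assuming the usual model with independent Gaussian noise of constant variance (the symbols below are notation introduced for this sketch, not taken from the deck). The log-likelihood depends on the coefficients only through the sum of squared errors, so maximizing it over the coefficients is the same as minimizing the MSE.

```latex
% Assumed model: y_i = x_i^\top \beta + \varepsilon_i,
% with \varepsilon_i \sim \mathcal{N}(0, \sigma^2) i.i.d.
\begin{align*}
\log L(\beta, \sigma^2)
  &= \sum_{i=1}^{n} \log \left[ \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left( -\frac{(y_i - x_i^\top \beta)^2}{2\sigma^2} \right) \right] \\
  &= -\frac{n}{2} \log(2\pi\sigma^2)
     - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - x_i^\top \beta)^2 \\
\hat{\beta}_{\text{MLE}}
  &= \arg\max_{\beta} \log L(\beta, \sigma^2)
   = \arg\min_{\beta} \frac{1}{n} \sum_{i=1}^{n} (y_i - x_i^\top \beta)^2
   = \hat{\beta}_{\text{OLS}}
\end{align*}
```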

How can you identify overfitting when plotting training and test errors as a function of model complexity or training time?
Training error will increase while test error decreases
Both training and test errors will increase
Both training and test errors will decrease
The training error and test error will begin to diverge

The training error and test error will begin to diverge
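
As a rough sketch of how that divergence could be located programmatically, the hypothetical helper below scans two error curves and reports the first step at which test error rises while training error is still falling. The error values are made up for illustration and are not results from any real model.

```python
def first_overfit_index(train_errors, test_errors):
    """Return the first index where test error rises while training error
    does not, or None if the curves never diverge that way."""
    for i in range(1, len(test_errors)):
        test_rises = test_errors[i] > test_errors[i - 1]
        train_falls = train_errors[i] <= train_errors[i - 1]
        if test_rises and train_falls:
            return i
    return None


# Illustrative curves indexed by model complexity (or training epoch).
train = [0.90, 0.60, 0.40, 0.25, 0.15, 0.08]
test = [0.95, 0.70, 0.55, 0.50, 0.58, 0.71]

onset = first_overfit_index(train, test)
print(onset)
```

Real curves are noisy, so in practice a smoothed curve or a patience window would usually be more robust than a single-step comparison.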

What is the purpose of regularization in machine learning models?
To reduce the size of the dataset
To prevent overfitting by adding a penalty term
To improve training speed
To increase model complexity

To prevent overfitting by adding a penalty term

What does the regularization parameter do, and what does it effectively control?
Reduces the number of features; controls feature selection
Increases the dataset size; controls data augmentation
Imposes a penalty on less desirable solutions; controls model complexity
Increases the learning rate; controls training speed

Imposes a penalty on less desirable solutions; controls model complexity
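
Written out, a penalized objective typically has the generic form below. The notation is an assumption of this sketch rather than the deck's: f is the model, w its weights, and the intercept w_0 is conventionally left out of the penalty. Ridge and Lasso are shown as two familiar choices of penalty.

```latex
% Generic penalized objective: \Omega penalizes the weights,
% \lambda \ge 0 sets how strongly.
\[
J(w) = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - f(x_i; w) \bigr)^2
       + \lambda \, \Omega(w),
\qquad
\Omega_{\text{ridge}}(w) = \sum_{j=1}^{d} w_j^2,
\quad
\Omega_{\text{lasso}}(w) = \sum_{j=1}^{d} |w_j|
\]
```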

What does a small value of lambda (alpha) signify, and what does a large value signify in regularization?
Small value: High regularization, prevents overfitting; Large value: No regularization, allows overfitting
Small value: No effect on model; Large value: Improves model accuracy
Small value: Increases model complexity; Large value: Reduces model complexity
Small value: Little regularization, allows overfitting; Large value: Large regularization, leads to underfitting

Small value: Little regularization, allows overfitting; Large value: Large regularization, leads to underfitting
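
The effect of lambda can be seen in a small sketch that assumes a one-feature ridge regression without an intercept, where the minimizer has a simple closed form. The data are made up, with a slope near 2. As lambda grows, the fitted slope shrinks toward zero, which is the underfitting end of the spectrum.

```python
def ridge_slope(x, y, lam):
    """Closed-form minimizer of sum((y_i - b * x_i)^2) + lam * b^2
    for a single feature with no intercept."""
    s_xy = sum(xi * yi for xi, yi in zip(x, y))
    s_xx = sum(xi * xi for xi in x)
    return s_xy / (s_xx + lam)


# Made-up, centered data with a slope near 2.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.1, -1.9, 0.2, 2.1, 3.9]

slopes = {lam: ridge_slope(x, y, lam) for lam in (0.0, 10.0, 1000.0)}
for lam, slope in slopes.items():
    print(f"lambda={lam:g}: slope={slope:.3f}")
```

With these values the slope goes from about 2 (no penalty) to about 1 (lambda = 10) to about 0.02 (lambda = 1000).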

Why is it customary to standardize all independent variables before regularization?
So that coefficients aren't penalized more severely just because their variables are measured on a smaller scale
To increase the speed of convergence
To improve the interpretability of the model
To reduce the number of features

So that coefficients aren't penalized more severely just because their variables are measured on a smaller scale
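
A short sketch of why scale matters, using hypothetical height data. The same quantity in metres needs a coefficient 100 times larger than in centimetres for an identical fit, so a squared penalty on it would be 10,000 times larger. After z-scoring, both versions become the same feature and are penalized identically.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# The same hypothetical heights, in centimetres and in metres.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
heights_m = [h / 100.0 for h in heights_cm]

z_cm = standardize(heights_cm)
z_m = standardize(heights_m)
print(z_cm)
print(z_m)
```

In a real pipeline the mean and standard deviation would be estimated on the training split only and then reused for the test split.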

Comprehensive Guide to Machine Learning Concepts and Techniques

Computer Science - Software Engineering

Created by user_striner 7 months ago

Cards in this deck (99)

What is the main assumption for linear regression, and what are two of the three assumptions for the residuals?

In the context of linear regression, how is a unit increase in \[ x_1 \] reflected in Y?

Define the collinearity problem and name two ways this affects our coefficient estimates.

How is the Maximum Likelihood Estimate related to Ordinary Least Squares?

How can you identify overfitting when plotting training and test errors as a function of model complexity or training time?

What is the purpose of regularization in machine learning models?

What does the regularization parameter do, and what does it effectively control?

What does a small value of lambda (alpha) signify, and what does a large value signify in regularization?

Which weight is ignored when imposing a penalty term in regularization?

Why is it customary to standardize all independent variables before regularization?

What penalty as a function of weights does Ridge use? What about Lasso and Elastic Net?

What does R Squared tell us, and how is adjusted R Squared different?

Define bias and variance in the context of model evaluation.

What is the expected loss equal to in terms of bias and variance, and what would your optimal MSE be equal to?

How does regularization impact the bias-variance tradeoff?

What are two reasons to use K-Fold Cross Validation?

Name three of five ways to increase model performance.

What does a learning curve plot, and what would overfitting and underfitting look like?

What does it mean for a model to be a universal approximator family of functions?

Describe the landscape of a non-linear cost function and how model parameters are found.

What are three goals of stochastic gradient descent for neural networks?

What are the consequences of a learning rate that's too low or too high?

What are the advantages of SGD? Name two of five.

What is the structure of the loops in mini-batch SGD training, and when should you stop?

What conditions must be true for Multi-Layer Perceptrons to be considered universal approximators?

What are the four most common activation functions used in neural networks?

Explain the process of backpropagation in neural networks.

Name three out of five design parameters for Multi-Layer Perceptrons (MLPs).

How do the number of hidden units and epochs affect model complexity in neural networks?

A model with less training and K hidden layers is similar to a network with fewer than K hidden layers and _____?

Explain the curse of dimensionality in the context of data analysis.

Name two of the three reasons we use dimensionality reduction techniques.

Explain the concept of feature selection in machine learning.

Explain feature extraction and give two examples.

What does PCA do in the context of dimensionality reduction?

What is the optimal MSE of the best reconstruction using PCA equivalent to?

What is the sum of the retained variance and the lost variance equal to in PCA?

What does a scree plot show in the context of PCA?

What is one method of non-linear dimensionality reduction and how does it work?

What does Bonferroni's theorem state in statistical hypothesis testing?

What are two types of classifier approaches, and how do they differ?

What are two types of probabilistic approaches in classification?

What kind of decision rule do Bayes Classifiers use, and what is it optimal at minimizing?

How do you classify when asymmetric costs are associated with your classifications?

What is the reject option in classification, and when is it used?

Name three of six reasons why the Bayes Rate is unachievable in practice.

What are the four confusion matrix metrics used in classification evaluation?

What does the ROC Curve plot, and what does the AUC mean?

When is a Precision-Recall curve more beneficial, and why?

How do generative models generate a posterior probability?

What are the three steps to constructing a decision tree?

Name three out of five advantages of decision tree classifiers.

Name three out of five limitations of decision tree classifiers.

In a logistic regression model, what is the logit (log-odds) a function of?

How do you interpret a unit change in \[ x_1 \] in the context of logistic regression?

What is another name for the logistic function in logistic regression, and what does it do?

Describe the class boundary in binary logistic regression. What is it perpendicular to?

What is required for a logistic regression model to be robust and well-calibrated?

How does multi-class logistic regression work, and how many two-class models are needed for K classes?

In multi-class logistic regression, what is the difference between a hard decision and a soft decision?

What is the difference between the sigmoid function for binary logistic regression and the softmax function for multi-class logistic regression?

How does the softmax function work, and is it monotonic or non-monotonic?

When are multi-layer perceptrons (MLPs) beneficial in machine learning?

How does an MLP convert outputs to posterior probabilities?

When is the K-Nearest Neighbors (K-NN) algorithm beneficial?

What do Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) assume about class distributions?

What does Naive Bayes assume among the features given a class?

When is Naive Bayes effective in classification tasks?

What are some key metrics for classification tasks? Name three.

What are three ways to handle imbalanced datasets during classification tasks?

What is one reason we use ensembles in machine learning?

When using an average combiner, what is the MSE of the Averaged Ensemble equal to?

What is the definition of ambiguity in the context of ensembling, and what does this mean for model fits?

Does bagging increase or decrease variance in ensemble methods?

How does bagging work, and what are two properties of bagging?

How does random forest build off of bagging in ensemble methods?

When using random forests, why do we take m << d random features?

During random forest modeling, how is the number of features (m) in a subset decided?

What are two ways of resampling in the context of handling imbalanced datasets?

How does boosting fit base models, and why is shrinkage important?

What are two advantages of boosting in ensemble methods?

What is the concept of mixture of experts in machine learning?

What does the gating network do in mixture of experts models?

The maximum likelihood principle is used to _____?

If we have 8 options for a single categorical variable, how many dummy variables would we have if we did one-hot encoding?

If we add an additional predictor in a linear regression, what would happen to the R-squared?

What is the final output layer linear with respect to in an MLP?

A fully connected MLP has one input layer, one hidden layer, and one output layer. The numbers of neurons in the three layers are 3, 4, and 2, respectively. How many learnable parameters does this network have in total (including all bias weights)?

What would you expect to happen to the overall error if you increased regularization on a non-linear regression model?

What is transfer learning, and how is it applied in machine learning?

Is PCA an example of feature engineering or feature selection?

What does the Bayes Decision Rule lead to in terms of misclassification error?

Why is backpropagation considered efficient in neural networks?

When can overfitting in linear regression happen?

Does the collinearity problem affect predictive power in regression models?

What is the difference between model bias and the bias of a point estimator?

How does momentum decrease convergence time in optimization algorithms?

How does initialization affect optimization in machine learning models?

How can you use a learning curve to determine if a model is overfitting or underfitting?
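
For the parameter-counting card in the list above (a fully connected MLP with 3, 4, and 2 neurons), the arithmetic can be sketched as follows. The helper name is hypothetical, and it assumes every non-input neuron has one bias and is connected to every neuron in the previous layer.

```python
def count_mlp_parameters(layer_sizes):
    """Weights plus biases for a fully connected MLP with the given layer widths."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # weight matrix between consecutive layers
        total += n_out         # one bias per neuron in the receiving layer
    return total


n_params = count_mlp_parameters([3, 4, 2])
print(n_params)  # (3*4 + 4) + (4*2 + 2) = 26
```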
