Comprehensive Guide to Machine Learning Concepts and Techniques

Computer Science - Software Engineering

Created by user_striner 7 months ago

Cards in this deck (99)
What is the main assumption for linear regression, and what are two of the three assumptions for the residuals?
In the context of linear regression, how is a unit increase in \( x_1 \) reflected in Y?
Define the collinearity problem and name two ways it affects the coefficient estimates.
How is the Maximum Likelihood Estimate related to Ordinary Least Squares?
How can you identify overfitting when plotting training and test errors as a function of model complexity or training time?
What is the purpose of regularization in machine learning models?
What does the regularization parameter do, and what does it effectively control?
In regularization, what does a small value of lambda (alpha) signify, and what does a large value signify?
Which weight is left out of the penalty term in regularization?
Why is it customary to standardize all independent variables before regularization?
What penalty on the weights does Ridge use? What about Lasso and Elastic Net?
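As a rough illustration of the penalties asked about above, the sketch below writes them as plain Python functions of a weight vector. It assumes the first entry `w[0]` is the intercept and leaves it out of the penalty, and it uses a generic two-parameter Elastic Net form (`lam1` for the L1 part, `lam2` for the squared-L2 part); libraries such as scikit-learn parameterize Elastic Net differently. All function names are hypothetical.

```python
def ridge_penalty(w, lam):
    """L2 penalty: lam * sum of squared weights, intercept w[0] excluded."""
    return lam * sum(wj ** 2 for wj in w[1:])

def lasso_penalty(w, lam):
    """L1 penalty: lam * sum of absolute weights, intercept w[0] excluded."""
    return lam * sum(abs(wj) for wj in w[1:])

def elastic_net_penalty(w, lam1, lam2):
    """Combination of the L1 and squared-L2 penalties (two-parameter form)."""
    return lasso_penalty(w, lam1) + ridge_penalty(w, lam2)

w = [5.0, 1.0, -2.0, 0.5]  # hypothetical weights; w[0] is the intercept
print(ridge_penalty(w, 0.1))
print(lasso_penalty(w, 0.1))
print(elastic_net_penalty(w, 0.1, 0.1))
```

The L1 term grows linearly in each weight and can push weights exactly to zero, while the squared-L2 term shrinks large weights hardest.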
What does R-squared tell us, and how is adjusted R-squared different?
Define bias and variance in the context of model evaluation.
What is the expected loss equal to in terms of bias and variance, and what would your optimal MSE be equal to?
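For squared-error loss, the standard decomposition can be written as below, assuming targets generated as \( y = f(x) + \varepsilon \) with zero-mean noise of variance \( \sigma^2 \) that is independent of the fitted model \( \hat{f} \), and expectations taken over training sets and noise. Because the bias and variance terms cannot be negative, the smallest achievable MSE is the irreducible noise \( \sigma^2 \).

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(f(x) - \mathbb{E}[\hat{f}(x)]\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}},
\qquad
\text{MSE}_{\min} = \sigma^2 .
```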
How does regularization impact the bias-variance tradeoff?
What are two reasons to use K-fold cross-validation?
Name three of five ways to increase model performance.
What does a learning curve plot, and what would overfitting and underfitting look like?
What does it mean for a model family to be a universal approximator?
Describe the landscape of a non-linear cost function and how model parameters are found.
What are three goals of stochastic gradient descent for neural networks?
What are the consequences of a learning rate that's too low or too high?
What are the advantages of SGD? Name two of five.
What is the structure of the loops in mini-batch SGD training, and when should you stop?
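The loop structure asked about above can be sketched in Python with NumPy. This is a minimal illustration, not the course's reference code: it assumes a linear least-squares model, an outer loop over epochs that reshuffles the data, an inner loop over mini-batches, and early stopping once validation loss has not improved for `patience` epochs. The function name and all hyperparameter values are hypothetical.

```python
import numpy as np

def minibatch_sgd(X_tr, y_tr, X_val, y_val, lr=0.05, batch_size=16,
                  max_epochs=200, patience=10, seed=0):
    """Hypothetical helper: mini-batch SGD on mean squared error for a linear model."""
    rng = np.random.default_rng(seed)
    n, d = X_tr.shape
    w = np.zeros(d)
    best_w, best_val, bad_epochs = w.copy(), np.inf, 0
    for _ in range(max_epochs):                     # outer loop: one pass over the data per epoch
        order = rng.permutation(n)                  # reshuffle so batches differ each epoch
        for start in range(0, n, batch_size):       # inner loop: one update per mini-batch
            batch = order[start:start + batch_size]
            residual = X_tr[batch] @ w - y_tr[batch]
            grad = 2.0 * X_tr[batch].T @ residual / len(batch)  # gradient of batch MSE
            w = w - lr * grad
        val_loss = np.mean((X_val @ w - y_val) ** 2)
        if val_loss < best_val:                     # keep the best weights seen so far
            best_w, best_val, bad_epochs = w.copy(), val_loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:              # stop: validation loss stopped improving
                break
    return best_w, best_val

# Usage on synthetic data (assumed for illustration only)
rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(400, 3))
y = X @ true_w + 0.1 * rng.normal(size=400)
w_hat, val_mse = minibatch_sgd(X[:300], y[:300], X[300:], y[300:])
print(w_hat, val_mse)
```

Returning the best validation-time weights rather than the last ones is one common choice; stopping on a fixed epoch budget alone is another.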
What conditions must be true for Multi-Layer Perceptrons to be considered universal approximators?
What are the four most common activation functions used in neural networks?
Explain the process of backpropagation in neural networks.
Name three out of five design parameters for Multi-Layer Perceptrons (MLPs).
How do the number of hidden units and the number of epochs affect model complexity in neural networks?
A network with K hidden layers and less training is similar to a network with fewer than K hidden layers and _____?
Explain the curse of dimensionality in the context of data analysis.
Name two of the three reasons we use dimensionality reduction techniques.
Explain the concept of feature selection in machine learning.
Explain feature extraction and give two examples.
What does PCA do in the context of dimensionality reduction?
What is the MSE of the best (optimal) PCA reconstruction equal to?
In PCA, what is the sum of the retained variance and the lost variance equal to?
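A small NumPy check, on randomly generated data assumed purely for illustration, of two PCA facts: the retained and lost variance (sums of the kept and discarded covariance eigenvalues) add up to the total variance, and the mean squared reconstruction error from the top-k components equals the lost variance. It uses the 1/n covariance so the two quantities match exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # toy correlated data (assumed)
Xc = X - X.mean(axis=0)                                   # center the data
cov = Xc.T @ Xc / len(Xc)                                 # 1/n covariance

eigvals, eigvecs = np.linalg.eigh(cov)                    # eigh returns ascending order
order = np.argsort(eigvals)[::-1]                         # sort components by variance, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2                                                     # number of components kept
W = eigvecs[:, :k]
X_hat = Xc @ W @ W.T                                      # project onto k components and reconstruct

retained = eigvals[:k].sum()
lost = eigvals[k:].sum()
total = np.trace(cov)
mse = np.mean(np.sum((Xc - X_hat) ** 2, axis=1))          # mean squared reconstruction error

print(retained + lost, total)  # retained + lost = total variance
print(mse, lost)               # best reconstruction MSE = lost variance
```

Plotting `eigvals` in this order is also what a scree plot shows.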
What does a scree plot show in the context of PCA?
What is one method of non-linear dimensionality reduction and how does it work?
What does Bonferroni's theorem state in statistical hypothesis testing?
What are two types of classifier approaches, and how do they differ?
What are two types of probabilistic approaches in classification?
What kind of decision rule do Bayes Classifiers use, and what is it optimal at minimizing?
How do you classify when asymmetric costs are associated with your classifications?
What is the reject option in classification, and when is it used?
Name three of six reasons why the Bayes Rate is unachievable in practice.
What are the four confusion matrix metrics used in classification evaluation?
What does the ROC Curve plot, and what does the AUC mean?
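One way to make the ROC curve and AUC concrete: the curve plots true positive rate against false positive rate as the decision threshold sweeps from high to low, and the area under it equals the probability that a randomly chosen positive is scored above a randomly chosen negative (ties counted as one half). The pure-Python sketch below computes both views on toy scores; the function names and data are hypothetical.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by sweeping the threshold from high to low."""
    P = sum(labels)
    N = len(labels) - P
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / N, tp / P))
    return points

def auc_trapezoid(points):
    """Area under the ROC points via the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

def auc_pairwise(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.4, 0.3]  # hypothetical classifier scores
labels = [1, 0, 1, 0]
print(roc_points(scores, labels))
print(auc_trapezoid(roc_points(scores, labels)), auc_pairwise(scores, labels))  # 0.75 0.75
```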
When is a Precision-Recall curve more beneficial, and why?
How do generative models generate a posterior probability?
What are the three steps to constructing a decision tree?
Name three out of five advantages of decision tree classifiers.
Name three out of five limitations of decision tree classifiers.
In a logistic regression model, what is the logit (log-odds) a function of?
How do you interpret a unit change in \( x_1 \) in the context of logistic regression?
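A quick numerical check, assuming a one-feature model with hypothetical coefficients `b0` and `b1`: because the log-odds are linear in \( x_1 \), a one-unit increase adds `b1` to the log-odds, which multiplies the odds by exp(b1) regardless of the starting value of \( x_1 \).

```python
import math

def odds(x1, b0, b1):
    """Odds p / (1 - p) under a logistic model with log-odds b0 + b1 * x1."""
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x1)))
    return p / (1.0 - p)

b0, b1 = -1.0, 0.8  # hypothetical coefficients
for x in (-2.0, 0.0, 3.0):
    ratio = odds(x + 1, b0, b1) / odds(x, b0, b1)
    print(x, ratio, math.exp(b1))  # the odds ratio matches exp(b1) at every x
```

Note that the change in the probability p itself does depend on where \( x_1 \) starts; only the odds ratio is constant.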
What is another name for the logistic function in logistic regression, and what does it do?
Describe the class boundary in binary logistic regression. What is it perpendicular to?
What is required for a logistic regression model to be robust and well-calibrated?
How does multi-class logistic regression work, and how many two-class models are needed for K classes?
In multi-class logistic regression, what is the difference between a hard decision and a soft decision?
What is the difference between the sigmoid function for binary logistic regression and the softmax function for multi-class logistic regression?
How does the softmax function work, and is it monotonic or non-monotonic?
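A minimal NumPy sketch of softmax and its relation to the sigmoid, with hypothetical scores: softmax exponentiates each class score and normalizes so the outputs sum to 1; it is monotonic in the sense that raising one class's score raises that class's probability, so the ordering of scores is preserved. With two classes scored (t, 0), softmax reduces to the sigmoid of t.

```python
import numpy as np

def softmax(z):
    """Softmax over a vector of scores; subtracting the max avoids overflow."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

z = np.array([2.0, 1.0, -1.0])      # hypothetical class scores
p = softmax(z)
print(p, p.sum())                   # probabilities that sum to 1

# Monotonic: raising one score raises that class's probability
bumped = z.copy()
bumped[0] += 1.0
print(softmax(bumped)[0] > p[0])    # True

# Two classes with scores (t, 0) reduce to the sigmoid of t
t = 0.7
two_class = softmax([t, 0.0])
print(two_class[0], sigmoid(t))
```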
When are multi-layer perceptrons (MLPs) beneficial in machine learning?
How does an MLP convert outputs to posterior probabilities?
When is the K-Nearest Neighbors (K-NN) algorithm beneficial?
What do Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) assume about class distributions?
What does Naive Bayes assume about the features given a class?
When is Naive Bayes effective in classification tasks?
What are some key metrics for classification tasks? Name three.
What are three ways to handle imbalanced datasets during classification tasks?
What is one reason we use ensembles in machine learning?
When using an average combiner, what is the MSE of the averaged ensemble equal to?
What is the definition of ambiguity in the context of ensembling, and what does this mean for model fits?
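For an average combiner under squared error, the ensemble's error splits exactly into the average member error minus the average ambiguity, where ambiguity is the spread of the members around the ensemble prediction. The NumPy sketch below checks that identity on made-up predictions from five hypothetical models; since ambiguity is never negative, the averaged ensemble is never worse than its average member, and more diverse fits help as long as individual error stays low.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=50)                              # hypothetical targets
preds = y + rng.normal(scale=0.5, size=(5, 50))      # 5 hypothetical noisy models

f_bar = preds.mean(axis=0)                           # average combiner
ensemble_mse = np.mean((f_bar - y) ** 2)
avg_member_mse = np.mean((preds - y) ** 2)           # mean over models and samples
avg_ambiguity = np.mean((preds - f_bar) ** 2)        # spread around the ensemble prediction

print(ensemble_mse, avg_member_mse - avg_ambiguity)  # equal up to rounding
```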
Does bagging increase or decrease variance in ensemble methods?
How does bagging work, and what are two properties of bagging?
How does random forest build on bagging in ensemble methods?
When using random forests, why do we take m << d random features?
During random forest modeling, how is the number of features (m) in a subset decided?
What are two ways of resampling in the context of handling imbalanced datasets?
How does boosting fit base models, and why is shrinkage important?
What are two advantages of boosting in ensemble methods?
What is the concept of mixture of experts in machine learning?
What does the gating network do in mixture of experts models?
The maximum likelihood principle is used to _____?
If we have 8 options for a single categorical variable, how many dummy variables would we have if we did one-hot encoding?
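A minimal pure-Python sketch with eight hypothetical category labels: full one-hot encoding yields one indicator column per category (eight here), while the reference (drop-first) coding commonly used alongside an intercept in linear regression keeps seven, since the eighth column would be perfectly collinear with the intercept. The helper name is hypothetical.

```python
levels = ["A", "B", "C", "D", "E", "F", "G", "H"]  # hypothetical 8 categories

def encode(value, levels, drop_first=False):
    """Indicator vector for one value; drop_first omits the reference level."""
    columns = levels[1:] if drop_first else levels
    return [1 if value == level else 0 for level in columns]

print(len(encode("C", levels)))                   # 8 columns with full one-hot
print(len(encode("C", levels, drop_first=True)))  # 7 columns with a reference level
print(encode("A", levels, drop_first=True))       # reference level: all zeros
```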
If we add an additional predictor in a linear regression, what would happen to the R-squared?
What is the final output layer linear with respect to in an MLP?
A fully connected MLP has one input layer, one hidden layer, and one output layer, with 3, 4, and 2 neurons respectively. How many learnable parameters does this network have in total (including all bias weights)?
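The count can be checked with a short helper, assuming every layer is fully connected to the next and each non-input neuron has one bias: each layer transition contributes n_in × n_out weights plus n_out biases. The helper name is hypothetical.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases for a fully connected MLP (hypothetical helper)."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(mlp_param_count([3, 4, 2]))  # (3*4 + 4) + (4*2 + 2) = 16 + 10 = 26
```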
What would you expect to happen to the overall error if you increased regularization on a non-linear regression model?
What is transfer learning, and how is it applied in machine learning?
Is PCA an example of feature engineering or feature selection?
What does the Bayes Decision Rule lead to in terms of misclassification error?
Why is backpropagation considered efficient in neural networks?
When can overfitting in linear regression happen?
Does the collinearity problem affect predictive power in regression models?
What is the difference between model bias and the bias of a point estimator?
How does momentum decrease convergence time in optimization algorithms?
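A small experiment sketches the effect, assuming a hypothetical ill-conditioned quadratic f(x) = 0.5 x^T H x with curvatures 1 and 100 and the heavy-ball update v ← βv − η∇f(x), x ← x + v. With β = 0 this reduces to plain gradient descent, which crawls along the shallow direction because the step size is capped by the steep one; with β = 0.9 the velocity accumulates along the shallow direction and reaches the tolerance in far fewer iterations. Step size, β, and tolerance are illustrative choices, not tuned values.

```python
import numpy as np

H = np.diag([1.0, 100.0])  # hypothetical quadratic f(x) = 0.5 * x^T H x, curvatures 1 and 100

def grad(x):
    return H @ x            # gradient of the quadratic

def iterations_to_converge(lr, beta, tol=1e-6, max_iter=10_000):
    """Heavy-ball momentum; beta = 0 gives plain gradient descent."""
    x = np.array([1.0, 1.0])
    v = np.zeros_like(x)
    for t in range(1, max_iter + 1):
        v = beta * v - lr * grad(x)    # velocity accumulates past gradients
        x = x + v
        if np.linalg.norm(x) < tol:    # the minimum is at the origin
            return t
    return max_iter

plain = iterations_to_converge(lr=0.01, beta=0.0)
heavy = iterations_to_converge(lr=0.01, beta=0.9)
print(plain, heavy)  # momentum needs far fewer iterations here
```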
How does initialization affect optimization in machine learning models?
How can you use a learning curve to determine if a model is overfitting or underfitting?