What does Natural Language Processing (NLP) concern itself with?
involves techniques that do not work for 'artificial languages' like Esperanto or programming languages, like Java
NONE of the other statements are true
involves the synthesis of natural compounds, such as organic materials, from their chemical language descriptions
concerns the computational analysis, interpretation, and production of natural language

concerns the computational analysis, interpretation, and production of natural language

What might you do to improve the performance of a multilingual BERT based classifier performing poorly?
train it for another epoch to see if the validation performance increases
check whether a monolingual BERT model exists for your language and if so, test it
try increasing the batch size if memory on the GPU allows
ALL of the other answers are valid

ALL of the other answers are valid

In Multi-task learning, what is the aim/benefit of fine-tuning a Language model to perform many different tasks at the same time?
generalise knowledge across the different tasks and thus to perform better on any particular task
ALL the other answers are correct
become so good at general-purpose question answering that it can generalise even to new domains
learn to do many tasks contemporaneously, so we don't need to deploy and maintain many task-specific models

generalise knowledge across the different tasks and thus to perform better on any particular task

Which of the following statements about the most frequently occurring words in a corpus is true?
they are the most important words in the query
they should never be removed during preprocessing
NONE of the other answers are correct
they are the most discriminative words in any document

NONE of the other answers are correct

In a web search engine, which signals might provide useful features to the rank learner?
the similarity between the text of incoming hyperlinks and the query
the distance between geographic locations associated with the webpage and the location of the user
the number of hyperlinks from other webpages that link to this particular page
ALL of the other options are correct

ALL of the other options are correct

Comprehensive Flashcards on NLP, Machine Learning, and Information Retrieval Concepts

Computer Science - Software Engineering

Created by user_jevbwl 7 months ago

Cards in this deck (88)

Which of the following is NOT considered a limitation of regular-expression based text extraction?

Which task involves assigning a categorical label to every word in a piece of text?

What does Natural Language Processing (NLP) concern itself with?

What might you do to improve the performance of a multilingual BERT based classifier performing poorly?

Which of the following functions is NOT commonly used as an activation function in Neural networks?

Which of the following is NOT a common ranking function used in term-based information retrieval?

The size of the vocabulary grows roughly in proportion to the square root of the length of the document, is a statement of whose law?

In Multi-task learning, what is the aim/benefit of fine-tuning a Language model to perform many different tasks at the same time?

Which of the following statements about the most frequently occurring words in a corpus is true?

In a web search engine, which signals might provide useful features to the rank learner?

What does a byte-pair encoding do?

The sentence 'Paris Hilton was photographed leaving the Paris Hilton' is an example of a sentence containing:

Latent Dirichlet Allocation (LDA) is an algorithm that is often used to:

If the probability of the sequence 'I love NLP' was exactly 1/64, what would the perplexity of the sentence be?
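As a worked sketch for the perplexity card above: perplexity is commonly defined as the inverse probability of the sequence, normalised by its length, i.e. PP = P(w_1 ... w_N)^(-1/N). The snippet assumes the sentence counts as N = 3 word tokens with no end-of-sentence marker; the function name `perplexity` is illustrative only.

```python
import math


def perplexity(sequence_probability: float, num_tokens: int) -> float:
    """Inverse sequence probability, normalised by the number of tokens."""
    return sequence_probability ** (-1.0 / num_tokens)


# Assumption: 'I love NLP' is counted as three tokens.
pp = perplexity(1 / 64, 3)
print(round(pp, 6))  # cube root of 64, up to floating-point rounding
```

Under a different tokenisation (for example, one that adds an end-of-sentence token) N changes, and so does the result.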

What are the true positive (TP), true negative (TN), false positive (FP) and false negative (FN) counts given the following confusion matrix: TP=3071, TN=2401, FP=142, FN=103?

Which of the following is NOT a property of Word2Vec word embeddings?

Which of the following measures would not be appropriate for evaluating a speech-to-text system?

In hierarchical agglomerative text clustering, what does single-linkage (minimum distance) tell you about the types of clusters that could be found?

How can you improve entity in relation to which and options?

The sentence: 'I didn't just say what I just said.' is an example of a phrase that:

When evaluating dialog produced by a chatbot, ideally we would rate performance based on:

When generating text from a language model, which technique will likely require the most computational resources and thus be slowest?

The fact that the exclamation mark '!' can denote a factorial, the question mark '?' can indicate a missing value, and the period '.' can be a decimal point, complicates which NLP task?

The task of determining who or what is being referred to by a pronoun in a sentence is called:

The process of aligning words to a common reference dictionary to ensure consistent spelling/formatting throughout the corpus is referred to as:

Which statement about the limitations of n-gram language models is NOT correct?

GloVe embeddings are used to:

The Mel Spectrogram is just a spectrogram which has:

In order to improve the probability estimates for an n-gram language model we could:

A statistical language model computes:

When generating text from a language model with top-k sampling, setting the value of k to the size of the vocabulary would be equivalent to performing:
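A minimal sketch of the truncation step in top-k sampling may help with the card above. It keeps the k most probable tokens and renormalises; when k equals the vocabulary size nothing is dropped, so the distribution is the model's original one. The function name and the four-word distribution are hypothetical.

```python
def top_k_distribution(probs: dict[str, float], k: int) -> dict[str, float]:
    """Keep the k most probable tokens and renormalise their probabilities."""
    kept = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Hypothetical next-token distribution over a four-word vocabulary.
next_token_probs = {"fun": 0.5, "hard": 0.25, "long": 0.125, "easy": 0.125}

truncated = top_k_distribution(next_token_probs, k=2)
full = top_k_distribution(next_token_probs, k=len(next_token_probs))

print(truncated)                 # only 'fun' and 'hard' survive, rescaled
print(full == next_token_probs)  # no truncation when k covers the vocabulary
```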

Which, if any, of the following techniques is NOT used to produce a spectrogram for analysing audio signals?

Machine translation is an example of what type of problem?

Which of the following prompts to a language model would be considered an example of one-shot learning?

Which statement about the T5 (Text-To-Text Transfer Transformer) model is NOT true?

Assume that you have learnt Word2Vec embeddings of size 512 over a vocabulary of four hundred thousand tokens. Approximately how much memory (in GB) would you need to store all of the vectors if the usual double precision (64 bit) floating point numbers are used?
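The arithmetic behind the memory card above can be sketched as follows. It assumes only the raw vector values are stored (no index, no overhead), and it reports both decimal gigabytes and binary gibibytes since the card does not say which is meant.

```python
vocab_size = 400_000      # tokens in the vocabulary
dimensions = 512          # values per embedding vector
bytes_per_value = 8       # 64-bit double precision

total_bytes = vocab_size * dimensions * bytes_per_value
gigabytes = total_bytes / 10**9   # decimal GB
gibibytes = total_bytes / 2**30   # binary GiB

print(f"{total_bytes:,} bytes = {gigabytes:.2f} GB = {gibibytes:.2f} GiB")
```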

Which of the following statements about the Bag-of-Words (BOW) representation of a document is correct?

In NLP, the process of splitting a document up into a sequence of words is called:

Consider the regular expression: '\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-\d{2,4}'. Which, if any, of the following strings would the expression match?
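One way to explore the expression in the card above is to try it against a few strings. The candidates below are hypothetical and are not the card's answer options. The sketch uses `re.fullmatch`, which requires the whole string to match; that is an assumption about what the card means by "match".

```python
import re

DATE_PATTERN = re.compile(
    r"\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-\d{2,4}"
)

# Hypothetical test strings, not the answer options from the card.
candidates = ["12-Jan-2024", "1-Feb-99", "12-jan-2024", "5-Sept-2020", "3/Mar/2020"]

results = {text: bool(DATE_PATTERN.fullmatch(text)) for text in candidates}
for text, matched in results.items():
    print(f"{text!r}: {matched}")
```

The pattern is case-sensitive and only accepts three-letter month abbreviations separated by hyphens. With `re.search` instead of `re.fullmatch`, the pattern could also match a substring inside a longer string.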

Which of the following is a common text pre-processing step in an NLP application?

Explain the concept of other in relation to text and following?

In a traditional (lexical/term-based) search engine, a posting list contains:

Which statement about sequence-to-sequence models with attention is NOT true?

What is the significance of which in relation to options and language?

In Information Theory, the logarithm of one on the probability of an event corresponds to:
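For reference, the quantity log(1/p) in the card above is usually called the self-information (or surprisal) of the event. A minimal sketch, assuming base-2 logarithms so the result is in bits; the function name is illustrative.

```python
import math


def surprisal_bits(probability: float) -> float:
    """Self-information of an event, log2(1 / p), measured in bits."""
    return math.log2(1.0 / probability)


# Rarer events carry more information.
print(surprisal_bits(0.5))    # a fair coin flip
print(surprisal_bits(0.125))  # a one-in-eight event
```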

In text-to-speech systems, certain words like 'bass' can be problematic. Why?

Which statement about a Long Short-Term Memory (LSTM) network is NOT true?

Given the following conditional probabilities for a trigram language model, what would be the probability of the sequence 'I like chocolate ice cream'?

Building spoken interface agents is much harder than building chatbots because:

Which traditional (lexical) retrieval function would be most robust to a spammer who tries to push their web page up in the search rankings by adding many occurrences of the query term?

According to a Naive Bayes model, what is the Probability that a student gets a 'high' grade if she describes the exam as 'long and difficult'?

The act of attributing human emotions and intentions to a computer program is referred to as:

Which of the following statements about the uses of word embeddings is generally true?

The study of patterns of stress and intonation that affect the intended meaning of spoken language is referred to as:

Do we prefer language models with higher perplexity or lower perplexity?

In terms of speech acts, when someone repeats back to the speaker part of what they have just said, what is usually the purpose of doing this?

The fact that the expression 'I made her duck' could mean 'I caused her to lower her head to avoid being hit' or 'I cooked the fowl that she had bought' is an example of the fact that:

Traditionally, Conditional Random Fields were used in NLP to solve which of the following tasks:

Which of the following tasks would NOT be considered a typical NLP problem:

In NLP, which of the following statements regarding a parse tree is NOT correct?

What type of learning is the model doing when given the following prompt: 'I'm afraid for the calendar. Its days are numbered. => not funny I only know 25 letters of the alphabet. I don't know y. => not funny What do you call a fish wearing a bowtie? SoFISHticated. => funny What do you call a factory that makes okay products? A satisfactory. => funny I thought the dryer was shrinking my clothes. Turns out it was the refrigerator all along. => funny I asked my dog what's two minus two. He said nothing. =>'?

Considering the taxonomy of speech acts defined by Bach and Harnish, when someone advises/asks/orders/requests somebody, they are performing which type of speech act?

Which of the following Machine Learning models makes use of a bidirectional Transformer architecture to extract a feature representation of text?

Using top-k sampling with k set to 2, what would the chance of seeing the output 'a b c' be in a bigram language model with given probabilities?

Which of the following techniques is often used for learning sequence-to-sequence models in NLP?

What trick did Eliza (the chatbot) use for creating meaningful conversations in an open domain with little or no domain knowledge?

Text normalisation is needed for a text-to-speech system in order to:

Which of the following statements about the Logistic Regression (LR) classifier is true?

If the output of a text classifier produces the following confusion matrix on the test set, what is the Precision of the classifier?

                      Predicted Class
                        +       -
Actual Class    +      95      25
                -       5      75
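The precision calculation for the card above can be sketched as below. It assumes the flattened matrix is read with rows as actual classes and columns as predicted classes, so that 95 and 25 belong to the actual-positive row and 5 and 75 to the actual-negative row.

```python
# Assumed layout: rows are actual classes, columns are predicted classes.
true_positives = 95    # actual +, predicted +
false_negatives = 25   # actual +, predicted -
false_positives = 5    # actual -, predicted +
true_negatives = 75    # actual -, predicted -

# Precision: of everything predicted positive, how much really was positive.
precision = true_positives / (true_positives + false_positives)
# Recall is shown for contrast: of all actual positives, how many were found.
recall = true_positives / (true_positives + false_negatives)

print(f"precision = {precision:.3f}, recall = {recall:.3f}")
```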

The main disadvantage of the k-Medoids algorithm with respect to the k Means algorithm is:

Which of the following tasks would NOT usually be considered a Natural Language Processing task:

Given the piece of text 'This exam is too much' and a trigram language model, what is the chance that the model produces the word 'fun' as the next token if 'top-k' sampling is used with k set to 5?

Which of the following techniques used in NLP is the most recent and considered state-of-the-art?

How is language in relation to entity and which?

In order to speed up model training, the Transformer model REMOVED what part of the sequence-to-sequence with attention model architecture?

Which one of the following regular expressions would match the telephone number '+69 403 992 010'?

How many times would the regular expression 'f\w*ny?' match the following string: 'While I get how you feel, I don't find this exam either fun or funny.'?
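The count in the card above can be checked with `re.findall`, which returns all non-overlapping matches scanning left to right. Note that the pattern is not anchored to word boundaries, so it can match inside a word.

```python
import re

text = "While I get how you feel, I don't find this exam either fun or funny."

# 'f', then any run of word characters, then 'n', then an optional 'y'.
matches = re.findall(r"f\w*ny?", text)
print(matches, len(matches))
```

Here 'feel' produces no match because no 'n' follows within the word, while 'find' contributes the partial match 'fin'.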

Entity linkage is the task of:

Consider the following normalised tf-idf vectors. What would be the order of the documents if the cosine similarity is used to rank them?

Which statement about Word2Vec is true?

Which statement about sequence-to-sequence models is true?

The vector: [0,0,0,0,1,0,5,0,0,0,23,0,0,0,0,5,0,0,0,0,0,0,0,0,0,0,1,0,2,0,0,1,0,0,0,0,15,0,0,0,0,0, ....,0,0,1,0,0,0,0,0] is most likely a:

What 2-dimensional representation of an audio signal is often used in speech detection and synthesis?

What is the difference between stemming and lemmatization?

Begin-Inside-Outside tagging is often used for:

The main reason for performing stemming before building a text classifier is to:

Which statement about GPT (GPT-2, GPT-3, etc) models is NOT true?

News aggregators can make use of clustering techniques to:
