Question: The following code uses cross-validation in order to estimate predictive accuracy for a linear model of days-to-remission as a function of gene expression in ALL

The following code uses cross-validation to estimate predictive accuracy for a linear model of days-to-remission as a function of gene expression in the ALL dataset. It runs to completion without errors but produces a number of warnings (shown below) about "differing numbers of rows" and "mismatches in object lengths."

Please explain the source of those warnings and how they can be cleaned up. Please also explain how whatever caused those warnings affects the output (if at all), and how and why (and if) the output changes upon fixing the code. (Hint: in order to observe these warnings you do not have to go through all 12K genes at each step of cross-validation - one percent of that amount is plenty - and it will save you a lot of time you would otherwise waste watching it run; remember also that R is an interpreter, so you can run commands one at a time if you need to and examine their outputs).

library(ALL)
data(ALL)

set.seed(1234)

# calculate days-to-remission:

ALL.pdat <- pData(ALL)
date.cr.chr <- as.character(ALL.pdat$date.cr)
diag.chr <- as.character(ALL.pdat$diagnosis)
date.cr.t <- strptime(date.cr.chr,"%m/%d/%Y")
diag.t <- strptime(diag.chr,"%m/%d/%Y")
ALL.pdat$D2R <- as.numeric(date.cr.t - diag.t)

# prepare the data structures:

ALL.exprs <- exprs(ALL)[,!is.na(ALL.pdat$D2R)]
ALL.pdat <- ALL.pdat[!is.na(ALL.pdat$D2R),]
n.xval <- 5
s2.xval <- numeric()

xval.grps <- sample(1:dim(ALL.pdat)[1]%%n.xval+1)

# run over each cross-validation:

for ( i.xval in 1:n.xval ) {
  min.pval <- 1.0
  min.id <- NA
  train.exprs <- ALL.exprs[,xval.grps!=i.xval]
  train.d2r <- ALL.pdat[xval.grps!=i.xval,"D2R"]

  # evaluate each gene in the training dataset to find the one
  # most associated with the outcome:
  for( i in 1:dim(train.exprs)[1]) {
  ###for( i in 1:100 ) {
    p.val <- anova(lm(train.d2r~train.exprs[i,],))[1,"Pr(>F)"]
    if ( p.val < min.pval ) {
      min.pval <- p.val
      min.id <- i
    }
  }

  # print the gene found:
  cat(rownames(train.exprs)[min.id],min.pval,fill=T)

  # refit the model for best gene found on training dataset:
  best.lm.xval <- lm(train.d2r~train.exprs[min.id,])

  # calculate predictions on test dataset:
  test.exprs <- ALL.exprs[,xval.grps==i.xval]
  test.d2r <- ALL.pdat[xval.grps==i.xval,"D2R"]
  test.pred <- predict(
    best.lm.xval,data.frame(t(test.exprs),test.d2r)
  )

  # accumulate squared errors of prediction:
  s2.xval <- c(s2.xval,(test.pred-test.d2r)^2)
}

40176_at 1.433363e-05
35296_at 8.721938e-07
1213_at 3.760985e-06
34852_g_at 2.161217e-06
33901_at 1.399374e-06
Warning messages:
1: 'newdata' had 19 rows but variables found have 77 rows
2: In test.pred - test.d2r :
  longer object length is not a multiple of shorter object length
...

# print average squared error in cross-validation:

 mean(s2.xval)

[1] 332.7707
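Following the hint about running commands one at a time, the first warning can be reproduced in isolation. The sketch below is only an illustration on synthetic data, not the ALL dataset: the sizes (20 genes, 32 samples), the gene index, and the names train.df, test.df, d2r and x are all made up for the example. It assumes the cause is that the formula refers to train.exprs[...] directly, a term that predict() cannot find in newdata and therefore evaluates from the workspace instead.

```r
# Synthetic stand-in for the expression matrix and outcome (hypothetical sizes).
set.seed(1)
n.genes <- 20
n.samples <- 32
expr <- matrix(rnorm(n.genes * n.samples), nrow = n.genes)
d2r.all <- rnorm(n.samples, mean = 60, sd = 10)

# One train/test split: 6 test samples, 26 training samples.
is.test <- seq_len(n.samples) %% 5 == 0
train.exprs <- expr[, !is.test]
train.d2r <- d2r.all[!is.test]
test.exprs <- expr[, is.test]
test.d2r <- d2r.all[is.test]
gene <- 3  # arbitrary gene index, for illustration only

# Same pattern as in the question: the predictor is an expression on train.exprs.
# newdata has no column matching that term, so predict() evaluates
# train.exprs[gene, ] from the workspace and warns about the row mismatch.
fit.bad <- lm(train.d2r ~ train.exprs[gene, ])
pred.bad <- predict(fit.bad, data.frame(t(test.exprs), test.d2r))
length(pred.bad)   # one value per TRAINING sample
length(test.d2r)   # one value per TEST sample

# One possible cleanup: fit on a data frame with a named predictor column
# and pass newdata that uses the same column name.
train.df <- data.frame(d2r = train.d2r, x = train.exprs[gene, ])
test.df <- data.frame(x = test.exprs[gene, ])
fit.ok <- lm(d2r ~ x, data = train.df)
pred.ok <- predict(fit.ok, newdata = test.df)
length(pred.ok)    # one value per TEST sample
```

If this is indeed what happens in the question's loop, test.pred there holds fitted values for the training samples. Subtracting the shorter test.d2r then recycles it, which would explain the second warning and means the reported mean(s2.xval) is not a test-set error.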

