Question: In this project, we will analyze a single-cell RNA-seq dataset, with the goal of unveiling hierarchical structure and discovering important genes. The datasets provided are all different subsets of a larger single-cell RNA-seq dataset, compiled by the Allen Institute. This data contains cells from the mouse neocortex, a region in the brain which governs higher-level functions such as perception and cognition.

The single-cell RNA-seq data comes in the form of a counts matrix, where each row corresponds to a cell and each column corresponds to the normalized transcript compatibility count (TCC) of an equivalence class of short RNA sequences, rescaled to units of counts per million. You can think of the TCC entry at location (i, j) of the data matrix as the level of expression of the j-th gene in the i-th cell.
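To make the row/column convention concrete, here is a minimal sketch on a made-up toy matrix. It assumes that "counts per million" means each cell's row is rescaled to sum to one million; the question does not spell out the exact normalization, so treat `counts_per_million` as a hypothetical helper and the numbers as illustrative only.

```python
import numpy as np

# Toy raw counts: 3 cells (rows) x 4 features (columns). Values are made up.
raw_counts = np.array([
    [10, 0, 30, 60],
    [5, 5, 0, 40],
    [0, 2, 8, 10],
], dtype=float)


def counts_per_million(counts):
    """Rescale each row (cell) so that its entries sum to one million.

    Assumption: this is what the CPM rescaling in the question refers to.
    """
    totals = counts.sum(axis=1, keepdims=True)
    return counts / totals * 1e6


X = counts_per_million(raw_counts)

i, j = 1, 3  # cell index i (row), gene index j (column); NumPy is 0-based
print(X.shape)  # (number of cells, number of features)
print(X[i, j])  # expression level of feature j in cell i
```

Indexing a whole row, `X[i, :]`, gives the expression profile of one cell, and a whole column, `X[:, j]`, gives one gene across all cells.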

Download gene_analysis_data.tar.gz. (If you don't know how to open it, try WinZip or 7-zip.)
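If you would rather stay in Python, the standard-library `tarfile` module can also unpack the archive. The sketch below is one possible approach; `extract_archive` and the `data` destination folder are hypothetical names chosen for illustration, and the archive is assumed to sit in the current working directory.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="data"):
    """Extract a .tar.gz archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as tar:
        names = tar.getnames()
        tar.extractall(path=dest)
    return names


if __name__ == "__main__":
    archive = Path("gene_analysis_data.tar.gz")  # assumed download location
    if archive.exists():
        # Show the first few extracted paths as a sanity check.
        for name in extract_archive(archive)[:10]:
            print(name)
```

Only extract archives from a source you trust, since tar files can contain unexpected paths.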

The data is provided in three folders:

p1, which is a small, labeled subset of the data. It contains the count matrix along with "ground truth" clustering labels, which were obtained by scientists using domain knowledge and statistical testing. This is for use in Problem 1.

p2_unsupervised, which contains only a count matrix. This is for use in Problem 2.

p2_evaluation, which contains a labeled training and test set. This is for use in Problem 2 to evaluate feature selection.

The p2_unsupervised_reduced and p2_evaluation_reduced folders contain datasets with a reduced number of genes, in case you are unable to run some of the procedures on the larger versions. In particular, a full logistic regression could take 12 GB of memory to run.

In Problem 1 (autograded), you will explore a small subset of the data, using visualization and clustering methods to discover its structure.
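As a rough illustration of the visualization step, the sketch below projects a cells-by-genes matrix onto its top two principal components using NumPy's SVD. The question does not say which method to start with, so PCA is an assumption here, as are the synthetic data and the helper name `pca_project`.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components via SVD."""
    X_centered = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    scores = X_centered @ Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)  # fraction of variance per component
    return scores, explained[:n_components]


rng = np.random.default_rng(0)
# Synthetic stand-in for real data: two groups of 50 "cells" x 20 "genes",
# where the second group has its first 5 genes shifted upward.
group_a = rng.normal(0.0, 1.0, size=(50, 20))
group_b = rng.normal(0.0, 1.0, size=(50, 20))
group_b[:, :5] += 4.0
X = np.vstack([group_a, group_b])

scores, explained = pca_project(X, n_components=2)
print(scores.shape)  # one 2-D point per cell
print(explained)     # variance fraction captured by each component
```

Plotting `scores[:, 0]` against `scores[:, 1]` as a scatter plot would then show whether the cells fall into visible groups.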

In Problem 2 (written report/peer review), you will use the tools from Problem 1 to explore a larger subset of the data. Using clustering combined with logistic regression, you will discover informative features which can be used to distinguish cells of different types.
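The idea of combining clustering with logistic regression can be sketched as follows: cluster the cells without labels, fit a classifier to predict the cluster assignment, then rank genes by the size of their coefficients. This is only one plausible reading of the task. K-means, the default L2-regularized `LogisticRegression`, the ranking rule, and the synthetic data are all assumptions made for illustration, not the method prescribed by the assignment.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for a (cells x genes) matrix: 3 hypothetical cell types,
# each with two "marker" genes shifted upward. This is not the real data.
n_per_type, n_genes = 60, 30
X = rng.normal(0.0, 1.0, size=(3 * n_per_type, n_genes))
X[0:60, 0:2] += 5.0
X[60:120, 2:4] += 5.0
X[120:180, 4:6] += 5.0

# Step 1: cluster the cells without using any labels.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Step 2: fit logistic regression to predict the cluster label of each cell.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, labels)

# Step 3: rank genes by total absolute coefficient across the classes.
importance = np.abs(clf.coef_).sum(axis=0)
top_genes = np.argsort(importance)[::-1][:6]
print("training accuracy:", clf.score(X, labels))
print("top-ranked gene indices:", sorted(top_genes.tolist()))
```

Comparing coefficient magnitudes across genes only makes sense when the features are on comparable scales, so real data would likely need a transformation or standardization step first.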

Finally, in Problem 3 (written report/peer review), you will revisit open-ended decisions you made in your analyses, such as t-SNE hyperparameters or number of clusters chosen, and explore how robust your end results are to these potentially ambiguous decisions.
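One way to probe robustness is to repeat a clustering under different random seeds and measure how much the assignments agree. The sketch below does this for several candidate cluster counts using the adjusted Rand index. The choice of K-means, the seeds, the synthetic data, and the helper name `clustering_stability` are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(1)
# Synthetic stand-in: three well-separated groups of 50 points in 10 dimensions.
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(50, 10)) for c in (0.0, 4.0, 8.0)])


def clustering_stability(X, n_clusters, seeds=(0, 1, 2, 3, 4)):
    """Mean pairwise adjusted Rand index between K-means runs with different seeds."""
    runs = [
        KMeans(n_clusters=n_clusters, n_init=3, random_state=s).fit_predict(X)
        for s in seeds
    ]
    scores = [
        adjusted_rand_score(runs[a], runs[b])
        for a in range(len(runs))
        for b in range(a + 1, len(runs))
    ]
    return float(np.mean(scores))


for k in (2, 3, 4, 5):
    print(k, round(clustering_stability(X, k), 3))
```

A value near 1 means the runs agree almost perfectly. The same pattern could be applied to other ambiguous choices, for example by recomputing an embedding under several perplexity values and checking whether downstream clusters change.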

Hint: The data are only available in .npy files. For people who prefer .csv format, you can use a few lines of Python to convert .npy files to .csv files as below.

    import numpy as np
    import pandas as pd

    # Load the count matrix (adjust the path to wherever you extracted the archive)
    X = np.load("data/p1/X.npy")

    # Write the matrix out as a CSV file
    pd.DataFrame(X).to_csv("data/p1/X.csv")
