1 Introduction
In this assignment, you will implement a simple storage application that uses deduplication. The deduplication is based on Rabin fingerprinting.
2 System Overview
You're going to implement a single Java program called MyDedup. MyDedup supports the following operations: upload, download, and delete.
2.1 Upload
The upload operation includes the following functions:
- Chunking. It reads in the pathname of a file and divides the input file into chunks using Rabin fingerprinting.
- Identifying unique chunks. Only unique data chunks will be uploaded to the cloud. We use a fingerprint index to record and identify unique chunks.
2.1.1 Procedures
To upload a file, MyDedup first loads a metadata file named mydedup.index, which stores the fingerprint index that keeps all fingerprints of currently stored files and identifies the unique chunks. It then uploads the unique chunks to the cloud and updates the fingerprint index with the fingerprints of the new unique chunks. Finally, the up-to-date fingerprint index is stored back in the index file for the next run, and the program quits.
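As a starting point, the sketch below shows one possible in-memory index layout persisted with Java object serialization; the format of mydedup.index is up to you (see the assumptions below), and all class and field names are illustrative, not mandated by this assignment.

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

// Illustrative sketch only: one possible layout for the fingerprint index.
// The assignment leaves the on-disk format of mydedup.index to you.
class ChunkRef implements Serializable {
    String containerId;   // which container stores the chunk
    int offset;           // byte offset of the chunk inside the container
    int length;           // chunk length in bytes
    int refCount;         // number of files still referencing the chunk
}

class FingerprintIndex implements Serializable {
    // MD5 fingerprint (hex string) -> chunk location and reference count
    HashMap<String, ChunkRef> entries = new HashMap<>();

    // Start with an empty index if mydedup.index does not exist yet.
    static FingerprintIndex load(String path) throws IOException, ClassNotFoundException {
        File f = new File(path);
        if (!f.exists()) return new FingerprintIndex();
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(f))) {
            return (FingerprintIndex) in.readObject();
        }
    }

    // Persist the up-to-date index before the program quits.
    void save(String path) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(path))) {
            out.writeObject(this);
        }
    }
}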
We make the following assumptions.
- The file may be a binary file (e.g., VM image) of size up to 20 MiB.
- The mydedup.index file and all file recipes are uploaded to the cloud before the client quits. If mydedup.index does not exist, MyDedup starts with an empty index structure. The formats of the index file and file recipes are up to your choice.
- After a file is uploaded, the file will be immutable (no modification), but it may be deleted.
- We assume that there is no crash of the client or the cloud. We also assume that only one client is using the cloud at any time (so there is no synchronization issue).
- We identify files using their upload pathnames. Different uploaded files must have different pathnames.
- Files are always uploaded/downloaded in units of containers. All unique chunks to be uploaded are packed into a container of size up to 1 MiB. Specifically, the client maintains an in-memory buffer and adds each new unique chunk to the buffer. If adding a new unique chunk would cause the buffer to go beyond the container size limit, the client flushes all chunks in the buffer as a new container and uploads the container to the cloud. Note that after the client reaches the end of a file, it should always upload all chunks remaining in the buffer as a new container to the cloud (see the sketch after this list).
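The buffering rule in the last assumption can be sketched as follows, assuming containers are written to the local backend under data/ with sequential names; the class, method, and file names are illustrative, not part of the specification.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of the in-memory container buffer (names are illustrative).
class ContainerBuffer {
    static final int CONTAINER_LIMIT = 1024 * 1024;  // 1 MiB cap per container
    private final ByteArrayOutputStream buf = new ByteArrayOutputStream();
    private int nextContainerId = 0;

    // Add one unique chunk; flush first if the chunk would overflow the container.
    void addChunk(byte[] chunk) throws IOException {
        if (buf.size() + chunk.length > CONTAINER_LIMIT) flush();
        buf.write(chunk);
    }

    // Write out whatever is buffered as a new container under data/.
    void flush() throws IOException {
        if (buf.size() == 0) return;
        Path dir = Paths.get("data");
        Files.createDirectories(dir);
        Files.write(dir.resolve("container-" + nextContainerId++), buf.toByteArray());
        buf.reset();
    }
}

Note that the client must still call flush() when it reaches the end of a file, so that the final, possibly partially filled container is also stored.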
After each file upload, you should report the statistics. Note that the statistics are cumulative (i.e., they include all files that have been stored). Specifically, report the following details (excluding the metadata statistics):
- Total number of files that have been stored
- Total number of pre-deduplicated chunks in storage
- Total number of unique chunks in storage
- Total number of bytes of pre-deduplicated chunks in storage (denoted by \(s_1\))
- Total number of bytes of unique chunks in storage (denoted by \(s_2\))
- Total number of containers in storage
- Deduplication ratio: \(s_1 / s_2\) (rounded to two decimal places); see the example after this list.
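For example, if the cumulative pre-deduplicated size is \(s_1 = 20{,}971{,}520\) bytes and the cumulative unique size is \(s_2 = 8{,}388{,}608\) bytes, the reported deduplication ratio is \(20971520 / 8388608 = 2.50\).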
2.1.2 Chunking
We use Rabin fingerprinting for variable-size chunking; see the lecture notes for details. In particular, we divide chunks by checking whether an offset satisfies the anchor-point criterion. The criterion uses an anchor mask that contains multiple 1-bits: if the bitwise AND of the Rabin fingerprint at an offset and the anchor mask equals zero, that offset is an anchor point.
A data chunk is defined by the byte range starting from the first byte right after the previous anchor point (or from the beginning of the file) and ending at the current anchor point. When we reach the end of the file, we simply produce a chunk between the last anchor point and the end of the file, even though this final chunk can be very small.
MyDedup takes the following command-line input parameters: (i) min_chunk, the minimum chunk size (in bytes), (ii) avg_chunk, the average chunk size (in bytes), and (iii) max_chunk, the maximum chunk size (in bytes). Each chunk size parameter is required to be a power of 2; return an error otherwise. We assume that the multiplier \(d\) is equal to 257.
Chunks are identified by their MD5 fingerprints.
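For reference, below is a minimal sketch of one common formulation of Rabin-based chunking, assuming a fixed window of min_chunk bytes, the fingerprint taken modulo avg_chunk, and an anchor mask of avg_chunk - 1; the exact rolling-hash definition in the lecture notes takes precedence, and all class and method names here are illustrative.

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Sketch of Rabin-based variable-size chunking. Assumptions (check against
// the lecture notes): window size = min_chunk, modulus = avg_chunk,
// anchor mask = avg_chunk - 1, multiplier d = 257.
class ChunkerSketch {
    static List<byte[]> chunk(byte[] data, int minChunk, int avgChunk, int maxChunk) {
        List<byte[]> chunks = new ArrayList<>();
        final long d = 257;
        final long mask = avgChunk - 1;            // assumed anchor mask
        long dPow = 1;                             // d^(minChunk - 1) mod avgChunk
        for (int i = 0; i < minChunk - 1; i++) dPow = (dPow * d) % avgChunk;

        int start = 0;
        while (start < data.length) {
            int end = Math.min(start + maxChunk, data.length);
            int cut = end;                         // max_chunk or end-of-file forces a cut
            long fp = 0;
            for (int i = start; i < end; i++) {
                int window = i - start + 1;
                if (window <= minChunk) {
                    // Build up the fingerprint of the first window byte by byte.
                    fp = (fp * d + (data[i] & 0xff)) % avgChunk;
                } else {
                    // Slide the window: drop the outgoing byte, add the incoming one.
                    long outgoing = (data[i - minChunk] & 0xff) * dPow % avgChunk;
                    fp = ((fp - outgoing + avgChunk) * d + (data[i] & 0xff)) % avgChunk;
                }
                // Anchor point: fingerprint AND anchor mask equals zero, and the
                // minimum chunk size has been reached.
                if (window >= minChunk && (fp & mask) == 0) {
                    cut = i + 1;
                    break;
                }
            }
            byte[] c = new byte[cut - start];
            System.arraycopy(data, start, c, 0, c.length);
            chunks.add(c);
            start = cut;
        }
        return chunks;
    }

    // Chunks are identified by their MD5 fingerprints (hex-encoded here).
    static String md5Hex(byte[] chunk) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5").digest(chunk);
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) sb.append(String.format("%02x", b & 0xff));
        return sb.toString();
    }
}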
2.2 Download
Given the pathname, MyDedup retrieves chunks (in containers) and reconstructs the original file.
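A minimal sketch of the reconstruction step is shown below, assuming the file recipe is an ordered list of (container, offset, length) entries; the actual recipe format is up to you, and all names here are illustrative.

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Sketch of download: replay the file recipe in order, copying each chunk's
// byte range out of its container. The recipe layout here is an assumption.
class Downloader {
    static class RecipeEntry {
        String containerId;
        int offset, length;
        RecipeEntry(String c, int o, int l) { containerId = c; offset = o; length = l; }
    }

    static void reconstruct(List<RecipeEntry> recipe, String outputPath) throws IOException {
        try (FileOutputStream out = new FileOutputStream(outputPath)) {
            for (RecipeEntry e : recipe) {
                // Containers live under data/ in the local storage backend.
                // (A real implementation would cache containers rather than
                // re-reading them for every chunk.)
                byte[] container = Files.readAllBytes(Paths.get("data", e.containerId));
                out.write(container, e.offset, e.length);
            }
        }
    }
}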
2.3 Delete
Given the pathname, MyDedup deletes the file from the cloud. If a deleted chunk is no longer shared by any other file, its entry in the index structure should be removed. However, its physical copy may remain in the original container. If all chunks in a container are deleted and not shared by any other file, the container should also be physically removed from the storage backend.
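The bookkeeping implied here can be sketched as follows, reusing the illustrative ChunkRef structure from the earlier index sketch and counting, per container, how many chunks are still referenced by the index; this is only one possible way to track when a container can be physically removed.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of delete bookkeeping (names reuse the earlier illustrative sketches).
class Deleter {
    // Decrement reference counts for every chunk of the deleted file; drop
    // index entries that are no longer shared, and remove a container once
    // none of its chunks are referenced by any file.
    static void deleteFile(List<String> fileFingerprints,
                           Map<String, ChunkRef> index) throws IOException {
        Map<String, Integer> liveChunksPerContainer = new HashMap<>();
        for (ChunkRef ref : index.values()) {
            liveChunksPerContainer.merge(ref.containerId, 1, Integer::sum);
        }
        for (String fp : fileFingerprints) {
            ChunkRef ref = index.get(fp);
            if (ref == null) continue;
            if (--ref.refCount == 0) {
                index.remove(fp);                      // entry removed from the index
                int left = liveChunksPerContainer.merge(ref.containerId, -1, Integer::sum);
                if (left == 0) {
                    // No file references any chunk in this container: reclaim it.
                    Files.deleteIfExists(Paths.get("data", ref.containerId));
                }
            }
        }
    }
}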
2.4 Local Storage Backend
Your program supports the local storage backend, which is used to mimic a cloud storage site. All data chunks will be stored under the directory data/.
2.5 Sample Input/Output Format
Upload:
java MyDedup upload <min_chunk> <avg_chunk> <max_chunk> <file_to_upload>
Report Output:
Total number of files that have been stored: 10
Total number of pre-deduplicated chunks in storage: ...
(The remaining lines of the report follow the statistics listed in Section 2.1.1.)