Question:
- Describe five different risks to the data and services that you set up and used in the static and dynamic web hosting solutions that you built.
- Describe the risk assessment process that you used to determine the impact of risks to data stored on the static web site and the dynamic web site.
- Describe three disaster recovery techniques or strategies that you have recommended to ensure business continuity for your web sites. How does each technique enable fast recovery of data and restoration of normal service?
- Describe how you used the ISO and NIST standards to develop your disaster recovery framework and processes.
- Explain what the Recovery Time Objective (RTO) and Recovery Point Objective (RPO) metrics are. Explain how you used these values to develop your DR strategies and recommendations.
- Describe the techniques and methods you used to monitor your web server instance. Describe the events that you identified as significant and the alerts you created to notify you when those events occur.
PART A: Knowledge Based Questions
These questions relate to knowledge components for ICTCLD502 - Design and implement highly available cloud infrastructure
- Describe the principles of the AWS Well-Architected Framework and how you have applied each principle when building your web sites.
- Describe the features and capabilities of the infrastructure components that you used to build your cloud web sites including VPC components, instances, database and storage components.
- Describe the features and application of the protocols and software tools that you implemented to deliver your web service and manage your web service. (Hint: DNS and TLS; SSH and CLI)
- Describe the five different cloud cost models available for instances and how to get value when demand for web services grows.
- Explain what high availability is and describe what components you implemented to ensure HA for your web sites.
- What single points of failure (SPOFs) did you identify in your web site design? Explain how you mitigated these SPOFs with fault tolerance.
- Define reliability and each of these metrics: mean time to failure (MTTF), mean time to repair (MTTR) and mean time between failures (MTBF).
- Define recoverability and each of these metrics: recovery time (RTO) and recovery point (RPO) objectives
- Describe a service level agreement (SLA) and how it relates to a web service.
- Explain what vertical scaling is and what horizontal scaling is. Describe how to vertically scale an instance and how to horizontally scale an instance.
- Describe the testing techniques you used to ensure that you avoided single points of failure.
- Describe how you found a bug or error in a command or code. What technique did you use to find the error?
- Describe tools and methods to measure availability impact when an outage occurs.
- Describe two cloud services that have built-in fault tolerance and how this differs from infrastructure designed for fault tolerance.
- Describe how the load balancing and autoscaling that you have implemented works and why it improves the availability of your web service.
- Describe how you monitored the performance of your web service. What services, methods and metrics did you use?
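The reliability metrics named in the questions above are related by simple arithmetic. The following is a minimal sketch, not a model answer: the function names and the sample figures (in hours) are assumptions made up for illustration, and it uses the common convention that MTBF = MTTF + MTTR and that steady-state availability = MTTF / (MTTF + MTTR).

```python
def mtbf(mttf_hours: float, mttr_hours: float) -> float:
    """Mean time between failures: one full run-then-repair cycle."""
    return mttf_hours + mttr_hours


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability as a fraction between 0 and 1."""
    return mttf_hours / (mttf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical figures: the service runs 999 hours between failures
    # and takes 1 hour to repair.
    print(mtbf(999, 1))                   # 1000
    print(f"{availability(999, 1):.3%}")  # 99.900%
```

Shortening MTTR (faster repair or automatic failover) raises availability even when the failure rate stays the same, which is why recovery time features in both the reliability and recoverability questions.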
Part B: Project
Project Brief:
Refer to project brief
Project requirements:
- Develop and evaluate a cloud disaster recovery plan that includes at least three major risk events.
- Determine likelihood and impact of risk events to assist in the development of one cloud disaster recovery plan
- Document disaster recovery plan and ways the plan reaches Recovery Time Objective (RTO) and Recovery Point Objective (RPO) targets.
- Design and implement at least one fault tolerant cloud infrastructure on a cloud platform resilient to networking, compute, storage, database and data centre failures
- Design and deploy automated infrastructure scaling for at least one business need
- Simulate failures of at least one component and demonstrate that it is fault tolerant.
- Use cloud management console, software development kits or command line tools
- Define, monitor and record resource availability in cloud environment, including:
- reliability
- recoverability
- service levels
- scalability.
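Service levels in the list above are usually stated as an availability percentage, which can be converted into a downtime budget for recording and monitoring. A minimal sketch follows; the function name, the 30-day period and the sample targets are assumptions for illustration, not figures from the project brief or from any vendor SLA.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: int = 30) -> float:
    """Maximum downtime (in minutes) that still meets the SLA over the period."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (100.0 - sla_percent) / 100.0


if __name__ == "__main__":
    # Hypothetical targets, for comparison only.
    for target in (99.0, 99.9, 99.99):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% over 30 days allows {budget:.1f} minutes of downtime")
```

Comparing the measured downtime from monitoring against this budget gives a simple pass or fail record for the service level over each period.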
PART B: Demonstrated tasks
These questions relate to the observation components for ICTCLD501 - Develop cloud disaster recovery plans.
- Task 1:
1. Prepare to develop plan
1.1 Identify disaster recovery plan requirements according to business needs and requirements
Read the Project Brief and Project Requirements and list the disaster recovery plan requirements as dot points
1.2 Determine existing organisational recovery plans
Schedule and meet with the client to determine existing recovery plans.
1.3 Identify vendor disaster recovery plan and service level agreements
Schedule and meet with the client to identify vendor disaster recovery plans and service level agreements.
- Task 2:
2. Conduct impact analysis
Follow the BIA walkthrough process we applied to the layer 3 switch to evaluate the static and dynamic web services. Work through the steps outlined in NIST to perform a BIA and risk assessment. Use the NIST template to report your findings.
2.1 Determine time and recovery point objectives according to business needs
Determine the RTO and RPO and show how you have determined these values
2.2 Assess potential risks and plan exclusions according to business requirements
Identify the risk assessment approach
Identify the threats and vulnerabilities to the web services, code and data, infrastructure.
2.3 Estimate amount of data and security level of data managed
Determine what data is used and where it is stored.
Determine who or what should have access to the data and what access they should have.
2.4 Evaluate severity of impact and disruption of risk events
Assess the impact of the threats and vulnerabilities identified.
Assess the likelihood of occurrence of the risks.
2.5 Document outcomes of impact analysis according to organisational policies and procedures
Determine and report on the inherent risk of components.
Complete the BIA template with the details of your analysis.
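Steps 2.4 and 2.5 above combine likelihood and impact into an inherent risk rating. One common approach is a qualitative matrix where the score is likelihood multiplied by impact; a minimal sketch is shown here. The 1 to 5 scales, the rating thresholds and the sample risks are all illustrative assumptions, not values taken from the NIST template or the project brief.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # 1 (rare) to 5 (almost certain); assumed scale
    impact: int      # 1 (negligible) to 5 (severe); assumed scale

    @property
    def score(self) -> int:
        return self.likelihood * self.impact

    @property
    def rating(self) -> str:
        # Thresholds are illustrative only; use your organisation's matrix.
        if self.score >= 15:
            return "High"
        if self.score >= 6:
            return "Medium"
        return "Low"


if __name__ == "__main__":
    # Hypothetical risks for a static and a dynamic web site.
    risks = [
        Risk("Accidental deletion of site content", likelihood=3, impact=4),
        Risk("Database instance failure", likelihood=2, impact=5),
        Risk("Expired TLS certificate", likelihood=2, impact=2),
    ]
    # List the highest inherent risk first.
    for risk in sorted(risks, key=lambda r: r.score, reverse=True):
        print(f"{risk.score:>2}  {risk.rating:<6}  {risk.name}")
```

Scoring the same risks again after controls are applied gives the residual risk, so the two lists can be compared side by side in the report.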
- Task 3:
3. Develop disaster recovery solutions
Determine controls and countermeasures to mitigate the risks to the web services.
3.1 Develop range of disaster recovery solutions according to business requirements
Investigate and determine disaster recovery strategies for each risk
Determine the level of residual risk if these protections are put in place.
3.2 Determine vendor protections and prioritise risks
Investigate and determine cloud protections for each risk
Determine the level of residual risk if these protections are put in place.
3.3 Assess external insurance protection levels and their suitability requirements
Investigate external insurance protection levels which would apply to buildings and physical infrastructure. What assurance is there from the Cloud provider?
Investigate insurance to protect from loss of income
3.4 Identify other disaster recovery solution components
Investigate and determine vendor protections for each risk
Determine the level of residual risk if these protections are put in place.
- Task 4:
4. Finalise disaster recovery plan.
Meet with the client to determine which controls and countermeasures have been approved for implementation. Make a disaster recovery plan for the client to implement the changes.
4.1 Align disaster recovery risk potential according to business requirements
Evaluate risk protections for effectiveness and value for money.
Make recommendations about which strategies will be implemented.
4.2 Outline steps of disaster recovery plan including timelines, key features, service providers and any other aspect
Complete the planning to implement the recommended strategies.
Describe the key features and components of each strategy and a timeline for implementation and testing
4.3 Document disaster recovery plan according to business needs and requirements
Use the disaster recovery plan template to complete a disaster recovery plan.
It must also include:
When a disaster is called
Key people and contact info
SLA and UC info
Training requirements
Testing requirements
Audit requirements
- Task 5:
5. Test cloud disaster recovery plan
5.1 Conduct verbal tabletop walkthrough of cloud disaster recovery plan with required personnel
Choose a risk and conduct a verbal walkthrough to test the DR plan if the risk eventuates.
Allocate roles to team members
Evaluate the plan: is it complete? Is it current?
Provide and listen to feedback
5.2 Seek and respond to feedback as required
Provide and listen to feedback.
Implement feedback as required
5.3 Lodge cloud disaster recovery plan according to organisation and legislative protocol
Update the DR plan with changes agreed from tabletop review
5.4 Obtain final sign off from required personnel
Submit your plan to senior management (your lecturer) for approval.
PART B: Demonstrated tasks
These questions relate to observation components for ICTCLD502 - Design and implement highly available cloud infrastructure
- Task 6:
1. Identify high-availability requirements
Read the Project Requirements and DR plans to determine the HA requirements.
1.1 Determine reliability, recoverability and service levels required for application
Determine the business requirements and list the requirements here
1.2 Determine cloud infrastructure according to business needs
Determine the cloud infrastructure components required. List the components here
1.3 Identify level of shared security responsibility models according to business needs.
Research the shared security responsibility model. Describe the cloud provider's security responsibilities and how the provider demonstrates that they have been met. Describe the client's security responsibilities.
- Task 7:
2. Evaluate architecture availability
2.1 Review architecture of traditional multi-tier web application in non-cloud environment and identify high availability requirements
Investigate the existing environment by conducting an inventory of services from the provided VMs of the static and dynamic web services in the non-cloud datacenter environment.
Identify the HA requirements and list them here.
2.2 Identify any single points of failure
Identify SPoF and list them here.
2.3 Estimate recovery objectives for multi-tier web components and for overall architecture
Estimate RPO and RTO for components
2.4 Determine components that must scale vertically and the potential impact on system availability
Determine components that must scale vertically in the existing design and the time to scale including outage times.
2.5 Document architecture review findings according to business needs
Complete a gap analysis of the existing service and environment.
Make recommendations.
- Task 8:
3. Design cloud-based architecture for high availability
3.1 Design equivalent architecture for high availability using cloud services
Use AWS Well-Architected principles and patterns to design the architecture for high availability.
3.2 Identify and remove single points of failure as required
Identify SPoF and remove or mitigate each one.
3.3 Estimate recovery objectives for each component and overall architecture
Estimate the RTO and RPO objectives for components
3.4 Determine components that must scale vertically and the potential impact on system availability
Which components must scale vertically? What is the impact on system availability?
3.5 Document architecture design according to business needs
Complete the architecture design.
Make sure all components are labelled with names and metadata.
- Task 9:
4. Implement cloud-based architecture for high availability
4.1 Implement architecture design in cloud environment
Make an implementation plan for the build.
Build the components of your architecture.
4.2 Demonstrate connectivity between resources at all tiers
Make a test plan.
Demonstrate connectivity between all your build components
4.3 Monitor and measure availability of resources
Demonstrate how you can monitor availability metrics and save some screenshots.
4.4 Simulate failures of component and confirm that infrastructure is fault tolerant
Simulate a component failure and measure time to recover.
Demonstrate the simulated failure and save some screenshots of the testing.
4.5 Simulate resizing components likely to impact performance and measure availability impact
Simulate a load and measure the impact on performance and availability.
Was an outage observed? Did performance decrease? How long did it take for the web service to scale?
Demonstrate the load simulation and save some screenshots of testing.
4.6 Compare and document simulation findings according to documented design
Submit your planning and testing to senior management (your lecturer) for approval.
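For steps 4.3 to 4.5 above, time to recover and availability can be worked out from a log of periodic health probes taken while a failure or load is simulated. The sketch below is one possible way to do that, under stated assumptions: the function names are hypothetical, the probe log is a hand-written sample rather than real monitoring output, and the result is only as precise as the probe interval.

```python
def outage_windows(samples):
    """Return (start, end) pairs covering each run of failed probes.

    `samples` is a time-ordered list of (timestamp_seconds, is_up) tuples.
    An outage starts at the first failed probe and ends at the next
    successful one.
    """
    windows = []
    down_since = None
    for timestamp, is_up in samples:
        if not is_up and down_since is None:
            down_since = timestamp
        elif is_up and down_since is not None:
            windows.append((down_since, timestamp))
            down_since = None
    if down_since is not None:
        # Still down at the last probe: close the window at that probe.
        windows.append((down_since, samples[-1][0]))
    return windows


def availability_percent(samples):
    """Percentage of the observed period that was not inside an outage.

    Needs at least two samples with different timestamps.
    """
    total = samples[-1][0] - samples[0][0]
    downtime = sum(end - start for start, end in outage_windows(samples))
    return 100.0 * (total - downtime) / total


if __name__ == "__main__":
    # Hypothetical probe log: one probe every 10 seconds.
    probes = [(0, True), (10, True), (20, False),
              (30, False), (40, True), (50, True)]
    print(outage_windows(probes))        # [(20, 40)]
    print(availability_percent(probes))  # 60.0
```

The length of each window is the observed time to recover for that failure, which can be compared against the RTO estimated in the design.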
- Task 10:
5. Finalise cloud infrastructure
5.1 Adjust and improve availability of architecture according to simulations as required
Review the results of simulations
5.2 Confirm, seek and respond to feedback with required personnel
Discuss simulation results and recovery times. How could performance be improved?
Collate feedback and ideas received
Make a recommendation to improve performance.
5.3 Obtain final sign off from required personnel
Complete testing documentation and submit to management