Question: Case Study: The Launch and Hand-off Readiness Review at Google (2010)

Case Study
The Launch and Hand-off Readiness Review at Google (2010)
One of the many surprising facts about Google is that they have a functional orientation for their Ops engineers, who are referred to as Site Reliability Engineers (SREs), a term coined by Ben Treynor Sloss in 2004. That year, Treynor Sloss started off with a staff of seven SREs that grew to over 1,200 SREs by 2014. As Treynor Sloss said, "If Google ever goes down, it's my fault." Treynor Sloss has resisted creating a single-sentence definition of what SREs are, but he once described SREs as "what happens when a software engineer is tasked with what used to be called operations."
Every SRE reports to Treynor Sloss's organization to help ensure consistency of quality of staffing and hiring, and they are embedded into product teams across Google (which also provide their funding). However, SREs are still so scarce that they are assigned only to the product teams that have the highest importance to the company or those that must comply with regulatory requirements. Furthermore, those services must have low operational burden. Products that don't meet the necessary criteria remain in a developer-managed state.
Even when new products become important enough to the company to warrant being assigned an SRE, developers still must have self-managed their service in production for at least six months before it becomes eligible to have an SRE assigned to the team.
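The assignment criteria described above can be read as a simple rule. The following is a minimal sketch in Python, not Google's actual process: the `Service` fields, the `eligible_for_sre` function, and the boolean "low operational burden" flag are all hypothetical names introduced for illustration, and the case study does not say how operational burden is measured.

```python
from dataclasses import dataclass

# "at least six months" of developer self-management, per the case study
MIN_SELF_MANAGED_MONTHS = 6


@dataclass
class Service:
    """Hypothetical record of the facts the case study says matter."""
    name: str
    high_importance: bool          # among the most important products to the company
    regulatory_requirements: bool  # must comply with regulatory requirements
    low_operational_burden: bool   # assumed flag; the text gives no threshold
    months_self_managed: int       # months developers have run it in production


def eligible_for_sre(service: Service) -> bool:
    """Return True if the service meets every criterion named in the text."""
    warrants_sre = service.high_importance or service.regulatory_requirements
    return (
        warrants_sre
        and service.low_operational_burden
        and service.months_self_managed >= MIN_SELF_MANAGED_MONTHS
    )


if __name__ == "__main__":
    new_product = Service("new-product", True, False, True, months_self_managed=3)
    # Important, but not yet self-managed for six months: stays developer-managed.
    print(eligible_for_sre(new_product))
```

Under this reading, a service that fails any one criterion remains developer-managed, which matches the text's point that SREs are a scarce resource.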
To help ensure that these self-managed product teams can still benefit from the collective experience of the SRE organization, Google created two sets of safety checks for two critical stages of releasing new services: the Launch Readiness Review and the Hand-Off Readiness Review (LRR and HRR, respectively).
The LRR must be performed and signed off on before any new Google service is made publicly available to customers and receives live production traffic, while the HRR is performed when the service is transitioned to an Ops-managed state, usually months after the LRR. The LRR and HRR checklists are similar, but the HRR is far more stringent and has higher acceptance standards, while the LRR is self-reported by the product teams.
Any product team going through an LRR or HRR has an SRE assigned to them to help them understand the requirements and to help them achieve those requirements. The LRR and HRR launch checklists have evolved over time so every team can benefit from the collective experiences of all previous launches, whether successful or unsuccessful. Tom Limoncelli noted during his "SRE@Google: Thousands of DevOps Since 2004" presentation in 2012, "Every time we do a launch, we learn something. There will always be some people who are less experienced than others doing releases and launches. The LRR and HRR checklists are a way to create that organizational memory."
Requiring product teams to self-manage their own services in production forces Development to walk in the shoes of Ops, but guided by the LRR and HRR, which not only makes service transition easier and more predictable, but also helps create empathy between upstream and downstream work centers.
Limoncelli noted, "In the best case, product teams have been using the LRR checklist as a guideline, working on fulfilling it in parallel with developing their service, and reaching out to SREs to get help when they need it."
Furthermore, Limoncelli observed, "The teams that have the fastest HRR production approval are the ones that worked with SREs earliest, from the early design stages up until launch. And the great thing is, it's always easy to get an SRE to volunteer to help with your project. Every SRE sees value in giving advice to project teams early, and will likely volunteer a few hours or days to do just that."
The practice of SREs helping product teams early is an important cultural norm that is continually reinforced at Google. Limoncelli explained, "Helping product teams is a long-term investment that will pay off many months later when it comes time to launch. It is a form of good citizenship and community service that is valued; it is routinely considered when evaluating engineers for SRE promotions."
CONCLUSION
In this chapter, we discussed the feedback mechanisms that enable us to improve our service at every stage of our daily work, whether it is deploying changes into production, fixing code when things go wrong and engineers are paged, having developers follow their work downstream, creating non-functional requirements that help development teams write more production-ready code, or even handing problematic services back to be self-managed by Development.
Summarize the main points the author made and the lessons learned from the case study "The Launch and Hand-off Readiness Review at Google (2010)."
