We Are On The Cloud! And It Is A Little Breezy Here…
Updated: Jun 7, 2019
The Case For Cloud and DevOps Centers of Excellence and Enterprise Governance
DevOps, as a concept of operation, and Cloud Infrastructure, as a Service, emerged at about the same time – 2009/2010. The two combined may be the greatest innovation in process, culture, and technology from the first decade of the 21st century. No wonder they were embraced wholeheartedly by the industry at an unprecedented pace. The trend started with the social media and consumer service high-tech startups in Silicon Valley and quickly spread to the other coast, where even the most risk-averse and traditional organizations responded with cautious excitement (excitement, nonetheless). The White House published its Cloud First Policy in 2011, and the U.S. Digital Service published the Digital Services Playbook (the de facto US government guide for adopting Agile and DevOps) in 2014. I personally witnessed hardened waterfall, SOX-governed financial service strongholds and public sector FISMA (Moderate) programs being revamped, DevOps-enabled, and shipped to the public cloud within months. In all cases, there was a brief moment of great excitement and pride, followed shortly by an “Oh, my! This is really powerful, but how do we drive it?”
Every innovation goes through a fairly predictable curve of adoption (Rogers, Diffusion of Innovations, 1962/2005):
The curve itself is self-explanatory and nothing new. We have seen it several times before, with Client/Server, ERP, E-Commerce, and SOAP, to name a few. Notice the Chasm of Disillusionment right after the initial peak of excitement from the early wins. It applies industry-wide as well as at the enterprise level. We are at the point where more and more enterprises have reached that Chasm of Disillusionment after the initial success of their DevOps transitions and Cloud migrations.
“The cloud services cost is really high because anybody can (everybody does) spin up anything and there is no control. We lost visibility and ability to govern.”
“There is a proliferation of tools doing similar things and we are paying the overhead of each team maintaining their own toolset and creating their own pipelines.”
“Good DevOps engineers are hard to find and are very expensive. We cannot sustain critical staffing levels for full enterprise adoption.”
“The engineers quickly set up the cloud accounts, but now Finance and Accounts Payable don’t know how to budget and manage cost.”
These are concerns I have heard expressed by both commercial and government IT leaders. They reflect the natural conflict between the decentralized decision-making approach necessitated by DevOps and the need of large organizations, especially in highly regulated environments, to manage risk and maintain control.
Here are three recommendations to help large organizations cross over from the Chasm of Disillusionment to the Slope of Enlightenment and into the Plateau of Productivity:
1. Pace Your DevOps
DevOps and Cloud are not a silver bullet for every problem in IT and Operations. Some teams, projects, and systems may benefit greatly from adopting the new practices and technologies, whereas others may be fine as they are, or may use only the fraction of these practices and tools where the benefit significantly outweighs the cost and risk.
In 2017, Gartner published the Pace Layers of DevOps to illustrate the notion that, depending on the context of the system, Traditional Ops approaches still have their place (Don’t throw the baby out with the bathwater!).
Fig.2 Gartner Pace Layers of DevOps
Before committing your organization to a massive DevOps transformation and Cloud migration, review and rationalize your portfolio and assess where each system falls in the pace layers. Take into consideration dependencies and best architectural practices (loose coupling, APIs, microservices) to ensure traditional systems don’t hamper the cadence of delivery of the innovating teams and systems.
As a practical example, my team supported the O&M of a major legacy government system (System of Record). The system was built as a traditional 3-tier monolith and was running within the acceptable business and operational SLAs. It was slated for retirement/replacement in 2-3 years, so there was no real business case for migrating it to AWS. The problems began once the client started building a few new microservices to modernize the system (System of Innovation) directly in the Cloud and with full DevOps tooling. The new microservices were fully enabled for daily deployments. However, when it came time to integrate the new modules with the old system, it became very obvious that our 3-week release cycle could not deliver the changes needed by the new microservices fast enough (as Systems Thinking postulates – every system moves only as fast as its slowest component). It became necessary (and urgent) for our client to invest in retooling the old system so that it, too, could take full advantage of the rapid cycle times of the Continuous Integration / Continuous Delivery pipeline we implemented for them on the AWS cloud.
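To make the "slowest component" point concrete, here is a minimal sketch in Python. It is only an illustration, not the tooling we used on the project: the `System` class, the function names, and the two-system portfolio are all hypothetical, and the only figures taken from the story above are the 3-week (21-day) legacy release cycle and the daily microservice deployments. The idea is that a system's effective release cycle is the slowest cycle among itself and everything it depends on.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class System:
    """Hypothetical portfolio entry: a system and how often it can ship."""
    name: str
    release_cycle_days: int
    depends_on: List[str] = field(default_factory=list)


def effective_cycle_days(name: str, portfolio: Dict[str, System],
                         _seen: Optional[Set[str]] = None) -> int:
    """Slowest release cycle among a system and all of its dependencies."""
    seen = _seen if _seen is not None else set()
    if name in seen:
        # Already counted elsewhere in the traversal (shared or circular
        # dependency), so it contributes nothing new to the maximum.
        return 0
    seen.add(name)
    system = portfolio[name]
    cycles = [system.release_cycle_days]
    for dep in system.depends_on:
        cycles.append(effective_cycle_days(dep, portfolio, seen))
    return max(cycles)


def hampered_systems(portfolio: Dict[str, System]) -> Dict[str, int]:
    """Systems whose dependencies force them to ship slower than they could."""
    result = {}
    for name, system in portfolio.items():
        effective = effective_cycle_days(name, portfolio)
        if effective > system.release_cycle_days:
            result[name] = effective
    return result


# Illustrative portfolio based on the example above (names are made up).
portfolio = {
    "legacy_monolith": System("legacy_monolith", release_cycle_days=21),
    "new_microservice": System("new_microservice", release_cycle_days=1,
                               depends_on=["legacy_monolith"]),
}

for name, days in hampered_systems(portfolio).items():
    own = portfolio[name].release_cycle_days
    print(f"{name}: could ship every {own} day(s), "
          f"but is held to every {days} day(s)")
```

Running a check like this across the whole portfolio during the rationalization exercise is one way to spot, before the migration starts, which Systems of Innovation will be held back by a System of Record.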
2. Adapt Your Governance Models
IT Governance was put in place in your organization for good reasons. Smart, practical, and adaptable governance models exist even in the most open and decentralized organizations. Decentralization is not the same as chaos, and Governance is not the same as bureaucracy and rigidity. It is bureaucracy and rigidity, not Governance, that kill innovation and hamper progress. New, modern, DevOps-friendly Governance still serves the same purpose: providing a framework of control to ensure all pieces of the system (organization) work in sync and within the boundaries of performance parameters (financial, regulatory compliance, customer service, market expectations). However, Governance models for DevOps and Cloud must exhibit these additional characteristics:
Openness – transparency in all decisions and the rationale behind the decisions.
Inclusivity – broad participation from various parts and levels of the organization: Enterprise Architecture, DevOps practitioners, Business, Shared Services and Infrastructure, Procurement, and Finance.
Bi-directionality – listen to the voices from the trenches, because they are the ones closest to the problems and to the solutions.
Flexibility – apply the Pace Layer approach above to determine tradeoffs between agility and stability depending on context. Allow teams to experiment first and then catch up on control framework and compliance.
Agility – don’t let the “perfect” be the enemy of the “good enough” and don’t let a Governance construct be the bottleneck for innovation. Governance decisions must be made as quickly as the teams need them.
3. Create Centers of Excellence and DevOps Service Lines
The Center of Excellence (COE) concept is not new – it emerged in the 80s from Manufacturing (same as Lean and Kanban). However, with the advent of Agile and DevOps, some practitioners took the concept of “self-governed teams” a little too far, which in practice led to each team reinventing the wheel, creating their own version of Scrum/Kanban and building their own CI/CD pipelines with their own toolset. That autonomy brings the members of a team closer together (at a startup with a single product line), but for large organizations (multiple product lines and value stream release trains), it creates proliferation, overhead, and pockets of siloed expertise, ultimately driving the teams apart.
Large organizations can achieve economies of scale by creating DevOps/Cloud COEs and cross-product-line service lines (some organizations call them Shared Services). There are various flavors of COE implementations:
COE – formal (chartered) vs. informal (grassroots community); staffed by dedicated full-timers vs. staffed by part-time matrixed contributors; push (incubators and disseminators of knowledge and solutions) vs. pull (aggregators and consolidators); prescriptive (mandating their practices and procedures), descriptive (providing advice and knowledge resources), or augmenting (providing hands-on engineering support).
Shared Service – formally defined IT management service, bound by policies, procedures, and SLAs, delivering well-specified repeatable services to the product-line teams (Think CI/CD as a Service, or Cloud Managed Services).
Each organization can choose the flavor depending on their context and strategic goal. The advantage of this approach is that it reduces overhead, drives consistency and allows all business lines to take advantage of the best practices.
Keep two critical caveats in mind when designing and instituting your COE/Service Line, to avoid the trap of creating a new authoritative structure with no feel for the pulse of the business and product line needs:
Maintain the feedback loop – provide a mechanism and a forum for the product line teams to share their local innovations and nominate them for promotion and adoption at the enterprise level.
Maintain flexibility – allow highly innovative teams (working on a new product line, or revamping an existing one) to move ahead at a faster speed (the feedback loop above gives their innovations a path to one day becoming a new shared service for the entire enterprise).
It is the role of the Governing body (above) to make those promotion/adoption and exception waiver decisions.
Tying It All Together
An Enterprise DevOps/Cloud Governing body can and must be as flexible and Agile as any of the line teams. It makes the determinations for the Pace Layer of each system and the level of differentiated governance and agility trade-offs.
The Governing body provides valuable oversight and steers the overall direction of enterprise DevOps. To make an analogy, it acts as the “legislative branch” and decision-making authority for setting the mandatory, enterprise-wide policies, as well as deciding which policies and decisions can be delegated to the product line teams. The COE/Shared Service Line acts as the federated “executive branch”, providing compliant services and support to all teams while still allowing the product line teams to maintain their autonomy.