SAFe VS Platform Engineering

I know this is a very opinionated topic and "agile coaches" everywhere are ready to fight, so I'll try to keep it short making clear this is based just on my experience and on discussions with other engineers and managers in different companies and levels.

We’re a team of Scaled Agile SRE,
Working together to deliver quality,
Breaking down silos and communication gaps,
We’re on a mission to make sure nothing lacks.

We follow the SAFe framework to a tee,
With its ARTs and PI planning, we’re not so free,
To deliver value in every sprint,
Continuous delivery is our mint.

Chorus:
Scaled Agile and SRE,
Together we achieve,
Quality and speed,
We’re the dream team.

We prioritize work and plan ahead,
Collaborate and ensure nothing’s left unsaid,
We monitor, measure, and analyze,
Our systems to avoid any surprise.

Chorus

We take ownership and accountability,
To deliver value with reliability.

Chorus

So when you need to deliver at scale,
You know who to call and who won’t fail,
Scaled Agile SRE,
Together we’re the ultimate recipe.
ChatGPT4 & me

To not make this post too verbose I’ll try to focus only on two points that I find paramount in a SRE team living in a Scaled Agile framework (SAFe) with a Kanban style approach: capacity planning and value flow.

Capacity

What is your definition of capacity?

Most of the teams don’t ask this simple question to themselves and then struggle for months to give a better planning. Is that the sum of our hours per day? Or is it calculated based on each one capacity after removing the average amount of support, maintenance, security fixes and operations emergencies?

While learning to drive, in general but even more for a motorcycle, you’re introduced to the paradoxical concept of “expect the unexpected!“

Of course, this won’t save always your life but surely it can reduce a lot the probability of you having an accident. It will because you’ll stick to some best practices tested in tens of years of driving. Like to not surpass while not seeing the exit of a turn, don’t drive too close to the previous vehicle, always consider the status of the road, the surroundings and your tires before speeding up…

The good part of computer science is that you have a lot of incidents!

But this becomes a value only if you start measuring them and then learning from them.

So we should consider our work less like artistic craftsmanship and more from a statistical point of view, going back over the closed user stories and trying to get some average completion time splitting by categories (support, emergencies, toil elimination, research…)

Nobody complains!

You have now a rough estimation of how much time is spent on variable actions and maintenance, let’s say 20 hours per week.

You know also your fixed appointments will be at least 20 min per day for the daily meeting, 1 hour per week to share issues coming from development teams and 1 hour for infrastructure refinement (open tasks evaluation, innovations to adopt or to share with the team…).

Let’s say you won’t be neither on support (answering dev teams questions and providing them new resources) nor on call (supporting operations team solving emergencies).

This will give you around 40 – 20 – 1 (dailies) – 1 (weekly) – 1 (infra) – 1 (dev team weekly) – 0.5 (weekly with your manager) = 15.5 h/w of capacity, meaning 31h of capacity for the next iteration if it lasts two weeks.

Probably less since you know you have already other two periodical useless meeting of one hour each, so let’s round to 13 h/w ≈ 150 min/day of “uninterrupted” work.

Well… actually to not get crazy and start physically fighting my hardware I need a couple of breaks, let’s say 15 min in the morning and the same in the middle of the afternoon.

That means ≈ 120 min/day of “uninterrupted” work.

Fine, I assume I can take that user story we’ve evaluated 10h with high priority for the next week and a smaller one for the next week leaving some contingency space.

We publish this results in the PI planning and to the management, and nobody complains.

Long story short: if nobody ever complains probably you’re not involving stakeholders correctly in your PI Planning or worse you’re not involving them at all!

And that’s bad.

Why are you working on those features?

Why those features exist in first place?

If your team is decoupled from the business view, are you sure that all this effort will help something? Or do you smell re-work and failure?

We should mention also that these planning didn’t leave any space for research and creative thinking. People will start solving issues quick and dirty more and more.

Yeah, I could call Moss and Roy for a good pair programming sessions since they have already solved this issue in the last iteration but… who wants another meeting? Let’s copy paste this work around and go on for now…

How much value has my work?

To measure value, we need some kind of indicator.

There are a lot of articles on cons and pros about setting metrics for our goal even before starting. Let’s say here that you want to have a few custom indicators that proves to be a good estimation based on previous experience, they should take in consideration side effects and they should be some kind of aggregated result meaning that they shouldn’t be easily hackable (working only to improve the metrics and not the quality).

Maybe we introduce general service availability and average service response time as two service level indicators (SLI).

Then we start having management working on Value Stream Analysis to understand where this values since it was requested as a new feature by the customers before the current agile train.

They succeed to reduce periodical meetings by 50% and increase 1 to 1 communication. Now dev teams are able to solve issues by themselves thanks to better documentation and run-books etc…

Conclusions

Imagine you are trying to implement a complex application in Golang, after a while you’re still failing, so you decide to switch to Java Quarkus, that you don’t know and to mess around because you heard it is easier. After a while guess what? It still doesn’t work.

The same is for the Agile frameworks. People expect them to solve stuff auto-magically, but if we don’t put effort into changing our own behavior, into measuring our-self in order to improve (and not to give our manager micromanagement power), using the latest agile methodology will never solve our Friday afternoon issues.

Libertà vo cercando, ch'è sì cara, come sa chi per lei vita rifiuta

SAFe VS Platform Engineering

Capacity

Nobody complains!

How much value has my work?

Conclusions

Sources

Lascia un commento Annulla risposta

SAFe VS Platform Engineering

Capacity

Nobody complains!

How much value has my work?

Conclusions

Sources

Implementing continuous SBOM analysis

#1 Sharing Friday

Lascia un commento Annulla risposta