What twenty years of DevOps has failed to do (honeycomb.io)
80 points by mooreds 1 day ago | 155 comments




It failed because there is an ongoing denial that development and operations are two distinct skillsets.

If you think 10x devs are unicorns consider how much harder it is to get someone 10x at the intersection of both domains. (Personally I have never met one). You are far better off with people that can work together across the bridge, but that requires actual mutual trust and respect, and we’re not able to do that.


The goal of dev is to be able to change everything whenever they want.

The goal of ops is to have a strong infra that has the fewest changes possible.

These goals are opposed, and there are usually more devs than ops, but the first responders to an issue are ops.

You can only have devops if both roles are intertwined in the same team AND, the organization understands the implications.

Everywhere I've been, devops was just an excuse to transfer ops responsibilities to dev because devs were cheaper. Devs became first responders without having the knowledge of the infrastructure.

So devs insisted on having Docker so that they would be the ones managing the infra.

But everyone failed to see that whichever expensive tools you buy, the biggest issue was the lack of personal investment to solve a problem.

If you are a 1.5x dev in a 0.9x team, you get all the incidents, and are still expected to build new stuff.

And building new stuff is fun.

Spending 2 days analyzing a performance issue because a 0.3x dev found it easier to do a .sort() in LINQ instead of SQL is fun only once.


People can’t care about stuff they don’t know about. Split the roles and you split responsibility. It’s the same with dev and QA. Suddenly, the person paid to care about quality or stability realizes that the person who’s paid for something else doesn’t care like their job depends on it. Because it doesn’t. So OP above is right, splitting things and specializing horizontally is most times a bad and, if you think about it, not very smart move.

As someone who has managed both Development and Operations teams for decades: trust me, they are different beasts and have to be handled/tackled differently.

Expecting Devs or Ops to do both types of work, is usually asking for trouble, unless the organization is geared up from the ground up for such seamless work. It is more of a corporate problem, rather than a team working style or work expectations & behavior problem.

The same goes for Agile vs Waterfall. Agile works well if the organization is inherently (or overhauled to be) agile, otherwise it doesn't.


> Expecting Devs or Ops to do both types of work, is usually asking for trouble, unless the organization is geared up from the ground up for such seamless work.

Could you expand on this? How would an organization be geared up for this?


Best example is the largest eCommerce conglomerate in the world: Amazon.

In the early 2000s, Google, Amazon and a few other companies were trying to crack the conundrum of APIs (Application Programming Interfaces), i.e., how to do web services.

Amazon cracked that conundrum the best and fastest way (and ultimately that's why its AWS (Amazon Web Services) cloud platform rules the corporate world).

But how did Amazon do that IT innovation (beating other innovative IT services companies like Google), when it was merely an e-commerce and related site/company till then?

It's because Amazon did the unthinkable. They overhauled how their company worked.

Crucially, Amazon’s engineering teams were instructed in a 2002 memo - by Jeff Bezos, no less - to take an API-first approach:

1. Teams will expose their data and functionality through service interfaces.

2. Teams must communicate with each other through these interfaces.

3. No other form of interprocess communication is allowed.

4. Teams must plan for all service interfaces to be exposed to outside developers.

This mandate that every department in the company must communicate internally only using web services was pivotal to the company's IT-focused transformation.

So if HR needed to share some payroll-related data with the Finance team, it had to do so via APIs, instead of the traditional way of attaching it to an email or sharing it over a shared network drive.

This kind of forced, rigorous inter- and intra-team communication made the Amazon teams encounter and resolve every type of issue and concern that could be faced with APIs/web services and related IT technology.

And thus, Amazon was ready with such incredible new innovative, robust and scalable functionality ahead of every other company in the world.

Today, AWS is THE cloud platform of choice, and it helps drive most of the biggest websites and platforms of the internet, and it has the world's biggest/richest companies as its customers.

https://gatheringclouds.substack.com/p/the-rise-of-amazon-we...

I feel that simply forcing some development teams to adopt DevOps or Agile in a company doesn't work if the rest of the company doesn't support DevOps or Agile to the extent needed on a daily basis. Only such a deep overhaul can ensure that these sorts of innovative best practices not only survive, but also thrive in the company.

And that's the only way the benefits of such radical changes can be felt where it's needed most: the customer experience and the revenue books.


I agree. The domains are just too large for anyone to be an expert in everything. Platform engineers are expected to know 3 clouds, k8s, cloudflare, security, SRE, python, javascript, CI/CD and about 20 other things. It's just not possible to be great at all the things at the same time.

Employers would rather pay one salary than 2. They are not punished for demanding more from their employees.

We really ought to form some kind of union that operates across companies. We must demand better working conditions.



> You are far better off with people that can work together across the bridge, but that requires actual mutual trust and respect, and we’re not able to do that.

Wasn't that the original goal of DevOps? Getting dev and ops out of their silos and collaborating? The "make devs do ops" definition seemed to come along later.


The original goal of DevOps never happened. Companies immediately jumped on this with "rationalizations" and "integrations" to make it so fewer people were in charge of more things.

Case in point, the number of companies who create "devops" teams completely missing the point of the exercise.

But is DevOps a role or a principle?

The way I have seen it in my career is to have operational and development capabilities within the same team. And the idea of a "DevOps guy" is a guy "developing operations integrations".

As opposed to completely siloing ops and dev.


For most companies, it is a role, the new name for IT administrators.

DevOps is the practice of using modern software methods to automate the tasks of operations work. That includes using version control, templating languages and various forms of role-based configuration automation.

Anyone who thinks they can hire a devop or declare that they do devops is as deluded as 97% of the folks who claim that they are doing Agile. (If you are firmly on the other side of each of the four principles of the Agile Manifesto, you may or may not be doing great software development, but it's not Agile.)

The problem with the typical DevOps team is that there's no operations expertise.


> If you are firmly on the other side of each of the four principles of the Agile Manifesto

The agile manifesto has 12 principles (per the orig: "Twelve Principles of Agile Software")

Are you thinking of a different list when you say 4, or are you maybe combining some together?


My experience is that most companies don't do Agile, and DevOps is basically sys admin that also happens to own Jenkins or similar.

I've had recruiters telling me they are "looking for a devops" (not even a "devop").

> You are far better off with people that can work together across the bridge, but that requires actual mutual trust and respect, and we’re not able to do that.

Are you claiming it's fundamentally impossible for people to get along, or just that positive interpersonal relationships can't be reliably forced at scale?


I mean, look at Kubernetes though. You have to understand both the application and the infrastructure in order to get the deployment right. Especially in any instance of having to pin the runtime to any type of resource (certain disk writing, GPUs, etc).

My experience has been that devs don’t understand their own app resource requirements

This would be considered a failure, or are you saying they don't need to?

I am saying that in my experience they get upset when the VM or container they provision blows up because it lacks enough resources or they do not place guardrails on their app and end up getting OOMKilled.

I think my favorite interaction with a dev around this was when I was explaining how his java program looked like a big juicy target for the OOM killer and it had killed it in order to keep the system working. His response was, "I don't care about the system, I care about my program!" And he understood the irony of that, but it was a good reminder that we have somewhat different views and priorities.

A developer not caring about the system is why the file explorer is painfully slow today, compared to 15 years back.

If a programmer doesn't care about the system, I already know he's shit at his job.


That is basically WinUI/WinAppSDK, the whole WinRT stack and related dev experience, where even plain .NET is faster.

However the team will advertise it as performance, due to being written in C++.

Pity it gets slowed down by COM reference counting all over the place, which cannot be optimised away, and the application identity sandbox.


And that frustration makes sense in the context of the article. Devs don't care about any of that stuff because they're customer facing, it's a distraction from their primary responsibility.

It would be like asking Amazon delivery drivers to care about oil changes and tire rotations. It's much easier to have a team of mechanics whose primary responsibility is enabling drivers to just drive and focus on delivering packages.


That's not a Kubernetes-specific issue. If you run on VMs or Edge, devs also need to know the resource requirements. If anything, k8s makes that consistent and as easy as setting a config section (assuming you have the observability to know what good values are). The default behavior I've seen is to set requests without limits, so you get scheduled but not OOM-killed.
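To make the requests-without-limits pattern concrete, here is a minimal Python sketch (not taken from any real tool) that flags containers declaring a memory request but no memory limit. The `resources.requests` and `resources.limits` keys follow the Kubernetes container spec; the function name and the sample containers are made up for illustration.

```python
def missing_memory_limits(containers):
    """Return names of containers that set a memory request but no memory limit."""
    flagged = []
    for container in containers:
        resources = container.get("resources", {})
        has_request = "memory" in resources.get("requests", {})
        has_limit = "memory" in resources.get("limits", {})
        if has_request and not has_limit:
            flagged.append(container["name"])
    return flagged


# Hypothetical pod spec fragment, already parsed into Python dicts.
CONTAINERS = [
    {"name": "api", "resources": {"requests": {"memory": "256Mi"}}},
    {
        "name": "worker",
        "resources": {"requests": {"memory": "1Gi"}, "limits": {"memory": "2Gi"}},
    },
    {"name": "sidecar"},  # no resources section at all, so not flagged here
]

print(missing_memory_limits(CONTAINERS))
```

Whether a missing limit is a problem or a deliberate choice depends on the cluster; the sketch only makes the pattern visible.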

You don't need 10x developers. You just need to avoid the 1/10 multiplier of pitting separate development and operations teams against each other.

DevOps is dead because it's run by a bunch of ops people who don't know how to do dev and a bunch of dev people who don't know how to do ops. The only tooling problem is that a bunch of companies created "DevOps tools" whose use then gets dictated: K8s, terraform, etc. The only way this works is if you build the application to fit within those frameworks, e.g. writing an indexer that is massively parallel and is mainly constrained by CPU/memory. Instead, you have devs building something that gets thrown over the fence to a devops team that then containerizes it and throws it on K8s. What happens if the application requires lots of IOPS or network bandwidth? K8s doesn't schedule applications that way. "Oh you can customize the scheduler to take that into account". 2 years later, it's still not "customized" because they are ops people who don't know how to code. If you do customize it, the API is going to change in a few months, which will break your customization when you upgrade.

Would you say it's truly dead or that it fails to meet the performance bar you've described?

The reality is that most devs do not consider a holistic picture that includes the infrastructure they will be deploying to. In many cases, it's certainly a skill issue; good devs are hard to find. And to flip the coin, it's hard to find good ops people too.

The reason DevOps continues to linger, however vague a discipline it is, is that it allows the business to differentiate between revenue-generating roles and cost-center roles. You want your dev resources to prioritize feature work, at the beck and call of PMs or upper management, and let your "DevOps" resources be responsible for actually getting the product deployed.

In essence, it's a ploy to further commoditize engineering roles, because finding unicorns that understand the picture top-to-bottom is difficult (finding /top/ talent is difficult!). In this way, DevOps is alive and well, as a Romero zombie.


[flagged]


Spoken like someone who has never had to deal with business critical production environments.

It’s like saying that in a post-Viagra world there shouldn’t be men who have trouble getting laid.

Don't want to get too deep into your analogy. I was addressing the "DevOps cannot code" part. To me it is a leadership failure if a DevOps team is still afraid of tackling bigger challenges (like the example given by the OP). That, of course, depends on whether DevOps teams will exist in the long run.

The very fact that we are talking about "DevOps" teams (that do not include dev) is wrong from the very start.

DevOps is a methodology, not a role.


I've always felt that DevOps became a function/team partly because companies and especially SWEs started complaining that they were spending too much time "doing Ops work" and product/business started demanding more features for which they were running out of cycles. And add to that the burnout from being on-call (especially if the dev team is relatively small and you have to go on-call every 2-3 weekends).

When I still did on call ops, devs got notified before us if their apps were the problem. We got notified first if it was our infra

Having an ops team does not mean devs get to throw on-call duty over the wall to someone else. That's a sure recipe for resentment and turnover.


For most HR departments it is a role, it even has a career path.

> the "DevOps cannot code" part. To me it is a leadership failure

Have you done devops yourself? It sounds like a resounding No. Like you complained ops doesn't like to code (not a core skill for the job), ops complains that devs can't understand basic concepts of how their software runs. Is this also a failure of leadership? Is everyone supposed to know parts of everyone else's jobs?


There are not very many ops people who cannot code. Especially these days. I spent at least the last 20 years doing ops. Ops people are HIGHLY motivated to create things that DON’T FAIL. However, ops teams are often blocked by MANAGERS from doing essentially development in the prod environment. I’m talking about tools and scripts. At the places I’ve worked with the highest uptime, it was because ops had an unlimited, unfettered free hand.

Remove the handcuffs from your ops team and your reliability will SOAR.


The average ops person has never been less capable of, and more averse to, programming than now. The problem is getting worse, not better. I know because I am in ops and one of the few who loves to code and accidentally entered the field.

This is a red flag. You should find a new Ops team to work with.

No way. I have worked in ops for 20 years now; almost everyone knows how to code. Some enjoy it and some don't, but people are capable of it and will do it when needed.

I agree many can code, though a subset are certainly more scripting than engineering (like a typical 3-tier app)

There is also a subset that is very allergic to coding at this point. I've interviewed enough to see people who only know HCL/yaml. There is enough need and work (waste?) in the space that roles like this can exist


I see where you were coming from now. That sounds more like the Infra team. There are ops teams who are segmented in different ways. In my ops team, I don’t touch the infra and they don’t touch the applications.

I think that any kind of “modern ops” necessarily includes coding, even if there isn’t a ton of Python or Rust being generated as part of the workflow.

Kubernetes deployment configurations and Ansible playbooks are code. PromQL is code. Dockerfiles and cloud-init scripts are code. Terraform HCL is code.

It’s all code I personally hate writing, but that doesn’t make it less valid “software development” than (say) writing React code.


These things are not nearly equivalent. It’s writing code, it’s not software engineering.

Correct, it’s systems engineering.

It's configuration management; systems engineering is low level imo

I think you have it backwards. Systems engineering is the big picture discipline of designing & managing complex systems while config management is a specific process within that.

But the same is true of devs. Many of them are pretty clueless about coding. It's a whole generation of "bootcamp people" who were designers or bartenders and heard there were more lucrative jobs.

> most orgs are used to responding to a daytime alert by calling out, “Who just shipped that change?” assuming that whoever merged the diff surely understands how it works and can fix it post-haste. What happens when nobody wrote the code you just deployed, and nobody really understands it?

I assume the first time this happens at any given company will be the moment they realize fully autonomous code changes made on production systems by agents are a terrible idea and every change needs a human to take responsibility for and ownership of it, even if the changes were written by an LLM.


What happens if the person who wrote the code went on vacation? What happens if the code is many years old and no current team member has touched the code?

Understanding code you didn't personally write is part of the job.


I agree that understanding legacy code and code by other people is part of the job, but I don't see how these points are related.

> What happens if the person who wrote the code went on vacation?

They get yelled at, because shipping code at 5 pm on Friday and then leaving for vacation is typically considered a "dick move".

> What happens if the code is many years old and no current team member has touched the code?

Then the issue probably isn't caused by a recent deployment?


> every change needs a human to take responsibility for and ownership of it, even if the changes were written by an LLM

Actually it could be the opposite: they hold the LLM responsible. When the code change breaks production they'll just ask the LLM to fix it. If it can't? "Not my fault, the LLM wrote it not me! We just need to improve our prompting next time!" Never underestimate humans' capacity to avoid doing work.


I think the opposite will happen - leadership will forgo this attitude of "reverse course on the first outage".

Teams will figure out how to mitigate such situations in future without sacrificing the potential upside of "fully autonomous code changes made on production systems" (e.g., invest more in a production-like env for test coverage).

Software engineering purists have to get out of some of these religious beliefs


> Software engineering purists have to get out of some of these religious belief

To me, the Claude superfans like yourself are the religious ones, like how you run around proffering unsubstantiated claims like this and believe in / anthropomorphize way too much. Is it because Anthrop'ic is an abbreviation of Anthropomorphic?


I would be in the skeptics' camp 3-4 months ago. Opus-4.5 and GPT-5.2 have changed my mind. I'm not talking about mere code completion. I am talking about these models AND the corresponding agents playing a really really capable software engineer + tester + SRE/Ops role.

The caveat is that we have to be fairly good at steering them in the right direction, as things stand today. It is exhausting to do it the right way.


I agree the latest Gen of models, Opus 4.5 and Gemini 3 are more capable. 5.2 is OpenAI squeezing as much as they can out of 4 because they haven't had a successful pre training run since Ilya left

I disagree that they are really really capable engineers et al. They have moments where they shine like one. They also have moments where they perform worse than a new grad/hire. This is not what a really really capable engineer looks like. I don't see this fundamentally changing, even with all the improvements we are seeing. It's lower level and more core than something adding more layers on top can resolve; that only addresses it as best it can.


In my own anecdotal experience Claude Code found a bug in production faster than I could. I was the author of the said code, which was written 4 years ago by hand. GP's claim perhaps is not all that unsubstantiated. My role is moving more towards QA/PM nowadays.

I have many wins with AI, I also have many fail hards. This experience helps me understand where their limits are.

Do you have fail hards to share along with your wins? Are we going to only share our wins like stonk hussies?


For sure. Not hard fails, but bad fixes. It confidently thought it fixed a bug, but it really didn't. I could only tell (it was fairly complex), because I tried reproducing it before/after. Ultimately I believe there was not sufficient context provided to it. It has certainly failed to do what I asked it to do in round 1, round 2, but eventually got it right (a rendering issue for a barcode designer).

These incidents have become rarer over the last year - switching to Opus made failures less frequent. Same thing for code reviews. Most of it is fluff, but it does give useful feedback, if the instructions are good. For example, I asked for a blind code review of a PR ("Review this PR"), and it gave some generic commentary. I made the prompt more specific ("Follow the API changes across modules and see impact") - it found a serious bug.

The number of times I had to give up in frustration has been going down over the last year. So I tend to believe a swarm of agents could do a decent job of autonomous development/maintenance over the next few years.


Leadership will do what customers demand, which in most cases won't be ship-constantly-and-just-mitigate.

How to find problems through testing before they happen is a decades-long unsolved problem, sadly.


Even lesser agents are incredibly good and incredibly fast at using tools to inspect the system, coming up with ideas for things to check, and checking them. I absolutely agree: we will 100% give the agents far more power. A browser, a debugger for the server that works with that browser instance, a database tool, an OpenTelemetry tool.

The teams are going to figure out how to mitigate bad deploys by using even more AI & giving it even better information gathering.


If companies were generally capable of that level of awareness they would not operate the way that they do.

Yaml is my #1 failure in devops. That so many have resigned themselves to this limit and no longer seek to improve, it's disappointing. Our job is to make things run better and easier, yet so many won't recognize the biggest pains in their own work. Seriously, is text templating an invisibly scoped language really where you think the field has reached maturity?

Write it in a higher level language and generate the YAML from that. See the YAML as a wire protocol, not something you author things in.
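A minimal sketch of that idea in Python, using only the standard library. It assumes the consumer accepts YAML 1.2, which was designed so that JSON documents are also valid YAML, so `json.dumps` can stand in for a YAML emitter here. The `Service` schema, its fields, and the image name are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Service:
    # Hypothetical schema, invented for this sketch.
    name: str
    image: str
    replicas: int = 1

    def __post_init__(self):
        # Validation lives in ordinary code, before anything is serialized.
        if self.replicas < 1:
            raise ValueError(f"{self.name}: replicas must be >= 1")


def render(services):
    """Serialize the services as JSON, which YAML 1.2 parsers also accept."""
    doc = {"services": [asdict(s) for s in services]}
    return json.dumps(doc, indent=2, sort_keys=True)


# Loops and variables stay in Python; the emitted document is plain data.
SERVICES = [
    Service(name=f"worker-{i}", image="example/worker:1.0", replicas=2)
    for i in range(2)
]
print(render(SERVICES))
```

The output is deterministic for a given input, which is the property that makes treating the file as a wire format workable. A real setup would likely swap in a proper YAML emitter for block-style output.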

exactly, which is why interop with everything that exists today is important

however, you don't want config being turing complete, that creates a host of other problems at a layer you don't want them


I know what you mean, but there seems to be some kind of misplaced fear about this which has led us down the garden path of unmaintainable config (or even trying to jinja template it!)

If your config is turing complete and consumed as-is, then without a lot of discipline you can dig yourself into a hole, sure.

If you're producing YAML that is not turing complete, that constraint means you have to code in a way that produces deterministic output. It's actually very safe, and YAML maps 1:1 to types in something like Python.

My favourite go-to example is for AWS CloudFormation:

https://github.com/cloudtools/troposphere


JSON is so much easier in my experience and less prone to error

JSON does not have comments, and no, JSON5 is not the answer either

Think bigger, it's not something you are using today. The next config language should have schemas built in and support for modules/imports so we can do sharing/caring. It should look and feel like config languages and interoperate with all of those that we currently use. It will be a single configuration fabric across the SDLC.

This exists today for you to try, with CUE

I've been cooking up something the last few weeks for those interested, CUE + Dagger

https://github.com/hofstadter-io/hof/tree/_next/examples/env


Like XML? :)

Like Python?


Why not Python?

Typing is bolted on rather than a native concept, for one.

Why is that a problem?

Because types are important and having them be a native part of the language creates opportunities for error checking, editor completions, and LLM bounding.

Invisible scoping and Turing complete

Python is better than bash in ops, been using more Go in this space

Config is another beast and separate languages


I’m not sold that config is a complex enough domain to necessitate another language. What problems is CUE solving when compared to python and why are those problems substantial enough to make it worth learning a new language?

That's exactly the thing -- complexity. Cue bounds complexity, like json, yaml, and toml. But it offers more composability than any of them.

Given that we now have TOML, JSON, INI, CSV, YAML, etc it seems we are converging on either JSON, YAML or TOML. There is too much inertia behind those three and not much behind CUE right now.

CUE works with all of those languages, so it doesn't matter what the tools or others are using. I can always apply CUE at any point to output their required format as needed.

Keep your legacy config and mess if you want, you're the one missing out

Also, I don't see TOML in the wild enough and the others have been around long enough, I must chuckle and not take seriously these claims about "inertia"


I’m not claiming inertia makes TOML ‘best’, just that it’s clearly not blocked by inertia either. Cargo standardized on TOML years ago, and GitLab Runner has relied on it for a long time. If a format can win in major ecosystems, “people won’t adopt anything new” isn’t the whole story.

I genuinely despise the indentation requirements of YAML.

For comments, I use a _comment field for my custom JSON reading apps


I dislike the idea of _comment because it’s something that is parsed and becomes part of the data structure in memory. Comments should be ignored and not parsed.

When I wrote a custom deployment tool for some lab deployments, my Python based tool used JSON as the config language and comments were parsed I guess but not part of my data structure. They were dropped
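For what it's worth, dropping the comments at parse time is a few lines with the standard `json` module. This is only a sketch of one way to do it, not the tool described above; the `_comment` key convention comes from the thread, while the sample config and the `drop_comments` name are made up.

```python
import json


def drop_comments(obj):
    """object_hook: discard `_comment` keys as each JSON object is decoded."""
    return {key: value for key, value in obj.items() if key != "_comment"}


RAW = """
{
  "_comment": "lab deployment, not production",
  "hosts": [
    {"_comment": "flaky NIC", "name": "lab-01", "port": 8080}
  ]
}
"""

# json.loads calls the hook for every decoded object, nested ones included,
# so the comments never reach the in-memory data structure.
config = json.loads(RAW, object_hook=drop_comments)
print(config)
```

The comments are still tokenized by the parser, but nothing downstream ever sees them.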

yeah, this is what I'm talking about, innovation has stopped and we do dirty hacks like `imports: [...]` in yaml and `_comment` in json

How are people not embarrassed by this complete lack of quality in their work?


I don’t think JSON needs anything formal resembling XML. It was originally meant for over-the-wire payloads and people like myself use it for more than that.

You're still thinking "good enough". I'm advocating for the "we can do so much better" attitude

The current popular config choices cause a lot of extra work, bugs, and effort. Is improving the status quo not a worthy goal anymore? Are we at a point in history where throwing our hands up and saying "meh, I deal with this" is basically where people are today? (I'm somewhat a believer of this based on anecdata and vibes)


The uncomfortable reality is that config formats don’t win by being best. They win by being:

1. already installed everywhere,

2. easy to parse in every language,

3. supported by editors/linters/CI tools,

4. stable enough that vendors bet on them.


The config language we write does not have to be the same thing the programs read. Same analogy to compilers and assembly

YAML has its place and it is great for describing what your single microservice needs.

YAML is okay for writing structured prose for humans. It’s terrible for anything consumed by programs because even that single microservice has a high likelihood of some problem caused by YAML’s magic typing, silent data loss due to indentation, etc. unless you pair it with a separate validation tool chain, making the argument for simplicity increasingly dubious.

Validation is required, yes.

Sure, so at that point how much are we really saving versus using a better alternative? Using YAML correctly is harder because you need not only to do the validation everything needs to do but also to do other things specific to YAML to avoid problems created by YAML rather than the problem domain. For example, if typing less is my goal isn’t it easier to, say, always quote country_name rather than have to run a separate validator which catches the Norway problem?
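To show what that extra, YAML-specific work looks like, here is a rough Python sketch of a check for the Norway problem. The pattern lists the boolean spellings from the YAML 1.1 bool type, which 1.1-style parsers resolve to true/false when left unquoted. The `country_code` key and both function names are hypothetical, and the check covers booleans only, not the other implicit types.

```python
import re

# Boolean spellings from the YAML 1.1 bool type; parsers that follow 1.1
# read these unquoted scalars as true/false rather than as strings.
YAML11_BOOL = re.compile(
    r"^(y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)


def needs_quoting(scalar):
    """Return True if a YAML 1.1 parser would read this bare scalar as a bool."""
    return bool(YAML11_BOOL.match(scalar))


def emit(key, value):
    """Render one `key: value` line, quoting the value only when needed."""
    if needs_quoting(value):
        return f'{key}: "{value}"'
    return f"{key}: {value}"


for code in ["SE", "NO", "DK"]:
    print(emit("country_code", code))
```

Always quoting every string sidesteps the check entirely, which is the trade-off being argued about here.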

Why not pick a config language that works with our current config formats, looks like our current config files, and addresses many of the dumb problems that arise only in current config choices?

It doesn't have schemas nor does it scale. It has no valid place because invisibly scoped languages are a terrible idea.

It's certainly insufficient, look at what happened to Helm


I’m with you that it’s terrible, but it very much does have schemas! The vast vast majority of YAML-based big APIs (k8s, helm, compose, and so on) all absolutely do check documents against schemas (not just ad-hoc validation rules) internally.

The real issue is two things: the smaller one is that there’s no single or self-describing schema system (like XML supports); the larger thing is that most YAML schema validations prioritize supporting extremely permissive and complex input documents over being predictable and appropriately restrictive. And that’s a harder problem to fix, because it has more to do with priorities and community conventions.

If people wanted strict schemaful YAML to be the norm, they would have consolidated on one of the many tools that does that by now. The issue is, people don’t want that; they want extremely flexible and open-ended APIs. YAML as currently practiced is conducive to that goal, but it’s the goal that leads to issues, not the choice of (bad, I agree) data language.
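As an illustration of the "predictable and appropriately restrictive" end of that spectrum, here is a small Python sketch of a strict check on an already-parsed document: unknown keys are errors, and types must match exactly. The schema, the field names, and `validate_strict` are invented for this example and do not correspond to any particular tool.

```python
# Hypothetical schema: every allowed key and its exact expected type.
SCHEMA = {"name": str, "image": str, "replicas": int}


def validate_strict(doc, schema):
    """Return a sorted list of errors; an empty list means the document is valid."""
    errors = []
    # Strict: anything not named in the schema is rejected, not ignored.
    for key in doc.keys() - schema.keys():
        errors.append(f"unknown key: {key}")
    for key, expected in schema.items():
        if key not in doc:
            errors.append(f"missing key: {key}")
        elif type(doc[key]) is not expected:
            # Exact type match, so a bool does not pass as an int.
            got = type(doc[key]).__name__
            errors.append(f"{key}: expected {expected.__name__}, got {got}")
    return sorted(errors)


DOC = {"name": "web", "image": "example/web:1.0", "replicas": True, "replicaz": 3}
for error in validate_strict(DOC, SCHEMA):
    print(error)
```

A permissive validator would let both the typo'd key and the boolean through, which is the kind of flexibility the comment above says people actually ask for.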


No yaml schema will save you when your HelmRelease will arbitrarily merge together your yaml files on top of kustomize on top of whatever else.

In practice schemas are mostly useless in my experience because people bend yaml as if they really really want a programming language instead.


DevOps only failed in that so many don't know what it is.

DevOps isn't a tool, but there are lots of tools that make it easier to implement.

DevOps isn't how management can eliminate half the org and have one person do two roles, specialization is still valuable.

DevOps isn't an organization structure, though the wrong org structure can make it fail.

DevOps is collaboration. It's getting two distinct roles to better interoperate. The dev team that wants to push features fast. And the ops team that wants stability and uptime.

From the management side, if you aren't focused on building teams that work well together, eliminating conflicts, rewarding the team collectively for features and uptime, and giving them the resources to deliver, that's not a DevOps failure, that's a management failure.


I think this is the key insight: most of the problems are related to management decisions so they’re only DevOps failures to the extent that the movement failed to get political pressure to fix those.

I'd argue that it has failed in some organisations. DevOps for me is embedding the operations with the development team. I still have operations specialists; however, they attend the development team standups and help articulate the problems to the developers. They may have separate operations standups and meetings to ensure the operations teams know what they are doing and share best practices. Developers learn about the operations side from those that understand it well and the operations experts learn the limitations and needs of the developers. Occasionally I am fortunate to discover someone who can understand both areas incredibly well. Either way, this results in increased trust and closer working. You don't care about helping some random person on a ticket from a team you don't know. You do care about the person you work with daily and understand the problems they have.

If you can't account for someone spending x% of their time working with a team but for budgetary purposes belonging to a different team then sack your accountants.

DevOps, like agile, when done correctly should help to create teams that understand complete systems or areas of a business and work more efficiently than stand-alone teams. The other part of the puzzle is to include the QA team too, to ensure that the impact of full-system, performance and integration tests is understood by all and that everyone understands how their changes impact everything else.

Having the dev team build code that makes the test and ops teams' lives easier benefits everyone. Having the ops team provide solutions that support test and dev helps everyone. Having test teams build systems that work best with the dev and ops teams helps everyone.

Agile development should enable teams to work at a higher level of performance by granting them the agency to make the right decisions at the right time to deliver a better product by building what is needed in the correct timeline.

DevOps and agile fail where companies try to follow waterfall models whilst claiming agile processes. The goal with all these business and operating models is to improve efficiency. When that isn't happening then either you aren't applying the model correctly or you need to change the model.


If your developers weren't looking at dashboards before, they won't use a chat interface to interrogate it either. That doesn't really bring it to them any more than their existing capabilities. There's also a worrying underlying assumption being made here that the answers your LLM will give you are accurate and trustworthy.

My underlying assumption is that this is a content marketing piece to show managers / investors that "we are doing/thinking something in ai as a company"

> There's also a worrying underlying assumption being made here that the answers your LLM will give you are accurate and trustworthy.

I saw first-hand, at AWS devDays, an AI giving SIGWINCH as the "root cause" of an Apache error in a containerized process in EKS, when the actual problem was a backend FCGI process connection error. It has been extremely hard since that demo to trust any AI for system-level debugging.


If we were smart we'd use AI to grok a system in order to help us reduce its complexity. I don't think we're anywhere close to even being able to provide all the necessary context to solve problems like this.

(1) when was that? If it was < 6 months ago, the current gen of models is noticeably better

(2) AWS is not a leader, if even a contender, in the AI space. I would not evaluate the potential based on a demo they produced


I don't know of any other term in tech that people experience in so many different, often contradictory ways. It causes people to talk past each other, because they're all talking about different things or have been at places that work so differently.

Agile is like this too.

Yeah I thought of that one, but still reckon DevOps goes well past it.

Am I the only one who remembers when DevOps meant "developers are responsible for dealing with the operational part of their software too, so that they don't just throw stuff over the wall for another team to deal with the 3AM pages"?

It seems to have become: "we turned ops into coding too, so now the ops team needs to be good at software engineering"


DevOps was (and is) merely an excuse for companies to replace Developers with cheaper Ops resources, and yet expecting better services and better products from them.

My personal experience says that the Ops team should not be repurposed as Developers; rather, put the experienced Developers into Production Support (incident management, that's intense Ops, working in shifts and weekends, etc.). And rotate them whenever needed. Over a period of time, you'll invariably see fewer defects and issues percolating down from the Devs. Then, once both sides are stable and working well together with less friction and fewer open tickets, some of the more tech-savvy Ops members can be rotated into Development teams as rookie devs to help reduce costs a bit (there'll invariably be some natural attrition among the Devs and Ops, so this gives an alternative career path to the Ops team (who are usually less paid, and more stressed), and pushes the Devs not to become complacent). Such an approach is doable and productive.


> DevOps was (and is) merely an excuse for companies to replace Developers with cheaper Ops resources, and yet expecting better services and better products from them.

Most places I've worked it was the even worse "we've laid off the ops team, now developers are responsible for both" followed by "no we can't hire any more developers, we have enough already".


> DevOps was (and is) merely an excuse for companies to replace Developers with cheaper Ops resources [..].

Like everything, the original intentions must have been noble. But as we can see, looking back, it got popular and popular enough to get to the enterprise types.

Nothing really survives that.

PS: I have witnessed a sysadmin team being renamed DevOps and then SRE without many other meaningful changes. I couldn't believe it at the time.


> Like everything, the original intentions must have been noble.

It was, ca. 2012-15. Sysadmins making automation tools so they could offload the horseshit, often batshit bash/perl scripting work, of manually provisioning dev environments (on VMs, or even basic configuration of new bare metal) to devs, who were already more comfortable with writing their own automation. Devs can unblock themselves, and devs hate relying on anyone else and everyone worships and fears the devs, so fine, give them the sysadmins' rope and rafters.

Moving to a "cattle not pets" mentality for servers well before the proliferation of containers and microservices, much less the mainstreaming of serverless workflows and cloud compute. CI/CD, to make software release processes scriptable, or even better declarative, tasks that could be tested and verified in version-controlled source _before_ being deployed, just like the software itself.

Better automation and better testing meant devs could ship safer and faster; devs owning pipelines meant devs could fix dev-related problems faster.

A lot of early devops tools were written by sysadmins who were tired of being buried by rapidly growing requests to unblock developers, who were outnumbering them by the hundreds or thousands to one at FANG companies (pre-FAANG, much less the big six).

Puppet attacked config management by turning it into declarative code, Ansible made that easier to deploy; Luke Kanies and Michael DeHaan came from sysadmin. HashiCorp made VM provisioning scalable; Armon Dadgar and Mitchell Hashimoto were compsci students who hated doing ops work with rudimentary early cloud services. Most of their early sales inroads into companies came from IT departments using their open-source products; most of their early evangelists were IT executives.

Google splintering devops into the SRE role they coined mostly reflected how they (thought they) had made the "devs unblocking themselves on provisioning" problem that had inspired a lot of foundation tools simply part of the dev culture, especially through GCS and k8s. They didn't think about "devops" anymore much like people don't think about breathing, and narrowed their focus onto uptime.

That was really the failure IMO, that the idea was mostly a cultural one: people working on a problem should also have a stake in, or ownership of, the things they need to unblock their work. A dev being "blocked" from dev work by IT because only IT can provision a piece of hardware or stand up a VM is a cultural problem; the largely open-source tools made by sysadmins and junior/student devs were a response to an entrenched enterprise culture that showed no interest in doing the work necessary to solve that problem.

The tools forced the culture change, but then the tools created their own culture, and the world that defined the culture also changed beneath them. But the companies built around those tools didn't want to die, so they turned devops into whatever might keep them alive.

The problem isn't that "devops" failed to do the job it set out to do (make sysadmins' lives easier), it's that the entire problem area changed so much, and so quickly, that its goal was no longer relevant. There were no "sysadmins" left to help; there are still systems, and there are still administrators, but their responsibilities have been diced up and tossed into the organizational winds.

Not quite as easy of a narrative for the founder of an ops company selling an ops product to frame in a company blog post, though. Not that things in the post are necessarily wrong, but IMO the problem isn't "devops failed", it's why the fuck are we still talking about devops? The word means nothing anymore, its massive overloading pollutes any discussion about who's having problems, what those problems are, and what the solutions to those problems might be.

Or, IMO the problem is that few to no people are asking the modern equivalent of "how do we make sysadmins' lives better?" They're instead chasing a ghost of a concept that peaked a decade ago, because that's easier than looking at an organization's failures from both a sufficiently high and low level to see the cracks that run all the way through them.


We tried this, but we just got more defects, because the Devs lost what little Ops knowledge they had. Where previously Ops would have to involve Devs, now that Production Support has some Dev knowledge, suddenly they get the blame for everything. Devs no longer have interest in things like "reading log files"; they just ship any problems over to Production Support.

Any day, I (as a manager) would prefer to have an experienced Developer do a Production Support role, rather than a cheaply obtained non-engineering campus graduate hired as "Tech Ops" resource to do Production Support on complex, mission-critical systems.

It is a bad idea for a company to give shoddy after-sales support to customers, because they would then lose the customer's trust and relationship in the long run. No customer wants to see their production systems have frequent incidents causing hours or days of outages.

Vendor companies that neglect investment in Production Support for their Products/Services do so at their own peril.

In fact, canny companies have realised the real money is not in upfront cost, but in volume billing (billing/invoicing, based on monthly transactions counts, number of users/licenses and tiered rate card), so they need to have adequate Production Support teams

This is why companies are trying their level best to move existing customers to subscription services (e.g., Office 365 by Micro$oft).


You can find examples that go both ways for both endeavors, anecdata...

The problem in your case is not the dev vs ops split, it's a company culture thing which I'm sure you see play out in more places than this current focus


I am with you.

DevOps is a methodology. DevOps as a role or team name is a fantasy from people who do not understand the methodology.

If you want DevOps to work, your Ops must be members of the development team, take part in the sprints, etc. But many companies do not want to do that because they want to separate ops and dev budget/accounting and do not want to hire enough people with ops skills.


This is not true, you can make it work well either way. It's about people and processes, not about some specific setup or way of grouping people

That was an ambition of devops at one point; it has not borne the fruit it promised. Dev teams are not positioned to do ops well. We have specializations for a reason

Indeed. Another comment brought up the comparison with the idea of "full stack" (https://news.ycombinator.com/item?id=46662777). Management would sure love it if we could all be interchangeable widgets, wouldn't they (with no pesky tribal knowledge either)

"I think the entire DevOps movement was a mighty, ... it failed."

I'm so sick of this nonsense. "Devops" isn't failing, isn't an issue, you can rename it whatever you want, but throughout my career the devops engineers (the ones you don't skimp on) are the best, highest paid professionals at the company.

I don't know why I keep reading these completely crazy think-pieces hemming and hawing about a system (having a few engineers who master performance/backups/deployments/oncall/retros) that seems to be wildly successful. It would be nice if more engineers understood under-the-hood, but most companies choose not to exclusively hire at that caliber.


For sure, I just turned down a gig because the company saw devops as an afterthought, not as something they would invest in. They wanted me to come in and "fix some issues quick" on a short-term contract. What they really need is 1-2 FTE ops people who think about their problems every day. If you are pushing past 3-4 devs to 10 and you have no intention of hiring an FTE ops person, you are not doing it right and shall reap what you have sown before long

My message to the CTO of Honeycomb.io (who apparently wrote this post): please avoid getting philosophical and controversial to gin up curiosity about your AI platform. If you want to highlight the benefits of your platform then do so earnestly and objectively. Please don't mask marketing with an excoriation of a profession that has never been well-defined (or has always been defined to fit into an organization's political landscape for the most part). And you guys (like every other SRE/Ops platform) capitalized on that structural divide and deservedly got rich by selling licenses to these teams. I don't think you can come in now with this holier-than-thou best practice messaging just because platforms like yours have zero moat in this post-CC/Codex world.

Hence my vitriol: https://news.ycombinator.com/item?id=46662287.


> please avoid getting philosophical and controversial to gin up curiosity about your AI platform

Also: could he please avoid doing it by illustrating his nonsense with graphs that are both childish and nonsensical?


The CTO is a she.

As a movement, DevOps failed a long time ago. Once the word completely lost its meaning, it was impossible to educate anyone about it. But as a business practice (which is mostly what it is), it's still a viable option that any business can implement. It just takes the right people rising into leadership positions to enact it.

It failed because for management and HR, DevOps means ops; for them it is the new buzzword for systems integration, integration engineers, sysadmin, ...

I have been foolish enough to accept a few project proposals with DevOps role, which in the end meant ops work dealing with VMs, networking and the like.


I miss good ol’ classical sys admin.

Because the idea that you can have all aspects of a complex piece of technology maintained by a single cross-skilled team of interchangeable cogs is utopian and unworkable past any reasonable level of scale

DevOps, shift left, full stack dev, all reminds me of the Futurama episode where Hermes Conrad successfully reorgs the slave camp he's sent to, so that all physical labour is done by a single Australian man

Speaking darker, there is a kind of - well, perhaps not misanthropy, but certainly a not-so-well-meaning dismissiveness, to the "silo breaking" philosophy that looks at complex fields and says "well these should all just be lumped together as one thing, the important stuff is simple, I don't know why you're making all these siloes, man" - assuming that ops specialists, sysadmins, programmers, DBAs, frontend devs, mobile devs, data engineers and testers have just invented the breadth and depth and subtleties of their entire fields, only as a way of keeping everybody else out

But modern systems are complex, they are only getting more so, and the further you buy into the shift-left everyone-is-everything computer-jobs-are-all-the-same philosophy the harder and harder it will get to find employees who can straddle the exhausting range of knowledge to master


> the "silo breaking" philosophy that looks at complex fields and says "well these should all just be lumped together as one thing, the important stuff is simple,

I don’t think this is the right take. “Silos” is an ill-defined term, but let’s look at a couple of the negative aspects: “lack of communication” and “lack of shared understanding” (or different models of the world). I’m going to use a different industry example, as I think it helps think about the problem more abstractly.

In the world of biomedical engineering, the types of products you are making require the expertise of two very different groups of people: engineers and doctors. Each of these groups has an in-group language, and there is an inherent power differential between them. Doctors are more “important” than engineers. But to get anything made, you need the expertise of both.

One way to handle this is to keep the engineers and doctors separate and to communicate primarily via documents. The doctor will attempt to detail exactly how a certain component should work. The engineer will attempt to detail the constraints and request clarifications.

The problem with this approach is that the engineer cannot speak “doctorese” nor can the doctor speak “engineerese”; and the consequence is a model in each person’s head that differs significantly from the other. There is no shared model; and the real world product suffers as a result.

The alternative is to attempt to “break the silos”; force the engineers and doctors to sit with each other, learn each other’s language, and build a shared mental model of what is being created. This creates a far better product; one that is much closer to the “physical reality” it must inhabit.

The same is true across all kinds of business groups. If different groups of people are required to collaborate, in order to do something, those people are well served by learning each other’s languages and building a shared mental model. That’s what breaking silos is about. It is not “everyone is the same”, it’s “breaking down the communication barriers”.


I don't think that's like DevOps, though. A closer analogy would be a business that only hired EngDocs, doctors who had to be accredited engineers as well as vascular surgeons.

I don't think anyone thinks siloes are themselves a good thing, but they might be a necessary consequence of having specialists. Shift-left is mostly designed to reduce conversations between groups, by having individuals straddle across tasks. It's actually kind of anti-collaboration, or at least pessimistic that collaboration can happen


Oh, I completely agree! We created “EngDocs”, as you say, and simply made the situation worse. An EngDoc is an obviously ludicrous concept, on its face. But by breaking down the silo in the biomedical example, each engineer becomes a bit knowledgeable about an aspect of medicine and each doctor gains some knowledge about aspects of engineering.

I am arguing that all such people, whether developers or ops or UX designers or product managers, need to engage in this learning as they collaborate. This doesn’t mean that we want the DevPM as a resultant title, just that siloing these different groups will lead to perverse outcomes.

Dev and ops have been traditionally siloed. DevOps was a silly attempt to address it.


I don't understand these graphs. Why do the lines go back in time?

feed"back" loops

I'm surprised no one has commented that open loops plus closed loops is more or less how velcro works, and that stuff sticks.

Has this buzzword even been around 20 years?

Like other fads, it has certainly failed to go out of existence as soon as it should have.

Scratching neck: come on... just one more vendor, bro

From the article:

> What the devs care about is the ability to understand the product experience from the perspective of each customer. In practice, this can mean any combination or permutation of agent, user, mobile device type, laptop, desktop, point of sale device, and so on

Really? Any permutation?

Most (arse hole) devs

- Import world - as it works on their latest 1TB machine or macOS studio

- always on the latest iPhone or pixel

- Add 100 tracking that works on their own machine

- POS device? They should ask some of the devs to go and work in their canteens that have POS


DevOps failed because management never really understood that it needed to be a dedicated team, and that it shouldn't just be a little vassal state of the ops team, nor is it just "backend devs who know ops" who keep getting pulled away from devops work to do backend work.

If DevOps had been allowed to thrive, we could have eventually reached the holy grail: a Heroku-like environment that developers can use to run their apps. Instead, we stopped short in the no-man's-land of k8s, helm, AWS lock-in, terraform, hashicorp, red hat, azure, not to mention tons of spreadsheets owned by the ops team that developers must beg and plead to get changed.


I am lucky that the article is alien to me. I work somewhere we do devops. I write code and tests etc., but I would be unlikely to go a day without observing something in prod, whether by telemetry, dogfooding or getting paged. It is a really cool way to work: understanding the whole system (within the team remit, of course), and vaguely understanding neighbouring teams.

To the comments saying dev and ops are different: they are! I think the magic is massive platform team support too. I am not troubleshooting why splunk indexes aren't indexing, for example.


You made a graph with T being the x axis and then had lines which go backwards in time. I closed the window at this point.

In my company, instead of relying on an ops team.. we rely on a devops team.

In my experience DevOps has little interest in doing actual DevOps - they just want to run ops. They want to advise (or tell us we’re holding it wrong) but not actually get their hands dirty. On the flip side, devs don’t want to spend a ton of time learning k8s or how to manage servers, cloud services, etc.

DevOps is a mess of our own making - embracing K8s created complexity for little gain for nearly all companies.


The vast majority of people who called themselves DevOps were opportunist corporatists looking to get a promotion. It became less Ops as in operations and more Ops as in opponent. These people were haphazardly given the keys to production and I was often bullied by their ability to control it. Management needs to know if my application was running? They asked DevOps who often came back with lies in order to fit their own career objectives.

I apologize if my words were sharp because many DevOps engineers were not mean to me. Perhaps I just had bad luck to deal with ignorant gatekeepers to production. You already know if my opinion doesn't apply to you.


I can't wait for indie developers to build super-agents that commoditize providers like Honeycomb.io and more importantly clone all their features and offer them up for free as OSS.

Sounds like you don't know what a nightmare of version compat and bespokeness ops/obv is. This is going to be one of the harder things for LLMs to do because everyone is running on some snowflake held together with duct tape

Fair point - my statement is more about stealing market for simpler integrations by undercutting them on price.

And I don't want to trivialize the reality of enterprise platforms where bespoke connectors rule. I have dealt with migrations of platforms that are business critical, and managing version compatibility and ensuring none of the integrations regressed was par for the course. I am not even saying that that makes me qualified to replicate Honeycomb.io. But I do think someone with a deep technical background in building observability platforms, armed with Claude Code or Codex and with the right set of MCPs and all the necessary tooling, should be able to build a clone of Honeycomb.io.

Maybe it won't be a fast turnaround like a typical vibe-coded project, but even if it is a month-long project to even get to 60% feature parity, these vendors will have to sit up and pay attention.


> And I don't want to trivialize...

as you immediately trivialize something it seems you know very little about

MCPs are outdated btw, it's bad to attach a bunch of MCPs in with your messages, pollutes the context. If you don't do this, you can build agents that are better than copilot/codex on gemini-3-flash. Claude Code is probably the leader here, but still definitely not capable of what you think it is


I assume then you are retired or not a programmer, as you are wishing for the last bastions of companies that pay programmers to melt with the ice sheets, leaving the desert of no paid coding work.

DevOps only works when the developers are always right. What usually happens is the DevOps team thinks they know best (they are developers too, just not the ones using the tools), and they build a lot of garbage that no one wants to use, often making things more complicated than they were before.

Eventually a bureaucrat becomes the manager of the team, and seeks to expand the set of things under DevOps' control. This makes the team a single point of failure for more and more things, while driving more and more developer processes towards mediocrity. Velocity slows, while the DevOps bottlenecks are used as a reason to hire.

It's an organizational problem, not a talent or knowledge problem. Allowing a group to hire and grow within an organization, which is not directly accountable for the success of the other parts of the organization that it was intended to support, is creating a cancer, definitionally.


Don't attribute internal dysfunction and mistakes to an entire field. I've worked in an org where the opposite is true. Blanket statements like these never hold up because they lack nuance and usually are inspired by frustration

If an idea or methodology leads to "internal dysfunction" more often than not, then maybe it's a problem with the idea, and not a "lack of nuance".


