Hacker News

Workflow engines. Every company builds one based on queues or some form of async messaging. It works great while you're prototyping and serving your initial customer base. It works less well as you grow, add more complicated features, and realize you didn't have the distributed-systems expertise to write this thing in the first place. It doesn't handle any of the common edge cases and is increasingly painful to operate, needing constant babysitting.

Use Temporal, Step Functions, something off the shelf, and try to resist this urge.



"Workflow engines" and their close cousins "DSLs" are the ultimate newbie trap. In theory, it's awesome for everyone: programmers get to work on interesting, abstract problems like distributed systems, syntax parsing, and event-based architectures, and "business users" get to make changes to "business rules" without bothering developers or impacting feature roadmaps. Win-win, right?

In reality you just end up making a shitty, nerfed version of a programming language that business users can't understand, because you still have to understand conditional logic to model workflows. Oh, and your documentation is terrible because devs don't bother with the boring stuff. Most of the time the devs end up implementing the workflows anyway because they don't actually work properly.

If you really need a workflow engine, definitely use something off the shelf, but I would go so far as to say that in the 95% case you don't even need a workflow engine: you need a developer who is capable of writing some Python scripts. Even if you pay a developer a full salary to do nothing but sit around and make changes to Python scripts on demand, that's still going to be way cheaper than the complicated workflow engine solution, which will probably require a team (or multiple teams) to maintain.
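For illustration, the "developer with a Python script" alternative can be as small as this - a hedged sketch with made-up step names and naive retries, not a real system:

```python
import time


def with_retries(fn, attempts=3, delay=0.0):
    """Run fn, retrying on failure; re-raise after the last attempt."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(delay)


def run_pipeline(steps):
    """Run named steps in order, returning a log of what completed."""
    log = []
    for name, fn in steps:
        with_retries(fn)
        log.append(name)
    return log


# Hypothetical steps standing in for real business logic:
steps = [
    ("fetch", lambda: None),
    ("transform", lambda: None),
    ("load", lambda: None),
]
```

Sequencing, retries, and a completion log cover a surprising share of what people reach for workflow engines to get.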


This reminded me of the time I wrote a mini-language for querying web server log files and generating reports (this was early 2000s) so you could answer questions like "how many users searched for 'cyprus flight' per day and went on to purchase a holiday?". With a nice web-based interface etc.

It was a disaster of course. The business people weren't programmers, had no understanding of programming, didn't want to learn programming even greatly simplified with docs and examples, and I ended up translating their queries into code anyway. So essentially I had written a DSL for myself.


With some tweaks to your second paragraph, the description fits CMake:

In reality you just end up [with] a shitty, nerfed version of a programming language, that [...] users can't understand, because you still have to understand [CMake's] logic to [utilize its magic behavior], oh and [the] documentation is terrible because devs don't bother with the boring stuff. Most of the time [you] end up implementing [workarounds] anyway because [CMake doesn't] actually work properly.


While CMake has a large share of idiosyncrasies, a build system is much more graph-oriented and must therefore at some point have a way to declare such dependencies, often using a DSL, for concurrency and incremental builds to be reliable. Let's not forget IDEs that need to understand the project structure as well.

Workflows, on the other hand, are more conditional and imperative, mapping better to normal programming languages, with the exception of transient error handling, long timers, and distributing workload. Here, writing a custom DSL would be a much bigger mismatch.

Best for build configuration would be a hybrid, where the declaration of dependencies is done in a normal programming language, which at the end calls a build(dep_tree) function where all the magic happens. With the risk of developers abusing this setup step to run half the build in their own reinvented imperative flow instead... trust me, I've seen this happen even in makefiles that run shell commands outside of targets. This is what SCons tries to be; it seems, however, not to be very popular compared to CMake.
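A rough sketch of that hybrid, assuming a made-up build(dep_tree) signature: dependencies are declared in plain Python, and all the magic lives in one function that orders and compiles targets.

```python
def build(dep_tree, compile_fn):
    """Topologically order targets from dep_tree and 'compile' each once.
    dep_tree maps target -> list of its dependencies."""
    done, order = set(), []

    def visit(target, seen=()):
        if target in done:
            return
        if target in seen:
            raise ValueError(f"dependency cycle at {target!r}")
        for dep in dep_tree.get(target, []):
            visit(dep, seen + (target,))
        compile_fn(target)  # real builds would invoke a compiler here
        done.add(target)
        order.append(target)

    for t in dep_tree:
        visit(t)
    return order


# Declaration side: plain Python, no DSL. Target names are illustrative.
deps = {"app": ["libcore", "libui"], "libui": ["libcore"], "libcore": []}
```

The declaration step stays declarative; the imperative temptation is confined to whatever `compile_fn` does per target.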


Isn't SCons abandonware? I remember it being part of the build setup at work at some point, and it caused quite a bit of frustration because it broke every now and then. I believe it was mainly failing on Windows.

Conan's recent development is promising and it gives you the full power of Python. It can also be used declaratively but with limitations. If I remember correctly, there are discussions in the C++ world about a declarative exchange format for dependencies and build information, but it's in the early stages. It's not trivial because there are also C++ modules now.


It will work that way with anything :)

Especially gradle.


> the complicated workflow engine solution ... will probably require a team (or multiple teams) to maintain

You can get these "as a service" which might scratch the itch for some.

(Disclaimer: I work for a company that sells Airflow-as-a-service and adjacent consulting)


Low level infrastructure of any kind is the worst.

Programmers love to work on it, so there's never a shortage of implementations; companies need it, so some of those get proper funding and teams. And yet it's really hard, so there's no way your DIY solution will have the features of the big ones, and it probably won't have the reliability either.

And worse, you'll probably need to learn and support the existing thing anyway, so why not just skip straight to it?


You're probably thinking of a different type of "workflow engine" than what Temporal is. It's not something where business people drag boxes in a GUI. It's still all code, within your code base, only with a different approach to handling long-running tasks (whatever "long" means in your context - 2 minutes or 2 years).


I agree about the limitations of DSLs.

Temporal is a workflow engine that doesn't use a DSL or nerfed version of a programming language. It runs your arbitrary code with any of the supported runtimes (currently Go/Java/Node/Python).

I'm able to write, deploy, and maintain workflow code by myself and use their cloud service for persistence and admin UI.


Damn, this hits the nail right on the head.


>"Workflow engines"

I actually had quite the opposite experience in this particular area. At the time we were developing a particular product for a client company. The product would've greatly benefited from a workflow engine. I did some shopping around, talked to sales reps, and discovered that we would have to shell out at least $350K for our particular case. So I proposed to the boss that I would quickly build one that covered the basics. The boss agreed, and I built it in about a month. It worked fine for what it was intended for.

After a while we approached a vendor (the one we would have paid that $350K). We showed them what we had built and how it was used. They were impressed enough that we became sales and implementation partners. They routed gobs of jobs, training, and installations our way.

As for the original home-built engine - over time we replaced it with the one from our partner without much trouble.

Win-win for everyone involved.


I had a very similar experience. We built a custom workflow engine with a visual designer in a few weeks, which then shipped with our custom DMS solution. Any workflow engine we tested, before or after the build, was either too complex or too expensive. We had a few bugs and newbie mistakes in ours, but nothing too bad. We quickly sold a few more installations because we could implement any customer need promptly. It's now been more than 10 years and I've moved on, but the company is still selling the engine with other solutions. I would call it a big win for build over buy.


Temporal has really solved so many problems for us. It is only opinionated about a few things that actually matter, and it gives you complete flexibility otherwise.

The days of Airflow and similar seem like a stone age in comparison.


A major benefit of Airflow is the number of already-implemented integrations: importing data from GCS to BigQuery, copying data from Postgres to GCS, KubernetesPodOperator, and so on. IIUC, with Temporal you get only workflow management, which can be easily integrated with any application to implement business logic. And this is great, because implementing a business workflow in Airflow is even more awful than Airflow itself. But for any ETL or plumbing job, Airflow is IMO better due to the existing integrations.


You are correct, that's the main difference. I wrote some more on the topic of data workflow engines like Airflow vs general-purpose application development workflow engines like Temporal here: https://community.temporal.io/t/what-are-the-pros-and-cons-o...


Is Temporal meant to be an Airflow replacement? The website exclusively offers examples of executing multi-step core business logic and not ETL workflows.


It’s for whatever your code can do, same as Airflow.

One of the great things a Temporal workflow can do is send or wait around for signals from external processes, indefinitely if needed. It's much easier to start orchestrating things you already have; you don't really need to buy into it as much as you do with Airflow. If a task exceeds its retries or timeouts, the workflow can send a signal or launch a process to notify a human that something needs fixing, someone can intervene, and then they can notify the workflow that it can keep going. Airflow is much more all-or-nothing, success or failure, in my experience. It's very hard to re-enter the workflow after something got twisted.

Certainly Airflow has more ETL integrations at this point in part due to how much longer it’s been around and the use cases it’s been evangelized for.

I've never worked at a place that had much investment in the ETL integrations; we used dockerized processes and just the Docker operator, as it was easier to develop and test independent of an Airflow instance.


So, when I looked at Airflow some time ago, it seemed good at "fixed workflows": fetch last day's data from a website, process it, then load it into the DB.

But it seemed bad at more flexible ones: load data, process each entry in a certain way, then trigger a new workflow based on each entry (like sending an email for every entry in the fetched data that matches some condition).

Does Temporal do this?


Yes, Temporal workflows are as dynamic as needed.

The other useful pattern is always-running workflows that can be used to model the lifecycle of various entities. For example, you can have an always-running workflow per customer which manages its service subscription and other customer-related features.


Workflow engines are the most difficult thing to sell in organisations, as the use cases are open-ended. Organisations need a certain level of maturity to understand that they need one.


People assume all workflow engines need complicated document logic and business-aware routing.

We built one for a medium-sized business without any of that. It's essentially a form system where users select the next destination for the form's approval on their own. Users understand the business process and are responsible for implementing it in every other context; it turns out they are capable of managing it in an online forms system as well.

Then all you really need is an auditing system that tracks all the states the document has moved through and displays them to users who are making decisions based on the form and that state. Add "final approval", "return for revision", and "recall" states and you're pretty much set.

No business specific logic. No need to keep the system configuration in sync with the organizational chart. No need to build "vacation delegation" or "user impersonation" features. You just need to keep the forms up to date with the business use cases, the users will manage everything else on their own.

Our system has been in place for around 6 years now. We do maybe two form updates a year. We have not changed the backend code or system logic since it was deployed. The only other support issues we have to deal with are when the LDAP integration configuration needs to be updated.
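A toy Python sketch of the routing described above - state names and methods are illustrative, not from any particular product:

```python
class FormFlow:
    """Minimal user-driven routing: the current holder picks the next
    destination, and every transition is appended to an audit trail."""

    def __init__(self, submitter):
        self.state = "submitted"
        self.trail = [("submitted", submitter)]

    def _move(self, state, user):
        self.state = state
        self.trail.append((state, user))

    def route_to(self, approver):
        # The user, not the system, decides who reviews next.
        self._move("pending_approval", approver)

    def approve_final(self, approver):
        self._move("final_approval", approver)

    def return_for_revision(self, user):
        # Returning restarts the flow, and the audit trail starts over.
        self.state = "returned"
        self.trail = [("returned", user)]

    def recall(self, submitter):
        self.state = "recalled"
        self.trail = [("recalled", submitter)]
```

No org chart, no delegation rules: the trail plus the three escape states is the whole backend.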


In a way your solution sounds like JIRA tickets. Not criticising, but a lot of upper-middle management want guard rails.


To a certain extent, that was certainly the idea, the main difference would be that once a flow is started it is not editable. It has to be returned or recalled to be changed, and then the audit trail starts over again.

That's also a valid criticism, especially if you're expecting a lot of automated processes to be kicked off once an appropriate approval chain exists. This really is only well suited to businesses where there are limited opportunities for automation. In this particular case, once they recognized that all of the terminal business process steps were mostly manual anyway, they understood the utility of something so simple... and cheap.


I'm glad you got a solution implemented out of it. Nice to see reason win the day.


Nah, it sounds like Lotus Notes.


That and a lot of engineers are genuinely excited to build their own workflow engine - whether they call it that or not - because it’s complicated and it feels like they just discovered a brand new and powerful abstraction.

Tears follow when the team either doesn’t account for all the edge cases or doesn’t have the resources to address them.


Absolutely agree with you.


What exactly is a workflow engine?

At a previous job, we had a fair amount of celery tasks and logic around starting them based on user input or on a schedule, retrying on failures and marking progress or cleaning up state in various databases.

Is that a workflow engine?


Sure.

Open source analogue would be Apache Airflow.

Abstractly, it's some directed acyclic graph (DAG) that is asynchronously computed, sometimes on a schedule.

Unfortunately, most things fall under "DAG". But the framework/engine exists to manage the complexity of the ever-extending pipelines declared by the engineers.

Event/push-based workflows also fall under this taxonomy.


why acyclic? review steps can send stuff back to previous input steps, can't they?


Yeah, I mean, the computation can be re-materialized.

DAG may be too specific. It's really a dependency graph, one that likely has a DAG topology.


It is usually better to have explicit returns.


What makes Temporal a "workflow engine" rather than a background job runner? I think I used Active Job or its like a career ago in Rails. The Temporal docs show me retries and stored job results. That does seem useful!


Temporal jobs/tasks are called workflows because the code is effectively translated into workflow steps—i.e. code progress is made by workers, and each step is persisted so that if the worker dies, you don't lose anything—another worker picks up in the exact same place in the code, with all local variables and threads intact. It also provides helpful built-in features like retries and timeouts of anything that might fail, like network requests, and the ability to "sleep" for arbitrary periods without consuming threads/resources (other than a timer in a database that is automatically set up for you when you call `sleep('1 month')`).
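The persist-and-replay mechanism described above can be illustrated with a toy sketch - nothing here is Temporal's actual API, just the core idea that recorded step results let a new worker resume without redoing work:

```python
class DurableRun:
    """Toy durable execution: each completed step's result is persisted to a
    history, so a re-run after a crash replays the history instead of
    re-executing side effects."""

    def __init__(self, history=None):
        self.history = history if history is not None else []
        self._pos = 0

    def step(self, fn):
        if self._pos < len(self.history):
            result = self.history[self._pos]  # replaying: skip the work
        else:
            result = fn()                     # first run: do the work...
            self.history.append(result)       # ...and persist the result
        self._pos += 1
        return result
```

A workflow function written against `step` can be handed to a fresh worker along with the persisted history, and it picks up exactly where the previous worker died, with the same local variables in hand.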


This comment is just too real and too precise.



