> The only way to truly opt out of big-company organizational politics is to avoid working at big companies altogether.
This is perhaps what I find somewhat odd about Sean's writing. It sometimes reads to me like a scathing critique of the dysfunctional bureaucratic dynamics of big tech companies, but that isn't really his conclusion!
The key point is at the end of the OP. The dysfunction and bureaucracy are annoying, even to the people who make a career out of them; there's no level of enlightenment where that stops being so. It's just an inevitable consequence of doing certain kinds of things and making certain kinds of decisions. If you're faced with an important decision affecting 10,000 employees or a million users, there's no perfectly good way to make it, only a least bad way.
Lol thanks for reading my blog post! (Alex here) Your statement of my position:
> We live in a late-stage-capitalist hellscape, where large companies are run by aspiring robber barons who have no serious convictions beyond desiring power. All those companies want is for obedient engineering drones to churn out bad code fast, so they can goose the (largely fictional) stock price. Meanwhile, end-users are left holding the bag: paying more for worse software, being hassled by advertisements, and dealing with bugs that are unprofitable to fix. The only thing an ethical software engineer can do is to try and find some temporary niche where they can defy their bosses and do real, good engineering work, or to retire to a hobby farm and write elegant open-source software in their free time.
Let me re-state this in another way, which says functionally the same thing:
> Companies are hierarchical organizations where you sell your specialized labor for money. You should do what they expect of you in order to collect a paycheck, cultivate as enjoyable of a working environment as you can, then go home and enjoy the rest of your free time and your nice big tech salary.
Is this cynical? In some sense, sure, but I don't think it's inaccurate or even toxic, and I think it's probably how something like 90% of big tech employees operate. Sometimes your writing makes it seem like this is actually what you think. If your "objective description" of big tech companies were in service of this goal -- getting along better and not fighting the organization to preserve your own sanity and career -- I don't think people would take issue with it.
But you make the analogy of public service and seem in some sense to believe in values that are fundamentally at odds with these organizations. Is your position that, through successful maneuvering, an engineer can make a big tech organization serve the public in spite of internal political and economic pressures? This seems far more idealistic than what I believe. To quote Kurt Vonnegut, "We are what we pretend to be, so we must be careful about what we pretend to be."
> Logs were designed for a different era. An era of monoliths, single servers, and problems you could reproduce locally. Today, a single user request might touch 15 services, 3 databases, 2 caches, and a message queue. Your logs are still acting like it's 2005.
If a user request is hitting that many things, in my view, that is a deeply broken architecture.
> If a user request is hitting that many things, in my view, that is a deeply broken architecture.
Whether we want it or not, a lot of modern software looks like that. I'm not a particular fan of building software this way either, but it's the reality we're facing. In part it's because quite a few services that people used to build in-house are now outsourced to PaaS solutions. Even basic things such as authentication are more and more moving to third parties.
I don't think the reason we end up with very complex systems is the incentives between "managers and technicians". If I were to put my finger on it, I would guess it's the technicians themselves who argued their way into a world where increased complexity and more dependencies are seen as a good thing.
At least in my place of work, my non-technical manager is actually on board with my crusade against complex nonsense. Mostly because he agrees it would increase feature velocity to not have to touch 5 services per minor feature. The other engineers love the horrific mess they've built. It's almost like they're roleplaying working at Google and I'm ruining the fun.
> If a user request is hitting that many things, in my view, that is a deeply broken architecture.
Things can add up quickly. I wouldn't be surprised if some requests touch a lot of bases.
Here's an example: a user wants to start renting a bike from your public bike sharing service, using the app on their phone.
This could be an app developed by the bike sharing company itself, or a 3rd party app that bundles mobility options like ride sharing and public transport tickets in one place.
You need to authenticate the request and figure out which customer account is making it. Is the account allowed to start a ride? They might be blocked. They might need to confirm the rules first. Is this ride part of a group ride, and is the customer allowed to start multiple rides at once? Let's also get a small deposit by placing a hold for a small sum on their credit card. Or are they a reliable customer? Then let's not bother them. Or is there a fraud risk? And do we need to trigger special code paths to work around known problems with payment authorization for cards issued by this bank?
Everything good so far? Then let's start the ride.
First, let's lock in the necessary data. Which rental pricing did the customer agree to? Is it actually available to this customer, in this geographical zone, for this bike, at this time, or do we need to abort with an error? Otherwise, let's remember it, so we can calculate the correct rental fee at the end.
We normally charge an unlock fee in addition to the per-minute price. Are we doing that in this case? If yes, does the customer have any free unlock credit that we need to consume or reserve now, so that the app can correctly show unlock costs if the user wants to start another group ride before this one ends?
Ok, let's unlock the bike and turn on the electric motor. We need to make sure it's ready to be used and talk to the IoT box on the bike, taking into account the kind of bike, kind of box and software version. Maybe this is a multistep process, because the particular lock needs manual action by the customer. The IoT box might have to know that we're in a zone where we throttle the max speed more than usual.
Now let's inform some downstream data aggregators that a ride started successfully. BI (business intelligence) will want to know, and the city might also require us to report this to them. The customer was referred by a friend, and this is their first ride, so now the friend gets his referral bonus in the form of app credit.
Did we charge a non-refundable unlock fee? We might want to invoice that already (for whatever reason; otherwise this will happen after the ride). Let's record the revenue, create the invoice data and the PDF, email it, and report it to the country's tax agency, because that's required in the country this ride is starting in.
Or did things go wrong? Is the vehicle broken? Gotta mark it for service to swing by, and let's undo any payment holds. Or did the deposit fail, because the credit card is marked as stolen? Maybe block the customer and see if we have other recent payments using the same card fingerprint that we might want to proactively refund.
That's just off the top of my head; there may be more in a real-life case. Some of these may happen synchronously, others may hit a queue or event bus. The point is, they are all tied to a single request.
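To make the "all tied to a single request" point concrete, here's a toy, self-contained sketch (every step name below is made up; in a real system each step could be an in-process call, a network call, or a message on a queue):

```
# Toy sketch only: each step stands in for one of the concerns described above.
def authenticate(ctx):       ctx["account"] = "cust-42"; return ctx
def check_permissions(ctx):  return ctx   # blocked? ToS confirmed? group-ride limit?
def hold_deposit(ctx):       ctx["deposit_hold"] = "10.00"; return ctx
def lock_in_tariff(ctx):     ctx["tariff"] = "0.25/min + 1.00 unlock"; return ctx
def unlock_bike(ctx):        ctx["unlocked"] = True; return ctx   # talk to the IoT box
def publish_events(ctx):     return ctx   # BI, city reporting, referral bonus
def maybe_invoice(ctx):      return ctx   # unlock fee, PDF, tax reporting

STEPS = [authenticate, check_permissions, hold_deposit,
         lock_in_tariff, unlock_bike, publish_events, maybe_invoice]

def start_ride(request_id, bike_id):
    ctx = {"request_id": request_id, "bike_id": bike_id}
    for step in STEPS:
        ctx = step(ctx)   # each of these could live in a different service
    return ctx

print(start_ride("req-123", "bike-7"))
```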
So, depending on how you cut things, you might need several services that you can deploy and develop independently.
- auth
- core customer management, permissions, ToS agreement,
This was an excellent explanation of a complex business problem, which would be made far more complex by splitting these out into separate services. Every single 'if' branch you describe could either be a line of code, or a service boundary, which has all the complexity you describe, in addition to the added complexity of:
a. managing an external API+schema for each service
b. managing changes to each service, for example, smooth rollout of a change that impacts behavior across two services
c. error handling on the client side
d. error handling on the server side
e. added latency+compute because a step is crossing a network, being serialized/de-serialized on both ends
f. presuming the services use different databases, performance is now completely shot if you have a new business problem that crosses service boundaries. In practice, this will mean doing a "join" by making some API call to one service and then another API call to another service
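To make (f) concrete, here's a toy sketch of what such a cross-service "join" tends to look like, with in-memory dicts standing in for the two services' APIs:

```
# Illustrative only: the fake "services" below stand in for real HTTP APIs.
RIDE_SERVICE = [{"ride_id": 1, "customer_id": "c1"},
                {"ride_id": 2, "customer_id": "c2"}]
CUSTOMER_SERVICE = {"c1": {"id": "c1", "name": "Ada"},
                    "c2": {"id": "c2", "name": "Linus"}}

def rides_with_customer_names():
    rides = list(RIDE_SERVICE)                                       # API call #1
    customers = [CUSTOMER_SERVICE[r["customer_id"]] for r in rides]  # API call #2
    by_id = {c["id"]: c for c in customers}
    return [{**r, "customer_name": by_id[r["customer_id"]]["name"]}
            for r in rides]

print(rides_with_customer_names())
# With one database this is a single SQL join; across services it's two
# round-trips, two (de)serializations, and a hand-rolled merge.
```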
In your description of the problem, there is nothing that I would want to split out into a separate service. And to get back to the original problem, keeping it all in one process makes it far easier to get all the logging context for a single request in one place (attach a request ID to all the logs and you can immediately see everything that happened as part of that request).
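For illustration, a minimal sketch of that request-ID idea in a single process, using only the Python standard library (the names are mine, not from any particular framework):

```
import contextvars
import logging
import uuid

request_id_var = contextvars.ContextVar("request_id", default="-")

class RequestIdFilter(logging.Filter):
    def filter(self, record):
        record.request_id = request_id_var.get()   # stamp every log record
        return True

logging.basicConfig(format="%(request_id)s %(levelname)s %(message)s",
                    level=logging.INFO)
for handler in logging.getLogger().handlers:
    handler.addFilter(RequestIdFilter())

log = logging.getLogger("rides")

def handle_request():
    request_id_var.set(uuid.uuid4().hex[:8])   # set once, at the edge
    log.info("holding deposit")
    log.info("unlocking bike")                 # same id appears on every line

handle_request()   # grep that id and you see everything for that request
```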
That's a good summary of the immediate drawbacks of putting network calls between different parts of the system. You're also right to point out that I gave no good reason why you might want to incur this overhead.
So what's the point?
I think the missing ingredient is scale: how much you're doing, and maybe also how quickly you got to where you are.
The system does a lot; even once it's in place, there's enough depth and surface to your business and operational concerns that something is always changing. You're going to need people to build, extend and maintain it. You will have multiple teams specializing in different parts of the system. Your monolith is carved into team territories, which are subdivided into quasi-autonomous regions with well-defined boundaries and interfaces.
Having separate services for different regions buys you flexibility in the chosen implementation language. This makes it easier to hire competent people, especially initially, when you need seasoned domain experts to get things started. It also matters later, when it may be easier to find people to work on the glue-code parts of the system, where you can be more relaxed about language choice.
Being able to deploy and scale parts of your service separately can also be a benefit. As I said, things are busy, people check in a lot of code. Not having to redeploy and reinitialize the whole world every few minutes, just because some minor thing changed somewhere, is good. Not bringing everything down when something inevitably breaks is also nice. You need some critical parts to always be there, but a lot of your system can be gone for a while without much trouble. Don't let the expendable parts take down your critical stuff. (Yes, failure modes shift; but there's a difference between having a priority-1 outage every day and having one much less frequently. That difference is also measured in developer health.)
About the databases: some of your data is big enough that you don't want to use joins anyway; they have a way of suddenly killing DB performance. The parts that absolutely need that scale are on DynamoDB. Others are still fine with a big Postgres instance, where the large tables are a little denormalized. (BI wants to do tons of joins, but they sit on their own separate lake of data.) There's a lot of small fry that's locally very connected and has some passing knowledge of the existence of some big, important business object, but crucially not its insides. If you get a new business concern, hopefully you cut your services and data along natural business domains, or you will need to do more engineering now. Just like in a monolith, you don't want any code to be able to join any two tables, because that would mean things are too messy to reason about anymore. Mind your foreign keys! In any case, if you need DynamoDB, you'll face similar problems in your monolith.
A nice side effect of separate services is that they resist an intermingling of concerns that has to be actively prevented in monoliths. People love reaching into things they shouldn't. But that's a small upside against the many disadvantages.
Another small mitigating factor is that a lot of your services will be IO-bound and make network requests anyway to perform their functions, which makes the latency of an internal network hop much less of a trade-off.
It's all a trade-off. Don't spin off a service until you know why, and until you have a pretty good idea where to make a cut that's a good balance of contained complexity vs surface area.
Now, do you really need 15 different services? Probably not. But I could see how they could work together well, each of them taking care of some well-defined part of your business domain. There's enough meat there that I would not call things a mistake without a closer look.
This is by no means the only way to do things. All I wanted was to show that it can be a reasonable way. I hope there's more of a reason visible now.
As for the logging problem: it's not hard to have a standard way to hand around request ids from your gateway, to be put in structured logs.
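Something like the following sketch; the header name and the JSON log shape are just assumptions, but the pattern is "reuse the gateway's ID, forward it downstream, put it on every structured log line":

```
import json
import uuid

HEADER = "X-Request-Id"   # assumed header name; pick one and standardize on it

def incoming_request_id(headers):
    # Reuse the id the gateway minted; only create one if it's missing.
    return headers.get(HEADER) or uuid.uuid4().hex

def log(request_id, event, **fields):
    print(json.dumps({"request_id": request_id, "event": event, **fields}))

def call_downstream(request_id, url):
    log(request_id, "downstream_call", url=url)
    # In a real service: requests.post(url, json=..., headers={HEADER: request_id})
    return {"ok": True}

def handle(headers, payload):
    rid = incoming_request_id(headers)
    log(rid, "ride_start_received", bike_id=payload["bike_id"])
    call_downstream(rid, "https://pricing.internal/lock-in")
    log(rid, "ride_start_done")

handle({"X-Request-Id": "req-4711"}, {"bike_id": "bike-7"})
```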
> All of this arises from your failure to question this basic assumption though, doesn't it?
Haha, no. "All of this" is a scenario I consider quite realistic in terms of what needs to happen. The question is, how should you split this up, if at all?
Mind that these concerns will be involved in other ways with other requests, serving customers and internal users. There are enough different concerns at different levels of abstraction that you might need different domain experts to develop and maintain them, maybe using different programming languages, depending on who you can get. There will definitely be multiple teams. It may be beneficial to deploy and scale some functions independently; they have different load and availability requirements.
Of course you can slice things differently. Which assumptions have you questioned recently? I think you've been given some material. No need to be rude.
I don't think I was rude. You're overcomplicating the architecture here for no good reason. It might be common to do so, but that doesn't make it good practice. And ultimately I think it's your job as a professional to question it, which makes not doing so a form of 'failure'. Sorry if that seems harsh; I'm sharing what I believe to be genuine and valuable wisdom.
Happy to discuss why you think this is all necessary. Open to questioning assumptions of my own too, if you have specifics.
As it is, you're just quoting microservices dogma. Your auth service doesn't need a different programming language from your invoicing system. Nor does it need to be scaled independently. Why would it?
Diagnosing "failure" in other people is indeed rude, even if you privately consider it true and an appropriate characterization. It's worse if you do that after jumping to the conclusion that somebody else has not considered something, because they have a different opinion than you. At least that's my conclusion of why you wrote that. (And this paragraph is my return offering of genuine and valuable wisdom.)
Of course you can keep everything together, in just very few large parts, or even a monolith. I've not said otherwise.
My point is that "architecture" is orthogonal to the question of "monolith vs separate services"; the difference there is not one of architecture, but of cohesion and flexibility.
If you do things right, even inside a monolith you will have things clearly separated into different concerns, with clean interfaces. There are natural service boundaries in your code.
(If there aren't, in a system like this, you and the business are in for a world of pain.)
The idea is that you can put network IO between these service boundaries, to trade off cohesion and speed at these boundaries for flexibility between them, which can make the system easier to work with.
Different parts of your system will have different requirements, in terms of criticality, performance and availability; some need more compute, others do more IO, are busy at different times, talk to different special or less special databases. This means they may have different sweet spots for various trade-offs when developing and running them.
For example, you can (can!) use different languages to implement critical components and less critical ones, which gives you a bigger pool of competent developers to hire from; competent as developers, but also in the respective business domain. This can help get your company off the ground.
(Your IoT and bike people are comfortable in Rust. Payments is doing Python, because they're used to waiting, and also they are the people you found who actually know not to use floats for money and all the other secrets.)
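(For anyone wondering about the floats-for-money secret, a two-line illustration with nothing but the standard library:)

```
from decimal import Decimal

print(0.10 + 0.20)                         # 0.30000000000000004 -- not invoice material
print(Decimal("0.10") + Decimal("0.20"))   # 0.30 -- exact, safe to put on an invoice
```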
You can scale up one part of your system that needs fast compute without also paying for the part that needs a lot of memory, or some parts of your service can run on cheap spot instances, while others benefit from a more stable environment.
You can deploy your BI service without taking down everything when the new initialization code starts crash-looping.
(You recover quickly, but in the meantime a lot of your IoT boxes got lonely and are now trying to reconnect, which triggers a stampede on your monolith. You need to scale up quickly to keep the important functions running, but the invoicing code fetches a WSDL file from a slow government SOAP service, which is now down, your cache entry's TTL has expired, and you don't even need more invoicing right now... The point is, you have a big system, things happen, and fault lines between components are useful.)
It's a trade-off, in the end.
Do you need 15 services? You already have them. They're not even "micro", just each minding their own part of the business domain. But do they all need their own self-contained server? Probably not, but you might be better off with more than just a single monolith.
But I would not automatically bat an eye if I found that somebody had separated out these whatever-teen services. I don't see that as a grievous error per se, but potentially as the result of valid decisions and trade-offs. The real job is to properly separate these concerns, whether they then live in a monolith or not.
And that's why that request may well touch so many services.
Does anyone have examples of organizations that have leveraged SQLite and written about their experience? I've read a lot of theory and benchmarks about it lately and it seems extremely impressive, but I'm wondering if anyone has written about pushing it to its limits "in production"
I agree, "politics" exist in any human institution. The issue is when political maneuvering becomes the sole or primary role of a software engineer -- that is a symptom of dysfunction. The primary role of a software engineer should be software engineering.
>The primary role of a software engineer should be software engineering.
No. Before you were a software engineer, you were a member of society, choosing what to do, what not to do, and whom to trust. The software engineer exists on top of that person. We might want to constrain ourselves to that column of work, software engineering, but just the last year has made it plain, with so many unethical systems being unified and exploited, that it is not sufficient to hand these powerful computational primitives out willy-nilly without taking into account what they are intended to do. Politics is not above your pay grade; it is becoming the essence of your pay grade, because it is through technology that normal political systems are being upended or co-opted beyond any intention of their designs. Those designs factored in a degree of friction that we are removing in leaps and bounds, destabilizing the human systems underneath.
If you haven't been thinking a little bit about the ethics/political consequences of what you're working on, you haven't been doing your job.
This isn't really what my article is about. I'm talking about workplace politics, i.e., internal organizational corporate dynamics, not, like, world politics.