Like any TDD proponent (and most cultists, really), the author insists that "you're not doing it right", despite your having read the scriptures. You're always misunderstanding, YOU are the reason TDD doesn't work; no, TDD itself is flawless. There's always a slight misinterpretation of the magic words! But TDD works; you're the sinner for not using it right!
And then you get shocked people when they find out that 100% test coverage doesn't mean they really have a bug-free codebase.
And on the flip side: what if they're right and you are just doing it wrong?
I understand the point you're making. But I find this retort every bit as frustrating as the rhetoric you're criticizing. It gives us permission to dismiss, as snake oil, things we don't fully understand or that require experience and practice to master.
Surely most of us would struggle to pick up general relativity or neurosurgery even after weeks or months of training. But we're not (I hope) going to dismiss our neurosurgeon instructor when they say we were cutting something incorrectly, even though we REALLY THOUGHT we were doing it right this time.
Maybe TDD really is awesome. And maybe, simultaneously, it's not easy to write a 100% prescriptive guide for how to apply it to every kind of project.
I feel similarly about the rhetoric around OOP. Nine times out of ten, someone complains about OOP and cites some kind of Poodle-Dog-Animal class hierarchy. Then someone comes and says that program object relationships aren't really supposed to be taxonomical. Instead of taking that to heart and wondering if maybe they can try OOP again with a different mindset, the response is defensive. "Surely OOP is still terrible because I was taught the wrong way. And if it's possible to employ a technique ineffectively, then it must be a terrible technique. OOP sucks and they're in a cult so they can't admit it."
Do you have any idea how many wrong ways there are to use the controls in an automobile? Yet we still mostly blame the drivers when they cause a collision because they were using it wrong.
For what it's worth, I think the article is right about the evolution of the term "unit test", and the author mentions the "classicist" approach to unit testing, which is really the one that makes sense with TDD. The "mockist" approach to unit testing is "simpler" because it just makes an individual class or function the "unit", but that makes tests brittle and makes a test-driven approach much more verbose and cumbersome. It also so happens that the mockist approach is the default in today's programming languages, testing frameworks, IDEs, etc.
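To make the distinction concrete, here's a minimal Rust sketch (the names and prices are made up): the classicist "unit" is the behaviour, tested through the real collaborator; the mockist "unit" is a single function, tested against a stub.

```rust
// The behaviour under test, `total`, depends on a collaborator.
trait PriceSource {
    fn price(&self, item: &str) -> u32;
}

// The real collaborator.
struct CataloguePrices;
impl PriceSource for CataloguePrices {
    fn price(&self, item: &str) -> u32 {
        match item {
            "apple" => 30,
            "pear" => 40,
            _ => 0,
        }
    }
}

fn total(source: &dyn PriceSource, items: &[&str]) -> u32 {
    items.iter().map(|i| source.price(i)).sum()
}

// Classicist style: exercise the behaviour through the real collaborator.
fn classicist_test() {
    assert_eq!(70, total(&CataloguePrices, &["apple", "pear"]));
}

// Mockist style: replace the collaborator with a stub so the "unit" is
// just `total`. The test now also encodes the collaborator's interface,
// which is what makes such tests brittle when that interface changes.
struct StubPrices;
impl PriceSource for StubPrices {
    fn price(&self, _item: &str) -> u32 { 1 }
}

fn mockist_test() {
    assert_eq!(2, total(&StubPrices, &["apple", "pear"]));
}
```

If `PriceSource` changes shape, the classicist test survives untouched, while every stub has to be rewritten along with it.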
I don't know, your examples feel a bit off. Surgery training spends a ton of time on the techniques of surgery, sure. And yet we still had a proliferation of excess back surgery that was, in retrospect, unnecessary, and we have made efforts to stop that. Similarly for driving: you are right that the media often blames the drivers, but civil engineering looks at the roads that have more wrecks than others and tries to see why that is.
TDD suffers because it is a tool. As soon as a tool exists for its own sake, expect it to be overused. This is complicated further because tests are themselves a tool. So we have a tool that exists only for the sake of another tool, and it is not hard to see why this one is less clear cut.
I definitely didn't spend a lot of mental energy on my analogies, so I'm not going to defend them per se. But your counterpoints actually make me more comfortable with the analogies, not less.
I'm not claiming that TDD is good and that it's only the practitioner's fault when things go wrong. Rather, my point is that it's hard to know, as in the civil-engineering example you describe, and that we should be humble enough to acknowledge that "you're doing it wrong" might be a true statement, despite how smart we believe ourselves to be.
In that, I agree. I don't know of many (any?) tools that are by nature bad. I just think there are many that are oversold.
That is, my point wasn't to say that you can't do it wrong. Rather, doing it right may not further the end goal. Just look up the article where someone tries to TDD a Sudoku solver. It is painful, even though there really isn't any one thing that the person did wrong.
> But TDD works, you're the sinner for not using it right!
I often tell my colleagues that if technologies or methodologies are widely misunderstood, then practitioners aren't to blame.
TDD might be great, but I have yet to see it widely succeed, because its adoption is troublesome. It's a bit like all those newest products that promise to address everything consumers demand, only to fail miserably against the same old thing, leaving the few adopters asking themselves why the commoners didn't get it.
I would agree… except that MOST programming technologies and processes are widely misunderstood, and it feels increasingly so.
Get 10 random senior programmers in a room, and see if they agree on matters of design, code structure, testing, frameworks, system architecture. I’ve been part of summits trying to do this several times over the decades, and the only agreement we wound up with was that “Layered abstractions are sometimes a good idea”.
There is a lot of duplication of effort in languages, frameworks, and tools driven by “I can’t be bothered to learn this new idea, I know better”, and a lot of misunderstanding of even first principles.
Something like TDD (and other XP practices like pairing) reminds me of diet/fitness/nutrition regimens that are notoriously difficult to comply with. This isn't to say they don't work: athletes and bodybuilders do exist, as do many successful TDD practitioners. It's just human nature to avoid things that require mental change, or to be prone to self-deception in their application.
> Get 10 random senior programmers in a room, and see if they agree on matters of design, code structure, testing, frameworks, system architecture.
I see this often when the problem is abstract. When the problem is concrete, I rarely see this come up in practice.
I routinely have 4-5 senior engineers in a room agree on architecture, design and methodology for a single, specific, project. To me, these abstract disagreements are a symptom of a generalization mismatch.
> Something like TDD (and other XP practices like pairing) reminds me of diet/fitness/nutrition regimens that are notoriously difficult to comply with.
But these regimens do work. It's just that they don't work for the reasons most people believe.
Take fasting. Most benefits of fasting in casual practitioners come from the obvious fact that they aren't consuming as many calories as they did before, so they lose weight. That doesn't mean fasting works in a different way from, say, just eating less.
Same goes for TDD. It may be beneficial because it forces a culture of testing and code coverage, but the root of such benefits may not be writing tests before code, yet that's what most people agree TDD is about.
Another reason why it's so hard to actually say "tdd does/doesn't work" is that everyone has their own idea what it is.
It's like everything in software engineering: everyone has an opinion, few have actually tested theirs with performant and stable production deployments.
There are problems where TDD is actually a decent approach. If you have solid requirements beforehand, and your problem does reduce to small bite sized units with trivial state, then it's amazing for producing code that doesn't surprise you when you run it.
There are problems where it really, really, really isn't. Things where there is ambiguity in the expected outcome. Language processing, for example, is fairly difficult to tackle with TDD. There is no simple state, there is no unambiguous expected outcome. Whether "jump" is a verb or a noun depends non-trivially on the context, and it may even be ambiguous. The correctness of such code is a percentage value, not a boolean.
For context, when I debug the keyword extraction for my search engine crawler, I'm looking at test output that looks like this: <https://memex.marginalia.nu/pics/frog-text.png>
(blue are individual keywords, red are potential n-grams).
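When correctness is a percentage, the natural test asserts a threshold over a labelled sample rather than an exact output. A minimal sketch of that idea (the heuristic, labels, and data here are all invented for illustration):

```rust
// Toy part-of-speech heuristic: "jump" after a determiner is a noun,
// otherwise a verb. Deliberately imperfect, like any real tagger.
fn toy_tag(word: &str, prev: Option<&str>) -> &'static str {
    match (word, prev) {
        ("jump", Some("the")) | ("jump", Some("a")) => "NOUN",
        ("jump", _) => "VERB",
        _ => "OTHER",
    }
}

// Score against gold labels. The test then asserts, e.g.,
// `accuracy(&gold) >= 0.9` instead of demanding exact equality.
fn accuracy(cases: &[(&str, Option<&str>, &str)]) -> f64 {
    let hits = cases
        .iter()
        .filter(|&&(word, prev, gold)| toy_tag(word, prev) == gold)
        .count();
    hits as f64 / cases.len() as f64
}
```

A regression then shows up as a drop in the score, not a red/green flip on a single sentence.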
All that means to me is that when we start a project we think about what it means for an implementation to be correct and write a test. If we agree on the specification, the test, then a passing implementation means that we have implemented it correctly -- for that particular test.
One of the challenges of using unit tests for this kind of testing is, as you say, modelling continuous values.
It is also difficult because each "proof" is so small and trivial. You need thousands of examples to gain some confidence... and for even moderately complex software even this may not be enough.
Unfortunately, the path to writing good tests and verifying software requires a bit of mathematical vocabulary and reasoning which many developers are too shy to learn or outright hostile towards ("I've never needed no stinking math, this is programming!").
My point is: as long as you're driving the development of your software using testing, I think you're practising TDD.
Unfortunately, it's hard to be put into a position where you can apply one, let alone multiple new approaches to software development on a team developing a large application for a long time (say at least 2 years). You'll have team churn, you'll have requirements churn, you'll have understanding of methodologies and practical realities change, etc.
So by "testing their (opinions)", I usually understand people to mean that they've been on a project where they've enjoyed working on something in a particular way (people always underestimate what someone is simply enjoying doing, I have no idea why). I understand that almost nobody is able to compare and contrast two significantly different approaches, let alone establish things like incident rates or cost of improvement and such resulting from application of those approaches (because there's no reference point).
What we do need is for all developers to contribute to setting up a foundation dedicated to establishing software development methodology quality, and then have that set up a dozen teams of 4-8 engineers working on a dozen longish problems (6-36 months) with different methodologies, and then aggregate and analyze results in 12 years. All the engineers would have market-rate salaries, so we'd all have to be chipping in monthly :D
I mean, just imagine where we'd be if we'd done that in 2010? (Probably in the same place, but just maybe not!)
Or there's hidden assumptions that differentiate between a Google solution and a coffee shop solution.
Performant... stable... both have ridiculously variable definitions based on who you talk to. That alone is probably enough to get your 10 Sr. developers to provide different solutions.
I don’t think the assumptions are hidden. There is plenty of literature with a reasonable discussion of the range of definitions of performance and scale. It’s that people aren’t even looking for them. It’s about “Team Red vs Team Blue”. Or whatever buzzword needs to be on your resume alongside leetcode for the next interview. NoSQL vs SQL, FP vs OOP. Can’t get my raise if I’m building coffee shop solutions!
The counter to TMTOWTDI is a set of principles and discipline, at least within a team or a project. Not to outright reject the alternatives, but to recognize the power of consistent behaviour and prioritization of tradeoffs, to try to counter the brand allegiances.
> Get 10 random senior programmers in a room, and see if they agree on matters of design, code structure, testing, frameworks, system architecture
Programming consists of different schools of thought, similar to philosophy or economics; thus conflicting viewpoints are not necessarily wrong per se, but they collide with your school of thought. Therefore there will always be some disagreement, even between the most experienced programmers.
Now there are of course still things that are right or wrong on a factual level within programming, but because the different schools also label things as right or wrong, it can be hard to distinguish what is grounded in philosophical reasons and what is based on factual ones.
In my experience, there is less disagreement about how to solve a problem than about which problem is being solved.
I've found that high-performing engineers tend to agree quite often on the solution once they agree on the priorities of the features and the problems being prevented.
Development practices are not difficult to comply with because you make them work for you, not the other way around.
The issue that I see in software engineering is the XOR approach. Top-down vs bottom-up, abstraction vs implementation. This is the wrong thinking. One of the classic books on software architecture said, paraphrased, "The architecture drives the implementation and the implementation drives the architecture". A classic book on software development said, paraphrased, "You design for the implementation you need and implement for the design you want."
It's a yin and yang. But many software engineers completely fail at this. They get stuck at implementation or design. You need both at the same time. They're trying to represent a many-dimensional problem one or more dimensions lower than required. They're flat-worlders trying to describe a cube or even a tesseract. Of course there are going to be misunderstandings and mistakes. Everyone has a different view of the cube. They're technically talking about the same thing, yet fail spectacularly when the parts don't quite align when they come together. The problem is very simple in a higher dimension.
Take Agile. It clearly outlines that you cannot have managers getting in the way. Yet, I don't know how many companies I've seen trying "Agile" without giving up their managers.
If you're not even willing to move past the first step, indeed it is doomed to failure. But surely you should recognize that when you decide to go down that road?
Removing the managers from the equation requires a team of developers who have a very business-oriented mindset. I expect a lot of developers don't have that, and may not even be capable of it. For any given business, chances are that does not describe your team. I think it is fair to say that Agile is not realistic in the vast majority of cases.
Is the fault Agile for being written for a narrow scope of situations or for practitioners trying to adopt it where it was not designed to fit? I would suggest the latter.
> Is the fault Agile for being written for a narrow scope of situations or for practitioners trying to adopt it where it was not designed to fit? I would suggest the latter.
I believe Agile is one of the main offenders here. You suggest that Agile is not "realistic" in most cases, and I agree with that, but Agile consultants and practitioners like to say that Agile is about taking what works for you, and adapting it to your needs.
I think Agile adoption by people who believe in this, is doomed. When Agile is seen as a set of nebulous guidelines from where you can pick and choose, it just doesn't work, and in turn, these experiences feed the idea that Agile doesn't work as a whole.
It's like everything in programming. The 1% of the elite can use almost any methodology with great success, but the other 99% constantly fail to perform.
The problem-solving aspect of programming is hard for most people. Nearly all of the high performers that I know are on the high-functioning autism spectrum. They spearhead the hard technical problems, and set the stage for everyone else to tag along with the busy work.
It takes all kinds to make a team, but we still need to recognize everyone's strengths and weaknesses.
And not every problem in programming is a technical problem. There are more issues in communication and understanding the problem being solved. Doesn't do much good to have a technically correct product that doesn't quite solve the problem.
What software engineering needs is a methodology in management to identify and properly utilize everyone to their unique abilities.
Like everything in programming, the cultural part is more difficult than the technical part. I've seen more than one fellow coworker comment out all my tests. If you don't create the correct culture, and there isn't management buy-in, it won't work.
I believe that in medicine, if a treatment fails because the patient doesn't follow it correctly, it is considered a failure of the treatment (though of course better education can be part of the solution).
I worked with a guy who has since transitioned to teaching Agile methodologies, and one day he was talking about his mentor, 'Jay' and the methodology they used within Jay's consulting practice.
For most of the successful projects, Jay brought in a team of people he had worked with before. So at some point you have to ask whether it's the methodology that makes the project successful, or the people who make the methodology work.
If you have ever been a Lead, or even a viable candidate for one, you've had meetings with others to conspire to be successful within the bounds of whatever rules management won't budge on. How to make this number say what we want it to say. How to make that graph go in the right direction. In these cases we are camouflaging our personal methodologies to pass for someone else's. We are succeeding in spite of them, because if we told them what we were doing they might make us stop.
So when they look back at the project, all they see is the things we let them see. The opinion of the person who introduced a change is never the one you can trust. You should ask the people in the trenches what they think.
How many of the people who misunderstand the technology or methodology actually learned it from some kind of official or primary source, though?
Compare that to how many of us learned "OOP" or "TDD" or "FP" or "DDD" from a few blog posts.
Is it physicists' fault that I can "learn" that quantum mechanics says CERN opened up a wormhole and alternate timelines from a crystal salesperson on Facebook?
There are a number of tech fads that were unwittingly promoted as panaceas when their applicability is limited to particular, specific circumstances.
Microservices is another.
TDD gets pushed especially hard (I think) because when it works well it works REALLY well and because it can be quite literally addictive - the red/green being like sounds on a slot machine that generate a hit of dopamine.
That's a recipe for some passionate promotion.
In fairness to the original author, his heterodox form of TDD does widen its applicability beyond its traditional scope of complex, stateless algorithmic code with simple APIs to the more common integration code that involves databases, etc. that predominates in most commercial code bases.
But, what tech innovation DIDN’T start as something with limited applicability? The only way to know is to discuss the benefits and tradeoffs in something approaching social science.
But humans struggle with rigor, it’s much easier to brand and market something, or to buy in to brands. So “ideas” like microservices become brands. And they’re misunderstood and misapplied because people don’t read the copious literature that discusses the tradeoffs and variations. And they don’t practice it as a discipline with someone that has mastered the technique successfully: they do it blind.
Same goes for TDD and the other XP practices that are often derided as cultish. Being a discipline is even harder to adopt than a design philosophy like microservices. Disciplines are about consistent behaviour. To an outsider, it's freaky. But calling it cultish, as some do, is like saying Karate or another martial art is a cult. From the outside it kind of looks like one, but discipline, or kata (practice of form), is known to be a success multiplier for the sustained successful application of practices.
If you don’t have a dojo or a sensei, could you teach yourself such a set of martial arts to mastery? If not, why do we expect everyone to pick up TDD after reading a book?
TDD gets pushed because it creates great, easily trackable metrics one can gesture to as evidence that a) your code is good and b) that you're doing a good job and should be paid more.
It makes developers happy because it translates the somewhat arcane nature of the work into something easily digestible by management, and is a fig leaf for shoddy work.
It makes management happy because it goes nicely on a chart that can be shown to the director/customer/shareholders, and looks good at status meetings. It also gives them something to poke at and micromanage.
It makes the customer/shareholders happy because it provides a metric that their money is being spent _doing something_.
TDD may have started as a coding best practice, but it exists and endures - and will continue to exist and endure - because it's performative, and the performance has value to every layer of a business, even though it has nothing to do with actually making the product better at this point.
The author has described what they think would make the practice of TDD better. The rationale presented isn't ridiculous. Reasonable people could disagree on it.
> you have shocked people when they find out that 100% test coverage doesn't mean that you really have a bug-free codebase.
This applies specially to business people.
100% test coverage just means you tested that s*t against everything you expected...
...but your software does not live isolated.
Some external system may have a bug, or may inject some unforeseen values, or a solar flare might flip the value of a bit in your system and crash your software.
I see a completely different problem. Just because there are branches in your code doesn't mean there are branches in its inputs. And just because there are branches in the inputs doesn't mean they are reflected in the code. So you may have 100% test coverage but be testing branches which aren't going to be taken, while completely missing the branches that are in the inputs, which you'll fail to handle. Example:
    fn abs(x: i32) -> u32 {
        x as u32
    }

    fn test_abs() {
        assert_eq!(5, abs(5));
    }
Boom, 100% test coverage! But the tests are actually very low quality. There is a bug when x is negative. That's why property-based testing is nice. It uncovers branches in the inputs (although the task of generating a representative set for the inputs is sometimes non-trivial).
After discovering the bug, one may even write a branchless implementation of this function for performance without updating the test, and it will still be 100% coverage. But the arithmetic has "logical branches" which do not look like ifs, instead they generate qualitatively different results for different inputs.
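That property can be checked by hand without any framework (a real property-testing library like proptest would generate the inputs instead of picking them; this is just a sketch of the mechanism):

```rust
// Correct implementation: `unsigned_abs` handles even i32::MIN.
fn abs_fixed(x: i32) -> u32 {
    x.unsigned_abs()
}

// The buggy version from above: trivially gets 100% line coverage,
// because the "branch" lives in the inputs, not the code.
fn abs_buggy(x: i32) -> u32 {
    x as u32
}

// The property: for every x, f(x) equals |x| computed in a wider type.
fn holds(f: fn(i32) -> u32, x: i32) -> bool {
    f(x) as i64 == (x as i64).abs()
}

// A representative input set must include the input-space branches:
// negatives, zero, and the extremes.
fn check_all(f: fn(i32) -> u32) -> bool {
    [-5, -1, 0, 1, 5, i32::MIN, i32::MAX]
        .iter()
        .all(|&x| holds(f, x))
}
```

The branchless rewrite mentioned above would be checked by exactly the same property, which is the point: the property tracks the inputs, not the shape of the code.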
I tend to find logical/algorithmic bugs overemphasized. Property tests are nice and work very well for extremely complex algorithmic code that has simple inputs and outputs, but most code I write at work simply isn't that.
In school I wrote parsers. At work I have done it but it's rare.
In fact, if you write commercial code, you'll often find that the code of that kind which you rely on weirdly gets concentrated in open-source libraries, which you will be testing only indirectly.
Certainly when writing commercial code I find that the majority of bugs lurk in the interstitial spaces between subsystems I am integrating or in misunderstandings about how the overall system or subparts of it are supposed to behave.
And property tests are not much help there, and unit tests are often a hindrance (because they're as likely to bake in wrong assumptions).
TDD is some help with this but only if A) it's paired with BDD to exorcize specification bugs and B) done with integration tests that exercise all parts of the system together.
The sort of programs you are writing about can easily exhibit the sort of problem that yakubin is giving an example of - here's a comparable example:
    wasBusinessDay(date): Date.dayOfWeek(date) not in (Date.Saturday, Date.Sunday)
Which is as trivial as yakubin's example, but can fail for more, and much more nuanced, reasons - and no, you are not, in general, going to find an off-the-shelf answer to the question of whether your particular business was operating on a given day.
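A sketch of that nuance in Rust (the day-of-year encoding and the closure set are invented for illustration; the real list of closures comes from the business, not from any date library):

```rust
use std::collections::HashSet;

#[derive(PartialEq, Eq, Hash, Clone, Copy)]
enum Day { Mon, Tue, Wed, Thu, Fri, Sat, Sun }

// The version from the comment: "not a weekend" stands in for "open".
fn was_business_day_naive(day: Day) -> bool {
    !matches!(day, Day::Sat | Day::Sun)
}

// What the business actually needs: weekends AND whatever days this
// particular business happened to be closed (holidays, renovations...).
fn was_business_day(day: Day, day_of_year: u32, closures: &HashSet<u32>) -> bool {
    was_business_day_naive(day) && !closures.contains(&day_of_year)
}
```

The naive version passes any test that only thinks to vary the weekday; the bug only surfaces on inputs nobody wrote a branch for.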
Both your claim that a big problem is things not working as they are supposed to[1], and the fact that the same old basic security problems get repeated in every new platform that comes along, show that libraries have not, in practice, been any sort of panacea for business software.
[1] I suspect you are referring to application code rather than library code, but either will do for the point I am making here.
I think property tests work well on larger things; with a super basic assertion they're essentially just fuzzers. You can point Hypothesis at a Swagger spec and let it test your API like this.
I built a property testing library back in the day when I made a library for creating UIs. I was then able to write tests like
* Given an arbitrary UI
* And an arbitrary list of up/down/left/right user directions (this was the only way of navigating)
* If they press a direction and the focus moves, pressing the opposite direction takes them back to where they were before
This uncovered bugs in interfaces like this: with 3 items, bottom left in focus, you press right. Now press left; users probably want to go back to the bottom left rather than the top left.
* Given an arbitrary list of API calls to add/remove/change the UI and user direction presses
* There is always only one item in focus, or no items at all
This actually uncovered a specification bug. We had two requirements
1. Always have an item in focus if there are any that can be in focus
2. If you delete all the items, then add one back in, it's not focussed
Those conflict, but we never noticed, and even had passing unit tests for both cases.
I think property tests can map very nicely to the level of system description that we typically want. I'd love to see larger integration with BDD tools.
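The press-and-undo property above can be reproduced on a toy model of that 3-item layout (this is my own reconstruction, not the original library; exhaustively checking every focus/direction pair stands in for generated inputs):

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Dir { Up, Down, Left, Right }

// Three items: top-left, top-right, bottom-left (no bottom-right).
const ITEMS: [(i32, i32); 3] = [(0, 0), (1, 0), (0, 1)];

fn opposite(d: Dir) -> Dir {
    match d {
        Dir::Up => Dir::Down,
        Dir::Down => Dir::Up,
        Dir::Left => Dir::Right,
        Dir::Right => Dir::Left,
    }
}

// Move focus to the "nearest" item strictly in the given direction,
// preferring the same row/column; stay put if there is none.
fn move_focus(focus: usize, d: Dir) -> usize {
    let (fx, fy) = ITEMS[focus];
    let mut best: Option<(i32, usize)> = None; // (score, index)
    for (i, &(x, y)) in ITEMS.iter().enumerate() {
        let candidate = match d {
            Dir::Left => x < fx,
            Dir::Right => x > fx,
            Dir::Up => y < fy,
            Dir::Down => y > fy,
        };
        if candidate {
            // Crude score: off-axis distance dominates, then on-axis.
            let score = match d {
                Dir::Left | Dir::Right => (y - fy).abs() * 10 + (x - fx).abs(),
                Dir::Up | Dir::Down => (x - fx).abs() * 10 + (y - fy).abs(),
            };
            if best.map_or(true, |(s, _)| score < s) {
                best = Some((score, i));
            }
        }
    }
    best.map_or(focus, |(_, i)| i)
}

// Property: if a press moves focus, the opposite press undoes it.
// Returns every (focus, direction) pair that breaks it.
fn violations() -> Vec<(usize, Dir)> {
    let mut out = Vec::new();
    for f in 0..ITEMS.len() {
        for &d in &[Dir::Up, Dir::Down, Dir::Left, Dir::Right] {
            let moved = move_focus(f, d);
            if moved != f && move_focus(moved, opposite(d)) != f {
                out.push((f, d));
            }
        }
    }
    out
}
```

Running the check finds exactly the described bug: from the bottom-left item, Right snaps to the top-right, but Left from there lands on the top-left instead of going back.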
Maybe it's true for CRUD apps. I was speaking from the experience of writing compilers. From what I've read, logical/algorithmic bugs are also common in gamedev. I'm most interested in algorithmic code though, so that may be a bias.
Logical and algorithmic code (& bugs) were a lot more common in gamedev before the onset of the open source game/physics engines. It has followed a similar evolution to the rest of software development whereby core algorithmic components tend to get shared in generic, battle hardened OSS systems like Unity. Most games will rely on their APIs rather than implement their own physics or rendering engines and this is only becoming more common over time.
College students who start out by writing toy compilers, quicksort implementations and physics engines will often get a distorted view of what professional development is actually like.
Someone has to write those libraries you're using, and they need to test it. Maybe you're satisfied with having a job consisting of gluing libraries together, but not everyone is like that and there is no need for being patronizing towards them.
abs on machine-sized integers has another easy-to-make mistake: what's the absolute value of the signed byte -128?
You face some hard decisions on that one. None of the error handling paradigms are good at handling the class of numeric errors that people really, really want to pretend don't exist because no matter what kind of error they throw they're a major pain to deal with correctly.
I'd agree, if the result type was the same as the input type. However, notice that the function I wrote returns an unsigned type, so in the case of byte-sized types it would be fn abs(x: i8) -> u8, i.e. there is enough space for 128. Otherwise, good point.
100% test coverage only covers one dimension of the code: the lines. The other dimension is the data range, and proper coverage of it is generally not measured.
So with 100% test coverage you still have tons of possible codepaths that can go sideways, well before including external variability.
My best testing practice is writing tests as needed, based on instinct. YES. I can feel whether a function needs a test or not. It hints at it by giving me a little anxiety about the expected result of that function. And I don't care about 100% coverage anymore.
So currently there is no approach that scales at all. Everyone will write tests differently.
And that's why we have code review process, and senior software engineers?
---
EDIT: add an example
When you look at the body of a function and think that if someone else changes some code you will have a hard time figuring out what went wrong, that's the time you need to add some tests to it. Often it's a function that took a number of iterations to work as you expect, or a function that is not simple but looks so easy you wrote it right in one go.
I’ve never seen an approach that scales. Once a team passes a certain size, it all turns into a disorganized shit show. Communication is hard, and as the pressure to earn revenue intensifies, there’s less & less time or buy in for craftsmanship.
Yep, true. I'm really familiar with the tension of time-to-market vs not fucking customers up. Modules that are directly tied to value for customers get more tests.
Why is 100% coverage stupid? I agree that there could be configuration or data classes that do not make sense to test, so these could be ignored by the coverage tool. Is it still stupid to aim for 100% coverage in the rest of the relevant, not-ignored code?
And (this gets into personal opinions) sacrificing depth of tests on critical codepaths in order to spend time getting to 100% coverage is also stupid.
Time is finite. Money is finite. 100% coverage provides nothing but false assurances and pretty green metric lights.
100% coverage measured by lines of code being hit is actually insufficient for proper test coverage (which is what most, if not all tools that measure code coverage do).
It's easy to get "100% code coverage" for the below function:
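Any branch-free function with an input-dependent bug fits the bill; for instance (a stand-in example of my own):

```rust
// One test call hits every line, so line coverage reports 100%,
// yet the overflow bug for large inputs is never exercised.
fn midpoint(a: i32, b: i32) -> i32 {
    (a + b) / 2 // overflows when a + b exceeds i32::MAX
}

// A correct variant for comparison: widen before adding.
fn midpoint_safe(a: i32, b: i32) -> i32 {
    ((a as i64 + b as i64) / 2) as i32
}
```

`midpoint(2, 4)` alone yields "100% coverage" of `midpoint`, while `midpoint(i32::MAX, i32::MAX)` still blows up; coverage tools count lines hit, not input ranges exercised.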
This is an area where suitable extracurricular activities can give more perspective on things.
In many disciplines there are activities that end up feeling like 'scaffolding', learning exercises that you don't necessarily do all the time. They make you a better practitioner the first time you do them, continue to add value for some time, and then are useful from time to time as a refresher.
I think if you've never done TDD, you're missing out. If you had to be dragged kicking and screaming into it, you're missing out (and have possibly damaged yourself in the process). You don't need to make a cult out of it (and I suspect some of the resistance to it comes from people who feel like they're being asked to join one), any more than you have to religiously clean your shoes to see value from owning a set of cleaning tools.
The same goes for pair programming. It makes you look at some classes of problem in a new way, and those lessons can stick with you even when you are working alone on a personal project.
Most of the time these days I tend to do a testing sandwich. When I'm still trying to figure out what the APIs will allow I'm free-form writing code trying to stick the concepts together. At some point I get stuck trying to juggle corner cases, and I realize that I'm grinding gears and need to write test code for a while. At this point I still get a lot of the ergonomics benefits of TDD because I'm not yet wedded to my implementation (low sunk cost) - because it doesn't work anyway.
Outside Smalltalk, TDD has never "worked" for me. But in a Smalltalk environment it was an absolute no-brainer and super slick because the environment was set up for it to just work. The nuance is the code-build-test cycle. In Smalltalk, they were all one and the same. In other languages and with other VMs, that cycle is disconnected. Likewise, once we moved to web applications, things changed. Now the unit we want to test is the API, or the widget, not really the function per se, but every TDD tradition has been at a much lower level. It overlooks that it takes real time and energy to build out tests at that higher level, and there are better tools than what have traditionally been "TDD."
We are in the year 2022. We don't have the technology to produce bug-free software-based solutions.
100% code coverage, as others mentioned in this thread, does not guarantee you exercised 100% of all possible input values.
That is why:
- You can pay millions of dollars to a software company, and when you install the product the usual license text says something in capitals along the lines of: NO GUARANTEES WHATSOEVER FOR ANYTHING... plus some will mention you can't use it for controlling X-ray machines, nuclear power generators and so on.
- It is also why software that lives depend on will normally use some type of summation- or consensus-based design, like the Shuttle had with its 3/4 computers.
I don't think it's entirely a technological limitation. More often than not, the discovery of a bug reflects increased understanding of what we require from the code. We often say code is buggy when it surprises us in some fashion; then we tacitly invent a new requirement and say the code has violated it.
If you have a function
F(signed int32 n) -> (signed int32 A, signed int32 B)
that returns two integers (A, B) so that
A*B = n
and we discover that F(2) -> (-2, 2147483647), which is entirely correct in a language that permits integer overflow, we call it a bug because A and B must be smaller than n (or whatever). This was not a requirement until the bug was discovered.
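The arithmetic above can be checked directly. A quick sketch in Python simulating two's-complement 32-bit multiplication (the helper name is mine, not part of the example above):

```python
def mul_int32(a: int, b: int) -> int:
    """Multiply with two's-complement 32-bit wrap-around semantics."""
    r = (a * b) & 0xFFFFFFFF              # keep only the low 32 bits
    return r - 0x100000000 if r >= 0x80000000 else r

# -2 * 2147483647 overflows and wraps back around to 2, so
# F(2) -> (-2, 2147483647) really does satisfy A*B = n for n = 2.
print(mul_int32(-2, 2147483647))  # prints 2
```

So the "bug" is only a bug relative to an unstated requirement on the magnitude of A and B.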
I’m not really sure if this was understandable, but I’m basically in the same camp as you are.
It’s next to impossible to have 100% bug free code. Automatic testing helps and TDD helps to write tests, but it’s not the goal. The goal is to create code that is well tested and easy to change.
I talked about software-based solutions; that is also why I referred to summation/consensus-based systems like the Shuttle had. I even believe they used different programming languages, since of course compilers can also have bugs.
To use an example: seL4 is formally proven correct against its specification. But how did we get the specification itself right?
It is a lot easier to get the spec right than getting the implementation right.
> How do seL4 or CompCert protect me against hardware issues like ECC errors, or the Pentium FDIV?
That's a straw man argument. Nobody claims that formal proofs can protect you from hardware failures. However, formal proofs can protect you against Pentium FDIV-style bugs (see the ARM formal specification efforts). Intel started massively investing in formal proof methods because of the not-formally-proven-correct Pentium FDIV fiasco.
Thanks for replying, but I insisted in my previous post that I was referring to systems and solutions.
In real-world use, outside academic contexts, it is of limited usefulness for me to claim that my component used TDD and was formally verified while throwing the responsibility for delivering a system that can't fail over the wall to somebody else.
So my claim is: we don't know how to do it. We can formally verify some components, and we hope the spec is correct.
Forgetting that the spec only covers the known knowns and known unknowns, but not the unknown unknowns, is a kind of intellectual hubris I am not willing to bet lives on. Certainly not under the disguise of using formal methods.
So I am referring to examples like this:
"A380 Flight Controls overview" [1]
• 3 PRIMary Flight Control and Guidance Computers: integration of Auto Flight (ex FGEC) and Flight Control (ex FCPC)
• 3 Auto-Pilots
• 3 SECondary Flight Control Computers: dissimilar software and hardware, simpler control laws
There's a hierarchy of smartness: good enough to write perfect code and need no tests, good enough to catch problems with tests, good enough to believe that better techniques would have allowed catching bugs that slipped through tests.
Any technique, in retrospect, could have been applied better, in an improved form.
Quite a few people are saying it doesn't work. I have worked on TDD codebases where any time you changed a function or tried to refactor something trivial, it took 3x the amount of time to fix the tests, even though the code worked correctly as written. This is due to mocking, or naming, or expecting functions to never change. TDD proponents would say that is the "wrong" way to do it, which I'm sure it is, but that doesn't stop it from happening and killing productivity.
This matches what the article says:
> "You change a little thing in your code and the only thing the tests suite tells you is that you will be busy the rest of the day fixing false positives."
particularly since TDD organizations often maintain the constraint that the tests _should never change_ since they somehow embody the requirements. Personally, I'm not settled on an architecture until I'm part of the way through... I can't really understand how it's all going to fit together until I'm in the process of building it.
...and the 3x amount of time isn't even the biggest downside when that happens; that honor goes to how much the value of those tests erodes while they are being adapted.
I have encountered this too, just as you and the author did. However, when I switched to integration testing behaviors rather than unit testing low level functions as the author suggests it went away.
Traditional TDD proponents would say that what we're doing is definitely not orthodox TDD but IMHO it's the only way that actually works.
> But how many test suites have you seen which are less helpful?
I got ya. I have worked on software with a buggered mix of mocked and integration tests that failed about one time in four. But since the bug was in some asynchronous code and would appear in random tests, with random errors (even in thoroughly mocked tests), we couldn't easily squash it. It was one of those heisenbugs where enabling logging would cause all the tests to pass consistently.
And to put icing on this cake, the software worked: this bug wouldn't appear in the production deploy.
So, yeah. The test suite was less than helpful. It certainly couldn't prove any qualities about the software we were writing.
BDD is more about having conversations about behavior with stakeholders using examples.
If it's translated into tests it becomes ATDD. These make excellent tests which catch bugs and provide a safety harness for refactoring. It is the sole place where "TDD" actually shines.
Lower-level TDD that isn't about that at all is where all the problems are (including, frequently, driving poor design).
Let's say you are doing a classicist style TDD, where you try to mock at the highest level you can (files, network, UI) and then just exercise all the classes in between until you reach your target piece of code.
Can you give me examples of TDD producing a bad design?
I can see how a mockist approach, where you mock everything for each function you test, can be detrimental very quickly. But I have the feeling I'm mastering the classicist approach, and I don't see how tests can negatively impact design.
This second definition was retroactively added once it became clear that TDD’s not good at testing. Unfortunately it’s even worse at designing since it’s only capable of “designing” how something may be tested and used without addressing any e.g. non-functional requirements, data structures or algorithms.
There’s tension between testability and good design (encapsulation, achieving more through abstraction, hiding/reducing complexity, usability, etc). Starting the design process from API or method calls makes no sense at all unless one wanted to reanimate a failed testing method.
I would call code coverage "good" when I trust that:
- When I change the logic of a statement somewhere in the code, at least one test in the suite will fail.
A high % in a test report may give an indication but is far from giving me confidence. Having good code coverage means that, once you are back to green and you have done to the test code whatever keeps or improves that confidence, you can send the PR tranquil in the knowledge that you didn't break anything else.
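That definition of confidence is essentially what mutation testing automates (tools like mutmut or PIT flip operators and check that at least one test fails). A hand-rolled sketch of the idea, with all names hypothetical:

```python
# A function, a deliberately broken "mutant" of it, and a tiny test suite.
# Coverage is "good" in the sense above when the suite kills the mutant.
def is_adult(age: int) -> bool:
    return age >= 18

def is_adult_mutant(age: int) -> bool:
    return age > 18  # mutation: boundary flipped from >= to >

def suite_passes(fn) -> bool:
    """Run the 'test suite' against an implementation; True when all assertions hold."""
    try:
        assert fn(17) is False
        assert fn(18) is True   # the boundary case is what kills the mutant
        return True
    except AssertionError:
        return False
```

`suite_passes(is_adult)` holds while `suite_passes(is_adult_mutant)` does not, which is exactly the kind of confidence a raw coverage percentage cannot give: both functions are 100% covered by the same suite.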
> The big tdd misunderstanding is to believe that it provides good test coverage.
Have you considered that, because TDD is a design practice (a 'meta-design') for writing code, there is no "one correct" interpretation, or misunderstanding, just different understandings? Yes, the people might be misusing it as per your understanding of it, but you are not the center of the universe here.
You are correct that "it's about X", the people who are misusing it are also correct about their interpretation of it. Fundamentally, all TDD is the 5 steps of "writing a test first, and writing code that makes that test pass", along with some mentions of other meta-design approaches like KISS, etc.
The way someone understands a meta-design is driven by a lot of factors, the way someone uses a meta-design is subject to the cultural context in the place you use it, the design restrictions and constraints of your tooling, the meta-design constraints of what practices management is willing for you to devote time to, etc.
So fundamentally, you cannot create a design practice (or meta-design) that is applicable to all scenarios and circumstances. TDD will be outright inappropriate for some situations. This is exactly what "design patterns" are about, finding similar or identical ways of designing code, categorizing them, and figuring out what circumstances they do and don't work in. Now, I guess we need "design practices patterns", so we as an industry can move on to figuring out what practices are fit for certain scenarios, and what scenarios they are unfit for.
Ultimately, anything less than the understanding that "context is king" is likely to lead to people just going around in circles and talking past each other.
I have not read the article yet, but I guess it can be solved using incremental tests. First we write the tests for the core features of the API; then, while implementing it to pass the tests, we'll find that there are some edge cases that we have not considered, so we write tests for them.
In addition, test frameworks like Jest provide a good API around coverage.
I had an interview a few years ago where I had to solve FizzBuzz in a TDD way; I realised quite quickly that solving the problem as described with TDD resulted in a more complicated solution, including things like changing the API to allow me to inject a "printer" so that the test could pass in a double instead.
Still a very easy problem, but the irony of TDD making FizzBuzz harder made me smile.
> including things like changing the API to allow me to inject a "printer" so that the test could pass a double instead.
In what language? Isn't the basic case of fizzbuzz a procedure/function that takes a number and returns a string (if you are strongly typed, not able/willing to return a complex type)?
Printing and reading input would be side-effects, surely? (So you plug the input into your fizzbuzz, and the output into a stream, maybe via a formatter?)
I don't think treating it as two subproblems: generate a sequence xyz, print the sequence to standard out - is terrible over-engineering - especially if it enables simpler tests?
Edit: or in the case of fizz-buzz: a function to transform a number (returning "<number>", "fizz", "buzz" or "fizz-buzz"), a function that maps a sequence (e.g. 1..100) via fizzbuzz(), and then a function that prints such a sequence.
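That decomposition is easy to sketch. A minimal Python version along those lines (the structure is my reading of the comment above, and the function names are mine):

```python
def fizzbuzz(n: int) -> str:
    """Transform a single number into its fizz-buzz word."""
    if n % 15 == 0:
        return "fizz-buzz"
    if n % 3 == 0:
        return "fizz"
    if n % 5 == 0:
        return "buzz"
    return str(n)

def fizzbuzz_sequence(numbers):
    """Map a sequence of numbers through fizzbuzz()."""
    return [fizzbuzz(n) for n in numbers]

def print_sequence(lines, write=print):
    """The only side-effecting part; `write` is injectable, so tests can pass a double."""
    for line in lines:
        write(line)

print_sequence(fizzbuzz_sequence(range(1, 101)))
```

Only `print_sequence` touches the outside world, so the two pure functions can be tested with plain assertions and no mocking at all.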
As a corollary, it often results in mostly functional code that is easy to test without any mocks, so good coverage usually happens "by accident" (not really).
Tests are executable examples of code under use in different scenarios, usually showing both input and output.
Textual documentation is an attempt to extract the most useful bits of the first two into a more manageable form, and potentially explain the reasons for said code's existence if it's not obvious. It will always go out of sync with code in small ways that will bite you, unless you've got executable documentation like some OpenAPI implementations or the venerable "doctests" in Python.
All of them are ways to learn about the API, or iow, "documentation".
Another huge problem with TDD is that a lot of practitioners do not create negative tests. A negative test verifies that something the system shouldn't do is not being done. For example, if you shouldn't allow unauthenticated users to delete something, there should be a test to verify that unauthenticated users can't delete something. But if you only add tests that fail and are then made to succeed via implementation, you would probably never think to write those kinds of tests. You certainly can create negative tests with TDD, but most docs on TDD never discuss it.
This is how I’ve been making sure multitenancy in my app doesn’t leak data. It shouldn’t leak but how can I be sure? Run tests that try to access data on one tenant from another tenant and fail if the data is accessible.
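A negative test of this kind can be tiny. A sketch of the shape (the store and its API here are hypothetical stand-ins, not the commenter's actual app):

```python
# Hypothetical in-memory stand-in for a multi-tenant store; the point is the
# shape of the negative test, not the storage mechanism.
class TenantStore:
    def __init__(self):
        self._rows = {}  # (tenant_id, doc_id) -> data

    def put(self, tenant_id: str, doc_id: str, data: dict) -> None:
        self._rows[(tenant_id, doc_id)] = data

    def get(self, tenant_id: str, doc_id: str) -> dict:
        key = (tenant_id, doc_id)
        if key not in self._rows:
            # Deliberately indistinguishable from "does not exist"
            raise PermissionError("document not visible to this tenant")
        return self._rows[key]

def test_cross_tenant_access_denied():
    """Negative test: tenant B must not be able to read tenant A's document."""
    store = TenantStore()
    store.put("tenant-a", "doc-1", {"secret": 42})
    try:
        store.get("tenant-b", "doc-1")
        assert False, "cross-tenant read should have failed"
    except PermissionError:
        pass  # the leak we feared did not happen
```

Note the test asserts that an operation fails, which is exactly the kind of test a pure red-green loop rarely prompts you to write.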
Most of the time we write tests when something is on our radar.
I guess you have a use case in your spec that tells you that's the expected behaviour, or at least some clever developer figured out it should be done that way.
But it is not always so "easy": Sometimes you don't have the domain knowledge, or the spec doesn't say a word about negative cases...
> Run tests that try to access data on one tenant from another tenant and fail if the data is accessible.
That tells you that the method your tests use to try to access data doesn't work, not that it's impossible to access data across tenants. If there's a leak you haven't tested for, then your app is still insecure.
It'd be better to think of a design that actually makes it impossible (assuming that's even possible for your scenario).
I think it's well-designed, but I would definitely want someone to go over it and look for any obvious issues if I made it public. I wanted it to be as annoying as possible to do something outside of the tenant context, and it is definitely annoying. Absent an explicit flag to disable tenancy for a specific operation, the operation will fail, raising an exception that the tenant id was not set. This is mostly enforced at the database interface right now, i.e. the interface ensures that a tenant id is present in the query. Then in the database itself, null values for the tenant id aren't allowed. I also have a test that runs over every table and ensures there is a tenant id column, except for those tables that are explicitly listed in the test as not needing one. I also have it set up to not expose the tenant id to the user at all, instead opting for a "public id" with limited utility, used only where needed.
Agreed: tests only exercise discrete "points" of the input space (even with things like Hypothesis-style property testing and randomized inputs), and in general, in pure TDD, it is sufficient to test one representative of each class of points (e.g. only one negative case to ensure the negative path of the code is triggered).
But we should not forget we are all working against legacy code bases and libraries outside our control, and sometimes, tests might be the best available tool for achieving something other than their primary purpose, which I'd happily accept as a fun hack.
You can certainly sometimes write a test that fails first to check that an unauthenticated user can't delete something.
You write the test before you have implemented the authentication system - either entirely, or just in the "delete something" subsystem. You write a test that says that when an unauthenticated user tries to delete something, an error of the appropriate kind happens. The test fails. Then you implement the authentication system to make the test pass.
I have written this exact kind of test.
I do not universally do "TDD" though, I probably write a failing test before I've written code less than 25% of the time. I do write what you call "negative" tests, sometimes TDD sometimes not -- I think I don't write "negative" tests TDD any less often than any other kind of test. I may not write "negative" tests enough, but it's not because of TDD (since I don't mostly write tests first!).
So I'm not here being a universal TDD adherent. I guess I'm just being boring and unhelpful: "it depends, and I just use my intuition built on years of developing." (I think this kind of intuition generally grows after years of developing on the same platform, so people switching tech all the time doesn't help. But that's another topic.)
I am aware "you just need to have skilled developers using their judgement" isn't particularly helpful. The attraction of TDD is the idea that you don't need that: a novice can follow this process and still achieve reliable and robust results, efficiently. I don't really agree... but I also think it's important to know about and have experienced TDD so you will have it in your toolbox to apply sometimes. I don't think "negative tests" is a particularly useful category for deciding when - I have definitely done TDD usefully with "negative tests"; you can totally write a failing test for something the system shouldn't do... in many circumstances.
I agree that TDD in the hands of an inexperienced developer isn't a substitute for stepping back and thinking about design/architecture, you still need to think about design/architecture. But also that TDD has definitely sometimes helped me think about design/architecture -- sometimes in a really profoundly useful way. The very unhelpful "it depends".
The number of code paths that even simple code can traverse can become vastly large very quickly. Negative testing is often like shooting in the dark.
Negative tests, from what I've seen, accrue from encountering unexpected results from code in the wild and writing tests to make sure those cases are handled. They are a result of code being used, not of proactive or predictive design.
If you would design it why wouldn't you test for it? If I were writing authentication and authorization I would test that unauthenticated and unauthorized activities produced the correct errors.
The big TDD misunderstanding is that like any tool it is to be wielded when appropriate, not followed rigidly for every single piece of code you write.
There are some safety standards and practices in engineering that are matters of life and death. If you're working high above the ground, you need two points of safety clipping, and if you're moving you never disconnect both of them. This is non-negotiable. There is never an appropriate time to violate this. If there is no way to complete the job without constant safety harness restraint, the job doesn't get done. (At least in the developed world.)
This is not the category where TDD or rules like "Never change your code without having a red test." fall in.
I adore TDD. It made me a better engineer, and I followed the process rigorously in my first 5-10 years of my career.
But at some point I realized that what that did was teach me how to write modular, extensible, testable code, and now I can write modules like that upfront and add tests later. And truthfully, I do go faster, because I then focus my tests on JUST the important logic, not the intermediate abstractions in between. I write unit tests for the critical functionality and *integration* tests for the whole system to make sure the whole thing works as expected (because the dirty truth of unit tests is that you can have 100% test coverage but a non-functional system due to frameworks and glue code being incorrectly configured).
High test coverage is the worst metric one can aim for. As soon as frameworks and external APIs are involved, it's basically impossible to achieve without bending over backwards and needlessly creating mocks for half your system.
My biggest pet peeve in this context are test coverage requirements for OO languages that feature properties or getters and setters. 100% test coverage would require writing "tests" for those, which basically amounts to testing whether the compiler works - madness!
> "Originally the unit in “unit test” did not refer to the system under test but the test itself. Meaning the test can be executed as one unit and does not depend on other tests to run upfront (see here and here)."
I find the links provided to support this proposition underwhelming. In any case, it matters zero. No-one interprets "unit test" that way any more. Everyone interprets it the other way round, as defined on Wikipedia:
> "In computer programming, unit testing is a software testing method by which individual units of source code—sets of one or more computer program modules together with associated control data, usage procedures, and operating procedures—are tested to determine whether they are fit for use. Unit tests are typically automated tests written and run by software developers to ensure that a section of an application (known as the "unit") meets its design and behaves as intended. In procedural programming, a unit could be an entire module, but it is more commonly an individual function or procedure. In object-oriented programming, a unit is often an entire interface, such as a class, or an individual method." -- https://en.wikipedia.org/wiki/Unit_testing
It's generally accepted good practice to not have dependencies between unit tests.
The central idea of a ‘unit’ test is that its failure should implicate one, and only one, part of the codebase. So while isolating tests is a necessary part of this (stale state from other tests would cloud the reasons for failure), I’ve never heard this isolation proposed as the primary feature of a unit test.
I've come to disagree with the idea that a test is for only one part of the code. In the real world if a test breaks you know what broke it: the last change you made. Thus it doesn't matter in the worst case, and in the best case it warns you of a weird situation more isolated tests wouldn't have caught.
There is such a thing as tests that fail randomly of course. Investigate and fix them.
You could also say that no dependencies between unit tests implies no dependencies between the modules under test. Hence the article's claim is incorrect, as their interpretation implies the very interpretation they are refuting.
> Instead the pyramid says we should write a lot of unit tests. This leads to an inside out approach testing the structure of the system rather its behaviour.
No it does not. Unit tests should test behavior.
> Only use mocks for truly external systems (e.g. mail service). The database should be part of the tests. Do not stub it.
First, mocks and stubs are not the same thing.
Secondly, stubbing fragile dependencies in unit tests is a good rule of thumb if you want easy, fast and non-erratic tests.
> Never change your code without having a red test. This is pretty common practice in TDD.
No. In TDD we change our code all the time without a red test; we call it refactoring, and we only do that when the tests are green.
The biggest misunderstanding about TDD is that you need to do TDD to be good engineer and write good code as promoted by some TDD evangelists. Writing tests is important, writing good tests is important... what is not important is the order you write the tests or if you follow an arcane set of "rules" about how you should write tests as if they are some ancient divine wisdom someone discovered on a stone tablet that's the key to writing good software.
No... forget about writing tests inside out or outside in, top down or bottom up. Forget about the bs about "TDD is actually there to help you design the code so you SHOULD write tests first to expose flaws in your design". It does not matter as long as you write good tests and feel productive while writing code.
Maybe TDD "rules" and rituals are useful when you are a relative beginner and instead of overwhelming them with a ton of information it is nice to have a set of rules and steps you can follow that have a good chance of resulting in a decent final product. But at a certain point you need to what works best for you and ignore the TDD dogma.
I am not anti-TDD; I practice TDD for parts of the code, but in a way that a TDD purist would call blasphemous. I never use TDD as a design tool; on the contrary, I always design the general structure of a class or a package based on intuition and experience, without writing a single line of test code. Then, when I have the general design and skeleton functionality of the module fleshed out, I jump into tests and start doing TDD to fill out the functionality, address edge cases and so on. Coupling design with tests slows me down, as it results in a lot of thrashing between tests and real code. I think a large number of developers feel this way and have never really bought into the "TDD is for design" hype. If, on the other hand, TDD helps you design, go for it. The point is that TDD should not be treated like the religious dogma it often is.
The biggest advantage of TDD is the dopamine hit it gives you when a test goes from red to green. That hit is so powerful that I continue to do TDD maybe 50% of the time, but I don't do it dogmatically like TDD purists would want people to.
I think the thing is, you can absolutely write the code and then write the tests afterwards, but you might end up with something that is hard to test (which is probably an indication your code is not properly modularized).
You should take that as a sign you need to rewrite the code you’ve just written, but a lot of people don’t want to trash all (or part of) their hard work and instead bend the tests so that they deal with the code.
You now have bad code, and you have bad/fragile tests.
I think TDD (when enforced) is mostly meant to prevent people from falling for that.
I don't always write the tests first, I do what seems practical and feasible. Sometimes it feels more natural to write some code to understand the problem I'm solving.
That said, there is a problem with writing tests last: you don't actually know if your tests are going to fail when they should, because you have never run them against a code-base that doesn't yet implement the behaviour.
You can deal with this by stashing the implementation changes and running the tests to confirm they fail where expected, but sometimes this is very challenging because you need a partial implementation to reach some of the failed assertions. This requires thoughtfulness and discipline and so it is hard to maintain this consistently across a team. It isn't possible in a code review to confirm if people have done this, and often it isn't practical to do it yourself as part of reviewing code.
TDD is "just" a technique, and being dogmatic about it, is as you say totally pointless. Still I would consider knowing and practicing this technique a "must known" in the industry. When I look back to my 15 years coding :
- the time I lost creating bad designed code and manual testing is huge
- all the companies and projects I have joined always have terrible designed/unmaintainable code, made sometime by 10y+ experienced people and explaining at least how to write proper test and how to design a code that is testable is always the game changer to increase a project velocity and down the defect rate
I will start by saying that I think TDD is very valuable. But reading this article that quickly goes into a numbered list of "rules" is a little disheartening. Already some other commenters are talking about more nuanced views of when to test. I think the real tragedy is that I see a lot of engineers still not writing tests. So when speaking to the broader audience, you have to be really gung-ho about it to get people to do what I think is a very necessary step. People should err on the side of writing the test first, both to get most people in the right mindset and to avoid the broken-window syndrome of a lot of missing tests and no one bothering to start doing what should be obvious: most code should have at least some level of coverage. Most, in fact. If there are "reasons", sure; if you're going to pull the senior-engineer move and skip the test, then you had better be able to back it up clearly. If not, write the damn test. If not for you, someone will appreciate it later.
This statement made me think: "The database should be part of the tests. Do not stub it".
Databases are both 1) complex pieces of software and 2) hard to stub.
5-10 years ago having a database for testing was expensive.
Today you don't need to pay licenses for most databases, and you can spin up a disposable one in a container. Easier than mocking the DB itself.
If, for other reasons, I follow hexagonal architecture, replacing the PostgresProjectionsRepository with a MemoryProjectionsRepository is both trivial and easy. Replacing an S3LogStore with a StdOutLogStore is just as easy and trivial.
But, if, God forbid, I'm stuck with a tangled mess of Rails ActiveRecord Models that break every SOLID principle with both feet, are highly coupled to the database and often even arbitrarily stick business logic in the DB or in the models, then certainly: it is hard.
And while this sounds like a rant on AR (it is!), ActiveRecord has its positive trade-offs, or, often, simply is there. Pragmatism dictates that sometimes the database is hard or impossible to stub and you're far better off just giving up and treating the entire directory of models+the-fully-seeded-database as a single unit instead.
But, what the OP of the article overlooks entirely - and what many haters of TDD miss completely: the tests are yelling important information at us: the design is a mess, there is too much coupling, we lack abstractions, there's too much going on, we are building God-classes and so on.
But, again, sometimes pragmatism dictates we ignore this and go for the bad design anyway. As long as we pro-actively chose to go for the Bad Design (rather than unknowingly grow into it) this is fine, IMO.
> Not as simple if you are using any "advanced" PG feature.
If those "advanced" features leak into your domain, you have tight coupling, dependencies and poor testability (the latter is screaming that you have the first problems above all).
If anything, such advanced features are best tucked away behind abstractions.
This "projection" can use postgis, may store data in json columns, might have advanced materialized views adjoined on read, and so on. For the adapter, (edit: for the user of the adapter) it matters not.
Which has only benefits. And one trade-off: it requires careful design and thought, which requires information that you often lack at design time.
If you're writing software for an audience of "one" (one organization or such), is it really that bad to have it be tightly coupled? You're likely writing something so specific that no one else will be using it anyway.
If you write software that no-one uses, then I guess it's fine to write it in Brainfuck on an Atari.
But I consider myself a user too. I consider myself-in-ten-years a user too. I'm sometimes still maintaining some crap that I wrote 20 years ago and boy I wished I didn't cut that many corners back then.
Tight coupling brings a large range of problems; loose coupling brings a large range of benefits. The first of only two major downsides of loosely coupled code that I have encountered arises when the coupling is done in the wrong place: poorly abstracted code is often worse than not-abstracted code.
The second downside is that in order to properly (un)couple code, you need knowledge that you often lack at the point of writing (often resulting in downside #1), and gaining that knowledge can significantly stall a project (aka: we cannot start building because we haven't found the perfect model yet).
Other than that: loosely coupled code is much more fun to work with, for one, which means I like working on some projects but really dread working on others. Having fun, alone, is enough benefit for me. Even in my private one-off tools (which then turn out not to be one-off but are dragged along for literally decades, sigh).
From a purist's point of view, maybe. But as soon as it's a non-trivial DB with constraints/referential integrity/triggers/etc., it quickly becomes impossible to test those without an actual DB.
> But as soon as it's a non-trivial DB with constraints/referential integrity/triggers/etc
Which is probably a reason not to use them? For me it clearly is. Everything is a trade-off, and "not being able to test it" is such a big downside that I'd rather not use it at all. Together with all the other downsides that tight coupling brings, for me it almost always balances towards "just avoid it".
For example: I've worked on projects that were highly coupled to Oracle and MySQL databases, where large pieces of business logic were (arbitrarily) scattered over the database. So when Oracle went in directions (and towards prices) that my customer could not warrant, and when MySQL simply could no longer handle the performance, we had to rewrite a lot (in Postgres, mostly) - often pieces from scratch entirely. One downside of tight coupling is "lock-in"; there are many more. So in that project we avoided the same mistake that brought us there: we avoided most fancy features and kept all business logic in the business-logic layer and out of the DB. We tucked away fancy PG features behind simple and clean facade patterns.
In my personal experience, to make TDD work (where you make many small changes, run the tests, make more changes - step by step), tests need to run FAST - ideally in less than 1 second.
Even if it is easy to spin up a container, seed a database inside that container, and drop and recreate that database after every single test (so that tests remain isolated and can't influence each other) - it tends to take a few seconds, which instantly makes TDD tedious. Thus you start making more changes at once, run the tests less, and drift away from doing TDD...
The most typical solution is to run databases in-memory rather than in a container.
My personal approach though is to make my code more modular, so that only the "database accessor" class needs to mock the actual database, while the "query builder" needs to mock only that database accessor (much easier) - and the database class needs to only mock the query builder (even easier) - and the actual application code needs to only mock the database class (super easy).
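A minimal sketch of that layering (names like QueryBuilder and Database are made up for illustration): each layer's tests mock only the layer directly below it, so the mocks stay trivial.

```python
from unittest.mock import Mock

class QueryBuilder:
    """Innermost layer: builds SQL strings. Pure logic, nothing to mock."""
    def select_user(self, user_id: int) -> str:
        return f"SELECT * FROM users WHERE id = {user_id}"

class Database:
    """Middle layer: testable with just a mocked accessor."""
    def __init__(self, accessor, builder):
        self.accessor = accessor
        self.builder = builder
    def get_user(self, user_id):
        return self.accessor.run(self.builder.select_user(user_id))

def greeting(db, user_id):
    """Application code: only needs to mock the Database class. Super easy."""
    user = db.get_user(user_id)
    return f"Hello, {user['name']}!"

# Testing the middle layer: mock only the accessor, the layer directly below.
accessor = Mock()
accessor.run.return_value = {"name": "Ada"}
db = Database(accessor, QueryBuilder())
assert db.get_user(1) == {"name": "Ada"}
accessor.run.assert_called_with("SELECT * FROM users WHERE id = 1")

# Testing the application code: mock only the Database class.
fake_db = Mock()
fake_db.get_user.return_value = {"name": "Ada"}
assert greeting(fake_db, 1) == "Hello, Ada!"
```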
> Even if it is easy to spin up a container, seed a database inside that container and drop and recreate that database after every single test
You would create a container image with everything already prepared how it should be, and you destroy it after every run - that's already a lot faster than what you're proposing.
It is not hard to isolate and stub a database; the problem is you will not catch a lot of errors, e.g. writing a 40-char string into a VARCHAR(10), DateTime conversions, ...
In Ruby on Rails it is quite common to use a real database in the unit tests - even before containers came out - and it is still fast (yes, less than 1 second).
In Rails you use the same database instance for all the tests, so you do not have to spin up a database instance for every case. Instead you use a database cleaner.
You will, though, end up fighting the system in some ways. CI/CD systems, unit test frameworks, configuration data approaches, etc, aren't equipped to (easily) spin up databases and connect to them. At least not in the more rapid cycle early end of development, versus slower cycling integration tests.
It can still be very expensive after a while. I assume TDD practitioners also practice pruning their test suites to keep them from snowballing indefinitely. If you don't, then eventually running hundreds of thousands of tests against a real database is painful both in terms of waiting time and resource usage.
If you've got hundreds of thousands of tests you've got a bigger problem than a slow test suite. Solution is to go to a modular or service architecture to break the application into bits that can be reasonably tested.
That's certainly true, but in a typical web application it's quite common that the specific behaviour of a given database isn't interesting enough to warrant testing so rigorously. In those cases, integrating with the database is perhaps a less sensible choice economically.
The GUI example:
1) Make the webdriver-based test click a button which does not exist yet. The test will fail; you know that there is no other button which by chance matches this selector.
2) Add the button to the app; the test turns green.
3) Continue to extend the test to click another button, etc., until it fails again.
The Game example:
e2e tests for games probably do not make much sense (I've no experience here), so you will have component/unit tests. Those have a defined interface and behaviour. If you want to change this, you change the test first, like in any other project.
It sounds like you are testing the wrong things on the wrong level.
The test checking the correct color for a 'save button' in a specific configuration would most likely be a unit test for the "SaveButtonComponent", which simply checks that the correct attributes are set or passed on to a library.
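A sketch of what such a unit test could look like; SaveButtonComponent and its attributes are entirely hypothetical, the point is that no browser or rendering is involved.

```python
class SaveButtonComponent:
    """Hypothetical component: computes the attributes handed to the GUI lib."""
    def __init__(self, dirty: bool):
        self.dirty = dirty

    def attributes(self) -> dict:
        # The component's only job: decide what the GUI library gets told.
        return {
            "label": "Save",
            "color": "orange" if self.dirty else "grey",
            "enabled": self.dirty,
        }

# The unit test checks the attributes directly -- no browser, no rendering.
assert SaveButtonComponent(dirty=True).attributes()["color"] == "orange"
assert SaveButtonComponent(dirty=False).attributes()["color"] == "grey"
```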
When you insist on having some coverage for the colors at the level of e2e tests, there are numerous tricks, most of which you'd need to do only once and certainly not for every permutation of state. Like making a screenshot and diffing that with a fixture, then failing on a too-big delta. Or like running an accessibility report over the app and failing if the values for things like "readability" drop below a threshold. And so on.
You mention XAML, and I have little experience with Windows GUI dev, but many setups, frameworks etc. employ some adapter or layered architecture, where the layer that drives the actual GUI library is but an adapter. So the tests can swap out the actual GUI drawing layer for a test version thereof, and do things like "confirmationModal.confirm()" without going through the entire UI. Or tests can simply call "confirmationModal.buttons.map((b) => b.colors.backGround)" to get to the colors that are otherwise passed on to the actual GUI lib.
This is why testing-pyramid is so important: to test on the correct level.
I can write the `SaveButtonTest` before writing the `SaveButton`. I can write a `VisualComparisonTest.itKitchenSinkPageScreenshotMatchesFixture()` before having the KitchenSink view. I can write an AccessibilityReportThreshold.save_game_modal_meets_a11y_reqs() before having any application code to run the report on (it will fail with an error, which is expected and good).
The fact that you condescendingly call it "TDD religion", however, tells me I'm not going to convince you of the value of tests at all, let alone the difference in value from the moment of writing the test.
When I did TDD, we'd start by writing a test. The tests wouldn't compile (Java) or link (C++) because the function being tested didn't exist yet. Then we'd write the (empty) function. The test would fail, because the function didn't exist. (It's very important that you see the test fail! You need to know that the test actually can detect if the function doesn't work.)
Then we'd implement the function, and re-run the test.
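A minimal illustration of that red-green cycle (the function and values here are made up; the comments mark the phases):

```python
# Step 1 (red): write the test first. Running it before `leap_year` exists
# fails with a NameError -- proof that the test can detect a missing or
# broken function.
def test_leap_year():
    assert leap_year(2000) is True   # divisible by 400
    assert leap_year(1900) is False  # divisible by 100 but not 400
    assert leap_year(2024) is True   # divisible by 4
    assert leap_year(2023) is False  # not divisible by 4

# Step 2 (green): implement just enough to make the test pass.
def leap_year(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

test_leap_year()  # re-run: now passes
```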
>Do not isolate code when you test it. If you do so, the tests become fragile and do not help you in case you refactor the software.
I don't agree with this bit. This is how you make fragile test suites that are difficult to refactor. Why do we go out of our way to hide information in our code? Answer: because code that spaghettis in and out of a lot of other code means small changes cause unexpected regressions, and the changes to fix that regression cause other regressions. Suites that don't isolate code have exactly this problem. They are like a scavenger hunt of assumptions and inevitably lead to people fixing the tests instead of the code. In addition, because you have introduced a network effect, a small change will likely invalidate a huge number of tests, meaning you have to update them all.
Integration suites have value, but I use them with discretion because I know that they are a challenge to maintain. Tight unit tests that test small pieces of logic thoroughly and make ample use of mocks to ensure deterministic outcomes are a must. These tests are highly durable. The code can change all around it but that function will work as advertised until someone changes it. Then they can update its test suite and you're back in the golden zone. In my experience if you build a complex structure out of a lot of quality components that complex structure will tend to be high quality itself. My pattern usually pushes the complexity into the units and I leave "orchestration" units that tie everything together untested but intentionally simple.
I know the code coverage folks will chime in for sure, but I don't aim for 100% code coverage. I aim for a balanced test suite that ensures the best possible outcome for the minimum investment with the least amount of hassle when introducing new changes. Trying to test everything never seems to get me there.
In reality it depends if you should isolate the code or not, are you writing a unit test or an integration test?
If you wrote an algorithmic function crucial to your app, isolate that MF and write a unit test. If you're testing a transformer or API response, you don't need to isolate each layer and run/write your integration tests at this level. If you can't isolate your code for unittest and you're writing OOP your architecture probably needs to be rethought.
That said, my goal when writing something new is to have all the critical/core functionality covered on the backend, with a test case for edge cases. The biggest goal is I don't know who will be working on it next, and if they misunderstand what a function is doing because they are just trying to fix a "bug", the test case should catch it. That's why it's important in TDD to have one person write the tests and someone else write the code. More so if your team is a clusterfuck, particularly if some of the senior devs are hacks.
> Answer because code that spaghettis in an out of a lot of other code means small changes cause unexpected regressions
That's not a fragile test suite, that's a test suite working well: you want tests to start failing if you introduced a regression to the overall program behaviour.
> the changes to fix that regression causes other regressions
It sounds like what you're really getting at is something different from test fragility: specifically, if a test fails, how easy is it to identify where in the code a fix needs to be applied. That is also an important consideration when writing tests but it's completely separate from test fragility.
> Integration suites have value, but I use them with discretion because I know that they are a challenge to maintain.
This is the opposite of how the test pyramid is supposed to work. Integration tests are supposed to be less fragile because they don't rely on specific implementation details or mocks: they're mimicking a real user or a consumer of your API. If you are getting false positives from your integration tests a lot, then either the tests are written poorly, or you're making too many breaking changes to your library/product/whatever.
> Tight unit tests that test small pieces of logic thoroughly and make ample use of mocks to ensure deterministic outcomes are a must. The code can change all around it but that function will work as advertised until someone changes it. Then they can update its test suite and you're back in the golden zone.
That's completely useless as a test then. Of course if you change code around it the test won't break. The whole point of a test is that it doesn't break even for (some) changes to the code being tested, and if you always have to update the unit test when you change the code then it is providing no value and giving you a false sense of security that all your tests are green.
Unit tests are great when you have a particular irreducible complexity in a piece of code, or more generally they are good when there is some constraint that should be locally upheld that is likely to be more permanent than the specific implementation.
> Integration tests are supposed to be less fragile because they don't rely on specific implementation details or mocks
I think you misunderstand what I mean by fragility. By fragility I mean a few things.
First, it's sensitive to conditions other than the logic under test. Imagine I have an integration test that round-trips to a database. What if there is a momentary TCP/IP burp and the connection drops? The test will fail. Nothing wrong with the test; nothing wrong with the logic, yet it still fails. Then under different circumstances the same test/same logic passes. What do you do now? This is a fragile test.
Second, fragility can happen when the outcome of a test is not deterministic, generally because the function under test is not referentially transparent. Good functions are referentially transparent by definition. Mocks can turn non-referentially-transparent functions into referentially transparent ones because you can exercise complete control over the mock's return value. Integration suites struggle to embody this property. If you are careful with setup data you can get around this, but it's tedious. If you are querying external systems then good luck: you are at the mercy of whatever data happens to be there. Same deal as number 1. Sometimes the same test will pass or fail without changes to the logic. Fragile.
Third, fragility can mean that small changes have cascading impact. If I write an integration suite that tests a RESTful endpoint and I refactor that endpoint's return type, I run the risk of invalidating huge swathes of tests. Here I would be judicious: if the endpoint was critical and didn't churn much, I'd probably invest in an integration suite.
I cannot fathom a situation where an integration test that requires setup data, database connectivity, possibly network connectivity, specific sequencing, etc., is going to be less fragile than a test that does nothing but exercise the specific logic conditions within a function and has no other dependencies. Your style is not invalid; I just wouldn't adopt it myself.
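The second point above (mocks restoring determinism) can be sketched as follows; the function and client names are hypothetical, the point is that pinning the mock's return value makes an otherwise non-deterministic external call fully predictable.

```python
from unittest.mock import Mock

def shipping_quote(rates_client, weight_kg: float) -> float:
    """Logic under test; `rates_client` would normally hit an external API."""
    rate = rates_client.rate_per_kg("express")
    return round(rate * weight_kg, 2)

# The real client would return whatever the remote system says today.
# The mock pins the return value, so the function behaves referentially
# transparently for the duration of the test.
client = Mock()
client.rate_per_kg.return_value = 4.5
assert shipping_quote(client, 2.0) == 9.0
client.rate_per_kg.assert_called_once_with("express")
```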
>This is the opposite of how the test pyramid is supposed to work.
Where I'm from we'd call this having a case of the "sposedas". Nothing is "supposed" to work one way or another. There is no universal truth here despite what conference speakers would have you believe. There is only the trade-offs implied between a specific course of action vs. another.
I'm a bit of a cowboy coder and completely skip all but the very top of the testing pyramid. I strictly run E2E tests, which, albeit slower than other tests, exercise a much broader spectrum than just a single function. For instance, I change the expiration date of a user's subscription, run the expire logic, and test if this user has received the expected expiration email; this automatically tests the Amazon SES integration, the email template parser and everything in between. Clicking the expiration email's call-to-action button, I expect to land on a certain page with some elements "missing" and others added, which validates changes in the user's roles. Say I swap Amazon SES for Postmark: no new tests have to be written. In my experience this approach works really well for a SaaS-like product. It also looks satisfying to see the test suite blast through the site, clicking and typing like there is no tomorrow.
This is contrary to most testing tutorials and documentation I've read.
I get frustrated and forego a lot of testing because the idea of mocking out basically the whole system in order to unit test is insane to me. If function A calls function B, you've got to mock function B out. If you refactor function B, you better update the mock as well. Writing an app twice hardly seems like a good use of time.
> If function A calls function B, you've got to mock function B out. If you refactor function B, you better update the mock as well
This is why isolation and encapsulation are important.
Your tests were telling you that A has intricate dependencies on B. If the tests for A start failing when you refactor B, they were (too) tightly coupled. The tests are telling you this (whether or not you act on this is another discussion).
And when the tests for A keep passing even when you broke B -false positives- you should see at least one integration (or even e2e) test failing. If not, the tests are showing a hole in the coverage.
If function A does not much more than calling B, maybe testing A is not strictly necessary.
If A does something important/complex/critical/etc., maybe the important/complex/critical/etc. part can be refactored into a dedicated function A1 without external dependency and then tested on its own? That way, the tests focus on where the money is and get easier to work with at the same time.
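A sketch of that extraction, with made-up names: the complex part becomes a pure function ("A1") that needs no mocks, while "A" keeps only the call to the external dependency ("B").

```python
# "A": mixes an external call (B) with the complex logic we care about.
def apply_discount(fetch_price, code: str) -> float:
    price = fetch_price(code)              # external dependency, "B"
    return discounted(price, loyalty_years=3)

# "A1": the extracted, pure logic -- testable in isolation, no mocks needed.
def discounted(price: float, loyalty_years: int) -> float:
    rate = min(0.05 * loyalty_years, 0.25)  # 5% per year, capped at 25%
    return round(price * (1 - rate), 2)

assert discounted(100.0, 3) == 85.0
assert discounted(100.0, 10) == 75.0  # capped at 25%
```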
Now you've added an additional, untested point of failure (the glue code), and you've made the interaction between A, A1, and B more difficult to understand and more error-prone.
Exactly, fully agree. I don't understand why people advocate this insane over-abstraction and over-modularization.
> more difficult to understand
I think this is the most important one. I hate those code bases where everything is split up into these tiny tiny units and you have no idea what the whole thing does. Reminds me of this:
Please accept that I made no argument for a priori abstraction and modularization. Personally, I find any abstraction problematic where I cannot understand, from the import statements of the current code, which other code will be executed.
But there are times where abstractions and modularization can make our lives so much easier. Who would want to dismiss the abstraction of the filesystem and rather work directly on disk blocks?
All I am advocating here is that everything is a tradeoff, and we need to weigh the costs of mocking some stuff against refactoring the code in a way that we can live without such a mock. And I have personally been at the point where refactoring some code into a separate function made it easier to unit test this specific, highly complex functionality.
Fair point, thank you. Used in moderation, the technique you explained above can be worthwhile. I just think it is overused, generally, and the costs are rarely considered. Sorry that my above comment was worded a bit harshly.
But you do have e2e, acceptance or integration tests, don't you? Whenever I see a need to prove, or document, "how the whole thing does it", that is the moment to write an integration test: isolate "the whole thing" as much as possible, make it as small as possible, then fire some things at it and see if the outcome works. It's not hard (but not easy either). The hardest part of those tests is to keep the setup and teardown sane.
I'm lamenting the complexity of those over-abstracted solutions. Your proposed solution (adding additional integration tests) is adding even more complexity onto the pile. This makes the entire system (of which those integration tests are a part, when it comes to understanding) even more difficult to understand.
And I'm trying to explain that complexity is not caused by over-abstraction but, instead, solved by it.
Yes, certainly the integration tests are part of the codebase (although some people argue that tests are not part of it, I strongly disagree). But they are abstracted away.
Neatly in their own directories, using their own class and dependency hierarchies, having the proper dependency directions (the UI tests depend on the UI layer, but the UI layer does not depend on the tests). And so on.
If you truly believe that the solution to hard-to-understand complexity is to have less complexity, I'm afraid you'll be disappointed: all projects come with ever-growing essential complexity. Even if you avoid all accidental complexity, you'll have to deal with the essential complexity. And our main tool to do that is abstraction. From an OS abstracting the hardware away, to database servers, to libraries for those database servers, and so on. The solution may be to say: I don't want a database (and can forego the db lib connecting to the db servers, which run on an OS), but if the database is essential complexity: abstract it away.
That's fair - I'm not against abstraction in general. I agree that we wouldn't be able to handle the essential complexity of software without it.
I'm just saying that each abstraction also has a cost, and makes understanding the whole thing more difficult. Abstraction is, in my understanding, a necessary but costly tool.
A lot of my thoughts are summarized well by Brian Will in this video:
And I have absolutely read that evil lizard people invented COVID to inject us with microchips. Doesn't make it true.
What I mean with this, is that there'll be many opinions and feelings posed as a fact. About testing too.
Which is why I dislike the OP's article, as I pointed out elsewhere in the comments: the things he poses as "rules" (earlier on he calls them tips, which is better imo) aren't rules: they are highly controversial opinions. As would the statement you read be.
Daily TDD is more of an "art" than a science. It requires experience, and Fingerspitzengefühl.
Though, the science on TDD and BDD is strong and convincing, summarized in "Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations"[1]: TDD and BDD significantly help to build better software.
Glue code should really only be tested in your integration or E2E tests, you have those, don't you?
Now, if A would look like below, we might accept not testing it at all, as the initial and upkeep costs of mocking B might be higher than the potential cost of the one-time failure in the internal logic of A.
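The code the comment refers to is not included in the thread; a purely illustrative reconstruction of such a thin glue function (all names invented here) might be:

```python
def b(x):
    # Stand-in for the real dependency "B" -- in practice something external.
    return x * 2

def a(x):
    # "A": little more than a call to B plus a trivial default.
    # Mocking B just to cover this would likely cost more than the
    # one-time bug it could catch.
    result = b(x)
    return result if result is not None else 0

assert a(3) == 6
```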
That way leads to madness, and I don't follow that style/philosophy of mocking components.
Only mock the things you can't easily set up in your unit tests to get them to work, like database connections. For spring-based components, try to use the real components where possible, but where needed use mock components that do enough to get the system working.
That does mean that if you are testing function A you are also indirectly testing function B, but that is fine.
Clickbait title I would say. The 'big' issue with TDD is whether it's worth the time needed to do it. If it was obviously always worth the time, then I think we wouldn't even talk about it. (Do we worry about whether breathing is worth it? Nope, we never talk about that because the answer is apparent.) Whether it's worth it depends on the context.
I still don't understand whether TDD applies outside of some rather specific domains.
In most cases, apart from utility methods, I don't actually know how something will work before I have at least partially implemented it. We have implemented a payments change recently (Paypal to Recurly) and not only are the two very different but we have added additional functionality that we could have done with Paypal but didn't. How do you TDD that?
Even writing a simple test like "Go to this page and click <Order>" doesn't really work since when you come to it, you realise you can't just order but you also need to go to the create account page first etc.
Counterquestion: how do you know, at each release, that the Paypal stuff still works?
If the answer is "I test it by making a purchase at each release", the obvious question is: why not automate this? That is one benefit of TDD.
So why not start off with that automation?
In your specific case, I would write a simple test first that says "on the page of the orange Y Combinator swag, order a shirt and receive a timely confirmation".
Which is subtly, yet importantly, different from "go to X and click order". On the level of such acceptance tests, staying away from implementation details like "pages", "buttons" and so on helps with solving your problem too: you write down the intent and outcome rather than the implementation. Focus on the business value. Solve that first. And fixate the outcome for all future: regardless of the PSP, pages, exact values (timely vs. within 1 sec), actual confirmation channels and whatnot, the business value is "green" when it works, and "red" when it doesn't.
Then I start implementing it. Maybe just add one of those <iframe> things that Paypal offers, even. I don't really care: whatever the quickest thing to make it work is. As long as it gets that acceptance test green with the least work, I'm happy.
Happy, but certainly not done. Because the most overlooked part of TDD is the Refactoring. Red-Green-Refactor. Once it is green, start refactoring. Replace the iframe with some library. Go a step further and replace that with my own paypal-API-client. Connect that to my event-stream, or not.
I think the parent was objecting to writing tests first, not to writing them at all.
My experience is the same as the parent. Design, outside of a very rough and very high level upfront phase, is very much intertwined with the act of coding. This can be writing unit tests, but elevating them specifically to be the root of all design decisions is imho not necessary nor productive.
Lately I have been writing a lot of Python code (nothing fancy, mostly CRUD stuff) and I have been taking advantage of a persistently running Python instance, sending code snippets to it as a quick way to explore the data and test code from my IDE. Doing the same via tests would be both slower and less productive, as I would end up deleting 90% of them anyway.
> I have been taking advantage of a persistently running python instance and sending code snippets to it as a quick way to explore the data and ... deleting 90% of them anyway
How is that different from TDD other than the name "test" vs "snippet"? Tests are excellent tools to explore an API or a dataset. Tests are perfect for quick automation.
And, most importantly: tests, like any code, should constantly and consistently get refactored. So throwing 90% away is expected and perfectly fine. Just like I expect you'll write many times as much production code as what ends up in production: it's quite common for me to write 200+ lines over hours, only to end up with a five-line diff in the final commit.
> but it kind of waters down the meaning of tests.
From the C2Wiki:
> UnitTests are programs written to run in batches and test classes. Each typically sends a class a fixed message and verifies it returns the predicted answer.
Which already is very "watery" afaik. And a snippet that verifies something does X fits this perfectly. Your approach may "violate" many other testing best practices and even "rules", such as "tests should verify only one thing", "each branch should have one test" or "tests may only go through public interfaces" and whatnot, but I consider it TDD, as long as the scripts and snippets are written before any code. And if written afterwards, it counts as automated testing, though with a broad definition of "automated", I guess.
Well, I expect the snippets to run and possibly uncover bugs, so the underlying code is already written (or is written at the same time). I run them manually but I keep a log and I periodically rerun them. So I guess it is not TDD but semi-automated testing (or better, an interactive way to end up with automated tests).
A lot of effective functional testing is getting fixtures ready so that your test can reliably hit the important part of the system under test. For external integrations this becomes difficult. Presumably your test uses the api sandbox, so hopefully it gives reproducible results. If not, then you need to test manually and focus on unit testing to assure the software.
The issue with TDD as originally described by Kent Beck (there are as many variations as there are ... ) is the belief that a good enough design emerges from the process.
While I agree that a good architecture cannot be derived from (unit-)tests alone, tests do indeed help a lot with good design.
As soon as you need elaborate setup in order to test a single piece of functionality (or function), you will start to rethink your design. In such cases there's too much coupling, too many dependencies, or too many responsibilities within the function under test.
Seeing tests as separate from implementation and design is a much bigger issue IMHO. TDD only provides value as opposed to overhead, if you understand it as an integral part of the design and implementation process and use them with that in mind.
Red-green-refactoring doesn't work if you are dealing with existing software, for example (unless there's a well-documented and -understood defect).
It's not TDD (emphasis of the "driven"-part) if tests are only used to confirm already implemented behaviour. I found that especially in larger organisations there's all too often hard requirements for unit tests and high test coverage. This doesn't work as intended, though, if the team doesn't "live" by the TDD concept and unit tests are only there to check some box on a "best practices" list.
IMO it's perfectly fine to only use TDD for some components of an application and rely on integration- and acceptance tests only for others. Glue code and cross-cutting components don't benefit from TDD. An external API or component that you don't control can also prevent meaningful unit tests and force certain design decisions, rendering TDD an exercise in writing (and maintaining) mocks, etc.
What I tried to express is that TDD is one tool of many and not always the best choice; especially so when it is tacked onto an existing process that doesn't use it as intended or cannot benefit from it.
I know it's good commenting style on HN to avoid quick one-liners, but TDD is one of those topics that we've kicked around forever. Perhaps a one-liner would help clarify things.
TDD is a coding practice to help the programmer understand that their mental model of the code does not reflect the actual code itself.
That's it. There are some follow-ons, of course, like if your mental model is exactly in-line with the code then logical errors are unlikely (although business errors can be quite common!). Another follow-on is that TDD is much more about design than testing. Once you realize that it's a practice or habit that's all about sending information from the code back to the mind of the coder, all of that other stuff sorts itself out.
> The test that hits the endpoint is commonly referred to as an “integration test”, and the other test is commonly referred to as a “unit test”.
> Kent Beck (the originator of test-driven development) defines a unit test as “a test that runs in isolation from other tests”. This is very different from the definition of a unit test as “a test that tests a class/method in isolation from other classes/methods”. The test that hits the endpoint satisfies the original definition.
> But it doesn’t really matter if you want to call a given test an “integration” test or a “unit” test. The point is the test fails when something breaks and passes when something is improved. If it does the opposite, it’s not a good test.
The sad fact is that lots of people cannot tell the difference between unit tests, functional tests and integration tests, even after being in the field for over a decade.
I'm going to give it a crack. Caveat that I have not read the TDD scriptures.
Unit test: tests a bit of code not much larger than a function, to make sure that the function runs properly and gives the expected outputs.
Functional tests: tests some code not much bigger than a module (source file) to make sure it does the right things, from a business logic standpoint.
Integration tests: tests some systems not much bigger than one subsystem making an API call to another system, to make sure that the systems can talk to each other properly.
My problem with this definition is what "tests a bit of code" means.
For example, in a codebase I work on we use a custom sort function, and memory manager. We have tests for both (as you would expect).
However, both sorting and allocation are fairly large pieces of code. Does that mean any test of any function which calls sort or allocates memory isn't a unit test, as it's (indirectly) using something much bigger than a single function? But "stubbing" out either sort, or memory allocation, seems stupid (the stub would just have to do the same thing).
1) focus on business logic: is your code "about" sorting and allocation, or could you treat those as built-ins?
2) It's OK if a function is calling other functions. The point is that it's a business-logic-level unit of functionality, which is smaller than a module or so.
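As a hypothetical sketch of that point: the unit under test below is the business logic (`top_scores`), and it calls the real `sorted` built-in internally rather than a stub, without the test stopping being a unit test:

```python
# The "unit" is the business-logic function. Sorting is treated as a
# built-in dependency, not stubbed, because the stub would just have
# to do the same thing.

def top_scores(scores, n):
    """Return the n highest scores, highest first."""
    return sorted(scores, reverse=True)[:n]

def test_top_scores_returns_highest_first():
    assert top_scores([3, 9, 1, 7], 2) == [9, 7]

def test_top_scores_handles_short_input():
    assert top_scores([5], 3) == [5]
```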
I don't take testing advice from anyone who uses generic terms like "test coverage". Test coverage is a complex subject, and without a basic understanding of it, tests can create a false sense of security. Saying something like "100% coverage" means almost nothing to anyone who understands testing.
Reading this article and reading the comments I am reminded that writing automated testing is still an art and less of a science.
I highly value testing and it does eat up project time in weird ways. I believe as I get more experience it won't eat as much time. My hope is for productivity gains in the future.
I hope that in the future most documentation is written with a testing focus. Maybe then I would not have to understand as much as I do now before I can finish testing.
Every proof or point, including the linked articles, is at best underwhelming and at worst controversial.
They are certainly not "rules". The author calls them "tips" earlier, which is more apt. But as "tips/rules" go, they lack all nuance and background.
With TDD, as with anything, "it depends". All of the rules (tips) proposed here I consider harmful and plain wrong in many situations (though not always and everywhere!).
> The database should be part of the tests. Do not stub it.
I've never seen this work well. The tests either rely on a shared/remote database or on a local database instance that has to be set up/torn down between tests. Both approaches cause unending headaches in terms of false positives, onboarding new developers, dependency problems and maintenance.
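For reference, the "local database instance set up/torn down between tests" pattern looks roughly like this, sketched with Python's stdlib sqlite3 (the table and queries are hypothetical). An in-memory database avoids the shared-server headaches, but the schema setup still has to be kept in sync with production by hand:

```python
import sqlite3
import unittest

class UserRepositoryTest(unittest.TestCase):
    """Sketch of the per-test database fixture pattern."""

    def setUp(self):
        # Fresh in-memory database for every test.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def tearDown(self):
        self.db.close()

    def test_insert_and_read_back(self):
        self.db.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
        row = self.db.execute(
            "SELECT name FROM users WHERE id = 1"
        ).fetchone()
        self.assertEqual(row[0], "alice")
```

Even in this best case, the fixture code grows with the schema, which is part of why the maintenance burden mentioned above shows up.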
There is no "magic bullet" in software development.
TDD is my go-to approach if the situation allows it. But it is not always feasible. Sometimes you have to write a bit of code first to find your solution path.
People should stop with the dogmatism and start thinking about their approach for themselves.
I didn't quite understand why it's a problem to start with the integration test. Okay, some things need to be mocked while writing the integration test, but I think the pluses outweigh the minuses.
Integration tests are top-down and focus on checking that the system under test behaves as expected. They usually require elaborate setup and configuration and can - depending on the system - take considerable effort to start and run.
The idea behind TDD focuses on a bottom-up approach instead - if all the parts behave as expected, it is reasonable to assume that the entire system will.
TDD was also intended as a tool to help identify problems with design early (e.g. too many dependencies, tight coupling, too many responsibilities, reduce/eliminate side effects, etc.), allowing for rapid feedback (tests should execute quickly and require little to no setup).
Both approaches have different goals in mind and serve a different purpose. Ideally you always have both - run unit tests after each code change (i.e. often), run integration tests after each added or changed behaviour (e.g. daily or as part of an acceptance test process).
Starting with integration tests means a lot of up-front work, and adding unit tests after the fact can introduce considerable overhead for questionable gains.
It’s good to start with. Integration tests are great for finding out that ‘something’ is broken. Unit tests are better at finding what exactly is broken.
Thank you for your comment. You're right, it gets harder to find the source of the error, which makes it harder to make changes. I will pay attention to this.
A "unit test" is a test that covers a single unit of code, as opposed to an "integration test" that covers several integrated units. This is the common definition.
This other definition of "a test that doesn't depend on other tests" seems of little use, since no tests, even in something like a UI regression, should depend on other tests.
What is a “unit of code” though. Most units of code call (“integrate”?) other units of code in their implementation. Libraries, language runtime, runtime environment (OS), other functional modules of the application, utility functions, … What is the defining property where it becomes an integration test?
That is more subjective and depends on the team, but for any particular definition, there will typically be a clear distinction (for that particular team/code base) between the two.
A typical distinction is that a unit test only covers a single function/class method, or a tightly coupled group of classes, and almost always no IO (network calls, disk writes). Conversely, integration tests typically cover the public API of an entire publicly released library, network service, or CLI executable, usually run in the targeted deployment environment.
For example, for a network service, an integration test may do a PUT and then a GET and check that the resource is returned. It would be running against a full copy of the service which is connected to a real database, but probably one populated with mock data. The same service would typically have unit tests checking something like "the handler for PUT generates an SQL query string that contains the fields defined in the PUT body" (assuming this is a code base without an ORM). Another unit test would check that the same method returns an error if a field is not supported. Another one may check that it doesn't create unsanitized SQL, and so on.
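A sketch of the unit-test side of that example, with a hypothetical query-builder standing in for the PUT handler's internals (no database involved, and parameter placeholders rather than string interpolation, so no unsanitized SQL):

```python
# Hypothetical query builder, tested directly without any database.

def build_update_query(table, fields):
    """Build a parameterized UPDATE statement for the given fields."""
    if not fields:
        raise ValueError("no fields to update")
    assignments = ", ".join(f"{name} = ?" for name in fields)
    return f"UPDATE {table} SET {assignments} WHERE id = ?"

def test_query_contains_put_body_fields():
    q = build_update_query("resources", ["name", "color"])
    assert "name = ?" in q and "color = ?" in q

def test_empty_body_is_an_error():
    try:
        build_update_query("resources", [])
        assert False, "expected ValueError"
    except ValueError:
        pass
```

The matching integration test would instead send a real PUT over the network and check the GET response, never looking at the SQL at all.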
My question is, what are the distinguishing properties of unit tests vs. integration tests that motivate treating them differently, and in what ways do they have to be treated differently? In broader terms, why does that distinction exist and how is it relevant?
The biggest difference is typically that unit tests don't require a test fixture of any kind, and are expected to run wherever the compiler runs. You typically expect a huge battery of unit tests to run in seconds, and they are normally expected to run before every commit, even for every build on a dev machine. A new contributor is expected to be able to do something close to "apt-get language-toolchain && git clone $repo && make build && make unit" and see that all unit tests pass. However, they are also expected to sometimes miss obvious bugs, like "the queries we generate are not valid with the new MySQL flags we set". They often need to be changed even for refactorings, as they typically cover implementation details, not just the public API.
In contrast, integration tests typically require a test fixture of some kind and are thus expected to require more complex setup instructions, perhaps even access to shared resources (like a DB server with a large batch of test data that can't realistically be replicated, or a testing instance of a real microservice you integrate with). By virtue of often requiring IO (network operations, DB operations, etc.), they are often considerably slower, so that they become impractical to run at every build. The upside is that they catch errors that were not obvious from the code, and they usually give you a high degree of assurance that a user of your service will at least not encounter silly bugs. A break in the integration tests after a refactor is also often a sign that you've broken backwards compatibility of your API.
The author is talking about what the "unit" in unit test means. The conventional definition is that it refers to the granular unit of your application being tested. He instead argues it means the test itself is an independent unit within the rest of your test suite.