About the same time the 500-mile email problem happened (mid-1990s), I had a hard-to-diagnose issue with my office PC. Every morning, I'd come in, slide my hard drive sled in, and turn the computer on. We had 128 Kbps ISDN internet at the office and I had the same at home, but that was too slow to do much work. So I'd take the drive home so I could work in the evenings, especially in the winter, when the office was too cold at night.
Suddenly one winter morning, the PC wouldn't boot. I had to run to a meeting. When I got back, I turned the PC off and on again and everything was fine. The next morning, the same thing happened. The third day, I didn't have a meeting. I turned it off and back on, still no boot. I'd gotten in late, so I just turned it off and took an early lunch. When I got back, it still wouldn't boot. But I had a meeting, so I ran to that, leaving the computer on. When I got back, it booted fine.
The next morning, same thing. I decided to look inside, not having any idea what might cause such symptoms. As I took the shell off, a tiny mouse came out, jumped off my desk, and ran across my lap before jumping to the floor and scurrying out of sight. From inside the computer came the smell of mouse urine. Apparently he'd been crawling in through the open drive bay to keep warm every night, and urinating while he was in there. Once the computer had been on for a while, the heat and airflow would dry it out enough to eliminate whatever electrical short was keeping it from booting. I went to the store and bought an empty drive sled to put in the drive bay whenever I took my drive out, and the problem never came back. I felt lucky that the liquid didn't cause permanent damage.
Someone posted a similar story on one of the other occasions the 500-mile email was posted: a car would fail to start if the owner bought vanilla ice cream from the store, but would start fine with other flavors. I love the process that goes into finding the actual issue (regardless of whether the ice cream story is true!): https://www.snopes.com/fact-check/cone-of-silence/
OT, but I find this a perfect example of why "data-driven design" is an empty term if you don't know what it's being designed for, i.e. which metrics are used to evaluate it in the end.
Both optimizing for ease of shopping and optimizing for stringing the customer along as long as possible rely on the same purchase data; they just use diametrically opposite metrics for evaluation.
Mice can fit through tiny holes. An old rule says that if a pencil can get through, a mouse will get through.
Some mice even fly.
I once had a bat clinging to my good old CAT cable. So even leaving windows open at night might affect bandwidth...
Another classic is the "Frog on Keyboard error". Software developers have to be prepared for everything...
There used to be an option called "Cat guard" built into several pieces of historical (BBS) software. In one program that did synchronization with other networks (e.g., FIDO, uunet), whose name I can't remember, it was considered a major feature.
Its primary purpose was to lock the keyboard so that when the cat walked all over it, it wouldn't disconnect.
Yes and no. There's a group called "true bugs" (https://en.wikipedia.org/wiki/Hemiptera as linked above). "Bug" in the common sense doesn't have a precise definition (small arthropod that may or may not be a pest to humans is about as precise as I feel I can get), but there _is_ a scientific definition of "true bug".
He doesn't give the chairman due credit, IMHO. The chairman collected information to help solve the problem, AND it actually was the information needed. Without it, the author might have spent a long time hunting for "randomly unreachable servers".
It's almost raw data -- exactly what you would wish for. By lecturing people that "email does not work that way", next time you either get no data at all because people don't even try, or no data because people hide it thinking email doesn't work that way, or a misguided conclusion when a layman tries to make a better guess at the cause of the problem.
Absolutely. It's one of my all time favourites stories and this is pretty much the reason why. I wish my users gave me such specific steps to reproduce!
My recent annoyance is that users will describe their problem in great detail when they're talking to an LLM, yet the same people file support tickets just as shit as before.
(1) disguise yourself as an LLM so they give you better problem descriptions
(2) provide an LLM for your users that lets you read their chat to understand their problem
and:
(3) try to understand why they are communicating differently to an LLM. Immediate replies? Different feelings knowing they don't talk to a human? Genuinely better help? Not getting treated as stupid?
All or none of these may be true, but if it's consistent behaviour then there is a reason for it.
Your dream is coming true: most SMBs are quickly moving to LLMs as their Level 1 support anyway. Unfortunately it makes sense; too many people fail at writing a proper ticket.
I do wonder if they already had a feeling it was not supposed to work that way, hence the info gathering. This is one of my favorite all-time IT stories because the client was right, and the engineer was left almost going crazy.
When I was a junior, I asked an honest question to the senior I was working with at the time, a great dude. Everyone joked about the "works on my machine" crowd, so I asked him: what the heck do I do if it only works on my machine? He said you have to figure out what's different. It sounds obvious or simple, but go in with that mindset: when someone's stuck on "it works on my machine, idk why", sure enough I ask "what is DIFFERENT between your machine and this one?" and it almost always leads to the right answer. It triggers something in our brains. In the case of a production issue, I usually follow up "what's different?" with "what was the last change?"
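The "what's different?" question can be mechanized a little: capture the same snapshot (environment variables, package versions, config values) on both machines and list only the entries that differ. A minimal sketch, assuming both snapshots are plain string-to-string mappings; `diff_env` and the `APP_MODE` variable are hypothetical, for illustration only.

```python
import os
from typing import Mapping, Optional


def diff_env(mine: Mapping[str, str],
             theirs: Mapping[str, str]) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Return {key: (my_value, their_value)} for every key whose value differs.

    A key present on only one side shows up with None for the missing side.
    """
    diffs = {}
    for key in sorted(set(mine) | set(theirs)):
        a, b = mine.get(key), theirs.get(key)
        if a != b:
            diffs[key] = (a, b)
    return diffs


if __name__ == "__main__":
    # Stand-in for a snapshot taken on "the other machine" (e.g. loaded from a
    # JSON dump); here it is just a copy of this environment with one change.
    theirs = dict(os.environ)
    theirs["APP_MODE"] = "production"  # hypothetical variable
    for key, (a, b) in diff_env(os.environ, theirs).items():
        print(f"{key}: mine={a!r} theirs={b!r}")
```

Sorting the keys keeps the output stable, so two runs can themselves be compared line by line.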
These kinds of posts are why I've checked HN pretty much every day for 15+ years now. Hard to believe I missed this one. Glad I caught it this time! This post reminds me to stay humble and avoid jumping to conclusions without analysis.
You guys beat me to it - I was working on the list!
Btw for those wondering about reposts: reposts on HN are just fine after a year or so (https://news.ycombinator.com/newsfaq.html), and reposts of classics every now and then are good because it's important for new users to learn the classics!
> Btw for those wondering about reposts: reposts on HN are just fine after a year or so (https://news.ycombinator.com/newsfaq.html), and reposts of classics every now and then are good because it's important for new users to learn the classics!
It would be nice if HN would simply "float" the classics to the front page a year or two after their submission. That would avoid duplication (especially of comments!) and let people in the lucky 10,000 group learn about them.
This gets posted just infrequently enough that I remember that I've read it before but forget why those emails weren't delivered, so I read it every time :-)
Every time I see this story I think "oh, this is the story about the packet TTLs being set stupidly low or something, but you wouldn't be able to narrow that down to exactly 500 miles", and I have to click through and learn again, just like the first time, that it's about the connection timeout being set stupidly low.
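The arithmetic behind the punchline is just distance = speed × time. A minimal sketch, assuming a connect timeout of about 3 ms (the figure usually quoted from the original story) and a signal moving at the speed of light in vacuum; real links are slower and a TCP handshake needs a round trip, so this is the same loose back-of-envelope bound, not a model of an actual network.

```python
# Back-of-envelope: how far can a signal get before a connect timeout fires?
# Assumes propagation at the speed of light in vacuum (a loose upper bound).
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre
METRES_PER_MILE = 1609.344  # international mile


def light_distance_miles(seconds: float) -> float:
    """Distance light travels in vacuum during `seconds`, in miles."""
    return SPEED_OF_LIGHT_M_PER_S * seconds / METRES_PER_MILE


if __name__ == "__main__":
    timeout_s = 0.003  # assumed ~3 ms connect timeout
    print(f"{light_distance_miles(timeout_s):.1f} miles")
```

With those assumptions the result comes out a little under 560 miles, which is why "about 500 miles" was a plausible boundary.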
Last night I downloaded a TV episode and played it in VLC. 30 seconds in, the power failed. Fine, it's an old laptop I'm using as a media server and the battery is long dead; this had never happened before, but maybe something was loose. I checked the power supply and restarted it. It failed again at the same point in the video, and again a third time. Something about that video causes my laptop to die.
I turned it off and went to bed. Maybe I'll troubleshoot it today. But I'd love to understand what could have happened. The closest thing I know of is the Janet Jackson video that could crash hard drives [0]. In this case the sound was playing on a different device (my TV) so I don't think it's the same explanation.
For extra weirdness, the episode was Black Mirror S7E01. Exactly the kind of thing the creators would like to build into a Black Mirror episode.
Dying on the exact same frame, or just generally in the same spot?
In the case of the latter my first thought would be thermals. Different video codecs have significantly different decoding costs, and may also stress different parts of your system. You could check for that by playing that same video but not starting at the beginning and see if it's the same duration. Or jump to just before it dies and see if it plays through.
If by "downloaded" you mean The High Seas, those who provision the high seas are often on the cutting edge of using codecs with every last feature turned on to make the videos smaller to squeeze every last bit out of the encodings that they can, which can make them unusually expensive to decode. Or so I've heard.
I didn't get to dig much further into it, but for those of you who suggested ideas:
- not always the same frame. The first three failures were within seconds of each other, possibly the same frame. I tried again the next night and it got through that part of the video, but crashed a minute later
- I was able to play the video using a different app (Ubuntu's built-in Videos app from an old Ubuntu release, maybe 20.04)
Some of these video codecs have pathological cases that might max out your video hardware while decoding. If you're only using it as a media server, that load might exceed the (possibly age-degraded) capacity of your power supply. Replacing the power supply might help in that case.
It's also possible that something in a particular frame is triggering a bug in your driver and crashing that way. In that case, your best bet might be to transcode the video to a different codec or something.
Maybe your particular video download is from an entirely trustworthy source, but it's not unheard of that untrustworthy folks will modify a file with the intent of causing this to happen.
This, Stalking the Wily Hacker[1], and others were the stories that got me into computers. I wish so much that the experience of working in this industry hadn't so thoroughly annihilated the joy they once brought.
I had a chance to meet Cliff Stoll a couple summers ago. He was giving a presentation to a quilt society and it was great. If you ever have a chance to see him in person, no matter the topic, you will be greatly rewarded! He is such an energetic and enthusiastic person and he finds the beauty in all sorts of everyday things. I was captivated by him the entire time on a topic I only had a passing interest in at the beginning.
You just reminded me of my time working at Sendmail, where I often had to telnet to port 25 of some machine, and pretend to be a mail server sending email.
I used to be able to send all the commands without having to look them up. Not sure I could still do that today.
I think I can still do it, 30 years after I last had to. The trauma of debugging sendmail m4 config issues for hours while the company e-mail remained dysfunctional has permanently etched it into my mind.
EHLO example.com
MAIL FROM:<foo@example.com>
RCPT TO:<bar@example.com>
DATA
Subject: Hello, World

I have crawled through the depths of hell to deliver unto you this message.
.
QUIT
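Each line in that transcript is the client's half of the conversation; the server answers every command with a numeric reply (and a 354 go-ahead after DATA). A minimal sketch, with a hypothetical `smtp_dialogue` helper, that generates the client lines and applies two rules that are easy to forget by hand: a blank line separates the headers from the body, and any body line starting with "." gets an extra "." (dot-stuffing, RFC 5321 section 4.5.2) so it isn't mistaken for the end-of-data marker. It does not open a connection or check replies; for real sending, Python's `smtplib` takes care of all this.

```python
def smtp_dialogue(helo_domain: str, sender: str, recipients: list[str],
                  headers: dict[str, str], body: str) -> list[str]:
    """Client-side lines of a simple SMTP session (server replies omitted)."""
    lines = [f"EHLO {helo_domain}", f"MAIL FROM:<{sender}>"]
    lines += [f"RCPT TO:<{rcpt}>" for rcpt in recipients]  # one per recipient
    lines.append("DATA")
    lines += [f"{name}: {value}" for name, value in headers.items()]
    lines.append("")  # blank line ends the headers
    for line in body.splitlines():
        # Dot-stuffing: escape a leading "." so it can't end DATA early.
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")  # a lone dot ends DATA
    lines.append("QUIT")
    return lines


if __name__ == "__main__":
    session = smtp_dialogue(
        "example.com", "foo@example.com", ["bar@example.com"],
        {"Subject": "Hello, World"},
        "I have crawled through the depths of hell to deliver unto you this message.",
    )
    # On the wire, every line is terminated by CRLF.
    print("\r\n".join(session))
```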
I haven't worked at Sendmail or on anything e-mail related, and I can do that… just from enough e-mail fixing as side work. Let's call it sysadmin calluses.
What made me stumble recently was having to talk LMTP to fix a mailman setup. Cheeky fuckers changed EHLO into LHLO for LMTP. (To avoid any mixups between the protocols, which is fair.)
Also, RCPT TO doesn't need to match the To: header. When you send to a group of BCC recipients, the envelope (RCPT TO) has to specify each exact recipient, but the headers in DATA don't. The same goes for the envelope sender (MAIL FROM) and the From: header in DATA, which is also useful for controlling where bounces go or who gets a reply.
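The envelope/header split can be seen without a server: the envelope recipients (RCPT TO) are collected from To, Cc and Bcc, the Bcc header is dropped from what gets transmitted, and the envelope sender (MAIL FROM, where bounces go) can differ from the From: header. A minimal sketch; `envelope_for` is a hypothetical helper that roughly mirrors what `smtplib.SMTP.send_message` does when `to_addrs` is not given.

```python
from email.message import EmailMessage
from email.utils import getaddresses
from typing import Optional


def envelope_for(msg: EmailMessage,
                 bounce_to: Optional[str] = None) -> tuple[str, list[str]]:
    """Return (MAIL FROM, [RCPT TO, ...]) for msg and strip its Bcc header in place.

    bounce_to overrides the envelope sender; otherwise the From: address is used.
    """
    if bounce_to is None:
        froms = getaddresses([str(v) for v in msg.get_all("From", [])])
        if not froms:
            raise ValueError("no From: header and no bounce_to given")
        bounce_to = froms[0][1]
    fields = [str(v) for h in ("To", "Cc", "Bcc") for v in msg.get_all(h, [])]
    recipients = [addr for _name, addr in getaddresses(fields)]
    del msg["Bcc"]  # hidden recipients live only in the envelope
    return bounce_to, recipients


if __name__ == "__main__":
    msg = EmailMessage()
    msg["From"] = "Alice <alice@example.com>"
    msg["To"] = "team@example.com"
    msg["Bcc"] = "carol@example.com, dave@example.com"
    msg["Subject"] = "Hello"
    msg.set_content("Only team@example.com appears in the headers.")
    print(envelope_for(msg, bounce_to="bounces@example.com"))
    print(msg)  # no Bcc header left
```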
Yeah, I know the protocol and can do that manually, because I've had to debug it often enough.
Units is a cool piece of software, but I have since switched to qalculate.
Mostly units has some silly defaults like needing to type tempC(30) instead of 30C; and it's nice to have a full calculator.
I know it's a way to specify that the conversion is absolute rather than relative, but qalculate just asks you about it the first time you convert, and since converting oven and outside temperatures is most of what I do, I don't have to bother remembering a different syntax.
Also qalculate is an awesome piece of software in general, so if you're excited by units you should check it out!
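The absolute/relative distinction that units makes with its tempC(...) syntax is the difference between converting a reading and converting an interval: a reading needs the 32-degree offset, a difference does not. A minimal sketch with hypothetical helper names:

```python
def reading_c_to_f(celsius: float) -> float:
    """Convert a temperature reading (a point on the scale), e.g. an oven setting."""
    return celsius * 9 / 5 + 32


def interval_c_to_f(delta_celsius: float) -> float:
    """Convert a temperature difference, e.g. 'it got 30 degrees warmer'."""
    return delta_celsius * 9 / 5


if __name__ == "__main__":
    print(reading_c_to_f(30))   # 30 C on a thermometer is 86 F
    print(interval_c_to_f(30))  # a 30 C rise is a 54 F rise
```

Mixing the two up is exactly the mistake an explicit syntax (or qalculate's one-time question) is meant to prevent.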
Our email systems are mostly mediated by giant hyper-scale companies (Microsoft, Google etc). The location of mail servers being where the recipient is seems quaint (and wonderfully decentralised).
And even if we do manage our own servers, they are automated, and apps are often containerised. Nobody ends up with an older MTA due to an OS upgrade.
Remember reading this like 20 years ago; nice to see it again.
…I almost choked on my breakfast bacon reading this. This is some fabulous “greybeard wizard” lore from the early days of the WWW that I just love hearing about.
Bless OP for sharing this gem today. I needed the laughter.
Never get tired of seeing this resurface every once in a while. There needs to be a /greatest for posts like these (while still allowing people to repost them every so often).
What I don't get is how the author can't pin the year down to anything narrower than "between 1994 and 1997," especially considering he wrote the article in 2002: only a few years later.
I'm not at all implying the story was fake; just this particular thing feels weird.
Honestly burst out laughing as I saw the FAQ section covering the timeout.
Thanks for sharing the link.
The ultimate explanation, that he just pinged known distances to work out the relationship between time and distance, is actually brilliant. I'm not sure it would have occurred to me particularly quickly to just experiment.
It was a long time before I understood that remark.
Since Mel knew the numerical value
of every operation code,
and assigned his own drum addresses,
every instruction he wrote could also be considered
a numerical constant.
He could pick up an earlier "add" instruction, say,
and multiply by it,
if it had the right numeric value.
His code was not easy for someone else to modify.