
Yes, ChatGPT excels at comprehending and explaining things that have a consistent structure, restructuring, and synthesising variations. If you keep it in its lane, it’s an excellent tool.

It’s really really bad at counting though. For example, try asking it to produce a line of 40 asterisks.



It’s bad at counting because counting relies on a stateful O(N) algorithm you run in your brain.

GPT is trained to reproduce human text, which tends to simply have the output of this O(N) counting process, but not the process itself. So GPT “thinks” it should be able to just spit out the number just like human text implies we do. It doesn’t know we are relying on an offline O(N) algorithm.

If you have it emit a numbered list of 40 elements, it will succeed, because producing a numbered list embeds the O(N) process and state into the text, which is the only thing it can see and reason about.
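To make the "state lives in the text" point concrete, here is a minimal Python sketch. The function names are made up for illustration and nothing here models how GPT works internally; it only shows that a numbered list carries its own running count in the last label, while a bare run of asterisks forces a full recount.

```python
def numbered_list(n: int) -> str:
    """Build a numbered list, one item per line: '1. *', '2. *', ... (assumes n >= 1)."""
    return "\n".join(f"{i}. *" for i in range(1, n + 1))


def progress_from_numbered(text: str) -> int:
    """Recover the count by reading only the label on the last line."""
    last_line = text.rsplit("\n", 1)[-1]
    return int(last_line.split(".", 1)[0])


def progress_from_bare(text: str) -> int:
    """With a bare run of asterisks, the count is only known after visiting every character."""
    return sum(1 for ch in text if ch == "*")


listing = numbered_list(40)
bare = "*" * 40
print(progress_from_numbered(listing))  # reads one label
print(progress_from_bare(bare))         # walks the whole string
```

Both calls arrive at the same number, but only the first can do so by glancing at the end of the text.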


That’s very interesting. I assumed it was something about the fact that it is a language model rather than a calculating machine. So printing 44 asterisks instead of 40 is kind of close.

I wonder if it would be possible to teach the machine to recognise situations it’s better at and be less confident about other answers? Or does it need to be confident about everything in order to produce good answers where it does?

It’s kind of funny how confident ChatGPT is about giving out bullshit, and then even when you correct it, it says oh I’m terribly sorry, here is definitely the correct answer this time and then it gives you another wrong answer. Just an observation; I realise it is just a tool whose limitations you have to understand.


> here is definitely the correct answer this time and then it gives you another wrong answer.

My favorite is when it gets into some weird context loop, apologizes and claims to have corrected an issue, but gives you literally, character-for-character, the same answer it gave before.

Fortunately, it mostly happens to me when I am asking particularly ambiguous or weird questions -- e.g., asking for any assembly in AT&T/GAS syntax seems to always go wrong, not necessarily in terms of the logic itself, but rather that it ends up mixing Intel and AT&T, or asking explicitly for POSIX-compliant shell often gives weird Bash/GNUisms, presumably since so many StackOverflow posts seem to conflate all shells with Bash and always expect GNU coreutils.


We can check our answers, we can spit out bullshit like it does but then take the time to check them. It has no process for checking the answers or analyzing them and I'd rather not ask it how confident it is because that's just not what I care about.

I find it amazing that it can actually sort of run code "in its head": none of the code output it shows is actually run through an interpreter, but it's still pretty close, if not perfect, each time. But trying to run code with it is mostly for kicks. Rather, I asked it to produce a simple API for me and then a Python script that tests it. It had no bugs and I could check it myself fairly fast; certainly faster than it would've taken me to write all that code without any bugs. I'd have had to check my own code for bugs anyway.

So if you accept that ChatGPT is sort of like a guy who looked over millions of programmers' shoulders but never actually talked to any of them to understand the code, with a perfect memory but unable to compute much in its head, then it can still be a great tool. Just understand its limitations and its advantages. Just because it can't reverse a string in its head doesn't mean it's "dumb" or not useful for everyday tasks.


I code with GitHub Copilot. I liken it to pair programming with a brilliant, insightful & more experienced colleague who is always slightly drunk.


So basically a chat routine that’s been designed to hit the Ballmer peak.


Note that language models get much better at pretty much any reasoning task when they are prompted to use chain-of-thought (CoT) reasoning. The difference between "Solve x" and "Solve x, let's think step by step" comes from the language model using the context window as short-term memory in some sense. Perhaps your explanation in terms of complexity is better, but I'm not sure whether it explains the effectiveness of CoT in general.


Shouldn't RLHF help with this? So it learns that when people specify a number, they mean something very specific.


You cannot RL learn an O(N) algorithm in an O(1) feed forward neural network.

You could RL learn that when someone specifies a number, the appropriate thing to say is "Ok, 40 asterisks, let's count them, 1, *, 2, *, 3, *, ..." and then it would indeed produce 40 asterisks. But not as a single string. Because producing them as a single contiguous string requires some offline state/memory/processing, and all the neural network has access to is the last ~page of text.

Embedding the counting process into the text itself kind of embeds the state of the O(N) algorithm in the O(N) text itself, that is, "unrolling the loop" externally.
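A toy way to picture "unrolling the loop" externally, as a hedged Python sketch: `next_chunk` and `generate` are hypothetical names, and `next_chunk` stands in for a step function that keeps no memory between calls and sees only the text produced so far. This is an analogy, not a model of a real network.

```python
def next_chunk(text_so_far: str, target: int) -> str:
    """Hypothetical stateless step: pick the next chunk from the visible text alone."""
    # The running count is read back from the last numeric label in the text,
    # not from any variable that survives between calls.
    tokens = text_so_far.replace(",", " ").split()
    labels = [tok for tok in tokens if tok.isdigit()]
    done = int(labels[-1]) if labels else 0
    if done >= target:
        return ""  # nothing left to emit
    separator = ", " if text_so_far else ""
    return f"{separator}{done + 1}, *"


def generate(target: int) -> str:
    """Driver loop: the only thing carried from step to step is the text itself."""
    text = ""
    while True:
        chunk = next_chunk(text, target)
        if not chunk:
            return text
        text += chunk


output = generate(40)
print(output.count("*"))
```

Because each label is written into the output, every step can tell how far along it is just by reading the text; remove the labels and `next_chunk` would have nothing to go on except recounting the asterisks.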


It doesn’t have any logic; it just tries to complete strings in the most plausible way. Its training material probably did not have a lot of “write five at signs: @@@@@”. RLHF might help steer it in the right direction, but probably wouldn’t produce the concept of counting or loops.


So, this is where I guess I just don't understand. I've had ChatGPT produce code for me that there is absolutely no way it already had in its training set. I realize it can't actually "think", but then I also don't know how to describe what I'm seeing.


It gave me 40 short Gaulish warriors..



