They counted multiple hallucinations in a single paper toward the 100, and explicitly call out one paper with 13 incorrect citations that are claimed (reasonably, IMO) to be hallucinated.
>GPTZero's analysis of 4841 papers accepted by NeurIPS 2025 shows there are at least 100 with confirmed hallucinations
Is not true. [Edit: that sounds a bit harsh, making it seem like you are accusing them; it's more that this is a logical conclusion of your (IMO reasonable) interpretation.]
I spot-checked one of the flagged papers (from Google, co-authored by a colleague of mine)
The paper was https://openreview.net/forum?id=0ZnXGzLcOg and the problem flagged was "Two authors are omitted and one (Kyle Richardson) is added. This paper was published at ICLR 2024." I.e., for one cited paper, the author list was off and the venue was wrong. And this citation was mentioned in the background section of the paper, and not fundamental to the validity of the paper. So the citation was not fabricated, but it was incorrectly attributed (perhaps via use of an AI autocomplete).
I think there are some egregious papers in their dataset, and this error does make me pause to wonder how much of the rest of the paper used AI assistance. That said, the "single error" papers in the dataset seem similar to the one I checked: relatively harmless and minor errors (which would be immediately caught by a DOI checker), and so I have to assume some of these were included in the dataset mainly to amplify the author's product pitch. It succeeded.
>this error does make me pause to wonder how much of the rest of the paper used AI assistance
And this is what's operative here. The error spotted, the entire class of error spotted, is easily checked/verified by a non-domain expert. These are the errors we can confirm readily, with an obvious and unmistakable signature of hallucination.
If these are the only errors, we are not troubled. However: we do not know if these are the only errors, they are merely a signature that the paper was submitted without being thoroughly checked for hallucinations. They are a signature that some LLM was used to generate parts of the paper and the responsible authors used this LLM without care.
Checking the rest of the paper requires domain expertise, perhaps requires an attempt at reproducing the authors' results. That the rest of the paper is now in doubt, and that this problem is so widespread, threatens the validity of the fundamental activity these papers represent: research.
> If these are the only errors, we are not troubled. However: we do not know if these are the only errors, they are merely a signature that the paper was submitted without being thoroughly checked for hallucinations. They are a signature that some LLM was used to generate parts of the paper and the responsible authors used this LLM without care.
I am troubled by people using an LLM at all to write academic research papers.
It's a shoddy, irresponsible way to work. And also plagiarism, when you claim authorship of it.
I'd see a failure of the 'author' to catch hallucinations as more like a failure to hide evidence of misconduct.
If academic venues are saying that using an LLM to write your papers is OK ("so long as you look it over for hallucinations"?), then those academic venues deserve every bit of operational pain and damaged reputation that will result.
I would argue that an LLM is a perfectly sensible tool for structure-preserving machine translation from another language to English. (Where by "another language", you could also substitute "very poor/non-fluent English." Though IMHO that's a bit silly, even though it's possible; there's little sense in writing in a language you only half know, when you'd get a less-lossy result from just writing in your native tongue and then having the LLM translate from that.)
Google Translate et al were never good enough at this task to actually allow people to use the results for anything professional. Previous tools were limited to getting a rough gloss of what words in another language mean.
But LLMs can be used in this way, and are being used in this way; and this is increasingly allowing non-English-fluent academics to publish papers in English-language journals (thus engaging with the English-language academic community), where previously those academics may have felt "stuck" publishing in what few journals exist for their discipline in their own language.
Would you call the use of LLMs for translation "shoddy" or "irresponsible"? To me, it'd be no more and no less "shoddy" or "irresponsible" than it would be to hire a freelance human translator to translate the paper for you. (In fact, the human translator might be a worse idea, as LLMs are more likely to understand how to translate the specific academic jargon of your discipline than a randomly-selected human translator would be.)
Autotranslating technical texts is very hard. After the translation, you must check that all the technical words were translated correctly, rather than replaced with a fancy synonym that does not make sense.
(A friend has an old book translated a long time ago (by a human) from Russian to Spanish. Instead of "complex numbers", the book calls them "complicated numbers". :) )
I remember one time when I had written a bunch of user facing text for an imaging app and was reviewing our French translation. I don't speak French but I was pretty sure "plane" (as in geometry) shouldn't be translated as "avion". And this was human translated!
You'd be surprised how shoddy human translations can be, and it's not necessarily because of the translators themselves.
Typically what happens is that translators are given an Excel sheet with the original text in a column, and the translated text must be put into the next column. Because there's no context, it's not necessarily clear to the translator whether the translation for plane should be avion (airplane) or plan (geometric plane). The translator might not ever see the actual software with their translated text.
The convenient thing in this case (verification of translation of academic papers from the speaker's native language to English) is that the authors of the paper likely already 1. can read English to some degree, and 2. are highly likely to be familiar specifically with the jargon terms of their field in both their own language and in English.
This is because, even in countries with a different primary spoken language, many academic subjects, especially at a graduate level (masters/PhD programs — i.e. when publishing starts to matter), are still taught at universities at least partly in English. The best textbooks are usually written in English (with acceptably-faithful translations of these texts being rarer than you'd think); all the seminal papers one might reference are likely to be in English; etc. For many programs, the ability to read English to some degree is a requirement for attendance.
And yet these same programs are also likely to provide lectures (and TA assistance) in the country's own native language, with the native-language versions of the jargon terms used. And any collaborative work is likely to also occur in the native language. So attendees of such programs end up exposed to both the native-language and English-language terms within their field.
This means that academics in these places often have very little trouble in verifying the fidelity of translation of the jargon in their papers. It's usually all the other stuff in the translation that they aren't sure is correct. But this can be cheaply verified by handing the paper to any fluently-multilingual non-academic and asking them to check the translation, with the instruction to just ignore the jargon terms because they were already verified.
To that point, I think it's lovely how LLMs democratize science. At ICLR a few years ago I spoke with a few Korean researchers who were delighted that their relative inability to write in English was no longer being held against them during the review process. I think until then I had underestimated how pivotal this technology is in lowering the barrier to entry for the non-English-speaking scientific community.
If they can write a whole draft in their first language, they can easily read the translated English version and correct it. The errors described by gp/op arise when authors directly ask the LLM to generate a full paragraph of text. Look at my terrible English; I really have been through the full process from draft to English version before :)
We still do not have a standardized way to represent machine learning concepts. For example, in vision models, I see lots of papers confuse "skip connections" and "residual connections": when they concatenate channels they call it a "residual connection", which shows that they haven't understood why we call them "residual" in the first place. In my humble opinion, each conference, or better yet a confederation of conferences, should work together to provide a glossary, a technical guideline, and also a special machine translation tool, to correct unclear, grammatical-error-riddled English like mine!
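For what it's worth, the distinction is easy to show in code. Here is a minimal, purely illustrative PyTorch sketch (the layer sizes are arbitrary, not from any particular paper): an additive residual connection learns a residual F(x) on top of the identity, while a concatenation-style skip connection just stacks channels, so there is nothing "residual" about it.

    # Minimal sketch: residual (additive) vs. skip-by-concatenation.
    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """Residual connection: the block learns a residual F(x) and adds it to x."""
        def __init__(self, channels: int):
            super().__init__()
            self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

        def forward(self, x):
            return x + self.conv(x)          # output = x + F(x), same shape as x

    class ConcatSkipBlock(nn.Module):
        """Skip connection by concatenation (U-Net/DenseNet style): features are
        stacked along the channel dimension, nothing is added to the identity."""
        def __init__(self, channels: int):
            super().__init__()
            self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

        def forward(self, x):
            return torch.cat([x, self.conv(x)], dim=1)   # channel count doubles

    x = torch.randn(1, 8, 32, 32)
    print(ResidualBlock(8)(x).shape)    # torch.Size([1, 8, 32, 32])
    print(ConcatSkipBlock(8)(x).shape)  # torch.Size([1, 16, 32, 32])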
I'm surprised by these results. I agree that LLMs are a great tool for offsetting the English-speaking world's advantage. I would have expected non-Anglo-American universities to rank at the top of the list. One of the most valuable features of LLMs from the beginning has been their ability to improve written language.
Why is their use more intense in English-speaking universities?
Good point. There may be a place for LLMs for science writing translation (hopefully not adding nor subtracting anything) when you're not fluent in the language of a venue.
You need a way to validate the correctness of the translation, and to be able to stand behind whatever the translation says. And the translation should be disclosed on the paper.
There are legitimate, non-cheating ways to use LLMs for writing. I often use the wrong verb forms ("They synthesizes the ..."), write "though" when it should be "although", and forget to comma-separate clauses. LLMs are perfect for that. Generating text from scratch, however, is wrong.
I agree, but I don't think any of the broadly acceptable uses would result in easily identifiable flaws like those in the post, especially hallucinated URLs.
>I am troubled by people using an LLM at all to write academic research papers.
I'm an outsider to the academic system. I have cool projects that I feel push some niche application to SOTA in my tiny little domain, which is publishable based on many of the papers I've read.
If I can build a system that does a thing, and I can benchmark it and prove it's better than previous work, my main blocker is getting all my work and information into the "Arxiv PDF" format and tone. Seems like a good use of LLMs to me.
> And also plagiarism, when you claim authorship of it.
I don't actually mind putting Claude as a co-author on my github commits.
But for papers there are usually so many tools involved. It would be crowded to include each of Claude, Gemini, Codex, Mathematica, Grammarly, Translate etc. as co-authors, even though I used all of them for some parts.
Maybe just having a "tools used" section could work?
> It's a shoddy, irresponsible way to work. And also plagiarism, when you claim authorship of it.
It reminds me of kids these days and their fancy calculators! Those new fangled doohickeys just aren't reliable, and the kids never realize that they won't always have a calculator on them! Everyone should just do it the good old fashioned way with slide rules!
Or these darn kids and their unreliable sources like Wikipedia! Everyone knows that you need a nice solid reliable source that's made out of dead trees and fact checked by up to 3 paid professionals!
I doubt that it's common for anyone to read a research paper and then question whether the researcher's calculator was working reliably.
Sure, maybe someday LLMs will be able to report facts in a mostly reliable fashion (like a typical calculator), but we're definitely not even close to that yet, so until we are the skepticism is very much warranted. Especially when the details really do matter, as in scientific research.
> I doubt that it's common for anyone to read a research paper and then question whether the researcher's calculator was working reliably
Reproducibility and repeatability in the sciences?
Replication crisis > Causes > Problems with the publication system in science > Mathematical errors; Causes > Questionable research practices > In AI research; Remedies > [..., open science, reproducible workflows, disclosure, ...]
https://en.wikipedia.org/wiki/Replication_crisis#Mathematica...
Already, machine-verifiable proofs run to impossibly many pages for human review.
There are "verify each premise" and "verify the logical form of the argument" (P therefore Q) steps that the model still doesn't do for the user.
For your domain, how insufficient is the output given a process prompt like:
Identify hallucinations from models prior to (date in the future)
Check each sentence of this: ```{...}```
Research ScholarlyArticles (and then their Datasets) which support and which reject your conclusions. Critically review findings and controls.
Suggest code to write to apply data science principles to proving correlative and causative relations given already-collected observations.
Design experiment(s) given the scientific method to statistically prove causative (and also correlative) relations
Identify a meta-analytic workflow (process, tools, schema, and maybe code) for proving what is suggested by this chat
> whether the researcher's calculator was working reliably.
LLMs do not work reliably; that's not their purpose.
If you use them that way it's akin to using a butter knife as a screwdriver. You might get away with it once or twice, but then you slip and stab yourself. Better to go find a screwdriver if you need reliable.
I'm really not moved by this argument; it seems a false equivalence. It's not merely a spell checker or removing some tedium.
As a professional mathematician I used Wikipedia all the time to look up quick facts before verifying them myself or elsewhere. A calculator? Well, I can use an actual programming language.
Up until this point, neither of those tools was advertised or used by people to entirely replace human input.
There are some interesting possibilities for LLMs in math, especially in terms of generating machine-checked proofs using languages like Lean. But this is a supplement to the actual result, where the LLM would actually be adding a more rigorous version of a human's argument with all the boring steps included.
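For readers who haven't seen what "machine-checked" means in practice, here is a toy Lean 4 example (the theorem is trivial and purely illustrative): once the file compiles, the kernel has verified every step, no matter whether a human or an LLM wrote the proof term.

    -- A toy machine-checked statement: every natural number is at most its successor.
    -- If this file compiles, Lean's kernel has verified the proof.
    theorem toy_le_succ (n : Nat) : n ≤ n.succ :=
      Nat.le_succ n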
In a few cases, I see Terence Tao has pointed out examples of LLMs actually finding proofs of open problems unassisted. Not necessarily problems anyone cared deeply about. But there's still the fact that if the proof holds, then it's valid no matter who or what came up with it.
AI People: "AI is a completely unprecedented technology where its introduction is unlike the introduction of any other transformative technology in history! We must treat it totally differently!"
Also AI People: "You're worried about nothing, this is just like when people were worried about the internet."
The internet analogy is apt because it was in fact a massive bubble, but that bubble popping didn't mean the tech went away. Same will happen again, which is a point both extremes miss. One would have you believe there is no bubble and you should dump all your money into this industry, while the other would have us believe that once the bubble pops all this AI stuff will be debunked and discarded as useless scamware.
Well, the internet has definitely changed things; but it also wasn't initially controlled by a bunch of megacorps with the level of power and centralisation we see today.
> Those new fangled doohickeys just aren't reliable
Except they are (unlike a chatbot, a calculator is perfectly deterministic), and the unreliability of LLMs is one of their most, if not the most, widespread target of criticism.
Low effort doesn't even begin to describe your comment.
> Except they are (unlike a chatbot, a calculator is perfectly deterministic)
LLMs are supposed to be stochastic. That is not a bug; I can see why you find that disappointing, but it's just the reality of the tool.
However, as I mentioned elsewhere calculators also have bugs and those bugs make their way into scientific research all the time. Floating point errors are particularly common, as are order of operations problems because physical devices get it wrong frequently and are hard to patch. Worse, they are not SUPPOSED TO BE stochastic so when they fail nobody notices until it's far too late. [0 - PDF]
Further, spreadsheets are no better: for example, a scan of ~3,600 genomics papers found that about 1 in 5 had gene‑name errors (e.g., SEPT2 → “2‑Sep”) because that's how Excel likes to format things.[1] Again, this is much worse than a stochastic machine doing its stochastic job... because it's not SUPPOSED to be random, it's just broken, and on a truly massive scale.
That’s a strange argument. There are plenty of stochastic processes that have perfectly acceptable guarantees. A good example is Karger’s min-cut algorithm. You might not know what you get on any given single run, but you know EXACTLY what you’re going to get when you crank up the number of trials.
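To make that concrete, here is a minimal, illustrative sketch of Karger's contraction idea (random edge order plus union-find, which gives the same distribution as contracting a uniformly random edge each step). A single run finds the min cut with probability at least 2/(n(n-1)); repeating roughly C(n,2)·ln n times and taking the minimum drives the failure probability down to about 1/n. That is the kind of guarantee you get from a well-understood stochastic process.

    # Minimal sketch of Karger's randomized min-cut with trial amplification.
    import math
    import random

    def contract_once(n, edges):
        """One random-contraction run on nodes 0..n-1 (union-find version)."""
        parent = list(range(n))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        order = edges[:]
        random.shuffle(order)              # random edge order ~ random contractions
        remaining = n
        for u, v in order:
            if remaining == 2:
                break
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                remaining -= 1
        # size of the cut between the two surviving super-vertices
        return sum(1 for u, v in edges if find(u) != find(v))

    def karger_min_cut(n, edges):
        trials = int(math.comb(n, 2) * math.log(n)) + 1   # ~n^2 log n runs
        return min(contract_once(n, edges) for _ in range(trials))

    # Toy graph: two triangles joined by a single bridge edge -> min cut is 1.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    print(karger_min_cut(6, edges))  # prints 1 with high probability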
Nobody can tell you what you are going to get when you run an LLM once. Nobody can tell you what you’re going to get when you run it N times. There are, in fact, no guarantees at all. Nobody even really knows why it can solve some problems and why it can’t solve others, except maybe that it memorized the answer at some point. But this is not how they are marketed.
They are marketed as wondrous inventions that can SOLVE EVERYTHING. This is obviously not true. You can verify it yourself, with a simple deterministic problem: generate an arithmetic expression of length N. As you increase N, the probability that an LLM can solve it drops to zero.
Ok, fine. This kind of problem is not a good fit for an LLM. But which is? And after you’ve found a problem that seems like a good fit, how do you know? Did you test it systematically? The big LLM vendors are fudging the numbers. They’re testing on the training set, they’re using ad hoc measurements and so on. But don’t take my word for it. There’s lots of great literature out there that probes the eccentricities of these models; for some reason this work rarely makes its way into the HN echo chamber.
Now I’m not saying these things are broken and useless. Far from it. I use them every day. But I don’t trust anything they produce, because there are no guarantees, and I have been burned many times. If you have not been burned, you’re either exceptionally lucky, you are asking it to solve homework assignments, or you are ignoring the pain.
Excel bugs are not the same thing. Most of those problems can be found trivially. You can find them because Excel is a language with clear rules (just not clear to those particular users). The problem with Excel is that people aren’t looking for bugs.
> But I don’t trust anything they produce, because there are no guarantees
> Did you test it systematically?
Yes! That is exactly the right way to use them. For example, when I'm vibe coding I don't ask it to write code. I ask it to write unit tests. THEN I verify that the test is actually testing for the right things with my own eyeballs. THEN I ask it to write code that passes the unit tests.
Same with even text formatting. Sometimes I ask it to write a pydantic script to validate text inputs of "x" format. Often writing the text to specify the format is itself a major undertaking. Then once the script is working I ask for the text, and tell it to use the script to validate it. After that I can know that I can expect deterministic results, though it often takes a few tries for it to pass the validator.
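As a concrete (hypothetical) example of the kind of validator described above, assuming a simple "name: value" line format, the script is roughly this; once it exists, pass/fail is deterministic no matter how the text was produced.

    # Sketch of a deterministic validator for a made-up "name: value" line format.
    from pydantic import BaseModel, field_validator

    class Entry(BaseModel):
        name: str
        value: int

        @field_validator("name")
        @classmethod
        def name_is_lowercase(cls, v: str) -> str:
            if not v.islower():
                raise ValueError("name must be lowercase")
            return v

    def validate_lines(text: str) -> list[Entry]:
        """Parse 'name: value' lines, raising on any malformed entry."""
        entries = []
        for line in text.strip().splitlines():
            name, _, value = line.partition(":")
            entries.append(Entry(name=name.strip(), value=int(value)))
        return entries

    print(validate_lines("alpha: 1\nbeta: 2"))  # ok; "Beta: x" would raise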
You CAN get deterministic results, you just have to adapt your expectations to match what the tool is capable of instead of expecting your hammer to magically be a great screwdriver.
I do agree that the SOLVE EVERYTHING crowd are severely misguided, but so are the SOLVE NOTHING crowd. It's a tool, just use it properly and all will be well.
One issue with this analogy is that calculators really are precise when used correctly. LLMs are not.
I do think they can be used in research but not without careful checking. In my own work I’ve found them most useful as search aids and brainstorming sounding boards.
> I do think they can be used in research but not without careful checking.
Of course you are right. It is the same with all tools, calculators included, if you use them improperly you get poor results.
In this case they're stochastic, which isn't something people are used to happening with computers yet. You have to understand that and learn how to use them or you will get poor results.
> One issue with this analogy is that calculators really are precise when used correctly. LLMs are not.
I made this a separate comment, because it's wildly off topic, but... they actually aren't. Especially for very large numbers or for high precision. When's the last time you did a firmware update on yours?
It's fairly trivial to find lists of calculator flaws and then identify them in research papers. I recall reading a research paper about it in the 00's.
One issue with this analogy is that paper encyclopedias really are precise when used correctly. Wikipedia is not.
I do think it can be used in research but not without careful checking. In my own work I've found it most useful as a search aid and for brainstorming.
Paper encyclopedias were neither precise nor accurate. You could count on them to give you ballpark figures most of the time, but certainly not precise answers. And that's assuming the set was new, but most encyclopedias people actually encountered were several years old at least. I remember the encyclopedia set I had access to in the 90s was written before the USSR fell.
> I do think it can be used in research but not without careful checking.
This is really just restating what I already said in this thread, but you're right. That's because wikipedia isn't a primary source and was never, ever meant to be. You are SUPPOSED to go read it then click through to the primary sources and cite those.
Lots of people use it incorrectly and get bad results because they still haven't realized this... all these years later.
Same thing with treating stochastic LLMs like sources of truth and knowledge. Those folks are just doing it wrong.
I don't necessarily disagree, but researchers are not required to be good communicators. An academic can lead their field and be a terrible lecturer. A specialist can let a generalist help explain concepts for them.
They should still review the final result though. There is no excuse for not doing that.
I disagree here. A good researcher has to be a good communicator. I am not saying that it is necessarily the case that you don't understand the topic if you cannot explain it well enough to someone new, but it is essential to communicate to have a good exchange of ideas with others, and consequently, become a better researcher. This is one of the skills you learn in a PhD program.
To me, this is a reminder of how much of a specific minority this forum is.
Nobody I know in real life, personally or at work, has expressed this belief.
I have literally only ever encountered this anti-AI extremism (extremism in the non-pejorative sense) in places like reddit and here.
Clearly, the authors in NeurIPS don't agree that using an LLM to help write is "plagiarism", and I would trust their opinions far more than some random redditor.
> Nobody I know in real life, personally or at work, has expressed this belief.
TBF, most people in real life don't even know how AI works to any degree, so using that as an argument that parent's opinion is extreme is kind of circular reasoning.
> I have literally only ever encountered this anti-AI extremism (extremism in the non-pejorative sense) in places like reddit and here.
I don't see parent's opinions as anti-AI. It's more an argument about what AI is currently, and what research is supposed to be. AI is existing ideas. Research is supposed to be new ideas. If much of your research paper can be written by AI, I call into question whether or not it represents actual research.
> Research is supposed to be new ideas. If much of your research paper can be written by AI, I call into question whether or not it represents actual research.
One would hope the authors are forming a hypothesis, performing an experiment, gathering and analysing results, and only then passing it to the AI to convert it into a paper.
If I have a theory that, IDK, laser welds in a sine wave pattern are stronger than laser welds in a zigzag pattern - I've still got to design the exact experimental details, obtain all the equipment and consumables, cut a few dozen test coupons, weld them, strength test them, and record all the measurements.
Obviously if I skipped the experimentation and just had an AI fabricate the results table, that's academic misconduct of the clearest form.
I am not an academic, so correct me if I am wrong, but in your example, the actual writing would probably only represent a small fraction of the time spent. Is it even worth using AI for anything other than spelling and grammar correction at that point? I think using an LLM to generate a paper from high level points wouldn't save much, if any, time if it was then reviewed the way that would require.
My brother in law is a professor, and he has a pretty bad opinion of colleagues that use LLMs to write papers, as his field (economics) doesn't involve much experimentation, and instead relies on data analysis, simulation, and reasoning. It seemed to me like the LLM assisted papers that he's seen have mostly been pretty low impact filler papers.
> I am not an academic, so correct me if I am wrong, but in your example, the actual writing would probably only represent a small fraction of the time spent. Is it even worth using AI for anything other than spelling and grammar correction at that point? I think using an LLM to generate a paper from high level points wouldn't save much, if any, time if it was then reviewed the way that would require.
It's understandable that you believe that, but it's absolutely true that writing in academia is a huge time sink. Think about it: the first thing your reviewers are going to notice is not the results but how well it is written.
If it's written terribly you have lost, and it doesn't matter how good your results are at that point. It's common to spend days with your PI writing a paper to perfection, and then spend months back and forth with reviewers updating and improving the text. This is even more true the higher up you go in journal prestige.
Who knows? Does NeurIPS have a pedigree of original, well-sourced research dating back to before the advent of LLMs? We're at the point where both of the terms "AI" and "Experts" are so blurred it's almost impossible to trust or distrust anything without spending more time on due diligence than most subjects deserve.
As the wise woman once said "Ain't nobody got time for that".
"If much of your research paper can be written by AI, I call into question whether or not it represents actual research" And what happens to this statement if next year or later this year the papers that can be autonomously written passes median human paper mark?
What does it mean to cross the median human paper mark? How is that measured?
It seems to me like most of the LLM benchmarks wind up being gamed. So, even if there were a good benchmark there, which I do not believe there is, the validity of the benchmark would likely diminish pretty quickly.
I find that hard to believe. Every creative professional that I know shares this sentiment. That’s several graphic designers at big tech companies, one person in print media, and one visual effects artist in the film industry. And once you include many of their professional colleagues that becomes a decent sample size.
> Plagiarism is using someone else's words, ideas, or work as your own without proper credit, a serious breach of ethics leading to academic failure, job loss, or legal issues, and can range from copying text (direct) to paraphrasing without citation (mosaic), often detected by software and best avoided by meticulous citation, quoting, and paraphrasing to show original thought and attribution.
Higher education is not free. People pay a shit ton of money to attend and also governments (taxpayers) invest a lot. Imagine offloading your research to an AI bot...
Where does this bizarre impulse to dogmatically defend LLM output come from? I don’t understand it.
If AI is a reliable and quality tool, that will become evident without the need to defend it - it’s got billions (trillions?) of dollars backstopping it. The skeptical pushback is WAY more important right now than the optimistic embrace.
The fact that there is absurd AI hype right now doesn't mean that we should let equally absurd bullshit pass on the other side of the spectrum. Having a reasonable and accurate discussion about the benefits, drawbacks, side effects, etc. is WAY more important right now than being flagrantly incorrect in either direction.
Meanwhile this entire comment thread is about what appears to be, as fumi2026 points out in their comment, a predatory marketing play by a startup hoping to capitalize on the exact sort of anti AI sentiment that you seem to think is important... just because there is pro AI sentiment?
Naming and shaming everyday researchers based on the idea that they have let hallucinations slip into their paper, all because your own AI model has decided that it was AI, so you can signal-boost your product, seems pretty shitty and exploitative to me, and is only viable as a product and marketing strategy because of the visceral anti-AI sentiment in some places.
No that’s a straw man, sorry. Skepticism is not the same thing as irrational rejection. It means that I don’t believe you until you’ve proven with evidence that what you’re saying is true.
The efficacy and reliability of LLMs require proof. AI companies are pouring extraordinary, unprecedented amounts of money into promoting the idea that their products are intelligent and trustworthy. That marketing push absolutely dwarfs the skeptical voices, and that's what makes those voices more important at the moment. If the researchers named have claims made against them that aren't true, that should be a pretty easy thing for them to refute.
The cat is out of the bag, though. AI does have provably crazy value. Certainly not the AGI hype that marketing spews, and who knows how economically viable it would be without VC money.
However, I think anyone who is still skeptical of the real efficacy is willfully ignorant. This is not a moral endorsement of how it was made or whether it is moral to use, but god damn, it is a game changer across vast domains.
There was a front page post just a couple of days ago where the article claimed LLMs have not improved in any way in over a year - an obviously absurd statement. A year before Opus 4.5, I couldn't get models to spit out a one-shot Tampermonkey script to add chapter turns to my arrow keys. Now I can one-shot small personal projects in Claude Code.
If you are saying that people are not making irrational and intellectually dishonest arguments about AI, I can't believe that we're reading the same articles and same comments.
Isn’t that the whole point of publishing? This happened plenty before AI too, and the claims are easily verified by checking the claimed hallucinations.
Don’t publish things that aren’t verified and you won’t have a problem, same as before but perhaps now it’s easier to verify, which is a good thing.
We see this problem in many areas, last week it was a criminal case where a made up law was referenced, luckily the judge knew to call it out.
We can’t just blindly trust things in this era, and calling it out is the only way to bring it up to the surface.
No, obviously not. You're confusing a marketing post by people with a product to sell with an actual review of the work by the relevant community, or even review by interested laypeople.
This is a marketing post where they provide no evidence that any of these are hallucinations beyond their own AI tool telling them so - and how do we know it isn't hallucinating? Are there hallucinations in there? Almost certainly. Would the authors deserve being called out by people reviewing their work? Sure.
But what people don't deserve is an unrelated VC funded tech company jumping in and claiming all of their errors are LLM hallucinations when they have no actual proof, painting them all a certain way so they can sell their product.
> Don’t publish things that aren’t verified and you won’t have a problem
If we were holding this company to the same standard, this blog wouldn't be posted either. They have not and can not verify their claims - they can't even say that their claims are based on their own investigations.
Most research is funded by someone with a product to sell, not all but a frightening amount of it. VC to sell, VC to review.
The burden of proof is always on the one publishing and it can be a very frustrating experience, but that is how it is, the one making the claim needs to defend themselves, from people (who can be a very big hit or miss) or machines alike. The good thing is that if this product is crap then it will quickly disappear.
That's still different from a bunch of researchers being specifically put in a negative light purely to sell a product. They weren't criticized so that they could do better, be it in their own error checking if it was a human-induced issue, or not relying on LLMs to do the work they should have been. They were put on blast to sell a product.
That's quite a bit different than a study being funded by someone with a product to sell.
Yup, and no matter how flimsy an anti-ai article is, it will skyrocket to the top of HN because of it. It makes sense though, HN users are the most likely to feel threatened by LLMs, and therefore are more likely to be anxious about them.
> Clearly, the authors in NeurIPS don't agree that using an LLM to help write is "plagiarism",
Or they didn't consider that it arguably fell within academia's definition of plagiarism.
Or they thought they could get away with it.
Why is someone behaving questionably the authority on whether that's OK?
> Nobody I know in real life, personally or at work, has expressed this belief. I have literally only ever encountered this anti-AI extremism (extremism in the non-pejorative sense) in places like reddit and here.
It's not "anti-AI extremism".
If no one you know has said, "Hey, wait a minute, if I'm copy&pasting this text I didn't write, and putting my name on it, without credit or attribution, isn't that like... no... what am I missing?" then maybe they are focused on other angles.
That doesn't mean that people who consider different angles than your friends do are "extremist".
They're only "extremist" in the way that anyone critical at all of 'crypto' was "extremist", to the bros pumping it. Not coincidentally, there's some overlap in bros between the two.
How is that relevant? Companies care very little about plagiarism, at least in the ethical sense (they do care if they think it's a legal risk, but that has turned out to not be the case with AI, so far at least).
What do you mean, how is that relevant? It's a vast majority opinion in society that using AI to help you write is fine. Calling it "plagiarism" is a tiny minority online opinion.
First of all, the very fact that companies need to encourage it shows that it is not already a majority opinion in society, it is a majority opinion among company management, which is often extremely unethical.
Secondly, even if it is true that it is a majority opinion in society doesn't mean it's right. Society at large often misunderstands how technology works and what risks it brings and what are its inevitable downstream effects. It was a majority opinion in society for decades or centuries that smoking is neutral to your health - that doesn't mean they were right.
> Secondly, even if it is true that it is a majority opinion in society doesn't mean it's right. Society at large often misunderstands how technology works and what risks it brings and what are its inevitable downstream effects. It was a majority opinion in society for decades or centuries that smoking is neutral to your health - that doesn't mean they were right.
That it's a majority opinion instead of a tiny minority opinion is a strong signal that it's more likely to be correct. For example, it's a majority opinion that murder is bad; this has held true for millennia.
Here's a simpler explanation: toaster frickers tend to seek out other toaster frickers online in niche communities. Occam's razor.
This seems like finding spelling errors and using them to cast the entire paper into doubt.
I am unconvinced that the particular error mentioned above is a hallucination, and even less convinced that it is a sign of some kind of rampant use of AI.
I hope to find better examples later in the comment section.
I actually believe it was an AI hallucination, but I agree with you that it seems the problem is far more concentrated to a few select papers (e.g., one paper made up more than 10% of the detected errors).
I can see that either way. It could also be a placeholder until the actual author list is inserted. This could happen if you know the title, but not the authors and insert a temporary reference entry.
The first Doe and Smith example I could grant that explanation (the title is real and the arXiv ID they give is "arXiv:2401.00001", which is definitely a placeholder), but the second one doesn't match any title and has a fake URL/DOI that doesn't actually go anywhere. There are a few that are unambiguously placeholders, but they really should have been caught in review for a conference this high up.
How does a "placeholder citation" even happens? Either enter the citation properly now, or do it properly later. What role does a "placeholder citation" serve, besides giving you something to forget about and fuck up?
I do not believe the placeholder citation theory at all.
Google Scholar and the vagaries of copy/paste errors have mangled BibTeX ever since it became a thing; a single citation with these sorts of errors may not even be AI, just “normal” mistakes.
Agreed, I don't find this evidence of AI. It often happens that authors change, there are multiple venues, or I'm using an old version of the paper. We also need to see the denominator: was this Google paper's one bad citation out of 20, or out of 60?
Also everyone I know has been relying on google scholar for 10+ years. Is that AI-ish? There are definitely errors on there. If you would extrapolate from citation issues to the content in the age of LLMs, were you doing so then as well?
It's the age-old debate about spelling/grammar issues in technical work. In my experience it rarely gets to the point that these errors eg from non-native speakers affect my interpretation. Others claim to infer shoddy content.
> However: we do not know if these are the only errors, they are merely a signature that the paper was submitted without being thoroughly checked for hallucinations
Given how stupidly tedious and error-prone citations are, I have no trouble believing that the citation error could be the only major problem with the paper, and that it's not a sign of low quality by itself. It would be another matter entirely if we were talking about something actually important to the ideas presented in the paper, but it isn't.
What I find more interesting is how easy these errors are to introduce and how unlikely they are to be caught. As you point out, a DOI checker would immediately flag this. But citation verification isn’t a first-class part of the submission or review workflow today.
We’re still treating citations as narrative text rather than verifiable objects. That implicit trust model worked when volumes were lower, but it doesn’t seem to scale anymore
There’s a project I’m working on at Duke University, where we are building a system that tries to address exactly this gap by making references and review labor explicit and machine verifiable at the infrastructure level. There’s a short explainer here that lays out what we mean, if useful context helps: https://liberata.info/
Citation checks are a workflow problem, not a model problem. Treat every reference as a dependency that must resolve and be reproducible. If the checker cannot fetch and validate it, it does not ship.
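To sketch what "resolve the dependency" could look like in practice, here is a minimal, illustrative checker against the public Crossref REST API (the /works/{doi} endpoint is real; the shape of the `citation` dict, the 0.9 similarity threshold, and the last-name matching are my assumptions, not anyone's production pipeline).

    # Sketch: treat a citation as a dependency that must resolve on Crossref.
    import requests
    from difflib import SequenceMatcher

    def check_citation(citation: dict) -> list[str]:
        """citation = {"doi": ..., "title": ..., "authors": [...]} (assumed shape)."""
        problems = []
        resp = requests.get(f"https://api.crossref.org/works/{citation['doi']}")
        if resp.status_code != 200:
            return [f"DOI does not resolve: {citation['doi']}"]
        record = resp.json()["message"]

        # Compare the claimed title against the registered one.
        registered_title = (record.get("title") or [""])[0]
        if SequenceMatcher(None, citation["title"].lower(),
                           registered_title.lower()).ratio() < 0.9:
            problems.append(f"title mismatch: '{registered_title}'")

        # Check that every claimed author appears on the registered record.
        registered_authors = {a.get("family", "").lower()
                              for a in record.get("author", [])}
        for name in citation["authors"]:
            if name.split()[-1].lower() not in registered_authors:
                problems.append(f"author not on record: {name}")
        return problems

A submission hook that runs something like this over the bibliography and blocks on any non-empty result would have flagged the swapped-author/wrong-venue case discussed upthread.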
The thing is, when you copy paste a bibliography entry from the publisher or from Google Scholar, the authors won't be wrong. In this case, it is. If I were to write a paper with AI, I would at least manage the bibliography by hand, conscious of hallucinations. The fact that the hallucination is in the bibliography is a pretty strong indicator that the paper was written entirely with AI.
Google Scholar provides imperfect citations - very often wrong article type (eg article versus conference paper), but up to and including missing authors, in my experience.
I've had the same experience. Also papers will often have multiple entries in Google Scholar, with small differences between them (enough that Scholar didn't merge them into one entry).
I'm not sure I agree... while I don't ever see myself writing papers with AI, I hate wrangling a bibtex bibliography.
I wouldn't trust today's GPT-5-with-web-search to turn a bullet point list of papers into proper citations without checking myself, but maybe I will trust GPT-X-plus-agent to do this.
Reference managers have existed for decades now and they work deterministically. I paid for one when writing my doctoral thesis because it would have been horrific to do by hand. Any of the major tools like Zotero or Mendeley (I used Papers) will export a bibtex file for you, and they will accept a RIS or similar format that most journals export.
This seems solvable today if you treat it as an architecture problem rather than relying on the model's weights. I'm using LangGraph to force function calls to Crossref or OpenAlex for a similar workflow. As long as you keep the flow rigid and only use the LLM for orchestration and formatting, the hallucinations pretty much disappear.
I see your point, but I don’t see where the author makes any claims about the specifics of the hallucinations, or their impact on the papers’ broader validity. Indeed, I would have found the removal of supposed “innocuous” examples to be far more deceptive than simply calling a spade a spade, and allowing the data to speak for itself.
The author calls the mistakes "confirmed hallucinations" without proof (just more or less evidence). The data never "speak for itself." The author curates the data and crafts a story about it. This story presented here is very suggestive (even using the term "hallucination" is suggestive). But calling it "100 suspected hallucinations", or "25 very likely hallucinations" does less for the author's end goal: selling their service.
BibTeX entries are often also incorrectly generated. E.g., Google Scholar sometimes puts the names of the editors instead of the authors into the BibTeX entry.
That's not happening for a similar reason people do not bug-check every single line of every single third-party library in their code. It's a chore that costs valuable time that you could instead spend on getting the actual stuff done. What's really important is that the scientific contribution is 100% correct and solid. For the references, the "good enough" paradigm applies. They mustn't be completely bogus, like the referenced work not existing at all, which would indicate that the authors didn't even look at the reference. But minor issues like typos or rare issues with wrong authors can happen.
To be honest, validating bibliographies does not cost valuable time. Every research group will have their own bibtex file to which every paper the group ever cited is added.
Typically when you add it you get the info from another paper or copy the bibtex entry from Google scholar, but it's really at most 10 minutes work, more likely 2-5. Every paper might have 5-10 new entries in the bibliography, so that's 1 hour or less of work?
I don't think the original comment was saying this isn't a problem but that flagging it as a hallucination from an LLM is a much more serious allegation. In this case, it also seems like it was done to market a paid product which makes the collateral damage less tolerable in my opinion.
> Papers should be carefully crafted, not churned out.
I think you can say the same thing for code and yet, even with code review, bugs slip by. People aren't perfect and problems happen. Trying to prevent 100% of problems is usually a bad cost/benefit trade-off.
What's the benefit to society of making sure that academics waste even more of their valuable hours verifying that Google Scholar did not include extraneous authors in some citation which is barely even relevant to their work? With search engines being as good as they are, it's not like we can't easily find that paper anyway.
The entire idea of super-detailed citations is itself quite outdated in my view. Sure, citing the work you rely on is important, but that could be done just as well via hyperlinks. It's not like anybody (exclusively) relies on printed versions any more.
You want the content of the paper to be carefully crafted. Bibtex entries are the sort of thing you want people to copy and paste from a trusted source, as they can be difficult to do consistently correctly.
The rate here (about 1% of papers) just doesn't seem that bad, especially if many of the errors are minor and don't affect the validity of the results. In other fields, over half of high-impact studies don't replicate.
There are people who just want to punish academics for the sake of punishing academics. Look at all the people downthread salivating over blacklisting or even criminally charging people who make errors like this with felony fraud. It's the perfect brew of anti-AI and anti-academia sentiment.
Also, in my field (economics), by far the biggest source of finding old papers invalid (or less valid, most papers state multiple results) is good old fashioned coding bugs. I'd like to see the software engineers on this site say with a straight face that writing bugs should lead to jail time.
And research codebases (in AI and otherwise) are usually of extremely bad quality. It's usually a bunch of extremely poorly-written scripts, with no indication which order to run them in, how inputs and outputs should flow between them, and which specific files the scripts were run on to calculate the statistics presented in the paper.
If there were real consequences, we wouldn't be forced to churn out buggy nonsense by our employers. So we'd be able to take the time to do the right thing. Bug-free software is possible; the world just says it's not worth it today.
Getting all possible software correct is impossible, clearly. Getting all the software you release correct is more possible, because you can choose not to release the software that is too hard to prove correct.
Not that the suggestion is practical or likely, but your assertion that it is impossible is incorrect.
If you want to be pedantic I’m pretty sure every single general purpose OS (and thus also the programs running under it) falls into the category of not provably correct so it’s a distinction without a difference in real life.
> Between 2020 and 2025, submissions to NeurIPS increased more than 220% from 9,467 to 21,575. In response, organizers have had to recruit ever greater numbers of reviewers, resulting in issues of oversight, expertise alignment, negligence, and even fraud.
I don’t think the point being made is “errors didn’t happen pre-GPT”, rather the tasks of detecting errors have become increasingly difficult because of the associated effects of GPT.
> rather the tasks of detecting errors have become increasingly difficult because of the associated effects of GPT.
Did the increase to submissions to NeurIPS from 2020 to 2025 happen because ChatGPT came out in November of 2022? Or was AI getting hotter and hotter during this period, thereby naturally increasing submissions to ... an AI conference?
I was an area chair on the NeurIPS program committee in 1997. I just looked and it seems that we had 1280 submissions. At that time, we were ultimately capped by the book size that MIT Press was willing to put out - 150 8-page articles. Back in 1997 we were all pretty sure we were on to something big.
I'm sure people made mistakes on their bibliographies at that time as well!
And did we all really dig up and read Metropolis, Rosenbluth, Rosenbluth, Teller, and Teller (1953)?
I cited Watson and Crick '53 in my PhD thesis and I did go dig it up and read it.
I had to go to the basement of the library, use some sort of weird rotating knob to move a heavy stack of journals over, find some large bound book of the year's journals, and navigate to the paper. When I got to the page, it had been cut out by somebody previous and replaced with a photocopied version.
(I also invested a HUGE amount of my time into my bibliography in every paper I've written as first author, curating a database and writing scripts to format in the various journal formats. This involved multiple independent checks from several sources, repeated several times.)
Totally! If you haven't burrowed in the stacks as a grad student, you missed out.
The real challenges there aren't the "biggies" above, though, it's the ones in obscure journals you have to get copies of by inter-library agreements. My PhD was in applied probability and I was always happy if there were enough equations so that I could parse out the French or Russian-language explanation nearby.
> And did we all really dig up and read Metropolis, Rosenbluth, Rosenbluth, Teller, and Teller (1953)?
If you didn't, you are lying. Full stop.
If you cite something, yes, I expect that you, at least, went back and read the original citation.
The whole damn point of a citation is to provide a link for the reader. If you didn't find it worth the minimal amount of time to go read, then why would your reader? And why did you inflict it on them?
I meant this more as a rueful acknowledgment of an academic truism - not all citations are read by those citing. But I have touched a nerve, so let me explain at least part of the nuance I see here.
In mathematics/applied math consider cited papers claimed to establish a certain result, but where that was not quite what was shown. Or, there is in effect no earthly way to verify that it does.
Or even: the community agrees it was shown there, but perhaps has lost intimate contact with the details — I’m thinking about things like Laplace’s CLT (published in French), or the original form of the Glivenko-Cantelli theorem (published in Italian). These citations happen a lot, and we should not pretend otherwise.
Here’s the example that crystallized that for me. “VC dimension” is a much-cited combinatorial concept/lemma. It’s typical for a very hard paper of Saharon Shelah (https://projecteuclid.org/journalArticle/Download?urlId=pjm%...) to be cited, along with an easier paper of Norbert Sauer. There are currently 800 citations of Shelah’s paper.
I read a monograph by noted mathematician David Pollard covering this work. Pollard, no stranger to doing the hard work, wrote (probably in an endnote) that Shelah’s paper was often cited, but he could not verify that it established the result at all. I was charmed by the candor.
This was the first acknowledgement I had seen that something was fishy with all those citations.
By this time, I had probably seen Shelah’s paper cited 50 times. Let’s just say that there is no way all 50 of those citing authors (now grown to 800) were working their way through a dense paper on transfinite cardinals to verify this had anything to do with VC dimension.
Of course, people were wanting to give credit. So their intentions were perhaps generous. But in no meaningful sense had they “read” this paper.
So I guess the short answer to your question is: citations serve more uses than telling readers to literally read the cited work, and by extension, should not always be taken to mean that the cited work was indeed read.
As with anything, it is about trusting your tools. Who is culpable for such errors? In the days of human authors, the person writing the text is responsible for not making these errors. When AI does the writing, the person whose name is on the paper should still be responsible—but do they know that? Do they realize the responsibility they are shouldering when they use these AI tools? I think many times they do not; we implicitly trust the outputs of these tools, and the dangers of that are not made clear.
I'm not going to bat for GPTZero, but I think it's clearly possible to identify some AI-written prose. Scroll through LinkedIn or Twitter replies and there are clear giveaways in tone, phrasing and repeated structures (it's not just X it's Y).
Not to say that you could ever feasibly detect all AI-generated text, but if it's possible for people to develop a sense for the tropes of LLM content then there's no reason you couldn't detect it algorithmically.
> there's no reason you couldn't detect it algorithmically
For any real world classifier there is a precision/recall tradeoff. Do you care more about false positives or false negatives? If you choose to truly minimize false positives you should simply always predict negative.
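A toy, purely illustrative sweep makes the tradeoff concrete (the scores and labels below are invented, standing in for a hypothetical "AI-written?" classifier): pushing the threshold high enough that nothing is flagged gives perfect precision and zero recall.

    # Toy precision/recall sweep for an imaginary "AI-written?" classifier.
    def precision_recall(scores, labels, threshold):
        preds = [s >= threshold for s in scores]
        tp = sum(p and y for p, y in zip(preds, labels))
        fp = sum(p and not y for p, y in zip(preds, labels))
        fn = sum((not p) and y for p, y in zip(preds, labels))
        precision = tp / (tp + fp) if (tp + fp) else 1.0  # no positive predictions
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        return precision, recall

    scores = [0.1, 0.4, 0.55, 0.8, 0.9, 0.95]
    labels = [0,   0,   1,    0,   1,   1]      # 1 = actually AI-written
    for t in (0.5, 0.85, 1.01):                 # 1.01 ~ "always predict negative"
        print(t, precision_recall(scores, labels, t))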
For your example “it’s not just X it’s Y” I agree it’s a red flag. But the origin of the pattern is from human text which the LLM picked up on. So some people did (and likely still do) use that construction.
Sorry, but blaming it on "AI autocomplete" is the dumbest excuse ever. Author lists come from BibTeX entries, and while they often contain errors since they can come from many sources, they do not contain completely made-up authors. I don't share your view that hallucinated citations are less damaging in the background section. Background, related works, and introduction are the sections where citations most often show up. These sections are meant to be read, and generating them with AI is plain cheating.
Trust is damaged. I cannot verify that the evidence is correct only that the conclusions follow from the evidence. I have to rely on the authors to truthfully present their evidence. If they for whatever reason add hallucinated citations to their background that trust is 100% gone.
They are not harmless. These hallucinated references are ingested by Google Scholar, Scopus, etc., and with enough time they will poison those wells. It is also plain academic malpractice, no matter how "minor" the reference is.
If the mistake is one error of author and venue in a citation, I find it fairly disingenuous to call that a hallucination. At least, it doesn't meet the threshold for me.
I have seen this kind of mistake made long before LLMs were even a thing. We used to call them just that: mistakes.
:) I actually printed a lot, so the unit price was cheap and I could sell them for $5. I sold them until I recouped the printing cost and donated the rest to schools.
I am thinking of doing a reprint, but tbh shipping is so expensive now, and there are also the USA's tariffs, etc.
I would pay triple to not have to worry about how to print and box these all neatly as you did. Please take my money for this and all your other games.
Make a social app whose goal is to get people off their phone as quickly as possible. There used to be a slew of apps where you press a button to indicate "I'm bored/free, who wants to hang out?" and then you get matched with your contact list and anyone else who pressed the button at the same time. But for whatever reason they all flamed out and died.
I think the solution is for the app to advertise public social events where people can make connections and exchange contact information in person.
Allowing random people to message each other without meeting in person is a mistake. The nonverbal cues people get from in-person interactions are helpful for discovering shared interests and personality compatibility.
One issue is that if you leave it to the free market, you will just get more issues. For monopolies, it's more profitable to keep someone unhealthy (and depressed) than to make someone healthy once and forever.
Privacy reasons. You don’t want a VC funded startup to know your contact list and your location (because hanging out in real life requires physical proximity).
It starts from the premise that the author finds LLMs are good for limited, simple tasks with small contexts and clearly defined guidelines, and specifically not good for vibe-coding.
And the author literally mentions that they aren't making universal claims about LLMs, but just speaking from personal experience.
Huh? Where did I say that's what I like? I'm just trying to discuss for discussion's sake. Personally, I want a world that rewards the people who put their thought, care, and craftsmanship into something more than those that don't. In order to live in that world, I think we need to discuss the parts (maybe the whole) that don't and why that might be.
How would one set this sort of test up? I surely have example domains where LLMs routinely do poorly (for example, custom bazel rules and workspaces), but what would constitute a "showcase" here?
To change my mind I'll be satisfied with a thorough description of the domain and ideally a theory on why it does poorly in that domain. But we're not talking LLMs here, we're talking Opus 4.5 specifically.
A theory besides... not enough training data? Is it even possible to formulate a coherent theory about this? I'm talking about customizing a widely-used build system, not exactly state-of-the-art cryptography. What could I possibly say that you wouldn't counter with "skill issue" (which goes back to the author's point)?
If you say it's demonstrable that someone can't be made more productive with Opus 4.5, then it should probably be up to you to demonstrate that impossibility.
The author claims it's not just that one evangelizes it, but that they become hostile when someone claims to not have the same experience in response. I don't recall either Willison or Antirez scaring people by saying they will be left behind or that they are just afraid of becoming irrelevant. Instead they just talk about their positive experiences using it. Willison and Antirez seem to be fine to live and let live (maybe Antirez a bit less, but they're still not mean about it).
My gut says that is not a property of LLM evangelists, but a property of current internet culture in general. People with strong, divisive, and engaging opinions seem to do well (by some definition of well) online.
It's weird how some people seem to treat using an LLM as part of their personality in a borderline cult like way. So someone saying they don't use it or don't find it useful triggers an anger response in them.
That is not novel - see language/framework choice, OS (or even distro) preferences, editor wars, indentation. People develop strong opinions about tools, technology, and techniques regardless of domain. LLM maximalists just have the unfortunate capability to generate infinite content about their specific shiny thing.
This. For every absurd LLM cheerleader, there’s a corresponding LLM minimalist who trots out the “stochastic parrot” line at every possible occasion along with the fact that they do CrossFit and don’t own a TV.
I think the actual problem is everyone tries to assert how capable or not coding agents currently are, but how useful they are depends so much on what you are trying to get them to do and also on your communication with the model. And often it's hard to tell whether you're just prompting it wrong or if they're incapable of doing it.
By now we at least agree that stochastical parrots can be useful.
It would be nice if the debate now was less polarized so we could focus on what makes them work better for some and worse for others other than just expectations.
Who knows. Maybe all the AI people will have their skills atrophy, and by the time the AI crash happens and none of the models can be run, they won't be employable any more. I'm happy to take that gamble if it saves my conscience.
And yeah, as I laid out in the article (that of course, very few people actually read, even though it was short...), I really don't mind how people make code. It's those that try so hard to convince the rest of us I find very suspect.
In my case I don’t even mind if these evangelists try so hard to convince other developers. What I do mind is that they seem to be quite successful in convincing our bosses. So we get things like mandatory LLM usage, minimum number of Claude API calls per day, every commit must be co-authored by Claude, etc.
I do wish I could have a good, in-depth tutorial on how to set this up myself. Along with (pipe dream) an explanation of how it would interact with my local utility. I worry that due to some silly technicality, I won't be able to export to my local utility, or else I won't be able to run off-grid when there's an outage.
I will do a write-up in a couple of days. It's all relatively simple, you just have to expect terrible documentation and do a bit of reverse engineering and serial sniffing. I expected the battery to be complicated, but it turned out that the inverter was.
You'll encounter stuff like: manual says use RS485 port on Battery for GroWatt inverter → need to use CAN port on Battery. Meter Port (RS485 [serial] over RJ45) wiring on GroWatt is unknown (A: white orange / B: white blue, cross them over). Dinky RS485 serial → USB converter needs a 120ohm resistor between pins for line termination. Growatt meter port expects a SDM630 meter, not a DTSU666 (hardcoded), so vibe code another emulator. DIP switches for RS232 connection need to be both on the ON position (undocumented). CH340 USB→serial converter for RS232 does not work, but one with a Prolific chip does. Etc. etc. etc :)
Oh, and the biggest one... I was expecting to be able to just send a command, 'charge at 500watts', now... 'discharge at 2000watts'. But no. You have to emulate a power meter and the inverter will try to bring the net power to 0. Fun! :)
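A rough sketch of that control trick, with the caveat that the sign convention and names below are my assumptions, not GroWatt's documented behavior: since the inverter regulates the *reported* meter power toward zero, you steer it by adding an offset to the real reading.

    # Sketch of steering a zero-export control loop by offsetting the emulated
    # meter reading. Assumed convention: positive watts = importing from the grid.
    # Reporting (real - 500) makes the loop settle where the house really imports
    # ~500 W more than it otherwise would, i.e. ~500 W of extra battery charging,
    # while the emulated meter reads 0.
    def emulated_meter_watts(real_grid_watts: float, extra_charge_watts: float) -> float:
        return real_grid_watts - extra_charge_watts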