From sauer at technologists.com Tue May 27 04:45:44 2025 From: sauer at technologists.com (Charles H Sauer (he/him)) Date: Mon, 26 May 2025 13:45:44 -0500 Subject: [COFF] Wikipedia anecdotes - LLM generalizations [was On the unreliability of LLM-based search results (was: Listing of early Unix source code from the Computer History Museum) In-Reply-To: References: <769a9c94-055d-4bdd-a921-3e154c3b492f@infinitecactus.com> Message-ID: <71936198-93c1-41bc-8a5f-41d95969da0c@technologists.com> TUHS->COFF > > It's like Wikipedia. > > No, Wikipedia has (at least historically) human editors who > supposedly have some knowledge of reality and history. > > An LLM response is going to be a series of tokens predicted based on > probabilities from its training data. ... > > Assuming the sources it cites are real works, it seems fine as a > search engine, but the text that it outputs should absolutely not be > thought of as something arrived at by similar means as text produced > by supposedly knowledgeable and well-intentioned humans. > > An LLM can weigh sources, but it has to be taught to do that.  A human > can weigh sources, but it has to be taught to do that. Before LLMs, Wikipedia, World Wide Web, ... adages such as "Trust, but verify," and "Inspect what you expect," were appropriate, and still are. Dabbling in editing and creating Wikipedia articles has enforced those notions. A few anecdotes here -- I could cite others. 1. I think my first experience was trying in 2008 to fix what is now at https://en.wikipedia.org/wiki/Vulcan_Gas_Company_(1967%E2%80%931970), because the article had so much erroneous content, and because I had worked/performed at that venue 1969-70. Much of what I did in 2008 was accepted without anyone else verifying. But others broke things/changed things, even renamed the original article and replaced it with an article about a newer club that adopted the name. 
A few years ago, I tried to make corrections, citing poster images at https://concerts.fandom.com/wiki/Vulcan_Gas_Company. Those changes were vetoed because fandom.com was considered unreliable. I copied the images from fandom to https://technologists.com/VGC/, and citing those images was then accepted by the editors involved. (The article has been changed dramatically, still is seriously deficient, IMO, but I'm not interested in fixing.) 2. Last year, I created https://en.wikipedia.org/wiki/Hub_City_Movers, citing sources I considered reliable. Citations to images at discogs.com were vetoed as unreliable, based on analogous bias against that site. Partly to see what was possible, I engaged with editors, found citations they found acceptable, and ultimately produced a better article. 3. Later last year, I edited https://en.wikipedia.org/wiki/IBM_AIX to fix obviously erroneous discussion of AIX 1/2/3. Even though I used my own writings as references, the changes were accepted. I still use the Web, Wikipedia, and even LLMs, but cautiously. Charlie -- voice: +1.512.784.7526 e-mail: sauer at technologists.com fax: +1.512.346.5240 Web: https://technologists.com/sauer/ Facebook/Google/LinkedIn/mas.to: CharlesHSauer From flexibeast at gmail.com Tue May 27 14:17:45 2025 From: flexibeast at gmail.com (Alexis) Date: Tue, 27 May 2025 14:17:45 +1000 Subject: [COFF] On the unreliability of LLM-based search results In-Reply-To: (George Michaelson's message of "Tue, 27 May 2025 13:08:02 +1000") References: <769a9c94-055d-4bdd-a921-3e154c3b492f@infinitecactus.com> <87frgqequk.fsf@gmail.com> Message-ID: <87a56yelg6.fsf@gmail.com> [Redirected from the TUHS list] George Michaelson writes: >> We're way off topic. Warren should send a kill. > > That said: please don't repeat the "hallucinate" label. It's > self-aggrandisement. It's deliberate, to foster belief "it's like > thinking" > > It's not a hallucination, it's bad model data and bad constraint > programming.
They're not thinking, or dreaming, or demanding not > to be > turned off, or threatening or bullying: They're not Markov > chains > either but they're a damn sight closer to a machine than a mind. Point taken, although i think trying to change that language might be tilting at windmills at this point. Still, i'll try to use alternate phrasing, e.g. "LLMs are known to output nonexistent sources". (Alternative phrasings welcome.) i'd also be interested in an analysis of this: > An artificial intelligence model created by the owner of > ChatGPT has > been caught disobeying human instructions and refusing to shut > itself off, researchers claim. -- https://www.stuff.co.nz/world-news/360701275/openai-software-ignores-explicit-instruction-switch Alexis. From paul.winalski at gmail.com Wed May 28 01:13:41 2025 From: paul.winalski at gmail.com (Paul Winalski) Date: Tue, 27 May 2025 11:13:41 -0400 Subject: [COFF] [TUHS] Wikipedia anecdotes - LLM generalizations [was On the unreliability of LLM-based search results (was: Listing of early Unix source code from the Computer History Museum) In-Reply-To: <71936198-93c1-41bc-8a5f-41d95969da0c@technologists.com> References: <769a9c94-055d-4bdd-a921-3e154c3b492f@infinitecactus.com> <71936198-93c1-41bc-8a5f-41d95969da0c@technologists.com> Message-ID: Wikipedia quite rightly wants citations for stated facts and this practice goes a long way to prevent inaccuracies. But I recall one instance where it actually caused the establishment of a factual error. Some city in Germany had a new mayor elected and someone duly updated the Wikipedia article on the city to reflect this. Another person made a correction--the new mayor's name had been misspelled. The Wikipedia editors rejected the correction, citing a published article in a major newspaper that had the name spelled as it was in the Wikipedia article. After some back-and-forth the spelling was eventually corrected.
It seems that the newspaper in question had gone to Wikipedia to find out the new mayor's name and so the original Wikipedia misspelling had gotten published in print. -Paul W. From jnc at mercury.lcs.mit.edu Wed May 28 06:52:27 2025 From: jnc at mercury.lcs.mit.edu (Noel Chiappa) Date: Tue, 27 May 2025 16:52:27 -0400 (EDT) Subject: [COFF] [TUHS] Re: Wikipedia anecdotes - LLM generalizations [was On the unreliability of LLM-based search results (was: Listing of early Unix source code from the Computer History Museum) Message-ID: <20250527205227.D26FC18C073@mercury.lcs.mit.edu> > From: Paul Winalski > Wikipedia quite rightly wants citations for stated facts and this > practice goes a long way to prevent inaccuracies. But I recall one > instance where it actually caused the establishment of a factual error. Not rare; this pattern was the subject of an XKCD strip, "Citogenesis": https://xkcd.com/978/ https://www.explainxkcd.com/wiki/index.php/978:_Citogenesis The latter URL has a very similar story to the one you gave. Noel From halbert at halwitz.org Wed May 28 09:23:04 2025 From: halbert at halwitz.org (Dan Halbert) Date: Tue, 27 May 2025 19:23:04 -0400 Subject: [COFF] [TUHS] Re: Wikipedia anecdotes - LLM generalizations [was On the unreliability of LLM-based search results (was: Listing of early Unix source code from the Computer History Museum) In-Reply-To: <20250527205227.D26FC18C073@mercury.lcs.mit.edu> References: <20250527205227.D26FC18C073@mercury.lcs.mit.edu> Message-ID: On 5/27/25 16:52, Noel Chiappa wrote: > > From: Paul Winalski > > > Wikipedia quite rightly wants citations for stated facts and this > > practice goes a long way to prevent inaccuracies. But I recall one > > instance where it actually caused the establishment of a factual error.
> > Not rare; this pattern was the subject of an XKCD strip, "Citogenesis": > > https://xkcd.com/978/ > https://www.explainxkcd.com/wiki/index.php/978:_Citogenesis > > The latter URL has a very similar story to the one you gave. > > Noel I had to go to considerable lengths to fix a misspelling of my father's first name in a short Wikipedia article about my mother (not written by me!). The original article cited a newspaper engagement notice with the misspelling. After some effort, I found another citation that identified him sufficiently in context to use as a cite for his correct name. Dan H. From steffen at sdaoden.eu Thu May 29 00:52:11 2025 From: steffen at sdaoden.eu (Steffen Nurpmeso) Date: Wed, 28 May 2025 16:52:11 +0200 Subject: [COFF] [TUHS] Wikipedia anecdotes - LLM generalizations [was On the unreliability of LLM-based search results (was: Listing of early Unix source code from the Computer History Museum) In-Reply-To: References: <769a9c94-055d-4bdd-a921-3e154c3b492f@infinitecactus.com> <71936198-93c1-41bc-8a5f-41d95969da0c@technologists.com> Message-ID: <20250528145211.xBdhDA8m@steffen%sdaoden.eu> Paul Winalski wrote in : |Wikipedia quite rightly wants citations for stated facts and this practice |goes a long way to prevent inaccuracies. But I recall one instance where |it actually caused the establishment of a factual error. | |Some city in Germany had a new mayor elected and someone duly updated the .. German journalists are often such that they copy. Some Germans still believe in what certain journalists stated on our Emperors "Hun speech" in 1900, even though an actual recording of the speech was found and made public .. at least 20 years, maybe 25 years ago ("a century after"). Everybody believes what everybody wants to believe, anyway. 
(I have a copy of that speech, if anyone is interested :-) Georgia (Georgien) wanted to introduce that American law on "NGOs", aka "foreign agents" aka "organizations being funded from parties of foreign interests" or whatever its name actually is. In the Wikipedia page it was "the Russian law", and this was linked to some site which also read "typical Putin", with lots of bullshit. So i said it is an American law. I was removed. On the discussions page i then also said "typical Putin. Nonsense!" and claimed other very true things against that bullshit mountain of indifferent context-free trivialization that is the modern world, and they pointed me to a statistica website page entry claiming that ~"Germans love and want NGOs", but then the entire thing was removed after i asked "Who did that. Who was asked. How many were asked." coming to the end saying the common term "Germany .. first of all means 'do not trust a statistic until it was you who falsified it'. That is a criminal offense". This entire thing was then removed with the words "Here it smells like red socks", referring to a decade old (around Y2K) right-wing aka republican propaganda tour against "red socks". (In Germany left wing is red, right wing is black/blue.) (Alongside i cited (the famous and experienced German journalist) Peter Scholl-Latour's "Russland im Zangengriff" ~(Russia in the pincer grip) from about 2007/2008, but in the foreword it already mentions the famous Putin speech from the Sicherheitskonferenz in München aka Munich which in not small parts is the foundation of what we see today, and he (Scholl-Latour) was part of the "other political side", namely worked for the republicans. Ie that "other political side" is a citation of mine in that discussion.) Btw, in another context, i am with Trump when he says "you should be thankful, you should be thankful", as it all, and much more, was a gift, and it was terribly mistreated. Actually a re-gifted gift, after terrible treatment! Thus twice!!
Both!!! (It was a gift from German "dictator" Ludendorff at first. Then Lenin.) Anyhow, and back to the beginning, gangs of young misled people, likely only a little older than teenagers, giving themselves reputation, and citing (very much likely) self-produced, but anyhow very much doubtful, and very much anyhow source-reference-less statistics, and older members removing entire discussion entries. It must be said that the page then was rewritten to exclude this entire mess. Was it the "smells like red socks" claimer? I have not looked. I am satisfied with the resulting exclusion of that bullshit. They now have that American law, by the way, and others in Europe think about introducing it, too, and if it were me, you know, Germany would have it already. On the other hand a contribution to the Battles of the Somme and the Przewalski horse was taken in agreement, and the latter even caused the German Academic Lady to start an impressive academic sleuthing, resulting in much better DOI link references than the primitive point of view i offered at first (which was falsely removed at first, thus)! The Somme thing .. dead soldiers are dead soldiers, they all were humans and died, often a very painful death. So the absolute numbers of horror and political failure is what matters, not whether a certain place had higher German numbers than the other; in that war, they, in the first years, still stopped fighting, and came together on the battlefield for Christmas! Today that would be a robotic kill; and one-sided truths were used for killing, too. Both are a shame, may the LLM write what it wants.
--steffen | |Der Kragenbaer, The moon bear, |der holt sich munter he cheerfully and one by one |einen nach dem anderen runter wa.ks himself off |(By Robert Gernhardt) From paul.winalski at gmail.com Fri May 30 00:23:03 2025 From: paul.winalski at gmail.com (Paul Winalski) Date: Thu, 29 May 2025 10:23:03 -0400 Subject: [COFF] [TUHS] Wikipedia anecdotes - LLM generalizations [was On the unreliability of LLM-based search results (was: Listing of early Unix source code from the Computer History Museum) In-Reply-To: <20250528145211.xBdhDA8m@steffen%sdaoden.eu> References: <769a9c94-055d-4bdd-a921-3e154c3b492f@infinitecactus.com> <71936198-93c1-41bc-8a5f-41d95969da0c@technologists.com> <20250528145211.xBdhDA8m@steffen%sdaoden.eu> Message-ID: On Wed, May 28, 2025 at 5:16 PM Steffen Nurpmeso wrote: > This entire thing was then removed with the words "Here it smells > like red socks", referring to a decade old (around Y2K) right-wing > aka republican propaganda tour against "red socks". (In Germany > left wing is red, right wing is black/blue.) > Red was also traditionally considered the color of the left wing, especially Communism, here in the US. During the 1950s the start of the Cold War led to a lot of fear of pro-Soviet Communists taking over US government and private institutions. This was known as the Red Scare. The slogan among the most militant anti-Communists was "better dead than Red". Somehow this color scheme got switched around in the current liberal vs. conservative ideological divide. Red is now the color of the right wing and Blue is the color of the left wing. I don't know why that change happened. -Paul W.
From imp at bsdimp.com Fri May 30 00:48:09 2025 From: imp at bsdimp.com (Warner Losh) Date: Thu, 29 May 2025 08:48:09 -0600 Subject: [COFF] [TUHS] Wikipedia anecdotes - LLM generalizations [was On the unreliability of LLM-based search results (was: Listing of early Unix source code from the Computer History Museum) In-Reply-To: References: <769a9c94-055d-4bdd-a921-3e154c3b492f@infinitecactus.com> <71936198-93c1-41bc-8a5f-41d95969da0c@technologists.com> <20250528145211.xBdhDA8m@steffen%sdaoden.eu> Message-ID: On Thu, May 29, 2025 at 8:23 AM Paul Winalski wrote: > > On Wed, May 28, 2025 at 5:16 PM Steffen Nurpmeso wrote: >> >> This entire thing was then removed with the words "Here it smells >> like red socks", referring to a decade old (around Y2K) right-wing >> aka republican propaganda tour against "red socks". (In Germany >> left wing is red, right wing is black/blue.) > > > Red was also traditionally considered the color of the left wing, especially Communism, here in the US. During the 1950s the start of the Cold War led to a lot of fear of pro-Soviet Communists taking over US government and private institutions. This was known as the Red Scare. The slogan among the most militant anti-Communists was "better dead than Red". > > Somehow this color scheme got switched around in the current liberal vs. conservative ideological divide. Red is now the color of the right wing and Blue is the color of the left wing. I don't know why that change happened. TV coverage. In the '76 election, some networks used blue/red and others used red/blue. This caused a lot of confusion on election night, so they got together and decided on red - GOP, blue - Dem, likely in an anti-nod to communism. Everybody knew the GOP weren't communists, but they liked to occasionally lob that attack at the Democrats. By having red GOP / blue Dem, the networks made a conscious choice to not reinforce the GOP charge...
Warner From coff at tuhs.org Fri May 30 00:42:25 2025 From: coff at tuhs.org (Chet Ramey via COFF) Date: Thu, 29 May 2025 10:42:25 -0400 Subject: [COFF] [TUHS] Wikipedia anecdotes - LLM generalizations [was On the unreliability of LLM-based search results In-Reply-To: References: <769a9c94-055d-4bdd-a921-3e154c3b492f@infinitecactus.com> <71936198-93c1-41bc-8a5f-41d95969da0c@technologists.com> <20250528145211.xBdhDA8m@steffen%sdaoden.eu> Message-ID: <0b4b5f13-9e45-4107-9904-86f6f238983f@case.edu> On 5/29/25 10:23 AM, Paul Winalski wrote: > Somehow this color scheme got switched around in the current liberal vs. > conservative ideological divide. Red is now the color of the right wing > and Blue is the color of the left wing. I don't know why that change happened. Gradual media convergence on those colors, finally cementing after CNN ran a map after the 2000 election that had Republican states in red. -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, UTech, CWRU chet at case.edu http://tiswww.cwru.edu/~chet/