Update on KDP Title Creation Limits (kdpcommunity.com)
japhyr 12 hours ago [-]
It'll be interesting to see where this goes. Amazon has had ML-generated garbage books for years now, and I assume they haven't taken them down because they make money even when they sell garbage.

Maybe there's so much garbage coming in now that they finally have to do something about it? I feel for people trying to learn about technical topics, who aren't aware enough of this issue to avoid buying ML-generated books with high ratings from fake reviews. The intro programming market is full of these scam books.

m463 11 hours ago [-]
I was thinking about buying an air fryer. My search came up with cookbooks specific to that air fryer, and I was intrigued. I found a good 5-star book, but then I found that ALL the 5-star reviews were submitted the same day.

I complained, but Amazon defended the book as legitimate, and since I hadn't purchased it, they would not take any action. (to be honest, I assume frontline customer service reps don't have much experience or power)

So I purchased it, complained, got a refund and then they were able to accept my complaint (after passing the complaint higher in the food chain).

Seriously, how hard was it amazon? I guess they're starting to notice.

Take a look at air fryer cookbooks - there are books specific to most makes and models. But everything is ML copypasta all the way up and down - the title, the recipes and the reviews all seem to be generated garbage.
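
The "all the 5-star reviews on the same day" pattern is easy to check mechanically. Below is a minimal sketch, assuming you already have the reviews as (star rating, date) pairs; the function name and the sample data are hypothetical, and this is not anything Amazon is known to run.

```python
from collections import Counter
from datetime import date


def five_star_burst_share(reviews):
    """Return the fraction of 5-star reviews posted on the single busiest day."""
    days = [day for stars, day in reviews if stars == 5]
    if not days:
        return 0.0
    busiest_count = Counter(days).most_common(1)[0][1]
    return busiest_count / len(days)


# Hypothetical data: (star rating, review date) pairs.
reviews = [
    (5, date(2023, 9, 1)),
    (5, date(2023, 9, 1)),
    (5, date(2023, 9, 1)),
    (2, date(2023, 9, 10)),
]
print(five_star_burst_share(reviews))  # 1.0 here: every 5-star review shares one date
```

A share close to 1.0 on a listing with many reviews would be a reason to look closer, not proof of fraud; a legitimate launch-day promotion could produce the same shape.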

japhyr 11 hours ago [-]
I'm the author of Python Crash Course, the best selling introductory Python book for some time now. Years ago, someone put out a book listing two authors: Mark Matthes and Eric Lutz. That's just a simple juxtaposition of my name and Mark Lutz, the author of O'Reilly's Learning Python. The subtitle is obviously taken from my book's subtitle as well. I assume the text is an ML-generated mess, but I haven't bought a copy to verify that.

I used to comment on reviews for books like these explaining what was happening, but Amazon turned off the ability to comment on reviews a long time ago.

I've spoken with other tech authors, and almost all of us get emails from people new to programming who have bought these kinds of books. If you're an experienced programmer, you probably know how to recognize a legitimate technical book. But people who are just starting to learn their first language don't always know what to look for. This is squarely on Amazon; they have blocked most or all of the channels for people to directly call out bad products, and they have allowed fake reviews to flourish and drown out authentic reviews.

rmbyrro 10 hours ago [-]
Why don't beginners start at Python.org, though? It's such a great resource to learn the language.

- it's free, unlike books

- always up-to-date, unlike even the best book after a few months

- easy to choose: heck, there's only one official documentation! No chance of making a mistake here!

japhyr 8 hours ago [-]
Many beginners do start at python.org. However, if you don't know anything about programming, and you don't know someone who can answer all the little questions that come up, it's really hard to learn from documentation alone. Even the official Python tutorial is fairly inaccessible to many people who are trying to learn a language for the first time.

Almost every Python author I've spoken with recognizes that no one resource works best for everyone. We each write to offer our particular take on a subject, and hope to find an audience that our perspective resonates with. I've never steered people away from documentation; in fact one of my goals is to steer people to the sections of documentation that they're ready to make sense of. One of my end goals is that people no longer need me as a teacher. That was my goal as a classroom teacher, and it's one of my goals as an author.

The idea that there are no mistakes in official documentation is pretty unrealistic. Technical documentation has certainly improved over the last decade or so, but it will never be perfect. Most of us recognize that some areas of programming are better handled by third party libraries. In a similar way, there will always be room for learning resources that are maintained outside of official documentation sources.

rmbyrro 2 hours ago [-]
I didn't claim the official docs have no mistakes.

Since there's only one official documentation, beginners can't go wrong about which docs to use.

As opposed to books, which have tons of bad choices available (hence the current discussion).

Armisael16 10 hours ago [-]
Are you suggesting people just go read the documentation like an encyclopedia? I don’t know a single person who got their start programming by doing that - just about everyone wants some sort of guide to help lead them in good directions.
skydhash 9 hours ago [-]
I did. On Windows, Python had (still has?) good offline help. And it included a nice getting started tutorial. The only book I had was “The C Programming Language”. But they ignited my interest enough to start researching, and I landed on the "Site du Zero" (now OpenClassrooms) platform. The web was sparser, but better, in those days (2010).
jacquesm 9 hours ago [-]
That's more or less exactly how I learned to program. From books, with a few friends. Only after it got to a certain level did I start frequenting more places where we met other people working with computers, some of whom were professional programmers.

I still have some of them. They've aged surprisingly well.

Dylan16807 3 hours ago [-]
> That's more or less exactly how I learned to program. From books

What kind of books? The person you're replying to is arguing in favor of books, but saying that the documentation in particular is not a good one to start with.

NavinF 8 hours ago [-]
Do the official docs even have tutorials? I'd send beginners to Khan Academy instead.
asicsp 6 hours ago [-]
Yeah, https://docs.python.org/3/tutorial/index.html. But I would say it is good for those who already know another programming language and not for complete beginners.
rmbyrro 10 hours ago [-]
I guess book authors don't like my perspective...
nektro 10 hours ago [-]
i stopped frequenting the dev.to community because the average quality of articles just got so low it stopped being worth my time
arrowsmith 10 hours ago [-]
dev.to is blocked on HN for this reason (try submitting a dev.to link; it won't appear under New.)

There's an old thread where dang explains that it's blacklisted (along with many many other sites) due to the consistently poor article quality.

quickthrower2 10 hours ago [-]
Conversely if you post something sophisticated there it will likely bomb. A bunch of emojis and explaining JS closures for the hundredth time. Does well!
ocdtrekkie 10 hours ago [-]
I think the best way to recognize a legitimate tech book is... visit a Barnes and Noble. If it's a publisher or series you can find printed on the shelf, books are legit.

Unfortunately online market "platforms" are pretty much widely untrustworthy for any sort of informational purposes.

mschuster91 9 hours ago [-]
> If it's a publisher or series you can find printed on the shelf, books are legit.

Not even that is a guarantee, there have been cases of rip-offs making it through a bunch of book-on-demand services.

All "marketplaces" allowing third parties unlimited, unmonitored access to product listings suffer from that issue.

failTide 10 hours ago [-]
also, just doing your research on any platform other than Amazon helps.
hinkley 11 hours ago [-]
Ugh. I hate the, "You're not a customer yet so our CRM system won't let me talk to you."

And what happens when my problem is that your system won't let me place an order?

blululu 11 hours ago [-]
False Negatives and False Positives are always connected. On the other side of the equation, there are plenty of bad actors who will casually flag their competitors to score a quick win. Crime doesn't like to go uphill - raising the stakes for feedback lowers the prevalence of bad actors.
me_again 11 hours ago [-]
I think that's a different issue. Amazon has thorny problems with takedowns. Company A trying to get rival company B's listing taken down probably happens 100's of times a day. I believe Amazon uses "proof of purchase" kinda like a CAPTCHA or proof of work - an extra hoop to jump through to reduce the volume of these things they have to adjudicate.
hinkley 3 hours ago [-]
It should be a term of service that you’re not allowed to interfere with other customers’ listings.

If I found out one of the tenants on my multi tenant system was trying to mess with another’s, I would be livid.

indymike 11 hours ago [-]
CRM should never mean Sales Prevention as a Service.
hinkley 11 hours ago [-]
The great thing about filtering is that you don't have to hear the screams.

These accidents play out in slow motion until someone corners you at a family reunion and asks why their friends can't create accounts and when you ask them how long they say "months".

nanidin 5 hours ago [-]
You'd think... but in a growing b2b company, the CRM is where sales get prevented under a certain threshold. heh.
onlyrealcuzzo 11 hours ago [-]
> Seriously, how hard was it amazon? I guess they're starting to notice.

It's not hard. It's a cost center, and they're in the business of making money - not providing the best service.

kristopolous 10 hours ago [-]
Their biggest risk has always been the perception they peddle fraudulent simulacrums of worthy products.
tiew9Vii 10 hours ago [-]
It’s the same across all big tech. The size/volume for complaint handling doesn’t scale. It’s either filtered out by some machine learning algorithm or some poor person in a 3rd world country getting paid next to nothing who reviews the complaints so quality isn’t of importance.

There's been a recent influx of scammers on Facebook local groups. Air con cleaning, car valeting; everyone’s calling out the scammers in the comments, yet when you click report to FB the response is "we have reviewed the post and it has not breached our guidelines, would you like to block the user?"

nanidin 11 hours ago [-]
If I don't get where I want to be with the front door customer service within a decent amount of time, I have always had good success contacting jeff@amazon.com. Their executive support team gets back quickly via email or phone and they really seem to care.
miohtama 11 hours ago [-]
Garbage books are used for money laundering.

You buy books using stolen credit cards and such.


hinkley 11 hours ago [-]
I wonder if that means the Feds made a phone call to Jeff on his private line and said we need to have a little chat.

We can track money laundering when there are X fake books. We can't when there are 10X fake books.

harles 12 hours ago [-]
> … I assume they haven't taken them down because they make money even when they sell garbage.

I’d be surprised if this is the case. The money they make is probably a rounding error compared even just to other Kindle sales. Much more likely is that they haven’t seen it as a big enough problem - and I’m willing to bet it’s increased multiple orders of magnitude recently.

throe37848 12 hours ago [-]
I knew a guy who made "generated" text books in 2010. He would absorb several articles, and loosely stitch them into chapters with some computer scripts and from memory. In a week he would produce 400 pages on a new subject. It was mostly coherent and factual (it kept references). Usually it was the only book on the market about a given subject (like a rare disease).

Current auto generated garbage is very different.

velcrovan 7 hours ago [-]
For several years now, Amazon KDP will block books whose content is already available on the web. I have printed a few books whose content was either CC-BY or public domain due to its age, and in each case my book was automatically blocked in the early stages. I had to submit an appeal that was reviewed by a person in order to proceed.
franze 11 hours ago [-]
explains the CouchDB Book from O'Reilly from that time.
quickthrower2 10 hours ago [-]
Do people still use CouchDB? Blast from the past!
plagiarist 11 hours ago [-]
I wouldn't even consider that generated. That's like where useful content and copyright infringement overlap on a Venn diagram.
Karellen 11 hours ago [-]
> That's like where useful content and copyright infringement overlap on a Venn diagram.

That sounds like a description of LLM-generated content to me ;-)

delecti 10 hours ago [-]
LLMs only ever accidentally generate useful content. They fundamentally can't know whether the things they're outputting are true, they just tend to be, because the training data also tends to be.
mortureb 11 hours ago [-]
In my opinion, all we learn over time is that we need gatekeepers (publishing houses in this case). The general public is a mess.
chongli 10 hours ago [-]
I think what we’re seeing here is a symptom of the broader and more fundamental problem of trust in society. We’ve gone from a very high trust society to a very low trust society in just a few decades. We, as technology people, keep searching (desperately) for technical solutions to social problems. It’s not working.
pixl97 10 hours ago [-]
Because technology never was the solution for social problems, it's a solution to the few people getting very rich problem.
barrysteve 10 hours ago [-]
The standards for filtering internet data have dropped badly.

Amazon and Google both abuse their filtering systems on a daily basis to effect social change.

We need new companies built with policies to keep the filtering systems rigid, effective and unchanging. We need filterkeepers.

mortureb 10 hours ago [-]
I’m good with Amazon and Google over some unknown. I don’t want some right wing shit to be my gatekeepers.
barrysteve 10 hours ago [-]
Yay, politics in my business soup. That'll generate a quality outcome for my customers!


The politics are ephemeral, the results matter.

mortureb 10 hours ago [-]
Human decency transcends your asinine, barely disguised political talking points.
barrysteve 5 hours ago [-]
You may consider it asinine and political, but the results are measurable. Amazon delivers AI generated whatever and a better filter system doesn't do that.

Politic and insult to your heart's content.

The results are quantifiable and qualitatively measurable. Y'know, like, science.

What do you want me to say?

If Amazon can fix the flood of garbage, then good, I don't care. I'll shut up.

All the politics is coming from your side of the table, and all the discussion about measurable results is coming from my side of the table. Whatever happens, let's make that distinction razor sharp and clear.

VancouverMan 10 hours ago [-]
Such systems just result in content that is terribly bland, or worse, intentionally limited to push specific political narratives.

I'd rather have a much more diverse and interesting set of content to choose from, even if some of it might not be to my liking, and even if I'd have to put some effort into previewing or filtering before I find something I want to consume.

ozfive 10 hours ago [-]
Some people value their time, energy, and money more. I can appreciate that you do not as we all have choices but I imagine that most people would disagree.
VancouverMan 4 hours ago [-]
> Some people value their time, energy, and money more.

More highly "curated" media providers have almost always been the least-efficient, most-costly, and least-satisfying for me.

Buying physical books at a bookstore has typically been a costly waste of time, with the selection being poor, and it requiring time, money, vehicle wear, etc., to actually get to the store.

Public libraries are often worse in terms of selection, and thanks to the ones where I am being funded via taxation, I'm stuck paying for them even if I don't use them.

Online and ebook sellers are somewhat better, although they can still be costly, and the delivery of physical books can take some time.

I've had much better success finding fiction and non-fiction content by doing some searches and seeing which random websites, forums, and other less-"curated" online resources I happen to run across.

It has been the same for video media, too.

OTA TV is relatively cheap, but the selection is so limited as to make it useless.

Cable and satellite TV have upfront costs, and then ongoing costs, plus a relatively limited selection of content available at any given time.

Paid online streaming providers have a cost, obviously, and I've found the selection to be quite poor.

Movie theatres are extremely costly for what you get, have a tremendously limited selection, and also involve significant travel and time costs.

Tape and disc rentals no longer exist today where I am, aside from public libraries. They had per-rental costs, late fees, travel costs, and very limited selection. As stated before, I pay for the library even if I don't use it.

YouTube, on the other hand, gives me a much better experience than the more "curated" providers. With just a minute or two of searching, I can find hours and hours worth of content to watch each evening, I can view this content with almost no delay, the cost is minimal, and the content is far more entertaining and informative than the more "curated" options.

Avoiding "curated" media providers has saved me a lot of time, energy, and money, in addition to providing me with much more enjoyable and useful content.

tetrep 11 hours ago [-]
> Maybe there's so much garbage coming in now that they finally have to do something about it?

It seems like this is preventative action rather than reactive, as they say that there hasn't been an increase in publishing volume, "While we have not seen a spike in our publishing numbers..."

ehsankia 10 hours ago [-]
I thought it was more so filled with low quality mechanical turk garbage books.
gz5 12 hours ago [-]
I think we will see tidal waves of 'not-so-good' AI-generated content. Not that AI can't generate or help generate 'good' content, but it will be faster and cheaper to generate 'not-so-good'.

These waves will mainly be in places in which we are the product. And those waves could make those places close to uninhabitable for folks who don't want to slosh through the waves of noise to find the signal.

And in turn that perhaps enables a stronger business model for high quality content islands (regardless of how the content is generated) - e.g. we will be more willing to pay directly for high quality content with dollars instead of time.

In that scenario, AI could be a_good_thing in helping to spin a flywheel for high quality content.

rwmj 11 hours ago [-]
Assuming not too many people die eating mushrooms while we're waiting: https://www.theguardian.com/technology/2023/sep/01/mushroom-...
hinkley 11 hours ago [-]
Common foraging rhetoric is that you need two independent sources asserting that a wild food is edible. Ones that cite neither each other nor the same chain of citations. And preferably a human who says, "I've been eating these for years and no problems." or scientists who did recent blood work to make sure you aren't destroying your organs by eating [1].

In a world with fake books, it would be quite easy for two books to contain the same misinformation or mis-identification (how many times have I found the wrong plant in a google image search? More times than I care to count). Two fake books putting the wrong mushroom picture next to a mushroom because they were contiguous on some other page and you have dead people.

[1] In the ten years since I started working with indigenous plants, wild ginger (asarum caudatum), has gone from quasi-edible to medicinal to don't eat. More studies show subtler wear and tear on the organs (wikipedia lists it as carcinogenic!) and it is recommended now that you don't eat them at all, even for medicinal purposes. I'm not sure I own a foraging or native species book younger than 5 years, and many are older.

eindiran 5 hours ago [-]
Damn had no idea about wild ginger. That is a bummer.
omnicognate 11 hours ago [-]
Except they shouldn't be islands. Unify/standardise the payment mechanism, make it frictionless and only for content consumed. There's no technical reason you shouldn't see an article on hn or wherever, follow the link and read it and pay for it without having to set up and pay for a subscription for the entire publication or jump through hoops. It should be a click at most.

There will always be a place for subscriptions, but people want the hypertext model of just following a link from somewhere and there is absolutely no technical reason for that to be incompatible with paying for content. The idea that ads are the only way to fund the web needs to be challenged, and generative AI might just provide the push for that to finally happen.

Or maybe there will be no such crisis and it'll just make the whole thing even more exploitative and garbage-filled.

munificent 10 hours ago [-]
> There's no technical reason you shouldn't see an article on hn or wherever, follow the link and read it and pay for it without having to set up and pay for a subscription for the entire publication or jump through hoops. It should be a click at most.

People have been saying this and building startups on this and having those startups crash and burn for decades.

It's not a technical problem. It's a psychology problem.

Paying after you've read an article doesn't provide the immediate post-purchase gratification to make it an impulse purchase [0]. The upside of paying for an article you've already read is more like a considered purchase [1]. But the amount of cognitive effort worth putting into deciding whether or not to pay for the article is often less than the value you got from the article itself. So it's very hard for people to force themselves to decide to commit to these kinds of microtransactions. See also [2].

It's just a sort of cognitive dead zone where our primate heuristics don't work well for the technically and economically optimal solution. It's sort of like why you can't go into a store and buy a stick of gum.

[0]: https://en.wikipedia.org/wiki/Impulse_purchase

[1]: https://en.wikipedia.org/wiki/Considered_purchase

[2]: https://en.wikipedia.org/wiki/Bounded_rationality

omnicognate 3 hours ago [-]
I'm a bit confused here. I never said the click would be after reading the article. You would need to pay to read.

Edit: Ah, I did say

> see an article on hn or wherever, follow the link and read it and pay for it

That wasn't supposed to be a chronological sequence of events, but I see I accidentally implied that. Apologies for the confusion.

pixl97 10 hours ago [-]
>It should be a click at most.

Welcome to new and interesting ways to defraud people over the internet for money school of thought.

At least with Amazon it's a "one and done shop" of who I spent my money with when I bought something.

Imagine tomorrow with your click to pay for random links on the internet you suddenly have 60,000 1 cent charges. They all appear to go different places and to get a refund you need to challenge each one.

bobthepanda 10 hours ago [-]
It sounds like the digital version of the CD scam. https://viewing.nyc/nyc-scams-101-dont-get-fooled-by-the-cd-...
omnicognate 2 hours ago [-]
I think you're imagining this would be open to random individual bloggers, but that wouldn't solve the quality / clickbait / AI generation problem. Sure, individuals could scam, but they could also produce clickbait, low effort crap.

The context of this discussion is the high quality, paid, edited writing that is currently behind site-wide subscription paywalls at sites like the New York Times, Wall Street Journal, Financial Times, Economist, etc. It would be great to lower the barrier to entry for individual writers as far as possible, and maybe even include some sites that are run more like blogging platforms, but there would always have to be content standards and some degree of editorial control for reasons other than avoidance of scams, and with those things in place avoidance of scams is a non-issue because you're dealing with organisations that are trading on reputation. The New York Times isn't going to be defrauding its readers (and neither is Medium if it comes to that).

Supply5411 12 hours ago [-]
While not exactly the same, the invention of the printing press caused a lot of controversy with the Catholic Church. With the printing press, people could mass produce and spread information relatively easily. I'm sure a lot of it was considered "low quality" (also heretical)[1]. Seems like we're going through similar growing pains now. Yes I know it's different, but it rhymes.

1. https://en.wikipedia.org/wiki/Index_Librorum_Prohibitorum

dougmwne 12 hours ago [-]
I really dislike the comparison. The printing press democratized knowledge. The LLM destroys it. LLM output is perfect white noise. Enough of it will drown out all signal. And the worst part is that it’s impossible to distinguish it from real human output.

I mean think about it. Amazon had to stop publishing BOOKS because it can no longer separate the signal from the noise. The printing press was the birth of knowledge for the people and the LLM is the death.

ben_w 10 hours ago [-]
> LLM output is perfect white noise.

Not even close to white noise. White noise, in the context of the token space, looks like this:

auceverts exceptionthreat."<ablytypedicensYYY DominicGT portaelight\- titular Sebast Yellowstone.currentThreadrition-zoneocalyptic

which is literally the result of "I downloaded the list of tokens and asked ChatGPT to make a python script to concatenate 20 random ones".
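
For illustration, a script along those lines might look like the sketch below. The token list here is a small hypothetical stand-in (`SAMPLE_TOKENS`) rather than a real tokenizer vocabulary, and the function name is made up for this example.

```python
import random

# Hypothetical stand-in vocabulary; a real run would load a downloaded token list instead.
SAMPLE_TOKENS = [
    "auc", "everts", " exception", "threat", '."<', "ably", "typed",
    "icens", "YYY", " Dominic", "GT", " porta", "elight", "\\-",
    " titular", " Sebast", " Yellowstone", ".currentThread", "rition",
    "-zone", "ocalyptic",
]


def random_token_noise(tokens, count=20, seed=None):
    """Pick `count` tokens uniformly at random (with replacement) and join them."""
    rng = random.Random(seed)
    return "".join(rng.choice(tokens) for _ in range(count))


print(random_token_noise(SAMPLE_TOKENS))
```

Each token is drawn independently of the ones before it, which is what makes the result noise; an LLM conditions every token on the preceding context, so its output is structured even when it is wrong.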

No, the biggest problem with LLMs is that the best of them are simultaneously better than untrained humans and yet also nowhere near as good as trained humans — someone, don't remember who, described them as "mansplaining as a service", which I like, especially as it (sometimes) reminds me to be humble when expressing an opinion outside my domain of expertise, as it knows more than I do about everything I'm not already an expert at.

Specific example: I'm currently trying to use ChatGPT-3.5 to help me understand group theory, because the brilliant.org lessons on that are insufficient; unfortunately, while it knows infinitely more than I do about the subject, it is still so bad it might as well be guessing the multiple choice answers (if I let it, which I don't because that would be missing the point of using a MOOC like brilliant.org in the first place).

vosper 12 hours ago [-]
> The printing press democratized knowledge

That's true, but it also allowed Protestant "heretics" to propagate an idea that caused a permanent schism with the Catholic Church, which led to centuries of wars that killed who-knows-how-many people, up to recent times with Northern Ireland.

(Or something like that, my history's fuzzy, but I think that's generally right?)

bbarnett 11 hours ago [-]
I thought it was a king wanting a divorce, and as he couldn't get it from the Catholic Church, created his own.
mmcdermott 11 hours ago [-]
Henry VIII created the Church of England in 1534 for the purposes of granting himself an annulment. Most histories count Martin Luther's 95 Theses as the beginning of the Reformation in 1517 (a crisp date for a less-than-crisp event; Luther did not originally see himself as protesting the Roman Catholic Church). The Protestant Reformation was a heterogeneous movement from the beginning.
TRiG_Ireland 11 hours ago [-]
Not really, no. It was Luther who kick-started Protestantism. Henry VIII attempted to supplant the Pope, and kind of slid into Protestantism by accident.
vladms 10 hours ago [-]
That was the case just for the Anglican Church, which is only one "part" of the Reformation.
verve_rat 10 hours ago [-]
Protestantism started in Germany with Martin Luther nailing his theses to a church door. Henry's reproductive problems came later and were only sort of related.
mrighele 10 hours ago [-]
> Amazon had to stop publishing BOOKS because it can no longer separate the signal from the noise.

That's because they are trying very hard not to check what they are selling, hoping that their own users and a few ML algorithms can separate the signal from the noise for them. It seems to me that the approach is no longer working, and they should start doing it by themselves.

rockemsockem 11 hours ago [-]
I really feel like you can't have used any advanced LLMs if you legitimately think the output is "perfect white noise". The results that you can get from an LLM like GPT-4 are incredibly useful and are providing an enormous amount of value to lots of people. It isn't just for generating phony information to spread or having it do your work for you.

I get the most value out of asking for examples of things or asking for basic explanations or intuitions about things. And I get so much value from this that I really think the printing press is the most apt comparison.

softg 11 hours ago [-]
The problem is advanced LLMs are controlled by large corporations. Powerful local models exist (in part thanks to Meta's generosity oddly enough) and they're close to GPT-3.5, but GPT-4 is far ahead of them and by the time other models reach that point whatever OpenAI or Anthropic, Meta etc. have developed behind closed doors could be significantly better. In that case open models will be restricted to niche uses and most people will use the latest model from a giant corp.

So it is possible that LLMs will centralize the production and dissemination of knowledge, which is the opposite of what people think the printing press did. I hope I'm wrong and open models can challenge/overtake state of the art models developed by tech giants, that would be amazing.

courseofaction 11 hours ago [-]
Precisely. I spent weeks learning about cybersecurity when GPT-4 first came out, as I could finally ask as many stupid questions as I liked, get detailed examples and use-cases for different attacks and defenses, and generally actually learn how the internet around me works.

Now it refuses, because OpenAI's morals apparently don't include spreading openly available knowledge about how to defend yourself.

Scary. I have also been using it to generate useful political critiques (given a particular theoretical tradition, some style notes, and specific articles to critique, it's actually excitingly good). What if OpenAI decides that's a threat? What reason do we have to think that a powerful institution would not take this course of action, in the cold light of history?

blibble 11 hours ago [-]
how do you know what you learnt wasn't completely made up gibberish?
Supply5411 10 hours ago [-]
The same way you know that the things you learn from a person aren't made up gibberish: You see how well it explains a scenario, how well it lines up with your knowledge and experience, and you sample parts to verify.
blibble 9 hours ago [-]
how do you know they didn't learn from the garbage generator too?
SargeZT 7 hours ago [-]
You are literally describing the fundamental problem of truth in philosophy and acting as if it's different because a computer is involved at one step in the chain.
mostlylurks 10 hours ago [-]
What you say is not in conflict with AI-generated content being white noise. Even if you find some piece of AI-generated content useful, it is still white noise if it is merely combining pieces of information found in its dataset and the result is posted online or published elsewhere. There is no signal being added in that process, and it pollutes the space of content. Humans are also prone to doing this, but with the help of AI, it becomes a much larger issue.

"Signal" would mean new data, which is by definition not possible via LLMs trained on publicly available content, since that means the data is already out there, or new and meaningful ideas or innovations beyond just combining existing material. I have not seen LLMs accomplish the latter. I consider it at least possible that they are capable of such a feat, but even then the relevant question would be how often they produce such things compared to just rearranging existing content. Is the proportion high enough that unleashing floods of AI-generated content everywhere would not lower the signal-to-noise ratio from the pre-AI situation?

rmbyrro 11 hours ago [-]
> the worst part is that it’s impossible to distinguish it from real human output

Doesn't that make human content look bad in the first place?

If we can't distinguish a Python book written by a human engineer or by ChatGPT, how can we demonstrate objectively that the machine-generated one is so much worse?

mostlylurks 10 hours ago [-]
That argument might work for content which serves a purely informational purpose, such as books teaching the basics of programming languages, for instance, but it doesn't work for art (e.g. works of fiction) because most of the potential for a non-superficial reading of a work relies on being able to trust that there is an author that has made a conscious effort to convey something through that work, and that that something can be a non-obvious perspective on the world that differs from that of the reader. AI-generated content does not have any such intent behind it, and thus you are effectively limited to a superficial reading, or if you were to insist on assigning such intent to AI, then at most you would have one "author" per AI model, which additionally has no interesting perspectives to offer, simply those perspectives deemed acceptable in the culture of whatever group of people developed the model, no perspective that could truly surprise or offend the reader with something they had not yet considered and force them to re-evaluate their world view, just a bland average of their dataset with some fine tuning for PR etc. reasons.
jameshart 6 hours ago [-]
We can distinguish it. That's what publishers and editors do. It's also what book buyers for book chains used to do. Reviewers, writing for reputable publications, with their own editors and publishers, as well.

Humans, examining things, and putting a reputation that matters on the line to vouch for it.

The fact that Amazon doesn't want to have smart, contextually aware humans look at and evaluate everything people propose to offer up for sale on their storefront doesn't mean it can't be done. Same as how Google doesn't want to look at every piece of content uploaded to YouTube to figure out if it's suitable for kids, or includes harmful information. That's expensive, so they choose not to do it.

Nashooo 11 hours ago [-]
The problem is not that no one can distinguish it. It's that the intended audience (beginners in Python in your example) can't distinguish it and are not able to easily find and learn from trusted sources.
rmbyrro 11 hours ago [-]
Aren't there already bad Python books written by humans?

I bet ChatGPT can come up with above-average content to teach Python.

We should teach beginners how to prompt engineer in the context of tech learning. I bet it's going to yield better results than gate-keeping book publishing.

nneonneo 10 hours ago [-]
There are, but it used to take actual time and effort to produce a book (good or bad), meaning that the small pool of experts in the world could help distinguish good from bad.

Now that it’s possible to produce mediocrity at scale, that process breaks down. How is a beginner supposed to know whether the tutorial they’re reading is a legitimate tutorial that uses best practices, or an AI-generated tutorial that mashes together various bits of advice from whatever’s on the internet?

rmbyrro 10 hours ago [-]
Personally I don't subscribe to the "best practices" expression. It implies an absolute best choice, which, in my experience, is rarely sensible in tech.

There are almost always trade-offs and choosing one option usually involves non-tech aspects as well.

Online tutorials freely available very rarely follow, let's say, "good practices".

They usually omit the most instructive parts, either because they're wrapped in a contrived example or simplify for accessibility purposes.

I don't think AI-generated tutorials will be particularly worse at this to be honest...

rmbyrro 10 hours ago [-]
Another great contribution would be fine-tuning open source LLMs on less popular tech. I've seen ChatGPT struggling with htmx, for example (I presume the training dataset was small?), whereas it performs really well teaching React (huge training set, I presume)
emporas 10 hours ago [-]
If beginners in Python programming are not capable of visiting python.org, assuming they are genuinely interested in learning Python, it would be very questionable how good their knowledge on the subject can really be.
rmbyrro 10 hours ago [-]
100% agreed.

I've seen many developers using technologies without reading the official documentation. It's insane. They make mistakes and always blame the tech. It's ludicrous...

courseofaction 11 hours ago [-]
The LLM does democratize knowledge, but you have to be the user of the LLM, not the target of the user of the LLM.

The LLM is the most powerful knowledge tool ever to exist. It is both a librarian in your pocket and an expert in everything: it has read everything, and can answer your specific questions on any conceivable topic.

Yes it has no concept of human value and the current generation hallucinates and/or is often wrong, but the responsibility for the output should be the user's, not the LLM's.

Do not let these tools be owned, crushed and controlled by the same people who are driving us towards WW3 and cooking the planet for cash. This is the most powerful knowledge tool ever. Democratize it.

shitloadofbooks 11 hours ago [-]
Asking a statistics engine for knowledge is so unfathomable to me that it makes me physically uncomfortable. Your hyperbolic and relentless praise for a stochastic parrot or a "sentence written like a choose your own adventure by an RNG" seems unbelievably misplaced.

LLMs (Current-generation and UI/UX ones at least) will tell you all sorts of incorrect "facts" just because "these words go next to each other lots" with a great amount of gusto and implied authority.

Supply5411 10 hours ago [-]
My mind is blown that someone gets so little value out of an LLM. I get over software engineering stumbling blocks much faster by interrogating an LLM's knowledge about the subject. How do you explain that added value? Are you skeptical that I am actually moving and producing things faster?
lxgr 10 hours ago [-]
My mind is also blown by how much people seemingly get out of them.

Maybe they’re just orders of magnitude more useful at the beginning of a career, when it’s more important to digest and distill readily-available information than to come up with original solutions to edge cases or solve gnarly puzzles?

Maybe I also simply don’t write enough code anymore :)

Supply5411 10 hours ago [-]
I'm very far from the beginning of my career, but maybe I see a point in your comment, because I frequently try technologies that I am not an expert in.

Just yesterday, I asked if Typescript has the concept of a "late" type, similar to Dart, because I didn't want to annotate a type with "| null" when I knew it would be bound before it was used. Searching for info would have taken me much longer than asking the LLM, and the LLM was able to frame the answer from a Dart perspective.

I would say that that information is neither "important to digest" nor "readily available."
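
For reference, the nearest TypeScript analogue to Dart's `late` is probably the definite assignment assertion (a `!` after the field or variable name), which tells the compiler the value will be assigned before use without widening the type to `| null`. A minimal sketch; the `Connection` class and its members are hypothetical names chosen for illustration:

```typescript
// Hypothetical example: `Connection`, `socketId`, `open`, and `describe`
// are made-up names, not part of any library.
class Connection {
  // The `!` is a definite assignment assertion: it promises the compiler that
  // `socketId` is assigned before it is read, so the declared type stays
  // `string` rather than `string | null`.
  socketId!: string;

  open(id: string): void {
    this.socketId = id;
  }

  describe(): string {
    // No null check is required here because the declared type is `string`.
    return `connection ${this.socketId.toUpperCase()}`;
  }
}

const conn = new Connection();
conn.open("abc123");
const description = conn.describe();
```

Unlike Dart's `late`, this is a compile-time promise only: reading the field before it is assigned yields `undefined` at runtime rather than throwing.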

lxgr 9 hours ago [-]
Ah yes, gathering information in a particular unfamiliar area probably describes it better.

For me, it's been able to give very good answers when they were within the first few Google results when searched for using the proper terms (but the value is in giving you these terms in the first place!).

For questions from my field, it's been wildly hallucinating and producing half-truths, outdated information, or complete nonsense. Which is also fair, because the documentation where the answers could be found is often proprietary, and even then it's either outdated or outright wrong half of the time :)

pests 10 hours ago [-]
I agree with you but at what point does it change? Aren’t we all just stochastic parrots? How do we ourselves choose the next word in a sentence?
lxgr 9 hours ago [-]
In my view, one big learning from LLMs is that yes, more often than not we are just stochastic parrots. And more often than not that's enough!

But sometimes we're more than that: Some types of deep understanding aren't verbal or language-based, and I suspect that these are the ones that LLMs will have the hardest time getting good at. That's not to say that no AI will get there at all, but I think it'll need something fundamentally different from LLMs.

For what it's worth, I've personally changed my mind here: I used to think that the level of language proficiency that LLMs demonstrate easily would only be possible using an AGI. Apparently that's not the case.

skydhash 10 hours ago [-]
We use languages to express ideas. Sentences are always subordinate to the ideas. It's very obvious when you try to communicate in another language you're not fluent in. You have the thought, but you can't find the words. The same thing happens when writing code, taking ideas from the business domain and translating it into code.
barrysteve 10 hours ago [-]
If you wish to make an apple pie, first you must make the universe from scratch. (carl sagan)

We can generate thoughts that are spatially coherent, time aware, validated for correctness and a whole bunch of other qualities that LLMs cannot do.

Why would LLMs be the model for human thought, when it does not come close to the thoughts humans can do every minute of every day?

Aren't we all just stochastic parrots, is the kind of question that requires answering an awful lot about the universe before you get to an answer.

__loam 10 hours ago [-]
God dammit please stop comparing these things to brains. Stop it. It's not even close.
__loam 10 hours ago [-]
This happened to me looking up an obscure C library. It just confidently made up a function that didn't actually exist in the library. It got me unstuck but you can really fuck yourself if you trust it blindly.
halfmatthalfcat 11 hours ago [-]
> but the responsibility for the output is the user's, not the LLM's.

The current iteration of the internet (more specifically social media) has used the same rationale for its existence, but at some level society has proven itself too irresponsible and/or lazy to think for itself rather than be fed by the machine. What makes you think LLMs are going to do anything but make the situation worse? If anything, they’re going to reinforce whatever biases were baked into the training material, much of which is now legally dubious.

lxgr 10 hours ago [-]
For a librarian, they’re confidently asserting factual statements suspiciously often, and refer me to primary literature shockingly rarely.
arrowsmith 10 hours ago [-]
In other words they behave like a human?
ForHackernews 11 hours ago [-]
> and can answer your specific questions on any conceivable topic

Yeah, I mean, so can I, as long as you don't care whether the answers you receive are accurate or not. The LLM is just better at pretending it knows quantum mechanics than I am.

scarmig 11 hours ago [-]
Even if a human expert responds about something in their domain of expertise, you have to think critically about the answer. Something that fails 1% of the time is often more dangerous than something that fails 10% of the time.

The best way to use an LLM for learning is to ask a question, assume it's getting things wrong, and use that to probe your knowledge, which you can then iteratively use to probe the LLM's knowledge. Human experts don't put up with that and are a much more limited resource.

jedberg 11 hours ago [-]
If you asked the Church back then, they would tell you that the printing press was the death of truth, because to them only the word of god was truth, and only the church could produce it.

It's all just a matter of perspective.

Yes, right now it looks like white noise, just like back then it looked like white noise which could drown out the religious texts. But we managed to get past it then and I'm sure we'll manage now.

duskwuff 11 hours ago [-]
This is an astoundingly bad take. Surely you aren't trying to suggest that original, factual, human-authored content has no more inherent value than randomly generated nonsense?
rileyphone 10 hours ago [-]
That's Wittgenstein's argument.
jedberg 11 hours ago [-]
No not at all, I'm not sure why you would even think that.
duskwuff 11 hours ago [-]
As I read it, your parent comment suggests that the distinction in quality and utility between human-authored and AI-generated content is merely "a matter of perspective", i.e. that there is no real distinction, and that they're both equally valuable.

If you actually meant something else, you should probably clarify.

lovemenot 10 hours ago [-]
I am not the person to whom you replied. I understood their comment to be about paradigms shifting through social awareness of the limits and opportunities of new technology.

It can be both true that right now predominantly low quality content emanates from LLMs and at some future time the highest quality material will come from those sources. Or perhaps even right now (the future is already here, just unevenly distributed).

If that was their reasoning, I tend to agree. The equivalent of the Catholic Church in this metaphor is the presumption of human-generated content's inherent superiority.

__loam 10 hours ago [-]
LLMs are inherently approximations of collective knowledge. They will never be better than their training sets. It's a statistical impossibility.
dambi0 10 hours ago [-]
Suggesting clarification to suit your imaginary inferences seems puzzling. The parent post pointed out that perspectives on authorship have a historical precedent, I didn’t see the value judgement your reading suggested.
rmbyrro 11 hours ago [-]
The discussion here is that we're not able to distinguish them.

If we cannot distinguish, I'd argue they have similar value.

They must have. Otherwise, how can we demonstrate objectively the higher value in the human output?

snailmailman 10 hours ago [-]
They can be distinguished. They are just becoming more difficult to distinguish. It's only slightly more difficult, but the amount of garbage is also overwhelming. AI can spit out entire books in moments that would take an individual months or years to write.

There are lots of fake recipe books on Amazon, for instance. But how can you really be sure without trying the recipes? It might look like a recipe at first glance, but if it's telling you to use the right ingredients in a subtly wrong way, it's hard to tell at first glance that you won't actually end up with edible food. Some examples are easy to point at, like the case of the recipe book that lists Zelda food items as ingredients, but they aren't always that obvious.

I saw someone giving programming advice on Discord a few weeks ago. Advice that was blatantly copy/pasted from ChatGPT in response to a very specific technical question. It looked like an answer at first glance, but the file type of the config file ChatGPT provided wasn't correct, and on top of that it was just making up config options in an attempt to solve the problem. I told the user this, they deleted their response and admitted it was from ChatGPT. However, the user asking the question didn't know the intricacies of "what config options are available" and "what file types are valid configuration files". This could have wasted so much of their time, dealing with further errors about invalid config files, or options that did not exist.

duskwuff 9 hours ago [-]
> Some examples are easy to point at, like the case of the recipe book that lists Zelda food items as ingredients

As an aside, the case you're thinking of was a novel, not a recipe book. Still embarrassing, but at least it was just a bit of set dressing, not instructions to the reader.


> I saw someone giving programming advice on Discord a few weeks ago. Advice that was blatantly copy/pasted from ChatGPT in response to a very specific technical question.

This, on the other hand, is a very real and a very serious problem. I've also seen users try to get ChatGPT to teach them a new programming language or environment (e.g. learning to use a game development framework) and ending up with some seriously incorrect ideas. Several patterns of failure I've seen are:

1) As you describe, language models will frequently hallucinate features. In some cases, they'll even fabricate excuses for why those features fail to work, or will apologize when called out on their error, then make up a different nonexistent feature.

2) Language models often confuse syntax or features from different programming languages, libraries, or paradigms. One example I've heard of recently is language models trying to use features from the C++ standard library or Boost when writing code targeted at Unreal Engine; this doesn't work, as UE has its own standard library.

3) The language model's body of "knowledge" tends to fall off outside of functionality commonly covered in tutorials. Writing a "hello world" program is no problem; proposing a design for (or, worse, an addition to) a large application is hopeless.
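
Failure pattern 1 is at least partly checkable by machine: before trusting a suggested call, confirm that the named function exists on the module in question. A minimal sketch, using the built-in `Math` object as a stand-in for a library; `missingFunctions` is a hypothetical helper and `clampToRange` is a deliberately made-up name:

```typescript
// Hypothetical helper: returns the names in `names` that are NOT functions
// on `mod`, i.e. candidates for hallucinated API members.
function missingFunctions(mod: object, names: string[]): string[] {
  const members = mod as Record<string, unknown>;
  return names.filter((name) => typeof members[name] !== "function");
}

// `floor` and `hypot` are real members of Math; `clampToRange` is invented
// here to play the role of a hallucinated function.
const suggestedCalls = ["floor", "hypot", "clampToRange"];
const hallucinated = missingFunctions(Math, suggestedCalls);
```

This only catches names that do not exist; it says nothing about whether a real function is being used correctly, which is the harder half of the problem.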

SargeZT 6 hours ago [-]
> The language model's body of "knowledge" tends to fall off outside of functionality commonly covered in tutorials. Writing a "hello world" program is no problem; proposing a design for (or, worse, an addition to) a large application is hopeless.

Hard disagree. I've used GPT-4 to write full optimizers from papers that were published long after the cutoff date that use concepts that simply didn't exist in the training corpus. Trivial modifications were done after to help with memory usage and whatnot, but more often than not if I provide it the appropriate text from a paper it'll spit something out that more or less works. I have enough knowledge in the field to verify the correctness.

Most recently I used GPT-4 to implement the paper Bayesian Flow Networks, a completely new concept that, as I recall, people in the HN comment section said was "way too complicated for people who don't intimately know the field" to make any use of.

I don't mind it when people don't find use with LLMs for their particular problems, but I simply don't run into the vast majority of uselessness that people find, and it really makes me wonder how people are prompting to manage to find such difficulty with them.

rmbyrro 10 hours ago [-]
They can indeed distinguish them, I agree. So why the fuss?

I think the concern is that bad authors would game the reviews and lure audiences into bad books.

But aren't they already able to do so? Is it sustainable long term? If you spit out programming books with code that doesn't even run, people will post bad reviews, ask for refunds. These authors will burn their names.

It's not sustainable.

snailmailman 10 hours ago [-]
It doesn't need to be sustainable as one author or one book. These aren't real authors. It's people using AI to make a quick buck. By the time the fraud is found out, they've already made a profit.

They make up an author's name. Publish a bunch of books on a subject. Publish a bunch of fake reviews. Dominate the search results for a specific popular search. They get people to buy their book.

It's not even book specific, it's been happening with actual products all over Amazon for years. People make up a company, sell cheap garbage, and make a profit. But with books, they can now make the cheap garbage look slightly convincing. And the cheap garbage is so cheap to produce in mass amounts that nobody can really sort through and easily figure out "which of these 10k books published today are real and which are made up by AI".

It takes time and money to produce cheap products at a factory. But once these scammers have the AI generation setup, they can just publish books on loop until someone ends up buying one. They might get found out eventually, and they will have to pretend to be a different author, and they just repeat the process.

failuser 10 hours ago [-]
What’s the fuss about spam? You can distinguish it from useful mail? What’s the fuss about traffic jams? You’ll get there eventually.

LLMs enable a DDoS attack by raising the effort needed to check the books for gibberish.

It’s not like this stream of low-quality content did not exist before, but the topic is hot and many grifters are trying LLMs to make a quick buck at the same time.

geraldwhen 10 hours ago [-]
It’s sustainable if you can automate the creation of Amazon seller accounts. Based on the number of fraudulent Chinese seller accounts, I’d say it’s very likely automated or otherwise near 0 cost.
mostlylurks 10 hours ago [-]
A piece of human-written content and a piece of AI-written content may have similar value if we cannot distinguish between them. But if you can add the information that the human-written content was written by a human to the comparison, the human-written content becomes significantly more valuable, because it allows for a much deeper reading of the text, since the reader can trust that there has been an actual intent to convey some specific set of ideas through the text. This allows the reader to take a leap of faith and put in the work required to examine the author's point of view, knowing that it is based on the desires and hopes of an actual living person with a lifetime of experience behind them instead of being essentially random noise in the distribution.
skydhash 10 hours ago [-]
I'm not a native English speaker, but ChatGPT's answers in each interaction I had with it sound bland. And I dislike the bite-sized format of it. I'm reading "Amusing Ourselves to Death" by Neil Postman and while you may agree or disagree with his take, he developed it in a very coherent way, exploring several aspects. ChatGPT's output falls into the same uncanny valley as the robotic voice from text-to-speech software: understandable, but no human writes that way.

ChatGPT as an autocompletion tool is fine, IMO. As well as generating alternative sentences. But anything longer than a paragraph falls back to the uncanny valley.

rmbyrro 10 hours ago [-]
I totally agree. So why are people so worried about books being written by ChatGPT?

These pseudo-authors will get bad reviews, will lose money in refunds, burn their names.

It's not sustainable. Some will try, for sure, but they won't last long.

Dylan16807 3 hours ago [-]
There's too many names and it's too cheap to do this.

The equilibrium shifts to making it much harder to find good books, and that was already hard enough.

failuser 10 hours ago [-]
If you ask an LLM something you know, you can distinguish noise from good output. If you ask an LLM something you don’t know, then how do you know if the output is correct? There are cases where checking is easier than producing the result, e.g. when you ask for a reference.
rmbyrro 10 hours ago [-]
Book buyers should guide themselves primarily by who the author is, I think.

Choose a book from someone that has a hard earned reputation to protect.

failuser 9 hours ago [-]
There is a bootstrapping process of learning which authors in that field have a good reputation before you know anything about the field. That is being disrupted by LLMs as well, though.
emodendroket 11 hours ago [-]
I can't distinguish between pills that contain the medicine that I was prescribed and those that contain something else entirely. Therefore taking either should be just as good.
rmbyrro 11 hours ago [-]
Really. Are you comparing a complex chemical analysis required to attest the contents of a pill to reading text?
emodendroket 10 hours ago [-]
It depends, is the text of a technical nature? How exactly is one to know they're being deceived if, to take one of the examples that has been linked in this discussion, they receive a mushroom foraging guide but the information is actually AI-generated?
rmbyrro 10 hours ago [-]
You first check who published it. Is the author an expert in the matter with years, perhaps decades in the industry?

Heck, we always did that since before GPT.

Good authors will continue to publish good content because they have a reputation to protect. They might use ChatGPT to increase productivity, but will surely and carefully review it before signing off.

emodendroket 9 hours ago [-]
"We" certainly did not "always" do that before.
rmbyrro 2 hours ago [-]
Really? You buy books without searching anything about who wrote it?

If yes, well, there's the problem then. It's not AI, but the lack of guidance and research skills in support of the process of choosing a book.

SketchySeaBeast 11 hours ago [-]
If they were of similar value would there be a problem with the deluge?
rmbyrro 10 hours ago [-]
Can't the deluge be delusional or an overreaction at best?
tzs 7 hours ago [-]
The printing press made books cheap relative to hand copied books, but they were still expensive for most people.

Before the printing press two books cost around the same as a 2 story cottage.

Afterwards a couple books would be about a month of wages for a skilled worker.

That greatly limits one's ability to drown out anything with books.

SketchySeaBeast 11 hours ago [-]
I'd argue that giving a group with unique thoughts and ideas a voice is different than creating a noise machine.
jedberg 11 hours ago [-]
I think the jury is still out on whether an LLM produces ideas any more or less unique than most humans. :)
OfSanguineFire 11 hours ago [-]
> The printing press democratized knowledge.

Not for centuries. Due to the expense of the technology and the requirement in some locations for a royal patent to print books, the printing press just opened up knowledge a bit more from the Church and aristocracy to the bourgeoisie, but it did little for the masses until as late as the 1800s.

ls612 11 hours ago [-]
A big part of this is that literacy didn’t come to the masses until the 1800s. But in England and the Netherlands you had (somewhat) free press by the late 1600s and early 1700s.
peab 11 hours ago [-]
I'm reminded of the Library of Babel
pacman2 11 hours ago [-]
I was told publishers don't promote a good book anymore these days. They ask how many Instagram followers you have.

Maybe the self-publishing and BoD will decline in the long term due to ML white noise and publishers are a sign of quality again.

Supply5411 11 hours ago [-]
You could argue that speech is literally noise that drowns out the signals of your environment. If you just babbled, it would be useless, but instead you use it intelligently to communicate ideas. LLM output is a new palette with which humans can compose new signals. We just have to use it intelligently.

Prompt engineering is an example of this. A clever prompt by a domain expert can prime an LLM interaction to yield better information to the recipient in a way that the recipient themselves could not have produced on their own.

__loam 10 hours ago [-]
People comparing the AI bullshit spigot to the printing press are clowns.
woah 11 hours ago [-]
It used to be that a scribe would painstakingly copy a manuscript, through the process absorbing the text at a deep level. This same scribe could then apply this knowledge to his own writing, or just understand and curate existing work. The manual labor required to copy at scale employed many scribes, who formed the next generation of thinkers.

With the press, a greasy workman can churn out hundreds of copies an hour, for whichever charlatan or heretic palms him enough coin. The people are flooded with falsehoods by men whose only interest in writing is how many words they can fit on a page, and where to buy the cheapest ink.

The worst part is that it is impossible to distinguish the work of a real thinker from that of a cheap sophist, since they are all printed on the same rough paper, and serve equally well as tomorrow's kindling.

emodendroket 11 hours ago [-]
Where are the good AI-generated books that serve as the positive side of this development?
BarryMilo 11 hours ago [-]
You're implying that what is being produced has actual value, the problem is they're acting in patently bad faith. Weep not for the spammers.
langsoul-com 5 hours ago [-]
The incentive system is completely different. The new AI generated content is for a quick buck, just spamming out content because $1 x 10,000 is a lot.

If it was written with the aid of AI, that's different. At least someone tried to make something good and just used available tools to enhance the quality.

gamepsys 11 hours ago [-]
The rhyme has a lot to do with how existing power structures handle a sudden increase in the amount of written text generated. In this comparison, they both try to apply the brakes. Banned books didn't work well for the Catholic Church. I think increasing QA for Amazon might actually help their book business. Of course, a book seller has a greater responsibility to society than to make money.
thaumasiotes 7 hours ago [-]
> the invention of the printing press caused a lot of controversy with the Catholic Church

> https://en.wikipedia.org/wiki/Index_Librorum_Prohibitorum

The example is from the sixteenth century, but the printing press is from the seventh century.

I don't think the Catholic Church bothered to take any notice at all?

Barrin92 10 hours ago [-]
>similar growing pains

For what it's worth, these 'growing pains' took the form of the wars of religion in Europe, which in Germany killed up to 30% of the population. In relative terms, that's significantly worse than the casualties of World War I and II. So maybe the Catholic Church had a point.

lovemenot 10 hours ago [-]
>> So maybe the Catholic Church had a point

Is that really the take-away? If the Catholic Church had not been so belligerent, those wars would not have been needed. Now that we are past that time, we should surely be thanking those combatants who helped disseminate knowledge in spite of the Church whose interest was in hoarding it.

Barrin92 9 hours ago [-]
I think that's a pretty bad reading of history frankly. The Church didn't hoard knowledge, in fact they were arguably the primary preservers and disseminators of knowledge, through the monastic tradition in Medieval Europe. Many thousands of monasteries were destroyed during the religious wars, which is a common theme as far as sectarian wars go. They are first and foremost destroyers of knowledge.

More importantly I certainly wouldn't want to live through that period for any reason, and much less repeat it. If an ordinary printing press caused that much chaos I'm not sure I want to figure out what one on steroids is going to do

hinkley 11 hours ago [-]
How do we...

I'm not entirely sure how to word this question.

How do we make sure that most of the people we talk to are at least humans if not necessarily the person we expect them to be? And I'm not saying that like a cartoonish bad guy in a movie who hates artificial intelligence and augmented humans.

How do I not get inundated by AI that's good at trolling? How do I keep the social groups I belong to from being trolled?

These questions keep drawing me back to the concept of Web of Trust we tried to build with PGP for privacy reasons. Unless I've solicited it, I really only want to talk to entities that pass a Turing Test. I'd also like it if someone actively breaking the law online were actually affected by the deterrence of law enforcement, instead of being labeled a glitch or a bug in software that can't be arrested, or even detained.

It feels like I want to talk to people I know to be human (friends, famous people - who might actually be interns posing as their boss online), and people they know to be human, and people those people suspect to be human.
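
The tiers described above (people I know to be human, people they know, people those people suspect) amount to a hop count in a trust graph. A minimal sketch, assuming a hypothetical in-memory map of who vouches for whom; this is not how PGP's Web of Trust actually computes key validity, just a breadth-first walk with a hop limit, and all account names are invented:

```typescript
// Hypothetical vouching graph: each key vouches that the listed accounts are human.
type TrustGraph = Map<string, string[]>;

// Breadth-first search from `me`. Returns every account reachable within
// `maxHops` vouching steps, mapped to the number of hops needed to reach it.
function trustDistances(graph: TrustGraph, me: string, maxHops: number): Map<string, number> {
  const distances = new Map<string, number>([[me, 0]]);
  let frontier: string[] = [me];
  for (let hop = 1; hop <= maxHops && frontier.length > 0; hop++) {
    const next: string[] = [];
    for (const person of frontier) {
      for (const vouched of graph.get(person) ?? []) {
        if (!distances.has(vouched)) {
          distances.set(vouched, hop); // first visit is the shortest path in BFS
          next.push(vouched);
        }
      }
    }
    frontier = next;
  }
  return distances;
}

const vouches: TrustGraph = new Map([
  ["me", ["alice", "bob"]],
  ["alice", ["carol"]],
  ["carol", ["dave"]],
  ["eve", ["mallory"]], // nobody I trust vouches for eve
]);

// With a limit of two hops: alice and bob are at distance 1, carol at 2;
// dave (three hops) and the unconnected eve/mallory pair are excluded.
const trusted = trustDistances(vouches, "me", 2);
```

The hop limit is the policy knob here: a wiki signup flow could, for example, auto-approve accounts at distance 1 and queue anything further out for manual review.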

I have long term plans to set up a Wiki for a hobby of mine, and I keep getting wrapped around the axle trying to figure out how to keep signup from being oppressive and keep bots from turning me into an SEO farm.

timeagain 11 hours ago [-]
This is only a problem for someone terminally online. The vast majority of people talk to their friends and coworkers in person.
munificent 10 hours ago [-]
That was the solution that came to mind to me too, but it doesn't work either.

Even if you're never online and only talk to people in person... over time those people will be increasingly informed by LLM-generated pseudo-knowledge. We aren't just training the AIs. They're training us back.

If you want to live in a society where the people you interact with have brains mostly free of AI-generated pollution, then I'm sorry but that world isn't going to be around much longer. We are entering the London fog era of the Information Age.

hinkley 11 hours ago [-]
I don't trust my friends for medical advice. Some of them trust me for plant advice, and they really probably shouldn't. I am very stove-piped.

We have two and a half generations of people right now most of whom think "I did the research" means "I did half as much reading as the average C student does for a term paper, and all of that reading was in Google."

And Alphabet fiddles while Google burns. This is going to end in chaos.

Spivak 5 hours ago [-]
> "I did the research" means "I did half as much reading as the average C student does for a term paper

What's the alternative? No one who says that is saying they did original research, they're saying they searched around and got what they believe to be at least a consensus among the body of experts they trust.

Like I agree the problem sucks but I have no idea what a solution looks like. For fields someone is totally unfamiliar with, they have neither enough knowledge to evaluate the truth of a claim nor the knowledge to evaluate whether someone is qualified and trustworthy enough to believe. It's turtles all the way down -- especially because, for topics of any interest, you can find as many experts as you care to, of whatever qualification you demand, making all sorts of contradictory claims.

mostlylurks 10 hours ago [-]
> This is only a problem for someone terminally online.

Is it? Even those whose social life is entirely IRL still have to increasingly interact with various businesses, banks, healthcare providers, the government, and often more distant colleagues through online services. Do I want these to go through LLM chatbots? No. Can I ensure that I'm speaking to an actual human if the communication is text based? Not really.

invalidptr 11 hours ago [-]
This is a problem for anyone who is not actively vigilant about the information they consume. A family member (who I would not describe as "terminally online") came to me today in a panic talking about how some major event had just occurred and how social order was beginning to collapse. I quickly glanced at the headlines on a few major news outlets and realized that they just saw some incendiary content designed to elicit that reaction. I calmed them down and walked them through a process they could use to evaluate information like that in the future, and they were a little embarrassed.

The concern isn't necessarily for you. It's for the large swaths of people who are less equipped to filter through noise like this.

romseb 10 hours ago [-]
There is some irony in Sam Altman bringing us the cause (AI) and purported solution (Worldcoin) for your problem at the same time.
hinkley 10 hours ago [-]
It's what ad men do. Point out there's a problem, offer you the solution.
kiicia 9 hours ago [-]
we don't, check Boltzmann brain https://en.m.wikipedia.org/wiki/Boltzmann_brain
vorpalhex 11 hours ago [-]
Meet people in real life. This problem is trivially solved by just using meatspace.

Alternatively for sign ups, tell them to contact you and ask. Chat with them a moment. Ask them about their hobbies and family.

ethanbond 10 hours ago [-]
Using meatspace doesn't solve the problem, using meatspace exclusively solves the problem. And it's not a great one given, you know, how much of the world "happens" online now.
ilamont 12 hours ago [-]
See also: "Tom Lesley has published 40 books in 2023, all with 100% positive reviews"


ritzaco 12 hours ago [-]
I remember that one - interestingly the amazon link it goes to shows only 3 books now, all that look real, not the 40 that I remember seeing before.

So I guess Amazon is doing something even though I regularly hear complaints from authors that they allow blatant piracy all the time

kmeisthax 11 hours ago [-]
Amazon has no reason to give a shit about piracy on KDP: they make money either way. But having a load of AI generated garbage on your platform makes it far less valuable. You want your stolen books to actually be good. :P
bragr 11 hours ago [-]
>shows only 3 books now

Those appear to be by different authors with similar names: https://www.amazon.com/s?k=%22tom+lesley%22

phh 12 hours ago [-]
Possibly it's the author removing them at the first one star rating to keep their author score high?
willio58 12 hours ago [-]
It seems Amazon cares more about polluting search results in Kindle than polluting the search results in their own e-commerce business. I think low-effort books generated by AI are much less detrimental than sketchy physical products being shipped to your door in 2 days or less.
crooked-v 11 hours ago [-]
It's probably about volume rather than quality. Sketchy copycat product lines are still hard limited by the number of factories and shipping operations in existence, while sketchy AI-generated books can easily keep growing exponentially in number for a while.
harles 11 hours ago [-]
The title of this story doesn’t seem to match the content. This seems like a proactive move to prevent individual publishers from spamming many many submissions - and even then, they’re willing to make exceptions.

> While we have not seen a spike in our publishing numbers, in order to help protect against abuse, we are lowering the volume limits we have in place on new title creations. Very few publishers will be impacted by this change and those who are will be notified and have the option to seek an exception.

Almondsetat 12 hours ago [-]
Livestreams where artists show their creative process and use the streaming platform to immediately sell the thing they produced, just to prove it had human origins.

This is the future

edgarvaldes 11 hours ago [-]
We have realtime filters, avatars, translators, TTS, etc. Soon, all of this will be "good enough" to mimic the proposed solution.
SketchySeaBeast 11 hours ago [-]
You're only kicking the can down the road.
adamredwoods 12 hours ago [-]
>> We require you to inform us of AI-generated content (text, images, or translations) when you publish a new book or make edits to and republish an existing book through KDP. AI-generated images include cover and interior images and artwork. You are not required to disclose AI-assisted content.
hiidrew 12 hours ago [-]
Their distinction:

>AI-generated: We define AI-generated content as text, images, or translations created by an AI-based tool. If you used an AI-based tool to create the actual content (whether text, images, or translations), it is considered "AI-generated," even if you applied substantial edits afterwards.

>AI-assisted: If you created the content yourself, and used AI-based tools to edit, refine, error-check, or otherwise improve that content (whether text or images), then it is considered "AI-assisted" and not “AI-generated.” Similarly, if you used an AI-based tool to brainstorm and generate ideas, but ultimately created the text or images yourself, this is also considered "AI-assisted" and not “AI-generated.” It is not necessary to inform us of the use of such tools or processes.


prvc 11 hours ago [-]
Allowing the use of tools to modify the contents erases any clear distinction between the categories.
pcl 12 hours ago [-]
This is really interesting. I imagine that AI-generated art / illustrations for mostly-text books is a pretty compelling thing for authors, for all the same reasons that AI-generated text is of value for non-authors. I wonder how this line will work out in practice.
el_benhameen 12 hours ago [-]
This doesn’t seem surprising. Half of my YouTube ads these days are for some kind of AI+Kindle-based get rich quick scheme.
NotYourLawyer 12 hours ago [-]
fuddle 11 hours ago [-]
About time, YouTube is full of videos about making eBooks with ChatGPT, e.g. "Free Course: How I Made $200,000 With ChatGPT eBook Automation at 20 Years Old" https://www.youtube.com/watch?v=Annsf5QgFF8
dzink 11 hours ago [-]
Strategically, AI generated content is a boon for platforms like Amazon.

1. The more content there is, the more you can't reliably get good stuff without reviews, the more centralized distribution platforms with reviews and rankings are needed.

2. Even if people are making fake books for money laundering, Amazon gets a cut of all sales, laundered or not.

Just like Yahoo's directory once upon a time though, and movie theaters, the party gets ruined when most people learn they can use AI to generate custom stories at home and/or converse with the characters and interact in far more ways than currently possible. Content is going from king to commodity.

blibble 10 hours ago [-]
amazon's reviews and ratings are complete garbage and have been for some time
neilv 11 hours ago [-]
This sounds like a commendable move by Amazon. I especially like the idea of requiring disclosure of use of "AI".
cogman10 11 hours ago [-]
Here's a pretty good article about the problem with AI generated books. "AI Is Coming For Your Children" [1]

[1] https://shatterzone.substack.com/p/ai-is-coming-for-your-chi...

cellu 12 hours ago [-]
Why people read contemporary books is something I can’t really get my head around. There are so many classics to keep people busy for life - and they are 100% guaranteed to be insightful and pleasurable.
rustymonday 11 hours ago [-]
Should people stop telling new stories? A century from now the best books of today will be classics. Books can act as a time capsule of a certain time and place and mode of life. And that has value.
bwb 11 hours ago [-]
Contemporary books are just new classics. It is like asking why read :)
OfSanguineFire 11 hours ago [-]
There’s a distinct demographic in the contemporary-fiction-reading community, as can be seen in corners of Goodreads or Instagram, that demands new fiction to tell the stories of groups not covered, or supposedly unfairly covered, in that classic literature: LGBT, BIPOC, the working class, etc. In fact, they might even deny that the classics are “insightful and pleasurable” due to these social concerns.
timeagain 11 hours ago [-]
That’s really weird. People are making all kinds of books and stories. And stories are relevant to their time. The Matrix wouldn’t be written in 1900, A Tale of Two Cities wouldn’t be written in 1200, …

It is true though that if you have a culturally diverse set of friends and are open to their experiences and opinions, a lot of “the classics” start to smell bad. Imagine being black and reading Grapes of Wrath. You might see the situation of the main characters as humorous or infantile, considering how relatively fortunate they are.

Baeocystin 11 hours ago [-]
What's the name of the law where the longer something has already been around, the longer it will likely stay around in the future?

I've found that it definitely applies to books. Starting at a ~20 year horizon is a surprisingly good filter for quality.

savoyard 10 hours ago [-]
> What's the name of the law where the longer something has already been around, the longer it will likely stay around in the future?

The Lindy effect.

Baeocystin 45 minutes ago [-]
Thank you.
gamepsys 11 hours ago [-]
I think the risk of reading a suboptimal book is not greater than the risk of not allowing myself to be exposed to different voices.
carlosjobim 10 hours ago [-]
One of the best books I read last year was the story of the rescue of the football team that was trapped in a flooded cave in 2018 – written by cave diver Rick Stanton, who found the team and led the rescue. How would that account have been written into a book before it happened?
barrysteve 9 hours ago [-]
Yes, and there's been a drop in quality since then too. The 1800-1940s really saw literature as the high water mark for quality media and it shows.

Finding deeply valuable and high quality books is much rarer in today's crop of authors. The best minds are rarely making the medium of literature their highest good, but are instead chasing dollars and relations with the rich and famous.

freediver 11 hours ago [-]
This is just a tip of the iceberg, compared to what we are heading into with the web. Very concerning.

I would go long the value of genuine human writing, aka the 'small web'.

unmole 7 hours ago [-]
So, what are the actual limits?
pseingatl 9 hours ago [-]
If KDP required an ISBN it would cut down on the garbage books. In the US at least, ISBNs cost money.
fragmede 9 hours ago [-]
they're not that much but you can just get an Australian one for free
campbel 12 hours ago [-]
Gee, I sure hope people don't just lie about it...
skepticATX 11 hours ago [-]
It doesn’t matter. It’s garbage content and immediately recognizable as being AI generated.

It is absolutely possible to write a good article or even a good book with AI, but at least for now it’s just as hard, if not harder, than doing it without AI.

But of course people trying to make a quick buck won’t put in the required effort, and they likely don’t even have the ability to create great or even good content.

duskwuff 11 hours ago [-]
> It’s garbage content and immediately recognizable as being AI generated.

It's also recognizable by its sheer volume. An "author" who submits several new books every day is clearly not doing their own writing. The AI publishing scam relies on volume -- they can't possibly win on quality, but they're hoping to make up for that by putting so many garbage books on the market that buyers can't find anything else.

atrus 10 hours ago [-]
I'm not sure. Ghostwriting exists, and a person (or organization) with enough money could easily pay enough ghostwriters to output at a more than human pace.
duskwuff 10 hours ago [-]
Even at their most prolific, a ghostwritten author still probably wouldn't publish more than one or two books a month. Beyond that point, you're just competing with yourself. (For instance, young adult series like Goosebumps, The Baby-Sitters Club, or Animorphs typically published a book every month or two.)

Publishing multiple books per day is out of the question. That's beyond even what's reasonable for an editor to skim through and rubber-stamp.

mostlylurks 10 hours ago [-]
> It doesn’t matter. It’s garbage content and immediately recognizable as being AI generated.

Is it? How do you immediately recognize a book as AI generated before buying it, if the author isn't doing something silly like releasing several books per day/month? And even after you buy a book, how can you distinguish between the book just being terrible and the book being written with extensive use of AI? I don't believe AI can write good books, but I would still like to distinguish those two cases, since the former is just a terrible book, which is perfectly fine, while the latter I would like to avoid. I don't want to waste my limited time reading AI content.

gamepsys 11 hours ago [-]
> It’s garbage content and immediately recognizable as being AI generated.

Yea, but the Turing Test is actively being assaulted. Soon we won't know the difference between an uninspired book written by an AI and an uninspired book written by a human.

tyingq 11 hours ago [-]
>It is absolutely possible to write a good article or even a good book with AI, but at least for now it’s just as hard, if not harder, than doing it without AI.

How hard is it though, to create a shitty book with AI, that Amazon can't detect was written with AI?

idomajid 9 hours ago [-]
Finally, I hope those garbage books will slightly decrease from there.
ggm 10 hours ago [-]
Mushroom picking guides on AI "what could possibly go wrong"
corethree 12 hours ago [-]
How do we even know this entire comment thread isn't polluted with AI?

Maybe it doesn't matter. The quality of the work matters more than the process of actualization.

quickthrower2 10 hours ago [-]
In a practical sense: AI generated stuff is crappy and often subtly wrong and it can be generated faster than human generated content. So it becomes untenable to even search for good information.
corethree 8 hours ago [-]
Then it's good for fiction. Lots of demand for fiction.
DookieBiscuit 12 hours ago [-]
bern4444 10 hours ago [-]
It seems that as a society we are coming to realize that enabling anyone to do anything on their own and at any time isn't the best of ideas.

Verifiability and authenticity matter and are valuable. Amazon has long had a problem of fake reviews. This issue with Kindle books seems an extension of that. Massive centralized platforms like Amazon make fraud more likely and are bad for the consumer.

The "decentralization" that we need as a society is not in the form of any crypto based technical capability but simply for the size of the massive players to be reduced so competition can reemerge and give consumers more options on where and how to spend their dollars. If Amazon were forced to break up, other e-book stores might pop up that develop relationships with publishers and disallow independent publishing.

I hope the FTC can begin finding a strategy to force some of these massive corporations to split making it more likely for there to be more competition.
