this post was submitted on 13 Aug 2023
26 points (78.3% liked)


Inspired by the comments on this Ars article, I've decided to program my website to "poison the well" when it gets a request from GPTBot.

The intuitive approach is just to generate some HTML like this:

<p>
<!-- Twenty pages of random words -->
</p>
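
For anyone who wants to try the idea, here is a minimal sketch in Python of how that paragraph could be generated. The word list, the function name `random_words_paragraph`, and the default word count are all placeholders made up for illustration; nothing here is tuned to actually affect model training.

```python
import html
import random

# Hypothetical vocabulary; swap in any word source you like.
WORDS = ["lorem", "quantum", "teapot", "gradient", "walrus", "syntax", "marble", "orbit"]


def random_words_paragraph(word_count=500, seed=None):
    """Return a <p> element filled with word_count randomly chosen words."""
    rng = random.Random(seed)  # a fixed seed makes the output reproducible
    words = rng.choices(WORDS, k=word_count)
    # Escape the text so it stays valid HTML even if the word list changes.
    return "<p>\n" + html.escape(" ".join(words)) + "\n</p>"


if __name__ == "__main__":
    print(random_words_paragraph(word_count=20))
```

Raising `word_count` (or joining several paragraphs) gets you to "twenty pages"; the exact size is a judgment call.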

(I also considered just hardcoding twenty megabytes of "FUCK YOU," but that's a little juvenile for my taste.)

Unfortunately, I'm not very familiar with ML beyond a few basic concepts, so I'm unsure if this would get me the most bang for my buck.

What do you smarter people on Lemmy think?

(I'm aware this won't do much, but I'm petty.)

[email protected] 11 points 1 year ago (1 child)

You don't want to show the 20 pages of random words to your users, right?

Any dev worth its salt is going to check the User-Agent string for GPTBot.

That said, it's a perfect recipe for getting companies to spoof browsers.
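
A rough sketch of the User-Agent check described above, in Python. The helper names `is_gptbot` and `choose_body` are invented for this example, and it assumes the crawler includes the token "GPTBot" somewhere in its User-Agent header; how you read that header depends on your web framework.

```python
def is_gptbot(user_agent):
    """Return True if the User-Agent header mentions the GPTBot token."""
    # A missing header is treated as "not GPTBot"; the match is case-insensitive.
    return user_agent is not None and "gptbot" in user_agent.lower()


def choose_body(user_agent, real_page, decoy_page):
    """Serve the decoy only to GPTBot; everyone else gets the real page."""
    return decoy_page if is_gptbot(user_agent) else real_page


if __name__ == "__main__":
    # Example header values, for illustration only.
    print(choose_body("Mozilla/5.0 (compatible; GPTBot/1.0)", "real", "decoy"))
    print(choose_body("Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0", "real", "decoy"))
```

This only catches crawlers that identify themselves honestly, so a scraper sending a browser-like User-Agent would still get the real page.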

[email protected] 4 points 1 year ago

Yeah, and even if OpenAI uses user agents that identify that bot as GPTBot, there's no guarantee other scrapers will be so kind.