this post was submitted on 09 Jun 2023
64 points (98.5% liked)

Memes


[–] [email protected] 1 points 1 year ago (1 children)

If he thinks locking down the API is going to stop them, he's bumped his head. These companies have more than enough manpower to write and maintain an HTML scraper for Reddit.

[–] [email protected] 0 points 1 year ago (1 children)

Creating a web scraper and actually maintaining one that stays effective are two different things. It's very easy to fight web scraping if you know what you are doing.

[–] [email protected] 0 points 1 year ago* (last edited 1 year ago) (1 children)

Right, but these are big companies with lots of talented programmers on hand. If anyone can overcome such an obstacle, it's them.

Also, Google and Microsoft already have a search index full of Reddit content to scrape.

[–] [email protected] 0 points 1 year ago (1 children)

You are right. You would need a team of skilled scraper developers and network engineers, though, who would know how to get around rate limiters with some kind of external load balancer or something along those lines.

[–] [email protected] 2 points 1 year ago

Rate limiters usually key on the source IP address, which is easily bypassed with a rotating proxy. There are even SaaS providers that offer this. The trick is to avoid large subnets, which can be blocked in one go. You have to use a lot of scattered individual (/32) addresses to be effective.
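
To make that concrete, here's a rough local simulation (no network traffic involved). The `PerIPRateLimiter` class, the limit of 10 requests per 60 seconds, and the addresses are all made up for illustration; the IPs come from the reserved documentation ranges. It only shows why a per-IP counter stops limiting once requests are spread over many addresses, and why addresses inside one subnet are easy to block together. Real rate limiters may key on much more than the IP.

```python
import ipaddress
from collections import defaultdict


class PerIPRateLimiter:
    """Hypothetical fixed-window limiter keyed only on the source IP."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)  # (ip, window index) -> requests seen

    def allow(self, ip, now):
        key = (ip, int(now // self.window))
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True


def count_allowed(limiter, source_ips, total_requests, now=0.0):
    """Send total_requests, cycling through source_ips; count how many pass."""
    allowed = 0
    for i in range(total_requests):
        ip = source_ips[i % len(source_ips)]
        if limiter.allow(ip, now):
            allowed += 1
    return allowed


def blocked_by_subnet(ip, blocked_networks):
    """True if ip falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in blocked_networks)


# Assumed policy: 10 requests per IP per 60-second window.
single = count_allowed(PerIPRateLimiter(10, 60), ["203.0.113.7"], 100)

# Five addresses spread over different ranges, used round-robin.
scattered_ips = ["198.51.100.4", "203.0.113.7", "192.0.2.55",
                 "198.51.100.200", "192.0.2.9"]
rotated = count_allowed(PerIPRateLimiter(10, 60), scattered_ips, 100)

# One subnet rule catches every address inside that range, nothing outside it.
blocked = [ipaddress.ip_network("203.0.113.0/24")]
in_blocked_range = blocked_by_subnet("203.0.113.7", blocked)
outside_blocked_range = blocked_by_subnet("198.51.100.4", blocked)

print(single, rotated, in_blocked_range, outside_blocked_range)
```

With one source address the limiter caps the burst at the per-IP limit; with five addresses the same burst gets five times as many requests through, because each address has its own counter. The subnet check is the flip side: if the addresses all sit in one range, a single rule removes them all.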