Fediverse

17671 readers

24 users here now

A community dedicated to fediverse news and discussion.

Fediverse is a portmanteau of "federation" and "universe".

Getting started on Fediverse;

What is the fediverse?
- Short ver.
- Full ver.
Fediverse Platforms
How to run your own community

founded 4 years ago

MODERATORS

[email protected]

Is Mastodon's Link-Previewing Overloading Servers ? (news.itsfoss.com)

submitted 6 months ago by [email protected] to c/[email protected]

35 comments fedilink hide all child comments

The blog Its FOSS has 15,000 followers for its Mastodon account — which they think is causing problems:

When you share a link on Mastodon, a link preview is generated for it, right? With Mastodon being a federated platform (a part of the Fediverse), the request to generate a link preview is not generated by just one Mastodon instance. There are many instances connected to it who also initiate requests for the content almost immediately. And, this "fediverse effect" increases the load on the website's server in a big way.

Sure, some websites may not get overwhelmed with the requests, but Mastodon does generate numerous hits, increasing the load on the server. Especially, if the link reaches a profile with more followers (and a broader network of instances)... We tried it on our Mastodon profile, and every time we shared a link, we were able to successfully make our website unresponsive or slow to load.

It's Foss blog says they found three GitHub issues about the same problem — one from 2017, and two more from 2023. And other blogs also reported the same issue over a year ago — including software developer Michael Nordmeyer and legendary Netscape programmer Jamie Zawinski.

And back in 2022, security engineer Chris Partridge wrote:

[A] single roughly ~3KB POST to Mastodon caused servers to pull a bit of HTML and... an image. In total, 114.7 MB of data was requested from my site in just under five minutes — making for a traffic amplification of 36704:1. [Not counting the image.]

Its Foss reports Mastodon's official position that the issue has been "moved as a milestone for a future 4.4.0 release. As things stand now, the 4.4.0 release could take a year or more (who knows?)."

They also state their opinion that the issue "should have been prioritized for a faster fix... Don't you think as a community-powered, open-source project, it should be possible to attend to a long-standing bug, as serious as this one?"

Abstract credit: https://slashdot.org/story/428030

you are viewing a single comment's thread
view the rest of the comments

[–] [email protected] 34 points 6 months ago (1 children)

Wait, you're going to tell me you don't actually have to serve bloat on a blog like it's foss? No way!

[–] [email protected] 24 points 6 months ago (2 children)

I only serve bloat to AI crawlers.

map $http_user_agent $badagent {
  default     0;
  # list of AI crawler user agents in "~crawler 1" format
}

if ($badagent) {
   rewrite ^ /gpt;
}

location /gpt {
  proxy_pass https://courses.cs.washington.edu/courses/cse163/20wi/files/lectures/L04/bee-movie.txt;
}

...is a wonderful thing to put in my nginx config. (you can try curl -Is -H "User-Agent: GPTBot" https://chronicles.mad-scientist.club/robots.txt | grep content-length: to see it in action ;))

[–] [email protected] 4 points 6 months ago (1 children)

Your bandwidth bill lol

[–] [email protected] 6 points 6 months ago (1 children)

I don't think serving 86 kilobytes to AI crawlers will make any difference in my bandwidth use :)

[–] [email protected] 4 points 6 months ago (1 children)

Oic its a redirect now

[–] [email protected] 7 points 6 months ago

It's not. It just doesn't get enough hits for that 86k to matter. Fun fact: most AI crawlers hit /robots.txt first, they get served a bee movie script, fail to interpret it, and leave, without crawling further. If I'd let them crawl the entire site, that'd result in about two megabytes of traffic. By serving a 86kb file that doesn't pass as robots.txt and has no links, I actually save bandwidth. Not on a single request, but by preventing a hundred others.