this post was submitted on 15 Jun 2023
169 points (95.2% liked)
Programming
17408 readers
70 users here now
Welcome to the main community in programming.dev! Feel free to post anything relating to programming here!
Cross posting is strongly encouraged in the instance. If you feel your post or another person's post makes sense in another community cross post into it.
Hope you enjoy the instance!
Rules
Rules
- Follow the programming.dev instance rules
- Keep content related to programming in some way
- If you're posting long videos try to add in some form of tldr for those who don't want to watch videos
Wormhole
Follow the wormhole through a path of communities [email protected]
founded 1 year ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
In lemmy's case, my perusal of the DB didn't really suggest that the queries would be that complex and I suspect that moving it to a higher performance NoSQL DB might be possible, but I'd have to take a look at a few more queries to be sure.
I wonder if this could be made to work with Aerospike Community Edition...
Obviously it could be more effort than it's worth though.
The issues I've seen more are around images. Hosting the images on an object store (cloudflare r2, s3) and linking there would reduce a lot of federation bandwidth, as that's probably cause higher ram/swap usage too.
pict-rs supports storing in object stores, but when getting/serving images, it still serves through the instance as the bottleneck IIRC. That would do quite a bit to free up some resources and lower overall IO needed by the server.
There's no need to migrate the database, that shouldn't be an issue at this size. Caching should be implemented as another comment suggested.
Would you be so kind as to recommend some resources about caching? I've read the basics, but have yet to dive deep on it
The basic idea is to keep data as close to the processor as possible, so with a database that means storing the result of commonly used queries in memory.
Good resources.
Oh shit does lemmy not have response caching? Yeah, that's gonna be an issue pretty soon.
https://www.reddit.com/r/Lemmy/comments/14h965f/comment/jpdemet
I have no idea, just inferred that from some other posts.
Ehhhhhhh. Using a relational database for Lemmy was certainly a choice, but I don't think it's necessarily a bad one.
Within Lemmy, by far the most expensive part of the database is going to be comment trees, and within the industry the consensus on the best database structure to represent these is... well, there isn't one. The efficiency of this depends way more on how you implement it within a given database model than on the database model itself. Comment trees are actually a pretty difficult problem; you'll notice a lot of platforms have limits on comment depth, and there's a reason for that. Getting just one level of replies to work efficiently can be tricky, regardless of the choice of DBMS.
Looking at the schema Lemmy uses, I see a couple opportunities to optimize it down the road. One of the first things I noticed is that comment replies don't seem to be directly related back to the top-level post, meaning you're restricted to a breadth-first search of the comment tree at serving time. Most comments will be at pretty shallow depths, so it sometimes makes sense to flatten the first few levels of this structure so you can get most relevant comments in a single query and rebuild the tree post-fetching. But this makes nomination (i.e. getting the "top 100" or whatever comments to show on your page) a lot more difficult, so it makes sense that it's currently written the way it is.
If it's true (as another commenter said) that there's no response caching for comment queries, that's a much bigger opportunity for optimization than anything else in the database.