RoundSparrow

joined 1 year ago
MODERATOR OF
[–] [email protected] 1 points 1 year ago* (last edited 1 year ago)

who would have predicted that Elon Musk would do all the wild things he did with Twitter. Reddit pissing everyone off in June... pretty odd how audiences are behaving in 2023 towards all this. Oh yha, Threads, that coming on the scene too. 2023 has really been odd for audiences.

The SQL speaks for itself, but I don't know what's going on in terms of why people are treating social media platforms like Lemmy, Twitter, Threads, Reddit this year so unusually. This SQL statement kind of thing has been covered in so many books, conferences, etc. It's like forgotten history now in the era of Elon Musk X and Reddit Apollo times.

I don't know what to say other than I can try to hire a translator or teacher to explain how this SQL problem is obvious and well understood 13 years ago. I mean, there was a whole "NoSQL movement" because of this kind of thing. But I clearly can't get people to hear past all the Elon Musk, Threads, Lemmy from Reddit ... and I'm left describing it as 'social hazing' or whatever is gong on with social media.

Lemmy has like 5 different Rust programming communities, but nobody fixing Lemmy. It's surreal in 2023 the Elon Musk X days. I think it's making all of us uncomfortable. The social movement underway.

[–] [email protected] -1 points 1 year ago (1 children)

Ok, so let's look at recent changes that they have deployed.... https://github.com/LemmyNet/lemmy/issues/3886

One of which makes entire tree of comments disappear. Do you see developers fretting over this and fixing it? Or do you see them ignoring the May 27 PostgreSQL JOIN problem.

How did such a bug go out? Do you see Lemmy developers actually using Lemmy to test things and notice these crashes and problems? Do you look at their posting and comment history? Do they actually go login over at Beehaw and Lemmy.world and see just how terrible the code performance is?

If it isn't hazing, what is it?

It's as if they build a product only for other people to use... and they don't notice any of the constant crashes, incredibly slow performance etc - and they act like nobody in the computer industry ever heard of Memcache or Redis to solve performance problems. If it isn't extreme hazing going on, then what is it?

[–] [email protected] 2 points 1 year ago

Here, you can dig into what posted days before the pull request you read:

https://github.com/LemmyNet/lemmy/issues/2877#issuecomment-1685314733

 

June 4:

joins are better than in queries with potentially thousands of inserted IDs.

Given that more than 8 JOIN statements is something PostgreSQL specifically concerns itself with (join_collapse_limit). I hand-edit the query with a single IN clause and the performance problem disappears. 8 full seconds becomes less than 200ms against 5,431,043 posts. And that 200ms is still high, as I was extremely over-reaching with "LIMIT 1000" in case the end-user went wild with blocking lists or some other filtering before reaching the final "LIMIT 10". When I change it to "LIMIT 20" in the subquery, it drops almost in half to 115ms... still meeting the needs of the outer "LIMIT 10" by double. More of the core query filtering can be put into the IN subquery, as we aren't dealing with more than 500 length pages (currently limited to 50).

SELECT 
   "post"."id" AS post_id, "post"."name" AS post_title,
   -- "post"."url", "post"."body", "post"."creator_id", "post"."community_id", "post"."removed", "post"."locked", "post"."published", "post"."updated", "post"."deleted", "post"."nsfw", "post"."embed_title", "post"."embed_description", "post"."thumbnail_url",
   -- "post"."ap_id", "post"."local", "post"."embed_video_url", "post"."language_id", "post"."featured_community", "post"."featured_local",
     "person"."id" AS p_id, "person"."name",
     -- "person"."display_name", "person"."avatar", "person"."banned", "person"."published", "person"."updated",
     -- "person"."actor_id", "person"."bio", "person"."local", "person"."private_key", "person"."public_key", "person"."last_refreshed_at", "person"."banner", "person"."deleted", "person"."inbox_url", "person"."shared_inbox_url", "person"."matrix_user_id", "person"."admin",
     -- "person"."bot_account", "person"."ban_expires",
     "person"."instance_id" AS p_inst,
   "community"."id" AS c_id, "community"."name" AS community_name,
   -- "community"."title", "community"."description", "community"."removed", "community"."published", "community"."updated", "community"."deleted",
   -- "community"."nsfw", "community"."actor_id", "community"."local", "community"."private_key", "community"."public_key", "community"."last_refreshed_at", "community"."icon", "community"."banner",
   -- "community"."followers_url", "community"."inbox_url", "community"."shared_inbox_url", "community"."hidden", "community"."posting_restricted_to_mods",
   "community"."instance_id" AS c_inst,
   -- "community"."moderators_url", "community"."featured_url",
     ("community_person_ban"."id" IS NOT NULL) AS ban,
   -- "post_aggregates"."id", "post_aggregates"."post_id", "post_aggregates"."comments", "post_aggregates"."score", "post_aggregates"."upvotes", "post_aggregates"."downvotes", "post_aggregates"."published",
   -- "post_aggregates"."newest_comment_time_necro", "post_aggregates"."newest_comment_time", "post_aggregates"."featured_community", "post_aggregates"."featured_local",
   --"post_aggregates"."hot_rank", "post_aggregates"."hot_rank_active", "post_aggregates"."community_id", "post_aggregates"."creator_id", "post_aggregates"."controversy_rank",
   --  "community_follower"."pending",
   ("post_saved"."id" IS NOT NULL) AS save,
   ("post_read"."id" IS NOT NULL) AS read,
   ("person_block"."id" IS NOT NULL) as block,
   "post_like"."score",
   coalesce(("post_aggregates"."comments" - "person_post_aggregates"."read_comments"), "post_aggregates"."comments") AS unread

FROM (
   ((((((((((
   (
	   (
	   "post_aggregates" 
	   INNER JOIN "person" ON ("post_aggregates"."creator_id" = "person"."id")
	   )
   INNER JOIN "community" ON ("post_aggregates"."community_id" = "community"."id")
   )
   LEFT OUTER JOIN "community_person_ban"
       ON (("post_aggregates"."community_id" = "community_person_ban"."community_id") AND ("community_person_ban"."person_id" = "post_aggregates"."creator_id"))
   )
   INNER JOIN "post" ON ("post_aggregates"."post_id" = "post"."id")
   )
   LEFT OUTER JOIN "community_follower" ON (("post_aggregates"."community_id" = "community_follower"."community_id") AND ("community_follower"."person_id" = 3))
   )
   LEFT OUTER JOIN "community_moderator" ON (("post"."community_id" = "community_moderator"."community_id") AND ("community_moderator"."person_id" = 3))
   )
   LEFT OUTER JOIN "post_saved" ON (("post_aggregates"."post_id" = "post_saved"."post_id") AND ("post_saved"."person_id" = 3))
   )
   LEFT OUTER JOIN "post_read" ON (("post_aggregates"."post_id" = "post_read"."post_id") AND ("post_read"."person_id" = 3))
   )
   LEFT OUTER JOIN "person_block" ON (("post_aggregates"."creator_id" = "person_block"."target_id") AND ("person_block"."person_id" = 3))
   )
   LEFT OUTER JOIN "post_like" ON (("post_aggregates"."post_id" = "post_like"."post_id") AND ("post_like"."person_id" = 3))
   )
   LEFT OUTER JOIN "person_post_aggregates" ON (("post_aggregates"."post_id" = "person_post_aggregates"."post_id") AND ("person_post_aggregates"."person_id" = 3))
   )
   LEFT OUTER JOIN "community_block" ON (("post_aggregates"."community_id" = "community_block"."community_id") AND ("community_block"."person_id" = 3)))
   LEFT OUTER JOIN "local_user_language" ON (("post"."language_id" = "local_user_language"."language_id") AND ("local_user_language"."local_user_id" = 3))
   )
WHERE 
  post_aggregates.id IN (
     SELECT id FROM post_aggregates
     WHERE "post_aggregates"."creator_id" = 3
     ORDER BY "post_aggregates"."featured_local" DESC , "post_aggregates"."published" DESC
     LIMIT 1000
  )
  AND
  (((((((
  (
  (("community"."deleted" = false) AND ("post"."deleted" = false))
  AND ("community"."removed" = false))
  AND ("post"."removed" = false)
  )
  AND ("post_aggregates"."creator_id" = 3)
  )
  AND ("post"."nsfw" = false))
  AND ("community"."nsfw" = false)
  )
  AND ("local_user_language"."language_id" IS NOT NULL)
  )
  AND ("community_block"."person_id" IS NULL)
  )
  AND ("person_block"."person_id" IS NULL)
  )
ORDER BY "post_aggregates"."featured_local" DESC , "post_aggregates"."published" DESC
LIMIT 10
OFFSET 0
;

 

If it isn't social hazing, then what is going on here? Why has this issue gone on since May and servers are crashing every day?

[–] [email protected] 1 points 1 year ago (4 children)

. However, I’m far from an expert,

Funny, because I'm a published author and expert on messaging systems... like Lemmy. Iv'e been building them since 1986 professionally.

There was a massive thread I posted dozens of comments on that came before today's pull request... I suggest you read that too.

Did you notice them even acknowledge server crashes are happening? Do you think developers ever suggest Memcache or Redis? Or discuss how Reddit solved their scaling in 2010 with PostgreSQL?

but perhaps they themselves felt attacked. I know that wasn’t your intention, but misunderstanding happen, especially over text.

I don't have any trouble understanding a bad SQL statement that has 14 JOINs and being told "JOIN is a distraction" after posting tons of examples.

Do we really need to spoon fed the stuff I did post?

Have you never seen social hazing in action? is it possible that I might be on to something going on psychologically besides my autism?

I can't believe anyone thinks a server should be crashing with 1 user on it.

[–] [email protected] -1 points 1 year ago* (last edited 1 year ago) (7 children)

may I voice my opinion on the exchange? This is coming from a place of trying to help, since I really do appreciate all the work you’ve put in and are putting in, and the fediverse can really use your talents, so I hope I don’t offend you.

Can you explain to me why it isn't social hazing?

it didn’t appear that you were being ignored/hazed

Do you know how to read a SQL statement? I just can't grasp how it isn't social hazing. I've been reading SQL statements for decades, this is obviously a problematic one.

Can you offer alternate explanations of how 3 people could think that SQL statement isn't ... poor performing and gong to cause problems? And how an SQL statement without a WHERE clause took them months to discover and fix?

Extreme hazing is my best answer. I just can't accept that the SQL statements don't speak for themselves along with the server crashes. 57K users for 1300 servers is very... taking several seconds to load 10 posts....

Look at the date... May... this has been going on since May. If it isn't social hazing ... what is it? I keep asking myself that.

 

So this is currently a thought experiment or brainstorming, whatever...

When a community is local, the "home server" for community, that means that the federation is sent out to subscribed servers.

Now it also means that the total subscribe list is only know to that one server... As least I think it works that way, that the subscribe of "home" server is the complete list and everyone else has partial lists.

There is a lot of the structure of a community that offers some data opportunities to Lemmy. For one, a community object can be edited by any of the moderators. and any moderator can Lock a post, Remove a post, etc.

Reddit has a not-often-used Wiki feature, that even basically means more than one person could edit content too. Other than the sidebar of a community, I don't think Lemmy has any concept of multiple people being able to edit a post or a comment. But again, multiple mods can feature or lock a post... so there is some concept of multiple-actors on data.

I think when you get into creating flair / tags and even playlists (multi-community lists), you want multiple people to be able to edit data. Which in Reddit was the Wiki structure.

i know some of it is kind of a waste of time.. because people tend to avoid Wiki and highly favor posting the same repeat questions and content over and over, reposts. But a man can dream, can't he :)

[–] [email protected] 18 points 1 year ago (1 children)

Why is there a lack of gifs/videos on Lemmy?

Lemmy's internal data performance is so horribly slow and crash-causing that I think the last thing they want is even more popular data.

Video is simply the most superior type of media there is, and I think that not having easy access to it on Lemmy is hurting it.

Video is more data, popularity is more data. For whatever reason, at every turn, I've seen developers turn away from scaling options like Memcache, Redis, or just abandoning ORM data management and rewriting the data interfaces by hand....

since the sites on which the videos are hosted can track you.

That's already true for images that are hot linked routinely, so I don't think video really changes it.

I've been baffled since June why data and fixing lemmy's data coding hasn't been front and center. It's pretty wild to witness so many come to Lemmy and then turn away... Elon Musk has been flocking people, Reddit, etc. It's as if the project wants to make code that won't work on any data. It's baffeling.

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago)

You’re putting too much importance into this matter. If this is distressing you should let it go and think about something else.

The apologists come out of the woodwork around here who can't see an SQL statement for what it is, a charade. Anyone who has worked with SQL knows that this is bloated SQL statement and poorly engineered.

I notice the scientific facts of server crashing and SQL statements you won't discuss, but you sure dish out the social advice for me to "move along" like a Jedi mind trick. Let's talk about the human attraction to truth and honesty since you are so great at handing out life advice to people. What do you know about the works of Marshall McLuhan on media?

Repeating: Its’ as if the mere concept of Redis or Memcache never existed… and that nobody ever heard of JOIN performance problems. If it isn’t extreme social hazing, what is it?

[–] [email protected] 2 points 1 year ago* (last edited 1 year ago)

ild rants. i’ve seen you do this many times. you need to step back, relax, and not take technical feedback so personally

I've stepped back and watched them ignore the issue since May when all the servers were crashing. Every single Lemmy server was falling over while they ignored the PostgreSQL problems.

The mistakes are obvious and huge. These are not minor topics.

our comments there are exceptionally aggressive. you accuse the developers of “hazing” you

I think they are hazing the entire World Wide Web, Reddit users, etc. How else can you explain such basic SQL problems that they have allowed to go on for so long?

Its' as if the mere concept of Redis or Memcache never existed... and that nobody ever heard of JOIN performance problems. If it isn't extreme social hazing, what is it?

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago) (2 children)

having a meltdown on github doesn’t help anybody.

I'm glad for you that mental control is so trival and you aren't near death in your life from your brain damage.

Go outside and take a breath

I just got back from dinner ant the months of hazing I've witnessed hasn't gone away. The level of social games being played with PostgreSQL in this project are levels beyond anything I've encountered in my 50+ years alive. And I've first hand seen Bill Gates and his team do all kinds of odd things to groups.

I am at a total loss to explain why such fundamentals of basic relational database are avoided in this project. If it isn't social hazing, what is it?

[–] [email protected] 4 points 1 year ago* (last edited 1 year ago) (11 children)

Of course, that would be an insane amount of work, especially if it would get ignored, but something to consider!

I already did an insane amount of work to populate a Lemmy database with over 10 million posts. It is so incredibly slow out of the box that the normal API would take days to accomplish this. i had to rewrite the SQL TRIGGER logic to allow bulk inserts.

Here is my work on that:

DROP TRIGGER site_aggregates_post_insert ON public.post;


/*
TRIGGER will be replaced with per-statement INSERT only
*/
CREATE TRIGGER site_aggregates_post_insert
   AFTER INSERT ON public.post
   REFERENCING NEW TABLE AS new_rows
   FOR EACH STATEMENT
   EXECUTE FUNCTION site_aggregates_post_insert();


DROP TRIGGER community_aggregates_post_count ON public.post;


/*
TRIGGER will be replaced with per-statement INSERT only
*/
CREATE TRIGGER community_aggregates_post_count
   AFTER INSERT ON public.post
   REFERENCING NEW TABLE AS new_rows
   FOR EACH STATEMENT
   EXECUTE FUNCTION community_aggregates_post_count();


DROP TRIGGER person_aggregates_post_count ON public.post;


/*
TRIGGER will be replaced with per-statement INSERT only
*/
CREATE TRIGGER person_aggregates_post_count
   AFTER INSERT ON public.post
   REFERENCING NEW TABLE AS new_rows
   FOR EACH STATEMENT
   EXECUTE FUNCTION person_aggregates_post_count();



/*
TRIGGER will be replaced with per-statement INSERT only
no Lemmy-delete or SQL DELETE to be performed during this period.
*/
CREATE OR REPLACE FUNCTION public.site_aggregates_post_insert() RETURNS trigger
    LANGUAGE plpgsql
    AS $$
BEGIN
   UPDATE site_aggregates SET posts = posts +
      (SELECT count(*) FROM new_rows WHERE local = true)
      WHERE site_id = 1
      ;

   RETURN NULL;
END
$$;


CREATE OR REPLACE FUNCTION public.community_aggregates_post_count() RETURNS trigger
    LANGUAGE plpgsql
    AS $$
BEGIN
        UPDATE
            community_aggregates ca
        SET
            posts = posts + p.new_post_count
        FROM (
            SELECT count(*) AS new_post_count, community_id
            FROM new_rows
            GROUP BY community_id
             ) AS p
        WHERE
            ca.community_id = p.community_id;

    RETURN NULL;
END
$$;


/*
TRIGGER will be replaced with per-statement INSERT only
no Lemmy-delete or SQL DELETE to be performed during this period.
*/
CREATE OR REPLACE FUNCTION public.person_aggregates_post_count() RETURNS trigger
    LANGUAGE plpgsql
    AS $$
BEGIN
        UPDATE
            person_aggregates personagg
        SET
            post_count = post_count + p.new_post_count
        FROM (
            SELECT count(*) AS new_post_count, creator_id
            FROM new_rows
            GROUP BY creator_id
             ) AS p
        WHERE
            personagg.person_id = p.creator_id;

    RETURN NULL;
END
$$;


/*
***********************************************************************************************
** comment table
*/


DROP TRIGGER post_aggregates_comment_count ON public.comment;


/*
TRIGGER will be replaced with per-statement INSERT only
*/
CREATE TRIGGER post_aggregates_comment_count
   AFTER INSERT ON public.comment
   REFERENCING NEW TABLE AS new_rows
   FOR EACH STATEMENT
   EXECUTE FUNCTION post_aggregates_comment_count();


-- IMPORTANT NOTE: this logic for INSERT TRIGGER always assumes that the published datestamp is now(), which was a logical assumption with general use of Lemmy prior to federation being added.
CREATE OR REPLACE FUNCTION public.post_aggregates_comment_count() RETURNS trigger
    LANGUAGE plpgsql
    AS $$
BEGIN

        UPDATE
            -- per statement update 1
            post_aggregates postagg
        SET
            comments = comments + c.new_comment_count
        FROM (
            SELECT count(*) AS new_comment_count, post_id
            FROM new_rows
            GROUP BY post_id
             ) AS c
        WHERE
            postagg.post_id = c.post_id;


        UPDATE
            -- per statement update 2
            post_aggregates postagg
        SET
            newest_comment_time = max_published
        FROM (
            SELECT MAX(published) AS max_published, post_id
            FROM new_rows
            GROUP BY post_id
             ) AS c
        WHERE
            postagg.post_id = c.post_id;

        UPDATE
            -- per statement update 3
            post_aggregates postagg
        SET
            newest_comment_time_necro = max_published
        FROM (
            SELECT MAX(published) AS max_published, post_id, creator_id
            FROM new_rows
            WHERE published > ('now'::timestamp - '2 days'::interval)
            GROUP BY post_id, creator_id
             ) AS c
        WHERE
            postagg.post_id = c.post_id
            AND c.creator_id != postagg.creator_id
            ;

    RETURN NULL;
END
$$;


DROP TRIGGER community_aggregates_comment_count ON public.comment;

CREATE TRIGGER community_aggregates_comment_count
   AFTER INSERT ON public.comment
   REFERENCING NEW TABLE AS new_rows
   FOR EACH STATEMENT
   EXECUTE FUNCTION public.community_aggregates_comment_count();


CREATE OR REPLACE FUNCTION public.community_aggregates_comment_count() RETURNS trigger
    LANGUAGE plpgsql
    AS $$
BEGIN

        UPDATE
            community_aggregates ca
        SET
            comments = comments + p.new_comment_count
        FROM (
            SELECT count(*) AS new_comment_count, community_id
            FROM new_rows AS nr
            JOIN post AS pp ON nr.post_id = pp.id
            GROUP BY pp.community_id
             ) AS p
        WHERE
            ca.community_id = p.community_id
            ;

    RETURN NULL;

END
$$;


DROP TRIGGER person_aggregates_comment_count ON public.comment;

CREATE TRIGGER person_aggregates_comment_count
   AFTER INSERT ON public.comment
   REFERENCING NEW TABLE AS new_rows
   FOR EACH STATEMENT
   EXECUTE FUNCTION public.person_aggregates_comment_count();


CREATE OR REPLACE FUNCTION public.person_aggregates_comment_count() RETURNS trigger
    LANGUAGE plpgsql
    AS $$
BEGIN

        UPDATE
            person_aggregates personagg
        SET
            comment_count = comment_count + p.new_comment_count
        FROM (
            SELECT count(*) AS new_comment_count, creator_id
            FROM new_rows
            GROUP BY creator_id
             ) AS p
        WHERE
            personagg.person_id = p.creator_id;

    RETURN NULL;
END
$$;


DROP TRIGGER site_aggregates_comment_insert ON public.comment;

CREATE TRIGGER site_aggregates_comment_insert
   AFTER INSERT ON public.comment
   REFERENCING NEW TABLE AS new_rows
   FOR EACH STATEMENT
   EXECUTE FUNCTION public.site_aggregates_comment_insert();


CREATE OR REPLACE FUNCTION public.site_aggregates_comment_insert() RETURNS trigger
    LANGUAGE plpgsql
    AS $$
BEGIN

   UPDATE site_aggregates
      SET comments = comments +
         (
            SELECT count(*) FROM new_rows WHERE local = true
         )
      WHERE site_id = 1
      ;

    RETURN NULL;
END
$$;

With this in place, 300,000 posts a minute can be generated and reaching levels of 5 million or 10 million don't take too long.

[–] [email protected] 3 points 1 year ago* (last edited 1 year ago)

I heard that they gave out a bunch of free .ml domains and those are the ones they aren't allowing for free any more...

[–] [email protected] 9 points 1 year ago* (last edited 1 year ago) (10 children)

I already feel like I have to keep sticking my neck out to get them to question if using the ORM and a dozen JOIN statements isn't a problem.... but I guess I'll link it: https://github.com/LemmyNet/lemmy/pull/3900

As stated on my Lemmy user profile, I'm "RocketDerp" on GitHiub.

Honestly, the reason I keep making noise is because I'm sick of Lemmy crashing all the time when I come to use it... and I am on many servers that this happens. I really am not trying to piss off the developers, I even said I felt like I am being hazed, and I feel like hazing in general might explain what is going on with how much they are avoiding the elephant in the ROOM that ORM and a dozen JOIN might be the cause! Let alone the lack of Redis or Memcached addition being avoided, that's a second elephant on the second floor tap-dancing.... GitHub Issue 2910 was the straw that broke my back weeks ago, it took months for them to address it when it could be fixed in a couple hours (and it was weeks before the Reddti API deadline at the end of June.... and issue 2910 was neglected). The whole thing was a nightmare for me to watch...

10
submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/[email protected]
 

Help! I want to divorce ReadFn from ListFN - bypassing the Queries mutual closure behavior so I can better isolate some logic. My need is to get an independent ListFn...

fn queries_<'a>() -> Queries<
  impl ReadFn<'a, PostView, (PostId, Option, bool)>,
  impl ListFn<'a, PostView, PostQuery<'a>>,
> {

Context: https://github.com/LemmyNet/lemmy/blob/main/crates/db_views/src/post_view.rs

For sake of clarity... I'm not wanting to break the whole Queries joined closure marriage project-wide, just this one source file I want to be able to copy/paste this code twice and have just a single closure for ListFn.

Thank you.

 

Donald Trump faces four indictments, 91 criminal charges and hundreds of years of maximum prison time combined.

This is a former president who — according to the latest grand jury indictment in Fulton County, Georgia — participated in a “criminal enterprise.” Trump and 18 co-defendants are accused of trying “to unlawfully change the outcome of the election” in 2020. Among the 13 felony charges he faces is one count of violating the Georgia RICO (Racketeer Influenced and Corrupt Organizations) Act and two counts of conspiracy to commit forgery.

Most of those charges are related to a fake elector scheme by the Trump campaign in which a slate of “alternate” electors in Georgia would cast electoral votes for Trump instead of Joe Biden. The president of the most powerful democracy in the world allegedly tried to steal an election.

We can’t say it often enough: This is serious. Americans cannot shrug this off or normalize it, no matter how many times Trump gets indicted. Yet it feels like business as usual. Not only is Trump favored to win the GOP presidential nomination, he’s also neck and neck with President Biden in the 2024 general election, according to a July poll by the New York Times/Siena Poll.

MORE THAN A CULT

Trump’s support cannot only be explained as the product of the cult-like power he has over his MAGA base, which accounts for roughly 40% of Republican voters who believe those indictments are nothing but a conspiracy against him.

more: https://www.miamiherald.com/opinion/editorials/article278265068.html

 

I'm trying to wrangle in and get 'back to basics' with Lemmy's Diesel code and at every turn I run into not understanding the complexity of the Rust code.

You may want to do a GitHub checkout of this branch if you want to see what I'm attempting: https://github.com/LemmyNet/lemmy/pull/3865

I'm basing my experiments off the code in that pull request, which links to this branch on this repository: https://github.com/dullbananas/lemmy/tree/post-view-same-joins

Right now Lemmy's Diesel code spins up many SQL table joins and for an anonymous user it just passes a -1 user id to all the joins - and it really makes for difficult to read SQL statements. So I decided to experiment with removing as much logic as I could to get the bare-bones behavior on generating the desired SQL statement....

I copied/pasted the queries function/method and gave it a new name, kept removing as much as I could see that referenced the user being logged-in vs. anonymous, and got to this point:

fn queries_anonymous<'a>() -> Queries<
  impl ReadFn<'a, PostView, (PostId, Option, bool)>,
  impl ListFn<'a, PostView, PostQuery<'a>>,
> {
  let is_creator_banned_from_community = exists(
    community_person_ban::table.filter(
      post_aggregates::community_id
        .eq(community_person_ban::community_id)
        .and(community_person_ban::person_id.eq(post_aggregates::creator_id)),
    ),
  );

// how do we eliminate these next 3 assignments, this is anonymous user, not needed

  let is_saved = |person_id_join| {
    exists(
      post_saved::table.filter(
        post_aggregates::post_id
          .eq(post_saved::post_id)
          .and(post_saved::person_id.eq(person_id_join)),
      ),
    )
  };

  let is_read = |person_id_join| {
    exists(
      post_read::table.filter(
        post_aggregates::post_id
          .eq(post_read::post_id)
          .and(post_read::person_id.eq(person_id_join)),
      ),
    )
  };

  let is_creator_blocked = |person_id_join| {
    exists(
      person_block::table.filter(
        post_aggregates::creator_id
          .eq(person_block::target_id)
          .and(person_block::person_id.eq(person_id_join)),
      ),
    )
  };

  let all_joins = move |query: post_aggregates::BoxedQuery<'a, Pg>, my_person_id: Option| {
    // The left join below will return None in this case
    let person_id_join = my_person_id.unwrap_or(PersonId(-1));

    query
      .inner_join(person::table)
      .inner_join(community::table)
      .inner_join(post::table)
// how do we eliminate these next 3 joins that are user/person references?
      .left_join(
        community_follower::table.on(
          post_aggregates::community_id
            .eq(community_follower::community_id)
        ),
      )
      .left_join(
        community_moderator::table.on(
          post::community_id
            .eq(community_moderator::community_id)
        ),
      )
      .left_join(
        post_like::table.on(
          post_aggregates::post_id
            .eq(post_like::post_id)
        ),
      )
      .left_join(
        person_post_aggregates::table.on(
          post_aggregates::post_id
            .eq(person_post_aggregates::post_id)
        ),
      )
      .select((
        post::all_columns,
        person::all_columns,
        community::all_columns,
        is_creator_banned_from_community,
        post_aggregates::all_columns,
        CommunityFollower::select_subscribed_type(),
// how do we eliminate these next 3 for anonymous?
        is_saved(person_id_join),
        is_read(person_id_join),
        is_creator_blocked(person_id_join),
        post_like::score.nullable(),
        coalesce(
          post_aggregates::comments.nullable() - person_post_aggregates::read_comments.nullable(),
          post_aggregates::comments,
        ),
      ))
  };

  let read =
    move |mut conn: DbConn<'a>,
          (post_id, my_person_id, is_mod_or_admin): (PostId, Option, bool)| async move {

      let mut query = all_joins(
        post_aggregates::table
          .filter(post_aggregates::post_id.eq(post_id))
          .into_boxed(),
        my_person_id,
      );

        query = query
          .filter(community::removed.eq(false))
          .filter(post::removed.eq(false))
          ;

      query.first::(&mut conn).await
    };

  let list = move |mut conn: DbConn<'a>, options: PostQuery<'a>| async move {
    let person_id = options.local_user.map(|l| l.person.id);

    let mut query = all_joins(post_aggregates::table.into_boxed(), person_id);


      query = query
        .filter(community::deleted.eq(false))
        .filter(post::deleted.eq(false));


    // every SELECT has to labor away on removed filtering
      query = query
        .filter(community::removed.eq(false))
        .filter(post::removed.eq(false));

    if options.community_id.is_none() {
      query = query.then_order_by(post_aggregates::featured_local.desc());
    } else if let Some(community_id) = options.community_id {
      query = query
        .filter(post_aggregates::community_id.eq(community_id))
        .then_order_by(post_aggregates::featured_community.desc());
    }

    if let Some(creator_id) = options.creator_id {
      query = query.filter(post_aggregates::creator_id.eq(creator_id));
    }


    if let Some(url_search) = options.url_search {
      query = query.filter(post::url.eq(url_search));
    }

    if let Some(search_term) = options.search_term {
      let searcher = fuzzy_search(&search_term);
      query = query.filter(
        post::name
          .ilike(searcher.clone())
          .or(post::body.ilike(searcher)),
      );
    }

      query = query
        .filter(post::nsfw.eq(false))
        .filter(community::nsfw.eq(false));


    query = match options.sort.unwrap_or(SortType::Hot) {
      SortType::Active => query
        .then_order_by(post_aggregates::hot_rank_active.desc())
        .then_order_by(post_aggregates::published.desc()),
      SortType::Hot => query
        .then_order_by(post_aggregates::hot_rank.desc())
        .then_order_by(post_aggregates::published.desc()),
      SortType::Controversial => query.then_order_by(post_aggregates::controversy_rank.desc()),
      SortType::New => query.then_order_by(post_aggregates::published.desc()),
      SortType::Old => query.then_order_by(post_aggregates::published.asc()),
      SortType::NewComments => query.then_order_by(post_aggregates::newest_comment_time.desc()),
      SortType::MostComments => query
        .then_order_by(post_aggregates::comments.desc())
        .then_order_by(post_aggregates::published.desc()),
      SortType::TopAll => query
        .then_order_by(post_aggregates::score.desc())
        .then_order_by(post_aggregates::published.desc()),
      SortType::TopYear => query
        .filter(post_aggregates::published.gt(now - 1.years()))
        .then_order_by(post_aggregates::score.desc())
        .then_order_by(post_aggregates::published.desc()),
      SortType::TopMonth => query
        .filter(post_aggregates::published.gt(now - 1.months()))
        .then_order_by(post_aggregates::score.desc())
        .then_order_by(post_aggregates::published.desc()),
      SortType::TopWeek => query
        .filter(post_aggregates::published.gt(now - 1.weeks()))
        .then_order_by(post_aggregates::score.desc())
        .then_order_by(post_aggregates::published.desc()),
      SortType::TopDay => query
        .filter(post_aggregates::published.gt(now - 1.days()))
        .then_order_by(post_aggregates::score.desc())
        .then_order_by(post_aggregates::published.desc()),
      SortType::TopHour => query
        .filter(post_aggregates::published.gt(now - 1.hours()))
        .then_order_by(post_aggregates::score.desc())
        .then_order_by(post_aggregates::published.desc()),
      SortType::TopSixHour => query
        .filter(post_aggregates::published.gt(now - 6.hours()))
        .then_order_by(post_aggregates::score.desc())
        .then_order_by(post_aggregates::published.desc()),
      SortType::TopTwelveHour => query
        .filter(post_aggregates::published.gt(now - 12.hours()))
        .then_order_by(post_aggregates::score.desc())
        .then_order_by(post_aggregates::published.desc()),
      SortType::TopThreeMonths => query
        .filter(post_aggregates::published.gt(now - 3.months()))
        .then_order_by(post_aggregates::score.desc())
        .then_order_by(post_aggregates::published.desc()),
      SortType::TopSixMonths => query
        .filter(post_aggregates::published.gt(now - 6.months()))
        .then_order_by(post_aggregates::score.desc())
        .then_order_by(post_aggregates::published.desc()),
      SortType::TopNineMonths => query
        .filter(post_aggregates::published.gt(now - 9.months()))
        .then_order_by(post_aggregates::score.desc())
        .then_order_by(post_aggregates::published.desc()),
    };

    let (limit, offset) = limit_and_offset(options.page, options.limit)?;

    query = query.limit(limit).offset(offset);

    debug!("Post View Query: {:?}", debug_query::(&query));

    query.load::(&mut conn).await
  };

  Queries::new(read, list)
}

This compiles, but I can not progress further. There are 3 joins more that aren't really needed for an anonymous user... but the interdependent Rust object structures I can't unravel enough to remove them from the code.

For example, Lemmy allows you to "save" a post, but an anonymous user doesn't have that ability - so how can I remove the JOIN + select related to that while still satisfying the object requirements? I even tried creating a variation of PostViewTuple object without one of the bool fields, but it all cascades into 50 lines of compiler errors. Thank you.

 
SELECT id
    FROM my_table
    WHERE id IN (
     SELECT id
     FROM my_table
     WHERE criteria_a = 19
     ORDER BY create_when DESC
     LIMIT 1000
  );

This is the pattern I am looking for, but I need the criteria_a to be repeated for every value of criteria_a with the important focus being the LIMIT 1000 for any single value of criteria_a. There is no need to put a total LIMIT on the query, just to limit to the 1000 per criteria_a with the specific ORDER BY at that point. Put another way...

SELECT id
    FROM my_table
    WHERE id IN (
          SELECT id
		 FROM my_table
		 WHERE criteria_a = 19
		 ORDER BY create_when DESC
		 LIMIT 1000
	)
       OR id IN (
	  SELECT id
		 FROM my_table
		 WHERE criteria_a = 20
		 ORDER BY create_when DESC
		 LIMIT 1000
     );

Where I desire 2000 total rows. I could turn this into programming code (even a PostgreSQL FUNCTION) that loops over every value of criteria_a and replaces 19 in the example.

I don't care of it is a JOIN or an IN, I'm more stuck on how to repeat the inner SELECT with the LIMIT 1000 based on sort and criteria_a. Can I do it without looping and/or UNION? Thank you.

 

cross-posted from: https://lemmy.world/post/3252643

lemmy.readme.io uploaded some great API documentation to get started making your own Lemmy client.

Proved very useful in making my iOS client Lunar

 

lemmy_server PostgreSQL table for comment does not keep parent comment id directly, it uses a path field of ltree type.

by default, every comment has a path of it's own primary key id.

comment id 101, path = "0.101"
comment id 102, path = "0.102"
comment id 103, path = "0.101.103"
comment id 104, path = "0.101.103.104"

comment 103 is a reply to comment 101, 104 is a reply to 103.

A second table named comment_aggregates has a count field with comment_id column linking to comment table id key. On each new comment reply, lemmy_server issues an update statement to update the counts on every parent in the tree. Rust code issues this to PostgreSQL:

        if let Some(parent_id) = parent_id {
          let top_parent = format!("0.{}", parent_id);
          let update_child_count_stmt = format!(
            "
update comment_aggregates ca set child_count = c.child_count
from (
  select c.id, c.path, count(c2.id) as child_count from comment c
  join comment c2 on c2.path <@ c.path and c2.path != c.path
  and c.path <@ '{top_parent}'
  group by c.id
) as c
where ca.comment_id = c.id"
          );
      sql_query(update_child_count_stmt).execute(conn).await?;
    }

I've been playing with doing bulk INSERT of thousands of comments at once to test SELECT query performance.

So far, this is the only SQL statement I have found that does a mass UPDATE of child_count from path for the entire comment table:

UPDATE
    comment_aggregates ca
SET
    child_count = c2.child_count
FROM (
    SELECT
        c.id,
        c.path,
        count(c2.id) AS child_count
    FROM
        comment c
    LEFT JOIN comment c2 ON c2.path <@ c.path
        AND c2.path != c.path
GROUP BY
    c.id) AS c2
WHERE
    ca.comment_id = c2.id;

There are 1 to 2 millions comments stored on lemmy.ml and lemmy.world - ~~this rebuild of child_count can take hours, and may not complete at all. Even on 100,000 rows in a test system, it's a harsh UPDATE statement to execute.~~ EDIT: I found my API connection to production server was timing out and the run-time on the total rebuild isn't as bad as I thought. With my testing system I'm also finding it is taking under 19 seconds with 312684 comments. The query does seem to execute and run normal, not stuck.

Anyone have suggestions on how to improve this and help make Lemmy PostgreSQL servers more efficient?

EDIT: lemmy 0.18.3 and 0.18.4 are munging the less-than and greater-than signs in these code blocks.

view more: next ›