PostgreSQL

Can I accomplish this in a single SQL statement? (lemmy.ml)

submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/[email protected]

SELECT id
FROM my_table
WHERE id IN (
    SELECT id
    FROM my_table
    WHERE criteria_a = 19
    ORDER BY create_when DESC
    LIMIT 1000
);

This is the pattern I am looking for, but I need the inner SELECT repeated for every value of criteria_a, with the important point being that the LIMIT 1000 applies to each single value of criteria_a. There is no need to put a total LIMIT on the query, just a limit of 1000 rows per criteria_a, using that specific ORDER BY. Put another way...

SELECT id
FROM my_table
WHERE id IN (
    SELECT id
    FROM my_table
    WHERE criteria_a = 19
    ORDER BY create_when DESC
    LIMIT 1000
)
OR id IN (
    SELECT id
    FROM my_table
    WHERE criteria_a = 20
    ORDER BY create_when DESC
    LIMIT 1000
);

Here I want 2000 total rows. I could turn this into programming code (even a PostgreSQL FUNCTION) that loops over every value of criteria_a and replaces 19 in the example.

I don't care if it is a JOIN or an IN; I'm more stuck on how to repeat the inner SELECT with the LIMIT 1000 based on the sort and criteria_a. Can I do it without looping and/or UNION? Thank you.

[–] [email protected] 6 points 1 year ago (2 children)

I think you can use RANK as a window function and specify TOP 1000

[–] [email protected] 2 points 1 year ago

Ok, I'm doing some reading: https://medium.com/@amulya349/how-to-select-top-n-rows-from-each-category-in-postgresql-39e3cfebb020

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago) (1 children)

Good results with this approach. I hadn't considered the RANK OVER PARTITION BY criteria_a values and it works like a champ. It moves the ORDER BY into the realm of focus (criteria_a) and performance seems decent enough... and it isn't difficult to read the short statement.

SELECT COUNT(ranked_recency.*) AS post_row_count
FROM
  (
     SELECT id, post_id, community_id, published,
        rank() OVER (
           PARTITION BY community_id
           ORDER BY published DESC, id DESC
           )
     FROM post_aggregates) ranked_recency
WHERE rank <= 1000
;

Gives me the expected results over the 5+ million test rows I ran it against.
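One detail worth checking with this pattern is how ties are handled. The sketch below is a hypothetical miniature of post_aggregates (the rows and the top_n helper are made up for illustration) and runs against an in-memory SQLite database purely so it can be executed without a server, assuming SQLite 3.25 or newer for window functions. It suggests that rank() can return more than N rows per partition when the ORDER BY has ties, and that the extra id DESC tie-breaker used above avoids that as long as id is unique.

```python
import sqlite3

# Hypothetical miniature of post_aggregates; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post_aggregates ("
    "id INTEGER PRIMARY KEY, community_id INTEGER, published TEXT)"
)
conn.executemany(
    "INSERT INTO post_aggregates VALUES (?, ?, ?)",
    [
        (1, 10, "2023-07-01"),
        (2, 10, "2023-07-02"),
        (3, 10, "2023-07-02"),  # same published value as id 2 (a tie)
        (4, 10, "2023-07-03"),
        (5, 20, "2023-07-01"),
    ],
)


def top_n(window_fn, order_by, n):
    """Return ids whose per-community window value is <= n.

    window_fn and order_by are trusted constants here, not user input.
    """
    sql = f"""
        SELECT id FROM (
            SELECT id, {window_fn}() OVER (
                PARTITION BY community_id ORDER BY {order_by}
            ) AS rn
            FROM post_aggregates
        ) AS ranked
        WHERE rn <= ?
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, (n,))]


# rank() with ties: ids 2 and 3 share rank 2, so community 10 yields 3 rows.
rank_ties = top_n("rank", "published DESC", 2)
# Adding a unique tie-breaker makes rank() behave like a strict top-N.
rank_tiebreak = top_n("rank", "published DESC, id DESC", 2)
# row_number() always caps at N, but which tied row survives is arbitrary.
rownum_ties = top_n("row_number", "published DESC", 2)

print(rank_ties, rank_tiebreak, len(rownum_ties))
```

If a hard cap of N rows per group matters more than keeping every tied row, row_number() or a unique tie-breaker is probably the safer choice.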

If you could elaborate on your idea of TOP, please do. I'm hoping there might be a way to wall the LIMIT 1000 into the inner query and not have the outer query need to WHERE filter rank on so many results?
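One way to keep the LIMIT inside the inner query is a correlated subquery, which is essentially the original IN pattern with the hard-coded 19 replaced by a reference to the outer row. The sketch below is only an illustration under assumptions: the sample rows are made up, PER_GROUP stands in for the 1000, and SQLite is used in memory just to make it runnable. PostgreSQL also offers LATERAL joins for this kind of per-group LIMIT, which are not shown here.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    "id INTEGER PRIMARY KEY, criteria_a INTEGER, create_when TEXT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, 19, "2023-01-01"),
        (2, 19, "2023-01-02"),
        (3, 19, "2023-01-03"),
        (4, 20, "2023-01-01"),
        (5, 20, "2023-01-02"),
        (6, 21, "2023-01-01"),
    ],
)

PER_GROUP = 2  # stands in for the 1000 used in the thread

# The inner SELECT is re-evaluated for the outer row's criteria_a,
# so the LIMIT applies per criteria_a value rather than to the whole query.
sql = """
    SELECT t.id
    FROM my_table AS t
    WHERE t.id IN (
        SELECT inner_t.id
        FROM my_table AS inner_t
        WHERE inner_t.criteria_a = t.criteria_a
        ORDER BY inner_t.create_when DESC, inner_t.id DESC
        LIMIT ?
    )
    ORDER BY t.id
"""
newest_ids = [row[0] for row in conn.execute(sql, (PER_GROUP,))]
print(newest_ids)
```

Whether this beats the window-function version depends on the data and indexes; an index on (criteria_a, create_when) would likely matter, so it is worth comparing both with EXPLAIN ANALYZE rather than assuming.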

[–] [email protected] 3 points 1 year ago (1 children)

Glad this is working for you. Using TOP probably was a bad idea and I think the way you used RANK <=1000 is a better approach.

If there was a way to safely exclude any of the records - like if you knew that when published was older than X days/months/years it would never make it into the final results, you could filter them out before ranking them. That might squeeze a little more performance out of the query, but could be risky if the data isn't predictable enough.
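A rough sketch of that pre-filter idea, and of the risk, is below. The table contents, the cutoff date, and the top_ids helper are all hypothetical, and SQLite (3.25 or newer) is used in memory only so the example runs standalone. The filter goes inside the inner SELECT so fewer rows are ranked, but a quiet community whose posts are all older than the cutoff drops out of the results entirely.

```python
import sqlite3

# Hypothetical miniature of post_aggregates; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post_aggregates ("
    "id INTEGER PRIMARY KEY, community_id INTEGER, published TEXT)"
)
conn.executemany(
    "INSERT INTO post_aggregates VALUES (?, ?, ?)",
    [
        (1, 10, "2021-05-01"),
        (2, 10, "2023-06-01"),
        (3, 10, "2023-06-15"),
        (4, 20, "2020-01-01"),  # quiet community: only old posts
        (5, 20, "2020-02-01"),
    ],
)


def top_ids(n, cutoff=None):
    """Top-n ids per community, optionally ranking only rows >= cutoff."""
    params = {"n": n}
    where = ""
    if cutoff is not None:
        where = "WHERE published >= :cutoff"
        params["cutoff"] = cutoff
    sql = f"""
        SELECT id FROM (
            SELECT id, rank() OVER (
                PARTITION BY community_id
                ORDER BY published DESC, id DESC
            ) AS rnk
            FROM post_aggregates
            {where}
        ) AS ranked_recency
        WHERE rnk <= :n
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, params)]


unfiltered = top_ids(2)
filtered = top_ids(2, cutoff="2023-01-01")
print(unfiltered, filtered)
```

Here community 20 vanishes once the cutoff is applied, which is the kind of unpredictability the comment warns about.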

[–] [email protected] 1 points 1 year ago

Thank you

[–] [email protected] 2 points 1 year ago

I'm not at my desk at the moment, but I think this is a prime use case for crosstabs.