I think you can use RANK as a window function and specify TOP 1000
Ok, I'm doing some reading: https://medium.com/@amulya349/how-to-select-top-n-rows-from-each-category-in-postgresql-39e3cfebb020
Good results with this approach. I hadn't considered RANK() OVER (PARTITION BY ...) on the criteria_a values, and it works like a champ. It moves the ORDER BY into the area of focus (criteria_a), performance seems decent enough, and the short statement isn't difficult to read.
SELECT COUNT(ranked_recency.*) AS post_row_count
FROM (
    SELECT id, post_id, community_id, published,
           rank() OVER (
               PARTITION BY community_id
               ORDER BY published DESC, id DESC
           ) AS rank
    FROM post_aggregates
) AS ranked_recency
WHERE rank <= 1000;
Gives me the expected results over the 5+ million test rows I ran it against.
If you could elaborate on your idea of TOP, please do. I'm hoping there might be a way to push the LIMIT 1000 into the inner query so the outer query doesn't have to WHERE-filter rank over so many results?
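One thing I've seen mentioned that would keep the LIMIT inside the per-community scan is a LATERAL join. A rough sketch, assuming there's a community table whose id column lines up with post_aggregates.community_id:

-- Rough sketch: LIMIT applied per community inside the lateral subquery.
SELECT COUNT(*) AS post_row_count
FROM community c
CROSS JOIN LATERAL (
    SELECT pa.id
    FROM post_aggregates pa
    WHERE pa.community_id = c.id
    ORDER BY pa.published DESC, pa.id DESC
    LIMIT 1000
) AS recent;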
Glad this is working for you. TOP was probably a bad suggestion on my part (it's SQL Server syntax, not something PostgreSQL supports), and I think the way you used RANK <= 1000 is a better approach.
If there were a way to safely exclude some of the records up front - say, if you knew that rows whose published value is older than X days/months/years could never make it into the final results - you could filter them out before ranking them. That might squeeze a little more performance out of the query, but it could be risky if the data isn't predictable enough. Something like the sketch below is what I mean.
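The cutoff interval here is purely a placeholder; you'd have to validate it against your data before trusting it:

SELECT COUNT(ranked_recency.*) AS post_row_count
FROM (
    SELECT id, post_id, community_id, published,
           rank() OVER (
               PARTITION BY community_id
               ORDER BY published DESC, id DESC
           ) AS rank
    FROM post_aggregates
    -- Hypothetical cutoff: only safe if no community's top 1000
    -- can include posts older than this.
    WHERE published > now() - interval '1 year'
) AS ranked_recency
WHERE rank <= 1000;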
Thank you
I'm not at my desk ATM, but I think this is a prime use case for crosstabs.
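(For reference, crosstab comes from the tablefunc extension. A minimal illustration of the mechanics - the date range and column types here are assumptions, and the pivot isn't tied to the top-N query above:)

-- crosstab lives in the tablefunc extension
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Pivot monthly post counts per community into one column per month.
SELECT *
FROM crosstab(
    $$ SELECT community_id, to_char(published, 'YYYY-MM'), count(*)
       FROM post_aggregates
       GROUP BY 1, 2
       ORDER BY 1, 2 $$,
    $$ SELECT to_char(d, 'YYYY-MM')
       FROM generate_series('2023-01-01'::timestamp,
                            '2023-03-01'::timestamp,
                            interval '1 month') AS d $$
) AS ct(community_id int, jan_2023 bigint, feb_2023 bigint, mar_2023 bigint);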