Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

| A note about count(1) vs count(* ). One might think that count(1) would be faster because count(* ) appears to consult the data for a whole row. However the opposite is true.

Which is the opposite decision of other databases. So sometimes Postgres does make bad decisions...



I think the "the opposite is true" is in reference to the assertion that "count(⧆) appears to consult the data for a whole row", not to the proposition that "one might think that count(1) would be faster [than count(⧆)]".

count(⧆) just counts the tuples themselves, which is fast; it's like counting heap-allocated data structures by counting their pointers (which you're already walking), without dereferencing those pointers.

count(1) counts the result of evaluating the SQL expression "1" upon landing on each row, but still walks the same pointers to do so.

So, in terms of time complexity, they're roughly equivalent. Both data items (the tuple and the SQL constant expression) are already on the stack, ready to be directly computed upon.

Postgres's count(1) isn't slower than the one in any other DBMS. It's just their count(⧆)—at least the expression-evaluation part of it—which is more optimized than the one in other DBMSes. Nothing wrong with that, IMHO.


I'm surprised count(<constant>) is not special-cased in pretty much every database, and count(*) as well. There's no reason either would do anything further counting the number of records / tuples.


> Postgres's count(1) isn't slower than the one in any other DBMS.

count( * ) is faster on Postgres than count(1). But both are fundementally slow because of MVCC. And count( * ) on postgres (the optimized one on Postgres) is much slower than count(1) on other databases (the optimzed one on other databases). So practically speaking, counting rows is slower on Postgres than on other databases.

That said, I love Postgres. I use it every day. There is some room for improvement and it does improve all the time. It is an amazing open source project. And I wouldn't care at all if they never optimize count(1)


Yes, I was trying to make a finer point—the difference between count(⧆) and count(1) comes down to the cost of filtering the row, and in Postgres count(1) is a regular filtering operation—taking the same filtering cost that count(1) has in other DBMSes—while count(⧆) has a filtering cost that is lower than that of other DBMSes, because it has been specifically optimized†.

Separately, there's an MVCC cost of walking the rows to filter them, and other DBMSes optimize walking rows for counting [usually causing both count(1) and count(⧆) to be faster], while Postgres does not do this optimization. (And, as stated in the article, in those DBMSes, this isn't a pure optimization per se, but is rather a trade-off, trading write speed for all INSERTs/DELETEs for read speed for this particular case.)

(† Technically, the filtering cost of count(⧆) hasn't been specifically optimized; the relative speed of filtering tuples for count(⧆) is an emergent property of the general fact that Postgres treats any mention of `⧆` as a reference to the row-tuple object itself. i.e. If `foo` is a table (x int, y text), then in actuality, `foo` is first created as a type [a pg_class] defined as the tuple (int, text); and then the table `foo` is defined as a relation persisting a rowset of `foo`-tuple-typed rows [in est making a table['s triggerable operations] each into a stored procedure with a `foo`-tuple-typed-rowset return type.] Then, the expressions `(SELECT ⧆ FROM foo)`, and `(SELECT f.⧆ FROM foo f)` both evaluate to rowsets type `foo`, which means that Postgres doesn't need to dereference the pointer to each `foo` heap tuple to build those rowsets. It only needs to dereference the pointers when it comes time to actually serialize and emit the row over the wire—which in case of a `count(⧆)` operation, never happens.)


I'm not sure I understand your statement. On databases without an intrinsic row count property the only way to find the row count is to go to every page and count the rows there. The query planner will use the same plan for both `count(*)` and `count(1)`.


An interesting postgres oddity I ran into a few years ago that isn't about counting but is in a similar arena is how under very specific circumstances, "SELECT *" is significantly faster than "SELECT one_column" despite the conventional and normally well advised advice that you should only select the columns you need.

See https://www.postgresql.org/message-id/CAN1FPGN1ynBj3m1DMszc9... and Tom Lane's followup for details.


Not according to that blurb. There have been other databases where count(1) was much faster. Here they are saying that in Postgres, count(1) is slower because star is treated as no arguments, and count(1) is more complex.


That is not what "they" are saying. count(1) and count(*) are equally fast because they are doing the same thing.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: