
Postgres could in fact be used here by creating an append-only table similar to:

id, data (JSON field)
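A minimal sketch of that append-only table, using Python's built-in sqlite3 so it runs anywhere (Postgres would use a JSONB column instead of TEXT); the table and field names are illustrative, not from NYT's actual schema:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Append-only: every change is a new row, nothing is UPDATEd or DELETEd.
conn.execute(
    "CREATE TABLE articles_log ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "data TEXT NOT NULL)"  # JSON payload stored as text
)
conn.execute("INSERT INTO articles_log (data) VALUES (?)",
             (json.dumps({"slug": "story-1", "headline": "First headline"}),))
# A correction is appended as a new row, preserving the full history.
conn.execute("INSERT INTO articles_log (data) VALUES (?)",
             (json.dumps({"slug": "story-1", "headline": "Corrected headline"}),))
rows = conn.execute("SELECT id, data FROM articles_log ORDER BY id").fetchall()
```

Reading the log back in `id` order replays the full history of the record, which is exactly the property a Kafka topic gives you.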

However, Postgres doesn't have good support for creating derived append-only logs (=streams) from that table. Kafka has Kafka Streams, ksqlDB, and producer/consumer APIs that are a good fit for NYT's use case.

> Am I missing something?

Remember that in addition to schema changes, NYT also wants to avoid row changes. One reason was that all search indices & other systems need updating too when a row changes in the DB, and at large scale this leads to inconsistencies (sometimes some of these updates fail).

IMHO the thing you are missing is the log-based architecture where all databases are derived/materialised from the single source of truth (SSOT): the log.



Pretty much all databases are already a transaction log that gets materialized. Postgres and many other RDBMS take in changes, write to the WAL, then provide tables that are views of the latest data of each row.

If you want total history and the database doesn't support this automatically, then you can easily insert new rows instead (like you described) and create an SQL view that shows the latest version of each row, while also adding views for all kinds of other data access patterns. Companies have been doing this for decades because it's self-contained, fast, and reliable.
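A sketch of that pattern, again with sqlite3 standing in for Postgres; the `slug` key and table/view names are hypothetical:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles_log ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "slug TEXT NOT NULL, "   # logical record key
    "data TEXT NOT NULL)"    # JSON payload
)
# The view exposes only the newest row per logical key; the full
# history stays queryable in the underlying append-only table.
conn.execute(
    "CREATE VIEW articles_latest AS "
    "SELECT slug, data FROM articles_log "
    "WHERE id IN (SELECT MAX(id) FROM articles_log GROUP BY slug)"
)
for slug, headline in [("a", "v1"), ("b", "v1"), ("a", "v2")]:
    conn.execute("INSERT INTO articles_log (slug, data) VALUES (?, ?)",
                 (slug, json.dumps({"headline": headline})))
latest = dict(conn.execute("SELECT slug, data FROM articles_latest").fetchall())
```

Readers query `articles_latest` as if it were a normal table; writers only ever append. Other access patterns get their own views over the same log.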

Using Kafka and separate processes to do this is deconstructing the RDBMS into separate layers that you now have to manage yourself. Useful if you really have that kind of scale, but at 100GB of data it's just silly. Use Kafka as a work queue, but leave the database work to actual database software.


> However, Postgres doesn't have good support for creating derived append-only logs

A one-line trigger will give you a derived append-only log that is transactionally consistent.


So how does that one-liner transform data from log A to log B in real time?

Are you talking about creating a new table, materialised views, stored procedures, or...?


In your post you are proposing to have a table be equivalent to a Kafka topic. So a derived "log"/topic can be another table updated by a trigger.
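A sketch of that trigger idea: rows appended to one "topic" table are transformed and appended to a derived table inside the same transaction. sqlite3 stands in for Postgres here (Postgres triggers call a function rather than inlining the body), and the table names and the `upper()` transform are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE topic_a (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT);
CREATE TABLE topic_b (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT);

-- Derived log: every insert into topic_a appends a transformed
-- row to topic_b, atomically with the original insert.
CREATE TRIGGER derive_b AFTER INSERT ON topic_a
BEGIN
    INSERT INTO topic_b (payload) VALUES (upper(NEW.payload));
END;
""")
conn.execute("INSERT INTO topic_a (payload) VALUES ('hello')")
derived = conn.execute("SELECT payload FROM topic_b").fetchall()
```

Because the trigger runs in the same transaction as the insert, the derived log can never drift out of sync with the source, which is the "transactionally consistent" property claimed above.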



