Hacker News

Data duplication virtually always happens as an organization gets larger. Even if you stick to an RDBMS only, you're going to get it once you set up master/slave replication, batch jobs, data science, tape backups, etc.

It's not a serious problem as long as you designate one data source as primary and every other data source as derived. That way, if you ever get data corruption issues in a derived copy, you can blow it away and re-create it from the primary. You also don't face complex data-sync issues, because all data transfer is unidirectional. You write into Postgres or your other source of truth, and then you write from Postgres to ElasticSearch. If there's a difference between them, the Postgres version is authoritative, so overwrite the ElasticSearch copy with it.
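
To make the one-way flow concrete, here's a rough sketch in Python. Plain dicts stand in for Postgres (the primary) and ElasticSearch (the derived copy), and the names `rebuild_derived` and `reconcile` are made up for illustration; they aren't part of either product's API. A real version would read rows from the database and push documents to the search index, but the shape of the logic would be the same.

```python
from typing import Dict, List

Row = Dict[str, str]


def rebuild_derived(primary: Dict[int, Row]) -> Dict[int, Row]:
    """Re-create the derived store from scratch, using only the primary."""
    return {doc_id: dict(row) for doc_id, row in primary.items()}


def reconcile(primary: Dict[int, Row], derived: Dict[int, Row]) -> List[int]:
    """One-way repair: make `derived` match `primary`.

    The primary is never written to. Returns the sorted IDs that were
    added, overwritten, or deleted in the derived store.
    """
    changed = []
    # Anything missing or different in the derived copy is overwritten.
    for doc_id, row in primary.items():
        if derived.get(doc_id) != row:
            derived[doc_id] = dict(row)
            changed.append(doc_id)
    # Anything the primary doesn't know about is dropped.
    for doc_id in list(derived):
        if doc_id not in primary:
            del derived[doc_id]
            changed.append(doc_id)
    return sorted(changed)


# Hypothetical data: the dicts stand in for Postgres and ElasticSearch.
postgres = {1: {"title": "first post"}, 2: {"title": "second post"}}
search_index = {1: {"title": "stale title"}, 3: {"title": "orphaned doc"}}

repaired_ids = reconcile(postgres, search_index)
```

Because nothing ever flows back from the derived copy, there's no conflict resolution to think about: a disagreement is always settled in favor of the primary, and the worst case is a full `rebuild_derived`.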

Most of your other points apply to adding any major new feature, and you should always weigh the user benefits against the maintenance costs of a large new dependency. Presumably, if you're looking at ElasticSearch, it's because full-text search in your RDBMS is not good enough (which won't always be the case, but has been my experience and apparently the experience of several other people here).


