I've been exploring the idea of using SQLite to publish data online via my Datasette project for a few years now: https://datasette.io/
Similar to the OP, one of the things I've realized is that while the dream of getting everyone to use the exact same standards for their data has proved almost impossible to achieve, having a SQL-powered API actually provides a really useful alternative.
The great thing about SQL APIs is that you can use them to alter the shape of the data you are querying.
Let's say there's a database with power plants in it. You need them as "name, lat, lng" - but the database you are querying has "latitude" and "longitude" columns.
If you can query it with a SQL query, you can do this:
select name, latitude as lat, longitude as lng from [global-power-plants]
Here's a demo using exactly that query: https://global-power-plants.datasettes.com/global-power-plan... That URL gives you back an HTML page, but if you change the extension to .json you get back JSON data: https://global-power-plants.datasettes.com/global-power-plan... Or use .csv to get back the data as CSV: https://global-power-plants.datasettes.com/global-power-plan...
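The reshaping trick can be sketched locally with Python's built-in sqlite3 module. The table and rows below are made up for illustration; the point is that the aliases in the SELECT are what give the client the name/lat/lng shape it wants:

```python
import sqlite3

# In-memory stand-in for a power plants database (table and rows are invented).
conn = sqlite3.connect(":memory:")
conn.execute("create table plants (name text, latitude real, longitude real)")
conn.execute("insert into plants values ('Agua Caliente', 32.95, -113.49)")

# The aliases reshape the columns into the name/lat/lng shape the client asked for.
cursor = conn.execute("select name, latitude as lat, longitude as lng from plants")
columns = [c[0] for c in cursor.description]
reshaped = [dict(zip(columns, row)) for row in cursor]
print(reshaped)  # [{'name': 'Agua Caliente', 'lat': 32.95, 'lng': -113.49}]
```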
But what if you need some other format, like Atom or ICS or RDF?
Datasette supports plugins which let you do that. I'm running the datasette-atom plugin (https://datasette.io/plugins/datasette-atom) on another site. That plugin lets you define Atom feeds using a SQL query like this one:
select
issues.updated_at as atom_updated,
issues.id as atom_id,
issues.title as atom_title,
issues.body as atom_content,
repos.html_url || '/issues/' || number as atom_link
from
issues join repos on issues.repo = repos.id
order by
issues.updated_at desc
limit
30
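Here's a rough sketch of how that query produces Atom-ready rows, run against made-up stand-ins for the issues and repos tables (the schemas below are guessed from the query itself, not taken from github-to-sqlite):

```python
import sqlite3

# Minimal stand-ins for the issues and repos tables; schemas inferred from the query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table repos (id integer primary key, html_url text);
create table issues (id integer primary key, repo integer, number integer,
                     title text, body text, updated_at text);
insert into repos values (1, 'https://github.com/simonw/datasette');
insert into issues values (101, 1, 7, 'Example issue', 'Body text', '2020-01-01T00:00:00Z');
""")

cursor = conn.execute("""
select
  issues.updated_at as atom_updated,
  issues.id as atom_id,
  issues.title as atom_title,
  issues.body as atom_content,
  repos.html_url || '/issues/' || number as atom_link
from issues join repos on issues.repo = repos.id
order by issues.updated_at desc
limit 30
""")
columns = [c[0] for c in cursor.description]
entry = dict(zip(columns, cursor.fetchone()))
print(entry["atom_link"])  # https://github.com/simonw/datasette/issues/7
```

The column aliases (atom_updated, atom_id, and so on) are the whole protocol: any query that returns columns with those names can be turned into a feed.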
Try that query here: https://github-to-sqlite.dogsheep.net/github?sql=select%0D%0... The plugin notices that columns with those names are returned, and adds a link to the .atom feed. Here's that URL - you can subscribe to it in your feed reader to get a feed of new GitHub issues across all of the projects I'm tracking in that Datasette instance: https://github-to-sqlite.dogsheep.net/github.atom?sql=select...
As you can see, there's a LOT of power in being able to use SQL as an API language to reshape data into the format that you need to consume.
I also have a project exploring this alternative way for peers to communicate, but I have a different answer: I think it's better as a network of peers that expose APIs.
It's badly documented since I've only just published it to GitHub, but I hope it gives a sense of how it's supposed to work.
I'm putting the final touches on the project; the main concept is already working, as is about 90% of the rest. But I think exposing SQL directly is too raw, and may not give the whole picture. Sometimes what matters is not the data but pure computation. For example, suppose you offer deep learning inference, where you receive and return tensors. What happens in the middle is a different kind of computation that has nothing to do with databases.
Or suppose you need to call a third-party service before returning an answer, or you want to serve the request in a distributed fashion without your API consumers even noticing?
APIs are a good answer to that, and in my opinion a superior interface. Whatever the semantic web of the future looks like, it will need this network of API peers as a foundation.
For instance, you could design a graph API on top of it. Exposing your data layer directly is bad engineering: there are many problems you won't be able to solve that way, while letting clients talk to you over a well-defined API will solve them.
To put it simply: in my view, the direction the semantic web is pointing is cool, but the answer is not the right one. The idea of exposing SQLite directly, while cooler, has the same flaws; otherwise something like GraphQL would have taken over the world, since it isn't a very different answer from the one presented here.
I've thought a bit about the problem of exposing your underlying database - that's obviously a problem for creating a stable API, because it means you may be unable to change your internal database schema without breaking all of your existing API clients!
With Datasette, my solution is to specifically publish the subset of your data in the schema that you think is suitable for exposing to the outside world. You might have an internal PostgreSQL database, then use my db-to-sqlite tool - https://datasette.io/tools/db-to-sqlite - to extract just a small portion of that into a SQLite database which you periodically publish using Datasette.
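The extraction step can be sketched with plain sqlite3 (here the "internal" database is also SQLite rather than PostgreSQL, and the table names are invented; this illustrates the idea, not the db-to-sqlite tool itself):

```python
import sqlite3
import tempfile, os

workdir = tempfile.mkdtemp()
internal_path = os.path.join(workdir, "internal.db")
public_path = os.path.join(workdir, "public.db")

# Hypothetical internal database: one table is publishable, one is not.
internal = sqlite3.connect(internal_path)
internal.execute("create table plants (name text, latitude real, longitude real)")
internal.execute("create table billing (customer text, card_number text)")
internal.execute("insert into plants values ('Agua Caliente', 32.95, -113.49)")
internal.commit()

# Extract just the publishable subset into a separate SQLite file -
# that file is what you would then publish with Datasette.
internal.execute("attach database ? as public", (public_path,))
internal.execute("create table public.plants as select * from plants")
internal.commit()
internal.close()

published = sqlite3.connect(public_path)
tables = [r[0] for r in published.execute(
    "select name from sqlite_master where type = 'table'")]
print(tables)  # ['plants'] - the billing table never leaves the internal database
```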
The other idea I have is to use views. Imagine having a PostgreSQL database with a couple of documented SQL views that you expose to the outside world. Now you can change your schema any time you like, provided you then update the definition of those views to expose the same shape of data that your external, documented API requires.
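A minimal sketch of the views idea, in SQLite for self-containedness (the table and view names are invented): the internal schema changes completely, but because the view is redefined to expose the same shape, queries against the documented view are unaffected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original internal schema (hypothetical names).
conn.execute("create table plants_v1 (name text, latitude real, longitude real)")
conn.execute("insert into plants_v1 values ('Agua Caliente', 32.95, -113.49)")

# The documented, external API queries this view - never the table directly.
conn.execute(
    "create view api_plants as "
    "select name, latitude as lat, longitude as lng from plants_v1")

before = conn.execute("select * from api_plants").fetchall()

# Internal schema change: rename the table and its columns...
conn.executescript("""
create table plants_v2 (plant_name text, location_lat real, location_lng real);
insert into plants_v2 select * from plants_v1;
drop table plants_v1;
-- ...then redefine the view so the external shape is unchanged.
drop view api_plants;
create view api_plants as
  select plant_name as name, location_lat as lat, location_lng as lng
  from plants_v2;
""")

after = conn.execute("select * from api_plants").fetchall()
print(before == after)  # True - clients of the view never notice the migration
```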
As with all APIs of this sort, adding new columns is fine - it's only removing columns or changing the behaviour of existing columns that will cause breakages for clients.
I wonder if database engines will ever have versioning, such that it would always be possible to see the database as it was at different points in time.