It should also be noted that this isn't the only way to do this either.
The other three most common ways I've seen to store differently structured data in a consistent way are:
* Edge table
* Binary table
* Universal table
The edge table approach maps edges into key:value tables for the type of table. So you might have an integer data table that was ID<key>:Value<int>, and it only stored the data values and a key to identify the value. And then you have a thing table which stored Name<string>:Flag<type>:ID<key> where the flag column said "int" and the ID column then contains the key of some row in your integer table.
There are as many tables as there are types of data, + 1 table for the definition of all things stored.
The binary approach groups values of the same name into a table of that name. So if you have a property called "username" you'd have a table called "username", and this approach usually still includes the edge approach such that the values are stored in tables with ID:Value structure by type of value. Effectively partitioning the value table by name of the property being stored.
There are as many tables as there are types of data, + the count of unique property names.
The universal table approach uses a single table to store all data, effectively containing all of the edges in a graph. It's a conceptual full outer join of all binary tables. Imagine 100's of columns, and for each row the vast majority of those are NULLs.
The triplestore can be thought of as a condensed version (not type-safe) of the universal table. More space efficient, but usually at the cost of the datatype safety offered by the database.
You can combine these methods to store any type of semi-structured data in a traditional RDBMS and these approaches are used a lot if you know where to look for evidence of them.
SharePoint uses a version of the universal table. Oracle (for the ORA_XML storage type) uses a combination of the above under the hood to map XML into traditional database tables.
They all have pros and cons relating to query speed (for certain types of query), storage efficiency, indexing, types, etc. And as always you need to know why you're choosing something.
SharePoint also partitions their universal table dynamically. But they start with saying that each "SiteCollection" has it's own database/tablespace. As SiteCollections can contain a single site, this is basically equivalent to Reddit creating new tables per subreddit.
Elsewhere (up or down on this page) I saw that the SQLAlchemy author advises not to do this kind of thing. I simply say don't do if you don't know strongly why you should do it. Don't cargo-cult... go and read up on the problem space and only implement one of these solutions when you know the effects of doing so.
This is one of those times when analysis paralysis can be your friend and prevent you from building a mess you didn't need. If anyone starts with one of these designs and you haven't got a decade of experience to know why... you are probably doing it wrong.
The other three most common ways I've seen to store differently structured data in a consistent way are:
The edge table approach maps edges into key:value tables for the type of table. So you might have an integer data table that was ID<key>:Value<int>, and it only stored the data values and a key to identify the value. And then you have a thing table which stored Name<string>:Flag<type>:ID<key> where the flag column said "int" and the ID column then contains the key of some row in your integer table.There are as many tables as there are types of data, + 1 table for the definition of all things stored.
The binary approach groups values of the same name into a table of that name. So if you have a property called "username" you'd have a table called "username", and this approach usually still includes the edge approach such that the values are stored in tables with ID:Value structure by type of value. Effectively partitioning the value table by name of the property being stored.
There are as many tables as there are types of data, + the count of unique property names.
The universal table approach uses a single table to store all data, effectively containing all of the edges in a graph. It's a conceptual full outer join of all binary tables. Imagine 100's of columns, and for each row the vast majority of those are NULLs.
The triplestore can be thought of as a condensed version (not type-safe) of the universal table. More space efficient, but usually at the cost of the datatype safety offered by the database.
You can combine these methods to store any type of semi-structured data in a traditional RDBMS and these approaches are used a lot if you know where to look for evidence of them.
SharePoint uses a version of the universal table. Oracle (for the ORA_XML storage type) uses a combination of the above under the hood to map XML into traditional database tables.
They all have pros and cons relating to query speed (for certain types of query), storage efficiency, indexing, types, etc. And as always you need to know why you're choosing something.
SharePoint also partitions their universal table dynamically. But they start with saying that each "SiteCollection" has it's own database/tablespace. As SiteCollections can contain a single site, this is basically equivalent to Reddit creating new tables per subreddit.
Elsewhere (up or down on this page) I saw that the SQLAlchemy author advises not to do this kind of thing. I simply say don't do if you don't know strongly why you should do it. Don't cargo-cult... go and read up on the problem space and only implement one of these solutions when you know the effects of doing so.
This is one of those times when analysis paralysis can be your friend and prevent you from building a mess you didn't need. If anyone starts with one of these designs and you haven't got a decade of experience to know why... you are probably doing it wrong.