The issue with plain SQL is simply that you can not compose queries at runtime.
For example let user decide which columns to select, in which order to fetch the rows, which table to query etc.
Or not even let the user decide but decide based on some config file or the system state.
You end up with a bunch of string manipulation that is fragile and does not compose well.
What solution is there except from simulating SQL via your languages nestable data structures?
Still fundamentally manipulating SQL text (which is a feature as I don't want to learn a full DSL), but it handles wrangling embedded placeholders while you're composing stuff and some other common compositional tasks. It's worked well for me anyway but I'm under no illusions it'd be right for everyone.
Not an original concept regardless; my original version of this was in Node: https://github.com/bdowning/sql-assassin, but a few years after I wrote that (and mostly didn't use it) I found https://github.com/gajus/slonik which was very similar and much more fleshed-out; I rolled _some_ of its concepts and patterns into sql-athame.
In my experience, such problems only occur when people believe that common software engineering practices do not apply when writing stored procedures. Just have your stored procedures version controlled, tested, and deployed like any other code.
The real problem is that you're almost always shifting work from a language that is well known and understood by you and/or your team to one that is less, or even poorly, understood and known, and you end up incurring the cost of novice programmers, which can be a real problem for both security and performance.
If you have good knowledge and experience in the language your preferred version of SQL implements, that's good. If you just have people that understand how to optimize schemas and queries, you might find that you encounter some of the same problems as if you shelled out to somewhat large and complex bash scripts. The value of doing so over using your core app language is debatable.
That seems to assume some very specific "scene" of developers, or at least a team environment where it's impossible to select for developers who know SQL. That may very well be the case in your situation, but I see no reason to expect it would be the case in most situations. SQL isn't exactly a new or niche language.
I'm not talking about plain SQL, I'm talking about the included procedural language that often ships with it[1] as an extension and which is much more DBMS specific.
So, the question is, does your team know SQL/PSM, or PL/SQL, or PL/pgSQL, or some other variant, and how well.
If your front-end is written in React and your business logic is written in SQL, is it really fair to call what's left in the middle tier a "core app language"? If you're writing a SaaS application today, you're more likely to want to rewrite your Java middle tier in Go than replacing your DBMS.
Not everyone is making a web app, or even something amenable to using React native, and even if they are, there's no guarantee that their middle tier isn't also in JavaScript.
That said, I wasn't making a case about replacing your DBMS. I specifically avoided that because yes, most people stay with what they know and used, and even if they switch, they switch for a different project, not within the same project. There are some cases where multiple DBMS back-end support is useful, but I think that's a fairly small subset (software aimed towards enterprises which wants to ease into your existing system and note add new requirements, and open source software meant to use one of the many DBMS back-ends you might have).
My actual point is more along the lines of:
- Most DBMS hosted languages I've seen are pretty shitty in comparison to what you're already using.
- The tooling for it is likely much worse or possibly non-existent.
- You are probably less familiar with it and likely to fall into the pitfalls of the language. All languages have them, shitty languages have more. See first point.
- If you accept those points and the degree to which you accept them should definitely play a role in deciding to use stored procedure you've written in the language your DBMS provides.
- I think trade offs are actually similar to what you would see writing chunks of your program in bash and calling out to that bash script. People can write well designed and safe bash programs. It's not easy, and there are a lot of pitfalls, and you can do it in the main language you're writing probably. Thus the reasons against calling out to bash for chunks of core are likely similar to the reasons against calling a stored procedure.
You make some good points. I personally have experience with business web applications, i.e. large complex data models with rather simple updates and report-like queries. These types of queries combined with a well normalized data model map well to set-based SQL. Of course, it's a different story if you're writing technical applications or games that are more about crunching numbers than querying and updating data.
To me, the shitty procedural languages you mention are just for gluing queries together. The important stuff happens in SQL and the simplicity of keeping it all in the database is worth it.
This isn’t my experience at all, but I suppose it depends on how you build your applications. I used to be of your opinion, before we moved more and more of our code based from .Net to Python an I was a linq junkie, but these days I think you’re crippling your pipeline if you don’t utilise SQL when and how it’s reasonable to do so.
There are a lot of data retrieval where a stores procedure will will save you incredibly amounts of resources because it gives you the exact dataset you need exactly when you need it.
That's a bit of an absolutist stance I think, a good programmer can use stored procedures just fine and in some cases it can even improve performance. Yes it makes it easier for bad programmers to write bad code, but bad programmers will write bad code no matter what tools they have in their toolbox.
With stored procedures you run your code inside of the SQL server process.
With SQLite, your entire application IS the process, and your SQLite data moves from disk to your app processes RAM.
I think the SQLite model is much better as you get to use your modern language and tooling (which is better than the language used for stored procedures which has not changed in 20 years, and is generally a bear to observe, test, develop).
> simulating SQL via your languages nestable data structures
I've gone down this rabbit hole... you've described the in-house query engine I work with.
Instead of sql, queries are nested json representing a dumbest-possible sql parse tree. It's only a subset of sql features, but it provides, as you point out, composability. By having the query as a data struct, all sorts of manipulations become possible... toggle fields for access control; recovery from dropped or renamed columns (schema has a changelog); auto-join related tables as required; etc.
It's magic when it works but it suffers from the drawbacks noted elsewhere in the thread about traditional ORMS... onboarding new devs, debugging when something goes wrong.
Adding new features to the dsl is a pain because the surface area is not just _the query_ but _all the queries it might form_ when composed with other queries in the system O_o
> It's magic when it works but it suffers from the drawbacks noted elsewhere in the thread about traditional ORMS...
(Not that there are lots of other devs on my project at the moment, but)
I am managing this by aggressively limiting the size / features of my query builder, so it's (relatively) easy to understand just by looking at the code.
If it ever passes that point, it's probably time to switch to a "real" query builder / ORM.
I'm curious what feature subset you settled with. Ours got as far as aggregate ops, and the complexity of that turned out to be 'hydra'. (Surely I've cut off the last head!)
By that point we were painted into a corner with the in-house query builder... there was no off-the-rack ORM that had semantics for combining queries. (i.e. meld together query A and query B so that the result cells are union or intersection.)
You seem familiar with the problem space, I'm curious what your experience was. What feature(s) made you draw the line and say, 'when we need it we'll migrate to X'? What are your "real ORM" candidates for X to bridge that gap?
Tightly bound. Basically you have a bit of query string with question marks in it and you bind out your data points. Either for sending/recieving.
Loosely bound. Here is a totally composed string ready to go just run it. Also a good way to make SQL injection attacks.
Both involve string manipulation.
I think the issue is the ODBC interface does not really map to what SQL is and does. The column binding sort of does. But not table, not where, not having, not sort, etc. So we end up re-inventing something that pastes over that. Building a string to feed into an API that feeds that over the network to a string parser on the other side to decompose it again then runs it and goes back the other way in a similar manner.
This is simpler than what gp is asking for, they want the ability to dynamically change which columns are requested too, so change it to "foo, bar, baz", but still do so with reasonable safety.
fmt ("SELECT {:v}, {:v} FROM {:v} WHERE id = {}", colname1, colname2, tablename, id);
The :v means "don't quote this, insert this string exactly as specified." It also has specialized handling for NULL values (i.e. it can generate 'IS NULL' instead of '= ...' if the value passed on is intended as a NULL value).
Honestly though, how often does this scenario actually happen. Devs love to build for what if, but there's not a ton of scenarios where you need to build completely custom queries based on user input. A nice SPROC with some inputs can do most of that while maintaining performance
Pretty much every web app I've written has had some sort of search feature with a table of results, where the user can filter on multiple columns of the search. That requires either:
a. Dynamically adding "and SomeColumn like @someValue" to the where clause of the query
or
b. Having the query include all filters but ignore them when they are left blank, such as
`where @someValue is null or SomeColumn like @someValue`
The latter approach does allow you get by without string-building queries for filtering, but tends to perform worse if you have a lot of columns you might filter on.
And you still have to play the string concatenation game if you let the user click column headers to change the ORDER BY.
With a decent ORM, like Entity Framework for C#, any of this is very simple and there is no need to write SQL string manipulation code in your application. But under the covers that is what is happening.
Is this just a complaint about a lack of good tooling for SQL code? User-configurable columns are probably always going to be implemented as an array of columns that gets built at runtime and then sent to whatever database management system you're using. In any programming language and database system, isn't the only thing automatically preventing you from building a malformed array of columns at runtime some sort of type-checking or linting? It seems to be that this mistake is just as easy to make using an ORM or data access library without type-checking or linting.
The way LINQ in C# works allows a good amount of type-checking to be done on the expressions fed into the ORM. So you can compose queries in a much more sensible way than string concatenation. The only downside is that LINQ makes it easy to express certain things that SQL makes hard, so you can sometimes write LINQ queries that turn into monstrous SQL.
I authored a library for F# that did static typechecking of SQL in strings at compile time. So it could tell you if you, for example, mistyped a column name or selected a column that didn't exist on a table or passed a SQL function the wrong number or type of parameters, etc. That was nice, but it still sucked for trying to compose queries or dynamically add filtering/sorting to a query.
The main place I've run into this is a table with sorting, pagination and filtering while lazy loading data and I encounter this need a lot.
With some runtime query builder most databases will have decent performance for the space between "too much to reasonably load in memory" to "we need a dedicated query service like elastic". Unfortunately taking the example of sort by X then by Y where X and Y are dynamic I don't know of a nice solution in SQL only in MSSQL 2012 or MySQL 5.
I think you are playing into the comment, though. You'd be surprised how far you can go with a table that doesn't support sorting and filtering. Basic pagination can often be foregone, if you can instead resort to offering export and let them use whatever software they are used to for this purpose. (Pagination is also just easier if you forego custom sort and filtering.)
At some point, yes, you will want these things. But the basics can get you really far.
For sure you can offer a good approximation through SQL but I'm unlucky enough that I've needed to support dynamic sorting and query builders against tables at least 4 times for different datasets in the past years.
I'm particularly proud of the last one I built that still absolutely flies despite doing wildcard querying on text fields (no full text index) with multiple combined queries and a dynamically computed status from aggregate child properties in MySQL 5 with its lack of window functions. The solution was basically dynamically construct an efficient as possible inner query before applying the joins.
To clarify I believe in doing as much as possible in raw SQL and have a fondness for the stored procedure driven system I worked with but this problem remains unsolvable in SQL alone and I just keep encountering it.
For example let user decide which columns to select, in which order to fetch the rows, which table to query etc. Or not even let the user decide but decide based on some config file or the system state.
You end up with a bunch of string manipulation that is fragile and does not compose well.
What solution is there except from simulating SQL via your languages nestable data structures?