You Probably Don't Need Query Builders
In a previous post [0] I discussed why I ditched sea-query: plain SQL almost always beats re-learning a new abstraction from scratch, one that will inevitably slow you down or simply stop working in the long run.
From time to time, I still stumble upon Stack Overflow questions like this one [1].
OP is asking for a way to dynamically build a query based on a search filter declared like the following
struct Search {
    id: i64,
    username: Option<String>,
    min_age: Option<i8>,
    max_age: Option<i8>,
}
You may agree that this is a very common scenario in pretty much every API out there. You have an endpoint that returns some items, and you want to expose a way to filter them so that the client can ask only for the things it's interested in.
What's even more interesting is that all the answers to OP involve some complex way to build the query, with `if` and `else` branches that, at least in this case, are most likely unnecessary.
One of the answers suggests the following implementation
fn search_query(search: Search) -> String {
    let mut query = QueryBuilder::new("SELECT * FROM users WHERE id = ");
    query.push_bind(search.id);
    if let Some(username) = search.username {
        query.push(" AND username = ");
        query.push_bind(username);
    }
    if let Some(min_age) = search.min_age {
        query.push(" AND age > ");
        query.push_bind(min_age);
    }
    if let Some(max_age) = search.max_age {
        query.push(" AND age < ");
        query.push_bind(max_age);
    }
    query.build().sql().into()
}
The above solution works just fine, sure - the fact that the field `id` is always present in the `Search` filter simplifies things a bit. But what if `id` were an `Option<i64>`? If you want to keep using the query builder, you need to introduce another `if` statement checking whether the `id` is present or not
fn search_query(search: Search) -> String {
    let mut query = QueryBuilder::new("SELECT * FROM users");
    if let Some(id) = search.id {
        query.push(" AND id = ");
        query.push_bind(id);
    }
    if let Some(username) = search.username {
        query.push(" AND username = ");
        query.push_bind(username);
    }
    if let Some(min_age) = search.min_age {
        query.push(" AND age > ");
        query.push_bind(min_age);
    }
    if let Some(max_age) = search.max_age {
        query.push(" AND age < ");
        query.push_bind(max_age);
    }
    query.build().sql().into()
}
Mmmh, the query seems to be broken now!
If every field is `None` the query works just fine, but as soon as one of them is `Some(_)` the query breaks, because it translates to something like `SELECT * FROM users AND username = 'testname'` - there is no `WHERE` keyword anymore. A quick fix is adding a `WHERE 1 = 1` clause at the beginning of the query
fn search_query(search: Search) -> String {
    let mut query = QueryBuilder::new("SELECT * FROM users WHERE 1 = 1");
    if let Some(id) = search.id {
        query.push(" AND id = ");
        query.push_bind(id);
    }
    if let Some(username) = search.username {
        query.push(" AND username = ");
        query.push_bind(username);
    }
    if let Some(min_age) = search.min_age {
        query.push(" AND age > ");
        query.push_bind(min_age);
    }
    if let Some(max_age) = search.max_age {
        query.push(" AND age < ");
        query.push_bind(max_age);
    }
    query.build().sql().into()
}
Ahhh, it's working again now!
Yep, that was an easy fix, but we can do even better.
The `sqlx` crate handles `Option<T>` types for you: if the value is `Some(_)` the inner value is used in the binding, otherwise sqlx binds `NULL` for `None`. With that in mind, we can use SQL itself to only apply a `WHERE` clause when the bound value `IS NOT NULL`.
The function above becomes
fn search_query() -> String {
    r"SELECT * FROM users
    WHERE id = $1
      AND ($2 IS NULL OR username = $2)
      AND ($3 IS NULL OR age > $3)
      AND ($4 IS NULL OR age < $4)"
        .into()
}
Note that you need the `IS NULL` check to avoid applying the filter when the value is `None`.
This approach does not let you push bindings one by one like the previous method, but you don't actually need that here: you can bind all the values at once later. Let's query the data directly
async fn search_query(search: Search, pg: &PgPool) {
    sqlx::query(
        r"SELECT * FROM users
        WHERE id = $1
          AND ($2 IS NULL OR username = $2)
          AND ($3 IS NULL OR age > $3)
          AND ($4 IS NULL OR age < $4)",
    )
    .bind(search.id)
    .bind(search.username)
    .bind(search.min_age)
    .bind(search.max_age)
    .fetch_all(pg)
    .await
    .expect("failed querying users");
}
This is what I usually prefer: it looks nicer and I don't have to write extra Rust logic that I'd then have to test. The dynamically built query shown before can expand to 16 different queries; with plain SQL you only ever have one.
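To see where the 16 comes from: with four optional filters, each independently present or absent, the builder can emit 2^4 distinct SQL strings. A stdlib-only sketch that enumerates them (the `$n` placeholder text is illustrative, not real sqlx output):

```rust
use std::collections::HashSet;

/// Count the distinct SQL strings the dynamic builder can emit
/// when each of the four filters is independently present or absent.
fn distinct_query_shapes() -> usize {
    let clauses = [
        " AND id = $n",
        " AND username = $n",
        " AND age > $n",
        " AND age < $n",
    ];
    let mut shapes = HashSet::new();
    // Each bit of the mask decides whether one filter is present.
    for mask in 0u8..16 {
        let mut sql = String::from("SELECT * FROM users WHERE 1 = 1");
        for (i, clause) in clauses.iter().enumerate() {
            if mask & (1 << i) != 0 {
                sql.push_str(clause);
            }
        }
        shapes.insert(sql);
    }
    shapes.len()
}

fn main() {
    println!("{} distinct queries", distinct_query_shapes());
}
```

Every one of those shapes is a code path you'd ideally want covered by a test; the single static query sidesteps all of them.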
Another reason I prefer writing queries this way is that I can copy and paste the statement into DataGrip and test it directly against the database, mimicking what `sqlx` will end up doing.
In the following examples I am using Postgres syntax; it may differ for other engines like SQLite or MySQL.
By now you should have a better idea of how to lean on SQL to reduce the Rust logic involved in your queries, but I'd like to give some other common examples and functions you can work with.
A common type I encounter pretty frequently is a `Vec<T>`. Most of the time I do not want to filter at all if `vec.is_empty()`. To make this a bit more complicated, let's consider the scenario where I have an `Option<Vec<T>>` and I only want to apply the filter if the vector is present and non-empty.
async fn filter(nicknames: Option<Vec<i64>>, pg: &PgPool) {
    sqlx::query(
        r"SELECT *
        FROM users
        WHERE 1 = 1
          AND ($1 IS NULL
            OR CARDINALITY($1::integer[]) = 0
            OR nickname = ANY($1)
          )",
    )
    .bind(nicknames)
    .fetch_all(pg)
    .await
    .unwrap();
}
Let's break it down:
- `$1 IS NULL` is satisfied if `nicknames.is_none()` and won't apply the filter
- `CARDINALITY($1::integer[]) = 0` stops the filtering if the vector is `Some(_)` but empty, i.e. `nicknames.unwrap().len() == 0`
- finally, if the vector is present and non-empty, `nickname = ANY($1)` keeps all the users whose `nickname` appears in the list
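That three-way predicate mirrors what you would otherwise write in Rust. A stdlib-only sketch of the same logic, with the function name and values purely illustrative:

```rust
/// Mirrors the SQL predicate: `None` or an empty vector means
/// "don't filter at all"; otherwise keep only matching rows.
fn keeps_row(filter: &Option<Vec<i64>>, nickname: i64) -> bool {
    match filter {
        None => true,                     // $1 IS NULL
        Some(v) if v.is_empty() => true,  // CARDINALITY($1) = 0
        Some(v) => v.contains(&nickname), // nickname matches the list
    }
}

fn main() {
    assert!(keeps_row(&None, 7));            // no filter provided
    assert!(keeps_row(&Some(vec![]), 7));    // empty filter, keep everything
    assert!(keeps_row(&Some(vec![7, 8]), 7));
    assert!(!keeps_row(&Some(vec![8]), 7));
    println!("ok");
}
```

The point of the SQL version is that this branching never has to exist in your codebase at all.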
Let's move on to another similar scenario: this time you have a vector represented as a string of comma-separated values. You may have an endpoint that accepts a query parameter with multiple values separated by a comma (e.g. `?ids=11,22,33,44`). My naive past self used to write a fancy custom deserializer function that turned `11,22,33,44` from a `String` into a `Vec<i64>` - useless work that could have easily been handled by the database.
async fn filter(ids: String, pg: &PgPool) {
    sqlx::query(
        r"SELECT *
        FROM users
        WHERE id = ANY(ARRAY_REMOVE(STRING_TO_ARRAY($1, ','), '')::integer[])",
    )
    .bind(ids)
    .fetch_all(pg)
    .await
    .unwrap();
}
The syntax above works for Postgres but may differ for other database servers; such a basic feature should be available on most of them.
`ARRAY_REMOVE(STRING_TO_ARRAY($1, ','), '')` builds an array of ids by splitting the comma-separated values, and it also removes empty entries in case someone decides to mess with your backend and passes `?ids=11,,,`.
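For comparison, this is roughly the Rust-side parsing that `STRING_TO_ARRAY` replaces - a stdlib-only sketch of what my past self wrote by hand:

```rust
/// Split on ',', drop empty entries, parse each id.
/// (Silently skips non-numeric entries; a real deserializer
/// would probably want to error on those instead.)
fn parse_ids(raw: &str) -> Vec<i64> {
    raw.split(',')
        .filter(|s| !s.is_empty())
        .filter_map(|s| s.parse().ok())
        .collect()
}

fn main() {
    assert_eq!(parse_ids("11,22,33,44"), vec![11, 22, 33, 44]);
    assert_eq!(parse_ids("11,,,"), vec![11]); // garbage input handled
    println!("ok");
}
```

Not much code, but it's code you have to test and maintain, and the database can already do it.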
The next feature I’d like to explore is probably the de-facto API must-have: pagination. Pagination basically lets your client say "give me page 2 with a maximum of 10 items in it". You can model that filter with the following struct
struct Filter {
    /// current page
    pub page: Option<i64>,
    /// number of items per page
    pub limit_per_page: Option<i64>,
}
I've marked them as `Option<i64>` because clients are rude and want you to do the guessing. Most of them won't ask for a specific page, but they really want the first one. Also, if the client is lazy and does not provide a `limit_per_page`, you should fall back to a sensible default so you don't overload your Postgres instance.
Pagination is tricky because a user could pass `?page=-100`, and negative pages do not exist. You could avoid that by using an `Option<u64>`, but then you'd have to cast the value to an `i64` whenever you want to bind it to a query (in Postgres at least). To make things worse, a hacker could request `limit_per_page=1000000000` to make your server crash and wake you up while you're sleeping because your app is dead.
My less SQL-centric self would have implemented a validation on the filter to check its correctness
impl Filter {
    fn validate(&self) -> Result<(), String> {
        match self.page {
            Some(page) if page < 0 => {
                return Err("page can't be negative".into());
            }
            _ => {}
        }
        match self.limit_per_page {
            Some(limit) if !(10..=100).contains(&limit) => {
                return Err("limit must be between 10 and 100".into());
            }
            _ => {}
        }
        Ok(())
    }
}
But guess what… SQL has the solution for you
async fn filter(filter: Filter, pg: &PgPool) {
    sqlx::query(
        r"SELECT *
        FROM users
        LIMIT
            CASE WHEN $2 BETWEEN 1 AND 100 THEN $2 ELSE 50 END
        OFFSET
            CASE
                WHEN $2 BETWEEN 1 AND 100 AND $1 > 0
                THEN (($1 - 1) * $2)
                ELSE 0
            END",
    )
    // if page is not provided, fall back to the first page
    .bind(filter.page.unwrap_or(1))
    // if limit is not provided, fall back to 50
    .bind(filter.limit_per_page.unwrap_or(50))
    .fetch_all(pg)
    .await
    .unwrap();
}
The `LIMIT` is applied as passed only when the value is between 1 and 100; anything else (including the hacker's billion) falls back to 50. The `OFFSET` is computed from the page only when the limit is valid and the `page` is positive; in all other cases it falls back to `OFFSET 0`, i.e. the first page.
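If you'd rather normalize on the Rust side before binding, clamping is simpler than erroring out. A stdlib-only sketch using `i64::clamp`, with the `Filter` struct redeclared so the snippet is self-contained and the bounds purely illustrative:

```rust
struct Filter {
    page: Option<i64>,
    limit_per_page: Option<i64>,
}

impl Filter {
    /// Normalize instead of rejecting: page >= 1, limit within [10, 100].
    fn normalized(&self) -> (i64, i64) {
        let page = self.page.unwrap_or(1).max(1);
        let limit = self.limit_per_page.unwrap_or(50).clamp(10, 100);
        (page, limit)
    }
}

fn main() {
    // A hostile client gets tamed instead of a 400 response.
    let f = Filter { page: Some(-100), limit_per_page: Some(1_000_000_000) };
    assert_eq!(f.normalized(), (1, 100));
    // A lazy client gets sensible defaults.
    let f = Filter { page: None, limit_per_page: None };
    assert_eq!(f.normalized(), (1, 50));
    println!("ok");
}
```

Either way works; the SQL version just keeps the policy next to the query that enforces it.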
Lastly, I'd like to cover an `UPDATE` statement, which I feel is also quite common in APIs out there. Let's say we have an `UpdateForm` struct that models our HTML form.
struct UpdateForm {
    id: Uuid,
    name: Option<String>,
    surname: Option<String>,
}
This form takes the user's unique id and, optionally, their name and surname. I want to provide a way for my client to only update the values it passes to my backend as `Some(_)`. Seems like we can do just that with `COALESCE`
async fn update_user(UpdateForm { id, name, surname }: UpdateForm, pg: &PgPool) {
    sqlx::query(
        r"UPDATE users
        SET name = COALESCE($2, name),
            surname = COALESCE($3, surname)
        WHERE id = $1",
    )
    .bind(id)
    .bind(name)
    .bind(surname)
    .execute(pg)
    .await
    .unwrap();
}
The query above uses `COALESCE` to only update the values that are `.is_some()`; the others keep their current value and no change takes place. Neat! Be mindful: `COALESCE` returns the first non-NULL value among its arguments, so order matters!
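On the Rust side, `SET name = COALESCE($2, name)` behaves like `Option::unwrap_or`: take the new value if present, otherwise keep the stored one. A tiny sketch (function name illustrative):

```rust
/// What `SET name = COALESCE($2, name)` does, expressed in Rust:
/// the bound value wins when it is Some, the stored value survives otherwise.
fn coalesce(new: Option<&str>, current: &str) -> String {
    new.unwrap_or(current).to_string()
}

fn main() {
    assert_eq!(coalesce(Some("Alice"), "Bob"), "Alice"); // value updated
    assert_eq!(coalesce(None, "Bob"), "Bob");            // no change
    println!("ok");
}
```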
As you can see, we got a lot done with SQL alone. You can get smart with it and avoid writing over-complicated Rust logic around your queries - isn't that a neater and better solution overall?