A pattern for constructing SQL queries in your code

Posted on Aug 16, 2020

SQL is a powerful language for querying information from a database. Anything that is powerful can also be dangerous. By now, SQL Injections are a well-known vulnerability type, but the lure of just doing something “quick-and-dirty” is always there.

In this post, I want to share a coding pattern I found very useful for constructing complex SQL queries, which are still safe from SQLi, while keeping your application code simple and readable. All the examples below are in Python, but the pattern is applicable to any other programming language that allows you to permute strings.

Note: The db.query() function below is a fictional API for preparing the SQL query (first argument) executing the prepared query with the parameters (second argument) and fetching all resulting rows.

Let’s start with a basic SQL query:

1
2
3
4
SELECT id,
       nickname
FROM   users
WHERE  last_seen >= current_timestamp - interval '1 day'

On a PostgreSQL server, this will give us the IDs and nicknames of all users that were logged in during the last 24 hours.

In our application code, we might wrap it up in a function like this, which can dynamically set the activity interval (in Python):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
def get_active_users(within_limit):
    query = """
    SELECT id,
           nickname
    FROM   users
    WHERE  last_seen >= %s
    """
    ts = date.today() - timedelta(days=within_limit)
    results = db.query(query, params=(ts,))
    return results

While our application grows, we want to add more WHERE clauses over time. I have seen different approaches to this in a few codebases, but the one you should never use is directly injecting the parameters into the SQL. While it is “cheap” and “easy” to do so in any programming language, you must always use prepared statements (Obligatory XKCD #327).

Yes, also when you only have internal consumers of your API. In the end, you never know how your code might be reused by someone else (and which assumptions that someone else makes), so be on the safe side and always assume malicious input (Defensive programming).

But, you can still dynamically generate the SQL statements. For that, I like using the following pattern (works in any language, not just Python):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
def get_active_users(within_limit):
    query = """
    SELECT id,
           nickname
    FROM   users
    WHERE  TRUE
    """
    params = []

    if within_limit:
        query += ' AND last_seen >= %s'
        params.append(date.today() - timedelta(days=within_limit))

    results = db.query(query, params)
    return results

First, we specify the base query with a WHERE TRUE (which doesn’t do anything, a no-op). Then, if necessary, we append necessary queries to the query string and append to the list of query parameters, neatly contained in an if clause.

What I like the most about this code is that it can easily be extended and modified by the next person going through the code.

To see how that works in practice, let’s add more arguments to the function:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
def get_active_users(within_limit, min_posts):
    query = """
    SELECT id,
           nickname
    FROM   users
    WHERE  TRUE
    """
    params = []

    if within_limit:
        query += ' AND last_seen >= %s'
        params.append(date.today() - timedelta(days=within_limit))

    if min_posts:
        query += ' AND posts >= %s'
        params.append(int(min_posts))

    results = db.query(query, params)
    return results

Now we can query active users either by 1) their last log in or 2) the minimum number of posts or 3) both!

The pattern also works when you need to add multiple parameters to one query:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
def get_active_users(within_limit, min_posts, signup_year):
    query = """
    SELECT id,
           nickname
    FROM   users
    WHERE  TRUE
    """
    params = []

    if within_limit:
        query += ' AND last_seen >= %s'
        params.append(date.today() - timedelta(days=within_limit))

    if min_posts:
        query += ' AND posts >= %s'
        params.append(int(min_posts))

    if signup_year:
        query += ' AND signup_date >= %s AND signup_date <= %s'
        params.append(date(signup_year, 1, 1)) # first day of year
        params.append(date(signup_year, 12, 31)) # last day of year

    results = db.query(query, params)
    return results

With this option we can query users that signed up in a particular year. I think by now you got the logic. At some point, you might have too many arguments in your function signature which makes the lines very long and either you or linting tool will complain about it. To fix that, we can pass all of these optional parameters as a filter object (or hash or dictionary, depending on your programming language):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
def get_active_users(filter_params):
    query = """
    SELECT id,
           nickname
    FROM   users
    WHERE  TRUE
    """
    params = []

    if filter_params.get('within_limit'):
        query += ' AND last_seen >= %s'
        params.append(date.today() - timedelta(days=filter_params.get('within_limit')))

    if filter_params.get('min_posts'):
        query += ' AND posts >= %s'
        params.append(int(filter_params.get('min_posts')))

    if filter_params.get('signup_year'):
        query += ' AND signup_date >= %s AND signup_date <= %s'
        params.append(date(filter_params.get('signup_year'), 1, 1)) # first day of year
        params.append(date(filter_params.get('signup_year'), 12, 31)) # last day of year

    results = db.query(query, params)
    return results

This also has the advantage that when you add a new filter parameter, you don’t need to change the function signature (if you are using a statically typed language, you might need to edit the interface of the filter parameters, though). It also keeps the VCS diff pretty clean.

Finally, let’s say we need a bit more flexibility and complexity in our SQL query, instead of just appending WHERE clauses. With this pattern, we can use format strings, but without injecting values - just parameters.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
def get_active_users(filter_params, fields, limit=100):
    query_template = """
    SELECT id,
           nickname
    FROM   users
    {WHERE_QUERY}
    {LIMIT_QUERY}
    """
    params = []

    where_query = 'WHERE TRUE'
    if filter_params.get('within_limit'):
        where_query += ' AND last_seen >= %s'
        params.append(date.today() - timedelta(days=filter_params.get('within_limit')))

    if filter_params.get('min_posts'):
        where_query += ' AND posts >= %s'
        params.append(int(filter_params.get('min_posts')))

    if filter_params.get('signup_year'):
        where_query += ' AND signup_date >= %s AND signup_date <= %s'
        params.append(date(filter_params.get('signup_year'), 1, 1)) # first day of year
        params.append(date(filter_params.get('signup_year'), 12, 31)) # last day of year

    limit_query = 'ORDER BY signup_date ASC LIMIT %s'
    params.append(limit)

    query = query_template.format(
        WHERE_QUERY=where_query,
        LIMIT_QUERY=limit_query,
    )
    results = db.query(query, params)
    return results

The caller of our function can now also specify the maximum amount of items that should be returned (there is no need for an if clause here because the limit function argument has a default value set). This functionality can be extended into pagination.

As the code snippets above show, the complexity of this approach stays low despite adding more and more parameters. The code is readable and maintainable, not just executable by a machine. The next person will thank you for using this pattern to construct your SQL queries - and the next person might even be yourself when you re-visit the code a few month later and need to add another query parameter.

Happy querying!