I reduced GraphQL codebase size by 40% and increased type coverage to 90%+

Using code generation to create data loaders for all database resources

Developing APIs has never been easy. Aside from the API design and documentation, simply keeping API up to date with the database schema is a challenge.

GraphQL solves part of the problem. To a large extent, GraphQL APIs are self-documenting. I will even go as far as to say that GraphQL makes API design a lot more simple. For one, you don’t need to concern yourself with the shape of the response – request defines the required fields. Furthermore, ability to deprecate individual fields, inspect performance impact of specific fields and observe how your API is being used allows rapid development.

A single database change can trickle dozens of changes in the API codebase. Using abstractions such as data loaders and strict types helps — as long as strict-type declarations are up to date, a type linter will guide you through all the breaking changes. However, keeping data loader queries and strict type declarations up to date is time consuming and error-prone.

Image for post
Image for post
Margaret Hamilton (1.52 m) poses with a print of an avg. codebase Flow type declarations.

Generating Flow types of PostgreSQL database resources

CREATE TABLE reservation_confirmation_email_task (
id integer NOT NULL,
reservation_id integer NOT NULL,
failed_attempt_count integer DEFAULT 0 NOT NULL,
started_at timestamp with time zone,
ended_at timestamp with time zone,
created_at timestamp with time zone DEFAULT now() NOT NULL
);

Database schema tells you the field names and their types – all you need is a tool that can read the database and generate Flow type declarations. In this particular case, the Flow type equivalent is:

type ReservationConfirmationEmailTaskRecordType = {|
+id: number,
+reservationId: number,
+failedAttemptCount: number,
+startedAt: number | null,
+endedAt: number | null,
+createdAt: number
|};

ReservationConfirmationEmailTaskRecordType has been generated using PostLoader.

All thats left to do is reference the type records across the codebase. Then you can sleep safe knowing that your codebase and database are in sync.

This is great – automating type generation removes massive amount of code that would otherwise need to be typed by hand (this is where the 40% figure comes from). Similarly, code generation prevents human errors such as wrongly describing a database resource. However, you still need to load the data from the database.

Loading data from the database

Continuing with the reservation_confirmation_email_task example, it could be as simple as:

// @flowimport {
sql
} from 'slonik';
import type {
DatabaseConnectionType
} from 'slonik';
const getReservationConfirmationEmailTasksByIds = (
connection: DatabaseConnectionType
): Promise<$ReadOnlyArray<ReservationConfirmationEmailTaskRecordType>> => {
return connection.any(sql`
SELECT
id,
reservation_id "reservationId",
failed_attempt_count "failedAttemptCount",
started_at "startedAt",
ended_at "endedAt",
created_at "createdAt"
FROM reservation_confirmation_email_task
WHERE id IS IN ${id}
`);
};

You would then construct a DataLoader:

const ReservationConfirmationEmailTaskLoader: DataLoader<number, ReservationConfirmationEmailTaskRecordType> = new DataLoader(getReservationConfirmationEmailTasksByIds.bind(connection));

(In case you are not familiar with DataLoader, it is a must for a GraphQL codebase.)

Good. This is a perfect setup. Every resource is loaded using a data loader that batches the requests. Resources have strict type association. Strict types are generated using database schema. Except — This is a lot of typing.

Generating data loaders of PostgreSQL database resources

Using PostLoader is simple:

export POSTLOADER_DATABASE_CONNECTION_URI=postgres://postgres:password@127.0.0.1/test
export POSTLOADER_COLUMN_FILTER="return /* exclude tables that have a _view */ !columns.map(column => column.tableName).includes(tableName + '_view')"
export POSTLOADER_TABLE_NAME_MAPPER="return tableName.endsWith('_view') ? tableName.slice(0, -5) : tableName;"

postloader generate-loaders > ./PostLoader.js

This will generate code equivalent to:

// @flowimport {
getByIds,
getByIdsUsingJoiningTable
} from 'postloader';
import DataLoader from 'dataloader';
import type {
DatabaseConnectionType
} from 'slonik';
export type UserRecordType = {|
+id: number,
+email: string,
+givenName: string | null,
+familyName: string | null,
+password: string,
+createdAt: string,
+updatedAt: string | null,
+pseudonym: string
|};
// [..]export type LoadersType = {|
+UserByIdLoader: DataLoader<number, UserRecordType>,
+UsersByAffiliateIdLoader: DataLoader<number, $ReadOnlyArray<UserRecordType>>,
// [..]
|};
// [..]export const createLoaders = (connection: DatabaseConnectionType, NotFoundError: Error) => {
const UserByIdLoader = new DataLoader((ids) => {
return getByIds(connection, 'user', ids, 'id', '"id", "email", "given_name" "givenName", "family_name" "familyName", "password", "created_at" "createdAt", "updated_at" "updatedAt", "pseudonym"', false, NotFoundError);
});
const UsersByAffiliateIdLoader = new DataLoader((ids) => {
return getByIdsUsingJoiningTable(connection, 'affiliate_user', 'user', 'user', 'affiliate', 'r2."id", r2."email", r2."given_name" "givenName", r2."family_name" "familyName", r2."password", r2."created_at" "createdAt", r2."updated_at" "updatedAt", r2."pseudonym"', ids);
});
// [..]return {
UserByIdLoader,
UsersByAffiliateIdLoader,
// [..]
};
};

Consuming the code is as simple. Simply construct the loaders collections and use then as you would. Refer to the usage examples.

The generated files

Meanwhile, increased strict type coverage allowed to discover several not so small bugs that would have otherwise caused a turmoil.

Image for post
Image for post
I wonder how many people got upset because I have misattributed Margaret Hamilton’s work. I am sorry.

The example in article is using a simple PK lookup. This covers most of the use cases, at least when it comes to a GraphQL database and related resource resolution. However, PostLoader is capable of generating a lot more complex loaders, e.g. using joining tables and other columns with unique constraints. Refer to the behaviour documentation.

What makes this different from using an ORM?

  1. ORM is not going to give you strict types and code completion.
  2. ORM has runtime overhead for constructing the queries and formatting the results.
  3. You would still need to type all the boilerplate code to create data loaders.

From practical perspective, inspective issues in generated codebase is a lot simpler than debugging ORM behaviour.

To sum up

Most of the loaders are needed to perform simple PK look ups, e.g. UserByIdLoader. Writing this logic for every table is a mundane task.

PostLoader solves both of these problems by:

  1. Creating type declarations for all database tables.
  2. Creating loaders for the most common lookups.

Would you like to work with me?

Written by

Software architect, startup adviser. Editor of https://medium.com/applaudience. Founder of https://go2cinema.com.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store