I reduced GraphQL codebase size by 40% and increased type coverage to 90%+

Using code generation to create data loaders for all database resources

5 min readNov 15, 2017

Developing APIs has never been easy. Aside from the API design and documentation, simply keeping API up to date with the database schema is a challenge.

GraphQL solves part of the problem. To a large extent, GraphQL APIs are self-documenting. I will even go as far as to say that GraphQL makes API design a lot more simple. For one, you don’t need to concern yourself with the shape of the response – request defines the required fields. Furthermore, ability to deprecate individual fields, inspect performance impact of specific fields and observe how your API is being used allows rapid development.

A single database change can trickle dozens of changes in the API codebase. Using abstractions such as data loaders and strict types helps — as long as strict-type declarations are up to date, a type linter will guide you through all the breaking changes. However, keeping data loader queries and strict type declarations up to date is time consuming and error-prone.

Margaret Hamilton (1.52 m) poses with a print of an avg. codebase Flow type declarations.

Generating Flow types of PostgreSQL database resources

It does not make sense to write Flow type declarations by hand for database resources because you are duplicating information that is already in the database schema.

CREATE TABLE reservation_confirmation_email_task (
    id integer NOT NULL,
    reservation_id integer NOT NULL,
    failed_attempt_count integer DEFAULT 0 NOT NULL,
    started_at timestamp with time zone,
    ended_at timestamp with time zone,
    created_at timestamp with time zone DEFAULT now() NOT NULL
);

Database schema tells you the field names and their types – all you need is a tool that can read the database and generate Flow type declarations. In this particular case, the Flow type equivalent is:

type ReservationConfirmationEmailTaskRecordType = {|
  +id: number,
  +reservationId: number,
  +failedAttemptCount: number,
  +startedAt: number | null,
  +endedAt: number | null,
  +createdAt: number
|};

ReservationConfirmationEmailTaskRecordType has been generated using PostLoader.

All thats left to do is reference the type records across the codebase. Then you can sleep safe knowing that your codebase and database are in sync.

This is great – automating type generation removes massive amount of code that would otherwise need to be typed by hand (this is where the 40% figure comes from). Similarly, code generation prevents human errors such as wrongly describing a database resource. However, you still need to load the data from the database.

Loading data from the database

Once you have GraphQL setup with mock data and type definition, you need to create the actual data loaders.

Continuing with the reservation_confirmation_email_task example, it could be as simple as:

// @flowimport {
  sql
} from 'slonik';
import type {
  DatabaseConnectionType
} from 'slonik';const getReservationConfirmationEmailTasksByIds = (
  connection: DatabaseConnectionType
): Promise<$ReadOnlyArray<ReservationConfirmationEmailTaskRecordType>> => {
  return connection.any(sql`
    SELECT
      id,
      reservation_id "reservationId",
      failed_attempt_count "failedAttemptCount",
      started_at "startedAt",
      ended_at "endedAt",
      created_at "createdAt"
    FROM reservation_confirmation_email_task
    WHERE id IS IN ${id}
  `);
};

You would then construct a DataLoader:

const ReservationConfirmationEmailTaskLoader: DataLoader<number, ReservationConfirmationEmailTaskRecordType> = new DataLoader(getReservationConfirmationEmailTasksByIds.bind(connection));

(In case you are not familiar with DataLoader, it is a must for a GraphQL codebase.)

Good. This is a perfect setup. Every resource is loaded using a data loader that batches the requests. Resources have strict type association. Strict types are generated using database schema. Except — This is a lot of typing.

Generating data loaders of PostgreSQL database resources

Clearly, data loading code is verbose and suffers from the same issue of a potential human error as the earlier strict type example. This is why I have created PostLoader. PostLoader generates strict type declarations and all the boilerplate data loader code for every resource in the database.

Using PostLoader is simple:

export POSTLOADER_DATABASE_CONNECTION_URI=postgres://postgres:password@127.0.0.1/test
export POSTLOADER_COLUMN_FILTER="return /* exclude tables that have a _view */ !columns.map(column => column.tableName).includes(tableName + '_view')"
export POSTLOADER_TABLE_NAME_MAPPER="return tableName.endsWith('_view') ? tableName.slice(0, -5) : tableName;"

postloader generate-loaders > ./PostLoader.js

This will generate code equivalent to:

// @flowimport {
  getByIds,
  getByIdsUsingJoiningTable
} from 'postloader';
import DataLoader from 'dataloader';
import type {
  DatabaseConnectionType
} from 'slonik';export type UserRecordType = {|
  +id: number,
  +email: string,
  +givenName: string | null,
  +familyName: string | null,
  +password: string,
  +createdAt: string,
  +updatedAt: string | null,
  +pseudonym: string
|};// [..]export type LoadersType = {|
  +UserByIdLoader: DataLoader<number, UserRecordType>,
  +UsersByAffiliateIdLoader: DataLoader<number, $ReadOnlyArray<UserRecordType>>,
  // [..]
|};// [..]export const createLoaders = (connection: DatabaseConnectionType, NotFoundError: Error) => {
  const UserByIdLoader = new DataLoader((ids) => {
    return getByIds(connection, 'user', ids, 'id', '"id", "email", "given_name" "givenName", "family_name" "familyName", "password", "created_at" "createdAt", "updated_at" "updatedAt", "pseudonym"', false, NotFoundError);
  });
  const UsersByAffiliateIdLoader = new DataLoader((ids) => {
    return getByIdsUsingJoiningTable(connection, 'affiliate_user', 'user', 'user', 'affiliate', 'r2."id", r2."email", r2."given_name" "givenName", r2."family_name" "familyName", r2."password", r2."created_at" "createdAt", r2."updated_at" "updatedAt", r2."pseudonym"', ids);
  });// [..]return {
    UserByIdLoader,
    UsersByAffiliateIdLoader,
    // [..]
  };
};

Consuming the code is as simple. Simply construct the loaders collections and use then as you would. Refer to the usage examples.

The generated files

Okay, I lied — I said that I decreased codebase by 40% (I did remove 40%+ of hand-typed code from the GO2CINEMA codebase). Actually, the codebase size increased by almost 5000 LoC. It doesn’t really matter, because you don’t need to maintain this code by hand. Next time you make a change in the database, simply run PostLoader to generate new code.

Meanwhile, increased strict type coverage allowed to discover several not so small bugs that would have otherwise caused a turmoil.

I wonder how many people got upset because I have misattributed Margaret Hamilton’s work. I am sorry.

The example in article is using a simple PK lookup. This covers most of the use cases, at least when it comes to a GraphQL database and related resource resolution. However, PostLoader is capable of generating a lot more complex loaders, e.g. using joining tables and other columns with unique constraints. Refer to the behaviour documentation.

What makes this different from using an ORM?

You could have gotten away with using an ORM to create loaders. However, ORM approach has multiple important downsides:

ORM is not going to give you strict types and code completion.
ORM has runtime overhead for constructing the queries and formatting the results.
You would still need to type all the boilerplate code to create data loaders.

From practical perspective, inspective issues in generated codebase is a lot simpler than debugging ORM behaviour.

To sum up

Keeping database and codebase in sync is hard. Whenever changes are done to the database schema, these changes need to be reflected in the codebase’s type declarations.

Most of the loaders are needed to perform simple PK look ups, e.g. UserByIdLoader. Writing this logic for every table is a mundane task.

PostLoader solves both of these problems by:

Creating type declarations for all database tables.
Creating loaders for the most common lookups.

Would you like to work with me?

I am hiring multiple, full-time remote developers to work with me on GO2CINEMA. A lot of the work we are doing is open-source and we are working with all the cutting-edge tech. Drop me an email at gajus@applaudience.com if you would like to chat.