Hacking Medium to create a self-hosted blogging platform

A free-rider’s guide to using Medium editor to create content for hosting on a personal website

4 min read · Sep 5, 2017

Here is the thing: the Medium editor is awesome. There is medium-editor (an open-source clone of the Medium editor), but it does not have the same feel. I just 💖 the Medium editor.

Therefore, when it was time to start working on the GO2CINEMA blog, the first question I asked myself was: can I use Medium? The answer is no. Unfortunately, Medium does not give enough flexibility. For example, I cannot configure which links use rel=nofollow, and I cannot use Medium under a custom path within a domain (e.g. https://go2cinema.com/stories/).

However, I can use the Medium editor. You see, not only does the Medium editor look great, it also produces a well-organised abstraction of the content.

There is more than one way to skin Medium.

Take this article as an example. You can get a JSON version of this article by appending ?format=json to the end of the URL, e.g. https://medium.com/@gajus/hacking-medium-to-create-a-self-hosted-blogging-platform-fd04fe24c752?format=json

The body of the article is described using paragraphs. Each paragraph has an ID and metadata describing images and text styles.
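As a rough sketch of reading such a response in Python: the endpoint is undocumented, so everything about its shape is an assumption here. In my experience the body starts with a non-JSON guard prefix (`])}while(1);</x>`) that has to be dropped before parsing, and the paragraphs sit under `payload.value.content.bodyModel.paragraphs`. The sample below is hand-written for illustration, not a real Medium response, and `parse_medium_response` is a hypothetical helper name.

```python
import json


def parse_medium_response(raw: str) -> list[dict]:
    """Return the paragraph dicts from a `?format=json` response body."""
    # Skip anything before the first "{" so the guard prefix is discarded.
    json_start = raw.index("{")
    document = json.loads(raw[json_start:])
    # Assumed path to the paragraphs; Medium may change it at any time.
    return document["payload"]["value"]["content"]["bodyModel"]["paragraphs"]


# Hand-written sample in the assumed shape (all values are made up).
SAMPLE = "])}while(1);</x>" + json.dumps(
    {
        "payload": {
            "value": {
                "content": {
                    "bodyModel": {
                        "paragraphs": [
                            {
                                "name": "a1b2",  # the paragraph ID
                                "type": 1,
                                "text": "Medium editor is awesome.",
                                "markups": [{"type": 1, "start": 0, "end": 6}],
                            }
                        ]
                    }
                }
            }
        }
    }
)

paragraphs = parse_medium_response(SAMPLE)
for paragraph in paragraphs:
    print(paragraph["name"], paragraph["text"])
```

In a real service, `raw` would come from an HTTP GET of the article URL with `?format=json` appended.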

Data normalisation

In addition to saving me time developing a CMS, analysing Medium's data structure taught me a couple of things about data normalisation. Namely, I have learned that the article, the paragraph, the article structure (the list of paragraphs at every revision of an article), paragraph metadata and markups need to be separate entities.

This normalisation (having an ID associated with every paragraph) allows versioning of articles at a paragraph level. In addition to versioning, it enables the beloved features of Medium such as content highlighting and comments referencing specific sections of text. Meanwhile, markups are separate entities that reference a paragraph, so paragraphs are stored in the database as plain text without markup. This allows for simpler full-text search across the story database.
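To illustrate why storing markups apart from the text is convenient, here is a minimal sketch that applies `start`/`end` ranges to a plain-text paragraph at render time. The `Markup` class, the use of HTML tag names as the markup type, and `render_paragraph` are my own illustrative choices, not Medium's format. The sketch assumes ranges are non-empty and properly nested (no partial overlaps).

```python
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class Markup:
    start: int  # offset of the first styled character
    end: int    # offset one past the last styled character
    tag: str    # hypothetical: the HTML tag used to render this markup


def render_paragraph(text: str, markups: list[Markup]) -> str:
    """Wrap ranges of plain text in tags; assumes properly nested ranges."""
    indexed = list(enumerate(markups))
    out = []
    for position in range(len(text) + 1):
        # Close inner ranges first: latest start, then latest declared.
        closing = [(i, m) for i, m in indexed if m.end == position]
        for _, m in sorted(closing, key=lambda im: (im[1].start, im[0]), reverse=True):
            out.append(f"</{m.tag}>")
        # Open outer ranges first: latest end, then earliest declared.
        opening = [(i, m) for i, m in indexed if m.start == position]
        for _, m in sorted(opening, key=lambda im: (-im[1].end, im[0])):
            out.append(f"<{m.tag}>")
        if position < len(text):
            out.append(html.escape(text[position]))
    return "".join(out)


text = "Medium editor is awesome"
markups = [Markup(0, 6, "strong"), Markup(17, 24, "em")]

print(render_paragraph(text, markups))
# Search runs against the stored plain text, with no tags to strip first.
print("awesome" in text)
```

Because the stored text never contains tags, a search for "awesome" cannot be broken by a tag boundary falling in the middle of the word.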

Furthermore, defining the article body as a list of “paragraphs” allows for different types of paragraphs: regular text paragraphs, images and other add-ons (e.g. a subscribe-to-newsletter box). Contrast this with the traditional blob of data representing the article body. In the latter case, adding different types of “paragraphs” would require a combination of markdown and XML-like markup.
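One way to picture typed paragraphs is a small set of record types with one renderer per type. The class names, fields and HTML output below are hypothetical and only meant to show the idea; they are not the GO2CINEMA implementation.

```python
import html
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class TextParagraph:
    text: str


@dataclass(frozen=True)
class ImageParagraph:
    src: str
    caption: str


@dataclass(frozen=True)
class NewsletterBox:
    list_id: str  # hypothetical identifier of the mailing list


Paragraph = Union[TextParagraph, ImageParagraph, NewsletterBox]


def render(paragraph: Paragraph) -> str:
    """Render one paragraph according to its type."""
    if isinstance(paragraph, TextParagraph):
        return f"<p>{html.escape(paragraph.text)}</p>"
    if isinstance(paragraph, ImageParagraph):
        src = html.escape(paragraph.src)
        caption = html.escape(paragraph.caption)
        return f'<figure><img src="{src}" alt=""><figcaption>{caption}</figcaption></figure>'
    if isinstance(paragraph, NewsletterBox):
        return f'<aside data-newsletter="{html.escape(paragraph.list_id)}"></aside>'
    raise TypeError(f"Unknown paragraph type: {type(paragraph).__name__}")


body = [
    TextParagraph("Piracy & cinema"),
    ImageParagraph("/images/chart.png", "Revenue over time"),
    NewsletterBox("weekly"),
]
print("\n".join(render(p) for p in body))
```

Adding a new kind of add-on then means adding one type and one branch, rather than inventing new inline syntax inside a text blob.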

Risks

Naturally, as part of assessing the benefits I had to consider the risks. The most obvious risks are that Medium changes its API response shape (likely; low impact), restricts API access altogether (unlikely; high impact) or stops existing altogether (unlikely; high impact).

These are all risks I am willing to accept in the short term. In the long term, I will be building an in-house CMS to manage GO2CINEMA stories, likely using the aforementioned open-source medium-editor.

How does it work?

So what’s the process? You publish the article, get the JSON, unpublish it? May you elaborate?

Federico Zivolo asked me to clarify the process.

Well, it is pretty simple: once I write an article using the Medium editor, I save it as a draft without publishing it. Then I add the article ID to the GO2CINEMA database. The GO2CINEMA article service polls Medium to get the latest version of the article. Every time it detects a new version, it saves a copy as a new revision of the article.
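The detect-and-save step of such a polling service could be sketched as follows. This assumes changes are detected by hashing the fetched paragraphs, which is my illustrative choice (the real service may compare something else, such as a version field in the response). The `fetch` callable, `fingerprint` and `sync_article` are hypothetical names, and the in-memory list stands in for the database.

```python
import hashlib
import json
from typing import Callable


def fingerprint(paragraphs: list[dict]) -> str:
    """Stable hash of the paragraph list, used to detect changes."""
    canonical = json.dumps(paragraphs, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sync_article(revisions: list[dict], fetch: Callable[[], list[dict]]) -> bool:
    """Save a new revision if the fetched content changed; return True if saved."""
    paragraphs = fetch()
    digest = fingerprint(paragraphs)
    if revisions and revisions[-1]["digest"] == digest:
        return False  # nothing changed since the last poll
    revisions.append(
        {
            "revision_number": len(revisions) + 1,
            "digest": digest,
            "paragraphs": paragraphs,
        }
    )
    return True


# Simulate three polls: first draft, no change, then an edited draft.
revisions: list[dict] = []
first_draft = [{"name": "p1", "text": "Hello"}]
edited_draft = [{"name": "p1", "text": "Hello, world"}]

results = [
    sync_article(revisions, lambda: first_draft),
    sync_article(revisions, lambda: first_draft),
    sync_article(revisions, lambda: edited_draft),
]
print(results, [r["revision_number"] for r in revisions])
```

In production, `fetch` would request the draft's JSON from Medium and `sync_article` would run on a timer.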

At a high level, the database structure looks like this:

article (id, medium_id, headline, alternative_headline, [..])
article_revision (id, revision_number, paragraph_order, paragraph_id)
paragraph (id, article_id, article_revision_id, text, type)
markup (id, paragraph_id, start, end, type)

I have skipped supporting tables such as paragraph_image, article_image, etc.
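To make the structure concrete, here is a sketch using SQLite of how the paragraphs of the latest revision could be read back in order. It rests on my reading of the outline above: `article_revision` holds one row per paragraph per revision, and `paragraph.article_revision_id` is left unused because its exact meaning is not spelled out. Column types, the sample rows and the `latest_paragraphs` helper are assumptions, and the `markup` table is omitted for brevity.

```python
import sqlite3

SCHEMA = """
CREATE TABLE article (id INTEGER PRIMARY KEY, medium_id TEXT, headline TEXT);
CREATE TABLE paragraph (
    id INTEGER PRIMARY KEY,
    article_id INTEGER REFERENCES article(id),
    article_revision_id INTEGER,  -- unused in this sketch
    text TEXT,
    type TEXT
);
CREATE TABLE article_revision (
    id INTEGER PRIMARY KEY,
    revision_number INTEGER,
    paragraph_order INTEGER,
    paragraph_id INTEGER REFERENCES paragraph(id)
);
"""


def latest_paragraphs(db: sqlite3.Connection, article_id: int) -> list[str]:
    """Return the paragraph texts of the newest revision, in display order."""
    rows = db.execute(
        """
        SELECT p.text
        FROM article_revision r
        JOIN paragraph p ON p.id = r.paragraph_id
        WHERE p.article_id = :article_id
          AND r.revision_number = (
              SELECT MAX(r2.revision_number)
              FROM article_revision r2
              JOIN paragraph p2 ON p2.id = r2.paragraph_id
              WHERE p2.article_id = :article_id
          )
        ORDER BY r.paragraph_order
        """,
        {"article_id": article_id},
    ).fetchall()
    return [text for (text,) in rows]


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
# Made-up sample data: revision 2 replaces the second paragraph.
db.execute("INSERT INTO article VALUES (1, 'example-medium-id', 'Example')")
db.executemany(
    "INSERT INTO paragraph (id, article_id, text, type) VALUES (?, 1, ?, 'text')",
    [(1, "Intro"), (2, "Body"), (3, "Body, revised")],
)
db.executemany(
    "INSERT INTO article_revision (revision_number, paragraph_order, paragraph_id) VALUES (?, ?, ?)",
    [(1, 0, 1), (1, 1, 2), (2, 0, 1), (2, 1, 3)],
)

print(latest_paragraphs(db, 1))
```

Unchanged paragraphs (here, "Intro") are referenced by both revisions rather than copied, which is what makes paragraph-level versioning cheap.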

The rest is GO2CINEMA specific – reconstructing the article using data in the database.

The result

I have used this method to create the first post on what is to be the GO2CINEMA publication.

Do file sharing and piracy "eat at the roots" of the cinema industry?

A common belief is that cinema industry revenue is decreasing and piracy is to blame. I summarise findings of the…

go2cinema.com

Double dip

Now that the article has been published on the main website, I will give Google a couple of days to pick it up and then use Medium's import post feature to import it into the https://medium.com/applaudience publication (a Medium publication for movie buffs that I am running).

The benefit of this method is that it increases the chances of the article appearing in search results, and since Medium links imported articles back to the original content, we do not risk being flagged by Google for duplicate content.