Ensuring good service health by automating thorough integration testing and alerting

Palantir automates service health checks and communication with the developers

Few things make you look less professional than a client telling you that your service is down before you know about it yourself. As a service provider, it is your responsibility to be the first to know when something breaks, and to tell clients that you are aware of the failure and working on a fix.

When your client informs you about your API being down.
  1. Automate communication with the responsible developers

Palantir

Palantir is used for communication and as a means of seeing events in other parts of the system.
  1. Alerting program

Palantir test

A Palantir test is an object defining methods used to query data and assert an expectation:

type TestContextType = Object;
type QueryResultType = *;
type TestConfigurationType = Object;

/**
 * @property configuration Test-specific configuration passed to `beforeTest` and `afterTest` as the first parameter.
 * @property description Test description.
 * @property interval Returns an interval (in milliseconds) at which the test should be executed.
 * @property tags An array of tags used for organisation of tests.
 * @property query Method used to query the data. If method execution results in an error, the test fails.
 * @property assert Method used to evaluate the response of query. If method returns `false`, the test fails.
 */
type TestType = {|
  +configuration?: TestConfigurationType,
  +description: string,
  +interval: (consecutiveFailureCount: number) => number,
  +tags: $ReadOnlyArray<string>,
  +query: (context: TestContextType) => Promise<QueryResultType>,
  +assert?: (queryResult: QueryResultType) => boolean
|};
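
For example, the following test checks that https://applaudience.com/ responds with 200:

import axios from 'axios';
import interval from 'human-interval';

const test = {
  description: 'https://applaudience.com/ responds with 200',
  // Run the test every 30 seconds, regardless of the failure count.
  interval: () => {
    return interval('30 seconds');
  },
  // axios rejects on timeouts and, by default, on non-2xx responses;
  // a rejected query fails the test.
  query: async () => {
    await axios('https://applaudience.com/', {
      timeout: interval('10 seconds')
    });
  },
  tags: [
    'go2cinema'
  ]
};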
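
The example above relies on `query` throwing to signal a failure. A test can also return data from `query` and check it in `assert`, and `interval` can make use of its `consecutiveFailureCount` argument. The sketch below is illustrative only: the queue-depth check, the stubbed `query`, and the `createBackoffInterval` helper are assumptions made for this example, not part of Palantir.

```javascript
const SECOND = 1000;

// Hypothetical helper: doubles the interval for every consecutive failure,
// up to a maximum, so that a broken dependency is not hammered with requests.
const createBackoffInterval = (baseMs, maxMs) => {
  return (consecutiveFailureCount) => {
    return Math.min(baseMs * 2 ** consecutiveFailureCount, maxMs);
  };
};

const test = {
  description: 'queue depth stays below 100',
  interval: createBackoffInterval(30 * SECOND, 10 * 60 * SECOND),
  tags: [
    'queue'
  ],
  // Stubbed for the sake of the example; a real test would query a
  // database or an HTTP endpoint here.
  query: async () => {
    return {
      depth: 42
    };
  },
  // Returning `false` fails the test.
  assert: (queryResult) => {
    return queryResult.depth < 100;
  }
};
```

Whether backing off or polling more often is the right response to failures depends on the service being tested; the signature of `interval` leaves that choice to the test author.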

Monitor program

Palantir monitor program continuously performs user-defined tests.

$ palantir monitor ./tests/**/*
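
To make the pass and fail rules of a test concrete, here is a minimal sketch of what a single monitoring cycle could look like, based only on the semantics documented for `TestType`. It is not Palantir's actual implementation; `evaluateTest`, `runCycle`, and the shape of the state object are hypothetical names chosen for this example.

```javascript
// Returns `true` if the test passes: `query` must not throw and,
// if `assert` is defined, it must not return `false`.
const evaluateTest = async (test, context) => {
  try {
    const queryResult = await test.query(context);

    if (test.assert && test.assert(queryResult) === false) {
      return false;
    }

    return true;
  } catch (error) {
    return false;
  }
};

// Runs the test once and computes when it should run next.
const runCycle = async (test, state) => {
  const passed = await evaluateTest(test, {});
  const consecutiveFailureCount = passed ? 0 : state.consecutiveFailureCount + 1;

  return {
    consecutiveFailureCount,
    nextRunInMs: test.interval(consecutiveFailureCount),
    passed
  };
};
```

A monitor built this way would call `runCycle` for each test, wait `nextRunInMs` milliseconds, and repeat with the returned `consecutiveFailureCount`.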

Alert program

Palantir alert program subscribes to Palantir HTTP API and alerts other systems using user-defined scripts.

$ palantir alert --configuration ./alert-configuration.js --palantir-api-url http://127.0.0.1:8080/

/**
 * @file Using https://www.twilio.com/ to send a text message when tests fail and when tests recover.
 */
import Twilio from 'twilio';

const twilio = new Twilio('ACCOUNT SID', 'AUTH TOKEN');

const sendMessage = (message) => {
  twilio.messages.create({
    body: message,
    to: '+12345678901',
    from: '+12345678901'
  });
};

export default {
  onNewFailingTest: (test) => {
    sendMessage('FAILURE ' + test.description + ' failed');
  },
  onRecoveredTest: (test) => {
    sendMessage('RECOVERY ' + test.description + ' recovered');
  }
};

Alert controller

Palantir alert controller abstracts the logic used to filter out temporary failures.

import interval from 'human-interval';
import Twilio from 'twilio';
import {
  createAlertController
} from 'palantir';

const twilio = new Twilio('ACCOUNT SID', 'AUTH TOKEN');

const sendMessage = (message) => {
  twilio.messages.create({
    body: message,
    to: '+12345678901',
    from: '+12345678901'
  });
};

const controller = createAlertController({
  // Alert only if the test is still failing after 5 minutes.
  delayFailure: (test) => {
    return interval('5 minutes');
  },
  delayRecovery: () => {
    return interval('1 minute');
  },
  onFailure: (test) => {
    sendMessage('FAILURE ' + test.description + ' failed');
  },
  onRecovery: (test) => {
    sendMessage('RECOVERY ' + test.description + ' recovered');
  }
});

export default {
  onNewFailingTest: (test) => {
    controller.registerTestFailure(test);
  },
  onRecoveredTest: (test) => {
    controller.registerTestRecovery(test);
  }
};
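
Because `delayFailure` receives the failing test, the delay can depend on the test, e.g. alerting immediately for tests tagged `database`:

delayFailure: (test) => {
  // Alert immediately when a database test fails.
  if (test.tags.includes('database')) {
    return 0;
  }

  return interval('5 minutes');
}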
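
The idea of filtering out temporary failures can be illustrated with a small stand-alone sketch. This is not how `createAlertController` is implemented; it is a simplified model written for this article's example, with hypothetical names, that keys tests by `description` and omits `delayRecovery` for brevity.

```javascript
// Hypothetical, simplified model of a delayed alerter.
const createDelayedAlerter = ({delayFailure, onFailure, onRecovery}) => {
  // Tests that failed but whose alert has not been sent yet.
  const pendingTimers = new Map();
  // Tests for which a failure alert has been sent.
  const alerted = new Set();

  return {
    registerTestFailure: (test) => {
      const key = test.description;

      if (pendingTimers.has(key) || alerted.has(key)) {
        return;
      }

      const timer = setTimeout(() => {
        pendingTimers.delete(key);
        alerted.add(key);
        onFailure(test);
      }, delayFailure(test));

      pendingTimers.set(key, timer);
    },
    registerTestRecovery: (test) => {
      const key = test.description;

      // The test recovered before the delay elapsed: cancel the alert silently.
      if (pendingTimers.has(key)) {
        clearTimeout(pendingTimers.get(key));
        pendingTimers.delete(key);

        return;
      }

      // Only announce a recovery if the failure was announced.
      if (alerted.delete(key)) {
        onRecovery(test);
      }
    }
  };
};
```

With this model, a test that fails and recovers within the delay produces no messages at all, while a test that stays broken produces exactly one failure message and, later, one recovery message.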

Takeaway

Palantir tests allow you to write thorough integration tests, e.g. testing customer journeys using Puppeteer, querying a database to construct conditional tests, and asserting service health using third-party services such as https://www.webpagetest.org/.

Software architect, startup adviser. Editor of https://medium.com/applaudience. Founder of https://go2cinema.com.