How Frequently Does the Akita App Sync Your Data?
In a previous post, we looked at How We Built More Than 100 Customer Success Integrations between Akita and apps like HubSpot, Salesforce, Intercom, and Zendesk. We outlined our three-step process for keeping data in sync: Schedule, Retrieve, and Process.
The Scheduler kicks everything off and is responsible for determining which data needs retrieving at any given time.
Let’s look at a customer (ACME) who has connected their Salesforce account. We can sync many types of data (or Interaction Types) from Salesforce including:
- Accounts
- Contacts
- Opportunities
- Tasks
- Notes
- Users
By default, we sync each type of Salesforce data every 180 minutes. This may be overkill for “Users” (i.e., the new Salesforce licenses you might rarely add) but is useful for having timely access to new Accounts and Contacts in Akita.
We schedule the retrieval of each different Interaction Type for ACME (an Integration Object) separately. This simplified SQL statement allows us to select the Integration Objects that need syncing:
SELECT * FROM IntegrationObjects WHERE SyncedAt <= NOW() - INTERVAL '3 HOURS';
The Scheduler runs every couple of minutes, finding any Integration Objects (e.g. ACME > Salesforce > Accounts) that need syncing, creating a Retrieve Job for each, and dispatching it to a queue. When the Retriever successfully executes the Retrieve Job, it updates the SyncedAt property to the current time.
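To make the mechanics concrete, here’s a minimal sketch of one Scheduler tick in Python. The table, column, and queue names are illustrative assumptions (our real schema differs), and we use boto3’s SQS client as the example queue:

import json
import boto3
import psycopg2

def run_scheduler_tick():
    conn = psycopg2.connect("dbname=akita")  # hypothetical connection string
    sqs = boto3.client("sqs")
    queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/retrieve-jobs"  # assumed queue

    with conn, conn.cursor() as cur:
        # Find every Integration Object that is due for a sync.
        cur.execute(
            "SELECT Id, Integration, InteractionType FROM IntegrationObjects "
            "WHERE SyncedAt <= NOW() - INTERVAL '3 HOURS';"
        )
        for object_id, integration, interaction_type in cur.fetchall():
            # Each due object becomes one Retrieve Job on the queue.
            # The Retriever sets SyncedAt = NOW() only after the job succeeds.
            sqs.send_message(
                QueueUrl=queue_url,
                MessageBody=json.dumps({
                    "integration_object_id": object_id,
                    "integration": integration,
                    "interaction_type": interaction_type,
                }),
            )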
The Retriever doesn’t re-sync the entire dataset each time. It selects only those records that have changed since it last ran. Combined with our Scheduler logic, it creates a system where updates are quickly added to Akita without straining our system or those of our integration partners.
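As a rough illustration of that incremental behaviour, here’s what a Retrieve Job for Salesforce Accounts could look like using the simple_salesforce library and a LastModifiedDate filter. The field names and helper are assumptions for the sketch, not our production code:

from datetime import datetime, timezone
from simple_salesforce import Salesforce

def retrieve_changed_accounts(sf: Salesforce, synced_at: datetime) -> list[dict]:
    # Only ask Salesforce for Accounts modified since the last successful sync.
    # SOQL datetime literals are unquoted ISO 8601 timestamps.
    since = synced_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    soql = (
        "SELECT Id, Name, LastModifiedDate FROM Account "
        f"WHERE LastModifiedDate > {since}"
    )
    return sf.query_all(soql)["records"]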
Easy peasy lemon squeezy.
But not really. There are some important edge cases.
The Initial Sync
We have to make sure we catch brand-new Integrations that have not yet synced, so we tweak our SELECT statement to also include any Integration Objects that have not yet been synced:
SELECT * FROM IntegrationObjects WHERE SyncedAt <= NOW() - INTERVAL '3 HOURS' OR SyncedAt IS NULL;
This first sync is the costliest, since it pulls the entire dataset at once. After the initial sync, each update is relatively cheap.
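A sketch of what that first, unfiltered pull might look like, paging through every Account batch by batch (again using the hypothetical Salesforce helper above; process() stands in for handing records to the Processor):

from simple_salesforce import Salesforce

def initial_full_sync(sf: Salesforce) -> None:
    # No LastModifiedDate filter: the very first sync walks the whole dataset,
    # one batch (up to 2,000 records per response) at a time.
    batch = sf.query("SELECT Id, Name FROM Account")
    while True:
        for record in batch["records"]:
            process(record)  # hypothetical hand-off to the Processor
        if batch["done"]:
            break
        batch = sf.query_more(batch["nextRecordsUrl"], identifier_is_url=True)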
Throttling
Akita uses message queues extensively (AWS SQS). On occasion, the job queue gets backed up, and a Retrieve Job may be queued but not run before the next invocation of the Scheduler. In that case, there could be duplicate Retrieve Jobs causing two syncs to run in parallel.
To avoid this, we use throttling to make sure we don’t dispatch the same Retrieve Job more than once in a given time period.
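There are several ways to implement that guard. One minimal sketch (assumed names, and in production this state would live in the database or a cache rather than process memory) is to remember when each Integration Object was last dispatched:

from datetime import datetime, timedelta, timezone

THROTTLE_WINDOW = timedelta(minutes=30)    # assumed window; tune per integration
last_dispatched: dict[int, datetime] = {}  # integration_object_id -> dispatch time

def should_dispatch(integration_object_id: int) -> bool:
    # Skip this object if a Retrieve Job was queued for it recently,
    # even if that job is still waiting in a backed-up queue.
    now = datetime.now(timezone.utc)
    previous = last_dispatched.get(integration_object_id)
    if previous is not None and now - previous < THROTTLE_WINDOW:
        return False
    last_dispatched[integration_object_id] = now
    return True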
Flexibility
We can adjust the sync frequency of each individual Integration Object (e.g. ACME’s Salesforce Accounts). That is, we can sync Salesforce Accounts more frequently for customer A than for customer B. This is useful in two situations:
The customer absolutely MUST have their data synced more frequently
Every business is different. Some of our customers have legitimate requirements to have data available in close-to-real-time. In these cases, we can adjust their settings to sync a specific Integration Object more frequently (by no means in real time, but fairly frequently).
The customer’s entire dataset changes continuously (making every-3-hour syncs too costly)
Take an analytics tool that provides up-to-the-second statistics for every Account. It may show new results for every Account each time it is queried. Since every record has changed, each sync would require a full re-sync. If there were 100K or 1M Accounts, a full re-sync would mean a never-ending series of API calls.
In this case, we change “every three hours” to “once a day”. This reduces the strain on our servers and our integration partners but still keeps the data fresh enough for almost any Customer Success purpose (building Customer Health Scores, triggering Customer Success Playbooks, or refreshing a Customer Dashboard).
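Under the hood, supporting per-object frequencies is mostly a matter of storing the interval on each Integration Object and using it in the Scheduler’s query. A sketch, assuming a SyncIntervalMinutes column (the real column name may differ):

# Scheduler query when each Integration Object carries its own sync interval.
DUE_OBJECTS_SQL = """
    SELECT * FROM IntegrationObjects
    WHERE SyncedAt IS NULL
       OR SyncedAt <= NOW() - (SyncIntervalMinutes * INTERVAL '1 minute');
"""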
Next week, we’ll look at how our Retriever requests data from our integration partners, handles responses (including errors) and dispatches jobs to the queue for our Processor to analyze.