Problem statement: The enterprise
needs to load millions of rows form legacy Salesforce CRM to new Salesforce CRM.
While IT team continue data migration activity other business units must be
able to use system for day to day operations. For example, resolving customer issues,
responding queries, marketing activity and so on. Salesforce multitenant architecture
with verity of APIs (Bulk /SOAP/Rest) handle this very well with very few exceptions.
The particular
problem that I want to highlight global search. When we bulk load
data it affects global search indexing. Global search work asynchronously but
gives a feel of near real time. But what we have to load is billions of rows.
When we load high volumes of data if make the global search unusable as the
indexing is not complete. In our test it took whole week for global search to
work properly. CSR users were not very happy about it and they break various SAL
due to this.
Solution:
The approach is load data as slowly as possible. If you load relatively
less records (say 5k or less) the global search is available in the matter of
seconds (for me, it was less than 30 seconds). So, instead of loading data in
bulk we choose to load data in “Trickle Load” fashion. In a way it’s exactly opposite
of bulk data loading. In Bulk data loading we upload all data and periodically
keep checking the status. In Trickle load we load small chunk and wait for salesforce
to finish updating global indexes. After some time, we submit again another
chunk and this can go on and on till we finish.
If you think of this in a way, it’s just tread of between limits.
How to Trickle Load huge data?
1. Split large CSV file of data
into batches of 5k (php code is provided at the bottom)
2. Trigger job of data loading
at every 5 minutes to insert one batch file
3. The easiest way for me was
using data loader CLI to run from command prompt
Things to keep
in mind:
·
With this approach we can load max 1,440,000 records per 24 hours (5k * (24* 12(batches per hour)))
·
As the wait
time increase you will notice less problem
of jamming the global search indexes. As the wait time reduced you will notice more the problem of jamming the global search indexes
·
Higher API
consumption can be result with this approach.
·
We consider to wait for 5 minutes but you can choose
another number as per your ability to wait after jamming problem.