Wednesday, November 9, 2016

Trickle Data Load


Problem statement: The enterprise needs to load millions of rows from a legacy Salesforce CRM into a new Salesforce CRM. While the IT team carries out the data migration, other business units must still be able to use the system for day-to-day operations: resolving customer issues, responding to queries, marketing activity, and so on. Salesforce's multitenant architecture, with its variety of APIs (Bulk/SOAP/REST), handles this very well, with very few exceptions.

The particular problem I want to highlight is global search. Bulk loading data affects global search indexing. Global search works asynchronously but normally feels near real time. What we have to load, however, is billions of rows. When we load such high volumes, global search becomes effectively unusable until indexing completes. In our test it took a whole week for global search to work properly again. CSR users were not happy about it, and they breached various SLAs because of it.


Solution:
The approach is to load the data as slowly as possible. If you load relatively few records (say 5k or fewer), global search catches up in a matter of seconds (for me, it was less than 30 seconds). So, instead of loading the data in bulk, we chose to load it in a "Trickle Load" fashion. In a way it is the exact opposite of bulk data loading: in bulk loading we upload all the data and periodically check the status, whereas in trickle loading we load a small chunk and wait for Salesforce to finish updating the global search indexes. After some time we submit another chunk, and this goes on until we finish.
If you think about it, it is really just a trade-off against the platform limits.

How to trickle load huge data volumes?

1.   Split the large CSV file into batches of 5k records (PHP code for this is provided at the bottom; a sketch also follows this list)
2.   Trigger a data-load job every 5 minutes to insert one batch file
3.   The easiest way for me was to run the Data Loader CLI from the command prompt
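The splitting and scheduling can be scripted in any language (the post's own splitter was PHP); here is a minimal Python sketch of the whole trickle loop, assuming a Windows Data Loader CLI install with an insert process named csvInsertProcess whose process-conf.xml reads a fixed CSV path. All file paths and the process name are placeholders you would replace with your own setup.

import csv
import shutil
import subprocess
import time
from pathlib import Path

SOURCE_FILE = Path("big_export.csv")        # full legacy extract (placeholder name)
BATCH_DIR = Path("batches")                 # where the 5k-row chunk files go
BATCH_SIZE = 5000                           # records per chunk
WAIT_SECONDS = 5 * 60                       # pause so global search indexing can catch up
DATALOADER_CSV = Path(r"C:\dataloader\data\extract.csv")   # the CSV that process-conf.xml reads
PROCESS_BAT = r"C:\dataloader\bin\process.bat"              # Data Loader CLI launcher
CONFIG_DIR = r"C:\dataloader\config"
PROCESS_NAME = "csvInsertProcess"

def write_chunk(index, header, rows):
    """Write one numbered chunk file, repeating the header row."""
    path = BATCH_DIR / f"batch_{index:04d}.csv"
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path

def split_csv():
    """Split the big CSV into chunk files of BATCH_SIZE records each."""
    BATCH_DIR.mkdir(exist_ok=True)
    chunks, rows, index = [], [], 0
    with SOURCE_FILE.open(newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        for row in reader:
            rows.append(row)
            if len(rows) == BATCH_SIZE:
                chunks.append(write_chunk(index, header, rows))
                rows, index = [], index + 1
        if rows:
            chunks.append(write_chunk(index, header, rows))
    return chunks

def trickle_load(chunks):
    """Submit one chunk at a time via the Data Loader CLI, waiting between submissions."""
    for chunk in chunks:
        shutil.copy(chunk, DATALOADER_CSV)            # the file the insert process points at
        subprocess.run([PROCESS_BAT, CONFIG_DIR, PROCESS_NAME], check=True)
        time.sleep(WAIT_SECONDS)                      # let global search indexing catch up

if __name__ == "__main__":
    trickle_load(split_csv())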


Things to keep in mind:
·     With this approach we can load at most 1,440,000 records per 24 hours (5,000 records × 12 batches per hour × 24 hours; a small calculator follows after this list)
·     The longer the wait between batches, the less you jam the global search indexes; the shorter the wait, the more you jam them
·     This approach can result in higher API consumption
·     We chose to wait 5 minutes, but you can pick a different interval depending on how much indexing lag you can tolerate
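As a quick check on that arithmetic, here is a tiny Python calculation; adjust the batch size and wait interval to see how the daily capacity changes:

BATCH_SIZE = 5_000      # records per chunk
WAIT_MINUTES = 5        # pause between chunks

batches_per_hour = 60 // WAIT_MINUTES                 # 12 batches per hour
records_per_day = BATCH_SIZE * batches_per_hour * 24  # 5,000 * 12 * 24

print(f"{records_per_day:,} records per 24 hours")    # 1,440,000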





Sunday, September 18, 2016

Tricky Things

1) Two related lists on a standard page.

2) A for loop with an inline SOQL query.

3) A detail object will always fire the master's trigger - avoid data skew.

4) A joined report can be used to show data from a tree-like data model.

5) Make a record read-only conditionally: workflow -> change record type -> the new record type gets a page layout that is read-only.

Data Migration - Field Mapping





This field mapping may help you on a data migration project.



Salesforce Type | H2 | MySQL | Oracle | PostgreSQL | SQL Server
ANYTYPE | CLOB | TEXT | NCLOB | TEXT | NVARCHAR(MAX)
BASE64 | CLOB | TEXT, MEDIUMTEXT, or LONGTEXT | CLOB | TEXT | VARCHAR(MAX)
BOOLEAN | BOOLEAN | BOOLEAN | DECIMAL(1) | BOOLEAN | BOOLEAN
COMBOBOX | NVARCHAR | VARCHAR | NVARCHAR2 | VARCHAR | NVARCHAR
CURRENCY | DECIMAL | DECIMAL | NUMBER | NUMERIC | DECIMAL
DATA CATEGORY GROUP REFERENCE | CHAR | CHAR | CHAR | CHAR | CHAR
DATE | TIMESTAMP | DATE | TIMESTAMP(3) | DATE | DATE
DATETIME | TIMESTAMP | DATETIME | TIMESTAMP(3) | TIMESTAMP | DATETIME
DOUBLE | DECIMAL | DECIMAL | NUMBER | NUMERIC | DECIMAL
EMAIL | NVARCHAR | VARCHAR | NVARCHAR2 | VARCHAR | NVARCHAR
ENCRYPTED STRING | NVARCHAR | VARCHAR | NVARCHAR2 | VARCHAR | NVARCHAR
ID | CHAR | CHAR | CHAR | CHAR | CHAR
INT | INTEGER | INTEGER, DECIMAL, or BIGINT | NUMBER | INTEGER | INT or BIGINT
MULTIPICKLIST | NVARCHAR | VARCHAR | NVARCHAR2 | VARCHAR | NVARCHAR
PERCENT | DECIMAL | DECIMAL | NUMBER | NUMERIC | DECIMAL
PHONE | NVARCHAR | VARCHAR | NVARCHAR2 | VARCHAR | NVARCHAR
PICKLIST | NVARCHAR | VARCHAR | NVARCHAR2 | VARCHAR | NVARCHAR
REFERENCE | CHAR | CHAR | CHAR | CHAR | CHAR
STRING | NVARCHAR | VARCHAR | NVARCHAR2 | VARCHAR | NVARCHAR
TEXTAREA | NVARCHAR | VARCHAR | NVARCHAR2 | VARCHAR | NVARCHAR
TIME | TIME | DATETIME | TIMESTAMP(3) | TIME | DATETIME
URL | NVARCHAR | VARCHAR | NVARCHAR2 | VARCHAR | NVARCHAR

Monday, September 12, 2016

Passed Salesforce Certified Integration Architecture Designer

I cleared the Salesforce Certified Integration Architecture Designer exam on 9 Sept 2016.

Here are my notes; they may help you prepare and give you an idea of what to expect.

Important Topics:

  1. SOAP API - partner WSDL vs. enterprise WSDL
  2. REST API 
  3. Outbound messaging, with all its limitations and pros and cons 
  4. There was a question about email services, which was not mentioned in the study guide 
  5. Data Loader 
  6. ETL tools - pros and cons 
  7. Governor limits 
  8. Security 
  9. Other APIs: Metadata, Chatter, Streaming, Bulk 

Other notes 
  • Almost 70% of the questions were scenario based 
  • It took me 89 minutes to complete the exam, and I did not get any time for review 
  • I knew this going in, so I followed the "do it once and do it right" principle
  • There was only one question on Enterprise Integration Patterns (the book), and to be frank I only read its title, as it is too much to read. If you are like me, you can probably just skim it and focus on the rest of the important topics 
  • Many times you will feel that all the options are correct 
  • Use the elimination technique to narrow down your choices 
Please don't ask for any sample questions. If you have any other questions, feel free to ask in the comments and I will get back to you.

All the best!
Overview of the Salesforce database:
·         Salesforce runs on a multitenant architecture, i.e. many organizations (tenants) share the same infrastructure.
·         The Salesforce database doesn't store data in the traditional way; data for all tenants lives in common tables.
·         Metadata is stored in hundreds of tables.
·         A runtime engine generates organization-specific queries that fetch each tenant's data from the common tables.
·         Database tables are partitioned by organization ID, producing virtual tables specific to each org.
·         Because of the multitenant architecture, traditional query optimization techniques do not work in this case.
·         To take multitenant statistics into account, the platform has its own SOQL query optimizer that determines the optimal SQL to satisfy a specific tenant's request.





Overview of the Force.com query optimizer:
·         The Force.com query optimizer is an engine that sits between your SOQL and the database itself.
·         Because of salesforce.com’s multitenancy, the optimizer gathers its own statistics instead of relying on the underlying database statistics.
·         Using both these statistics and pre-queries, the optimizer generates the most optimized SQL to fetch your data.
·         It looks at each filter in your WHERE clause to determine which index should drive your query.

How the Force.com query optimizer works:
·         Determines the best index from which to drive the query, if possible, based on filters in the query
·         Determines the best table to drive the query from if no good index is available
·         Determines how to order the remaining tables to minimize cost
·         Injects custom foreign key value tables as needed to create efficient join paths
·         Influences the execution plan for the remaining joins, including sharing joins, to minimize database input/output (I/O)
·         Updates statistics

Query Plan Tool:
·         There is no tool to optimize the final generated SQL, but Salesforce has released the Query Plan tool to analyze how a query will perform.
·         The Query Plan tool in the Developer Console can help optimize and speed up SOQL queries run over large data volumes.

How to enable the Query Plan Tool in the Developer Console?
Go to:
Developer Console >> Help >> Preferences, and set "Enable Query Plan" to true


Why use this tool?
·         Use this tool to check the query plan for any SOQL query that executes slowly or that runs over large data volumes.
·         It will provide you with insight into the different available plans.
·         It provides the cost of running the SOQL via an index compared to a full table scan.
·         If the cost of the table scan is lower than that of the index and the query is still timing out, you will need to perform further analysis and use other filters to improve selectivity.

How to determine if a filter is selective?
1.       If the filter is on a standard field, it will have an index if the field is a primary key, a foreign key, or an audit field
2.       Custom fields will have an index if they have been marked as Unique or External Id
3.       If the filter doesn't have an index, it won't be considered for optimization
4.       If the filter has an index, determine how many records it would return (a worked example of these thresholds follows after this list):
Ø  For a standard index, the threshold is 30 percent of the first million targeted records and 15 percent of all records after that first million.
i.e. the selectivity threshold for a standard index maxes out at 1 million total targeted records, which you could reach only if you had more than 5.6 million total records.

Ø  For a custom index, the selectivity threshold is 10 percent of the first million targeted records and 5 percent of all records after that first million.
i.e. the selectivity threshold for a custom index maxes out at 333,333 targeted records, which you could reach only if you had more than 5.6 million records.

5.       If the filter exceeds the threshold, it won't be considered for optimization.
6.       If the filter doesn't exceed the threshold, this filter is selective, and the query optimizer will consider it for optimization.
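As a worked example of those percentages, here is a small Python helper (just a sketch of the formula described above, not any official API). A filter that would return fewer rows than this threshold is selective.

def selectivity_threshold(total_records, standard_index=True):
    """Estimate the selectivity threshold for an object of a given size.
    Standard index: 30% of the first 1M records + 15% of the rest, capped at 1,000,000.
    Custom index:   10% of the first 1M records +  5% of the rest, capped at 333,333.
    """
    first_million = min(total_records, 1_000_000)
    remainder = max(total_records - 1_000_000, 0)
    if standard_index:
        return min(0.30 * first_million + 0.15 * remainder, 1_000_000)
    return min(0.10 * first_million + 0.05 * remainder, 333_333)

# Example: an object with 3 million rows
print(selectivity_threshold(3_000_000, standard_index=True))   # 600000.0 -> standard index threshold
print(selectivity_threshold(3_000_000, standard_index=False))  # 200000.0 -> custom index threshold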

View the query plan:
·         To use the Query Plan tool, simply enter a query in the Query Editor and click the "Query Plan" button, which returns a query plan for the SOQL provided.



·         The Query Plan tool shows a list of the available plans that the query optimizer can use for the provided query, arranged by cost in ascending order.





·         Each Plan will contain information on Cardinality, Operation Type, Cost, sObject Type etc.
·         The plan with the lowest cost is the plan that is used for driving the query execution.
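The same plans can also be retrieved programmatically: the REST API query resource accepts an explain parameter that returns query plans instead of rows. Below is a minimal Python sketch; the instance URL, API version, access token, and the External_Key__c indexed field are placeholders/assumptions, and the plan attributes correspond to the terminology listed below.

import requests

# Placeholders: supply your own instance URL and a valid access token
# (for example from an OAuth flow or an existing session).
INSTANCE_URL = "https://yourInstance.my.salesforce.com"
ACCESS_TOKEN = "REPLACE_WITH_ACCESS_TOKEN"
API_VERSION = "v57.0"  # any reasonably recent API version

soql = "SELECT Id, Name FROM Account WHERE External_Key__c = 'LEGACY-001'"

response = requests.get(
    f"{INSTANCE_URL}/services/data/{API_VERSION}/query/",
    params={"explain": soql},  # explain returns plans instead of executing the query
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
)
response.raise_for_status()

# Plans come back ordered by cost (cheapest first); the first one drives execution.
for plan in response.json()["plans"]:
    print(plan["leadingOperationType"], plan["relativeCost"], plan["fields"])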

Terminology displayed in the query plan:
1.       Cardinality : The estimated number of records that the leading operation type would return.

2.        Fields : The indexed field(s) used by the query. If the leading operation type is Index, the fields value is Index. Otherwise, the fields value is null.

3.       LeadingOperationType : The primary operation type that Salesforce will use to optimize the query
a. Index - The query will use an index on the query object.
b. Other - The query will use optimizations internal to Salesforce.
c. Sharing - The query will use an index based on the sharing rules associated with the user who is executing the query. If there are sharing rules that limit which records that user can access, Salesforce can use those rules to optimize the query.
d. TableScan - The query will scan all records for the query object.

4.        Notes : An array of one or more feedback notes.

5.       RelativeCost : The cost of the query compared to the Force.com query optimizer’s selectivity threshold. Values above 1 mean that the query won’t be selective.

6.       sobjectCardinality : The approximate record count for the query object.

7.       sobjectType: The name of the query object.

How is the Cost determined?
·         Each plan has its own Cost value. The cost value is derived from the latest gathered database (DB) statistics on the table and values.
·         The plan with the lowest cost will be the plan used.
·         If the Cost is above 1, it means that the query won’t be selective.


Improve the performance of SOQL queries
When querying large objects, special design considerations must be taken into account.
1. Make the SOQL query selective
2. Improve performance with Custom indexes using Selective SOQL Queries

1. Make the SOQL query selective:
·         The performance of a SOQL query depends on the presence of a selective filter.
·         If a SOQL query contains at least one selective filter, the query is said to be selective.
·         If the SOQL query doesn't contain a selective filter, it is said to be non-selective and will require a full table scan.
·         A filter can be simple or composite:
o   Simple filter - a single field expression in the WHERE clause, or multiple expressions joined with the AND operator
o   Composite filter - the result of joining two or more field expressions with the OR operator
·         A query is selective when one of the query filters is on an indexed field and the query filter reduces the resulting number of rows below a system-defined threshold. For example, a filter on an indexed external ID field can be selective, whereas a leading-wildcard filter such as Name LIKE '%corp%' cannot use an index.
·         The performance of the SOQL query improves when two or more filters used in the WHERE clause meet the threshold-related conditions.

2. Improve performance with Custom indexes using Selective SOQL Queries
·         Salesforce support can add Custom Indexes upon request for customers
·         When creating a case with Support, be sure to include the SOQL query which has the field to be indexed as a filter in the WHERE clause. Also be sure to include bind values if any.





Sunday, September 4, 2016

Salesforce: Points to consider during data migration




Step 1:- Working on the source
  • Define the scope of the migration, i.e. which objects and fields need to be migrated. Not all objects, fields, and records need to go to the target system; for example, we may need only the past 2 years of data 
  • Check the ER diagram of the source data and identify master-detail / parent-child relationships 
  • For auditing purposes we should take a full backup, but it should be separate from the migration backup
  • Take downtime and make sure no one can modify data while the migration is in progress.
Step 2:- Working on the staging DB
  • Perform all data cleanup in the staging DB
  • If any calculations or data massaging need to be done, do that in staging as well. This reduces the load on the target system when you load data in bulk.
  • Use VLOOKUP to maintain referential integrity
  • Data security 
  • Duration of the backup process - data may change while the backup is running. 
Step 3:- Test data load in a Full Copy Sandbox
  • Test the data load in a sandbox environment.  
  • Disable all unnecessary activities, i.e. triggers, validation rules, workflows, assignment rules   
  • Make sure minimal or no sharing recalculation is happening
  • Decide which API suits you best, i.e. SOAP, Bulk in serial mode, or Bulk in parallel mode
  • Arrange the data in a way that reduces record locking in the target system
  • We will need to create external ID fields in the target system to preserve the relationships (see the sketch after this list)
  • Prepare a checklist / to-do list based on the challenges you face, to use for the actual data migration in production. 
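As a sketch of the external ID approach, here is a minimal example using the simple_salesforce Python library; Legacy_Id__c is a hypothetical external ID field created on Account and Contact in the target org, and the credentials are placeholders.

from simple_salesforce import Salesforce

# Placeholder credentials for the target org
sf = Salesforce(username="user@example.com", password="password", security_token="token")

# Upsert the parent by its legacy key, then reference the same key from the child,
# so the parent/child link is preserved without knowing the new Salesforce IDs.
sf.Account.upsert("Legacy_Id__c/ACC-1001", {"Name": "Acme Ltd"})
sf.Contact.upsert(
    "Legacy_Id__c/CON-2001",
    {"LastName": "Smith", "Account": {"Legacy_Id__c": "ACC-1001"}},
)

Because both records are addressed by the legacy key, the relationships survive the migration without any lookups of the newly generated Salesforce IDs.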
Step 4:- Actual data load
  • We will learn a lot while testing in the sandbox. Apply all of that in production and follow the checklist. 
  • Run reports and queries to check the statistics and make sure they match (see the reconciliation sketch after this list) 
  • Do some sample checks 
  • Enable triggers, validation rules, workflows, assignment rules
  • Update the sharing model to meet your business needs and recalculate sharing 
  • After migration, run both systems in parallel. Don't discard the source system immediately; keep limited access to it 
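A minimal sketch of that reconciliation step, again using the simple_salesforce library; the credentials are placeholders and the expected counts would come from your staging database.

from simple_salesforce import Salesforce

# Placeholder credentials for the target org
sf = Salesforce(username="user@example.com", password="password", security_token="token")

# Expected record counts per object, taken from the staging DB (placeholder numbers)
expected = {"Account": 250_000, "Contact": 1_200_000, "Case": 3_400_000}

for sobject, source_count in expected.items():
    # COUNT() queries return the total in totalSize; on very large objects they can be
    # slow, so counting in filtered segments (e.g. by CreatedDate range) is an option.
    loaded = sf.query(f"SELECT COUNT() FROM {sobject}")["totalSize"]
    status = "OK" if loaded == source_count else "MISMATCH"
    print(f"{sobject}: source={source_count} target={loaded} -> {status}")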

Thursday, June 2, 2016

How to find the IP addresses of a Salesforce org

Scenario: Let's assume we need to integrate Salesforce with a 3rd-party application. This application has a tight security policy, just like Salesforce.
Because of this, you need to whitelist all the IP addresses from which requests can be made to the 3rd-party API.

Solution: Salesforce publishes a list of the IP ranges it owns, which are used when calling any 3rd-party API.
The complete list of IPs is published at: https://help.salesforce.com/apex/HTViewSolution?id=000003652

Problem: When Salesforce acquires new IPs or expands an IP range, some integrations fail. To debug this issue you have 2 options.

1) Check whether there are any updates at the above link and cross-verify the IP ranges with your network team
Or
2) Find out the IP of your Salesforce org.

How: Follow these simple steps to find the IP of your Salesforce org.

1)      Go to http://mockbin.org/ and click Create Bin
2)      On the next screen, just click Create Bin again without entering any details.
3)      Copy the URL from any of the code snippets shown, for example http://mockbin.org/bin/84995797-d5fe-472d-bbf2-0b7ccf95170c?foo=bar&foo=baz
4)      Log in to the org whose IP you want to find
5)      Go to Setup and quick-search 'Remote Site Settings'
6)      Add http://mockbin.org as a remote (trusted) site
7)      Run the following code from the Developer Console
8)      Then go back to http://mockbin.org/bin/84995797-d5fe-472d-bbf2-0b7ccf95170c and click View History
9)      It will show all the requests along with the IP each request was made from.

// Anonymous Apex: make a single GET callout so that mockbin records the org's outbound IP
Http h = new Http();
HttpRequest req = new HttpRequest();

// Don't forget to replace the URL with the one you copied in step 3
req.setEndpoint('http://mockbin.org/bin/cbc071cf-851d-4b1f-8abf-3b14859bcfb6?foo=bar&foo=baz'); 

req.setMethod('GET');
HttpResponse res = h.send(req);

// The response body itself is not important; the caller IP shows up in mockbin's history view
system.debug('>>' + res.getBody());