The SharePoint Crawler does not index the expected number of records or objects.

Document ID : KB000056692
Last Modified Date : 14/02/2018

Scenario:

I have configured the Crawler Surface to be crawled by SharePoint 201x. The crawl completes successfully, but it only ever indexes 13 records.

Explanation:

This result of 13 records may occur when a full crawl is performed under the following conditions:

1. In the Crawler Surface XML Configuration File, the list_form_number_of_records_per_object parameter is set to its out-of-the-box value of 5.

2. In the SharePoint configuration for the CA Service Desk Manager content source, page depth is limited: Limit Page Depth is set to 2 and Limit Server Hops is set to 1.

3. In the SharePoint configuration for the CA Service Desk Manager content source, farm=KD is used.

When "Limit Page Depth" is used, the SharePoint Crawler scans only list_form_number_of_records_per_object multiplied by the "Limit Page Depth" value during a single crawl. In the case above, that is 5 x 2, so only 10 CA Service Desk Manager knowledge documents were indexed. The additional 3 records correspond to other links, such as the "next page" links for the list.
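
The arithmetic behind the 13 records can be sketched as follows. This is only an illustration of the multiplication described above; the function name and the extra_links argument are hypothetical and are not part of the product or of SharePoint.

```python
def indexed_record_estimate(records_per_object, page_depth, extra_links=0):
    """Estimate how many records one crawl indexes when page depth is limited.

    records_per_object: value of list_form_number_of_records_per_object
    page_depth:         value of "Limit Page Depth" in SharePoint
    extra_links:        other indexed links, such as "next page" links
                        (hypothetical argument, for illustration only)
    """
    return records_per_object * page_depth + extra_links


# Values from the scenario: out-of-the-box value 5, Limit Page Depth 2.
knowledge_documents = indexed_record_estimate(5, 2)
total_records = indexed_record_estimate(5, 2, extra_links=3)
print(knowledge_documents, total_records)  # prints: 10 13
```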

To index the full set of objects in the farm during a full crawl, either specify a sufficiently high value for list_form_number_of_records_per_object or do not specify a page depth limit.

Considerations:

  1. The value specified for list_form_number_of_records_per_object applies to both types of crawl: full and incremental.
     
  2. The higher the value that is specified for list_form_number_of_records_per_object, the bigger the potential impact on performance.
     
  3. There is no value of list_form_number_of_records_per_object that can be used to mean unlimited.  However, a value as high as 10000000 (i.e. ten million) can be specified. 
     
  4. If the number of objects that need to be indexed exceeds the value of Limit Page Depth multiplied by list_form_number_of_records_per_object, then a full crawl will not index all of the objects.  
     
  5. list_form_number_of_records_per_object represents the number of objects of each type that are included in each list for each page.  When Limit Page Depth is used, the number of page lists indexed equals the value specified for Limit Page Depth.
     
  6. If you do not limit the page depth, then the full crawl should index all of the objects as per the farm definition.
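
Considerations 4 and 5 can be turned into a quick sizing check. The sketch below assumes the limit applies per object type, as consideration 5 suggests. The function names and the object count of 1200 are hypothetical and chosen only for illustration.

```python
def full_crawl_covers(object_count, records_per_object, page_depth):
    """Return True if a depth-limited full crawl can index every object.

    object_count is the number of objects of one type (an assumed figure
    that you would obtain from your own system).
    """
    return object_count <= records_per_object * page_depth


def min_records_per_object(object_count, page_depth):
    """Smallest list_form_number_of_records_per_object that covers object_count."""
    if page_depth < 1:
        raise ValueError("page_depth must be at least 1")
    # Integer ceiling division: round up so the last partial page is included.
    return (object_count + page_depth - 1) // page_depth


# Hypothetical example: 1200 knowledge documents, Limit Page Depth of 2.
print(full_crawl_covers(1200, 5, 2))      # prints: False
print(min_records_per_object(1200, 2))    # prints: 600
```

Because a higher value has a bigger potential impact on performance (consideration 2), a value close to the computed minimum is a reasonable starting point when the page depth limit must stay in place.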

Notes:

The Crawler Surface XML Configuration File is named "crawler_surface_config.xml" and is located under $NX_ROOT\bopcfg\www\CATALINA_BASE_FS\webapps\fscrawl\WEB-INF on a CA Service Desk Manager primary server.
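
To confirm the value currently in effect, the file can be searched for the parameter name. The internal layout of crawler_surface_config.xml is not described in this article, so the sketch below makes no assumption about the XML structure and does a plain-text search only. The function name is hypothetical.

```python
PARAMETER = "list_form_number_of_records_per_object"


def find_parameter_lines(lines, parameter=PARAMETER):
    """Return (line_number, text) pairs for every line that mentions parameter.

    lines can be any iterable of strings, for example an open file handle.
    """
    matches = []
    for line_number, line in enumerate(lines, start=1):
        if parameter in line:
            matches.append((line_number, line.strip()))
    return matches


# Example usage (the path is whatever the location above resolves to on
# your primary server):
#
#     with open(config_path, encoding="utf-8", errors="replace") as handle:
#         for number, text in find_parameter_lines(handle):
#             print(number, text)
```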

Further information on this topic for CA Service Desk Manager 14.1 can be found on the Wiki, in the section titled "How to Configure the Crawler Surface for SharePoint". Navigate to: CA Service Management Home > Using > Knowledge Management > Integrating Multiple Search Engines Using Federated Search > How to Configure the Crawler Surface for SharePoint