How to Set Up and Configure the Documentum Connector

VERSION SUPPORT

Make sure you install the appropriate version of the connector for your Documentum deployment.

  • Documentum Connector version 3 uses the REST API to interact with Documentum. This topic covers version 3.
  • Documentum Connector version 2 uses the DFC API to interact with Documentum. See Documentum v2: DFC API.

All BA Insight connectors can be downloaded from Right Answers under Knowledge > BA Insight > Product Downloads > Connectors.

This connector is installed with the same generic steps as any BA Insight connector.

Satisfy the Prerequisites for your connector before installing.

Use the Connection and Content configuration specifics described below to complete the configuration of your connector.

Note: The walk-through video below guides you through the entire installation and configuration process.

Documentum Installation Walk-Through Video

Use the following video to install any BA Insight Repository-type connector.

The Documentum connector is used in the video as an example.

For the Connection and Content settings specific to this connector, see the sections below.

Installing a Repository-Type Connector: Documentum from BA Insight on Vimeo.

Connection Configuration Specifics

  • Documentum Account: Specify the user name and password of a valid Documentum account.
    • This account must have READ permission on all documents that are to be crawled.
  • Excluded Metadata fields: A comma-separated (,) list of metadata fields that are NOT retrieved for indexed documents.
  • TimeZone Offset: If your Documentum deployment stores dates in the local time zone instead of UTC, specify the time zone of your Documentum deployment.
    • This typically applies only to Documentum versions earlier than 6, or to deployments upgraded from such a version to version 6 or later.
  • Documentum URL: Enter the URL of the Documentum instance using the following format: http(s)://<documentum_server>:<port_number>
    • Default port: 7777
    • Documentum REST services must be deployed on this instance.
    • BA Insight recommends an HTTPS URL, because the Documentum account credentials are sent to this URL for authentication.
  • Exclude UTC date field marker: Older versions of Documentum cannot interpret the UTC marker in DQL queries.
    • Select this option to build DQL queries without the UTC marker. From version D6 onward, dates are stored in UTC by default.

Content Configuration Specifics

  • The OpenText Documentum Connector provides settings that specify which documents to crawl and what information to retrieve about each document.
  • These settings must be specified in an XML file.
  • All of the elements are optional; if a setting is not defined, its default value is used.
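As an illustration, the following minimal sketch shows a content configuration file that combines several of the settings described below. This topic does not name the file's root element, so `<configuration>` is a placeholder assumption; use the root element from your connector's sample configuration. All values are illustrative.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical root element: replace it with the one your connector expects -->
<configuration>
  <!-- Root object type for the crawl; higher-level types are skipped -->
  <rootObject>dm_document</rootObject>
  <!-- Index every version of each document as a separate entry -->
  <indexAllVersions>True</indexAllVersions>
  <!-- Audit events used to detect changed and deleted documents -->
  <saveevent>dm_save</saveevent>
  <deleteevent>dm_destroy</deleteevent>
  <!-- Return the cabinet name and folder path as metadata -->
  <includeCabinetName>True</includeCabinetName>
  <includeFolderPath>True</includeFolderPath>
</configuration>
```

Any setting left out of the file falls back to the default value listed for it below.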
Each setting is listed below with its default value, followed by its description.
<customfilter> (Default: empty, no filter)

If you do not want to crawl all of the documents, you can specify a DQL query fragment to filter items using any type of metadata.

  • For example: <customfilter>r_object_id = '090003e780001679'</customfilter>
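Because the filter is stored in an XML file, characters such as `<` and `&` in a DQL fragment must be escaped, or the fragment must be wrapped in a CDATA section. The two alternative forms below are a sketch that assumes the connector reads the element's text with a standard XML parser, in which case both yield the same DQL text. Use only one of them; the attributes and dates are illustrative.

```xml
<!-- Option 1: escape "<" as &lt; inside the element text -->
<customfilter>r_modify_date &lt; DATE('01/01/2020', 'mm/dd/yyyy')</customfilter>

<!-- Option 2: wrap the DQL fragment in CDATA so it can be written unescaped -->
<customfilter><![CDATA[r_modify_date < DATE('01/01/2020', 'mm/dd/yyyy') AND a_status = 'Effective']]></customfilter>
```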

<indexAllVersions> (Default: only the latest version of the item is indexed)

Set to "True" to index all versions of a document as separate entries.
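For example, a minimal sketch that switches from the default (latest version only) to indexing every version, assuming the value is written as `True` like the other Boolean settings in this topic:

```xml
<!-- Index every version of each document as a separate entry -->
<indexAllVersions>True</indexAllVersions>
```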
<saveevent>

The OpenText Documentum Connector uses the audit table to get events that signal document changes.

Use this setting to specify which event type signals that a document has changed.

  • For example: <saveevent>dm_save</saveevent>
<deleteevent> (Default: dm_destroy)

The OpenText Documentum Connector uses the audit table to get events about deleted documents.

This setting specifies the event type used for this purpose.

  • For example: <deleteevent>dm_delete</deleteevent>
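The two audit-event settings are usually configured together. The sketch below uses the event names that appear in this topic; this topic does not state a default for `<saveevent>`. Events only appear in the Documentum audit trail (dm_audittrail) if they are audited in your repository, so confirm that the events you choose are actually being recorded there.

```xml
<!-- Event that signals a changed document (example value from this topic) -->
<saveevent>dm_save</saveevent>
<!-- Event that signals a deleted document (dm_destroy is the documented default) -->
<deleteevent>dm_destroy</deleteevent>
```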

<rootObject> (Default: dm_document)

Defines the root object type included in a crawl.

Higher level types in the type hierarchy are skipped.

  • For example: <rootObject>dm_email_message</rootObject>

<enableContentless> (Default: True)

If set to "True", documents without any content are crawled.

<contentlessExtension> (Default: unk)

Specifies the file extension that is returned for documents that have no content.

  • For example: <contentlessExtension>unknown</contentlessExtension>

  • Make sure this extension is in the list of file extensions your search engine (SharePoint) is allowed to crawl; otherwise, these items may not be crawled correctly.
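These two settings work together: the first decides whether contentless documents are crawled, and the second decides which extension they are reported with. A minimal sketch using the example value from this topic:

```xml
<!-- Crawl documents that have no content (this is already the default) -->
<enableContentless>True</enableContentless>
<!-- Extension reported for those documents; it must also be allowed in the search engine -->
<contentlessExtension>unknown</contentlessExtension>
```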

<downloadBlockedExtensions> (Default: empty, no extensions)

A comma-separated (,) list of file extensions.

For documents with these extensions, content is not returned during a crawl; only metadata is returned.

  • For example: <downloadBlockedExtensions>zip</downloadBlockedExtensions>
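To block several extensions, list them in one element. The sketch below follows this topic's example in writing extensions without a leading dot, and assumes (not confirmed here) that the list should not contain spaces; the extensions themselves are illustrative.

```xml
<!-- Return only metadata, not content, for archives, disk images, and executables -->
<downloadBlockedExtensions>zip,iso,exe</downloadBlockedExtensions>
```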

<includeFolderPath> (Default: False)

If set to "True", the folder path (r_folder_path) is retrieved if it is part of the item's metadata.
<includeCabinetName> (Default: False)

If set to "True", the cabinet name is returned as metadata.

  • For example: <includeCabinetName>True</includeCabinetName>
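To return a document's location in the repository as metadata, the two location settings can be enabled together. A minimal sketch; note that, as described above, the folder path is returned only if r_folder_path is part of the item's metadata.

```xml
<!-- Return the cabinet name as metadata -->
<includeCabinetName>True</includeCabinetName>
<!-- Return r_folder_path when it is part of the item's metadata -->
<includeFolderPath>True</includeFolderPath>
```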

<disableSavedCheck> (Default: False)

If set to "True", incremental crawls do not check document save events to report modified documents.

  • As a result, permission changes are not detected in incremental crawls.

  • For example: <disableSavedCheck>True</disableSavedCheck>

<disableDeletedCheck> (Default: False)

If set to "True", incremental crawls do not check for deleted documents.

  • As a result, deleted documents are not removed from the search index during incremental crawls.

  • For example: <disableDeletedCheck>True</disableDeletedCheck>

<dontRetrieveDocument> (Default: False)

If set to "True", no content is returned for any of the documents.

Only metadata is returned.

  • For example: <dontRetrieveDocument>True</dontRetrieveDocument>

<dontRetrieveSecurity> (Default: False)

If set to "True", no permissions are returned for any document.
In other words, all documents are public and available to everyone.

  • For example: <dontRetrieveSecurity>True</dontRetrieveSecurity>

<includeParentId> (Default: False)

If set to "True", the parent ID is returned as metadata.

  • For example: <includeParentId>True</includeParentId>

<skipCrawlReadPermissionCheck> (Default: False)

Set to "True" to index documents on which the crawl account has less than READ permission (that is, only Browse permission).
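For example, a minimal sketch that enables indexing of Browse-only documents, assuming the same True/False value style as the other Boolean settings in this topic:

```xml
<!-- Also index documents the crawl account can only Browse, not READ -->
<skipCrawlReadPermissionCheck>True</skipCrawlReadPermissionCheck>
```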
<enableCustomFilterOnGetItem> (Default: True)

If set to "False", the custom filter is not applied on GET item calls. This can improve performance by simplifying the item retrieval query. However, it is not recommended when items can move outside the custom filter, because such items are not reported as deleted and are not removed by incremental crawls.

Set this value to "False" only in special cases.

  • Turn the filter off only if you are certain that neither your custom filter nor your indexed items will change in a way that makes indexed items stop matching the filter.

  • Examples include filtering by folder, or filtering on metadata that can never change (assigned on document creation and set as read-only).
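The sketch below illustrates the folder case. It assumes the connector inserts the `<customfilter>` fragment as a DQL WHERE-clause condition, so the standard DQL FOLDER predicate can be used; the folder path /Engineering/Released is hypothetical.

```xml
<!-- Crawl only documents in a hypothetical folder and its subfolders -->
<customfilter>FOLDER('/Engineering/Released', DESCEND)</customfilter>
<!-- Skip the custom filter on GET item calls; safe only if items never leave this folder -->
<enableCustomFilterOnGetItem>False</enableCustomFilterOnGetItem>
```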

Deprecated Configuration Options

The following settings are deprecated:

<additionalChanges>

'Effective' and dsm_doc_classification='For Internal Use Only' and acl_name like 'd2%' and dsm_doc_collection_key in (select distinct(alias_name) from dm_alias_set where object_name='dsm_es_publishfiuo')

<enableFilterOptimization> (Default: True)

Upgrading the Documentum Connector

If you upgrade the Documentum Connector from version 2 to version 3, you must create a new Connection and Content Source.

Your existing Connection and Content Source are not carried over from version 2.

Web.config Configuration Specifics

The following additional parameters are available for tuning:

  • PagingFix:
    • The number of additional items retrieved during full crawl enumeration when a clean time break is not achieved.
    • Default: 1000
  • RestEndpoint:
    • The REST endpoint of the Documentum REST API.
    • Default: dctm-rest
  • PagingSize:
    • The number of items requested per page when calling the REST API.
    • Default: 1000
  • CacheExpiration:
    • Time, in minutes since last access, after which an item is removed from the cache.
    • Default: 10
  • RequestTimeoutInSeconds:
    • Timeout for REST API calls, in seconds.
    • Default: 120
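This topic does not show where these parameters are defined in Web.config. The following sketch assumes they are standard .NET `<appSettings>` key/value entries; check your connector's Web.config for the actual layout before editing. The values shown are the documented defaults.

```xml
<configuration>
  <appSettings>
    <!-- Assumed key/value layout; verify against your connector's Web.config -->
    <add key="PagingFix" value="1000" />
    <add key="RestEndpoint" value="dctm-rest" />
    <add key="PagingSize" value="1000" />
    <add key="CacheExpiration" value="10" />
    <add key="RequestTimeoutInSeconds" value="120" />
  </appSettings>
</configuration>
```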