How to Set Up and Configure Documentum v2

Connection Configuration Specifics

  • Documentum Account:

    • Specify a login user name and password that is a valid Documentum account.

    • This account must have read permissions to all documents in order to crawl.

  • Excluded Metadata fields:

    • Comma (,) separated list of metadata which is NOT retrieved for indexed documents.

  • Enter TimeZone Offset:

    • If your Documentum deployment stores dates with the local time zone instead of UTC, please specify the time zone of your Documentum deployment.

      • This typically applies only to Documentum deployments earlier than v6, or to deployments upgraded from such a version to v6 or higher.

  • Server Address and Port:

    • Enter the EMC Documentum server address and docbroker port using the following format:

      • myserver:port (Example: 172.16.10.00:1490)

      • Typical port values include:

        • 1489 – Native

        • 1490 – Secure

  • Exclude UTC date field marker:

    • Older versions of Documentum fail to interpret the UTC marker in DQL queries.

    • To build DQL queries without the UTC marker, select this option. From version D6 onward, dates are stored in UTC by default.

Content Configuration Specifics

  • The EMC Documentum Connector provides settings that you can use to specify which documents to crawl and what information to retrieve about each of these documents.

  • These settings must be specified in an XML file.

  • All of the elements are optional.

  • For this reason, if a setting is not defined, the default value is used.
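As a minimal sketch of what such a file might contain, the fragment below sets only two of the settings documented in the table that follows. The enclosing root element is deliberately omitted because this page does not name it. Every element left out keeps its default value.

```xml
<!-- Fragment only: wrap in the root element your connector expects (not named on this page) -->

<!-- Default is False; return r_folder_path with each item's metadata -->
<includeFolderPath>True</includeFolderPath>

<!-- Default is False; return the cabinet name as metadata -->
<includeCabinetName>True</includeCabinetName>

<!-- All other settings are omitted, so they use their defaults
     (for example, rootObject stays dm_document) -->
```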

Setting Default Value Description
<customfilter> Empty (no filter)

If you do not want to crawl all of the documents, you can specify a DQL query fragment to filter items using any type of metadata.

  • For example: <customfilter>r_object_id = '090003e780001679'</customfilter>

<indexAllVersions> (Only the latest version of the item is indexed) Set to "True" if you want to index all versions of a specific document as separate entries.
<saveevent>  

The EMC Documentum Connector uses the audit table to get events that signal document changes.

  • Use this setting to specify the type of event to use when a document changes.

  • For example: <saveevent>dm_save</saveevent>

<deleteevent> dm_destroy

The EMC Documentum Connector uses the audit table to get events about deleted documents.

  • This setting contains the type of event to use for this purpose.

  • For example: <deleteevent>dm_delete</deleteevent>

<rootObject> dm_document

Defines the root object type included in a crawl. Higher level types in the type hierarchy are skipped.

  • For example: <rootObject>dm_email_message</rootObject>

<enableContentless> True If this setting is set to "True", documents without any content are crawled.
<contentlessExtension> unk

This setting contains the extension that is returned for documents that have no content.

  • For example: <contentlessExtension>unknown</contentlessExtension>

  • Make sure the extension is added to the list of authorized file extensions in your search engine (SharePoint), or the items may not be crawled correctly.

<downloadBlockedExtension> Empty (No extensions)

You can define a comma-separated (,) list of extensions.

  • The content of documents that have one of these extensions is not returned during a crawl.

  • Only metadata is returned.

  • For example: <downloadBlockedExtensions>zip</downloadBlockedExtensions>

<includeFolderPath> False If set to "True", the folder path (r_folder_path) is retrieved if it is part of the item's metadata.
<includeCabinetName> False
  • If set to "True", then cabinet name is returned as metadata.

  • For example: <includeCabinetName>True</includeCabinetName>

<disableSavedCheck> False

If this option is set to "True", document saved events are not checked on incremental crawls to report modified documents.

  • For this reason, permission changes are not detected in incremental crawls:

    <disableSavedCheck>True</disableSavedCheck>

<disableDeletedCheck> False

If this option is set to "True", document deleted events are not checked on incremental crawls, so deleted documents are not reported.

<dontRetrieveDocument> False

If set to "True", no content is returned for any of the documents. Only metadata is returned.

For example: <dontRetrieveDocument>True</dontRetrieveDocument>

<dontRetrieveSecurity> False

If set to "True", no permissions are returned for any documents.

  • In other words, all of the documents are public and available to everyone.

  • For example: <dontRetrieveSecurity>True</dontRetrieveSecurity>

<includeParentId> False

If set to "True", the parent ID is returned as metadata.

  • For example: <includeParentId>True</includeParentId>

<skipCrawlReadPermissionCheck> False Set to "True" if you want to index documents that the crawl account does not have at least READ access permissions to (that is, the account has only Browse permissions).
<enableCustomFilterOnGetItem> True

If set to "False", the custom filter is not applied on the GET item calls:

  • This might improve performance by simplifying the item retrieval query but is not recommended when items can move outside of the custom filter as they will not be reported as deleted and will not be removed by incremental crawls.

  • This value is only recommended in special cases.

  • This can be turned off if you know for sure that neither your custom filter nor your indexed items will change in a way that makes items stop matching the filter.

  • For example, this can apply when filtering by folder, or when filtering on metadata that can never be changed (assigned on document creation and set as read-only).
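To illustrate the special case described above, here is a hedged sketch that filters on an attribute assigned at creation and then disables the filter on GET item calls. The account name `migration_svc` is hypothetical, and the assumption that `r_creator_name` never changes in your repository is one you should verify before relying on it.

```xml
<!-- Hypothetical filter: only documents created by the (made-up) account migration_svc -->
<customfilter>r_creator_name = 'migration_svc'</customfilter>

<!-- Skip the custom filter on GET item calls; only safe when items
     can never stop matching the filter -->
<enableCustomFilterOnGetItem>False</enableCustomFilterOnGetItem>
```

If a filter fragment contains `<` or `&`, write them as `&lt;` and `&amp;` so the file remains well-formed XML.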

Deprecated Configuration Options

Setting Default Value Description
<additionalChanges>   'Effective' and dsm_doc_classification='For Internal Use Only' and acl_name like 'd2%' and dsm_doc_collection_key in (select distinct(alias_name) from dm_alias_set where object_name='dsm_es_publishfiuo')
<enableFilterOptimization> True  

Upgrading Documentum

If you upgrade your Documentum connector from version 2 to version 3, you must create a new Connection and Content Source.

Your Connection and Content Source do not carry over from the old version.