Set Up and Configure the Amazon S3 Connector

All BA Insight connectors can be downloaded from the Upland Right Answers Portal under Knowledge > BA Insight > Product Downloads > Connectors. This connector is installed with the same generic steps as any BA Insight connector. You must satisfy the Prerequisites for your connector before installing. The configuration specifics are detailed below.

Connection Configuration Specifics

The Amazon S3 Connector has the option to choose between web services, depending on the content you are crawling.

A Connection defines how Connectivity Hub connects to your Source System (which contains your documents, graphics, etc.). Your Connection includes identifying elements such as the URL of the BA Insight web service connector you are using (File Share connector, SharePoint Online connector, etc.), the authentication mode, user accounts and credentials, and database information (for database connectors). After installation, when you configure the connection, you see a drop-down menu with the following options:

  • Basic indexing
  • Related documents indexing

Basic Indexing

Amazon S3 login details:

  • Login:
    • If using the Amazon S3 authentication:
      • Provide the Amazon S3 Access Key ID 
    • If using an external provider:
      • Check with the provider for the appropriate login/user/account data to provide.
      • Example: var username = amazonConnectionConfigurationInfo.AccessTokenInfo.AccessKeyId;
  • Password:
    • If using the Amazon S3 authentication:
      • Provide the Amazon S3 Secret Access Key
    • If using an external provider:
      • Check with the provider for the appropriate password/secret data to provide
      • Example: var pass = amazonConnectionConfigurationInfo.AccessTokenInfo.SecretAccessKey;

Parameters

Basic indexing allows the following parameters in the Amazon S3 Connection Parameters field:

  • <Bucket>
    • Required
    • This is the name of the Amazon AWS bucket to be crawled
  • <UseExternalCredentialsProvider> (optional): Set this parameter to true if you are using an external service to authenticate. If it is set to true, you must also provide the URL of the external service for authentication. By default, this parameter is set to false. See the following config.xml code for an example:
      <configuration>
          <Bucket>bucketName</Bucket>
          <UseTemporaryCredentials>false</UseTemporaryCredentials>
          <UseExternalCredentialsProvider>true</UseExternalCredentialsProvider>
          <ExternalCredentialsProviderUrl>http://www.sampleURL.com</ExternalCredentialsProviderUrl>
          <CredentialsRoleARN></CredentialsRoleARN>
          <DefaultProxy></DefaultProxy>
      </configuration>
    • This service should return a set of temporary AWS credentials (key, secret key and session token) - optional
  • <UseTemporaryCredentials> (optional): This parameter must be set to true if you are authenticating in the Amazon S3 source system with temporary credentials. Your Source System is the repository where the data to be indexed is stored. This repository is managed by applications such as SharePoint O365, SharePoint 2013/16/19, Documentum, File Share, OpenText, Lotus Notes, etc. It can also be a database such as SQL or Oracle.
  • <ExternalCredentialsProviderUrl> (optional): This parameter specifies the URL of the external service for authentication. Provide it when <UseExternalCredentialsProvider> is set to true.

  • <CredentialRoleARN> (optional): If <UseTemporaryCredentials> is true, you may also provide an Amazon role to be used for authentication. The permissions set for this role then apply to the connector's subsequent access.
  • <AWS Region>: This is the region of the bucket to be crawled. The default is USEast1.

Example With External Authentication Service

<configuration>
    <Bucket>Mytestbucket1</Bucket>
    <UseExternalCredentialsProvider>true</UseExternalCredentialsProvider>
    <ExternalCredentialsProviderUrl>http://MyExternalAuthProviderURL</ExternalCredentialsProviderUrl>
</configuration>
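
The dependency between the parameters above can be checked before you paste a snippet into the Connection Parameters field. The following Python sketch is only an illustration and is not part of the connector: the function name check_connection_parameters is hypothetical, and the two rules it enforces (a required <Bucket>, and a URL whenever <UseExternalCredentialsProvider> is true) are taken from the parameter descriptions on this page.

```python
import xml.etree.ElementTree as ET


def check_connection_parameters(xml_text):
    """Return a list of problems found in a connection-parameters snippet.

    Hypothetical helper: it only encodes the rules described on this page.
    """
    root = ET.fromstring(xml_text)
    problems = []

    def text_of(tag):
        # A missing element and an empty element are both treated as "".
        return (root.findtext(tag) or "").strip()

    if not text_of("Bucket"):
        problems.append("<Bucket> is required")

    uses_external = text_of("UseExternalCredentialsProvider").lower() == "true"
    if uses_external and not text_of("ExternalCredentialsProviderUrl"):
        problems.append(
            "<ExternalCredentialsProviderUrl> is required when "
            "<UseExternalCredentialsProvider> is true"
        )
    return problems


sample = """
<configuration>
    <Bucket>Mytestbucket1</Bucket>
    <UseExternalCredentialsProvider>true</UseExternalCredentialsProvider>
    <ExternalCredentialsProviderUrl>http://MyExternalAuthProviderURL</ExternalCredentialsProviderUrl>
</configuration>
"""

print(check_connection_parameters(sample))  # an empty list means no problems found
```

An empty list means the snippet satisfies both rules. The check says nothing about whether the bucket exists or whether the external service is reachable.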

Related Documents Indexing

This sub-connector handles the scenario where multiple files stored in the same bucket should be indexed as a single item. For instance, a contract and its addenda should be indexed together.

The above configuration details for Basic Indexing also apply to Related Documents Indexing. 

  1. You must provide additional information in the Related Documents Pattern field.

  2. When using this mode, you can specify any combination of the following types of files:

    In the Amazon S3 connector 1.2.0.0 release, support was added for the BinaryContentMetadata multi-value metadata field, which contains user-defined metadata from the binary content file in JSON format. If your metadata file has multiple related binary content files, the metadata value has a JSON record for each file. For example, BinaryContentMetadata: [{ "x-amz-meta-testkey": "test", "x-amz-meta-customMeta": "test value" }]. Your metadata must be regenerated for the BinaryContentMetadata to be visible.

For documents to be recognized as related to each other, you must provide a pattern, and the names of the documents in the AWS bucket must follow this pattern.

Example

Example: related documents
<relateddocumentspattern>
    <binarycontent>(?<filename>.*).(?<extension>pdf|docx|pptx)</binarycontent>
    <documenttextproperty>(?<filename>.*_URLs).(?<extension>txt)</documenttextproperty>
    <documentmetadata>(?<filename>.*).(?<extension>xml)</documentmetadata>
</relateddocumentspattern>

Do not use <![CDATA[ ]]> sections in your XML configuration script, because these sections are automatically added in the <binarycontent>, <documenttextproperty>, and <documentmetadata> tags.
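
To see how a naming pattern can tie files together, the following Python sketch groups object names by the filename capture group. It is an illustration under stated assumptions, not the connector's implementation: the function group_related is hypothetical, the whole object name is assumed to have to match the pattern, and files that share the same filename capture are assumed to form one item. Python writes named groups as (?P<name>...), whereas the connector's pattern uses (?<name>...).

```python
import re

# Patterns modelled on the <binarycontent> and <documentmetadata> entries above,
# rewritten with Python's (?P<name>...) named-group syntax.
PATTERNS = {
    "binarycontent": re.compile(r"(?P<filename>.*).(?P<extension>pdf|docx|pptx)"),
    "documentmetadata": re.compile(r"(?P<filename>.*).(?P<extension>xml)"),
}


def group_related(names):
    """Group object names by their 'filename' capture (assumed grouping rule)."""
    groups = {}
    for name in names:
        for role, pattern in PATTERNS.items():
            match = pattern.fullmatch(name)
            if match:
                item = groups.setdefault(match.group("filename"), {})
                item.setdefault(role, []).append(name)
                break  # each name is assigned to the first role it matches
    return groups


names = ["contract.pdf", "contract.xml", "invoice.docx", "notes.txt"]
for filename, files in group_related(names).items():
    print(filename, files)
```

In this sketch, contract.pdf and contract.xml end up in one group, invoice.docx forms a group of its own, and notes.txt is ignored because it matches neither pattern.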

Content Configuration Specifics

Amazon S3 has a flat structure with no hierarchy like you would see in a typical file system.

For the sake of organizational simplicity, the Amazon S3 console supports the folder concept as a means of grouping objects.

  • Amazon S3 does this by using key name prefixes for objects.

  • The Amazon S3 Connector supports this structure through the use of filters on content.
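
The folder concept described above can be pictured with a short Python sketch. It works on a plain list of object keys and makes no calls to Amazon S3; the function list_folder and the sample keys are hypothetical. It mimics the general idea of listing by prefix and delimiter: keys directly under a prefix are treated as objects, and anything deeper is collapsed into a folder-like prefix.

```python
def list_folder(keys, prefix="", delimiter="/"):
    """Split a flat key list into objects directly under `prefix` and sub-"folders"."""
    objects, folders = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue  # key lives outside the requested "folder"
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # A delimiter remains after the prefix, so the key sits in a deeper "folder".
            folders.add(prefix + head + delimiter)
        else:
            objects.append(key)
    return objects, sorted(folders)


keys = [
    "contracts/2023/a.pdf",
    "contracts/2023/a.xml",
    "contracts/readme.txt",
    "logo.png",
]

print(list_folder(keys, "contracts/"))
# (['contracts/readme.txt'], ['contracts/2023/'])
```

The keys themselves stay flat. The folders exist only as shared prefixes, which is why a content filter on a key prefix is enough to restrict a crawl to one of them.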