Contextual metadata merger component
The Contextual metadata merger component allows you to filter your metadata and concatenate it into a single property that contains only the relevant information. The component can also prefix the concatenated metadata to another metadata property, such as the Chunk body property from the document chunker, which can make document retrieval more accurate.
Configure the component
To configure your component, do the following:
- In the AutoClassifier administration portal, Add a new component to a new or existing pipeline.
- When adding your component, select Contextual metadata merger from the New Component list and provide a name for your component.
- In the Configuration section, specify the following fields:
  - In the Whitelist field, enter a comma-separated list of metadata properties that you want to refine.
  - In the Blacklist field, enter a comma-separated list of metadata properties that you want to exclude from being refined.
  - Enable Skip GUID if you do not want to include GUID values in your concatenated metadata for the properties included in the whitelist.
  - Enable Skip Numeric Values if you do not want to include numeric values in your concatenated metadata for the properties included in the whitelist.
  - Enable Skip Date Values if you do not want to include date values in your concatenated metadata for the properties included in the whitelist.
  - Enable Skip URLs if you do not want to include URLs in your concatenated metadata for the properties included in the whitelist.
  - Enable Skip value by Regex pattern(s) if you do not want to include values that match a specified regex pattern in your concatenated metadata for the properties included in the whitelist.
  - If you enabled Skip value by Regex pattern(s), in the Regex field, provide the regex pattern for the values that you do not want to include in your concatenated metadata.
  - In the Metadata To Be Prefixed field, you can provide a metadata property that you would like to be prefixed by the concatenated metadata. For example, if you are using the document chunker component, you would have configured a metadata property name for the extracted chunk body, such as ChunkBody. If you enter ChunkBody in this field, the ChunkBody metadata is prefixed with your concatenated metadata.
  - Enable Output Metadata in Json Format if you want your metadata output to appear in JSON format.
- Click Save.
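To make the effect of the skip options more concrete, the following Python sketch shows one way such filtering could work. It is an illustration only, not the component's actual implementation: the function names (`keep_value`, `filter_properties`), the patterns used to detect GUIDs, numbers, dates, and URLs, and the assumption that an empty whitelist means "consider every property" are all hypothetical.

```python
import re
from datetime import datetime

# Hypothetical detection rules; the real component may use different ones.
GUID_RE = re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")
NUMERIC_RE = re.compile(r"^[+-]?\d+(\.\d+)?$")
URL_RE = re.compile(r"^(https?|ftp)://\S+$", re.IGNORECASE)
DATE_FORMATS = ("%m/%d/%Y %I:%M:%S %p", "%m/%d/%Y", "%Y-%m-%d")


def is_date(text):
    """Return True if the text parses with one of the assumed date formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return True
        except ValueError:
            continue
    return False


def keep_value(value, skip_guid=True, skip_numeric=True, skip_dates=True,
               skip_urls=True, skip_patterns=()):
    """Decide whether a single metadata value survives the skip options."""
    text = str(value).strip()
    if skip_guid and GUID_RE.match(text):
        return False
    if skip_numeric and NUMERIC_RE.match(text):
        return False
    if skip_dates and is_date(text):
        return False
    if skip_urls and URL_RE.match(text):
        return False
    # Mirrors "Skip value by Regex pattern(s)": drop values matching any pattern.
    if any(re.search(pattern, text) for pattern in skip_patterns):
        return False
    return True


def filter_properties(metadata, whitelist=(), blacklist=(), **skip_options):
    """Keep whitelisted, non-blacklisted properties that still have values.

    Assumption: an empty whitelist means every property is considered.
    """
    names = whitelist or metadata.keys()
    result = {}
    for name in names:
        if name in blacklist or name not in metadata:
            continue
        kept = [v for v in metadata[name] if keep_value(v, **skip_options)]
        if kept:
            result[name] = kept
    return result


metadata = {
    "PrinterModels": ["Cannon", "HP", "EPSON"],
    "GUID": ["2a62f10c-0820-4b3e-8b60-91bf0e428b13"],
    "ETID": [1, 2, 3, 4, 5],
    "Date": ["1/13/2025 3:40:12 PM"],
    "Summary": ["This is a test summary"],
}
filtered = filter_properties(metadata)
```

With every skip option enabled, only the `PrinterModels` and `Summary` properties remain in `filtered`, because all values of the other properties are dropped.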
Output details
Based on the metadata that was selected to be concatenated, the contextual metadata merger component will extract the applicable metadata from the document and apply it to a new concatenated CompleteContextualMetadata property. For example:
PrinterModels: {'Cannon','HP','EPSON'}
GUID: {2a62f10c-0820-4b3e-8b60-91bf0e428b13,95830936-bc93-4b9e-b4c6-5fbe098c4c26}
Body: {'This is a test body'}
ETID: {1,2,3,4,5}
Date: {1/13/2025 3:40:12 PM}
Summary: {'This is a test summary'}
CompleteContextualMetadata: 'PrinterModels=Cannon,HP,EPSON;Body=this is a test body;Summary=this is a test summary;'
With Output Metadata in Json Format enabled:
PrinterModels: {'Cannon','HP','EPSON'}
GUID: {2a62f10c-0820-4b3e-8b60-91bf0e428b13,95830936-bc93-4b9e-b4c6-5fbe098c4c26}
Body: {'This is a test body'}
ETID: {1,2,3,4,5}
Date: {1/13/2025 3:40:12 PM}
Summary: {'This is a test summary'}
CompleteContextualMetadata: {'{"PrinterModels":"Cannon,HP,EPSON","Body":"this is a test body","Summary":"this is a test summary"}'}
With ChunkBody entered in the Metadata To Be Prefixed field:
ChunkBodyPropertyName: 'ChunkBody'
ChunkPageNumber: -1
PrinterModels: {'Cannon','HP','EPSON'}
GUID: {2a62f10c-0820-4b3e-8b60-91bf0e428b13,95830936-bc93-4b9e-b4c6-5fbe098c4c26}
Body: {'This is a test body'}
ETID: {1,2,3,4,5}
Date: {1/13/2025 3:40:12 PM}
Summary: {'This is a test summary'}
CompleteContextualMetadata: {'{"PrinterModels":"Cannon,HP,EPSON","Body":"this is a test body","Summary":"this is a test summary"}'}
ChunkBody:'PrinterModels=Cannon,HP,EPSON;Body=this is a test body;Summary=this is a test summary;Lorem ipsum dolor sit amet, consectetur adipiscing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex
ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
In the examples above, the contextual metadata merger component removed the GUID, ETID, and Date values and returned the remaining metadata values, concatenated, in the CompleteContextualMetadata property. The second example shows the same output in JSON format. In the last example, the concatenated metadata was prefixed to the ChunkBody property.
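The concatenation and prefixing steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the component's implementation: `merge_metadata` and `prefix_property` are hypothetical names, the `name=value1,value2;` layout is inferred from the examples above, and the sketch keeps the original casing of each value rather than reproducing the lowercased text that appears in the example output.

```python
import json


def merge_metadata(filtered, as_json=False):
    """Concatenate already-filtered metadata into a single string.

    `filtered` maps a property name to its list of remaining values.
    """
    joined = {name: ",".join(str(v) for v in values)
              for name, values in filtered.items()}
    if as_json:
        # Compact separators give the same layout as the JSON example above.
        return json.dumps(joined, separators=(",", ":"))
    return "".join(f"{name}={value};" for name, value in joined.items())


def prefix_property(metadata, target, prefix):
    """Return a copy of the metadata with `prefix` prepended to each value of `target`."""
    updated = dict(metadata)
    if target in updated:
        updated[target] = [prefix + str(value) for value in updated[target]]
    return updated


filtered = {
    "PrinterModels": ["Cannon", "HP", "EPSON"],
    "Body": ["This is a test body"],
    "Summary": ["This is a test summary"],
}
merged = merge_metadata(filtered)
merged_json = merge_metadata(filtered, as_json=True)

document = {"ChunkBody": ["Lorem ipsum dolor sit amet"]}
prefixed = prefix_property(document, "ChunkBody", merged)
```

Here `merged` holds the delimited form, `merged_json` holds the JSON form, and `prefixed["ChunkBody"]` starts with the delimited metadata followed by the original chunk text.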