How to Extract Metadata from PACER Documents

About the PACER Metadata Extractor Component

The PACER Metadata Extractor pipeline stage extracts legal metadata, such as the case name, case ID, court, and publish date, from PACER documents.

How to Add the PACER Metadata Extractor to AutoClassifier

Use the following steps to add the Tika Extractor component and the PACER Metadata Extractor component to an AutoClassifier pipeline.

  1. Navigate to the AutoClassifier Pipelines component page.
  2. Click New Component and select Tika Extractor from the component list.
  3. Name your new Tika Extractor component and click Add.
  4. After adding your Tika Extractor, click New Component and select PACER Metadata Extractor from the component list.
  5. Name your new PACER Metadata Extractor component and click Add.
  6. Click Apply to save your changes.
  7. Confirm that your new Tika Extractor and PACER Metadata Extractor components appear in the list of existing pipeline stages.

How to Configure the PACER Metadata Extractor Component

Prerequisites: The Tika Extractor component must be configured, and its Extract Body and Extract Metadata fields must be enabled.

To configure your PACER Metadata Extractor component, select it from the existing component list and complete the following fields in the Configuration section:

  1. Enable Court Listener Mappings: Controls whether the output contains the court ID and court name equivalents used by the CourtListener API. If this setting is enabled, PacerCourtId, PacerCourtName, CourtListenerCourtId, and CourtListenerCourtName are returned instead of CourtName.
  2. Enable Extracting Judge Name: Controls whether the output contains the judge name. If this setting is enabled, the Judge property is returned.
  3. Extract Judge Names Regex Pattern: The regular expression that matches the words preceding the judge name. For example: "hon\.|honorable|district judge"
  4. Click Apply then Cancel.
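
The role of the Extract Judge Names Regex Pattern in step 3 can be illustrated with a short Python sketch. This is not the component's implementation: the helper name find_judge, the case-insensitive matching of the prefix, and the rule that the name is the run of capitalized words after the prefix are all assumptions made for illustration. Only the prefix pattern itself comes from the example above.

```python
import re

# Prefix pattern copied from the example in step 3.
PREFIX_PATTERN = r"hon\.|honorable|district judge"

# Assumption: the prefix is matched case-insensitively, and the judge name is
# the run of capitalized words that follows it. The component's actual
# matching rules may differ.
JUDGE_RE = re.compile(
    rf"(?i:{PREFIX_PATTERN})\s+([A-Z][A-Za-z.'-]*(?:\s+[A-Z][A-Za-z.'-]*)*)"
)


def find_judge(text):
    """Return the first name that follows a judge prefix, or None (hypothetical helper)."""
    match = JUDGE_RE.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(find_judge("Before the Honorable Jane Q. Smith, presiding"))
    print(find_judge("No judge here"))
```

Under these assumptions, the first call prints Jane Q. Smith and the second prints None. A pattern that lists more prefixes widens the set of documents for which a Judge value can be found.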

Output Properties

    Property                  Type
    Type                      Text
    PublishDate               Text
    DocumentDisplayNumber     Text
    CourtName                 Text
    Cost                      Text
    CaseName                  Text
    CaseId                    Text
    PacerCourtId*             Text
    PacerCourtName*           Text
    CourtListenerCourtId*     Text
    CourtListenerCourtName*   Text
    Judge**                   Text

    * The PacerCourtId, PacerCourtName, CourtListenerCourtId, and CourtListenerCourtName metadata properties are returned only if Enable Court Listener Mappings is set to True. In that case, CourtName is not returned.
    ** The Judge metadata property is returned only if Enable Extracting Judge Name is set to True.
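
The two footnotes describe how the configuration settings change the set of returned properties. The Python sketch below restates that logic for readers who validate the output downstream. The function name expected_properties and its parameter names are hypothetical and not part of the product, and the ordering of the returned list is arbitrary. The property names come from the list of output properties.

```python
from typing import List

# Properties listed without a footnote marker, other than CourtName.
BASE_PROPERTIES = [
    "Type",
    "PublishDate",
    "DocumentDisplayNumber",
    "Cost",
    "CaseName",
    "CaseId",
]

# Properties marked with * in the list of output properties.
COURT_LISTENER_PROPERTIES = [
    "PacerCourtId",
    "PacerCourtName",
    "CourtListenerCourtId",
    "CourtListenerCourtName",
]


def expected_properties(court_listener_mappings: bool, extract_judge_name: bool) -> List[str]:
    """Return the property names expected for a given configuration (hypothetical helper)."""
    properties = list(BASE_PROPERTIES)
    if court_listener_mappings:
        # The four mapped court properties replace CourtName.
        properties.extend(COURT_LISTENER_PROPERTIES)
    else:
        properties.append("CourtName")
    if extract_judge_name:
        properties.append("Judge")
    return properties


if __name__ == "__main__":
    print(expected_properties(court_listener_mappings=True, extract_judge_name=False))
```

A check like this can help confirm that a consumer of the extracted metadata does not expect CourtName while Enable Court Listener Mappings is turned on.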