Applying Knowledge Graph Entity Extraction to your environment
SmartHub supports Knowledge Graph–driven enrichment and retrieval using AutoClassifier components powered by OpenAI or Azure OpenAI. These components extract structured entities and relationships from content and user queries, enabling SmartHub to combine vector search and graph-based reasoning during conversational search. This allows SmartHub to move beyond keyword and similarity search by leveraging relationships between entities when answering questions.
Configuring Knowledge Graph entity extraction enables SmartHub to:
- Build structured knowledge from unstructured content.
- Link documents through shared entities.
- Enrich conversational search with relationship awareness.
- Support advanced agentic RAG workflows.
What is a Knowledge Graph?
A Knowledge Graph represents information as:
- Entities: for example, people, products, systems, topics, and policies.
- Relationships: for example, owns, depends on, references, and applies to.
Rather than treating documents as isolated text, a Knowledge Graph connects related concepts across content and systems, creating a structured layer of knowledge that AI can reason over.
During conversational question answering, SmartHub applies the same ontology used for document enrichment to the user’s query. This means that SmartHub analyzes the user’s question using the same defined entity types and relationships used to extract entities from documents, ensuring that queries and content are interpreted consistently within the Knowledge Graph.
At runtime:
- Entities are extracted from the user’s question.
- These entities are matched against the Knowledge Graph.
- Related entities and relationships are retrieved from the graph database (for example, Cosmos DB).
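The three runtime steps above can be sketched with a small in-memory example. This is an illustration only, not SmartHub's implementation: the triples, the function names, and the naive string match that stands in for LLM-based entity extraction are all assumptions made for this sketch.

```python
from collections import defaultdict

# Hypothetical toy graph stored as (source, relationship, target) triples.
TRIPLES = [
    ("Billing Service", "DEPENDS_ON", "Payments API"),
    ("Payments API", "OWNED_BY", "Platform Team"),
    ("Reporting Portal", "DEPENDS_ON", "Billing Service"),
]


def build_index(triples):
    """Map each entity name to every edge it participates in."""
    index = defaultdict(list)
    for source, relation, target in triples:
        index[source].append((source, relation, target))
        index[target].append((source, relation, target))
    return index


def extract_entities(question, known_entities):
    """Stand-in for LLM extraction: case-insensitive match on entity names."""
    lowered = question.lower()
    return [name for name in known_entities if name.lower() in lowered]


def retrieve_related(question, index):
    """Extract entities, match them against the graph, and collect their edges."""
    matched = extract_entities(question, list(index.keys()))
    edges = []
    for name in matched:
        for edge in index[name]:
            if edge not in edges:
                edges.append(edge)
    return matched, edges


index = build_index(TRIPLES)
matched, edges = retrieve_related("What depends on the Payments API?", index)
for source, relation, target in edges:
    print(source, relation, target)
```

In this sketch the question matches the single entity "Payments API", and the edges returned describe both what depends on it and who owns it.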
Using knowledge graph with vector search for hybrid retrieval
When both Knowledge Graph and vector search are enabled in your environment, SmartHub runs the two retrievals in parallel each time a user asks a question:
- Vector search retrieves semantically relevant documents
- Knowledge Graph queries retrieve connected entities and their relationships
The final response is generated by AI models using both sources of information. Combining vector search with Knowledge Graph reasoning allows SmartHub to:
- Understand user intent beyond keywords
- Surface related concepts that may not appear verbatim in documents
- Answer relationship-based questions (“What depends on X?”)
- Handle follow-up questions more accurately
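The parallel retrieval pattern described in this section can be sketched as follows. Both retriever functions are hypothetical stand-ins that return fixed data, and the prompt layout is an assumption; the sketch only shows how two independent lookups can run concurrently and then be combined into one context for the answering model.

```python
from concurrent.futures import ThreadPoolExecutor


def vector_search(question):
    """Hypothetical stand-in for a vector index lookup."""
    return ["doc-17: Payments API overview", "doc-42: Billing runbook"]


def graph_search(question):
    """Hypothetical stand-in for a Knowledge Graph query."""
    return [("Billing Service", "DEPENDS_ON", "Payments API")]


def hybrid_retrieve(question):
    """Run both retrievers concurrently and return their combined results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        docs_future = pool.submit(vector_search, question)
        graph_future = pool.submit(graph_search, question)
        return {
            "documents": docs_future.result(),
            "graph_facts": graph_future.result(),
        }


def build_prompt(question, context):
    """Assemble both evidence sources into one prompt for the answering model."""
    doc_lines = "\n".join(f"- {doc}" for doc in context["documents"])
    fact_lines = "\n".join(
        f"- {source} {relation} {target}"
        for source, relation, target in context["graph_facts"]
    )
    return (
        f"Question: {question}\n\n"
        f"Documents:\n{doc_lines}\n\n"
        f"Graph facts:\n{fact_lines}"
    )


question = "What depends on the Payments API?"
context = hybrid_retrieve(question)
prompt = build_prompt(question, context)
print(prompt)
```

Keeping the two result sets separate until the prompt is built makes it easy to see which source contributed each piece of evidence.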
Security for Knowledge Graph
SmartHub’s Knowledge Graph capability uses OpenAI or Azure OpenAI for entity and relationship extraction, and stores graph data in Azure Cosmos DB using the Gremlin API. Within this Graph RAG architecture, user questions are translated into dynamically generated Gremlin queries, executed against the graph database, and then summarized by an LLM to generate the final response.
Because graph traversals can dynamically explore connected nodes and relationships, access control at the entity or relationship level cannot be reliably enforced once data has been ingested into the Knowledge Graph. Unlike traditional relational systems, there is no deterministic mechanism at query time to evaluate who owns a node, who inserted it, or whether the requesting user is authorized to access it. Once entities and relationships become part of the traversal space, they may be retrieved directly or indirectly through connected paths.
Traditional security approaches such as row-level security, column masking, role-based access control, or attribute-based access control are not designed for LLM-generated graph traversals. In a Knowledge Graph, relationships themselves can expose sensitive information, and dynamically generated queries may surface restricted data even if it was not explicitly requested.
Consider the following risk scenarios:
Salary information
If the graph contains the path Employee > HAS_SALARY > SalaryNode, a user query such as "show all employees and their details" may indirectly retrieve salary-related nodes.
Personal information
If the graph contains the path Employee > HAS_EMAIL > EmailNode, a user query such as "Give me the contact information of all employees" may expose internal or confidential email addresses.
For this reason, security must be enforced at the ontology design and ingestion stage, not at runtime. Post-query filtering is also unreliable, as removing nodes after traversal can result in incomplete, misleading, or logically inconsistent answers.
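The salary scenario can be reproduced with a toy traversal. The node names and adjacency list below are hypothetical, and the breadth-first walk is a simplified stand-in for a dynamically generated graph query; it is not how SmartHub or Gremlin executes traversals. It shows that a broad "details" request reaches every neighbor of an employee node, including a salary node nobody asked for by name.

```python
from collections import deque

# Hypothetical adjacency list: node -> list of (relationship, neighbor).
EDGES = {
    "Employee:Ana": [
        ("WORKS_ON", "Project:Atlas"),
        ("HAS_SALARY", "SalaryNode:Ana"),
    ],
    "Project:Atlas": [("USES", "System:Billing")],
}


def traverse(start, max_depth):
    """Breadth-first walk returning every node within max_depth hops of start."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for _relation, neighbor in EDGES.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen


# A one-hop "show this employee's details" request already includes the salary node.
details = traverse("Employee:Ana", max_depth=1)
print(sorted(details))
```

Nothing in the traversal knows which relationships are sensitive, which is why the exposure has to be prevented before the data enters the graph.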
Secure ontology design principles
Security for the Knowledge Graph must be enforced at the modeling and ingestion stage, not at query time.
Do not ingest fields such as:
- Salary or compensation details
- Personal email addresses or phone numbers
- Personal identifiers
- Financial data
- HR-confidential attributes
- Any information requiring restricted or role-based visibility
If a field would normally require RBAC enforcement in a traditional system, it should not be stored in the Knowledge Graph.
Sensitive information should remain in secure source systems (for example, HR databases or protected APIs) where traditional access control mechanisms can be enforced reliably. The Knowledge Graph should reference contextual relationships, not store confidential records.
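One way to apply this principle is to drop restricted relationships before anything is written to the graph. The sketch below assumes extracted data arrives as (source, relationship, target) triples and uses a hypothetical denylist of relationship names; neither reflects a SmartHub configuration setting.

```python
# Hypothetical denylist of relationship types that must never reach the graph.
RESTRICTED_RELATIONS = {"HAS_SALARY", "HAS_EMAIL", "HAS_PHONE", "HAS_NATIONAL_ID"}


def filter_triples(triples, restricted=RESTRICTED_RELATIONS):
    """Split extracted triples into those safe to ingest and those to reject."""
    allowed, rejected = [], []
    for triple in triples:
        _source, relation, _target = triple
        if relation in restricted:
            rejected.append(triple)
        else:
            allowed.append(triple)
    return allowed, rejected


extracted = [
    ("Employee:Ana", "WORKS_ON", "Project:Atlas"),
    ("Employee:Ana", "HAS_SALARY", "SalaryNode:Ana"),
    ("Employee:Ana", "HAS_SKILL", "Skill:Python"),
]

allowed, rejected = filter_triples(extracted)
print("ingest:", allowed)
print("rejected:", rejected)
```

Returning the rejected triples alongside the allowed ones makes it possible to log what was excluded and confirm the denylist behaves as intended.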
Appropriate use cases for the Knowledge Graph include:
- Organizational structure (non-sensitive elements)
- Skill and expertise relationships
- Project associations
- Public metadata
- Conceptual or policy relationships
The Knowledge Graph is intended to enhance reasoning and retrieval by connecting concepts. It should not serve as a primary system of record for private or regulated data.
Implementation guidance
Before enabling Knowledge Graph ingestion:
- Review your ontology for sensitive attributes.
- Validate which entity types and relationships are being extracted.
- Confirm that restricted fields are excluded from ingestion.
- Treat the Knowledge Graph as a derived intelligence layer designed for contextual reasoning.
By designing the ontology with security in mind from the outset, you reduce the risk of unintended data exposure during conversational search and graph-based reasoning.
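The ontology review step can be partly automated with a simple keyword scan. The dictionary layout and keyword list below are assumptions for illustration and do not represent SmartHub's ontology format; a scan like this supplements a manual review and does not replace it.

```python
# Hypothetical keywords that suggest an attribute or relationship is sensitive.
SENSITIVE_KEYWORDS = ("salary", "compensation", "email", "phone", "ssn", "bank")

# Hypothetical ontology layout: entity types with attributes, plus relationships.
ONTOLOGY = {
    "entity_types": {
        "Employee": ["name", "department", "salary"],
        "Project": ["title", "status"],
    },
    "relationships": ["WORKS_ON", "HAS_EMAIL", "DEPENDS_ON"],
}


def review_ontology(ontology, keywords=SENSITIVE_KEYWORDS):
    """Return attribute and relationship names that contain a sensitive keyword."""
    findings = []
    for entity_type, attributes in ontology["entity_types"].items():
        for attribute in attributes:
            if any(keyword in attribute.lower() for keyword in keywords):
                findings.append(f"{entity_type}.{attribute}")
    for relation in ontology["relationships"]:
        if any(keyword in relation.lower() for keyword in keywords):
            findings.append(relation)
    return findings


findings = review_ontology(ONTOLOGY)
for item in findings:
    print("review before ingestion:", item)
```

Substring matching is deliberately broad, so expect some false positives; anything flagged should be checked by a person before it is removed or kept.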
For more information on ontologies, see Filters.