Coordinate Your Search Engines to Work Together

 

Access and Specify Settings for Multiple Search Engines

Procedure:

  1. Navigate to the SmartHub Admin page at: http(s)://[web-app-url]/_admin.
    • For example: http://localhost:1234/_admin.
  2. Go to the Federator Properties page.
  3. Left panel in the UI:
    • General Settings: Access the general SmartHub settings.
    • Security Settings: Secure your SmartHub.
      • Users (UPN account)
      • Groups (Display name)
      • Security roles: Security roles enable admins to assign different roles to users. These roles give non-admin users access to other features.
    • User Profile Settings: Configure your user profile/picture providers.
    • Additional Settings: Configure extra settings such as the cache expiration time.
    • Extensibility: Manage your search engines and Tuning stages.
  4. General Settings:
    • SmartHub Version: 6.x. Set by default.
    • Main Backend: The main search engine settings.
  5. Security Settings:
    • Authentication Mode: Specify the authentication type.

      Note: Every authentication mode comes with its own UI.
    • Admin Users: Specify users that are allowed to access the admin page.
    • Trusted App Redirect URLs: Specify the apps that are allowed to communicate with SmartHub.
  6. User Profile Settings:
    • User Profile Providers: Configure providers that fetch the user properties.
    • User Picture Providers: Configure providers that fetch the user pictures.
  7. Additional Settings:

Specify Results Ordering

Normalize Your Results

  • Normalize your results, or prioritize content sources, by specifying a mixing algorithm.

    • A mixing algorithm determines the order in which the search results are displayed on the results page.

    • If you set the mixing algorithm to Rank-Based in the Administrator user interface, the SmartHub search results are shown in descending order of rank.

  • While ranking scores are meaningful only within a single search engine, the mixing algorithm normalizes results across your search engines.

    • For example, a 1,000 point score from search engine A might not be equivalent to a 1,000 point score from search engine B.
    • The equivalent score on engine B might be 500, or 5,000.
    • The rank-based mixing algorithm ensures that when the results are merged, end users see their results sorted by decreasing relevancy (or your specified relevancy score).
  • An alternative to applying the mixing algorithm to search engines is to rank one content source over another.

    • For example, a regional farm using SmartHub might want to rank its local content over its corporate content.

    • Conversely, a global deployment might choose to rank headquarters content higher than its regional content.

  • To perform a ranking operation, modify the mixing algorithm values.

Specify the Mixing Algorithm

  1. Go to the SmartHub ADMINISTRATION page.
  2. Click the Mixing Multiple Search Engines link.

  3. The Mixing Multiple Search Engines dialogue appears.

  4. Mixing Algorithm: This window allows you to choose a mixing method:
    • Rank-Based:

      • Orders the search results based on the rank from each search engine, as well as the boost and offset values that you can assign.

      • The SmartHub search results are shown in descending order of rank, calculated as:
           Rank = (RankS * Boost) + Offset

      Where:

      • RankS: Rank that is determined by the search engine.

      • Boost and Offset: Values that can be set for each search engine.

        • For example, if you want to add prominence to results from Backend1, you could set the Boost value to 2.

        • In this case, if a search result on the main search engine has a rank of 100, the same search result on Backend1 would have a rank of 200.

        • Using the same example, to give Backend1 prominence through an offset rather than a boost, you could leave both boost values set to 1 and set the offset value for Backend1 to 50.

        • In this case, the same search results on the main search engine and Backend1 would have values of 100 and 150, respectively.

        • Experiment to determine the optimal boost and offset values for your data set in order to get the required results.

    • Round Robin:
      • Orders the search results by taking the first result from the first location, the first result from the second location (if there is one) and the first result from the third location (if there is one).
      • The process repeats starting with the second result from the first location until all of the results from all of the locations are mixed together.
    • Weighted Round Robin:
      • Orders the search results based on the designated search engine weight.
      • See the "Mixing Results from Multiple Search Engines using the Weighted Round Robin algorithm" section below for more details.
      • Support for pseudo-random mixing has been removed. Use Weighted Round Robin instead.
    • Scriptable:

      • Enables you to write a custom script to decide how the results should be mixed between the search engines.

      • See the Scriptable Mixing Capability section below for more details.

      Tip: Use the Weighted Round Robin or Round Robin mixing algorithm when one of your search engines does not return scores. When possible, these algorithms ensure that each page displays a similar number of search results from each search engine.

  5. Click OK.
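
The Rank-Based formula and the Round Robin ordering described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and does not call any SmartHub API; the function names, the tuple layout, and the default boost of 1 and offset of 0 for unlisted engines are assumptions.

```python
from itertools import zip_longest

_MISSING = object()  # marks "no result at this position" while interleaving


def adjusted_rank(engine_rank, boost=1.0, offset=0.0):
    """Rank = (RankS * Boost) + Offset, as in the formula above."""
    return engine_rank * boost + offset


def rank_based_mix(results, settings):
    """Sort (engine, title, engine_rank) tuples by adjusted rank, highest first.

    settings maps an engine name to its (boost, offset) pair. Engines that are
    not listed fall back to boost 1 and offset 0 (an assumption of this sketch).
    """
    def key(item):
        engine, _title, rank = item
        boost, offset = settings.get(engine, (1.0, 0.0))
        return adjusted_rank(rank, boost, offset)

    return sorted(results, key=key, reverse=True)


def round_robin_mix(result_lists):
    """Take the first result of each list, then the second of each, and so on."""
    mixed = []
    for row in zip_longest(*result_lists, fillvalue=_MISSING):
        mixed.extend(item for item in row if item is not _MISSING)
    return mixed
```

With a boost of 2 for Backend1, a Backend1 result with an engine rank of 100 gets an adjusted rank of 200 and sorts ahead of a main search engine result with a rank of 100. With a boost of 1 and an offset of 50, the same Backend1 result gets 150.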

Mixing Results from Multiple Search Engines using the Weighted Round Robin algorithm

  • This mixing method is useful when you want to mix results returned from different sources which don't provide relevancy scores (document Rank metadata) or where the scores are not consistent across the board.

  • This mixing algorithm enables you to assign a specific importance (weight) to each search engine.

How to Enable the Weighted Round Robin Mixing Method

  1. Go to SmartHub Admin > Additional Settings > Mixing Algorithm.
  2. Select "Weighted Round Robin" from the Mixing algorithm dropdown.
  3. Provide the search engine names and their weights in the following format: BackendName1,backend1Weight;BackendName2,backend2Weight;
    • See the "How to Configure the Weighted Round Robin Algorithm" section below for more details.

How to Configure the Weighted Round Robin Algorithm

  • Backend Weights: Text field that accepts the search engine names and their associated weights.
    • Example: SharePointOnline,7;NetDocs,3;
    • The weights must be integer values greater than or equal to 1.
    • Note: If the SmartHub search engine returns documents from search engines other than the ones specified in the Backend Weights field, those results are ignored.
      A "Warning" message appears in the logs when this happens.
    • Hint: For intuitive configuration and use, make the weights add up to 10, which is the number of results on a regular page.
      • With the example above, every results page shows 7 documents from SharePoint Online and 3 documents from the NetDocs search engine (if enough results are available).
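
To make the Backend Weights format concrete, here is a small sketch that parses such a string and enforces the "integer greater than or equal to 1" rule. It is written in Python for illustration only; parse_backend_weights is a hypothetical helper, not a SmartHub function, and how SmartHub itself validates the field is not documented here.

```python
def parse_backend_weights(text):
    """Parse 'Name1,weight1;Name2,weight2;' into a dict of name -> weight."""
    weights = {}
    for entry in text.split(";"):
        entry = entry.strip()
        if not entry:  # tolerate the trailing ';' used in the examples
            continue
        name, separator, raw_weight = entry.partition(",")
        name = name.strip()
        if not separator or not name:
            raise ValueError(f"expected 'Name,weight' but got {entry!r}")
        weight = int(raw_weight)  # raises ValueError for non-integer weights
        if weight < 1:
            raise ValueError(f"weight for {name!r} must be 1 or greater")
        weights[name] = weight
    return weights


weights = parse_backend_weights("SharePointOnline,7;NetDocs,3;")
# The weights add up to 10, matching the hint about a regular 10-result page.
total = sum(weights.values())
```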

How it Works

Assume the following configuration:

  • You have 3 search engines configured in your SmartHub Admin:
    • SharePoint Online, Azure, and NetDocs
    • The Backend Weights setting is:
      • SharePointOnline,7;Azure,2;NetDocs,1;
  • Your query returns 10 results from SharePoint Online, 5 results from Azure, and 1 from NetDocs, for a total of 16 results.
  • Your results page shows a maximum of 10 results, so you receive 2 pages of results.

The algorithm splits the page into 3 "zones" for results.

  • The number of zones is based on the formula: NumberOfZones = RowsPerPage / NumberOfBackends (here, 10 / 3, which gives 3 whole zones).

  • Each zone shows an evenly distributed number of results from each search engine.

  • Page 1 of the results looks like this:

    • Zone 1 shows:
      • 3 documents from SharePoint Online
      • 1 document from Azure
      • 1 document from NetDocuments
    • Zone 2 shows:
      • 2 documents from SharePoint Online
      • 1 document from Azure
      • 0 documents from NetDocuments
    • Zone 3 shows:
      • 2 documents from SharePoint Online
      • 0 documents from Azure
      • 0 documents from NetDocuments
Note that the results are also ordered descending by backend weight (the search engine with the highest weight has its results at the top of the zone).
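
The zone layout above can be reproduced with a short sketch. It is written in Python for illustration and is not SmartHub's implementation: the rule for spreading a weight across zones (whole share per zone, with the remainder going to the earliest zones) is an assumption chosen because it matches the example, and zone_quotas and mix_page are hypothetical names.

```python
from collections import deque


def zone_quotas(weights, rows_per_page):
    """Return one {backend: result count} dict per zone."""
    zones = max(1, rows_per_page // len(weights))  # NumberOfZones formula
    quotas = []
    for zone in range(zones):
        quota = {}
        for name, weight in weights.items():
            base, extra = divmod(weight, zones)
            # Assumption: leftover results go to the earliest zones.
            quota[name] = base + (1 if zone < extra else 0)
        quotas.append(quota)
    return quotas


def mix_page(queues, weights, rows_per_page):
    """Build one page, zone by zone, highest-weight backend first."""
    ordered = sorted(weights, key=weights.get, reverse=True)
    page = []
    for quota in zone_quotas(weights, rows_per_page):
        for name in ordered:
            queue = queues.get(name)
            for _ in range(quota[name]):
                if queue and len(page) < rows_per_page:
                    page.append(queue.popleft())
    return page


weights = {"SharePointOnline": 7, "Azure": 2, "NetDocs": 1}
queues = {
    "SharePointOnline": deque(f"S{i}" for i in range(1, 11)),
    "Azure": deque(f"A{i}" for i in range(1, 6)),
    "NetDocs": deque(["N1"]),
}
page_one = mix_page(queues, weights, rows_per_page=10)
```

For the configuration above, the quotas come out as 3/1/1, 2/1/0, and 2/0/0, and page_one contains 7 SharePoint Online results, 2 Azure results, and 1 NetDocs result. The sketch models only a single page and does not backfill a zone when a search engine runs out of results, so it should not be read as a description of how page 2 is built.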

How to use the Weighted Round Robin Algorithm in a Scriptable Mixing Stage

In your custom mixing stage, you can use the algorithm by calling the WeightedRoundRobinMixingAlgorithm.MixResults function.

  • The function has the following definition:

    List<SearchResult> MixResults(Dictionary<string, Queue<SearchResult>> backendResultsQueues, SearchQuery query, Dictionary<string, int> backendWeights, int numberOfSections)

Parameters:

  • backendResultsQueues: Dictionary where the key is the search engine name and the value is a queue constructed from the SearchResults.RelevantResults list.
  • query: The user query object.
  • backendWeights: Dictionary where the key is the search engine name and the value is the associated weight.
  • numberOfSections: The number of zones to use for a page.

Mixing and Pagination

Mixing is applied to the results from all of the search engines before pagination is applied.

  • For example, if you request 10 results per page, the SmartHub engine requests 10 results from each of the configured search engines.

  • The results are mixed and sorted before all of the results are returned.

  • Only the top 10 results are returned with pagination.

Note:

  • The pipeline extensibility stages are applied to the entire mixed and sorted set of results before pagination is applied.

  • These stages let you implement custom mixing algorithms that override SmartHub’s built-in mixing algorithms.
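
The order of operations described above (request a page's worth of results from every search engine, mix and sort them, then return only the top results) can be sketched as follows. This is a Python illustration under stated assumptions: first_page is a hypothetical name, each search engine is modeled as a callable that returns (title, rank) pairs, and sorting by rank stands in for whichever mixing algorithm is configured.

```python
def first_page(engines, query, rows_per_page=10):
    """Gather rows_per_page results per engine, sort them, keep the top ones."""
    gathered = []
    for name, search in engines.items():
        # Each engine is asked for a full page of results.
        for title, rank in search(query, rows_per_page)[:rows_per_page]:
            gathered.append({"backend": name, "title": title, "rank": rank})
    # Mixing and sorting happen on the full set, before pagination.
    gathered.sort(key=lambda result: result["rank"], reverse=True)
    return gathered[:rows_per_page]
```

With two search engines and 10 results per page, 20 results are gathered and mixed, and only the top 10 reach the results page.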

Specify the Query Response Time

  1. Go to the SmartHub ADMINISTRATION page.
  2. Click the Mixing Multiple Search Engines link.

  3. The Mixing Multiple Search Engines dialogue appears.

  4. Query Timeout: Click the Query Timeout link. The SmartHub pop-up window appears.

  5. Query timeout: The query (or search engine) timeout is the time span, in milliseconds, that each search engine is given to respond to a query.

If a search engine cannot return results within the specified time span, an error is displayed on the search results page.

Customize Your Search Error Display

  1. Go to the SmartHub ADMINISTRATION page.
  2. Error Handling: Click the Error Handling link. The SmartHub pop-up window appears.


  3. Display mode:

    • Show first: Search errors appear at the top of the search results. Enabled by default.
    • Show last: Search errors appear at the bottom of the search results.
    • Don't show: Search errors are not displayed.
  4. Error icon: Select the .png icon that appears for errors.
  5. Warning icon: Select the .png icon that appears for warnings.
  6. Error Title Template: Choose one or both of the following:
    • error level: %level%
    • error message: %message%
  7. Error Description Template: %description%
    • A description of the error details.
  8. Click OK.
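
As an illustration of how the %level%, %message%, and %description% placeholders combine into a displayed error, here is a minimal Python sketch. The render_error helper and the sample values are hypothetical; the sketch only shows plain placeholder substitution and is not SmartHub's rendering code.

```python
import re

_PLACEHOLDER = re.compile(r"%(level|message|description)%")


def render_error(title_template, description_template, level, message, description):
    """Fill the three placeholders in the title and description templates."""
    values = {"level": level, "message": message, "description": description}

    def fill(template):
        return _PLACEHOLDER.sub(lambda match: values[match.group(1)], template)

    return fill(title_template), fill(description_template)


title, body = render_error(
    "%level%: %message%",          # title template using both placeholders
    "%description%",               # description template
    "Warning",
    "Backend1 did not respond",    # sample values, invented for this sketch
    "The query timeout was exceeded.",
)
```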

Specify the Query Syntax

  • The search engine list, written as FederatorBackends:"...", can be added at the beginning or the end of the query.
    • If you add the list in more than one location, only one occurrence is parsed and the other is treated as part of the query term.
  • Quotes (" ") are mandatory. If the quotes are not specified for the search engine list, the search engine list is ignored, and the query is passed as-is to the main search engine only.
  • The search engine list can also be specified in the query text box.
    • Simple cases are supported, but the full KQL syntax is not supported.
  • If the search engine list is not specified, or if this list is empty, the SmartHub acts as a pass through and queries only the main search engine.
  • If you specify a query against multiple search engines, each query must be separated by a semi-colon (;).
  • Search engine names are case-insensitive and must exist in the Total additional search engines in order to be queried against.
    • A warning is issued for search engines that are specified in this list, but are not Registered search engines.
  • If a search engine is specified more than once (either explicitly or as a result of * expansion), this search engine is queried only once.
  • You can use the asterisk character (*) to specify "starts with" behavior, as shown in the last three examples below:
    • FederatorBackends:"backend1,backend2": Queries the search engines named backend1 and backend2
    • FederatorBackends:"*": Queries all the search engines
    • FederatorBackends:"FirstBackend; s*": Queries the search engine named FirstBackend and all search engines that start with s
    • FederatorBackends:"back*": Queries all the search engines that start with "back"
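
The rules above (mandatory quotes, case-insensitive names, * as a "starts with" wildcard, each search engine queried only once, and a warning for unregistered names) can be sketched as follows. This is a Python illustration, not SmartHub's parser: resolve_backends is a hypothetical name, and because the examples above use both commas and semicolons between names, the sketch accepts either separator as an assumption.

```python
import re

# The quotes are mandatory: without them the clause does not match at all.
_CLAUSE = re.compile(r'FederatorBackends:"([^"]*)"')


def resolve_backends(query, registered):
    """Return (selected backend names, unregistered names that need a warning).

    An empty selection means the query goes to the main search engine only.
    """
    match = _CLAUSE.search(query)
    if match is None:
        return [], []
    selected, warnings = [], []
    by_lower = {name.lower(): name for name in registered}
    for pattern in re.split(r"[;,]", match.group(1)):
        pattern = pattern.strip().lower()
        if not pattern:
            continue
        if pattern.endswith("*"):
            prefix = pattern[:-1]  # "*" alone has an empty prefix and matches all
            hits = [n for n in registered if n.lower().startswith(prefix)]
        elif pattern in by_lower:
            hits = [by_lower[pattern]]
        else:
            hits = []
            warnings.append(pattern)  # named explicitly but not registered
        for name in hits:
            if name not in selected:  # query each search engine only once
                selected.append(name)
    return selected, warnings


registered = ["FirstBackend", "backend1", "backend2", "SharePoint", "Solr"]
selected, warnings = resolve_backends('budget FederatorBackends:"FirstBackend; s*"', registered)
```

Here selected contains FirstBackend plus the two registered names that start with "s", and warnings is empty.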

Scriptable Mixing Capability

The mixing script is called at query time after the results are returned from all the search engines.

The script has access to:

  • PerBackendResults: List<SearchResults> 
    • The list of results returned by each search engine
    • To know which search engine a SearchResults object belongs to you can check SearchResults.BackendName
    • See the SearchResults class to find all the available properties
  • Query: SearchQuery
    • The SearchQuery object that was executed for the current search
    • You can use this object to read information about the search - see SearchQuery class to find all the properties available
  • MixingError: FederatorError
    • This allows you to return an error to the engine so that it knows something went wrong at mixing time
    • To set an error:

      Sample Mixing error

      MixingError = new FederatorError(FederatorErrorLevel.Error, "Something went wrong");

    • The available Error levels are:

      • Error

      • Info

      • Warning

The script is expected to return:

  • List<SearchResult> that contains a number of results less than or equal to Query.RowLimit (10 by default). This is the list of results displayed for the current search page.
    • If you return fewer than Query.RowLimit results, the paging mechanism assumes that no additional results are available after the current page.

Sample script that returns only the first result of each search engine:

Sample mixing script
// Collect at most one result per search engine.
var results = new List<SearchResult>();

foreach (var backendResults in PerBackendResults)
{
    // Skip search engines that returned nothing.
    if (backendResults.Count > 0)
        results.Add(backendResults[0]);
}

return results;