How to Use Learn-To-Rank Features

Important Mentions and other Limitations

Learn-To-Rank works only with SmartAnalytics v5.0 or above.
The Learn-To-Rank features do not work with a NetDocuments backend.
When you upgrade SmartHub, the file BAInsight.LearnToRank.Trainer.exe.config is overwritten.
Back up its settings before the upgrade and manually re-add them afterward.

About Learn-To-Rank
Learn-To-Rank Page
Features Configuration
Learn-To-Rank Trainer
Training Results
Features
Settings

About Learn-To-Rank

The Learn-To-Rank component is implemented on top of SmartAnalytics data to support three main features:

Results Boosting
Query Suggestions
Content Suggestions

Learn-To-Rank uses clustering algorithms (K-means by default, but extendable to other algorithms) to build clusters of similar (related) queries and their associated “important” documents.

Learn-To-Rank groups queries and documents based on query similarity as well as user interaction with documents after a query is executed.
These groups are called "clusters".

In a cluster, there are:

Queries whose keywords are similar
Documents that those queries led to
Other queries that are not necessarily similar in keywords, but that led to the same documents
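
The keyword-similarity grouping can be illustrated with a small K-means sketch. This is not the trainer's implementation: the bag-of-words vectors, the squared Euclidean distance, the fixed starting centroids, and the names vectorize and kmeans are all assumptions made for illustration.

```python
def vectorize(queries):
    """Turn each query into a bag-of-words count vector (assumed feature choice)."""
    vocab = sorted({word for q in queries for word in q.lower().split()})
    return [[q.lower().split().count(word) for word in vocab] for q in queries]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10):
    """Plain K-means; the first k points are the starting centroids so runs are repeatable."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical sample queries, not taken from a real SmartAnalytics index.
queries = ["drug recall policy", "diabetes treatment", "fraud policy", "diabetes research"]
labels = kmeans(vectorize(queries), k=2)
print(labels)  # prints [0, 1, 0, 1]
```

The two "policy" queries share a cluster, as do the two "diabetes" queries. The product also groups queries by the documents they led to, which this sketch does not model.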

Learn-To-Rank Page

The Learn-To-Rank page is in the left panel of the SmartHub Administration page (https://<hostname>:<port>/_admin).

The page is composed of two parts:

Features Configuration
Training Results (data)

Features Configuration

Below you see the Learn-To-Rank settings as they appear in the SmartHub interface.

See the section "Training Results" below for more details and examples of the settings shown here.

These settings must be fine-tuned over time to fit your environment and data.
Properly tuned settings reveal the most relevant documents within a cluster and search query.

Setting	Default	Description
Learn to Rank Index Name	learntorank-cluster-storage	Name of the Learn-To-Rank index used for cluster storage. Note: If you change the index name after an index has already been created, the old index is not removed from Elastic.
Number of days to go back for data	30000	How many days back query analytics data is read for training.
Number of actions threshold	20	The minimum number of executed queries required for data to be used for training. Set this value to reflect the data in your environment. A production environment has thousands or even millions of documents.
Max number of clusters	10	The total number of clusters that your data is split into. Set this value to reflect the data in your environment. A production environment has thousands or even millions of documents.
Number of documents to be boosted	2	How many documents are boosted when your search query can be assigned to a cluster. For a given cluster (see the example screenshot in "Training Results," below), a value of 2 boosts the top two documents in the list of documents shown. The boost value for each document is a constant boost (cb) value between 1 and 100, calculated from the number of actions on that document relative to the maximum number of actions on any document in the same cluster. So if one document has 1.5 times as many actions as another document in the cluster, its boost is 1.5 times as large. A document with half (50%) of the actions of the top-most document is boosted half as much (50 vs 100). Any query for any of the query terms in a cluster shows the boosted documents first in the search results. So even the query
Data cache sliding expiration in minutes	30 minutes	Sliding expiration of data to be stored in cache.
Backend URL property used for boosting	clickUri	The property that your backend uses for path.
Clear cache	N/A	Clear cache for the existing clusters
Scheduled Task Name	BAInsight Learn to Rank Scheduler	The name of the scheduled task used to run the LTR trainer
Scheduled Task Run Interval	7	The interval, in days, at which the data is retrained. For example, if you set it to 7, the data is retrained every 7 days.
Enable	true	Set this to true if you want the data to be trained automatically.
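
The boost calculation described for "Number of documents to be boosted" can be sketched as follows. This is a minimal illustration, not SmartHub code: the function names, the rounding, and the action counts are assumptions, and only the proportional 1 to 100 scale comes from the table above.

```python
def constant_boost(actions, max_actions):
    """Scale a document's action count to a boost between 1 and 100."""
    if max_actions <= 0:
        return 1
    scaled = round(100 * actions / max_actions)
    return max(1, min(100, scaled))


def boosts_for_cluster(doc_actions, top_n=2):
    """Return boosts for the top_n documents of a cluster, ranked by action count."""
    if not doc_actions:
        return {}
    top = max(doc_actions.values())
    ranked = sorted(doc_actions.items(), key=lambda item: item[1], reverse=True)
    return {url: constant_boost(count, top) for url, count in ranked[:top_n]}


# Action counts are made up; "Travel Policy.pdf" is a hypothetical document.
cluster_docs = {
    "Drug Recall Policy.pdf": 8,
    "Anti_fraud_and_Fraud_policy.pdf": 4,
    "Travel Policy.pdf": 1,
}
print(boosts_for_cluster(cluster_docs, top_n=2))
# prints {'Drug Recall Policy.pdf': 100, 'Anti_fraud_and_Fraud_policy.pdf': 50}
```

With "Number of documents to be boosted" set to 2, only the two most-used documents receive a boost, and the document with half the actions of the top document receives a boost of 50.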

Learn-To-Rank Trainer

The trainer takes the data from SmartAnalytics and creates the clusters (shown in the screenshot in section "Training Results" below).

The trainer is located in the SmartHub package under /Scheduled Jobs/LearnToRank/Trainer.
If you change the path of the logger configuration file folder (/config/caching), you must change it in the trainer as well.

If the installation is successful, a new scheduled task is created in Windows Task Scheduler.
The name of the task is the one specified in the Scheduled Task Name field from the Learn-To-Rank page.

Training Results

Note: Clusters might contain documents that are deleted in the search index.

This is because the Analytics index still contains them.
In time they disappear from this list as usage for other documents increases.
If you want to accelerate the process you can manually delete them from the Analytics index and retrain the data.

Sample Training results are shown below.

Clusters - #1, #2, #3, #4
- Each cluster consists of:
  - A series of queries, shown on the left
  - The document URLs for each query, listed on the right
  - The number of actions taken on each document (download, open, preview, etc.), shown on the far right
- Queries are grouped into clusters based on the similarity of their keywords
Bold text - policy matterid=333056, diabetes treatment, biomedical research, albert gore
- The original cluster query
- The documents this query led to, and on which at least the threshold number of actions were taken, are added to the cluster
  - The threshold is set in the table above, in Number of actions threshold.
    - In this sample it is set to a very low value of 4 because the data set is small.
- For example, in Cluster #1, the query text policy matterid=333056 led to the documents shown in Cluster #1 below, "Drug Recall Policy.pdf," "Anti_fraud_and_Fraud_policy.pdf," etc.
- Note: Production environments will most likely have a document threshold in the thousands.
Plain text queries
- The plain text (unbolded) queries on the left, under the bold text query, are queries that are pulled in, or "inherited," based on the document list on the right side
- These are the top queries that users have run to discover and take action on the documents listed on the right.
- In other words, a "backwards looking" query on the documents shown yields these queries, which are listed under the bold, original cluster query
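
How a cluster is filled from an original query can be sketched as below. The log shape (query, document, action count), the function name build_cluster, and the counts are hypothetical. Applying the threshold per document is one reading of the description above, not confirmed trainer behavior.

```python
def build_cluster(seed_query, actions, threshold):
    """Collect the documents a seed query led to, then the queries inherited from them.

    actions is a list of (query, document, action_count) tuples (assumed log shape).
    """
    # Documents the original query led to with at least `threshold` actions.
    documents = {
        doc: count
        for query, doc, count in actions
        if query == seed_query and count >= threshold
    }
    # "Backwards looking" step: other queries that led to those same documents.
    inherited = sorted(
        {query for query, doc, _ in actions if doc in documents and query != seed_query}
    )
    return {"query": seed_query, "documents": documents, "inherited_queries": inherited}


# Made-up action log; only the first query text and the two policy file names
# appear in the sample above.
log = [
    ("policy matterid=333056", "Drug Recall Policy.pdf", 6),
    ("policy matterid=333056", "Anti_fraud_and_Fraud_policy.pdf", 4),
    ("policy matterid=333056", "Travel Policy.pdf", 1),
    ("drug recall", "Drug Recall Policy.pdf", 3),
    ("diabetes treatment", "Diabetes Guide.pdf", 5),
]
cluster = build_cluster("policy matterid=333056", log, threshold=4)
print(cluster["documents"])
print(cluster["inherited_queries"])
```

With a threshold of 4, "Travel Policy.pdf" is left out of the cluster, and "drug recall" is inherited because it led to a document that is already in the cluster.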

Features

Learn-To-Rank Boosting

During a search, this stage checks if the query matches any of the clusters.

If the query matches a cluster, the stage boosts the documents in that cluster according to their hits (number of clicks, previews, etc.).
The documents selected for boosting are the documents that the current query (or a very similar query) led to.
The boost value is proportional to the number of actions on each document that the current query led to, and is within the 1 to 100 range.
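
The matching step can be pictured with the sketch below. How the stage decides that a query matches a cluster is not documented here, so the case-insensitive exact match after whitespace normalization is an assumption, as are the dictionary layout and the function names.

```python
def normalize(query):
    """Lower-case a query and collapse repeated whitespace."""
    return " ".join(query.lower().split())


def find_cluster(query, clusters):
    """Return the first cluster that contains the query, or None."""
    wanted = normalize(query)
    for cluster in clusters:
        if wanted in (normalize(q) for q in cluster["queries"]):
            return cluster
    return None


def documents_to_boost(query, clusters, top_n=2):
    """Return the top_n most-used documents of the matching cluster."""
    cluster = find_cluster(query, clusters)
    if cluster is None:
        return []  # no cluster matched, so nothing is boosted
    ranked = sorted(cluster["documents"].items(), key=lambda item: item[1], reverse=True)
    return [url for url, _ in ranked[:top_n]]


# Hypothetical trained data; the action counts are made up.
clusters = [
    {
        "queries": ["policy matterid=333056", "drug recall"],
        "documents": {
            "Drug Recall Policy.pdf": 6,
            "Anti_fraud_and_Fraud_policy.pdf": 4,
            "Travel Policy.pdf": 2,
        },
    }
]
print(documents_to_boost("Drug  Recall", clusters))
print(documents_to_boost("unrelated query", clusters))
```

A query that matches no cluster returns an empty list, which corresponds to the search results being left unboosted.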

To use Learn-To-Rank Results Boosting:
- Create a stage with empty parameters.
- The stage must be the first stage in the list of stages in the section "Query Pipeline Stages".

Learn-To-Rank Query Suggestions

This feature provides suggestions as query text is entered in the search field.

The Learn-To-Rank Query Suggestions provider is located under TypeAhead.

To enable and use this TypeAhead provider:

Add the following line in your custom settings file (from the folder "CustomSettingsTemplate") under the section SH.TypeAhead.CustomSettingsActiveProviders:

LearnToRankSuggestions: "/modules/TypeAhead/Providers/LearnToRankSuggestions/LearnToRankSuggestions.js"

Example:

ActiveProviders: {
    FederatorSuggestions: "/modules/TypeAhead/Providers/FederatorSuggestions/FederatorSuggestions.js",
    PeopleSuggestions: "/modules/TypeAhead/Providers/PeopleSuggestions/PeopleSuggestions.js",
    QuerySuggestions: "/modules/TypeAhead/Providers/QuerySuggestions/QuerySuggestions.js",
    RefinerSuggestions: "/modules/TypeAhead/Providers/RefinerSuggestions/RefinerSuggestions.js",
    SavedQueriesSuggestions: "/modules/TypeAhead/Providers/SavedQueriesSuggestions/SavedQueriesSuggestions.js",
    LearnToRankSuggestions: "/modules/TypeAhead/Providers/LearnToRankSuggestions/LearnToRankSuggestions.js"
},
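
Conceptually, a suggestions provider of this kind offers queries from the trained clusters that fit what the user has typed so far. The sketch below assumes simple prefix matching. It is not the provider's code, and the function name suggest and the limit parameter are hypothetical.

```python
def suggest(typed, cluster_queries, limit=5):
    """Return up to `limit` cluster queries that start with the typed text."""
    prefix = typed.strip().lower()
    if not prefix:
        return []
    matches = [q for q in cluster_queries if q.lower().startswith(prefix)]
    return sorted(matches)[:limit]


# The last entry is hypothetical; the others are the sample cluster queries.
cluster_queries = [
    "diabetes treatment",
    "biomedical research",
    "albert gore",
    "diabetes research",
]
print(suggest("dia", cluster_queries))
# prints ['diabetes research', 'diabetes treatment']
```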

Settings

Learn-To-Rank Content Suggestions

This feature provides the user with similar (search) results, excluding those present on the current page.

Learn-To-Rank Query Suggestions settings are shown in the screenshot below.

These similar search results are derived from the Content-By-Search module.

This is also used in the component Similar Documents.

For more about Content-By-Search, see How Users Can Personalize Their Search Results.

Learn-To-Rank Results Suggestions Pipeline Stage
- To use Learn-To-Rank Results Suggestions, create a stage with empty Parameters.
- The stage must be first in the list of stages in the Query Pipeline Stages section of the SmartHub Administration UI.
The Learn-To-Rank Similar Results module is located under <SmartHub installation>/modules/LearnToRank.
- In this module a Learn-To-Rank settings file contains the ID of your Content-By-Search (Learn-To-Rank element) and the URL property.
- This ID can be modified.

Note: Pay attention to the order of the relevancy stages.
