EDG Copilot

Introduction

EDG Copilot is a suite of AI-powered features integrated into EDG, designed to assist users in creating, discovering, and managing data more efficiently. EDG Copilot features are only available in SaaS environments.

Use EDG Copilot Chat with EDG

Chat with EDG answers questions about EDG based on the official documentation.

Note

Currently, each question is answered on its own; multi-turn dialog is not supported. Dialog support will be available in the next release.

Enable EDG Copilot Chat with EDG

If the Chat with EDG feature is not already enabled for your EDG installation, contact support to enable it.

Use EDG Copilot Chat with EDG

The Chat with EDG feature is located in the bottom-right corner of the start page. Press the chat icon to open the chat panel.

Chat with EDG Icon

When the chat panel opens, a list of frequently asked questions appears as examples to help you get started. To ask your own question, enter it in the input field at the bottom and press Enter or the send button to submit your question.

Chat with EDG prompt

After submitting your question, it may take a short while for the answer to be generated. Once generated, the answer appears in the chat panel. You can scroll through the chat history to review previous questions and answers.

Detailed answers can be hard to read comfortably in the narrow chat panel. In such cases, press the expand button to show the answer full screen, as shown below:

Chat with EDG full screen answer

Use EDG Copilot Linking

EDG Copilot Linking uses the search methods of the Vector Index to suggest property values that refer to resources in other asset collections. Based on the properties given in the configuration, EDG Copilot Linking searches for matching resources in the target asset collection. Any asset collection for which the Vector Index has been enabled can be used as a target for EDG Copilot Linking.

Enable EDG Copilot Linking for a property

EDG Copilot Linking is configured on a property shape in an ontology. The following example shows how to enable EDG Copilot Linking for the related property of SKOS Concept.

The content of the source properties is used to search for matching resources in the target asset collection. For the related property, definition and preferred label should contain the information that the target should match. In other cases, a dedicated literal may match better. For example, if there is already a literal for brand and EDG Copilot Linking should add a property that refers to a catalog of brands, that literal property should be used as the source.

The asset collection that contains the target must be selected in graph. A search options data structure, in which further settings can be configured, must also be added. The same search options can be shared by multiple EDG Copilot Linking property configurations.

EDG Copilot Linking configuration for SKOS Concept related
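As a rough illustration of how source properties can feed a search, the sketch below joins the literal values of the configured source properties into one search text. It is a minimal sketch under stated assumptions, not EDG's implementation: the `build_search_text` helper, the dictionary layout of the concept, and the idea of simply concatenating the values are all hypothetical.

```python
from typing import Dict, List


def build_search_text(resource: Dict[str, List[str]], source_properties: List[str]) -> str:
    """Join the literal values of the chosen source properties into one search text.

    Hypothetical helper: EDG may combine source properties differently.
    """
    parts: List[str] = []
    for prop in source_properties:
        for value in resource.get(prop, []):
            text = value.strip()
            # Skip empty values and avoid repeating identical text.
            if text and text not in parts:
                parts.append(text)
    return " ".join(parts)


# Illustrative SKOS concept; the values are made up for this example.
concept = {
    "skos:prefLabel": ["Neural network"],
    "skos:definition": ["A computational model built from layers of connected units."],
    "skos:altLabel": ["ANN"],
}

# Only preferred label and definition are configured as sources here.
search_text = build_search_text(concept, ["skos:prefLabel", "skos:definition"])
print(search_text)
```

Properties that are not listed as sources, such as the alternative label above, do not contribute to the search text, which is why choosing the most informative source properties matters.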

Search options can be used to tweak the search for better results. The most important settings are:

Parameter: Description

  • search alpha: The relative weight of keyword and vector search for the hybrid search, in the range between 0 and 1. 0 is pure keyword search, and 1 is pure vector search. Because the hybrid search includes a normalization step, setting this value to 0 or 1 may not give the same result as changing the method to keyword or vector.

  • search limit: The maximum number of results that will be shown.

  • search method: The search method used by the Vector Index:

      - exactPhraseMatch uses the keyword search to find full matches of a phrase. For example, the phrase New York matches only New York, not York.

      - hybrid combines the results of keyword and vector. It gives a high probability to exact matches and adds semantic similarity to the mix. Because it gives the best results for most use cases, it is the default.

      - keyword uses BM25 to rank exact matches.

      - vector uses the configured embedding model and takes the cosine similarity as the basis for the probability.

  • search threshold: A threshold value for the search score. Only results with a score greater than or equal to the threshold will be shown.

EDG Copilot Linking search options
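To make the interplay of search alpha, search threshold, and search limit more concrete, here is a minimal sketch of a hybrid ranking. It assumes min-max normalization and a linear blend of the two scores; the Vector Index may normalize and combine scores differently, and the function names and sample scores are hypothetical.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to the range [0, 1]; identical scores all map to 1.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {key: 1.0 for key in scores}
    return {key: (value - low) / (high - low) for key, value in scores.items()}


def hybrid_search(
    keyword_scores: Dict[str, float],
    vector_scores: Dict[str, float],
    alpha: float,
    threshold: float,
    limit: int,
) -> List[Tuple[str, float]]:
    """Blend normalized keyword and vector scores (assumed formula, not EDG's).

    alpha = 0 uses only keyword scores, alpha = 1 only vector scores.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    keyword = min_max_normalize(keyword_scores)
    vector = min_max_normalize(vector_scores)
    combined = {
        resource: (1.0 - alpha) * keyword.get(resource, 0.0)
        + alpha * vector.get(resource, 0.0)
        for resource in set(keyword) | set(vector)
    }
    # Keep results at or above the threshold, best score first.
    kept = [(r, s) for r, s in combined.items() if s >= threshold]
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept[:limit]


# Made-up raw scores: BM25-like keyword scores and cosine similarities.
keyword_scores = {"ex:NewYork": 7.2, "ex:York": 3.1, "ex:Yorkshire": 0.5}
vector_scores = {"ex:NewYork": 0.91, "ex:York": 0.78, "ex:Boston": 0.60}

results = hybrid_search(keyword_scores, vector_scores, alpha=0.5, threshold=0.4, limit=5)
print([resource for resource, _ in results])
```

In this sketch, a resource found by only one of the two searches still appears in the combined ranking with a score of 0 for the other part. This is one way the normalization step can make alpha 0 or 1 behave differently from the pure keyword or vector method.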

Note

If the underlying ontology is generic and not designed for a single target, use a separate ontology asset collection for the EDG Copilot Linking configuration. The ontology that holds the configuration can be added under Settings > Includes, as in the following example, where SKOS arXiv AI Linking contains the EDG Copilot Linking configuration.

EDG Copilot Linking ontology include

Applying EDG Copilot Linking suggestions

Suggestions from EDG Copilot Linking are shown in the Problems and Suggestions panel. To generate them, enable Run AI Linking Suggestions in the dropdown menu at the top right. It can be combined with other Run actions; to see only linking suggestions, uncheck all other Run actions. The Apply button creates the suggested property value. If the property shape allows multiple values, multiple suggestions can be applied.

EDG Copilot Linking configuration for SKOS Concept related

Note

Problems and Suggestions can be triggered for smaller batches using batch actions. In the tree of the Taxonomy Concepts panel, batch actions can be triggered in the dropdown menu that opens on right click.

Use EDG Copilot Natural Language to SPARQL

Enable the EDG Copilot Natural Language to SPARQL Feature

The following three steps are required to enable the Natural Language to SPARQL feature for an asset collection:

Configure the Vector Index for the Ontology

To enable the EDG Copilot Natural Language to SPARQL feature for an asset collection, the ontology it uses must be indexed with a predefined list of classes and properties. The index is used to find the classes and properties that are relevant to a given prompt. For more details on how to configure the Vector Index, please refer to Enabling the Vector Index for an Asset Collection.

The following classes must be configured:

  • Class (owl:Class)

  • Property (rdf:Property)

  • Property Shape (sh:PropertyShape)

And the following properties must be configured:

  • labels: rdfs:label, order 1, keyword true

  • names: sh:name, order 2, keyword true

  • comments: rdfs:comment, order 3, keyword false

  • descriptions: sh:description, order 4, keyword false

The configuration should look like this:

Vector Index configuration for the ontology
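The required classes and properties listed above can be written down as a small checklist. The sketch below compares a configuration against that checklist. The Vector Index itself is configured in the EDG user interface, so the dictionary layout and the `find_config_problems` helper are purely hypothetical and only mirror the values from the lists.

```python
from typing import Dict, List, Set, Tuple

# Required classes, as listed in this section.
REQUIRED_CLASSES: Set[str] = {"owl:Class", "rdf:Property", "sh:PropertyShape"}

# Required properties, mapped to (order, keyword) as listed in this section.
REQUIRED_PROPERTIES: Dict[str, Tuple[int, bool]] = {
    "rdfs:label": (1, True),
    "sh:name": (2, True),
    "rdfs:comment": (3, False),
    "sh:description": (4, False),
}


def find_config_problems(
    classes: Set[str], properties: Dict[str, Tuple[int, bool]]
) -> List[str]:
    """Return a list of differences from the required ontology index configuration."""
    problems: List[str] = []
    for cls in sorted(REQUIRED_CLASSES - classes):
        problems.append(f"missing class {cls}")
    for prop, expected in REQUIRED_PROPERTIES.items():
        actual = properties.get(prop)
        if actual is None:
            problems.append(f"missing property {prop}")
        elif actual != expected:
            problems.append(
                f"{prop}: expected order {expected[0]}, keyword {expected[1]}, "
                f"found order {actual[0]}, keyword {actual[1]}"
            )
    return problems


# An incomplete configuration, made up for this example.
problems = find_config_problems(
    classes={"owl:Class", "rdf:Property"},
    properties={
        "rdfs:label": (1, True),
        "sh:name": (2, False),
        "rdfs:comment": (3, False),
    },
)
for problem in problems:
    print(problem)
```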

Configure the Vector Index for the Asset Collection

It’s also required to configure the Vector Index for the asset collection that should be queried. There is no predefined configuration for that use case. The classes should be selected to cover all relevant resources that could be queried, while avoiding classes from imported asset collections that generate noise or represent irrelevant metadata. Properties for labels, names, descriptions, comments, and identifiers should be selected. For more details, see the Vector Index documentation.

Configure Natural Language to SPARQL for the Asset Collection

Configuring the SPARQL ontology graph is mandatory. Select the ontology that was configured for the Vector Index in the first step. Optionally, tweak the SPARQL data search options and SPARQL ontology search options, for example by setting a threshold to reduce noise in the results or by changing the alpha value to weight keyword and vector search results differently. It’s possible to select an alternative SPARQL LLM (Large Language Model), but the default model is recommended for most use cases.

Natural Language to SPARQL configuration
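The role of the two search option sets can be pictured as follows: terms found in the ontology index and resources found in the data index are passed to the language model together with the question. This is a simplified sketch of that general idea, not EDG's actual prompt or pipeline; the `build_prompt` function, the prompt wording, and the sample IRIs are all assumptions.

```python
from typing import List, Tuple


def build_prompt(
    question: str,
    ontology_terms: List[Tuple[str, str]],
    data_resources: List[Tuple[str, str]],
) -> str:
    """Assemble an LLM prompt from search hits (hypothetical format).

    Each hit is an (identifier, label) pair, for example from a hybrid search
    over the ontology index or the data index.
    """
    lines = [
        "Write a SPARQL query that answers the question below.",
        "Use only the listed ontology terms and data resources.",
        "",
        "Ontology terms:",
    ]
    lines += [f"- {iri} ({label})" for iri, label in ontology_terms]
    lines += ["", "Data resources:"]
    lines += [f"- {iri} ({label})" for iri, label in data_resources]
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


# Made-up search hits for illustration.
prompt = build_prompt(
    "Which concepts are related to machine learning?",
    ontology_terms=[("skos:Concept", "Concept"), ("skos:related", "related")],
    data_resources=[("ex:MachineLearning", "Machine learning")],
)
print(prompt)
```

Under this picture, a higher threshold or lower limit in the search options shortens the lists passed along with the question, which reduces noise but can also hide a term the query needs.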

Use the EDG Copilot Natural Language to SPARQL Feature

When everything is configured, you can use the Natural Language to SPARQL feature. An additional input field appears at the bottom of the SPARQL Query panel, where you can enter your prompt. After entering your prompt, press the arrow up button to generate the SPARQL query. Please note that it may take a short while for the query to be generated.

Prompt input for Natural Language to SPARQL

Once generated, the query appears in a separate box for your review. Because query generation is not deterministic, there is a button to regenerate the query if the initial result is not satisfactory. If you are satisfied with the generated query, you can transfer it to the main SPARQL query editor by pressing the green accept button.

Query generated by Natural Language to SPARQL

Use EDG Copilot Tagging Content

Any asset collection for which the Vector Index has been enabled can be used by the AutoClassifier in the Content Tag Set. Unlike using Maui Server, this method doesn’t require a training step.

See also

See Content Classification in EDG for a detailed guide on content classification.

After creating a Content Tag Set, the AutoClassifier must be configured. Go to Manage > Advanced > Configure AutoClassifier.

TopBraid EDG Advanced Section of the Manage Tab

Under Content properties, select all properties whose content should be used by the AutoClassifier. In this example, content and title are used, but other properties such as filename can be of interest if the documents have meaningful filenames. The Tag Selection Strategy acts as a filter on the concepts of the taxonomy. In this example, only the most specific tags are used, which ignores concepts with child nodes. The Probability threshold must be adapted to the Content Tag Set, because each combination of a corpus and a taxonomy has its own reasonable threshold. Check some documents in the Taggings tab to find a good threshold value. Once finished, press the Save Changes button.

TopBraid EDG AutoClassifier Configuration
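The combined effect of the Tag Selection Strategy and the Probability threshold can be sketched as two filters applied to the scored concepts. The example below is a simplified illustration under stated assumptions, not the AutoClassifier's code: the `select_tags` helper, the child-to-parent dictionary, and the use of "greater than or equal to" for the threshold are hypothetical.

```python
from typing import Dict, List, Tuple


def select_tags(
    probabilities: Dict[str, float],
    broader: Dict[str, str],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Keep only most specific concepts whose probability reaches the threshold.

    `broader` maps each child concept to its parent. A concept counts as
    "most specific" here when no other concept names it as parent.
    """
    concepts_with_children = set(broader.values())
    selected = [
        (concept, probability)
        for concept, probability in probabilities.items()
        if concept not in concepts_with_children and probability >= threshold
    ]
    # Highest probability first; ties are ordered by concept name.
    selected.sort(key=lambda item: (-item[1], item[0]))
    return selected


# Made-up taxonomy and scores for one document.
broader = {"ex:Cat": "ex:Mammal", "ex:Dog": "ex:Mammal", "ex:Mammal": "ex:Animal"}
probabilities = {"ex:Animal": 0.9, "ex:Mammal": 0.8, "ex:Cat": 0.7, "ex:Dog": 0.2}

tags = select_tags(probabilities, broader, threshold=0.5)
print(tags)
```

In this example, the broader concepts are dropped even though they score highest, and the low-scoring leaf concept is removed by the threshold. Trying a few threshold values against known documents is a practical way to see where relevant tags start to disappear.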

The Taggings tab should show documents from the corpus. Select one to see concepts found by the AutoClassifier in Recommended Concepts.

TopBraid EDG Selected Document with Recommended Concepts