FAQ – TopBraid Auto Classifier
What can TopBraid EVN do besides manage vocabularies?
TopBraid EVN can put your managed vocabularies to work in two ways: by exporting them in a range of formats for other tools to import, and by communicating dynamically with other tools through a RESTful API. This lets it support search engines, content management systems such as SharePoint, and other components of a service-oriented architecture that need to enrich content and search strings with semantic metadata.
How can I get started with AutoClassifier?
You can start by importing the unstructured content you want to classify into EVN, selecting a vocabulary whose concepts will be used to classify this content, and providing training data for AutoClassifier to learn from (either by manually tagging some resources or by importing existing training data where available).
What professional services are available for AutoClassifier?
Every project and organization has different needs. TopQuadrant provides various flexible services related to unstructured content management goals, including:
- Content and tag set lifecycle management;
- Classification strategy definition;
- Modeling services for vocabularies that act as AutoClassifier’s sources of classes;
- Integration of the automated classification into a larger workflow.
Data Integration and APIs
What formats and unstructured data sources are supported by AutoClassifier?
AutoClassifier and EVN Tagger are built on the TopBraid Live platform, and can make use of data, text and documents in any format that can be imported into TopBraid Live. This includes any format that can be converted to RDF, and data sources such as REST APIs. TopQuadrant offers TopBraid Composer as a powerful development environment for such conversions and integrations, and offers training and professional services to support the process. Examples of formats that have been successfully used are XML, PDF, Microsoft.
What APIs are available?
The main AutoClassifier functionality can be accessed through SPARQL functions at a SPARQL endpoint or through specific RESTful web services, both described in the product documentation in the section titled “Using EVN Tagger”.
How can I evaluate the relevance of tags proposed by AutoClassifier?
Every tag proposed by AutoClassifier has a confidence value expressed as a percentage. Low-confidence tags can be discarded automatically, and proposed tags can always be reviewed manually.
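As an illustration of discarding low-confidence tags automatically, here is a minimal Python sketch. It does not use the AutoClassifier API: the ProposedTag class, the split_by_confidence function, the sample concepts and the 40% threshold are all hypothetical, and only the idea of a percentage confidence per tag comes from the answer above.

```python
from dataclasses import dataclass


@dataclass
class ProposedTag:
    """Hypothetical stand-in for a tag proposal; not an AutoClassifier type."""
    concept: str       # label of the proposed concept
    confidence: float  # confidence expressed as a percentage (0-100)


def split_by_confidence(tags, threshold):
    """Split proposals into (kept, discarded) around a percentage threshold."""
    kept = [t for t in tags if t.confidence >= threshold]
    discarded = [t for t in tags if t.confidence < threshold]
    return kept, discarded


# Made-up proposals for a single document.
proposals = [
    ProposedTag("Solar energy", 92.0),
    ProposedTag("Wind power", 47.5),
    ProposedTag("Energy policy", 12.0),
]

# The 40% cut-off is an arbitrary example value.
kept, discarded = split_by_confidence(proposals, threshold=40.0)
print([t.concept for t in kept])       # ['Solar energy', 'Wind power']
print([t.concept for t in discarded])  # ['Energy policy']
```

The tags that survive the threshold can still be passed on for manual review; the threshold only removes the proposals nobody needs to look at.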
We want to automate classification of a repository of ___ documents (e.g. PDF or DOCX). How can we run AutoClassifier on it?
One approach is to import these documents into an RDF content graph that can afterwards be managed in the EVN infrastructure. Another approach is to integrate the functionality into an existing content repository via the AutoClassifier API.
How large should training datasets be?
The size of the training dataset doesn’t have to increase in proportion to the size of the document corpus you want to classify. Quality matters more: training documents should be tagged with concepts whose labels actually occur in the document. You don’t need an exhaustive set of training documents covering every available concept before a concept can be assigned by the auto-classification process. More on training datasets can be found here.
What languages are supported for the unstructured content?
AutoClassifier relies on an algorithm that requires a language-specific stemmer and stopword lists. AutoClassifier currently includes these for English, French, German and Spanish, and we may add more if demand exists. The rest of the tool is language-agnostic. The quality of results should be fairly independent of the language.
How long does AutoClassifier need to process a document set?
This depends on many parameters such as:
For example, in one application, auto-classifying a set of 1000 short documents against a thesaurus of 26000 entries took 90 seconds on a 2015 laptop. Training AutoClassifier for this application with 100 training documents took less than 10 seconds.
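The figures in this example can be turned into a rough throughput estimate. The sketch below is only back-of-the-envelope arithmetic: it assumes that processing time scales linearly with the number of documents, which may not hold for other vocabularies, document lengths or hardware. The function names and the 50,000-document corpus are hypothetical.

```python
def docs_per_second(doc_count, seconds):
    """Observed throughput in documents per second."""
    return doc_count / seconds


def estimate_seconds(doc_count, rate):
    """Estimated run time, ASSUMING throughput stays constant (linear scaling)."""
    return doc_count / rate


# Figures quoted above: 1000 short documents classified in 90 seconds.
rate = docs_per_second(1000, 90)
print(round(rate, 1))  # 11.1 documents per second

# Hypothetical corpus of 50,000 similar documents on similar hardware.
seconds = estimate_seconds(50_000, rate)
print(round(seconds), "seconds, about", round(seconds / 60), "minutes")
```

Treat the result as an order-of-magnitude guide only; a short trial run on your own content gives a more reliable figure.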
Installation, Setup, Troubleshooting
What operating system requirements apply to AutoClassifier?
AutoClassifier makes use of Maui Server for its machine learning capabilities, which can itself run in any environment that supports EVN.
How does authentication work with AutoClassifier?
AutoClassifier internally uses a web service packaged as a separate web application. This web application can be deployed either on the same application server as EVN, or a different one. The application server’s security features may be used to lock down access to the service. EVN can be configured to use a username and password when communicating with the service.
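As a sketch of what a username and password look like on the wire, the Python example below builds a request carrying HTTP Basic credentials. This assumes the application server is set up for HTTP Basic authentication, which the answer above does not specify. The URL, username and password are placeholders, and no request is actually sent.

```python
import base64
import urllib.request

# Placeholder values; replace with your own deployment details.
SERVICE_URL = "https://example.org/classifier-service/"  # hypothetical URL
USERNAME = "evn-service"                                 # hypothetical account
PASSWORD = "change-me"                                   # hypothetical password


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic 'Authorization' header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


# Prepare (but do not send) a request carrying the credentials.
request = urllib.request.Request(SERVICE_URL)
request.add_header("Authorization", basic_auth_header(USERNAME, PASSWORD))

print(request.get_header("Authorization"))
```

Basic credentials are only encoded, not encrypted, so a setup like this would normally be combined with HTTPS between EVN and the service.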