
Extraction Model and Indexing

12/05/2026

The Extractor uses an indexed data structure built from the thesaurus to perform fast matching across all data. This data structure, known as the Extraction Model, must be refreshed whenever the information stored in the thesaurus changes, so that extraction uses the latest data. In the default extraction model, updating the index is a manual process: users must trigger a refresh after making changes to the thesaurus. The refresh requires a brief period of processing time before the new data is available for API calls.

The default extraction engine, Elasticsearch, serves as the distributed search and analytics engine responsible for storing and retrieving structured, unstructured, and vector data. It performs high-speed hybrid and vector search operations and provides the underlying indexing layer required for the persistent storage of taxonomy data and for the execution of complex query analytics.

Tip

For more information on the alternative Lucene engine, skip to the last section of this page.

In this chapter, we will walk you through a variety of API calls. To follow along, please click here to download the sample project file. Then create a Graph Modeling project from it by following the instructions for the Create a Project from a Graph Modeling native file function.

You can use the sample project to execute some Extractor calls in your browser's address bar. One simple Extractor call for querying projects is {{url}}/extractor/api/projects/, where {{url}} stands for the server running your installation. You can also use tools like curl or Postman to execute calls to the API. Keep in mind that you need to authenticate using OAuth 2.0 to be able to access the API endpoints.
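As a minimal sketch, the projects call above can also be issued from a script instead of the address bar. The Python snippet below builds the request for the projects endpoint. It assumes that an OAuth 2.0 access token has already been obtained, that the server accepts it as a bearer token, and that the response body is JSON. The function names `build_projects_request` and `fetch_projects` are illustrative helpers, not part of the product.

```python
import json
import urllib.request


def build_projects_request(base_url: str, access_token: str) -> urllib.request.Request:
    """Build a GET request for the Extractor projects endpoint.

    Assumption: the OAuth 2.0 access token is sent as a bearer token
    in the Authorization header.
    """
    # base_url plays the role of {{url}}, the server running your installation.
    url = base_url.rstrip("/") + "/extractor/api/projects/"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
        method="GET",
    )


def fetch_projects(base_url: str, access_token: str):
    """Send the request and decode the body (assumed to be JSON)."""
    request = build_projects_request(base_url, access_token)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

To try it, call `fetch_projects` with the address of your own server and a valid token. Building the request separately from sending it makes it easy to inspect the URL and headers before any network traffic happens.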

The extraction model relies on Concept Schemes containing Top Concepts and regular Concepts. Keep in mind that the Extractor needs top concepts to be able to perform categorization, whereas general concept information is contained in the model itself. Furthermore, a thesaurus may contain multiple concept schemes, meaning that you will need filters to narrow down the number of results.
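To illustrate why narrowing by concept scheme matters, the sketch below filters a list of extracted concepts down to a single scheme. The `ExtractedConcept` record, its fields, and the sample data are hypothetical and do not reflect the Extractor's actual response format. The filtering here happens on the client side, purely as an illustration of the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedConcept:
    # Hypothetical record; not the Extractor's actual response format.
    label: str
    concept_scheme: str


def filter_by_scheme(concepts, scheme):
    """Keep only the concepts that belong to the given concept scheme."""
    return [c for c in concepts if c.concept_scheme == scheme]


# Made-up results spanning two concept schemes of one thesaurus.
results = [
    ExtractedConcept("Espresso", "Beverages"),
    ExtractedConcept("Berlin", "Places"),
    ExtractedConcept("Green tea", "Beverages"),
]

beverages = filter_by_scheme(results, "Beverages")
print([c.label for c in beverages])  # prints ['Espresso', 'Green tea']
```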