Corpus Management

This section is dedicated to the corpus management functionality in Graph Modeling.

The corpus management functionality in Graph Modeling helps you extend thesauri with relevant terms derived from documents that match the domain of your thesauri. In addition, corpora are used to improve entity extraction: they provide improved scoring of terms and concepts and offer shadow concept suggestions based on co-occurrences.

You can also create a new thesaurus from scratch based on a corpus.

Graph Modeling's Corpus Management Functionality

To enrich your thesaurus with terms using the corpus management function, you can process documents (PDF, DOC, PowerPoint, TXT, etc.) that are related to your project's domain, or harvest RSS feeds, websites and DBpedia resources linked to the concepts in your thesaurus.

The Graph Modeling corpus management tightly integrates the Graph Modeling Extractor into the thesaurus management process. It uses the extractor's ability to analyse text and extract terms and phrases, which are then matched against the concepts in your thesaurus. You can then integrate extracted domain-specific terms into your thesaurus as new concepts or as synonyms of existing concepts.
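The matching step described above can be illustrated with a minimal Python sketch. It is not the Graph Modeling Extractor's actual algorithm: the `Concept` class, the `match_terms` function and the case-insensitive exact-label comparison are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical thesaurus concept: one preferred label plus synonyms."""
    pref_label: str
    alt_labels: list[str] = field(default_factory=list)

    def labels(self) -> set[str]:
        # All labels of the concept, lower-cased for comparison.
        return {self.pref_label.lower(), *(a.lower() for a in self.alt_labels)}


def match_terms(extracted_terms, concepts):
    """Split extracted terms into thesaurus matches and unmatched terms.

    Assumption: a term matches a concept if it equals one of the
    concept's labels, ignoring case. Real extractors do far more.
    """
    # Build a lookup from every known label to its concept.
    label_index = {}
    for concept in concepts:
        for label in concept.labels():
            label_index[label] = concept

    matched = {}     # extracted term -> preferred label of matching concept
    candidates = []  # terms with no match: possible new concepts or synonyms
    for term in extracted_terms:
        concept = label_index.get(term.lower())
        if concept is not None:
            matched[term] = concept.pref_label
        else:
            candidates.append(term)
    return matched, candidates


thesaurus = [
    Concept("Mojito", ["Mojito cocktail"]),
    Concept("Rum"),
]
terms = ["mojito", "rum", "muddled mint", "Mojito cocktail"]
matched, candidates = match_terms(terms, thesaurus)
print(matched)
print(candidates)
```

In this sketch, the unmatched terms are the ones a thesaurus manager would review and decide whether to add as new concepts or as synonyms of existing concepts.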

The extracted terms that you select for integration into your thesaurus are called 'Candidate Concepts' in Graph Modeling. For details about handling them and a possible workflow, see: Candidate Concepts List

The following image shows an example Corpus Management view, where a corpus called 'Cocktails' has already been created:

1657042d0a685a.png

To learn in detail how to use the Corpus Management feature, refer to the following topics:

Note

Multiple corpora are available for Graph Modeling Enterprise Server and Graph Modeling Semantic Integrator.

Graph Modeling Advanced Server allows one corpus per project.

You can also manage your corpus or corpora programmatically, or automate their management remotely, by using the Graph Modeling Corpus API services, such as: Web Service Method: Create a New Corpus, Web Service Method: Upload a Document to a Corpus, Method: analyse corpus, Web Service Method: Request Concept Matches of a Corpus, etc.
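As a rough idea of what scripted corpus management could look like, the following Python sketch builds HTTP requests for creating and analysing a corpus. Everything concrete in it is hypothetical: the base URL, the endpoint paths, the JSON field names and the function names are placeholders, not the actual Graph Modeling Corpus API. Consult the web service method pages listed above for the real endpoints and parameters.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Hypothetical server address; replace with your own installation's URL.
BASE_URL = "https://example.org/api"


def create_corpus_request(project_id: str, corpus_name: str,
                          language: str = "en") -> Request:
    """Build (but do not send) a request that would create a corpus.

    The path and the payload fields are assumptions for illustration.
    """
    url = f"{BASE_URL}/projects/{quote(project_id, safe='')}/corpora"
    payload = json.dumps({"name": corpus_name, "language": language})
    return Request(
        url,
        data=payload.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def analyse_corpus_request(project_id: str, corpus_id: str) -> Request:
    """Build (but do not send) a request that would start corpus analysis."""
    url = (f"{BASE_URL}/projects/{quote(project_id, safe='')}"
           f"/corpora/{quote(corpus_id, safe='')}/analyse")
    return Request(url, method="POST")


req = create_corpus_request("cocktails-project", "Cocktails")
print(req.get_method(), req.full_url)
```

The requests are only constructed here, not sent. In a real script you would pass them to `urllib.request.urlopen` together with whatever authentication your server requires.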

In addition, you can significantly improve the extraction results for free terms by using a corpus. For details, see: Free Terms Extraction Based on a Text Corpus

Tip

To learn more about this topic, watch this Graph Modeling Academy Tutorial video:

2.4 Corpus Management Basics