Terminology Extraction

Good terminology is what keeps a translation consistent. Rather than build a termbase by hand, Terminology Extraction reads your documents, finds the terms that matter, and helps you turn them into termbase entries. It can work in one language or two, and it walks you through reviewing every candidate before anything is saved. This guide explains what the feature is for, then describes the options at each step in detail.

What it is for

The feature scans your text for candidate terms — single words and longer phrases — and proposes them for a termbase. You choose between two modes, and this choice shapes the whole run:

Mode	What you get
Monolingual	Terms found in one language. Good for building a source glossary.
Bilingual	Source terms and their translations, linked into pairs. Builds a full bilingual termbase.

Starting a run

Open Terminology Extraction Tasks and choose Create New Extraction. Each run is saved as a task you can pause and resume, so a large extraction never has to be done in one sitting.

Figure 1. The Terminology Extraction Tasks list. Create a new run, or resume one already in progress.

Step 1 — Configure the extraction

This is where most of the options live. Set them once for the run.

Option	What it does
Task Name	A name for the run, so you can find it again.
Extraction Type	Monolingual or Bilingual, as described above. The type is locked once you add a file.
Select Files	The documents to scan — XLIFF, TMX, Word, PowerPoint, and IDML. Languages are detected from the first file.
LLM	The model that does the extraction. Choose it with Change….
Anonymization (Optional)	The same privacy profiles as elsewhere — Anonymizer, Numerical Obfuscation, and Boilerplate — applied before any text reaches the model.
Parent termbase	An existing termbase to compare against. Candidates already in it are flagged, so you only review what is genuinely new.
Minimum words/characters per batch	How much text is sent to the model at a time. Larger batches mean fewer, bigger requests; the default is a sensible balance between the two.
Retain Context	Includes the surrounding text, giving the model more to work with at a little extra cost.
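
To make the batch setting concrete, here is a minimal sketch of how segments might be grouped until a minimum word count is reached. It is an illustration only: the function name batch_segments and the grouping rule are assumptions, not the product's actual logic.

```python
from typing import Iterable, List


def batch_segments(segments: Iterable[str], min_words: int) -> List[List[str]]:
    """Group segments into batches holding at least `min_words` words each.

    Hypothetical helper for illustration; the real batching rule is not
    documented in this guide.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    word_count = 0
    for segment in segments:
        current.append(segment)
        word_count += len(segment.split())
        if word_count >= min_words:
            # The batch is big enough, so close it and start a new one.
            batches.append(current)
            current, word_count = [], 0
    if current:
        # Whatever is left over forms a final, smaller batch.
        batches.append(current)
    return batches


segments = [
    "Tighten the flange bolts.",
    "Check the torque wrench setting.",
    "Replace the gasket.",
]
batches = batch_segments(segments, min_words=8)
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

With a minimum of 8 words, the first two segments travel together and the third is sent on its own. A higher minimum gives fewer, larger requests.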

Privacy applies here too

Extraction sends your text to a model, so the same protection matters. Attach an Anonymizer or Numerical Obfuscation profile in the Anonymization (Optional) section, and confidential content is masked before it leaves your machine — exactly as in guide 1.
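
As a rough picture of what masking means, the sketch below replaces numbers with placeholders before text would be sent anywhere, and keeps a local mapping so they can be restored. The placeholder format and the function name obfuscate_numbers are assumptions made for this example; the actual profiles may work differently.

```python
import re
from typing import Dict, Tuple

# Matches integers and numbers with thousands or decimal separators.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def obfuscate_numbers(text: str) -> Tuple[str, Dict[str, str]]:
    """Swap each number for a placeholder and remember the original.

    Hypothetical illustration of numerical obfuscation, not the product's code.
    """
    mapping: Dict[str, str] = {}

    def replace(match: "re.Match[str]") -> str:
        placeholder = f"[NUM{len(mapping) + 1}]"
        mapping[placeholder] = match.group(0)
        return placeholder

    return NUMBER.sub(replace, text), mapping


masked, mapping = obfuscate_numbers("Invoice 4471 totals 1,250.00 EUR.")
print(masked)
print(mapping)
```

Only the masked text would go to the model; the mapping stays on your machine.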

Figure 2. Configuring a run. Name it, choose monolingual or bilingual, add files and a model, and set privacy and parent-termbase options.

Step 2 — Configure the prompt

Next you choose the prompt templates that guide the extraction. A Source Prompt Template is always needed; a bilingual run adds a Target Prompt Template for finding translations. You can add your own instructions to either. For the target side, you also decide how many reference examples to give the model and whether to favour the most distinct candidates.

Figure 3. Choosing the prompt templates and any extra instructions that shape what counts as a term.

Step 3 — Run and watch progress

Start the extraction and the wizard works through your text. It shows live progress — how many segments are done, how many terms have been found, and the tokens and cost so far — so a large run holds no surprises. You can cancel at any time.

Figure 4. Extraction in progress, with live counts for segments, terms, tokens, and cost.

Step 4 — Review the candidates

Nothing is added to a termbase without your say-so. The review screen lists every candidate term with how often it appears, and a context panel shows the sentences it came from. You decide what to keep:

Check or uncheck terms individually, or use All and None.
Set a minimum frequency to hide one-off terms.
Edit a term’s wording before you accept it.
If you set a parent termbase, each term is marked Existing or New, and you can uncheck the existing ones in one click.
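
The review controls above can be pictured as a simple filter. This is a minimal sketch under stated assumptions: the Candidate class and review function are hypothetical, and matching against the parent termbase is assumed to ignore case.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Candidate:
    term: str
    frequency: int
    status: str = "New"      # becomes "Existing" if found in the parent termbase
    selected: bool = True    # checked by default


def review(candidates: Iterable[Candidate],
           parent_terms: Iterable[str],
           min_frequency: int) -> List[Candidate]:
    """Hide rare terms, flag known ones, and uncheck the known ones."""
    known = {term.casefold() for term in parent_terms}  # assumption: case-insensitive
    visible: List[Candidate] = []
    for candidate in candidates:
        if candidate.frequency < min_frequency:
            continue  # hidden by the minimum-frequency setting
        if candidate.term.casefold() in known:
            candidate.status = "Existing"
            candidate.selected = False
        visible.append(candidate)
    return visible


candidates = [
    Candidate("torque wrench", 7),
    Candidate("Flange", 3),
    Candidate("gasket seat", 1),
]
visible = review(candidates, parent_terms={"flange"}, min_frequency=2)
for candidate in visible:
    print(candidate.term, candidate.status, candidate.selected)
```

Here the one-off term is hidden, and the term already in the parent termbase is shown but unchecked.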

Figure 5. Reviewing candidates. Frequency, context, and parent-termbase flags help you keep the terms that matter.

In a bilingual run, the wizard then finds a translation for each approved term, and you review the pairs. Each pair shows a confidence score; you approve or reject it, mark a preferred link, and add notes. Pairs already present in a parent termbase are flagged so you do not duplicate them.
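
One way to use the confidence scores is to look at the least certain pairs first. The sketch below assumes a score between 0.0 and 1.0 and uses a hypothetical TermPair class; the guide does not specify the scale or how the wizard orders pairs.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class TermPair:
    source: str
    target: str
    confidence: float  # assumption: 0.0 (unsure) to 1.0 (certain)


def review_order(pairs: Iterable[TermPair]) -> List[TermPair]:
    """Return pairs sorted so the lowest-confidence links come first."""
    return sorted(pairs, key=lambda pair: pair.confidence)


pairs = [
    TermPair("torque wrench", "Drehmomentschlüssel", 0.95),
    TermPair("gasket", "Dichtung", 0.60),
    TermPair("flange", "Flansch", 0.88),
]
ordered = review_order(pairs)
for pair in ordered:
    print(f"{pair.confidence:.2f}  {pair.source} -> {pair.target}")
```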

Figure 6. Reviewing bilingual links. Approve the source-and-translation pairs you want, with confidence scores to guide you.

Step 5 — Commit and export

Finally you decide where the terms go. Create a new termbase, or merge into an existing one. Choose how much to export — just the approved terms, or all of them — and which formats to write, including bAIbel’s own format and MultiTerm XML for use in other tools.

Before committing, run the Pre-flight Check. It summarises what will be created and what will be skipped as already existing, so you can confirm the outcome before anything is written. Then Commit & Export finishes the job.
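
The idea behind the pre-flight summary can be sketched as a comparison between the approved terms and the destination termbase. The function name preflight and the case-insensitive matching are assumptions for illustration, not a description of the real check.

```python
from typing import Dict, Iterable, List


def preflight(approved_terms: Iterable[str],
              destination_terms: Iterable[str]) -> Dict[str, List[str]]:
    """Split approved terms into those to create and those to skip."""
    existing = {term.casefold() for term in destination_terms}  # assumption: ignore case
    summary: Dict[str, List[str]] = {"create": [], "skip": []}
    for term in approved_terms:
        if term.casefold() in existing:
            summary["skip"].append(term)    # already in the destination termbase
        else:
            summary["create"].append(term)  # would be written on commit
    return summary


summary = preflight(
    approved_terms=["torque wrench", "Flange"],
    destination_terms=["flange", "gasket"],
)
print(f"{len(summary['create'])} to create, {len(summary['skip'])} to skip")
```

Nothing is written at this stage; the summary only reports what a commit would do.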

Figure 7. Commit and export. Pick the destination termbase and formats, run the pre-flight check, then commit.

Terminology used in this guide

Terminology Extraction: The feature that finds candidate terms in documents and builds termbase entries.
Monolingual / Bilingual: Extracting terms in one language, or source terms with their translations linked into pairs.
Candidate: A term the wizard has proposed, awaiting your review.
Parent termbase: An existing termbase used to flag candidates that are already known.
Pre-flight check: A summary of what will be created or skipped, shown before you commit.