Good terminology is what keeps a translation consistent. Instead of making you build a termbase by hand, Terminology Extraction reads your documents, finds the terms that matter, and helps you turn them into termbase entries. It works in one language or two, and it walks you through reviewing every candidate before anything is saved. This guide explains what the feature is for and then goes through the options at each step in detail.
The feature scans your text for candidate terms — single words and longer phrases — and proposes them for a termbase. You choose between two modes, and this choice shapes the whole run:
| Mode | What you get |
|---|---|
| Monolingual | Terms found in one language. Good for building a source glossary. |
| Bilingual | Source terms and their translations, linked into pairs. Builds a full bilingual termbase. |
Open Terminology Extraction Tasks and choose Create New Extraction. Each run is saved as a task you can pause and resume, so a large extraction never has to be done in one sitting.
The first step of the wizard is where most of the options live. Set them once for the run.
| Option | What it does |
|---|---|
| Task Name | A name for the run, so you can find it again. |
| Extraction Type | Monolingual or Bilingual, as described above. This is locked once you add a file. |
| Select Files | The documents to scan — XLIFF, TMX, Word, PowerPoint, and IDML. Languages are detected from the first file. |
| LLM | The model that performs the extraction. Use Change… to pick a different one. |
| Anonymization (Optional) | The same privacy profiles used elsewhere in bAIbel (Anonymizer, Numerical Obfuscation, and Boilerplate), applied before any text reaches the model. |
| Parent termbase | An existing termbase to compare against. Candidates already in it are flagged, so you only review what is genuinely new. |
| Minimum words/characters per batch | How much text is sent to the model in each request. Larger batches mean fewer requests, smaller ones keep each request short; the default is a sensible balance between the two. |
| Retain Context | Includes the surrounding text, giving the model more to work with at a little extra cost. |
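The batching options are easier to picture with a small example. The sketch below groups consecutive segments until each batch reaches a minimum word count and, when context is retained, attaches the segment on either side of the batch. It illustrates the idea only: `make_batches`, `Batch`, the word-based count and the one-segment context window are assumptions, not bAIbel's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Batch:
    segments: list[str]                               # text to extract terms from
    context: list[str] = field(default_factory=list)  # neighbouring text, if retained


def make_batches(segments: list[str], min_words: int,
                 retain_context: bool = False) -> list[Batch]:
    """Hypothetical sketch: group consecutive segments until each batch holds at
    least `min_words` words. The last batch may fall short of the minimum."""
    batches: list[Batch] = []
    start = 0   # index of the first segment in the current batch
    words = 0   # words accumulated in the current batch
    for i, seg in enumerate(segments):
        words += len(seg.split())
        if words >= min_words or i == len(segments) - 1:
            context: list[str] = []
            if retain_context:
                # One segment either side of the batch, where one exists.
                if start > 0:
                    context.append(segments[start - 1])
                if i + 1 < len(segments):
                    context.append(segments[i + 1])
            batches.append(Batch(segments[start:i + 1], context))
            start, words = i + 1, 0
    return batches


segments = [
    "The pump housing is sealed.",
    "Check the pressure valve.",
    "Replace the gasket annually.",
    "Torque the bolts to spec.",
]
for batch in make_batches(segments, min_words=8, retain_context=True):
    print(batch.segments, "| context:", batch.context)
```

With `min_words=8`, these four short segments become two batches of two. Counting characters instead of words would mean swapping the `len(seg.split())` measure for `len(seg)`.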
Extraction sends your text to a model, so the same protection described in guide 1 applies here. Attach an Anonymizer or Numerical Obfuscation profile in the Anonymization (Optional) section, and confidential content is masked before it leaves your machine.
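To illustrate what masking before the model means, here is a minimal sketch of number masking: each number is swapped for a placeholder, the mapping stays on your machine, and placeholders in the model's answer are turned back into the original values. This is not bAIbel's Numerical Obfuscation profile; the `[NUM1]` placeholder format, the regular expression and the function names are assumptions.

```python
import re

# Assumed pattern: digit runs, optionally grouped with "." or "," (e.g. 12,500.00).
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def mask_numbers(text: str) -> tuple[str, dict[str, str]]:
    """Replace each number with a numbered placeholder (hypothetical format).
    Returns the masked text and a placeholder -> original mapping."""
    mapping: dict[str, str] = {}

    def substitute(match: re.Match) -> str:
        key = f"[NUM{len(mapping) + 1}]"
        mapping[key] = match.group(0)
        return key

    return NUMBER.sub(substitute, text), mapping


def unmask(text: str, mapping: dict[str, str]) -> str:
    """Restore the original numbers in text that came back from the model."""
    for key, value in mapping.items():
        text = text.replace(key, value)
    return text


masked, mapping = mask_numbers("Order 4711 costs 12,500.00 EUR.")
print(masked)                   # only this version would be sent to the model
print(unmask(masked, mapping))  # restored locally afterwards
```

Because the mapping never leaves the local process, the model only ever sees the placeholders. A real profile would also need to cope with dates, units and locale-specific separators, which this pattern handles only loosely.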
Next, choose the prompt templates that guide the extraction. A Source Prompt Template is always required; a bilingual run adds a Target Prompt Template for finding translations. You can add your own instructions, and for the target side you can also set how many reference examples the model receives and whether it should favour the most distinct candidates.
Start the extraction and the wizard works through your text. It shows live progress — how many segments are done, how many terms have been found, and the tokens and cost so far — so a large run holds no surprises. You can cancel at any time.
Nothing is added to a termbase without your say-so. The review screen lists every candidate term along with how often it appears, and a context panel shows the sentences it came from, so you can decide which candidates to keep.
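The frequency figure and the context panel can be approximated in a few lines. This sketch counts case-insensitive, whole-phrase matches of each candidate and keeps the sentences they occur in; the function name `term_stats` and the matching rules are assumptions for illustration, not how bAIbel computes its counts.

```python
import re


def term_stats(terms: list[str],
               sentences: list[str]) -> dict[str, tuple[int, list[str]]]:
    """For each candidate term, return (occurrence count, sentences it appears in).
    Matching is case-insensitive on whole words; the word-boundary anchors assume
    each term starts and ends with a letter, digit or underscore."""
    stats: dict[str, tuple[int, list[str]]] = {}
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        count = 0
        context: list[str] = []
        for sentence in sentences:
            hits = len(pattern.findall(sentence))
            if hits:
                count += hits
                context.append(sentence)
        stats[term] = (count, context)
    return stats


sentences = [
    "Bleed the brake line first.",
    "A Brake Line must never kink.",
    "Replace the pads.",
]
stats = term_stats(["brake line", "gasket"], sentences)
# Most frequent first, as a review list might show them.
for term, (count, context) in sorted(stats.items(), key=lambda kv: kv[1][0], reverse=True):
    print(term, count, context)
```

A candidate with a count of zero, like `gasket` here, has no supporting sentences at all, which is exactly the kind of entry a reviewer would want to question.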
In a bilingual run, the wizard then finds a translation for each approved term, and you review the pairs. Each pair shows a confidence score; you approve or reject it, mark a preferred link, and add notes. Pairs already present in a parent termbase are flagged so you do not duplicate them.
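A bilingual candidate carries more than two strings: review attaches a confidence score, an approve or reject decision, a preferred-link flag and notes. One way to model such a pair, and to flag pairs the parent termbase already holds, is sketched below. The field names, the 0-to-1 confidence scale and the case and whitespace normalisation are assumptions, not bAIbel's data model.

```python
from dataclasses import dataclass


@dataclass
class TermPair:
    source: str
    target: str
    confidence: float        # assumed scale: 0.0 (low) to 1.0 (high)
    approved: bool = False   # set during review
    preferred: bool = False  # marked as the preferred link
    note: str = ""
    in_parent: bool = False  # already present in the parent termbase


def normalise(term: str) -> str:
    """Compare terms case-insensitively and ignore repeated whitespace."""
    return " ".join(term.casefold().split())


def flag_existing(pairs: list[TermPair], parent: set[tuple[str, str]]) -> None:
    """Set in_parent on every pair whose (source, target) the parent already holds."""
    known = {(normalise(s), normalise(t)) for s, t in parent}
    for pair in pairs:
        pair.in_parent = (normalise(pair.source), normalise(pair.target)) in known


pairs = [
    TermPair("brake line", "Bremsleitung", 0.92),
    TermPair("gasket", "Dichtung", 0.81),
]
flag_existing(pairs, parent={("Brake  Line", "bremsleitung")})
for pair in pairs:
    print(pair.source, "->", pair.target, pair.confidence,
          "duplicate" if pair.in_parent else "new")
```

Normalising before comparison stops trivial differences in case or spacing from slipping a duplicate past the check; treating such variants as duplicates is a design choice of this sketch rather than documented behaviour.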
Finally, decide where the terms go: create a new termbase or merge into an existing one. Choose how much to export (just the approved terms, or all of them) and which formats to write, including bAIbel’s own format and MultiTerm XML for use in other tools.
Before committing, run the Pre-flight Check. It summarises what will be created and what will be skipped because it already exists, so you can confirm the outcome before anything is written. Commit & Export then finishes the job.
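A pre-flight check of this kind amounts to a dry run: decide which candidates are in scope, then split them into entries to create and entries to skip, all without writing anything. A minimal sketch of that logic follows; `preflight`, `Candidate` and the exact-match comparison are hypothetical and stand in for whatever bAIbel does internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    source: str
    target: str
    approved: bool


def preflight(candidates: list[Candidate],
              existing: Optional[set[tuple[str, str]]] = None,
              approved_only: bool = True) -> tuple[list[Candidate], list[Candidate]]:
    """Dry run: return (to_create, to_skip) without writing anything.
    `existing` is None for a new termbase, or the (source, target) entries
    of the termbase being merged into."""
    existing = existing or set()
    in_scope = [c for c in candidates if c.approved or not approved_only]
    to_create: list[Candidate] = []
    to_skip: list[Candidate] = []
    for c in in_scope:
        (to_skip if (c.source, c.target) in existing else to_create).append(c)
    return to_create, to_skip


candidates = [
    Candidate("brake line", "Bremsleitung", approved=True),
    Candidate("gasket", "Dichtung", approved=True),
    Candidate("pad", "Belag", approved=False),
]
create, skip = preflight(candidates, existing={("gasket", "Dichtung")})
print(f"{len(create)} to create, {len(skip)} skipped as already existing")
```

Only merging into an existing termbase produces skips; a new termbase starts empty, so everything in scope lands in `to_create`.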