# Import and Export a Thesaurus
# Import a Thesaurus
You can import a synonym list to use ready-made dictionaries and fill a thesaurus with values more quickly.
You can also add terms to an existing thesaurus. When importing, the system can add completely new parents and names with synonyms and misspellings, or add names, synonyms and misspellings to existing parents and names.
For synonyms and misspellings, you can decide whether to add them to existing entries or to replace the existing entries.
- Create a new thesaurus or open an existing thesaurus.
- Click the action "Import thesaurus".
- Select a character encoding (default: UTF-8).
- Select an import behaviour (see "Import Behaviour").
- Drag and drop a thesaurus file or select it from your computer.
Please note: This file must be a CSV file.
Import pop-up window
# Import Behaviour
If the system recognizes existing or duplicate terms, you can decide whether to add or replace synonyms and misspellings. Duplicates are checked both between the import file and an existing thesaurus, and within the import file itself.
If the system recognizes duplicate terms in the CSV file itself, synonyms and misspellings further down in the file will be added to or replace entries further up.
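The difference between the two behaviours can be illustrated with a small sketch. It is a simplified model of the rules described above, not the 4ALLPORTAL implementation: the function name `apply_import`, the `mode` values and the data layout (a dictionary of term names to synonym lists) are assumptions made for illustration, and only synonyms are modelled.

```python
def apply_import(thesaurus, rows, mode="add"):
    """Model the "add" and "replace" import behaviours.

    thesaurus: dict mapping a term name to its list of synonyms.
    rows: list of (name, synonyms) tuples in file order.
    mode: "add" appends new synonyms, "replace" overwrites the list.
    """
    if mode not in ("add", "replace"):
        raise ValueError(f"unknown mode: {mode}")
    # Work on a copy so the original thesaurus stays untouched.
    result = {name: list(synonyms) for name, synonyms in thesaurus.items()}
    for name, synonyms in rows:
        if mode == "replace" or name not in result:
            # New term, or the later entry replaces the earlier one.
            result[name] = list(synonyms)
        else:
            # Add only synonyms that are not present yet.
            for synonym in synonyms:
                if synonym not in result[name]:
                    result[name].append(synonym)
    return result


existing = {"bike": ["bicycle"]}
# "bike" occurs twice in the import file: the second row is processed later.
file_rows = [("bike", ["cycle"]), ("bike", ["pushbike"])]

added = apply_import(existing, file_rows, mode="add")
replaced = apply_import(existing, file_rows, mode="replace")
```

In this model, `added` collects all synonyms for `bike`, while `replaced` keeps only those from the last matching row in the file.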
# Export a Thesaurus
You can also export an existing thesaurus, e.g. to edit/extend it outside the 4ALLPORTAL.
- Open a thesaurus and click the action "Export thesaurus". A pop-up window opens.
- Set a character to delimit the values (default: `,`). Please note: Choose the character depending on the tool you want to work with. Microsoft Excel requires a `;`, for example.
- Select a character encoding (default: UTF-8).
- A CSV file will be downloaded (see "CSV Example").
- Optionally, you can import the file into a spreadsheet.
Export pop-up window
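If a file was exported with one delimiter and a tool expects another, it can be converted outside the 4ALLPORTAL. The following is a minimal sketch using Python's standard `csv` module; the function name `convert_delimiter` and the sample content are assumptions for illustration.

```python
import csv
import io


def convert_delimiter(text, source=",", target=";"):
    """Rewrite CSV text from one delimiter to another."""
    reader = csv.reader(io.StringIO(text), delimiter=source)
    out = io.StringIO()
    # lineterminator="\n" keeps plain line feeds instead of "\r\n".
    writer = csv.writer(out, delimiter=target, lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


exported = "parent,name,synonym_0\n,bike,bicycle\n"
for_excel = convert_delimiter(exported, source=",", target=";")
```

Using the `csv` module rather than a plain text replacement keeps values intact that themselves contain the delimiter character.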
# Supported Character Encodings
We support the following character encodings:
- US-ASCII
- ISO-8859-1
- UTF-8 (default)
- UTF-16BE
- UTF-16LE
- UTF-16
Please note: When re-importing a CSV file, make sure you choose the same character set that you used when exporting.
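A CSV file can also be converted to a different encoding before import. The sketch below maps the encoding names listed above to the codec names Python uses; the mapping `PYTHON_CODECS` and the function `reencode` are assumptions for illustration and not part of the 4ALLPORTAL.

```python
# Encoding names from the list above, mapped to Python codec names.
PYTHON_CODECS = {
    "US-ASCII": "ascii",
    "ISO-8859-1": "latin-1",
    "UTF-8": "utf-8",
    "UTF-16BE": "utf-16-be",
    "UTF-16LE": "utf-16-le",
    "UTF-16": "utf-16",
}


def reencode(data, source, target):
    """Decode bytes with the source encoding and encode with the target."""
    text = data.decode(PYTHON_CODECS[source])
    return text.encode(PYTHON_CODECS[target])


# "\u00e4" is the letter a with diaeresis, as in the German "Fahrr\u00e4der".
latin1_bytes = "Fahrr\u00e4der".encode("latin-1")
utf8_bytes = reencode(latin1_bytes, "ISO-8859-1", "UTF-8")

# Reading UTF-8 bytes with the wrong character set garbles the text.
garbled = "\u00e4".encode("utf-8").decode("latin-1")
```

The `garbled` value shows why the same character set should be used for export and re-import: the single letter is read back as two unrelated characters.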
# CSV File Specifications
When importing or exporting the synonym tree of a thesaurus, the content must be stored in a CSV file: A comma-separated list document with a header and a new line for each thesaurus term:
parent,name,synonym_0,synonym_1,wrongspelling_0
,motorcycle,motorbike,motor-bike,
,bike,bicycle,cycle,bycycle
If you are creating or editing a CSV file, this structure must be ensured:
- Header line: The first line is the header. It is relevant for the correct mapping and linking of terms and must contain the following: `parent`, `name`, `synonym_0`, (...), `wrongspelling_0`, (...). In a spreadsheet, each entry from the header would be a separate column.
- Header size: The number of `synonym` and `wrongspelling` entries is variable and depends on your synonym tree's complexity. An extra entry is required for each synonym or misspelling. The term(s) with the most synonyms and misspellings determine the number of required entries.
- Term lines: Each term from the thesaurus needs a separate line in the file. Synonyms and misspellings do not require a separate line.
- Comma-separation: The commas separate the "columns" (entries) defined in the header.
- The notation has no spaces between comma and entry.
- The CSV file expects all commas from the header, even if a line has fewer entries (compare the CSV example).
- If a term has no parent (top level), the notation is: `,term`.
Tip
If you create a CSV for import, you can omit the numbering for `synonym` and `wrongspelling`; the numbers are just for clarity.
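Before importing a hand-edited file, it can help to check it against these rules. The following sketch reads a CSV text, verifies that the header starts with `parent,name` and that every line has as many entries as the header, and groups the values into synonyms and misspellings. The function name `parse_thesaurus_csv` and the returned dictionary layout are assumptions for illustration; the actual import may validate differently.

```python
import csv
import io


def parse_thesaurus_csv(text, delimiter=","):
    """Parse thesaurus CSV text into a list of term dictionaries."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, None)
    if header is None or header[:2] != ["parent", "name"]:
        raise ValueError("header must start with parent,name")
    for column in header[2:]:
        # Numbered and unnumbered column names are both accepted.
        if not column.startswith(("synonym", "wrongspelling")):
            raise ValueError(f"unexpected header entry: {column}")
    terms = []
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} entries, got {len(row)}"
            )
        term = {
            "parent": row[0] or None,  # empty parent means top level
            "name": row[1],
            "synonyms": [],
            "misspellings": [],
        }
        for column, value in zip(header[2:], row[2:]):
            if not value:
                continue  # empty entry, only the comma is present
            key = "synonyms" if column.startswith("synonym") else "misspellings"
            term[key].append(value)
        terms.append(term)
    return terms


sample = (
    "parent,name,synonym_0,synonym_1,wrongspelling_0\n"
    ",motorcycle,motorbike,motor-bike,\n"
    ",bike,bicycle,cycle,bycycle\n"
)
terms = parse_thesaurus_csv(sample)
```

A line with too few commas, such as `,bike` under a five-entry header, raises a `ValueError` that names the offending line.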
# CSV Example
We want to export our example thesaurus "Vehicles":
The corresponding CSV file `vehicles.csv` looks like this:
parent,name,synonym_0,synonym_1,synonym_2,wrongspelling_0,wrongspelling_1,wrongspelling_2
,motorcycle,motorbike,motor-bike,,,,
,bike,bicycle,cycle,,bycycle,,
bike,electric bicycle,electric bike,e-bike,ebike,,,
electric bicycle,pedelec,pedal electric cycle,pedal assist,EPAC,pedilec,padelec,pedelac
bike,mountain bike,mountain bicycle,MTB,mountain-bike,,,
motorcycle,cruiser,,,,,,
- The header has eight entries. In addition to the always required `parent` and `name`, one term comes with three synonyms and three misspellings.
- Lines `,motorcycle` and `,bike` start with a comma, because there is no parent entry.
- Line `,motorcycle` ends with four commas, because it only has two synonyms (out of possible three) and no misspellings (out of possible three).
- Line `electric bicycle` starts without a comma, because `electric bicycle` defines the parent. Term `pedelec` is the child element. `pedal electric cycle`, `pedal assist` and `EPAC` are the synonyms. `pedilec`, `padelec` and `pedelac` are the misspellings.
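The padding rules from this example can be reproduced when generating an import file with a script. The sketch below derives the header size from the term with the most synonyms and misspellings and fills shorter lines with empty entries. The function name `write_thesaurus_csv` and the dictionary layout of a term are assumptions for illustration.

```python
import csv
import io


def write_thesaurus_csv(terms, delimiter=","):
    """Serialize term dictionaries to thesaurus CSV text."""
    # The longest synonym and misspelling lists determine the header size.
    max_syn = max((len(t["synonyms"]) for t in terms), default=0)
    max_mis = max((len(t["misspellings"]) for t in terms), default=0)
    header = (
        ["parent", "name"]
        + [f"synonym_{i}" for i in range(max_syn)]
        + [f"wrongspelling_{i}" for i in range(max_mis)]
    )
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    for t in terms:
        # Pad with empty entries so every line has all commas.
        synonyms = t["synonyms"] + [""] * (max_syn - len(t["synonyms"]))
        misspellings = t["misspellings"] + [""] * (max_mis - len(t["misspellings"]))
        writer.writerow([t["parent"] or "", t["name"], *synonyms, *misspellings])
    return out.getvalue()


vehicles = [
    {"parent": None, "name": "motorcycle",
     "synonyms": ["motorbike", "motor-bike"], "misspellings": []},
    {"parent": "electric bicycle", "name": "pedelec",
     "synonyms": ["pedal electric cycle", "pedal assist", "EPAC"],
     "misspellings": ["pedilec", "padelec", "pedelac"]},
    {"parent": "motorcycle", "name": "cruiser",
     "synonyms": [], "misspellings": []},
]
lines = write_thesaurus_csv(vehicles).splitlines()
```

With these three terms, `lines` contains the eight-entry header followed by the `motorcycle`, `pedelec` and `cruiser` lines exactly as they appear in `vehicles.csv` above.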
If opened as a spreadsheet, the file looks like this: