Canonical Library Guide

The canonical library centralises every approved reference value. Reviewers can curate records manually or load large batches of rows in one action. This guide documents the workflows now supported by the dedicated Canonical Library page in the reviewer UI and provides an Abu Dhabi–specific dataset ready for import.

Page overview

The library page offers five key capabilities:

Filter & search – Narrow the table by dimension and/or keyword to locate existing canonical values quickly.
Create & edit – Launch the editor modal to add brand-new entries or update labels, dimensions, descriptions, and any dimension-specific attributes.
Bulk import – Upload CSV/TSV/Excel files or paste tabular rows, review the automatic column suggestions, and map headers to canonical fields before committing the import.
Attributes – Capture the extra fields defined for the active dimension (for example, codes or international identifiers) alongside each canonical value.
Export – Download the filtered table to CSV for auditing or offline collaboration. The export includes attribute columns so downstream tools can preserve the additional metadata.

Canonical Library Grid — Browse and filter canonical values organized by dimension

All changes are persisted via the /api/reference/canonical endpoints exposed by the FastAPI backend.

Adding or editing records

Open Canonical Library from the navigation bar.
Use the New canonical value button to capture the dimension, canonical label, and optional description (e.g., Arabic translation or code).
To edit a row, choose Edit in the table. Adjust any field and save to persist the change.
Use Delete to remove records that are no longer valid. A confirmation modal protects against accidental deletions.

Edit Canonical Value — Edit canonical values with dimension-specific attributes

Bulk import walkthrough

The importer accepts CSV, TSV, or Excel workbooks. Provide a header row describing each column. The preview step inspects the headers and sample rows, proposes default mappings (label, dimension, description, and attribute candidates), and lets you map or ignore each column before any records are created.

You no longer need to strip metadata tabs or title banners from Excel workbooks before uploading: the backend scans every worksheet, discards prefatory rows until it finds the first genuine header, and keeps the data immediately below it, even when earlier cells are merged.

When the dataset targets a brand-new dimension, you can enter the dimension label and an optional description inline. The backend then creates the dimension and its attribute schema automatically during the import.
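The header detection described above can be approximated with a simple heuristic. The sketch below is not the backend's actual logic: `find_header_row`, its `min_columns` parameter, and the rule itself (the first row with at least two non-empty, non-numeric, distinct cells) are assumptions made for illustration. Rows are plain lists, as a spreadsheet reader might yield them.

```python
from typing import Optional, Sequence


def _is_number(text: str) -> bool:
    """Return True when the cell text parses as a number."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def find_header_row(rows: Sequence[Sequence[object]], min_columns: int = 2) -> Optional[int]:
    """Return the index of the first row that looks like a header, or None.

    Hypothetical heuristic: a header row has at least `min_columns`
    non-empty cells, none of them numeric, and no repeated names.
    """
    for index, row in enumerate(rows):
        cells = [str(cell).strip() for cell in row if cell is not None]
        cells = [cell for cell in cells if cell]
        if len(cells) < min_columns:
            continue  # title banners and blank rows have too few cells
        if any(_is_number(cell) for cell in cells):
            continue  # data rows usually carry codes or amounts
        if len({cell.lower() for cell in cells}) != len(cells):
            continue  # merged cells often repeat the same text
        return index
    return None


# Illustrative worksheet: a title banner, a blank row, then the real table.
sheet = [
    ["Reference data export", None, None],
    [None, None, None],
    ["dimension", "label", "code"],
    ["region", "Sample Region", 101],
]
header_index = find_header_row(sheet)
data_rows = sheet[header_index + 1:] if header_index is not None else []
```

A heuristic like this can misfire on tables whose headers are numeric (for example, year columns), which is one reason the preview step lets you confirm the mapping before anything is created.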

Bulk Import Modal — Upload and preview files before importing canonical values

Column Mapping — Map file columns to canonical value fields

If an Excel workbook contains multiple sheets, the preview step now lists every sheet so you can choose which one to import. Work through larger templates one tab at a time without having to split the workbook manually.
Every import now performs a dry run before committing rows. The UI highlights any canonical values that already exist and lets you decide whether to update those entries or skip the duplicates entirely.
Columns can be separated by commas, tabs, or multiple spaces when pasting raw text.
Empty dimension cells inherit the selected target dimension when no dimension column is mapped—helpful for single-dimension datasets.
Attribute columns can be mapped to existing schema keys or defined on the fly for new dimensions. Attribute types default to text but can be adjusted to numeric or boolean as part of the mapping step.
Backend logs include the resolved filename, detected columns, and the number of created versus skipped rows. When diagnosing issues, check the FastAPI container logs for entries such as "Bulk canonical import received" or "Generated bulk import preview".
Uploading both a file and pasted rows prioritises the file contents; remove the file to import the pasted data instead.
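The pasted-text rules above (comma, tab, or multiple-space separators, plus the fallback to the target dimension) can be sketched as follows. This is an illustration under assumptions, not the importer's code: the function names are hypothetical, the precedence (tabs, then commas, then runs of spaces) is a guess, and the comma split is naive and does not handle quoted fields.

```python
import re
from typing import List, Optional


def split_pasted_line(line: str) -> List[str]:
    """Split one pasted row on tabs, else commas, else runs of 2+ spaces."""
    if "\t" in line:
        parts = line.split("\t")
    elif "," in line:
        parts = line.split(",")  # naive: quoted commas are not supported
    else:
        parts = re.split(r" {2,}", line.strip())
    return [part.strip() for part in parts]


def resolve_dimension(cell: Optional[str], target_dimension: str) -> str:
    """Fall back to the selected target dimension when the cell is empty."""
    return cell.strip() if cell and cell.strip() else target_dimension
```

Splitting on two or more spaces, rather than one, keeps multi-word labels such as "Sample Region" intact.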

Abu Dhabi regional dataset

A ready-to-use dataset that mirrors the source material provided with this task lives at docs/data/abu_dhabi_canonical.tsv. It contains the emirate, region, and district hierarchy with English labels, numeric codes, and Arabic names in the description.

To bulk load the dataset:

Open docs/data/abu_dhabi_canonical.tsv in your editor or run:
```
cat docs/data/abu_dhabi_canonical.tsv
```
Copy all rows starting from the second line (skip the header row).
In the Reviewer UI, select Bulk import, upload the TSV file (or paste the copied rows), and click Review mappings.
Map the detected columns to the canonical label, dimension (or default dimension), and any attributes. Adjust attribute data types if you're creating a new dimension.
Click Import rows to create the records. The importer reports how many values were created, highlights any issues inline, and automatically sorts the additions alongside existing values.
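Before uploading, it can help to confirm the header names and row count locally. The sketch below uses Python's standard `csv` module; `summarise_tsv` is a hypothetical helper, and the sample header names are placeholders rather than the actual columns of the dataset file.

```python
import csv
import io
from typing import Dict, List, Tuple


def summarise_tsv(text: str) -> Tuple[List[str], List[Dict[str, str]]]:
    """Parse tab-separated text and return (header names, data rows)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = list(reader)  # DictReader skips completely empty lines
    return list(reader.fieldnames or []), rows


# Placeholder content; the real file's column names may differ.
sample = "dimension\tlabel\tcode\nregion\tSample Region\t101\n"
headers, rows = summarise_tsv(sample)
print(headers, len(rows))
```

To inspect the real file, read it with `open("docs/data/abu_dhabi_canonical.tsv", encoding="utf-8", newline="")` and pass its contents to the function. Reading as UTF-8 matters here because the descriptions contain Arabic text.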

Tips for custom datasets

Include stable identifiers (codes) either in dedicated attribute columns or the description so downstream consumers can map raw values reliably.
Use consistent dimensions (e.g., region, district, currency) to keep filtering predictable.
When storing multilingual labels, concatenate translations with clear separators, for example: English | Arabic.
The importer ignores blank lines and lines prefixed with #, so you can add lightweight inline comments to a dataset.
For spreadsheet imports, multiple worksheets are supported. The loader inspects every sheet and selects the first region that looks like a tabular dataset, so workbooks with cover sheets or audit notes continue to import without manual editing.
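Two of the tips above, comment lines and the "English | Arabic" label convention, are easy to mirror when preparing a dataset with a script. The helpers below are hypothetical and only approximate the importer's behaviour; in particular, treating only lines whose first character is # as comments is an assumption.

```python
from typing import List


def strip_comments(text: str) -> List[str]:
    """Drop blank lines and lines whose first character is '#'."""
    return [
        line
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]


def split_translations(label: str, separator: str = "|") -> List[str]:
    """Split a concatenated label such as 'English | Arabic' into parts."""
    return [part.strip() for part in label.split(separator)]
```

Choose a separator that cannot appear inside a label, otherwise the split is ambiguous for downstream consumers.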

Troubleshooting

"Unable to load canonical library" toast – Verify the backend container is running. The UI now reports which resources failed to load, and the backend automatically recreates the default configuration if it has been wiped—refresh the page after clearing a database.
Import validation errors – Ensure each row contains at least two columns (dimension + canonical label). Codes and descriptions are optional but strongly recommended.
Duplicate records – Duplicates are surfaced during the dry run; choose whether to overwrite the existing canonical values or skip the incoming rows directly in the importer.
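The dry-run idea behind the duplicate check can be sketched as a partition of incoming rows into new and already-present values. This is a minimal illustration, not the backend's implementation: matching on a case-insensitive (dimension, label) pair is an assumption, and `row_key` and `dry_run` are hypothetical names.

```python
from typing import Dict, Iterable, List, Set, Tuple

Row = Dict[str, str]
Key = Tuple[str, str]


def row_key(row: Row) -> Key:
    """Build a case-insensitive (dimension, label) key for comparison."""
    return (row["dimension"].strip().lower(), row["label"].strip().lower())


def dry_run(incoming: Iterable[Row], existing: Set[Key]) -> Tuple[List[Row], List[Row]]:
    """Return (new_rows, duplicates) without modifying any stored data."""
    new_rows: List[Row] = []
    duplicates: List[Row] = []
    seen = set(existing)  # copy so the caller's set is left untouched
    for row in incoming:
        key = row_key(row)
        if key in seen:
            duplicates.append(row)
        else:
            new_rows.append(row)
            seen.add(key)  # later repeats within the batch count as duplicates
    return new_rows, duplicates
```

Only after the reviewer chooses to update or skip the duplicates would a real import commit anything, which is what makes the step a dry run.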

With these enhancements, the canonical library can manage the full Abu Dhabi reference taxonomy and any future expansions.