Testing Guide

This guide covers testing strategies, test execution, and best practices for RefData Hub.

Test Overview

RefData Hub uses two testing frameworks:

  • Backend: pytest for Python unit and integration tests
  • Frontend: Vitest for React component and integration tests

Backend Testing (Pytest)

Prerequisites

cd api

# Create virtual environment
python -m venv venv
source venv/bin/activate

# Install runtime dependencies and test tooling
pip install -r requirements.txt
pip install pytest pytest-cov pytest-asyncio httpx

Running Tests

# Run all tests
pytest

# Run with verbose output
pytest -v

# Run specific test file
pytest tests/test_matcher.py

# Run specific test
pytest tests/test_matcher.py::test_rank_with_embeddings

# Run with coverage
pytest --cov=app --cov-report=html

# Run integration tests only
pytest -m integration

Test Structure

tests/
├── conftest.py                   # Pytest configuration and fixtures
├── test_api.py                   # API integration tests
├── test_matcher.py               # Matcher unit tests
├── test_config.py                # Configuration tests
├── test_database_migrations.py
├── test_source_connections.py
├── test_value_mapping_io.py
└── test_targetdb_seed.py

Fixtures

File: tests/conftest.py

import pytest
from fastapi.testclient import TestClient
from sqlalchemy import create_engine
from sqlalchemy.pool import StaticPool
from sqlmodel import Session, SQLModel

from api.app.main import app
from api.app.database import get_session

# Test database (SQLite in-memory).
# StaticPool plus check_same_thread=False keep the single in-memory database
# visible across the threads TestClient uses.
@pytest.fixture
def test_db():
    engine = create_engine(
        "sqlite:///:memory:",
        connect_args={"check_same_thread": False},
        poolclass=StaticPool,
    )
    SQLModel.metadata.create_all(engine)
    with Session(engine) as session:
        yield session

# Override dependency
@pytest.fixture
def client(test_db):
    def override_get_session():
        return test_db

    app.dependency_overrides[get_session] = override_get_session
    with TestClient(app) as test_client:
        yield test_client
    app.dependency_overrides.clear()

Writing Tests

Unit Tests

Example: tests/test_matcher.py

import pytest
from api.app.matcher import SemanticMatcher
from api.app.models import CanonicalValue, SystemConfig

def test_rank_with_embeddings():
    """Test TF-IDF embedding matching."""
    config = SystemConfig(
        matcher_backend="embedding",
        top_k=5
    )

    canonical_values = [
        CanonicalValue(
            id=1,
            dimension="marital_status",
            canonical_label="Single",
            description="Never married"
        ),
        CanonicalValue(
            id=2,
            dimension="marital_status",
            canonical_label="Married",
            description="Currently married"
        )
    ]

    matcher = SemanticMatcher(config, canonical_values)
    matches = matcher.rank("unmarried")

    assert len(matches) > 0
    assert matches[0].score > 0.5
    assert matches[0].canonical_id == 1

Integration Tests

Example: tests/test_api.py

def test_create_canonical_value(client):
    """Test creating a canonical value via API."""
    response = client.post(
        "/api/reference/canonical",
        json={
            "dimension": "marital_status",
            "canonical_label": "Divorced"
        }
    )

    assert response.status_code == 200
    data = response.json()
    assert data["dimension"] == "marital_status"
    assert data["canonical_label"] == "Divorced"
    assert "id" in data

def test_get_canonical_values(client, test_db):
    """Test retrieving canonical values."""
    # Seed test data
    from api.app.models import CanonicalValue
    test_db.add(CanonicalValue(
        dimension="education",
        canonical_label="Bachelor"
    ))
    test_db.commit()

    # Fetch from API
    response = client.get("/api/reference/canonical")

    assert response.status_code == 200
    data = response.json()
    assert len(data) >= 1
    assert any(v["canonical_label"] == "Bachelor" for v in data)

Test Markers

Use markers to categorize tests:

import pytest

@pytest.mark.unit
def test_matcher_ranking():
    pass

@pytest.mark.integration
def test_api_endpoint():
    pass

@pytest.mark.slow
def test_large_import():
    pass

Run specific markers:

pytest -m unit         # Only unit tests
pytest -m integration  # Only integration tests
pytest -m "not slow"   # Skip slow tests

Async Tests

import pytest

@pytest.mark.asyncio
async def test_async_operation():
    result = await some_async_function()
    assert result is not None
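
For async tests that exercise the API, httpx can drive the FastAPI app in-process. A sketch, assuming the dependency overrides from conftest.py (or another test database) are in place; ASGITransport is the current httpx spelling, while older httpx versions accept app=app on AsyncClient directly:

import pytest
import httpx

from api.app.main import app

@pytest.mark.asyncio
async def test_list_canonical_values_async():
    transport = httpx.ASGITransport(app=app)
    async with httpx.AsyncClient(transport=transport, base_url="http://test") as ac:
        # Endpoint taken from the API examples above
        response = await ac.get("/api/reference/canonical")
    assert response.status_code == 200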

Frontend Testing (Vitest)

Prerequisites

cd reviewer-ui

# Install dependencies
npm install

Running Tests

# Run all tests
npm test

# Run in watch mode
npm test -- --watch

# Run with coverage
npm test -- --coverage

# Run specific file
npm test -- CanonicalLibraryPage.test.tsx

# Run UI mode (helps debug)
npm test -- --ui

Test Structure

reviewer-ui/src/
├── components/
│   └── ui.test.tsx              # UI component tests
├── pages/
│   ├── CanonicalLibraryPage.test.tsx
│   ├── ConnectionsPage.test.tsx
│   ├── FieldMappingsPage.test.tsx
│   ├── MatchInsightsPage.test.tsx
│   └── ...
├── api.test.ts                  # API client tests
├── App.test.tsx                 # Main app tests
└── indexHtml.test.ts            # Index page tests

Writing Component Tests

Example: reviewer-ui/src/components/ui.test.tsx

import { describe, it, expect, vi } from 'vitest';
import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import { Form, Button } from './ui';

describe('UI Components', () => {
  describe('Button', () => {
    it('renders button with text', () => {
      render(<Button>Click me</Button>);
      expect(screen.getByText('Click me')).toBeInTheDocument();
    });

    it('calls onClick handler', async () => {
      const handleClick = vi.fn();
      render(<Button onClick={handleClick}>Click me</Button>);

      const button = screen.getByText('Click me');
      await userEvent.click(button);

      expect(handleClick).toHaveBeenCalledTimes(1);
    });
  });

  describe('Form', () => {
    it('renders form with fields', () => {
      render(
        <Form onSubmit={() => {}}>
          <Form.Group>
            <Form.Label>Name</Form.Label>
            <Form.Control type="text" name="name" />
          </Form.Group>
        </Form>
      );

      expect(screen.getByLabelText('Name')).toBeInTheDocument();
    });
  });
});

Writing Page Tests

Example: reviewer-ui/src/pages/CanonicalLibraryPage.test.tsx

import { describe, it, expect, beforeEach, vi } from 'vitest';
import { render, screen, waitFor } from '@testing-library/react';
import { CanonicalLibraryPage } from './CanonicalLibraryPage';
import * as api from '../api';

// Mock API calls
vi.mock('../api', () => ({
  fetchCanonicalValues: vi.fn(),
  createCanonicalValue: vi.fn(),
}));

describe('CanonicalLibraryPage', () => {
  beforeEach(() => {
    vi.clearAllMocks();
  });

  it('displays canonical values after loading', async () => {
    const mockValues = [
      { id: 1, dimension: 'status', canonical_label: 'Active' },
      { id: 2, dimension: 'status', canonical_label: 'Inactive' },
    ];

    vi.mocked(api.fetchCanonicalValues).mockResolvedValue(mockValues);

    render(<CanonicalLibraryPage />);

    await waitFor(() => {
      expect(screen.getByText('Active')).toBeInTheDocument();
      expect(screen.getByText('Inactive')).toBeInTheDocument();
    });
  });

  it('shows loading state initially', () => {
    vi.mocked(api.fetchCanonicalValues).mockReturnValue(new Promise(() => {}));

    render(<CanonicalLibraryPage />);

    expect(screen.getByText(/loading/i)).toBeInTheDocument();
  });

  it('handles error state', async () => {
    vi.mocked(api.fetchCanonicalValues).mockRejectedValue(
      new Error('Failed to fetch')
    );

    render(<CanonicalLibraryPage />);

    await waitFor(() => {
      expect(screen.getByText(/error/i)).toBeInTheDocument();
    });
  });
});

Writing API Client Tests

Example: reviewer-ui/src/api.test.ts

import { describe, it, expect, vi, beforeEach } from 'vitest';
import { fetchCanonicalValues, createCanonicalValue } from './api';

describe('API Client', () => {
  beforeEach(() => {
    global.fetch = vi.fn();
  });

  it('fetches canonical values', async () => {
    const mockResponse = [
      { id: 1, dimension: 'status', canonical_label: 'Active' },
    ];

    vi.mocked(fetch).mockResolvedValue({
      ok: true,
      json: async () => mockResponse,
    } as Response);

    const values = await fetchCanonicalValues();

    expect(values).toEqual(mockResponse);
    expect(fetch).toHaveBeenCalledWith(
      expect.stringContaining('/api/reference/canonical')
    );
  });

  it('creates canonical value', async () => {
    const new_value = {
      dimension: 'status',
      canonical_label: 'Pending',
    };

    vi.mocked(fetch).mockResolvedValue({
      ok: true,
      json: async () => ({ id: 2, ...new_value }),
    } as Response);

    const result = await createCanonicalValue(new_value);

    expect(result).toEqual({ id: 2, ...new_value });
    expect(fetch).toHaveBeenCalledWith(
      expect.stringContaining('/api/reference/canonical'),
      expect.objectContaining({
        method: 'POST',
        headers: expect.objectContaining({
          'Content-Type': 'application/json',
        }),
        body: expect.stringContaining(JSON.stringify(new_value)),
      })
    );
  });
});

Testing User Interactions

import { describe, it, expect, vi } from 'vitest';
import { render, screen, waitFor } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import { MyForm } from './MyForm';

describe('MyForm', () => {
  it('submits form with entered data', async () => {
    const handleSubmit = vi.fn();
    render(<MyForm onSubmit={handleSubmit} />);

    // Type into input
    const input = screen.getByLabelText('Name');
    await userEvent.type(input, 'John Doe');

    // Click submit
    const button = screen.getByText('Submit');
    await userEvent.click(button);

    // Verify submission
    await waitFor(() => {
      expect(handleSubmit).toHaveBeenCalledWith({ name: 'John Doe' });
    });
  });
});

Integration Tests

End-to-End Test Example

File: tests/test_e2e.py

def test_full_mapping_workflow(client, test_db):
    """Test complete workflow from connection to mapping."""

    # 1. Create dimension
    dimension_response = client.post(
        "/api/reference/dimensions",
        json={"code": "status", "label": "Status"}
    )
    assert dimension_response.status_code == 200

    # 2. Create canonical values
    client.post("/api/reference/canonical", json={
        "dimension": "status",
        "canonical_label": "Active"
    })
    client.post("/api/reference/canonical", json={
        "dimension": "status",
        "canonical_label": "Inactive"
    })

    # 3. Create source connection
    connection_response = client.post(
        "/api/source/connections",
        json={
            "name": "Test DB",
            "db_type": "postgres",
            "host": "localhost",
            "port": 5432,
            "database": "test",
            "username": "user",
            "password": "pass"
        }
    )
    assert connection_response.status_code == 200
    connection_id = connection_response.json()["id"]

    # 4. Create field mapping
    mapping_response = client.post(
        f"/api/source/connections/{connection_id}/mappings",
        json={
            "source_table": "customers",
            "source_field": "status_raw",
            "ref_dimension": "status"
        }
    )
    assert mapping_response.status_code == 200

    # 5. Ingest samples
    client.post(
        f"/api/source/connections/{connection_id}/samples",
        json={
            "source_table": "customers",
            "source_field": "status_raw",
            "values": [
                {"raw_value": "active", "count": 100},
                {"raw_value": "inactive", "count": 50}
            ]
        }
    )

    # 6. Get match statistics
    stats_response = client.get(
        f"/api/source/connections/{connection_id}/match-stats"
    )
    assert stats_response.status_code == 200
    stats = stats_response.json()

    assert len(stats) > 0
    assert stats[0]["match_rate"] > 0

Test Coverage

Backend Coverage

# Run tests with coverage
pytest --cov=app --cov-report=html

# View report
open htmlcov/index.html  # macOS
xdg-open htmlcov/index.html  # Linux

Target Coverage:

  • Unit tests: > 80%
  • Integration tests: > 60%
  • Overall: > 70%
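
To enforce the overall target in CI, pytest-cov can fail the run when coverage drops below a threshold:

# Fail the test run if total coverage falls below 70%
pytest --cov=app --cov-fail-under=70 --cov-report=xml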

Frontend Coverage

# Run tests with coverage
npm test -- --coverage

# View report
open coverage/index.html  # macOS
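
Coverage options live in the Vitest config. A minimal sketch, assuming a vitest.config.ts at the package root (exact keys vary slightly between Vitest versions):

// vitest.config.ts
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    coverage: {
      provider: 'v8',              // or 'istanbul'
      reporter: ['text', 'html'],  // terminal summary plus coverage/index.html
    },
  },
});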

Mocking and Stubbing

Backend Mocking

import pytest
from unittest.mock import patch, MagicMock

def test_with_mocked_llm():
    """Test matcher with mocked LLM."""
    with patch('api.app.matcher.openai.ChatCompletion.create') as mock_llm:
        mock_llm.return_value = {
            "choices": [{
                "message": {
                    "content": '[{"id": 1, "score": 0.9}]'
                }
            }]
        }

        # Run code that uses LLM
        result = some_function_that_uses_llm()

        # Verify LLM was called
        mock_llm.assert_called_once()
        assert result is not None
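
The same stub can be installed with pytest's built-in monkeypatch fixture instead of unittest.mock; the patch target below mirrors the example above and should point at wherever the matcher actually resolves its LLM client:

def test_with_monkeypatched_llm(monkeypatch):
    """Variant of the test above using monkeypatch."""
    def fake_create(*args, **kwargs):
        return {"choices": [{"message": {"content": '[{"id": 1, "score": 0.9}]'}}]}

    monkeypatch.setattr("api.app.matcher.openai.ChatCompletion.create", fake_create)

    result = some_function_that_uses_llm()
    assert result is not None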

Frontend Mocking

import { render, screen } from '@testing-library/react';
import { vi } from 'vitest';
import { MyComponent } from './MyComponent';

describe('MyComponent', () => {
  it('uses mocked API', () => {
    const mockFn = vi.fn();
    mockFn.mockResolvedValue({ data: 'test' });

    render(<MyComponent fetchData={mockFn} />);

    expect(mockFn).toHaveBeenCalled();
  });
});

Test Best Practices

Backend

  1. Isolate Tests: Each test should be independent
  2. Use Fixtures: Reuse test setup with pytest fixtures
  3. Mock External Dependencies: Don't make real API calls
  4. Test Edge Cases: Empty inputs, null values, errors
  5. Use Descriptive Names: test_create_canonical_value_success
  6. Arrange-Act-Assert Pattern:
    def test_example():
        # Arrange
        raw_value = "test"

        # Act
        result = process(raw_value)

        # Assert
        assert result == expected

Frontend

  1. Test User Behavior, Not Implementation: Focus on what users see
  2. Use waitFor for Async: Wait for elements to appear
  3. Mock API Calls: Don't make real network requests
  4. Test Accessibility: Prefer getByRole and getByLabelText queries (see the sketch after this list)
  5. Avoid Implementation Details: Don't test React internals
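
A short role-based query sketch; SearchForm is a hypothetical component used only to illustrate the queries:

import { it, expect } from 'vitest';
import { render, screen } from '@testing-library/react';
import { SearchForm } from './SearchForm';

it('is reachable through accessible roles and labels', () => {
  render(<SearchForm />);

  // Role- and label-based queries survive markup refactors better than test IDs.
  expect(screen.getByRole('button', { name: /search/i })).toBeInTheDocument();
  expect(screen.getByLabelText('Query')).toBeInTheDocument();
});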

Continuous Integration

GitHub Actions Example

File: .github/workflows/test.yml

name: Tests

on: [push, pull_request]

jobs:
  backend:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v3

    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: '3.10'

    - name: Install dependencies
      run: |
        cd api
        pip install -r requirements.txt
        pip install pytest pytest-cov

    - name: Run tests
      run: |
        cd api
        pytest --cov=app --cov-report=xml

    - name: Upload coverage
      uses: codecov/codecov-action@v3
      with:
        files: ./api/coverage.xml

  frontend:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v3

    - name: Setup Node.js
      uses: actions/setup-node@v3
      with:
        node-version: '18'

    - name: Install dependencies
      run: |
        cd reviewer-ui
        npm ci

    - name: Run tests
      run: |
        cd reviewer-ui
        npm test -- --coverage

    - name: Upload coverage
      uses: codecov/codecov-action@v3
      with:
        files: ./reviewer-ui/coverage/coverage-final.json

Troubleshooting

Tests Failing with Import Errors

Issue: ModuleNotFoundError: No module named 'api'

Solution:

# Add the project root to the Python path
export PYTHONPATH="${PYTHONPATH}:/path/to/refdata-hub"

Or configure it once in pytest.ini:

[pytest]
pythonpath = .

Frontend Tests Failing

Issue: Tests fail in CI but pass locally

Solutions:

  1. Check for timezone differences (use fake timers; see the sketch below)
  2. Ensure proper cleanup between tests
  3. Check for race conditions in async tests
  4. Use findBy* queries or waitFor for dynamic content instead of synchronous getBy* queries
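
A minimal fake-timer setup with Vitest, pinning the clock so date-dependent assertions behave the same locally and in CI (the timestamp is illustrative):

import { beforeEach, afterEach, vi } from 'vitest';

beforeEach(() => {
  vi.useFakeTimers();
  vi.setSystemTime(new Date('2024-01-01T00:00:00Z'));
});

afterEach(() => {
  vi.useRealTimers();
});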

Slow Tests

Solutions:

  1. Mark slow tests with @pytest.mark.slow
  2. Skip them in CI with pytest -m "not slow"
  3. Use an in-memory database instead of PostgreSQL
  4. Mock expensive operations (LLM calls)


Additional Resources