Implementation Details

uwotm8.convert.convert_american_to_british_spelling

convert_american_to_british_spelling(text, strict=False)

Convert American English spelling to British English spelling.

PARAMETER	DESCRIPTION
`text`	The text to convert. TYPE: `str`
`strict`	Whether to raise an exception if a word cannot be converted. TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
`Any`	The text with American English spelling converted to British English spelling.

Source code in uwotm8/convert.py

def convert_american_to_british_spelling(  # noqa: C901
    text: str, strict: bool = False
) -> Any:
    """
    Convert American English spelling to British English spelling.

    Args:
        text: The text to convert.
        strict: Whether to raise an exception if a word cannot be converted.

    Returns:
        The text with American English spelling converted to British English spelling.
    """
    if not text.strip():
        return text
    try:

        def should_skip_word(word: str, pre: str, post: str, match_start: int, match_end: int) -> bool:
            """Check if the word should be skipped for conversion."""
            # Skip if within code blocks
            if "`" in pre or "`" in post:
                return True

            # Skip if word is in the ignore_list
            if word.lower() in CONVERSION_IGNORE_LIST:
                return True

            # Check for hyphenated terms (e.g., "3-color", "x-coordinate")
            # If the word is part of a hyphenated term, we should skip it
            if "-" in pre and pre.rstrip().endswith("-"):
                return True

            # Check for URL/URI context
            line_start = text.rfind("\n", 0, match_start)
            if line_start == -1:
                line_start = 0
            else:
                line_start += 1

            line_end = text.find("\n", match_end)
            if line_end == -1:
                line_end = len(text)

            line_context = text[line_start:line_end]

            # Skip if word appears to be in a URL/URI
            return "://" in line_context or "www." in line_context

        def preserve_capitalization(original: str, replacement: str) -> str:
            """Preserve the capitalization from the original word in the replacement."""
            if original.isupper():
                return replacement.upper()
            elif original.istitle():
                return replacement.title()
            return replacement

        def replace_word(match: re.Match) -> Any:
            """
            Replace a word with its British English spelling.

            Args:
                match: The match object.

            Returns:
                The word with its spelling converted to British English.
            """
            # The first group contains any leading punctuation/spaces
            # The second group contains the word
            # The third group contains any trailing punctuation/spaces
            pre, word, post = match.groups()

            if should_skip_word(word, pre, post, match.start(), match.end()):
                return match.group(0)

            if american_spelling_exists(word.lower()):
                try:
                    british = get_british_spelling(word.lower())
                    british = preserve_capitalization(word, british)
                    return pre + british + post
                except Exception:
                    if strict:
                        raise
            return match.group(0)

        # Match any word surrounded by non-letter characters
        # Group 1: Leading non-letters (including empty)
        # Group 2: The word itself (only letters)
        # Group 3: Trailing non-letters (including empty)
        pattern = r"([^a-zA-Z]*?)([a-zA-Z]+)([^a-zA-Z]*?)"
        return re.sub(pattern, replace_word, text)
    except Exception:
        if strict:
            raise
        return text
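
To see what `replace_word` receives, the pattern from the listing above can be run on its own. This is a minimal sketch: `show_groups` is a hypothetical helper written for illustration and is not part of uwotm8.

```python
import re

# Same pattern string as in the source listing.
PATTERN = r"([^a-zA-Z]*?)([a-zA-Z]+)([^a-zA-Z]*?)"


def show_groups(text: str) -> list[tuple[str, str, str]]:
    """Return the (pre, word, post) groups for every match in text."""
    return [m.groups() for m in re.finditer(PATTERN, text)]


print(show_groups("3-color `flavor`"))
# → [('3-', 'color', ''), (' `', 'flavor', '')]
```

In this run the third group is empty for every match, because a lazy quantifier at the end of a pattern matches as little as possible. The context checks therefore rely mostly on `pre`, the non-letter characters that precede each word.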

Word Context Detection

The convert_american_to_british_spelling function includes special handling for various text contexts:

Hyphenated Terms

Words that are part of hyphenated terms are preserved in their original form. For example:

"3-color" remains "3-color" (not converted to "3-colour")
"x-coordinate" remains "x-coordinate" (the part after the hyphen is left untouched)
"multi-colored" remains "multi-colored" (not converted to "multi-coloured")

This is useful for preserving technical terminology and compound adjectives where conversion might be inappropriate.
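
The hyphen check from `should_skip_word` can be isolated as a small function. This is a sketch for illustration: `follows_hyphen` is a hypothetical name, and `pre` stands for the non-letter characters captured immediately before the word.

```python
def follows_hyphen(pre: str) -> bool:
    """Return True when the non-letter text before a word ends with a hyphen."""
    return "-" in pre and pre.rstrip().endswith("-")


print(follows_hyphen("3-"))  # True: the "color" in "3-color"
print(follows_hyphen(" "))   # False: an ordinary space-separated word
print(follows_hyphen(""))    # False: a word at the very start of the text
```

Note that the check only looks backwards, so it applies to the word that follows a hyphen. The word before the hyphen goes through the normal conversion path.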

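Code Blocks

Words next to a backtick are not converted, which preserves inline code and variable names. As implemented in should_skip_word, the check looks for a backtick among the non-letter characters immediately surrounding the word, so it is a heuristic rather than a full code-block parser.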
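
The following sketch applies the same backtick test to each match of the pattern from the source listing, to show which words it would skip. `words_skipped_for_backticks` is a hypothetical helper, not a function provided by uwotm8.

```python
import re

# Same pattern string as in the source listing.
PATTERN = r"([^a-zA-Z]*?)([a-zA-Z]+)([^a-zA-Z]*?)"


def words_skipped_for_backticks(text: str) -> list[str]:
    """List the words whose surrounding non-letters contain a backtick."""
    return [
        m.group(2)
        for m in re.finditer(PATTERN, text)
        if "`" in m.group(1) or "`" in m.group(3)
    ]


print(words_skipped_for_backticks("Set `color` to the favorite color"))
# → ['color', 'to']
```

The word after the closing backtick ("to") is skipped as well, because the closing backtick falls into its leading group. The final "color" is not adjacent to a backtick and remains eligible for conversion.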

URLs and URIs

Words that appear in lines containing URLs or URIs (identified by "://" or "www.") are not converted to avoid breaking links.
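
The line-based URL check can be sketched as a standalone function that mirrors the logic in the source listing. `line_has_url` is a hypothetical name; `match_start` and `match_end` are the offsets of the word within `text`.

```python
def line_has_url(text: str, match_start: int, match_end: int) -> bool:
    """Return True if the line containing the match looks like it holds a URL."""
    # rfind returns -1 when there is no earlier newline, so +1 gives offset 0.
    line_start = text.rfind("\n", 0, match_start) + 1
    line_end = text.find("\n", match_end)
    if line_end == -1:
        line_end = len(text)
    line = text[line_start:line_end]
    return "://" in line or "www." in line


text = "See https://example.com/color\nMy favorite color"
first = text.index("color")
last = text.rindex("color")
print(line_has_url(text, first, first + 5))  # True: same line as the URL
print(line_has_url(text, last, last + 5))    # False: the second line has no URL
```

Because the test applies to the whole line, every word on a line that contains a URL is left unchanged, not only the words inside the URL itself.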

Conversion Ignore List

The function consults an ignore list, CONVERSION_IGNORE_LIST, of words that are never converted. The lookup is case-insensitive. The list includes technical terms whose spelling carries a different meaning depending on context:

"program" vs "programme" (in computing contexts)
"disk" vs "disc" (in computing contexts)
"analog" vs "analogue" (in technical contexts)
And others
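
The lookup itself is a case-insensitive membership test. In the sketch below, `IGNORE` is a hypothetical stand-in containing only the three examples listed above; the actual contents and type of `CONVERSION_IGNORE_LIST` are defined in uwotm8 and are not shown here.

```python
# Hypothetical stand-in for CONVERSION_IGNORE_LIST (assumed contents).
IGNORE = {"program", "disk", "analog"}


def is_ignored(word: str) -> bool:
    """Return True when the lowercased word is on the ignore list."""
    return word.lower() in IGNORE


print(is_ignored("Program"))  # True: matched after lowercasing
print(is_ignored("color"))    # False: not on this stand-in list
```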

Capitalization Preservation

The function preserves the capitalization pattern of the original word:

ALL CAPS words remain ALL CAPS
Title Case words remain Title Case
lowercase words remain lowercase
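
The three cases can be checked with a standalone copy of `preserve_capitalization` from the source listing. The word pairs passed in are illustrative inputs; in the library the replacement comes from `get_british_spelling`.

```python
def preserve_capitalization(original: str, replacement: str) -> str:
    """Apply the capitalization pattern of original to replacement."""
    if original.isupper():
        return replacement.upper()
    if original.istitle():
        return replacement.title()
    return replacement


print(preserve_capitalization("COLOR", "colour"))  # → COLOUR
print(preserve_capitalization("Color", "colour"))  # → Colour
print(preserve_capitalization("color", "colour"))  # → colour
print(preserve_capitalization("cOlor", "colour"))  # → colour
```

The last call shows the fallback: a word with mixed capitalization matches neither test, so the replacement is returned as given. Since the lookup in `replace_word` uses `word.lower()`, that replacement is presumably lowercase.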