Implementation Details
uwotm8.convert.convert_american_to_british_spelling
convert_american_to_british_spelling(text, strict=False)
Convert American English spelling to British English spelling.
PARAMETER | DESCRIPTION |
text | The text to convert. TYPE: str |
strict | Whether to raise an exception if a word cannot be converted. TYPE: bool DEFAULT: False |
RETURNS | DESCRIPTION |
Any | The text with American English spelling converted to British English spelling. |
Source code in uwotm8/convert.py
def convert_american_to_british_spelling(  # noqa: C901
    text: str, strict: bool = False
) -> Any:
    """
    Convert American English spelling to British English spelling.

    Args:
        text: The text to convert.
        strict: Whether to raise an exception if a word cannot be converted.

    Returns:
        The text with American English spelling converted to British English spelling.
    """
    if not text.strip():
        return text
    try:

        def should_skip_word(word: str, pre: str, post: str, match_start: int, match_end: int) -> bool:
            """Check if the word should be skipped for conversion."""
            # Skip if within code blocks
            if "`" in pre or "`" in post:
                return True
            # Skip if word is in the ignore_list
            if word.lower() in CONVERSION_IGNORE_LIST:
                return True
            # Check for hyphenated terms (e.g., "3-color", "x-coordinate")
            # If the word is part of a hyphenated term, we should skip it
            if "-" in pre and pre.rstrip().endswith("-"):
                return True
            # Check for URL/URI context
            line_start = text.rfind("\n", 0, match_start)
            if line_start == -1:
                line_start = 0
            else:
                line_start += 1
            line_end = text.find("\n", match_end)
            if line_end == -1:
                line_end = len(text)
            line_context = text[line_start:line_end]
            # Skip if word appears to be in a URL/URI
            return "://" in line_context or "www." in line_context

        def preserve_capitalization(original: str, replacement: str) -> str:
            """Preserve the capitalization from the original word in the replacement."""
            if original.isupper():
                return replacement.upper()
            elif original.istitle():
                return replacement.title()
            return replacement

        def replace_word(match: re.Match) -> Any:
            """
            Replace a word with its British English spelling.

            Args:
                match: The match object.

            Returns:
                The word with its spelling converted to British English.
            """
            # The first group contains any leading punctuation/spaces
            # The second group contains the word
            # The third group contains any trailing punctuation/spaces
            pre, word, post = match.groups()
            if should_skip_word(word, pre, post, match.start(), match.end()):
                return match.group(0)
            if american_spelling_exists(word.lower()):
                try:
                    british = get_british_spelling(word.lower())
                    british = preserve_capitalization(word, british)
                    return pre + british + post
                except Exception:
                    if strict:
                        raise
            return match.group(0)

        # Match any word surrounded by non-letter characters
        # Group 1: Leading non-letters (including empty)
        # Group 2: The word itself (only letters)
        # Group 3: Trailing non-letters (including empty)
        pattern = r"([^a-zA-Z]*?)([a-zA-Z]+)([^a-zA-Z]*?)"
        return re.sub(pattern, replace_word, text)
    except Exception:
        if strict:
            raise
        return text
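To make the three capture groups concrete, the sketch below applies the same regular expression to a short sample and lists what each match captures. The helper name `split_groups` and the sample sentence are illustrative and not part of uwotm8.

```python
import re

# Same pattern as in the source listing.
PATTERN = r"([^a-zA-Z]*?)([a-zA-Z]+)([^a-zA-Z]*?)"


def split_groups(text: str) -> list[tuple[str, str, str]]:
    """Return the (pre, word, post) groups for every match in text."""
    return [m.groups() for m in re.finditer(PATTERN, text)]


groups = split_groups("The 3-color map.")
for pre, word, post in groups:
    print(repr(pre), repr(word), repr(post))
```

Because the trailing group is lazy and ends the pattern, it matches the empty string in this sample; the characters between two words are captured as the leading group of the following match, and the final "." is not matched at all, so `re.sub` leaves it in place.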
Word Context Detection
The convert_american_to_british_spelling function includes special handling for various text contexts:
Hyphenated Terms
Words that immediately follow a hyphen in a hyphenated term are preserved in their original form. For example:
- "3-color" remains "3-color" (not converted to "3-colour")
- "x-coordinate" remains "x-coordinate"
- "multi-colored" remains "multi-colored" (not converted to "multi-coloured")
This is useful for preserving technical terminology and compound adjectives where conversion might be inappropriate.
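As a rough illustration of the hyphen test, the sketch below reuses the pattern and the condition from the source listing to report which words in a sample would be skipped by that test alone. The names `follows_hyphen` and `hyphen_skipped_words` are hypothetical helpers written for this example.

```python
import re

# Same pattern as in the source listing.
PATTERN = r"([^a-zA-Z]*?)([a-zA-Z]+)([^a-zA-Z]*?)"


def follows_hyphen(pre: str) -> bool:
    """Mirror of the hyphen condition in should_skip_word."""
    return "-" in pre and pre.rstrip().endswith("-")


def hyphen_skipped_words(text: str) -> list[str]:
    """List the words whose leading non-letter run ends with a hyphen."""
    return [
        m.group(2)
        for m in re.finditer(PATTERN, text)
        if follows_hyphen(m.group(1))
    ]


skipped = hyphen_skipped_words("a 3-color, multi-colored, color-coded chart")
print(skipped)
```

In this sample the test flags "color" (after "3-"), "colored", and "coded". The "color" in "color-coded" is not flagged, since the condition only inspects the characters before a word.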
Code Blocks
Words in inline code (surrounded by backticks) are not converted, preserving code syntax and variable names. The check looks for a backtick in the non-letter characters adjacent to the word.
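The backtick test can be sketched in isolation as follows. It reuses the pattern and condition from the source listing; `near_backtick` and `backtick_skipped_words` are hypothetical names used only for this example.

```python
import re

# Same pattern as in the source listing.
PATTERN = r"([^a-zA-Z]*?)([a-zA-Z]+)([^a-zA-Z]*?)"


def near_backtick(pre: str, post: str) -> bool:
    """Mirror of the backtick condition in should_skip_word."""
    return "`" in pre or "`" in post


def backtick_skipped_words(text: str) -> list[str]:
    """List the words that have a backtick in an adjacent non-letter run."""
    return [
        m.group(2)
        for m in re.finditer(PATTERN, text)
        if near_backtick(m.group(1), m.group(3))
    ]


flagged = backtick_skipped_words("Set `color` to red")
print(flagged)
```

The condition inspects only the captured non-letter characters and does not track opening and closing backticks, so in this sample the word directly after the closing backtick ("to") is flagged along with "color".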
URLs and URIs
Words that appear in lines containing URLs or URIs (identified by "://" or "www.") are not converted to avoid breaking links.
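A minimal sketch of the line-level URL test, following the logic in the source listing: find the line that contains the word, then look for "://" or "www." anywhere on it. The function name `in_url_line` and the sample text are assumptions made for illustration; here the positions passed in are those of the word itself.

```python
def in_url_line(text: str, start: int, end: int) -> bool:
    """Return True if the line containing text[start:end] looks like it holds a URL."""
    # Find the start of the current line (character after the previous newline).
    line_start = text.rfind("\n", 0, start)
    line_start = 0 if line_start == -1 else line_start + 1
    # Find the end of the current line (next newline, or end of text).
    line_end = text.find("\n", end)
    if line_end == -1:
        line_end = len(text)
    line = text[line_start:line_end]
    return "://" in line or "www." in line


sample = "See https://example.com/color for details\nThe color is red"
first = sample.index("color")
second = sample.rindex("color")
print(in_url_line(sample, first, first + 5))
print(in_url_line(sample, second, second + 5))
```

Note that the test applies to the whole line, so every word on a line that contains a URL is left unconverted, not only the words inside the URL.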
Conversion Ignore List
The module maintains an ignore list (CONVERSION_IGNORE_LIST) of words that are never converted, including technical terms whose preferred spelling depends on context:
- "program" vs "programme" (in computing contexts)
- "disk" vs "disc" (in computing contexts)
- "analog" vs "analogue" (in technical contexts)
- And others
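The lookup itself is a case-insensitive membership test, as in the source listing. The sketch below uses a small stand-in set built from the three examples above; the real CONVERSION_IGNORE_LIST in uwotm8 may contain different or additional entries.

```python
# Hypothetical stand-in for CONVERSION_IGNORE_LIST; the actual contents may differ.
IGNORE_LIST = {"program", "disk", "analog"}


def is_ignored(word: str) -> bool:
    """Return True if the word, compared in lowercase, is on the ignore list."""
    return word.lower() in IGNORE_LIST


print(is_ignored("Program"))
print(is_ignored("color"))
```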
Capitalization Preservation
The function preserves the capitalization pattern of the original word:
- ALL CAPS words remain ALL CAPS
- Title Case words remain Title Case
- lowercase words remain lowercase
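The three cases correspond to the `preserve_capitalization` helper in the source listing, reproduced here as a standalone function so it can be tried directly. The sample word pairs are illustrative.

```python
def preserve_capitalization(original: str, replacement: str) -> str:
    """Apply the capitalization pattern of original to replacement."""
    if original.isupper():
        return replacement.upper()
    elif original.istitle():
        return replacement.title()
    return replacement


results = [
    preserve_capitalization("COLOR", "colour"),
    preserve_capitalization("Color", "colour"),
    preserve_capitalization("color", "colour"),
]
print(results)
```

Any other pattern, such as mixed case in the middle of a word, falls through to the last branch and receives the replacement unchanged.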