API reference
tasks
detail_detection
async
detail_detection(responses_df: DataFrame, llm: LLM, question: str, batch_size: int = 20, prompt_template: str = DETAIL_DETECTION, system_prompt: str = CONSULTATION_SYSTEM_PROMPT, concurrency: int = 10) -> tuple[pd.DataFrame, pd.DataFrame]
Identify responses that provide high-value detailed evidence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| responses_df | DataFrame | DataFrame containing survey responses to analyze. | required |
| llm | LLM | LLM instance to use for detail detection. | required |
| question | str | The survey question. | required |
| batch_size | int | Number of responses to process in each batch. | 20 |
| prompt_template | str | Prompt template string. | DETAIL_DETECTION |
| system_prompt | str | System prompt to guide the LLM's behavior. | CONSULTATION_SYSTEM_PROMPT |
| concurrency | int | Number of concurrent API calls to make. | 10 |
Returns:
| Type | Description |
|---|---|
| tuple[DataFrame, DataFrame] | Tuple of (processed results, unprocessable rows). |
Source code in src/themefinder/tasks.py
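The `batch_size` and `concurrency` parameters work together: responses are grouped into batches, and a limited number of batch calls run at the same time. The sketch below illustrates that general pattern with `asyncio.Semaphore`. It is not ThemeFinder's implementation; `make_batches`, `process_batch`, and `run_batches` are hypothetical names, and `process_batch` merely stands in for a real LLM call.

```python
import asyncio


def make_batches(items: list, batch_size: int) -> list[list]:
    """Split items into consecutive chunks of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


async def process_batch(batch: list[str]) -> list[dict]:
    """Hypothetical stand-in for one LLM call over a batch of responses."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return [{"response": text, "length": len(text)} for text in batch]


async def run_batches(
    items: list[str], batch_size: int = 20, concurrency: int = 10
) -> list[dict]:
    """Process every batch with at most `concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(batch: list[str]) -> list[dict]:
        async with semaphore:
            return await process_batch(batch)

    # gather preserves input order, so results line up with the batches.
    batch_results = await asyncio.gather(
        *(guarded(batch) for batch in make_batches(items, batch_size))
    )
    return [row for batch in batch_results for row in batch]


responses = [f"response {i}" for i in range(45)]
results = asyncio.run(run_batches(responses, batch_size=20, concurrency=2))
```

With 45 responses and `batch_size=20`, this sketch produces three batches of 20, 20, and 5 items. A smaller `concurrency` lowers the number of simultaneous calls, not the number of batches.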
find_themes
async
find_themes(responses_df: DataFrame, llm: LLM, question: str, system_prompt: str = CONSULTATION_SYSTEM_PROMPT, verbose: bool = True, concurrency: int = 10) -> dict[str, str | pd.DataFrame]
Process survey responses through a multi-stage theme analysis pipeline.
This pipeline performs sequential analysis steps:
1. Initial theme generation
2. Theme condensation (combining similar themes)
3. Theme refinement
4. Mapping responses to refined themes
5. Detail detection
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| responses_df | DataFrame | DataFrame containing survey responses. | required |
| llm | LLM | LLM instance for text analysis. | required |
| question | str | The survey question. | required |
| system_prompt | str | System prompt to guide the LLM's behaviour. | CONSULTATION_SYSTEM_PROMPT |
| verbose | bool | Whether to show information messages during processing. | True |
| concurrency | int | Number of concurrent API calls to make. | 10 |
Returns:
| Type | Description |
|---|---|
| dict[str, str \| DataFrame] | Dictionary containing results from each pipeline stage: question (the survey question string); themes (DataFrame with the final themes output); mapping (DataFrame mapping responses to final themes); detailed_responses (DataFrame with detail detection results); unprocessables (DataFrame containing inputs that could not be processed). |
Source code in src/themefinder/tasks.py
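The five steps and the shape of the returned dictionary can be sketched with stub stages. This is an illustration only, not ThemeFinder's code: `run_stage` and `pipeline` are hypothetical, plain lists of dicts stand in for DataFrames, and the choice of which stage's unprocessable rows end up under `unprocessables` is an assumption.

```python
import asyncio

Rows = list[dict]
calls: list[str] = []  # records the order in which the stages run


async def run_stage(name: str, rows: Rows) -> tuple[Rows, Rows]:
    """Hypothetical stage returning (processed rows, unprocessable rows)."""
    calls.append(name)
    processed = [r for r in rows if r["text"].strip()]
    unprocessable = [r for r in rows if not r["text"].strip()]
    return processed, unprocessable


async def pipeline(responses: Rows, question: str) -> dict:
    # Steps 1-3: each stage consumes the previous stage's processed output.
    themes, _ = await run_stage("theme_generation", responses)
    themes, _ = await run_stage("theme_condensation", themes)
    themes, _ = await run_stage("theme_refinement", themes)
    # Steps 4-5: both take the original responses as input.
    mapping, unprocessables = await run_stage("theme_mapping", responses)
    detailed, _ = await run_stage("detail_detection", responses)
    return {
        "question": question,
        "themes": themes,
        "mapping": mapping,
        "detailed_responses": detailed,
        "unprocessables": unprocessables,  # assumption: taken from mapping
    }


responses = [{"text": "More buses"}, {"text": "   "}, {"text": "Cheaper fares"}]
result = asyncio.run(pipeline(responses, "How should transport improve?"))
```

Because `find_themes` is a coroutine, a real call likewise has to be awaited or driven by `asyncio.run`.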
theme_clustering
async
theme_clustering(themes_df: DataFrame, llm: LLM, max_iterations: int = 5, target_themes: int = 10, significance_percentage: float = 10.0, return_all_themes: bool = False, system_prompt: str = CONSULTATION_SYSTEM_PROMPT) -> tuple[pd.DataFrame, pd.DataFrame]
Perform hierarchical clustering of themes using an agentic approach.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| themes_df | DataFrame | DataFrame containing themes. | required |
| llm | LLM | LLM instance for clustering. | required |
| max_iterations | int | Maximum number of clustering iterations. | 5 |
| target_themes | int | Target number of themes to cluster down to. | 10 |
| significance_percentage | float | Percentage threshold for selecting significant themes. | 10.0 |
| return_all_themes | bool | If True, returns all clustered themes. | False |
| system_prompt | str | System prompt to guide the LLM's behavior. | CONSULTATION_SYSTEM_PROMPT |
Returns:
| Type | Description |
|---|---|
| tuple[DataFrame, DataFrame] | Tuple of (clustered themes DataFrame, empty DataFrame). |
Source code in src/themefinder/tasks.py
theme_condensation
async
theme_condensation(themes_df: DataFrame, llm: LLM, question: str, batch_size: int = 75, prompt_template: str = THEME_CONDENSATION, system_prompt: str = CONSULTATION_SYSTEM_PROMPT, concurrency: int = 10, **kwargs) -> tuple[pd.DataFrame, pd.DataFrame]
Condense and combine similar themes identified from survey responses.
When the theme count exceeds the batch size, a first pass condenses within each batch independently, then a second pass merges across batches.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| themes_df | DataFrame | DataFrame containing the initial themes. | required |
| llm | LLM | LLM instance to use for theme condensation. | required |
| question | str | The survey question. | required |
| batch_size | int | Number of themes to process in each batch. | 75 |
| prompt_template | str | Prompt template string. | THEME_CONDENSATION |
| system_prompt | str | System prompt to guide the LLM's behavior. | CONSULTATION_SYSTEM_PROMPT |
| concurrency | int | Number of concurrent API calls to make. | 10 |
Returns:
| Type | Description |
|---|---|
| tuple[DataFrame, DataFrame] | Tuple of (processed results, unprocessable rows). |
Source code in src/themefinder/tasks.py
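The two-pass behavior described for `theme_condensation` (condense within each batch, then merge across batches) can be sketched as follows. This is a simplified illustration, not the library's implementation: `condense` is a hypothetical stand-in for the LLM call that only merges case-insensitive duplicates, and running the second pass as a single call is an assumption.

```python
def make_batches(items: list[str], batch_size: int) -> list[list[str]]:
    """Split items into consecutive chunks of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def condense(themes: list[str]) -> list[str]:
    """Hypothetical stand-in for the LLM: merge case-insensitive duplicates."""
    seen: dict[str, str] = {}
    for theme in themes:
        # Keep the first spelling seen for each normalised theme.
        seen.setdefault(theme.strip().lower(), theme.strip())
    return list(seen.values())


def two_pass_condense(themes: list[str], batch_size: int = 75) -> list[str]:
    if len(themes) <= batch_size:
        return condense(themes)  # everything fits in one batch
    # Pass 1: condense within each batch independently.
    per_batch = [condense(batch) for batch in make_batches(themes, batch_size)]
    # Pass 2: merge the per-batch results across batches.
    merged = [theme for batch in per_batch for theme in batch]
    return condense(merged)


themes = ["Cost", "cost", "Safety", "Access", "safety", "COST", "Access "]
result = two_pass_condense(themes, batch_size=3)
```

The second pass matters because duplicates that fall into different batches ("Cost" and "COST" here) cannot be merged in the first pass.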
theme_generation
async
theme_generation(responses_df: DataFrame, llm: LLM, question: str, batch_size: int = 50, partition_key: str | None = None, prompt_template: str = THEME_GENERATION, system_prompt: str = CONSULTATION_SYSTEM_PROMPT, concurrency: int = 10) -> tuple[pd.DataFrame, pd.DataFrame]
Generate themes from survey responses using an LLM.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| responses_df | DataFrame | DataFrame containing survey responses. | required |
| llm | LLM | LLM instance to use for theme generation. | required |
| question | str | The survey question. | required |
| batch_size | int | Number of responses to process in each batch. | 50 |
| partition_key | str \| None | Column name to use for batching related responses together. | None |
| prompt_template | str | Prompt template string. | THEME_GENERATION |
| system_prompt | str | System prompt to guide the LLM's behavior. | CONSULTATION_SYSTEM_PROMPT |
| concurrency | int | Number of concurrent API calls to make. | 10 |
Returns:
| Type | Description |
|---|---|
| tuple[DataFrame, DataFrame] | Tuple of (processed results, unprocessable rows). |
Source code in src/themefinder/tasks.py
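One plausible reading of `partition_key` is that rows sharing the same value in that column are kept together, so a batch never mixes partitions. The sketch below illustrates that reading only; how ThemeFinder actually batches is not stated here. `partitioned_batches` and the `region` column are hypothetical, and plain dicts stand in for DataFrame rows.

```python
from itertools import groupby
from typing import Optional


def partitioned_batches(
    rows: list[dict], batch_size: int, partition_key: Optional[str] = None
) -> list[list[dict]]:
    """Batch rows, never mixing partition values within a batch (assumption)."""
    if partition_key is None:
        groups = [rows]
    else:
        # sorted() is stable, so rows keep their original order per partition.
        ordered = sorted(rows, key=lambda r: r[partition_key])
        groups = [
            list(group)
            for _, group in groupby(ordered, key=lambda r: r[partition_key])
        ]
    batches = []
    for group in groups:
        for i in range(0, len(group), batch_size):
            batches.append(group[i:i + batch_size])
    return batches


rows = [
    {"response_id": 1, "region": "north"},
    {"response_id": 2, "region": "south"},
    {"response_id": 3, "region": "north"},
    {"response_id": 4, "region": "north"},
    {"response_id": 5, "region": "south"},
]
batches = partitioned_batches(rows, batch_size=2, partition_key="region")
ids = [[r["response_id"] for r in batch] for batch in batches]
```

Under this reading, partitioning can produce more, smaller batches than plain chunking, since a partition's final batch may be only partly filled.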
theme_mapping
async
theme_mapping(responses_df: DataFrame, llm: LLM, question: str, refined_themes_df: DataFrame, batch_size: int = 20, prompt_template: str = THEME_MAPPING, system_prompt: str = CONSULTATION_SYSTEM_PROMPT, concurrency: int = 10) -> tuple[pd.DataFrame, pd.DataFrame]
Map survey responses to refined themes using an LLM.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| responses_df | DataFrame | DataFrame containing survey responses. | required |
| llm | LLM | LLM instance to use for theme mapping. | required |
| question | str | The survey question. | required |
| refined_themes_df | DataFrame | DataFrame of refined themes. | required |
| batch_size | int | Number of responses to process in each batch. | 20 |
| prompt_template | str | Prompt template string. | THEME_MAPPING |
| system_prompt | str | System prompt to guide the LLM's behavior. | CONSULTATION_SYSTEM_PROMPT |
| concurrency | int | Number of concurrent API calls to make. | 10 |
Returns:
| Type | Description |
|---|---|
| tuple[DataFrame, DataFrame] | Tuple of (processed results, unprocessable rows). |
Source code in src/themefinder/tasks.py
theme_refinement
async
theme_refinement(condensed_themes_df: DataFrame, llm: LLM, question: str, batch_size: int = 10000, prompt_template: str = THEME_REFINEMENT, system_prompt: str = CONSULTATION_SYSTEM_PROMPT, concurrency: int = 10) -> tuple[pd.DataFrame, pd.DataFrame]
Refine and standardise condensed themes using an LLM.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| condensed_themes_df | DataFrame | DataFrame containing the condensed themes. | required |
| llm | LLM | LLM instance to use for theme refinement. | required |
| question | str | The survey question. | required |
| batch_size | int | Number of themes to process in each batch. | 10000 |
| prompt_template | str | Prompt template string. | THEME_REFINEMENT |
| system_prompt | str | System prompt to guide the LLM's behavior. | CONSULTATION_SYSTEM_PROMPT |
| concurrency | int | Number of concurrent API calls to make. | 10 |
Returns:
| Type | Description |
|---|---|
| tuple[DataFrame, DataFrame] | Tuple of (processed results, unprocessable rows). |