API reference
find_themes
async
find_themes(responses_df: pd.DataFrame, llm: Runnable, question: str, system_prompt: str = CONSULTATION_SYSTEM_PROMPT) -> dict[str, pd.DataFrame]
Process survey responses through a multi-stage theme analysis pipeline.
This pipeline performs sequential analysis steps:
1. Sentiment analysis of responses
2. Initial theme generation
3. Theme condensation (combining similar themes)
4. Theme refinement
5. Mapping responses to refined themes
Parameters:
Name | Type | Description | Default |
---|---|---|---|
responses_df | DataFrame | DataFrame containing survey responses. | required |
llm | Runnable | Language model instance for text analysis. | required |
question | str | The survey question. | required |
system_prompt | str | System prompt to guide the LLM's behavior. Defaults to CONSULTATION_SYSTEM_PROMPT. | CONSULTATION_SYSTEM_PROMPT |
Returns:
Type | Description |
---|---|
dict[str, DataFrame] | Dictionary containing results from each pipeline stage: question (the survey question), sentiment (DataFrame with sentiment analysis results), topics (DataFrame with initial generated themes), condensed_topics (DataFrame with combined similar themes), refined_topics (DataFrame with refined theme definitions), mapping (DataFrame mapping responses to final themes). |
Source code in src/themefinder/core.py
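The stage ordering described above can be sketched without the library. This is a minimal illustration, not the code in core.py: the `fake_*` coroutines are hypothetical stand-ins for the LLM-backed stages, plain lists of dicts stand in for DataFrames, and the sketch assumes that each stage consumes the previous stage's output while mapping reuses the sentiment-enriched responses.

```python
import asyncio


async def run_pipeline(responses, question, stages):
    """Run five stages in order and collect each output under its result key."""
    sentiment = await stages["sentiment"](responses, question)
    topics = await stages["topics"](sentiment, question)
    condensed = await stages["condensed_topics"](topics, question)
    refined = await stages["refined_topics"](condensed, question)
    # Mapping needs both the responses and the refined themes.
    mapping = await stages["mapping"](sentiment, question, refined)
    return {
        "question": question,
        "sentiment": sentiment,
        "topics": topics,
        "condensed_topics": condensed,
        "refined_topics": refined,
        "mapping": mapping,
    }


# Hypothetical stand-ins: the real stages call an LLM instead.
async def fake_sentiment(rows, question):
    return [{**row, "position": "agreement"} for row in rows]


async def fake_generation(rows, question):
    return [{"topic": row["response"].lower()} for row in rows]


async def fake_condensation(themes, question):
    unique = sorted({theme["topic"] for theme in themes})
    return [{"topic": topic} for topic in unique]


async def fake_refinement(themes, question):
    # One entry per theme, keyed by a letter id (a "single row").
    return {chr(ord("A") + i): theme["topic"] for i, theme in enumerate(themes)}


async def fake_mapping(rows, question, refined):
    lookup = {topic: topic_id for topic_id, topic in refined.items()}
    return [
        {"response_id": row["response_id"], "labels": [lookup[row["response"].lower()]]}
        for row in rows
    ]


responses = [
    {"response_id": 1, "response": "More buses"},
    {"response_id": 2, "response": "more buses"},
    {"response_id": 3, "response": "Cheaper fares"},
]
stages = {
    "sentiment": fake_sentiment,
    "topics": fake_generation,
    "condensed_topics": fake_condensation,
    "refined_topics": fake_refinement,
    "mapping": fake_mapping,
}
results = asyncio.run(run_pipeline(responses, "How can transport improve?", stages))
print(list(results))
```

With the real library, the equivalent call per the signature above would be `await find_themes(responses_df, llm, question)`, with the results read from the returned dictionary by the same keys.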
sentiment_analysis
async
sentiment_analysis(responses_df: pd.DataFrame, llm: Runnable, question: str, batch_size: int = 10, prompt_template: str | Path | PromptTemplate = 'sentiment_analysis', system_prompt: str = CONSULTATION_SYSTEM_PROMPT) -> pd.DataFrame
Perform sentiment analysis on survey responses using an LLM.
This function processes survey responses in batches to analyze their sentiment using a language model. It maintains response integrity by checking response IDs.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
responses_df | DataFrame | DataFrame containing survey responses to analyze. Must contain 'response_id' and 'response' columns. | required |
llm | Runnable | Language model instance to use for sentiment analysis. | required |
question | str | The survey question. | required |
batch_size | int | Number of responses to process in each batch. Defaults to 10. | 10 |
prompt_template | str \| Path \| PromptTemplate | Template for structuring the prompt to the LLM. Can be a string identifier, path to template file, or PromptTemplate instance. Defaults to "sentiment_analysis". | 'sentiment_analysis' |
system_prompt | str | System prompt to guide the LLM's behavior. Defaults to CONSULTATION_SYSTEM_PROMPT. | CONSULTATION_SYSTEM_PROMPT |
Returns:
Type | Description |
---|---|
DataFrame | DataFrame containing the original responses enriched with sentiment analysis results. |
Note
The function uses response_id_integrity_check to ensure responses maintain their original order and association after processing.
Source code in src/themefinder/core.py
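To make the batching and the ID check concrete, here is a small dependency-free sketch. It is an illustration under assumptions, not the library's `response_id_integrity_check`: `batch_rows` and `ids_match` are hypothetical helpers, rows are plain dicts instead of DataFrame rows, and the check shown only compares the set of IDs sent with the set returned.

```python
def batch_rows(rows, batch_size=10):
    """Split rows into consecutive batches of at most batch_size items."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


def ids_match(batch, llm_output):
    """Return True when the output covers exactly the response_ids that were sent."""
    sent = {row["response_id"] for row in batch}
    returned = {row["response_id"] for row in llm_output}
    return sent == returned


rows = [{"response_id": i, "response": f"response {i}"} for i in range(1, 26)]
batches = batch_rows(rows, batch_size=10)
batch_sizes = [len(batch) for batch in batches]

# A well-formed (hypothetical) LLM output echoes every response_id back.
good_output = [
    {"response_id": row["response_id"], "position": "agreement"}
    for row in batches[0]
]
# A faulty output silently drops the last response.
bad_output = good_output[:-1]

print(batch_sizes)
print(ids_match(batches[0], good_output), ids_match(batches[0], bad_output))
```

A batch that fails such a check would typically be retried or flagged instead of being merged into the results, since a dropped or invented ID would break the association between a response and its sentiment.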
theme_generation
async
theme_generation(responses_df: pd.DataFrame, llm: Runnable, question: str, batch_size: int = 50, partition_key: str | None = 'position', prompt_template: str | Path | PromptTemplate = 'theme_generation', system_prompt: str = CONSULTATION_SYSTEM_PROMPT) -> pd.DataFrame
Generate themes from survey responses using an LLM.
This function processes batches of survey responses to identify common themes or topics.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
responses_df | DataFrame | DataFrame containing survey responses. Must include 'response_id' and 'response' columns. | required |
llm | Runnable | Language model instance to use for theme generation. | required |
question | str | The survey question. | required |
batch_size | int | Number of responses to process in each batch. Defaults to 50. | 50 |
partition_key | str \| None | Column name to use for batching related responses together. Defaults to "position" for sentiment-enriched responses, but can be set to None for sequential batching or another column name for different grouping strategies. | 'position' |
prompt_template | str \| Path \| PromptTemplate | Template for structuring the prompt to the LLM. Can be a string identifier, path to template file, or PromptTemplate instance. Defaults to "theme_generation". | 'theme_generation' |
system_prompt | str | System prompt to guide the LLM's behavior. Defaults to CONSULTATION_SYSTEM_PROMPT. | CONSULTATION_SYSTEM_PROMPT |
Returns:
Type | Description |
---|---|
DataFrame | DataFrame containing identified themes and their associated metadata. |
Source code in src/themefinder/core.py
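The difference between partitioned and sequential batching can be illustrated as follows. This is a sketch of the idea only: `batch_by_partition` is a hypothetical helper, rows are plain dicts, and the library's own batching logic may differ in detail (for example in how it orders groups).

```python
from collections import defaultdict


def batch_by_partition(rows, batch_size=50, partition_key="position"):
    """Batch rows so that each batch holds a single partition_key value.

    With partition_key=None the rows are batched sequentially instead.
    """
    if partition_key is None:
        groups = [rows]
    else:
        grouped = defaultdict(list)
        for row in rows:
            grouped[row[partition_key]].append(row)
        groups = list(grouped.values())

    batches = []
    for group in groups:
        for start in range(0, len(group), batch_size):
            batches.append(group[start:start + batch_size])
    return batches


# Interleaved positions, as sentiment-enriched responses might arrive.
rows = [
    {"response_id": 1, "position": "agreement"},
    {"response_id": 2, "position": "disagreement"},
    {"response_id": 3, "position": "agreement"},
    {"response_id": 4, "position": "disagreement"},
    {"response_id": 5, "position": "agreement"},
]

partitioned = batch_by_partition(rows, batch_size=2, partition_key="position")
sequential = batch_by_partition(rows, batch_size=2, partition_key=None)

partitioned_ids = [[row["response_id"] for row in batch] for batch in partitioned]
sequential_ids = [[row["response_id"] for row in batch] for batch in sequential]
print(partitioned_ids)  # [[1, 3], [5], [2, 4]]
print(sequential_ids)   # [[1, 2], [3, 4], [5]]
```

Grouping by position keeps agreeing and disagreeing responses in separate prompts, which is presumably why "position" is the default for sentiment-enriched input.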
theme_condensation
async
theme_condensation(themes_df: pd.DataFrame, llm: Runnable, question: str, batch_size: int = 10000, prompt_template: str | Path | PromptTemplate = 'theme_condensation', system_prompt: str = CONSULTATION_SYSTEM_PROMPT) -> pd.DataFrame
Condense and combine similar themes identified from survey responses.
This function processes the initially identified themes to combine similar or overlapping topics into more cohesive, broader categories using an LLM.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
themes_df | DataFrame | DataFrame containing the initial themes identified from survey responses. | required |
llm | Runnable | Language model instance to use for theme condensation. | required |
question | str | The survey question. | required |
batch_size | int | Number of themes to process in each batch. Defaults to 10000. | 10000 |
prompt_template | str \| Path \| PromptTemplate | Template for structuring the prompt to the LLM. Can be a string identifier, path to template file, or PromptTemplate instance. Defaults to "theme_condensation". | 'theme_condensation' |
system_prompt | str | System prompt to guide the LLM's behavior. Defaults to CONSULTATION_SYSTEM_PROMPT. | CONSULTATION_SYSTEM_PROMPT |
Returns:
Type | Description |
---|---|
DataFrame | DataFrame containing the condensed themes, where similar topics have been combined into broader categories. |
Source code in src/themefinder/core.py
theme_refinement
async
theme_refinement(condensed_themes_df: pd.DataFrame, llm: Runnable, question: str, batch_size: int = 10000, prompt_template: str | Path | PromptTemplate = 'theme_refinement', system_prompt: str = CONSULTATION_SYSTEM_PROMPT) -> pd.DataFrame
Refine and standardize condensed themes using an LLM.
This function processes previously condensed themes to create clear, standardized theme descriptions. It also transforms the output format for improved readability by transposing the results into a single-row DataFrame where columns represent individual themes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
condensed_themes_df | DataFrame | DataFrame containing the condensed themes from the previous pipeline stage. | required |
llm | Runnable | Language model instance to use for theme refinement. | required |
question | str | The survey question. | required |
batch_size | int | Number of themes to process in each batch. Defaults to 10000. | 10000 |
prompt_template | str \| Path \| PromptTemplate | Template for structuring the prompt to the LLM. Can be a string identifier, path to template file, or PromptTemplate instance. Defaults to "theme_refinement". | 'theme_refinement' |
system_prompt | str | System prompt to guide the LLM's behavior. Defaults to CONSULTATION_SYSTEM_PROMPT. | CONSULTATION_SYSTEM_PROMPT |
Returns:
Type | Description |
---|---|
DataFrame | A single-row DataFrame where each column represents a unique theme (identified by topic_id), the values contain the refined theme descriptions, and the format is optimized for subsequent theme mapping operations. |
Note
The function adds sequential response_ids to the input DataFrame and transposes the output for improved readability and easier downstream processing.
Source code in src/themefinder/core.py
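The single-row output shape can be pictured with a short sketch. It uses plain dicts in place of DataFrames so that it runs without dependencies; `to_single_row` is a hypothetical helper, and the column name `topic` is an assumption (only `topic_id` is named in the description above).

```python
def to_single_row(theme_rows):
    """Turn one-row-per-theme records into a single mapping of topic_id -> description."""
    single_row = {}
    for row in theme_rows:
        topic_id = row["topic_id"]
        if topic_id in single_row:
            # Column names must be unique after transposing.
            raise ValueError(f"duplicate topic_id: {topic_id}")
        single_row[topic_id] = row["topic"]
    return single_row


refined_rows = [
    {"topic_id": "A", "topic": "Bus frequency: services should run more often"},
    {"topic_id": "B", "topic": "Fare levels: tickets are too expensive"},
]
single_row = to_single_row(refined_rows)
print(list(single_row))  # ['A', 'B']
```

With pandas, a comparable reshaping could be written as `df.set_index("topic_id")[["topic"]].T`, which yields one row whose columns are the topic IDs. Whether the library performs the transpose this way is not stated here.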
theme_mapping
async
theme_mapping(responses_df: pd.DataFrame, llm: Runnable, question: str, refined_themes_df: pd.DataFrame, batch_size: int = 20, prompt_template: str | Path | PromptTemplate = 'theme_mapping', system_prompt: str = CONSULTATION_SYSTEM_PROMPT) -> pd.DataFrame
Map survey responses to refined themes using an LLM.
This function analyzes each survey response and determines which of the refined themes best match its content. Multiple themes can be assigned to a single response.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
responses_df | DataFrame | DataFrame containing survey responses. Must include 'response_id' and 'response' columns. | required |
llm | Runnable | Language model instance to use for theme mapping. | required |
question | str | The survey question. | required |
refined_themes_df | DataFrame | Single-row DataFrame where each column represents a theme (from theme_refinement stage). | required |
batch_size | int | Number of responses to process in each batch. Defaults to 20. | 20 |
prompt_template | str \| Path \| PromptTemplate | Template for structuring the prompt to the LLM. Can be a string identifier, path to template file, or PromptTemplate instance. Defaults to "theme_mapping". | 'theme_mapping' |
system_prompt | str | System prompt to guide the LLM's behavior. Defaults to CONSULTATION_SYSTEM_PROMPT. | CONSULTATION_SYSTEM_PROMPT |
Returns:
Type | Description |
---|---|
DataFrame | DataFrame containing the original responses enriched with theme mapping results, ensuring all responses are mapped through ID integrity checks. |
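As a rough picture of what "enriched with theme mapping results" and "all responses are mapped" can mean, the sketch below joins a hypothetical mapping output back onto the responses and fails loudly when a response has no entry. The helper `attach_labels` and the `labels` field are assumptions made for illustration; the library's actual output columns are not listed in this reference.

```python
def attach_labels(responses, mapping_output):
    """Attach a list of theme ids to each response, requiring every response to be mapped."""
    labels_by_id = {item["response_id"]: item["labels"] for item in mapping_output}
    missing = [
        row["response_id"] for row in responses
        if row["response_id"] not in labels_by_id
    ]
    if missing:
        raise ValueError(f"unmapped response_ids: {missing}")
    return [{**row, "labels": labels_by_id[row["response_id"]]} for row in responses]


responses = [
    {"response_id": 1, "response": "Buses are rare and tickets cost too much."},
    {"response_id": 2, "response": "Please run more evening services."},
]
# Hypothetical LLM output: a response may receive more than one theme.
mapping_output = [
    {"response_id": 1, "labels": ["A", "B"]},
    {"response_id": 2, "labels": ["A"]},
]

mapped = attach_labels(responses, mapping_output)
print([row["labels"] for row in mapped])  # [['A', 'B'], ['A']]
```

The first response carries two theme ids, matching the statement that multiple themes can be assigned to a single response.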