API Reference#

Readability#

`ReadabilityAnalyzer`#

Class for analyzing text readability with math-aware normalization.

from mathipy import ReadabilityAnalyzer

analyzer = ReadabilityAnalyzer()
result = analyzer.analyze("Solve for x: 2x + 5 = 15")

Returns a dictionary with:

flesch_reading_ease: Flesch Reading Ease score (0–100)
flesch_kincaid_grade: Flesch-Kincaid grade level
gunning_fog: Gunning Fog index
smog_index: SMOG index
automated_readability_index: ARI score
coleman_liau_index: Coleman-Liau index
linsear_write_formula: Linsear Write formula
dale_chall_readability: Dale-Chall readability score
average_grade_level: Average of the Flesch-Kincaid, Gunning Fog, and SMOG grade levels
low_confidence: True if text is shorter than 20 words
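For orientation, the two Flesch scores follow well-known published formulas. The sketch below applies them to raw counts and mirrors the documented low_confidence rule. The function names are illustrative, and mathipy's own math-aware normalization and syllable counting are not reproduced here, so the library's values may differ.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula applied to raw counts."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid grade level formula applied to raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def is_low_confidence(words: int, threshold: int = 20) -> bool:
    """Mirror of the documented rule: text shorter than 20 words."""
    return words < threshold


# Hypothetical counts for a 100-word, 5-sentence passage with 150 syllables.
print(round(flesch_reading_ease(100, 5, 150), 2))
print(round(flesch_kincaid_grade(100, 5, 150), 2))
print(is_low_confidence(8))  # True: 8 words is below the 20-word threshold
```

Short assessment items often fall under the 20-word threshold, so checking low_confidence before comparing grade levels across items is a reasonable habit.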

Math Content#

`MathContentAnalyzer`#

Class for math content analysis and CCSSM domain classification.

from mathipy import MathContentAnalyzer

analyzer = MathContentAnalyzer()
result = analyzer.analyze("What is the area of a triangle with base 6 and height 4?")

Returns a dictionary with:

pattern_matches: Detected math patterns (equations, fractions, etc.)
symbol_counts: Counts of math symbols by type
total_math_symbols: Total math symbols found
numbers: Extracted numbers with count, range, and properties
vocabulary: Matched math terms and counts
domain_classification: Primary domain, confidence, and scores
math_density: Ratio of math patterns to word count

Domain categories: arithmetic, algebra, geometry, statistics, calculus, fractions
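One simple way to arrive at a primary domain, a confidence, and per-domain scores is keyword counting. The sketch below is an illustration under stated assumptions only: the keyword sets, the function name classify_domain, and the confidence definition (top score divided by the total) are hypothetical and not taken from mathipy.

```python
import re

# Hypothetical keyword sets; only the six domain names come from the list above.
DOMAIN_KEYWORDS = {
    "arithmetic": {"sum", "difference", "product", "quotient"},
    "algebra": {"equation", "variable", "solve", "expression"},
    "geometry": {"area", "triangle", "perimeter", "angle"},
    "statistics": {"mean", "median", "probability", "data"},
    "calculus": {"derivative", "integral", "limit"},
    "fractions": {"fraction", "numerator", "denominator"},
}


def classify_domain(text: str) -> dict:
    """Score each domain by keyword hits and report the top one."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = {
        domain: sum(word in keywords for word in words)
        for domain, keywords in DOMAIN_KEYWORDS.items()
    }
    total = sum(scores.values())
    if total == 0:
        # No keyword matched, so no domain can be named.
        return {"primary": None, "confidence": 0.0, "scores": scores}
    primary = max(scores, key=scores.get)
    return {"primary": primary, "confidence": scores[primary] / total, "scores": scores}


result = classify_domain("What is the area of a triangle with base 6 and height 4?")
print(result["primary"], result["confidence"])  # geometry 1.0
```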

Cognitive Load#

`CognitiveLoadEstimator`#

Class for estimating cognitive load components.

from mathipy import CognitiveLoadEstimator

estimator = CognitiveLoadEstimator()
text = "Solve for x: 2x + 5 = 15"
result = estimator.estimate(text, readability_grade=5.2, math_terms=["equation", "solve"])

Returns a dictionary with:

numeric_elements: Count of numbers in text
variable_count: Count of single-letter variables
operation_count: Count of math operations (+, -, *, /, ^, =, <, >)
word_count: Total words in text
sentence_count: Total sentences in text
math_term_count: Count of CCSSM-aligned math keywords
element_density: Ratio of (numeric elements + variables) to word count
avg_sentence_length: Average words per sentence
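The two ratios in this list are simple functions of the counts. The sketch below shows one way such counts could be derived with regular expressions. The tokenization rules and the name count_elements are assumptions for illustration and may not match what CognitiveLoadEstimator does internally.

```python
import re


def count_elements(text: str) -> dict:
    """Rough regex-based counts; the tokenization rules here are assumptions."""
    numbers = re.findall(r"\d+(?:\.\d+)?", text)
    # A single letter not adjacent to another letter, e.g. the "x" in "2x".
    # Caveat: the English words "a" and "I" are also matched by this pattern.
    variables = re.findall(r"(?<![A-Za-z])[A-Za-z](?![A-Za-z])", text)
    operations = re.findall(r"[+\-*/^=<>]", text)
    words = re.findall(r"[A-Za-z0-9]+", text)
    # Split on sentence-ending punctuation followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]

    word_count = len(words)
    sentence_count = max(len(sentences), 1)
    return {
        "numeric_elements": len(numbers),
        "variable_count": len(variables),
        "operation_count": len(operations),
        "word_count": word_count,
        "sentence_count": sentence_count,
        # (numeric elements + variables) / word count, guarded against empty text.
        "element_density": (len(numbers) + len(variables)) / word_count if word_count else 0.0,
        "avg_sentence_length": word_count / sentence_count,
    }


print(count_elements("Solve for x: 2x + 5 = 15"))
```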

Visual#

`VisualFeatureExtractor`#

Class for extracting complexity features from assessment images.

Requires: pip install mathipy[vision]

from mathipy import VisualFeatureExtractor

extractor = VisualFeatureExtractor()
features = extractor.extract("item_image.png")

Accepts a file path, Path object, or numpy array. Returns a dictionary with:

dimensions: Width, height, aspect ratio, channels
pixel_statistics: Mean, std, min, max, median, contrast
edge_metrics: Canny edge ratio, Sobel/Laplacian statistics
structural_elements: Detected lines, circles, shapes (triangles, rectangles, etc.)
frequency_domain: Low/mid/high frequency energy ratios
complexity_score: Summary with edge_ratio, total_shapes, and high_freq_ratio

OCR#

`MultimodalOCR`#

Class for extracting text and math from images using vision LLMs.

Requires: pip install mathipy[ocr] and a GEMINI_API_KEY or OPENAI_API_KEY.

from mathipy import MultimodalOCR

ocr = MultimodalOCR(provider="gemini")
result = ocr.extract("item_image.png")

Parameters:

provider: "gemini" or "openai"
model: Model name (defaults to gemini-2.5-flash for "gemini" or gpt-4o for "openai")
api_key: API key (or set via .env file)
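When a script should work with whichever key happens to be configured, the provider can be chosen before the OCR object is constructed. This is a minimal sketch, assuming the keys are exposed as environment variables under the names given above. The helper pick_provider is hypothetical and not part of mathipy.

```python
from typing import Mapping


def pick_provider(env: Mapping[str, str]) -> str:
    """Return "gemini" or "openai" depending on which API key is present."""
    if env.get("GEMINI_API_KEY"):
        return "gemini"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    raise RuntimeError("Set GEMINI_API_KEY or OPENAI_API_KEY before running OCR.")


# Demonstration with a plain dict standing in for os.environ.
print(pick_provider({"OPENAI_API_KEY": "placeholder-key"}))  # openai
```

In a real script, passing os.environ to the helper and handing the result to MultimodalOCR(provider=...) would keep the choice in one place.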

Accepts an image path, URL, raw bytes, or a PDF, DOCX, or text file. Returns a dictionary with:

content_type: text_only, image_only, or mixed
full_text: All extracted text
image_description: Visual content description
question_text: Main question/problem statement
math_expressions: List of LaTeX expressions
answer_choices: Dictionary of answer choices
extraction_confidence: Confidence score (0–1)

Visual Model Classification#

`VisualModelClassifier`#

Class for classifying which of the 20 visual model types appear in an assessment image.

Requires: pip install mathipy[ocr] and a GEMINI_API_KEY or OPENAI_API_KEY.

from mathipy import VisualModelClassifier

classifier = VisualModelClassifier(provider="gemini")
result = classifier.classify("item_image.png")

Parameters:

provider: "gemini" or "openai"
model: Model name (defaults to gemini-2.5-flash for "gemini" or gpt-4o for "openai")
api_key: API key (or set via .env file)

Returns a dictionary with a boolean per model type, "primary" (str), and "model_count" (int).
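Because the result mixes boolean flags with two summary keys, a small helper is handy for listing only the detected model types. The sketch below uses a hypothetical result dictionary: "number_line" is a model type named elsewhere in this reference, while the other names and the helper detected_models are placeholders.

```python
def detected_models(result: dict) -> list:
    """Return the model types flagged True, skipping the summary keys."""
    return sorted(
        name
        for name, value in result.items()
        if name not in ("primary", "model_count") and value is True
    )


# Hypothetical classifier output shaped like the description above.
example = {
    "number_line": True,
    "example_model_a": False,
    "example_model_b": True,
    "primary": "number_line",
    "model_count": 2,
}

found = detected_models(example)
print(found)  # ['example_model_b', 'number_line']
print(len(found) == example["model_count"])  # True
```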

`visual_model_groups`#

Dictionary mapping each of the 20 visual model types to a broader category.

from mathipy import visual_model_groups
print(visual_model_groups["number_line"])  # "number_quantity"

`visual_model_info`#

List of tuples with (model_name, group, CCSSM_domains, grade_band) for each visual model type.

from mathipy import visual_model_info
for name, group, domains, grades in visual_model_info[:3]:
    print(f"{name}: {group}, {domains}, {grades}")

Item-Level Extraction#

`ItemFeatureExtractor`#

Class that runs all four analyzers (readability, math content, cognitive load, visual) and returns a flat dictionary with prefixed keys.

from mathipy import ItemFeatureExtractor

extractor = ItemFeatureExtractor()
result = extractor.extract(
    text="Solve for x: 2x + 5 = 15",
    image_paths=["item_image.png"],
)

Returns a flat dict with readability_*, math_*, cognitive_*, and visual_* keys.
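The idea of a flat prefixed dictionary can be sketched as follows. The helper flatten_with_prefix, the exact flattened key names, and the values are assumptions for illustration. Only the prefixes and the per-analyzer key names come from this reference.

```python
def flatten_with_prefix(sections: dict) -> dict:
    """Flatten {"prefix": {"key": value}} into {"prefix_key": value}."""
    flat = {}
    for prefix, values in sections.items():
        for key, value in values.items():
            flat[f"{prefix}_{key}"] = value
    return flat


# Placeholder values, one small dictionary per analyzer.
sections = {
    "readability": {"flesch_kincaid_grade": 5.2, "low_confidence": True},
    "math": {"total_math_symbols": 2},
    "cognitive": {"word_count": 6},
}
flat = flatten_with_prefix(sections)
print(sorted(flat))
```

A flat layout like this maps directly onto one row of a table, which is convenient when features for many items are collected into a CSV or data frame.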

Utilities#

`safe_get`#

Retrieve a nested value from a dict, returning a default on any miss.

from mathipy import safe_get

data = {"a": {"b": {"c": 42}}}
safe_get(data, "a", "b", "c")          # 42
safe_get(data, "a", "x", default=0)    # 0
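To make the "default on any miss" behaviour concrete, here is a minimal sketch of a function with the same call shape. It is named safe_get_sketch to make clear that it is an illustration and not mathipy's implementation.

```python
def safe_get_sketch(data, *keys, default=None):
    """Walk nested dicts key by key; return default as soon as a key is missing."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


data = {"a": {"b": {"c": 42}}}
print(safe_get_sketch(data, "a", "b", "c"))        # 42
print(safe_get_sketch(data, "a", "x", default=0))  # 0
```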

`compute_interrater_reliability`#

Compute agreement and Cohen’s kappa between two raters.

from mathipy import compute_interrater_reliability

result = compute_interrater_reliability([1, 2, 3, 1], [1, 2, 1, 1])
print(result)  # {"agreement": 0.75, "kappa": 0.56, "n": 4}
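Both statistics can be reproduced by hand with the standard unweighted definition, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The sketch below is a plain-Python illustration. The name cohen_kappa and the handling of the undefined case are assumptions, and mathipy may round or weight its result differently.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters need the same non-zero number of labels.")
    n = len(rater_a)
    # Observed agreement: share of items with identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal proportions.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both raters used a single identical label; kappa is undefined.
        return {"agreement": p_o, "kappa": float("nan"), "n": n}
    return {"agreement": p_o, "kappa": (p_o - p_e) / (1 - p_e), "n": n}


print(cohen_kappa([1, 2, 3, 1], [1, 2, 1, 1]))
```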

Sample Data#

from mathipy.data import get_sample_csv, get_sample_image, list_sample_images

csv_path = get_sample_csv()
images = list_sample_images()
image_path = get_sample_image("2024-4M10 #2")
