API Reference#
Readability#
ReadabilityAnalyzer#
Class for analyzing text readability with math-aware normalization.
from mathipy import ReadabilityAnalyzer
analyzer = ReadabilityAnalyzer()
result = analyzer.analyze("Solve for x: 2x + 5 = 15")
Returns a dictionary with:
- flesch_reading_ease
Flesch Reading Ease score (0–100)
- flesch_kincaid_grade
Flesch-Kincaid grade level
- gunning_fog
Gunning Fog index
- smog_index
SMOG index
- automated_readability_index
ARI score
- coleman_liau_index
Coleman-Liau index
- linsear_write_formula
Linsear Write formula
- dale_chall_readability
Dale-Chall readability score
- average_grade_level
Average of the Flesch-Kincaid, Gunning Fog, and SMOG grade levels
- low_confidence
True if text is shorter than 20 words
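For orientation, the two Flesch scores follow widely published formulas over word, sentence, and syllable counts. The sketch below applies those standard formulas to raw counts and mirrors the 20-word rule for low_confidence. It is not mathipy's implementation, which normalizes math content before counting. The function names and the whitespace-based word count are assumptions made for illustration.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula from raw counts."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid grade-level formula from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def is_low_confidence(text: str, min_words: int = 20) -> bool:
    """Mirror the documented rule: flag texts shorter than 20 words.

    Splitting on whitespace is an assumption about how words are counted.
    """
    return len(text.split()) < min_words


# Illustrative counts only: 100 words, 5 sentences, 150 syllables.
ease = flesch_reading_ease(100, 5, 150)
grade = flesch_kincaid_grade(100, 5, 150)
short = is_low_confidence("Solve for x: 2x + 5 = 15")
```

Short items such as single equations tend to produce unstable scores, which is why a flag like low_confidence is worth checking before comparing grade levels across items.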
Math Content#
MathContentAnalyzer#
Class for math content analysis and CCSSM domain classification.
from mathipy import MathContentAnalyzer
analyzer = MathContentAnalyzer()
result = analyzer.analyze("What is the area of a triangle with base 6 and height 4?")
Returns a dictionary with:
- pattern_matches
Detected math patterns (equations, fractions, etc.)
- symbol_counts
Counts of math symbols by type
- total_math_symbols
Total math symbols found
- numbers
Extracted numbers with count, range, and properties
- vocabulary
Matched math terms and counts
- domain_classification
Primary domain, confidence, and scores
- math_density
Ratio of math patterns to word count
Domain categories: arithmetic, algebra, geometry, statistics, calculus, fractions
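To make the math_density ratio concrete, here is a minimal sketch that divides the number of pattern matches by the word count. The two regular expressions are hypothetical stand-ins, since this reference does not list the patterns the library actually uses, and splitting on whitespace is an assumed word-counting rule.

```python
import re

# Hypothetical patterns for illustration; the library's real pattern set
# is not listed in this reference.
ILLUSTRATIVE_PATTERNS = {
    "equation": re.compile(r"\S+\s*=\s*\S+"),
    "fraction": re.compile(r"\b\d+/\d+\b"),
}


def math_density(text: str) -> float:
    """Ratio of pattern matches to whitespace-separated words."""
    words = text.split()
    if not words:
        return 0.0
    matches = sum(len(p.findall(text)) for p in ILLUSTRATIVE_PATTERNS.values())
    return matches / len(words)


density = math_density("Solve for x: 2x + 5 = 15")  # 1 match over 8 words
```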
Cognitive Load#
CognitiveLoadEstimator#
Class for estimating cognitive load components.
from mathipy import CognitiveLoadEstimator
estimator = CognitiveLoadEstimator()
text = "Solve for x: 2x + 5 = 15"
result = estimator.estimate(text, readability_grade=5.2, math_terms=["equation", "solve"])
Returns a dictionary with:
- numeric_elements
Count of numbers in text
- variable_count
Count of single-letter variables
- operation_count
Count of math operations (+, -, *, /, ^, =, <, >)
- word_count
Total words in text
- sentence_count
Total sentences in text
- math_term_count
Count of CCSSM-aligned math keywords
- element_density
Ratio of (numeric elements + variables) to word count
- avg_sentence_length
Average words per sentence
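The counts and ratios above can be approximated with a few regular expressions. This is a rough sketch under assumed tokenization rules, not the estimator's own logic: it treats any single lowercase letter that is not part of a longer word as a variable (so the article "a" would be miscounted), and it splits sentences on ".", "!", and "?" (so decimals would be split too). The function name count_elements is hypothetical.

```python
import re


def count_elements(text: str) -> dict:
    """Count surface features using simple, assumed tokenization rules."""
    numbers = re.findall(r"\d+(?:\.\d+)?", text)
    # Single lowercase letters not adjacent to other letters, e.g. the x in "2x".
    variables = re.findall(r"(?<![A-Za-z])[a-z](?![A-Za-z])", text)
    operations = [ch for ch in text if ch in "+-*/^=<>"]
    words = text.split()
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    word_count = len(words)
    sentence_count = max(1, len(sentences))
    density = (len(numbers) + len(variables)) / word_count if word_count else 0.0
    return {
        "numeric_elements": len(numbers),
        "variable_count": len(variables),
        "operation_count": len(operations),
        "word_count": word_count,
        "sentence_count": sentence_count,
        "element_density": density,
        "avg_sentence_length": word_count / sentence_count,
    }


counts = count_elements("Solve for x: 2x + 5 = 15")
```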
Visual#
VisualFeatureExtractor#
Class for extracting complexity features from assessment images.
Requires: pip install mathipy[vision]
from mathipy import VisualFeatureExtractor
extractor = VisualFeatureExtractor()
features = extractor.extract("item_image.png")
Accepts a file path, Path object, or numpy array. Returns a dictionary with:
- dimensions
Width, height, aspect ratio, channels
- pixel_statistics
Mean, std, min, max, median, contrast
- edge_metrics
Canny edge ratio, Sobel/Laplacian statistics
- structural_elements
Detected lines, circles, shapes (triangles, rectangles, etc.)
- frequency_domain
Low/mid/high frequency energy ratios
- complexity_score
Summary with edge_ratio, total_shapes, and high_freq_ratio
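As a rough illustration of what the pixel_statistics entry summarizes, the sketch below computes the same six quantities for a tiny grayscale image given as nested lists. It uses only the standard library rather than the vision extras, and two details are assumptions: the standard deviation is the population form, and contrast is taken to be the intensity range (max minus min). The extractor may define either differently.

```python
import statistics


def pixel_statistics(gray):
    """Summary statistics for a grayscale image given as rows of 0-255 values."""
    pixels = [value for row in gray for value in row]
    return {
        "mean": statistics.fmean(pixels),
        "std": statistics.pstdev(pixels),       # population std (assumed)
        "min": min(pixels),
        "max": max(pixels),
        "median": statistics.median(pixels),
        "contrast": max(pixels) - min(pixels),  # assumed: intensity range
    }


# A 2x2 placeholder image, not real assessment data.
stats = pixel_statistics([[0, 50], [100, 250]])
```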
OCR#
MultimodalOCR#
Class for extracting text and math from images using vision LLMs.
Requires: pip install mathipy[ocr] and a GEMINI_API_KEY or OPENAI_API_KEY.
from mathipy import MultimodalOCR
ocr = MultimodalOCR(provider="gemini")
result = ocr.extract("item_image.png")
Parameters:
- provider
"gemini" or "openai"
- model
Model name (defaults to gemini-2.5-flash or gpt-4o)
- api_key
API key (or set via .env file)
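One plausible way to think about key resolution is "explicit argument first, environment variable second". The sketch below shows that order using the two variable names from the requirements line. The helper name resolve_api_key and the lookup order are assumptions, not documented library behavior, and .env loading is left out.

```python
import os

# Variable names come from the requirements line above;
# the lookup order itself is an assumption.
ENV_VARS = {"gemini": "GEMINI_API_KEY", "openai": "OPENAI_API_KEY"}


def resolve_api_key(provider, api_key=None):
    """Return an explicit key if given, else the provider's environment variable."""
    if provider not in ENV_VARS:
        raise ValueError(f"unknown provider: {provider!r}")
    key = api_key or os.environ.get(ENV_VARS[provider])
    if not key:
        raise RuntimeError(f"set {ENV_VARS[provider]} or pass api_key")
    return key
```

Failing early with a clear message when no key is available is usually easier to debug than a provider-side authentication error.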
Accepts an image path, URL, bytes, PDF, DOCX, or text file. Returns a dictionary with:
- content_type
text_only, image_only, or mixed
- full_text
All extracted text
- image_description
Visual content description
- question_text
Main question/problem statement
- math_expressions
List of LaTeX expressions
- answer_choices
Dictionary of answer choices
- extraction_confidence
Confidence score (0–1)
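For callers who want type hints, the result can be described with a TypedDict. The field types below are inferred from the descriptions above and are assumptions, as are the OCRResult and needs_review names and the 0.8 cutoff. The example values are placeholders, not real OCR output.

```python
from typing import Dict, List, TypedDict


class OCRResult(TypedDict):
    """Assumed shape of the returned dictionary."""
    content_type: str             # "text_only", "image_only", or "mixed"
    full_text: str
    image_description: str
    question_text: str
    math_expressions: List[str]   # LaTeX strings
    answer_choices: Dict[str, str]
    extraction_confidence: float  # 0 to 1


def needs_review(result: OCRResult, threshold: float = 0.8) -> bool:
    """Flag low-confidence extractions; the 0.8 cutoff is an arbitrary example."""
    return result["extraction_confidence"] < threshold


# Placeholder values for illustration only.
example: OCRResult = {
    "content_type": "text_only",
    "full_text": "Solve for x: 2x + 5 = 15",
    "image_description": "",
    "question_text": "Solve for x: 2x + 5 = 15",
    "math_expressions": ["2x + 5 = 15"],
    "answer_choices": {"A": "5", "B": "10"},
    "extraction_confidence": 0.65,
}
flagged = needs_review(example)
```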
Visual Model Classification#
VisualModelClassifier#
Classify which of the 20 visual model types appear in an assessment image.
Requires: pip install mathipy[ocr] and a GEMINI_API_KEY or OPENAI_API_KEY.
from mathipy import VisualModelClassifier
classifier = VisualModelClassifier(provider="gemini")
result = classifier.classify("item_image.png")
Parameters:
- provider
"gemini" or "openai"
- model
Model name (defaults to gemini-2.5-flash or gpt-4o)
- api_key
API key (or set via .env file)
Returns a dictionary with a boolean per model type, "primary" (str), and "model_count" (int).
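Because the booleans and the two summary keys share one dictionary, a small helper can pull out just the detected model types. This is a sketch over a placeholder result: "number_line" appears elsewhere in this reference, while "bar_model" and "area_model" are made-up names used only to fill out the example.

```python
def detected_models(result):
    """List model types flagged True, skipping the two summary keys."""
    return sorted(
        name
        for name, value in result.items()
        if name not in ("primary", "model_count") and value is True
    )


# Placeholder result, not real classifier output.
example = {
    "number_line": True,
    "bar_model": False,   # hypothetical model name
    "area_model": True,   # hypothetical model name
    "primary": "number_line",
    "model_count": 2,
}
models = detected_models(example)
```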
visual_model_groups#
Dictionary mapping each of the 20 visual model types to a broader category.
from mathipy import visual_model_groups
print(visual_model_groups["number_line"]) # "number_quantity"
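A mapping from model type to category is often more useful inverted, so that each category lists its models. The sketch below does this on a stand-in dictionary: only the number_line entry is taken from this page, and the other entries and the helper name models_by_group are hypothetical.

```python
from collections import defaultdict


def models_by_group(groups):
    """Invert a model -> group mapping into group -> sorted list of models."""
    inverted = defaultdict(list)
    for model, group in groups.items():
        inverted[group].append(model)
    return {group: sorted(models) for group, models in inverted.items()}


# Stand-in for visual_model_groups.
stand_in = {
    "number_line": "number_quantity",
    "ten_frame": "number_quantity",   # hypothetical entry
    "coordinate_plane": "geometry",   # hypothetical entry
}
by_group = models_by_group(stand_in)
```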
visual_model_info#
List of tuples with (model_name, group, CCSSM_domains, grade_band) for each visual model type.
from mathipy import visual_model_info
for name, group, domains, grades in visual_model_info[:3]:
    print(f"{name}: {group}, {domains}, {grades}")
Item-Level Extraction#
ItemFeatureExtractor#
Run all four analyzers (readability, math content, cognitive load, visual) and return a flat prefixed dictionary.
from mathipy import ItemFeatureExtractor
extractor = ItemFeatureExtractor()
result = extractor.extract(
    text="Solve for x: 2x + 5 = 15",
    image_paths=["item_image.png"],
)
Returns a flat dict with readability_*, math_*, cognitive_*, and visual_* keys.
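A flat, prefixed dictionary of this kind can be produced by recursively joining keys with underscores. The sketch below shows the general technique on placeholder analyzer outputs. How the library names keys for nested values is not stated here, so the exact key format is an assumption.

```python
def flatten(prefix, data):
    """Flatten nested dicts into one level, joining keys with underscores."""
    flat = {}
    for key, value in data.items():
        name = f"{prefix}_{key}"
        if isinstance(value, dict):
            flat.update(flatten(name, value))
        else:
            flat[name] = value
    return flat


# Placeholder analyzer outputs, not real library results.
features = {}
features.update(
    flatten("readability", {"flesch_kincaid_grade": 4.1, "low_confidence": True})
)
features.update(flatten("math", {"domain_classification": {"primary": "algebra"}}))
```

A flat layout like this maps directly onto one row per item in a table or data frame.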
Utilities#
safe_get#
Retrieve a nested value from a dict, returning a default on any miss.
from mathipy import safe_get
data = {"a": {"b": {"c": 42}}}
safe_get(data, "a", "b", "c") # 42
safe_get(data, "a", "x", default=0) # 0
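The behavior shown above can be reproduced with a short loop. This is one possible implementation, named safe_get_sketch to keep it distinct from the library function. It handles nested dicts only, and how the real safe_get treats lists or other containers is not covered here.

```python
def safe_get_sketch(data, *keys, default=None):
    """Walk nested dicts key by key, returning default if any step fails."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


data = {"a": {"b": {"c": 42}}}
found = safe_get_sketch(data, "a", "b", "c")
missing = safe_get_sketch(data, "a", "x", default=0)
```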
compute_interrater_reliability#
Compute agreement and Cohen’s kappa between two raters.
from mathipy import compute_interrater_reliability
result = compute_interrater_reliability([1, 2, 3, 1], [1, 2, 1, 1])
print(result) # approximately {"agreement": 0.75, "kappa": 0.56, "n": 4}
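For reference, unweighted Cohen's kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's label frequencies. The sketch below implements that textbook formula. It is not the library's code, and the library may round or handle edge cases differently. For the two rating lists above it gives an agreement of 0.75 and a kappa of 5/9, roughly 0.556.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must have the same non-zero number of labels")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: both raters pick the same label independently.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        kappa = 1.0  # both raters used one identical label throughout
    else:
        kappa = (observed - expected) / (1 - expected)
    return {"agreement": observed, "kappa": kappa, "n": n}


result = cohen_kappa([1, 2, 3, 1], [1, 2, 1, 1])
```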
Sample Data#
from mathipy.data import get_sample_csv, get_sample_image, list_sample_images
csv_path = get_sample_csv()
images = list_sample_images()
image_path = get_sample_image("2024-4M10 #2")