API Reference#
Readability#
ReadabilityAnalyzer#
Class for analyzing text readability with math-aware normalization.
from mathipy import ReadabilityAnalyzer
analyzer = ReadabilityAnalyzer()
result = analyzer.analyze("Solve for x: 2x + 5 = 15")
Returns a dictionary with:
- flesch_reading_ease
Flesch Reading Ease score (0–100)
- flesch_kincaid_grade
Flesch-Kincaid grade level
- gunning_fog
Gunning Fog index
- smog_index
SMOG index
- automated_readability_index
ARI score
- coleman_liau_index
Coleman-Liau index
- linsear_write_formula
Linsear Write formula
- dale_chall_readability
Dale-Chall readability score
- average_grade_level
Average of FK, Fog, and SMOG
- low_confidence
Trueif text is shorter than 20 words
Math Content#
MathContentAnalyzer#
Class for math content analysis and CCSSM domain classification.
from mathipy import MathContentAnalyzer
analyzer = MathContentAnalyzer()
result = analyzer.analyze("What is the area of a triangle with base 6 and height 4?")
Returns a dictionary with:
- pattern_matches
Detected math patterns (equations, fractions, etc.)
- symbol_counts
Counts of math symbols by type
- total_math_symbols
Total math symbols found
- numbers
Extracted numbers with count, range, and properties
- vocabulary
Matched math terms and counts
- domain_classification
Primary domain, confidence, and scores
- math_density
Ratio of math patterns to word count
Domain categories: arithmetic, algebra, geometry, statistics, calculus, fractions
Cognitive Load#
CognitiveLoadEstimator#
Class for estimating cognitive load components.
from mathipy import CognitiveLoadEstimator
estimator = CognitiveLoadEstimator()
result = estimator.estimate(text, readability_grade=5.2, math_terms=["equation", "solve"])
Returns a dictionary with:
- intrinsic_cognitive_load
Load from inherent item complexity (0–1)
- extraneous_cognitive_load
Load from presentation/language (0–1)
- germane_cognitive_load
Load from schema building (0–1)
- total_cognitive_load
Weighted total (0–1)
- numeric_elements
Count of numbers in text
- variable_count
Count of single-letter variables
- operation_count
Count of math operations
Visual#
VisualFeatureExtractor#
Class for extracting complexity features from assessment images.
Requires: pip install mathipy[vision]
from mathipy import VisualFeatureExtractor
extractor = VisualFeatureExtractor()
features = extractor.extract("item_image.png")
Accepts a file path, Path object, or numpy array. Returns a dictionary with:
- dimensions
Width, height, aspect ratio, channels
- pixel_statistics
Mean, std, min, max, median, contrast
- edge_metrics
Canny edge ratio, Sobel/Laplacian statistics
- structural_elements
Detected lines, circles, shapes (triangles, rectangles, etc.)
- frequency_domain
Low/mid/high frequency energy ratios
- complexity_score
Overall score (0–1) and level (low/medium/high)
OCR#
MultimodalOCR#
Class for extracting text and math from images using vision LLMs.
Requires: pip install mathipy[ocr] and a GEMINI_API_KEY or OPENAI_API_KEY.
from mathipy import MultimodalOCR
ocr = MultimodalOCR(provider="gemini")
result = ocr.extract("item_image.png")
Parameters:
- provider
"gemini"or"openai"- model
Model name (defaults to
gemini-2.5-flashorgpt-4o)- api_key
API key (or set via
.envfile)
Accepts image path, URL, bytes, PDF, DOCX, or text file. Returns a dictionary with:
- content_type
text_only,image_only, ormixed- full_text
All extracted text
- image_description
Visual content description
- question_text
Main question/problem statement
- math_expressions
List of LaTeX expressions
- answer_choices
Dictionary of answer choices
- extraction_confidence
Confidence score (0–1)
Sample Data#
from mathipy.data import get_sample_csv, get_sample_image, list_sample_images
csv_path = get_sample_csv()
images = list_sample_images()
image_path = get_sample_image("2024-4M10 #2")