Why I wrote this article
I became interested in LangExtract, an information extraction library compatible with Google's Gemini model, and decided to try it out using the fairy tale Little Red Riding Hood as a case study.
What is LangExtract?
LangExtract is an information extraction library that uses Google's Gemini language model to extract structured information such as "characters," "emotions," and "relationships" from natural language text.
It combines techniques like few-shot learning, source grounding, and controlled generation to ensure reliable extraction aligned closely with the original text.
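To make "source grounding" concrete, here is a minimal sketch of the idea: every extracted string is mapped back to a character span in the original text, and anything that cannot be found is flagged. This is an illustration using exact string matching only; it is not how LangExtract aligns text internally, and the `GroundedSpan` class and `ground` function are hypothetical names invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class GroundedSpan:
    text: str
    start: Optional[int]  # character offset in the source, or None if not found
    end: Optional[int]


def ground(source: str, extraction_texts: list[str]) -> list[GroundedSpan]:
    """Attach character offsets to each extracted string via exact matching."""
    spans = []
    cursor = 0  # search forward so repeated phrases map to successive occurrences
    for text in extraction_texts:
        start = source.find(text, cursor)
        if start == -1:
            # Fall back to searching from the beginning before giving up.
            start = source.find(text)
        if start == -1:
            # The string does not occur in the source: report it as ungrounded.
            spans.append(GroundedSpan(text, None, None))
            continue
        end = start + len(text)
        spans.append(GroundedSpan(text, start, end))
        cursor = end
    return spans


source = "Little Red-Cap met a wolf. The wolf ran to the grandmother's house."
spans = ground(source, ["Little Red-Cap", "wolf", "wolf", "huntsman"])
for span in spans:
    print(span)
```

An extraction whose offsets come back as `None` is one the text does not actually contain, which is the kind of output grounding is meant to expose.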
Target Audience
Developers and engineers interested in natural language processing and information extraction.
What you will learn
How to perform structured information extraction using LangExtract and gain a practical sense of its applications.
Theme of this experiment
Using LangExtract to extract "characters," "emotions," and "relationships" from the text of Little Red Riding Hood.
Terminology
Technical Background
LangExtract uses the Gemini model and supports chunked processing of long texts, parallel extraction, and multi-stage (multi-pass) processing.
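The sketch below shows roughly what chunked, parallel, multi-pass extraction means. It is a toy under stated assumptions, not LangExtract's implementation: `chunk_text`, `fake_extract`, and `extract_parallel` are hypothetical helpers, and the "model" is a simple name lookup. In LangExtract itself, as far as I understand its README, the corresponding knobs are the `max_char_buffer`, `max_workers`, and `extraction_passes` arguments of `lx.extract`.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor

# Toy vocabulary standing in for a real model call (assumption for illustration).
NAMES = ("Red-Cap", "wolf", "grandmother")


def chunk_text(text: str, max_chars: int) -> list[tuple[int, str]]:
    """Split text into (offset, chunk) pieces of at most max_chars characters,
    breaking on a space when one is available."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Back up to the last space so words are not cut in half.
            space = text.rfind(" ", start, end)
            if space > start:
                end = space
        chunks.append((start, text[start:end]))
        start = end
    return chunks


def fake_extract(item: tuple[int, str]) -> list[tuple[str, int]]:
    """Stand-in extractor: return (name, absolute offset) for each known name."""
    offset, chunk = item
    found = []
    for name in NAMES:
        pos = chunk.find(name)
        while pos != -1:
            found.append((name, offset + pos))
            pos = chunk.find(name, pos + 1)
    return found


def extract_parallel(text, max_chars=40, max_workers=4, passes=1):
    """Run the extractor over all chunks in parallel and merge the passes."""
    chunks = chunk_text(text, max_chars)
    results = set()
    for _ in range(passes):
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            for found in pool.map(fake_extract, chunks):
                results.update(found)  # the union of passes removes duplicates
    return sorted(results, key=lambda r: r[1])


text = ("Red-Cap met the wolf in the wood. The wolf went to the grandmother. "
        "Red-Cap followed later.")
results = extract_parallel(text, passes=2)
for name, pos in results:
    print(pos, name)
```

With a deterministic stand-in, a second pass finds nothing new; repeated passes only pay off when the extractor is a language model whose output varies from run to run.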
The model used was gemini-2.5-flash.
We used the following English and Japanese texts, each with its own few-shot examples.
(Source: https://www.grimmstories.com/language.php?grimm=026&l=ja&r=en)
# Few-shot example extractions for the English text
extractions=[
    lx.data.Extraction(
        extraction_class="character",
        extraction_text="LITTLE RED-CAP",
        attributes={"role": "protagonist"},
    ),
    lx.data.Extraction(
        extraction_class="emotion",
        extraction_text="I'll go visit my grandmother.",
        attributes={"feeling": "resolved"},
    ),
]
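# Few-shot example extractions for the Japanese text
extractions=[
    lx.data.Extraction(
        extraction_class="character",
        extraction_text="赤ずきん",  # "Little Red Riding Hood"
        attributes={"role": "主人公"},  # "protagonist"
    ),
    lx.data.Extraction(
        extraction_class="relationship",
        extraction_text="おばあさんのところへ行く",  # "goes to grandmother's place"
        attributes={"to": "おばあさん", "relationship": "訪問"},  # to "grandmother", relationship "visit"
    ),
]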
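For readers who want to see how such `extractions` lists fit into a full call, here is a hedged sketch for the English case. The prompt wording and the example sentence are hypothetical placeholders, since the prompt and example text used in the experiment are not shown above. The call shape (`lx.extract` with `text_or_documents`, `prompt_description`, `examples`, `model_id`) follows the LangExtract README as I understand it. Actually running it requires the `langextract` package and a Gemini API key, which the README says can be supplied through the `LANGEXTRACT_API_KEY` environment variable.

```python
import os
import textwrap

# Hypothetical prompt; the prompt used in the experiment is not shown in the article.
PROMPT = textwrap.dedent("""\
    Extract characters, emotions, and relationships in order of appearance.
    Use the exact text from the story for each extraction; do not paraphrase.""")

# Hypothetical example sentence, written so that it contains both
# extraction_text values from the English few-shot example.
EXAMPLE_TEXT = 'LITTLE RED-CAP said, "I\'ll go visit my grandmother."'


def run_extraction(story_text: str):
    """Call LangExtract on story_text using the English few-shot example."""
    # Imported lazily so this file can be loaded without the package installed.
    import langextract as lx

    examples = [
        lx.data.ExampleData(
            text=EXAMPLE_TEXT,
            extractions=[
                lx.data.Extraction(
                    extraction_class="character",
                    extraction_text="LITTLE RED-CAP",
                    attributes={"role": "protagonist"},
                ),
                lx.data.Extraction(
                    extraction_class="emotion",
                    extraction_text="I'll go visit my grandmother.",
                    attributes={"feeling": "resolved"},
                ),
            ],
        )
    ]
    return lx.extract(
        text_or_documents=story_text,
        prompt_description=PROMPT,
        examples=examples,
        model_id="gemini-2.5-flash",
    )


# Only attempt a real call when an API key is configured.
if __name__ == "__main__" and os.environ.get("LANGEXTRACT_API_KEY"):
    result = run_extraction("The wolf thought to himself that the girl would be a fine meal.")
    for extraction in result.extractions:
        print(extraction.extraction_class, extraction.extraction_text, extraction.attributes)
```

Keeping each `extraction_text` a verbatim substring of the example text matters here, because the examples teach the model to quote the source rather than paraphrase it.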
Example extraction results from the Japanese version:
Example extraction results from the English version:
Overall, the English extraction demonstrated very high accuracy, while the Japanese version had many null values, suggesting that there is still room for improvement in multilingual support.
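One way to put a number on "many null values" is to compute the share of attribute values that came back empty. The sketch below assumes the extractions have been converted to plain dictionaries with an `attributes` field; that shape, the `null_rate` helper, and the toy records are all made up for illustration and are not output from this experiment.

```python
from __future__ import annotations


def null_rate(extractions: list[dict]) -> float:
    """Fraction of attribute slots that are None.

    An extraction with no attributes at all counts as one null slot.
    """
    total = 0
    nulls = 0
    for item in extractions:
        attributes = item.get("attributes")
        if not attributes:
            total += 1
            nulls += 1
            continue
        for value in attributes.values():
            total += 1
            if value is None:
                nulls += 1
    return nulls / total if total else 0.0


# Toy records for illustration only (not results from the experiment).
toy_records = [
    {"extraction_class": "character", "extraction_text": "wolf",
     "attributes": {"role": "antagonist"}},
    {"extraction_class": "character", "extraction_text": "huntsman",
     "attributes": {"role": None}},
    {"extraction_class": "emotion", "extraction_text": "she was frightened",
     "attributes": None},
]

print(f"null rate: {null_rate(toy_records):.2f}")
```

Computing this separately for the English and Japanese outputs would turn the qualitative comparison into a figure that can be tracked across prompt or example changes.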
In this experiment, we extracted characters, emotions, and relationships from the fairy tale Little Red Riding Hood using LangExtract.
Thanks to LangExtract's few-shot examples and source grounding on top of the Gemini model, we were able to obtain highly accurate structured data for the English text, while the Japanese results were less complete.
These information extraction techniques are very useful in practical NLP work and research. If you are interested, please try setting up the environment and experimenting with it yourself.