
Extracting Characters and Emotions from Little Red Riding Hood Using LangExtract

2025-08-10

1. Introduction

  • Why I wrote this article
    I became interested in LangExtract, an information extraction library compatible with Google's Gemini model, and decided to try it out using the fairy tale Little Red Riding Hood as a case study.

  • What is LangExtract?
    LangExtract is an open-source Python library from Google that uses large language models such as Gemini to extract structured information, for example "characters," "emotions," and "relationships," from natural language text.
    It combines few-shot examples, source grounding, and controlled generation so that each extraction stays tied to the original text.

  • Target Audience
    Developers and engineers interested in natural language processing and information extraction.

  • What you will learn
    How to perform structured information extraction with LangExtract, along with a practical sense of where it can be applied.


2. Overview

  • Theme of this experiment
    Using LangExtract to extract "characters," "emotions," and "relationships" from the text of Little Red Riding Hood.

  • Terminology

    • Source grounding
      Linking each extracted item to its exact position in the original text.
    • Controlled generation
      Constraining the output format and structure so that results stay stable.
    • Few-shot learning
      Guiding the model with a small number of example extractions instead of a large training set.
  • Technical Background
    LangExtract uses the Gemini model and, for long documents, supports chunk processing, parallel extraction, and multi-pass extraction.
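
To make these terms more concrete, here is a minimal Python sketch of the general idea behind chunk processing, parallel extraction, and source grounding. It is not LangExtract's actual implementation: the function names (`chunk_by_sentence`, `find_names`, `extract_grounded`) are made up for illustration, and a plain string search stands in for the Gemini call. It uses only the standard library.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def chunk_by_sentence(text):
    """Split text into (offset, sentence) pairs, keeping each sentence's start offset."""
    return [(m.start(), m.group()) for m in re.finditer(r"[^.!?]+[.!?]?\s*", text)]


def find_names(chunk, names):
    """Stand-in for the model call: return (name, local_start) for each name found."""
    hits = []
    for name in names:
        start = chunk.find(name)
        while start != -1:
            hits.append((name, start))
            start = chunk.find(name, start + 1)
    return hits


def extract_grounded(text, names, max_workers=4):
    """Process chunks in parallel and map every hit back to a position in the full text."""
    chunks = chunk_by_sentence(text)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_chunk = list(pool.map(lambda c: find_names(c[1], names), chunks))

    results = []
    for (offset, _), hits in zip(chunks, per_chunk):
        for name, local_start in hits:
            start = offset + local_start  # chunk-local position -> document position
            results.append({"text": name, "start": start, "end": start + len(name)})
    return sorted(results, key=lambda r: r["start"])
```

The point of the offsets is that every result can be checked against the source: slicing the original text with `start` and `end` should give back exactly the extracted string.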


3. Experiment: Extracting from Little Red Riding Hood

3.1 Environment and Model Used

  • Repository:
  • Model used: Gemini series (e.g., gemini-2.5-flash)

3.2 Original Text and Few-Shot Examples

We ran the extraction on the English and Japanese versions of the story, each with its own few-shot examples.
(Source: https://www.grimmstories.com/language.php?grimm=026&l=ja&r=en)

English Few-Shot Example

extractions=[
    lx.data.Extraction(
        extraction_class="character",
        extraction_text="LITTLE RED-CAP",
        attributes={"role": "protagonist"},
    ),
    lx.data.Extraction(
        extraction_class="emotion",
        extraction_text="I'll go visit my grandmother.",
        attributes={"feeling": "resolved"},
    ),
]

Japanese Few-Shot Example

extractions=[
    lx.data.Extraction(
        extraction_class="character",
        extraction_text="赤ずきん",  # "Little Red Riding Hood"
        attributes={"role": "主人公"},  # "protagonist"
    ),
    lx.data.Extraction(
        extraction_class="relationship",
        extraction_text="おばあさんのところへ行く",  # "goes to grandmother's place"
        attributes={"to": "おばあさん", "relationship": "訪問"},  # to "grandmother", "visit"
    ),
]
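
The two snippets above show only the `extractions` lists. The following is a minimal sketch of how such a list might be wrapped and passed to the library, assuming the `lx.data.ExampleData` and `lx.extract` interface described in the LangExtract README (`text_or_documents`, `prompt_description`, `examples`, `model_id`); please check it against the version you install. `PROMPT` and `EXAMPLE_TEXT` are hypothetical placeholders, not the prompt or example text used in this experiment, and `ungrounded_specs`, `build_examples`, and `run_extraction` are helper names invented here. A Gemini API key is also needed; the README mentions the `LANGEXTRACT_API_KEY` environment variable, which I treat as an assumption.

```python
import textwrap

# Hypothetical prompt; the article does not show the one actually used.
PROMPT = textwrap.dedent("""\
    Extract characters, emotions, and relationships in order of appearance.
    Use the exact wording from the text for each extraction.""")

# Hypothetical example sentence built around the two English extractions above.
EXAMPLE_TEXT = 'LITTLE RED-CAP said to her mother, "I\'ll go visit my grandmother."'

# (extraction_class, extraction_text, attributes), taken from the English example.
EXAMPLE_SPECS = [
    ("character", "LITTLE RED-CAP", {"role": "protagonist"}),
    ("emotion", "I'll go visit my grandmother.", {"feeling": "resolved"}),
]


def ungrounded_specs():
    """Return extraction texts that do not appear verbatim in EXAMPLE_TEXT."""
    return [text for _, text, _ in EXAMPLE_SPECS if text not in EXAMPLE_TEXT]


def build_examples(lx):
    """Wrap the specs in LangExtract data classes (lx is the imported module)."""
    extractions = [
        lx.data.Extraction(extraction_class=c, extraction_text=t, attributes=a)
        for c, t, a in EXAMPLE_SPECS
    ]
    return [lx.data.ExampleData(text=EXAMPLE_TEXT, extractions=extractions)]


def run_extraction(story_text, model_id="gemini-2.5-flash"):
    """Call LangExtract; requires `pip install langextract` and an API key."""
    import langextract as lx  # imported lazily so the helpers above work without it

    return lx.extract(
        text_or_documents=story_text,
        prompt_description=PROMPT,
        examples=build_examples(lx),
        model_id=model_id,
    )
```

`ungrounded_specs()` is a small sanity check worth running before calling the model: if an `extraction_text` does not appear word for word in the example text, the example teaches paraphrasing rather than grounded extraction.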

3.3 Extraction Results

Example extraction results from the Japanese version:
[Image: Japanese extraction results]

Example extraction results from the English version:
[Image: English extraction results]


Overall, the English extraction demonstrated very high accuracy, while the Japanese version had many null values, suggesting that there is still room for improvement in multilingual support.
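
One way to put a number on "many null values" is to count, per extraction class, how many records have missing or null attributes. The sketch below assumes the results have been converted to plain dictionaries with the same field names as the few-shot examples (`extraction_class`, `extraction_text`, `attributes`) and that "null" means the attributes are missing or contain a `None` value; both are assumptions for illustration. `null_rate_by_class` is a hypothetical helper, not part of LangExtract.

```python
from collections import defaultdict


def null_rate_by_class(extractions):
    """Return {extraction_class: share of records with missing or None attributes}."""
    totals = defaultdict(int)
    nulls = defaultdict(int)
    for item in extractions:
        cls = item["extraction_class"]
        attrs = item.get("attributes")
        totals[cls] += 1
        # Count the record as null if attributes are absent, empty, or hold a None value.
        if not attrs or any(value is None for value in attrs.values()):
            nulls[cls] += 1
    return {cls: nulls[cls] / totals[cls] for cls in totals}
```

Running this on the English and Japanese results separately would allow a side-by-side comparison of the two languages.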

4. Conclusion

In this experiment, we extracted characters, emotions, and relationships from the fairy tale Little Red Riding Hood using LangExtract.
Thanks to LangExtract's few-shot examples and source grounding on top of the Gemini model, we obtained highly accurate structured data for the English text, while the Japanese results left room for improvement.

These information extraction techniques are very useful in practical NLP work and research. If you are interested, please try setting up the environment and experimenting with it yourself.

