LLM4Schema.org: Generating Schema.org Markups with Large Language Models

Tracking #: 3867-5081

Authors: 
Minh-Hoang Dang
Thi Hoang Thi Pham
Pascal Molli
Hala Skaf-Molli
Alban Gaignard

Responsible editor: 
Guest Editors KG Construction 2024

Submission type: 
Full Paper

Abstract:
Integrating Schema.org markup into web pages has resulted in the generation of billions of RDF triples. However, around 75% of web pages still lack this critical markup. Large Language Models (LLMs) present a promising solution by automatically generating the missing Schema.org markup. Despite this potential, there is currently no benchmark to evaluate the markup quality produced by LLMs. This paper introduces LLM4Schema.org, an innovative approach for assessing the performance of LLMs in generating Schema.org markup. Unlike traditional methods, LLM4Schema.org does not require a predefined ground truth. Instead, it compares the quality of LLM-generated markup against human-generated markup. Our findings reveal that 40–50% of the markup produced by GPT-3.5 and GPT-4 is invalid, non-factual, or non-compliant with the Schema.org ontology. These errors underscore the limitations of LLMs in adhering strictly to structured ontologies like Schema.org without additional filtering and validation mechanisms. We demonstrate that specialized LLM-powered agents can effectively identify and eliminate these errors. After applying such filtering to both human- and LLM-generated markup, GPT-4 shows notable improvements in quality and outperforms humans. LLM4Schema.org highlights both the potential and the challenges of leveraging LLMs for semantic annotations, emphasizing the critical role of careful curation and validation to achieve reliable results.
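
For readers less familiar with the technology: Schema.org markup is typically embedded in web pages as JSON-LD. A minimal illustrative snippet (the values below are hypothetical and not taken from the paper) looks like this:

    {
      "@context": "https://schema.org",
      "@type": "Article",
      "headline": "Example article title",
      "author": { "@type": "Person", "name": "Jane Doe" },
      "datePublished": "2024-01-15"
    }

Markup of this kind is what the paper asks LLMs to generate from page content, and what the filtering agents then check for validity, factuality, and compliance with the Schema.org ontology.
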
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Review #1
Anonymous submitted on 04/May/2025
Suggestion:
Accept
Review Comment:

I appreciate the authors' effort in addressing the comments and extending the experiment on the human-based validation of the MeMR metric. I think the manuscript could be accepted.

Review #2
Anonymous submitted on 01/Jun/2025
Suggestion:
Accept
Review Comment:

The authors have carefully addressed all reviewer comments from the previous round. The revised manuscript includes:

- A detailed clarification of why OpenAI models were not compared,
- A concise yet informative summary of the prompting strategy in the methodology section,
- A clarified and simplified discussion of SHACL and its limitations (or lack thereof) in the given context,
- A revised paragraph on generalizability that clearly explains how the approach extends to other ontologies, and
- An improved evaluation of the MeMR method with a larger and more robust user study (36 web pages, 23 participants).

In my view, the manuscript is now ready for publication.