All content in this area was uploaded by Johannes Frey on Jul 10, 2023
1 Institute for Applied Informatics, Goerdelerring 9, 04109 Leipzig, Germany, https://infai.org
2 https://AKSW.org
LLM-KG-Bench
prompt engineering
Semantics ’23: 19th International Conference on Semantic Systems, September 20–22, 2023, Leipzig, Germany
CEUR Workshop Proceedings (http://ceur-ws.org), ISSN 1613-0073
Knowledge Base Construction from Pre-trained Language Models (LM-KBC) Challenge
Beyond the Imitation Game (BIG-bench) Benchmark
Language Model Evaluation Harness
generate_text
evaluate_model
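The two method names above can be illustrated with a small sketch of a BIG-bench-style split between a model connector and a task. This is only an illustration under assumptions: the simplified signatures, the class names `EchoModel` and `CountPersonsTask`, and the toy prompt are hypothetical and are not taken from the BIG-bench or LLM-KG-Bench code.

```python
from typing import Protocol


class TextModel(Protocol):
    """Anything that can turn a prompt into a text answer."""

    def generate_text(self, inputs: str) -> str: ...


class EchoModel:
    """Hypothetical stand-in connector: returns a canned answer instead of calling an LLM."""

    def __init__(self, answer: str) -> None:
        self.answer = answer

    def generate_text(self, inputs: str) -> str:
        return self.answer


class CountPersonsTask:
    """Hypothetical toy task: ask for a number and score the reply by exact match."""

    def __init__(self, expected: int) -> None:
        self.expected = expected

    def evaluate_model(self, model: TextModel) -> dict:
        prompt = "How many persons are in the graph? Answer with a number."
        reply = model.generate_text(prompt)
        correct = reply.strip() == str(self.expected)
        return {"exact_match": 1.0 if correct else 0.0}


score = CountPersonsTask(expected=3).evaluate_model(EchoModel(" 3 "))
wrong = CountPersonsTask(expected=3).evaluate_model(EchoModel("four"))
print(score, wrong)
```

The point of the split is that a task only depends on `generate_text`, so any connector offering that method can be evaluated by the same task.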
[Figure: Basic LLM-KG-Bench framework architecture. Labelled elements: Benchmark-Runner; Bench Task (connector, size); Query generator; AI-Model-connector; Answer Evaluator; plot; Stats; Storage; AI; Text; API; Task-Info; addon queries; Benchmark Collection; Connector Collection. The runner iterates (according to config) over Sizes x Iterations x Connectors x Benchmarks. Example Benchmark Config: Iterations=10, Sizes={1k, 10k, 1m}, Connectors={1,2}, Benchmarks={1,3,4}.]
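The iteration scheme in the architecture figure (Sizes x Iterations x Connectors x Benchmarks) amounts to a Cartesian product over the configured values. A minimal sketch, assuming a plain dictionary as config; the key names and the `plan_runs` helper are hypothetical and not the framework's actual configuration format:

```python
from itertools import product

# Hypothetical config mirroring the example values in the figure.
config = {
    "iterations": 10,
    "sizes": ["1k", "10k", "1m"],
    "connectors": [1, 2],
    "benchmarks": [1, 3, 4],
}


def plan_runs(cfg: dict) -> list:
    """Return every (size, iteration, connector, benchmark) combination."""
    iterations = range(1, cfg["iterations"] + 1)
    return list(product(cfg["sizes"], iterations, cfg["connectors"], cfg["benchmarks"]))


runs = plan_runs(config)
# 3 sizes * 10 iterations * 2 connectors * 3 benchmarks
print(len(runs))  # 180
```

With the example values this yields 180 single task executions, which is why the number of sizes and iterations directly drives the cost of a benchmark run.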
seaborn
Subset of metrics from initial tasks. Shown are the F1 scores and mean error of person count
foaf:Person, foaf:knows
persons_relative_error: = 0, > 0, < 0, −1
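One reading consistent with the listed values (0, positive, negative, and −1 as the lower bound) is a signed relative error of the generated person count against the requested count. The formula below is an assumption for illustration, since the exact definition is not part of this excerpt; the parameter names are hypothetical.

```python
def persons_relative_error(actual_count: int, expected_count: int) -> float:
    """Signed relative error of the person count (assumed definition).

    0 means the requested number of persons was produced, a positive value
    means too many, a negative value means too few, and -1 means none at all.
    """
    if expected_count <= 0:
        raise ValueError("expected_count must be positive")
    return (actual_count - expected_count) / expected_count


print(persons_relative_error(10, 10))  # 0.0
print(persons_relative_error(12, 10))  # 0.2
print(persons_relative_error(5, 10))   # -0.5
print(persons_relative_error(0, 10))   # -1.0
```

Under this assumed definition the value is bounded below by −1 but unbounded above, so averaging over runs can hide over- and under-generation that cancel each other out.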
arXiv:2303.08774
arXiv:2206.04615
arXiv:2306.08302
arXiv:2304.02711