Large-Scale Analysis of Wikipedia’s Link Structure and its Applications in Learning Path Construction

Aug 4, 2023 • Yiding Song & Chun Hei Leung (IEEE IRI 2023)

Abstract

As the largest encyclopedia in history, Wikipedia represents an unprecedented unification of the world’s knowledge. Its internal links are an invaluable resource for understanding both the relationships between concepts and the organization of information on the Web. However, these link structures have not been thoroughly examined and have rarely been visualized. In this paper, we take a graph-theoretic approach to investigate the link structure of English Wikipedia, providing an up-to-date snapshot of its knowledge organization, including degree distributions, strongly connected components, and disconnected subgraphs. We also perform what is, to the best of our knowledge, the first k-core visualization over all of Wikipedia. Our results suggest that Wikipedia is highly connected, with 90.05% of articles reachable from one another. Inbound links are found to be a better measure of an article’s importance than outbound links, and they demonstrate a more centralized mode of connection. Based on our observations, we propose a novel, end-to-end framework for automatically constructing learning paths, using Wikipedia links to recursively shortlist and rank prerequisite concepts for understanding new topics.
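To make the graph-theoretic terms in the abstract concrete, here is a minimal sketch of two of the measurements it mentions: in/out-degree counts and strongly connected components. The five-node toy graph, the function names, and the choice of Kosaraju's algorithm are all assumptions made for illustration; this is not the authors' implementation and the numbers are unrelated to the paper's results.

```python
from collections import defaultdict


def degree_counts(edges):
    """Return (in_degree, out_degree) dicts for a directed edge list."""
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return dict(in_deg), dict(out_deg)


def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm, written iteratively to avoid recursion limits."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        fwd[src].append(dst)
        rev[dst].append(src)

    # Pass 1: record nodes in order of DFS finish time on the forward graph.
    visited, order = set(), []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(fwd[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:
                # All neighbours explored: this node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: walk the reversed graph in decreasing finish time.
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, stack = [], [start]
        while stack:
            node = stack.pop()
            component.append(node)
            for prev in rev[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(component)
    return components


# Hypothetical toy "article" graph: an edge (a, b) means a links to b.
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("E", "A")]

in_deg, out_deg = degree_counts(edges)
components = strongly_connected_components(nodes, edges)
largest = max(components, key=len)
largest_fraction = len(largest) / len(nodes)

print("in-degree:", in_deg)
print("out-degree:", out_deg)
print("largest SCC:", sorted(largest), "fraction:", largest_fraction)
```

In this toy graph, A, B, and C can all reach one another, so they form the largest strongly connected component; D is only linked to and E only links out, so each sits in a component of its own. A statement such as "a given share of articles are reachable from one another" corresponds to the fraction of nodes in the largest such component.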

Citation

Use the following BibTeX entry to cite this work:

@inproceedings{song2023large,
  author={Song, Yiding and Leung, Chun Hei},
  booktitle={2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI)}, 
  title={Large-Scale Analysis of Wikipedia’s Link Structure and its Applications in Learning Path Construction}, 
  year={2023},
  pages={254--260},
  doi={10.1109/IRI58017.2023.00051}
}