Content Association In Wikipedia
Statistics 441 downloads
Folder Computer Programs
Description Wikipedia, the free online encyclopedia, contains a wealth of intellectually and monetarily free content (in common terminology, “free as in speech and free as in beer”). The sheer number of users editing the corpus means that the majority of the articles are well-written and largely factual. However, the relationships between related articles, usually indicated by the See Also links at the bottom of each article, are generally incomplete compared to the relationships implied by words linked within the text of each article. We propose a PHP framework to spider Wikipedia, collecting both full-text word lists and lists containing only the words from the text of internal links. We then compare the relative performance of two systems for measuring similarity between articles: one based on the full text of each article and one based only on the linked words in each article. This implementation uses TF-IDF weighting to normalize word frequency and the cosine similarity metric to rank article similarity. Please see the documentation for a full description, including information on installing and using this program.
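The ranking step described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the package's PHP code. The function names (tf_idf_vectors, cosine_similarity, rank_similar) and the toy word lists are hypothetical, and the weighting shown (term count divided by document length, times log(N/df)) is only one common TF-IDF variant; the actual implementation may use a different one.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Map each article name to a {word: TF-IDF weight} dictionary.

    docs is a dict of article name -> list of words (either the full
    text or only the words taken from internal links).
    """
    n_docs = len(docs)
    # Document frequency: in how many articles does each word appear?
    df = Counter()
    for words in docs.values():
        df.update(set(words))

    vectors = {}
    for name, words in docs.items():
        counts = Counter(words)
        total = len(words)
        # Assumed variant: tf = count / length, idf = log(N / df).
        vectors[name] = {
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        }
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-weight vectors."""
    dot = sum(weight * b[word] for word, weight in a.items() if word in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(target, vectors):
    """Return (similarity, name) pairs for all other articles, best first."""
    scores = [
        (cosine_similarity(vectors[target], vec), name)
        for name, vec in vectors.items()
        if name != target
    ]
    return sorted(scores, reverse=True)


# Toy word lists standing in for spidered articles (made-up data).
articles = {
    "Cat": ["cat", "feline", "pet", "mammal"],
    "Dog": ["dog", "canine", "pet", "mammal"],
    "Python": ["python", "programming", "language"],
}
ranked = rank_similar("Cat", tf_idf_vectors(articles))
for score, name in ranked:
    print(f"{name}: {score:.3f}")
```

Running the same functions once on full-text word lists and once on link-only word lists would give the two rankings that the description proposes to compare.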
First Upload 05 May 2010 01:16:57 pm
Last Update 05 May 2010 01:17:45 pm
Contents

File Name        File Size
func_wiki.php    8276
wiki_nlp.php     6728
func_misc.php    486