Content Association In Wikipedia
Statistics 289 downloads
Folder Computer Programs
Author
Description Wikipedia, the free online encyclopedia, contains a wealth of intellectually and monetarily free content (in common terminology, “free as in speech and free as in beer”). The sheer number of users editing the corpus means that the majority of the articles are well-written and largely factual. However, the relationships between related articles, usually inferred from the See Also links at the bottom of each article, are generally incomplete compared to the relationships implied by the words linked within the text of each article. We propose a PHP framework to spider Wikipedia, collecting for each article both a full-text word list and a list containing only the words from the text of internal links. We propose comparing the relative performance of two systems that measure similarity between articles: one based on the full text of each article, and one based only on the linked words in each article. This implementation uses TF-IDF weighting to normalize word frequencies and the cosine similarity metric to rank article similarity. Please see http://www.cemetech.net/projects/item.php?id=30 for a full description and documentation, including information on installing and using this program.
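The ranking step described above can be illustrated with a minimal sketch. It is written in Python rather than the project's PHP and is not taken from the archive's source files. The function names, the particular TF-IDF variant (relative term frequency multiplied by the natural log of inverse document frequency), and the toy word lists are all assumptions made for illustration; the same code would apply equally to full-text word lists or to linked-word lists.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Map each article name to a {word: TF-IDF weight} dictionary.

    `docs` maps an article name to its list of words. This uses one common
    TF-IDF variant (an assumption, not necessarily the project's formula).
    """
    n_docs = len(docs)
    # Document frequency: how many articles contain each word at least once.
    doc_freq = Counter()
    for words in docs.values():
        doc_freq.update(set(words))

    vectors = {}
    for name, words in docs.items():
        counts = Counter(words)
        total = len(words)
        vectors[name] = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b[word] for word, weight in a.items() if word in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_similar(target, vectors):
    """Return (article, score) pairs for every other article, best first."""
    scores = [
        (other, cosine_similarity(vectors[target], vec))
        for other, vec in vectors.items()
        if other != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical word lists standing in for spidered articles.
    toy_docs = {
        "Cat": ["cat", "feline", "pet", "mammal"],
        "Dog": ["dog", "canine", "pet", "mammal"],
        "PHP": ["php", "language", "web", "server"],
    }
    for article, score in rank_similar("Cat", tf_idf_vectors(toy_docs)):
        print(article, round(score, 3))
```

Words that appear in every article receive a weight of zero under this variant, so only distinguishing vocabulary contributes to the similarity score.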
First Upload 05 May 2010 01:16:57 pm
Last Update 05 May 2010 01:17:45 pm
Rating
Not yet rated
Reviews
Contents wikinlp.zip

File Name      File Size
func_wiki.php  8276
wiki_nlp.php   6728
func_misc.php  486
