Author |
Message |
|
todlangweilig
Advanced Member
Joined: 14 Feb 2006 Posts: 470
|
Posted: 16 Mar 2006 04:50:34 pm Post subject: |
|
|
I'm trying to make a Python program that will look up words at Dictionary.com. It will take a text file as input, convert each search word to a URL, and retrieve the HTML file with the definition of the word. The problem I'm running into is how to parse the HTML file. I'm not trying to render HTML or write a general-purpose HTML parser; I just need to get to the entry and find the part(s) of speech and the definition(s). Any ideas on how to do this simply and easily? I figure more minds are better than one.
|
|
|
elfprince13 Retired
Super Elite (Last Title)
Joined: 11 Apr 2005 Posts: 3500
|
Posted: 16 Mar 2006 04:58:32 pm Post subject: |
|
|
Look through the source code of a few Dictionary.com pages and see if they keep a consistent layout on every page. Then keep track of the number of times a certain tag occurs before the portion of text you want, and take a slice from the end of that tag to the beginning of the next tag.
Smart usage of the basic string functions and slicing will serve you well.
[edit]
btw... wxPython has some built-in HTML stuff. I'm not sure what the default Python libraries have, though.
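On the question of what the default libraries offer: current Python ships an event-based parser in `html.parser`. Here is a rough sketch of using it to pull the text out of each `<li>`. The class and function names are made up for illustration, and it assumes the definitions really do live in list items, which you would need to confirm against the actual page source.

```python
from html.parser import HTMLParser


class ListItemCollector(HTMLParser):
    """Collect the text of each top-level <li>; nested <li> text is merged in."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many <li> elements we are currently inside
        self.items = []     # finished text of each top-level <li>
        self._chunks = []   # text pieces of the <li> being read

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "li" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Join the pieces and collapse runs of whitespace.
                self.items.append(" ".join("".join(self._chunks).split()))
                self._chunks = []

    def handle_data(self, data):
        if self.depth > 0:
            self._chunks.append(data)


def list_items(html):
    """Return the text of every top-level <li> in the given HTML string."""
    parser = ListItemCollector()
    parser.feed(html)
    parser.close()
    return parser.items
```

Calling `list_items(page)` on the downloaded page gives a plain list of strings, so you can skip counting tags by hand if the layout shifts a little.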
Last edited by Guest on 16 Mar 2006 05:04:59 pm; edited 1 time in total |
|
|
|
todlangweilig
Advanced Member
Joined: 14 Feb 2006 Posts: 470
|
Posted: 16 Mar 2006 05:26:24 pm Post subject: |
|
|
I don't believe I have wxPython. I'm using the built-in module urllib, which seems to work just fine.
It looks like they are using nested <ol>'s and <li>'s. I probably really should read up on my string functions :P
But if anyone else has more ideas, please contribute.
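For anyone following along, the urllib step might look something like the sketch below. In current Python the module is split into `urllib.parse` and `urllib.request`. The base URL and the `q` parameter are placeholders I made up, not the site's real search format, so check an actual lookup URL in your browser first.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical search endpoint: replace with the site's real URL format.
BASE_URL = "https://www.example.com/search"


def build_url(word, base_url=BASE_URL):
    """Turn one search word into a lookup URL, escaping spaces and symbols."""
    return base_url + "?" + urlencode({"q": word.strip()})


def fetch_page(word):
    """Download the page for one word and return it as text."""
    with urlopen(build_url(word)) as response:
        return response.read().decode("utf-8", errors="replace")
```

`urlencode` takes care of escaping, so a two-word entry such as "ice cream" still produces a valid URL.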
Last edited by Guest on 16 Mar 2006 05:28:45 pm; edited 1 time in total |
|
|
|
elfprince13 Retired
Super Elite (Last Title)
Joined: 11 Apr 2005 Posts: 3500
|
Posted: 16 Mar 2006 05:49:05 pm Post subject: |
|
|
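Count the number of <ol>'s and <li>'s until the info you want.
Code: file = ...  # the page's HTML as a string
file = file.lower()
A = 0
numTillData = 6  # just an example... count 'em for yourself
curIndex = -1
while A != numTillData:
    # search from one past the last match, or index() returns the same <li> every time
    curIndex = file.index("<li>", curIndex + 1)
    A = A + 1
# slice from just after the "<li>" (4 characters) up to the next "</li>"
Data = file[curIndex + 4:file.index("</li>", curIndex)]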
w00t w00t. there you go.
Last edited by Guest on 16 Mar 2006 05:50:07 pm; edited 1 time in total |
|
|
|
Arcane Wizard `semi-hippie`
Super Elite (Last Title)
Joined: 02 Jun 2003 Posts: 8993
|
Posted: 16 Mar 2006 06:53:04 pm Post subject: |
|
|
Also, look for one of these strings:
"entries found for" or "No entry found for"
If the first is found, get the number of entries found (position of the above string - 1 word) and continue getting the definition(s), otherwise, don't. |
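A quick sketch of that check, assuming the count appears as digits right before the words "entries found for". The function name is just for illustration, and the exact wording on the page may differ from these two strings.

```python
import re


def entries_found(page):
    """Return the number of entries the page reports, or 0 if there are none."""
    if "No entry found for" in page:
        return 0
    # The count is the word just before the marker text.
    match = re.search(r"(\d+)\s+entries found for", page)
    if match is None:
        return 0
    return int(match.group(1))
```

If this returns 0 you can skip the word and move on; otherwise the number tells you how many definitions to expect.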
|
|
|
todlangweilig
Advanced Member
Joined: 14 Feb 2006 Posts: 470
|
Posted: 16 Mar 2006 10:16:58 pm Post subject: |
|
|
WOOHOO, it's working and I'm done with my homework :biggrin: :biggrin:
I've included my program, so if you have Python, give it a try. It's in a very rough stage, though. I haven't gotten it to go through and remove the tags yet; that's next.
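For the tag-removal step, one simple option is a regular expression that deletes anything between `<` and `>`. This is only a sketch (the function name is hypothetical), and a regex like this is not a real HTML parser; it should be fine for small, well-behaved fragments but will trip on things like a `>` inside an attribute value.

```python
import re
from html import unescape

TAG_PATTERN = re.compile(r"<[^>]*>")


def strip_tags(fragment):
    """Remove anything that looks like an HTML tag and tidy the whitespace."""
    text = TAG_PATTERN.sub(" ", fragment)   # each tag becomes a space
    text = unescape(text)                   # turn &amp; and friends back into characters
    return " ".join(text.split())           # collapse runs of whitespace
```

Replacing each tag with a space keeps neighbouring words from running together, at the cost of splitting a word that has a tag in the middle of it.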
FYI: I decided to use Merriam-Webster because it's more consistent and easier to work with.
TO USE:
**Rename dictionary.txt to dictionary.py**
Make a text file, *.txt, and enter the word(s) you would like defined, one word per line, pressing Enter after each word. If you don't press Enter after the last word, it'll give an error (a bug in how I remove newline characters).
Save the file and run the program. It will display "Enter file path:"; enter the full pathname. It will display "Done!" and write the output to the file *1.txt, which you can then open and read.
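About the newline bug: I can't see how the attached program removes the newline, but a common cause is chopping off the last character of every line instead of stripping whitespace. A sketch of the safer approach follows, with made-up function names.

```python
def clean_words(lines):
    """Return the non-empty words from an iterable of lines."""
    words = []
    for line in lines:
        word = line.strip()   # removes "\n" if present, harmless if not
        if word:
            words.append(word)
    return words


def read_words(path):
    """Read one word per line from a text file."""
    with open(path, encoding="utf-8") as handle:
        return clean_words(handle)
```

With `strip()` it no longer matters whether the last word is followed by Enter, and stray blank lines are skipped as well.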
Last edited by Guest on 17 Mar 2006 11:03:57 pm; edited 1 time in total |
|
|
|
elfprince13 Retired
Super Elite (Last Title)
Joined: 11 Apr 2005 Posts: 3500
|
Posted: 17 Mar 2006 09:12:07 am Post subject: |
|
|
You could probably make it do input/output on the command line as well. Read in lines until you get a blank one, then look up the words and output them. Handy lil program, though.
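Something along these lines would do the read-until-blank part. It is a sketch with a hypothetical function name; the lookup itself is left out.

```python
import sys


def read_until_blank(stream=sys.stdin):
    """Collect one word per line until a blank line or the end of input."""
    words = []
    for line in stream:
        word = line.strip()
        if not word:
            break   # a blank line ends the list
        words.append(word)
    return words
```

Call `read_until_blank()` with no arguments to read from the keyboard, then loop over the returned list and look up each word.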
Last edited by Guest on 17 Mar 2006 09:14:30 am; edited 1 time in total |
|
|
|
sexybear979
Newbie
Joined: 27 Mar 2006 Posts: 28
|
Posted: 03 Apr 2006 06:13:48 pm Post subject: |
|
|
Can someone compile that program and attach it?
|
|
|
elfprince13 Retired
Super Elite (Last Title)
Joined: 11 Apr 2005 Posts: 3500
|
Posted: 03 Apr 2006 07:25:07 pm Post subject: |
|
|
You don't usually compile Python. It's a scripting language. You can download the Python engine (the interpreter) from www.python.org
|
|
|
sexybear979
Newbie
Joined: 27 Mar 2006 Posts: 28
|
Posted: 03 Apr 2006 08:21:00 pm Post subject: |
|
|
ahhh... does that make python worth using if people have to run an engine to use it, or is it for behind-the-scenes stuff? |
|
|
|
elfprince13 Retired
Super Elite (Last Title)
Joined: 11 Apr 2005 Posts: 3500
|
Posted: 03 Apr 2006 08:26:04 pm Post subject: |
|
|
sexybear979 wrote: ahhh... does that make python worth using if people have to run an engine to use it, or is it for behind-the-scenes stuff?
Once you install the engine you shouldn't have to worry about it again. Just start any Python programs you have by double-clicking.
|
|
|
todlangweilig
Advanced Member
Joined: 14 Feb 2006 Posts: 470
|
Posted: 03 Apr 2006 08:38:16 pm Post subject: |
|
|
I think what sexybear979 is talking about is a "frozen binary". I haven't much of a clue how to make one, though.
I think I'll get to work on this program as soon as I get done with my English project
|
|
|
|