This is an archived, read-only copy of the United-TI subforum, including posts and topics from May 2003 to April 2012. If you would like to discuss any of the topics in this forum, you can visit Cemetech's Your Projects subforum. Some of these topics may also be directly linked to active Cemetech topics. If you are a Cemetech member with a linked United-TI account, you can link United-TI topics here with your current Cemetech topics.

This forum is locked: you cannot post, reply to, or edit topics. Project Ideas/Start New Projects => Your Projects
Author Message
SniperOfTheNight


Advanced Member


Joined: 24 May 2003
Posts: 260

Posted: 06 Jun 2003 03:38:41 pm    Post subject:

I think that's basically the idea. It's like the eReader they have for the TI-89.
John Barrus


Member


Joined: 25 May 2003
Posts: 131

Posted: 07 Jun 2003 01:00:48 am    Post subject:

I was kinda thinking about word compression and I had a few crazy ideas. Here's one (kinda impractical, but oh well):

We make some kind of 'include' file to be loaded on the calculator. The file would equate different bytes to different words

Example:
01h=The
02h=cow
03h=jumped
04h=over
05h=moon

Now, we make a pc program to convert text to hex using these equates. The output for 'The cow jumped over the moon' would be:
01 02 03 04 01 05

We load the include file and the hex file on the calculator, plus another program to convert the hex back to text.

Well, technically I compressed a six-word sentence down to 6 bytes (75% compression from 24 bytes). But the include file would probably take up massive amounts of space. Plus, there are what, 100,000 words in the English language? Impossible to fit that on a calculator. (But... maybe the PC program could find the most popular words in the text and build the most size-efficient include file specific to that text.)
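A minimal sketch of this word-table idea, in Python for clarity (the real thing would be z80 assembly, and codes here are zero-based rather than starting at 01h; all names are illustrative):

```python
# Word-table compression: each distinct word gets a one-byte code.
def build_table(text):
    table = []
    for word in text.lower().split():
        if word not in table:
            table.append(word)
    if len(table) > 256:
        raise ValueError("one-byte codes only cover 256 distinct words")
    return table

def compress(text, table):
    return bytes(table.index(word) for word in text.lower().split())

def decompress(data, table):
    return " ".join(table[b] for b in data)

text = "The cow jumped over the moon"
table = build_table(text)          # ['the', 'cow', 'jumped', 'over', 'moon']
packed = compress(text, table)
print(packed.hex(" "))             # 00 01 02 03 00 04
print(decompress(packed, table))   # the cow jumped over the moon
```

Note that this sketch folds case and ignores punctuation; a real reader would have to handle both, which is part of why the include file grows.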

Well, that's my impractical idea. What do you think?
John Barrus


Member


Joined: 25 May 2003
Posts: 131

Posted: 07 Jun 2003 01:03:49 am    Post subject:

And I guess the include file could only include 256 different words.
Jeremiah Walgren
General Operations Director


Know-It-All


Joined: 24 May 2003
Posts: 1937

Posted: 07 Jun 2003 01:33:12 am    Post subject:

I thought there were something like 650,000 words in the English language...
John Barrus


Member


Joined: 25 May 2003
Posts: 131

Posted: 07 Jun 2003 09:55:18 am    Post subject:

That'd make it all the more impossible.
But there are some other ideas I'm coming up with...
SniperOfTheNight


Advanced Member


Joined: 24 May 2003
Posts: 260

Posted: 07 Jun 2003 10:31:56 am    Post subject:

It would probably make more sense to use a different equate for each letter. If you wanted to have every word that you would find in a book, that would be way too much work.
Spyderbyte


Advanced Member


Joined: 29 May 2003
Posts: 372

Posted: 07 Jun 2003 11:44:57 am    Post subject:

But then there wouldn't be any compression. You would just end up with a series of numbers as long as, if not longer than, the original word. Since there are 26 letters, some numbers would have to be two digits, which is not only twice as long as the letter, but then you'd have to figure out some way to tell whether the number was 23 or 2 and 3.

But then again, there might be something about how a letter is stored that would make it worthwhile.

Just my two cents' worth.

Spyderbyte
John Barrus


Member


Joined: 25 May 2003
Posts: 131

Posted: 07 Jun 2003 12:27:59 pm    Post subject:

The thing is that all ASCII characters are already equated to a certain byte, so there wouldn't be any compression. But say we equated combinations of two letters to a single byte. We could equate the most common combinations (qu, in, st, sh). Granted, there are 676 possible two-letter combinations, but we could encode only the most used ones. (You won't find jk, xw, qw, and all those other weird combinations in any English words, so there's no purpose in equating them.) That would provide at most 50 percent compression.
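A rough Python sketch of this digraph idea, using byte values 128 and up (unused by 7-bit ASCII) for the pairs; the pair list below is illustrative, not tuned to real letter frequencies:

```python
# Digraph compression: common two-letter pairs become single bytes >= 128.
PAIRS = ["th", "he", "in", "er", "an", "qu", "st", "sh"]

def compress(text):
    out = bytearray()
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in PAIRS:
            out.append(128 + PAIRS.index(pair))   # one byte for the pair
            i += 2
        else:
            out.append(ord(text[i]))              # plain ASCII byte
            i += 1
    return bytes(out)

def decompress(data):
    return "".join(PAIRS[b - 128] if b >= 128 else chr(b) for b in data)

s = "the quick shin"
packed = compress(s)
print(len(s), "->", len(packed))                  # 14 -> 10
assert decompress(packed) == s
```

Since high bytes mark pairs unambiguously, the decoder never has to guess where one code ends and the next begins -- the problem Spyderbyte raised with two-digit numbers.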
Adm.Wiggin
aka Tianon


Know-It-All


Joined: 02 Jun 2003
Posts: 1874

Posted: 07 Jun 2003 12:47:33 pm    Post subject:

Yeah, combos like qu, th, ch, tch (catch), and so on... you could do stuff like that, including three-letter combos...
NETWizz
Byte by bit


Bandwidth Hog


Joined: 20 May 2003
Posts: 2369

Posted: 08 Jun 2003 03:32:40 am    Post subject:

The longer the text file, the better the compression will be with a Ziv (Lempel-Ziv) or Huffman routine.

As for indexing, it may not be easy, because we would probably have to use a two-byte index: there will be more than 256 words.

Also, we would need to make two passes to count the number of times each word is used.

If we use the word "book" only once, we save nothing by compressing it; in fact, we waste room in the index.

Real compression does not really separate words.

e.g.

The quick brown fox jumped over the lazy dog, and the fox jacked a soda.
"x j" can be compressed because it occurs more than once!
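A rough illustration of that point: compressors in the Lempel-Ziv family replace any repeated substring with a short back-reference, not just whole words. This Python sketch (names mine) simply lists the substrings of a given length that occur more than once, i.e. the candidates a real compressor could exploit:

```python
# Find every length-n substring that appears at least twice in the text.
def repeated(text, length):
    seen, hits = set(), set()
    for i in range(len(text) - length + 1):
        chunk = text[i:i + length]
        if chunk in seen:
            hits.add(chunk)        # this substring occurred earlier too
        seen.add(chunk)
    return hits

s = "The quick brown fox jumped over the lazy dog, and the fox jacked a soda."
print(sorted(repeated(s, 5)))
```

Running it shows that "fox j" and " the " both repeat, even though "jumped" and "jacked" are different words -- exactly the kind of match a word-based scheme would miss.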
John Barrus


Member


Joined: 25 May 2003
Posts: 131

Posted: 09 Jun 2003 12:42:35 am    Post subject:

I'm not familiar with the Ziv or Huffman routines. Can you elaborate?

With the two-byte index: yeah, it would provide a larger amount of compression because more words would be equated, but the issue would be space (I'm not sure we could fit 65,536 words and their equates on the calculator). Even so, that's a lot of different words (I probably don't even use that many in a day's worth of conversation).

Basically, I think an idea like this is hopeless, but it would be an interesting way to do things...
John Barrus


Member


Joined: 25 May 2003
Posts: 131

Posted: 09 Jun 2003 01:08:43 am    Post subject:

Ok, I skimmed this from the web on how Huffman works:

"This algorithm, developed by D.A. Huffman, is based on the fact that in an input stream certain tokens occur more often than others. Based on this knowledge, the algorithm builds up a weighted binary tree according to their rate of occurrence. Each element of this tree is assigned a new code word, with the length of the code word being determined by its position in the tree. Therefore, the token which is most frequent and becomes the root of the tree is assigned the shortest code. Each less common element is assigned a longer code word. The least frequent element is assigned a code word which may have become twice as long as the input token. "

My question: What would be the basis for our tokens? Words? Letter combos?
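For comparison, here is a minimal Python sketch of the quoted algorithm with single characters as the tokens (one possible answer; all names are illustrative, and this ignores how the tree itself would be stored alongside the data):

```python
# Huffman coding sketch: build codes from character frequencies.
import heapq
from collections import Counter

def huffman_codes(text):
    # One heap entry per symbol: (weight, tie-breaker, {symbol: code-so-far})
    heap = [(n, i, {ch: ""}) for i, (ch, n) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        wa, _, a = heapq.heappop(heap)   # the two lightest subtrees...
        wb, _, b = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in a.items()}
        merged.update({ch: "1" + code for ch, code in b.items()})
        heapq.heappush(heap, (wa + wb, tie, merged))   # ...become one subtree
        tie += 1
    return heap[0][2]

s = "the cow jumped over the moon"
codes = huffman_codes(s)
bits = sum(len(codes[ch]) for ch in s)
print(bits, "bits, versus", 8 * len(s), "bits as plain 8-bit characters")
```

Frequent characters (space, e, o) get short codes, rare ones get long codes, and because the codes are prefix-free the decoder never needs separators between them.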


Another crazy idea: We don't use ALL of the 256-character set, right? I'd venture to say that we normally use only about 100 characters in normal texts (as in literary novels and such). Well, if we cut those extra 156 characters out, it'd only take 7 bits per character instead of 8 -- it's not much, but it adds up when you have a 20,000-byte text!
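The 7-bit idea is easy to sketch in Python (illustrative only; a calculator version would do the same bit-shifting in assembly):

```python
# Pack 7-bit character codes into a continuous byte stream.
def pack7(text):
    bits = "".join(format(ord(ch) & 0x7F, "07b") for ch in text)
    bits += "0" * (-len(bits) % 8)          # pad out the final byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

def unpack7(data, count):
    bits = "".join(format(b, "08b") for b in data)
    return "".join(chr(int(bits[i:i + 7], 2)) for i in range(0, 7 * count, 7))

s = "the cow jumped over the moon"
packed = pack7(s)
assert unpack7(packed, len(s)) == s
print(len(s), "bytes ->", len(packed))      # 28 bytes -> 25
```

That is the promised 12.5% saving, at the cost of the reader having to shift bits across byte boundaries instead of reading whole bytes.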
NETWizz
Byte by bit


Bandwidth Hog


Joined: 20 May 2003
Posts: 2369

Posted: 09 Jun 2003 03:29:55 am    Post subject:

We probably use fewer than 80.

Anyway, we would write the program in assembly, meaning we would simply be working with bytes, 00h to FFh.

Huffman would work, but I do not think the calculator has enough RAM or processing speed to compress a large text file quickly.

Lastly, Ziv uses an easier method than the tree, so it should be easier to make. It also requires less RAM and should run a little faster.

Still, it essentially does the same thing: it replaces the most frequently occurring sequences with shorter symbols.
John Barrus


Member


Joined: 25 May 2003
Posts: 131

Posted: 09 Jun 2003 09:50:07 am    Post subject:

It seems to me that if we used Ziv (or Huffman), we'd still have to build an index of what tokens the shorter symbols represent, right? That would take up more space, too...

Yeah, I guess it would be more around 80 characters. Anyway, we could set up a program to read 7 bits at a time. If there were a character that isn't among the chosen 80, we could have a sign bit thingy like 1111111b, which would signal that a full 8-bit character follows. This way, most of the characters would be 7 bits, and every once in a while there would be a 15-bit character. This method would still provide compression (unless an idiot tried to compress a text full of those other characters -- then it would double the size.)
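A Python sketch of this escape scheme (the character set below is illustrative, and the bit stream is kept as a string for clarity rather than packed into bytes):

```python
# 7-bit codes for common characters; code 1111111b escapes to a full byte.
CHARSET = (" abcdefghijklmnopqrstuvwxyz"
           "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.,!?")
ESCAPE = 127  # 1111111b: "a full 8-bit character follows"

def encode(text):
    bits = ""
    for ch in text:
        if ch in CHARSET:
            bits += format(CHARSET.index(ch), "07b")                 # 7 bits
        else:
            bits += format(ESCAPE, "07b") + format(ord(ch), "08b")   # 15 bits
    return bits

def decode(bits):
    out, i = "", 0
    while i + 7 <= len(bits):
        code = int(bits[i:i + 7], 2)
        i += 7
        if code == ESCAPE:
            out += chr(int(bits[i:i + 8], 2))
            i += 8
        else:
            out += CHARSET[code]
    return out

s = "Rare glyphs like ; cost 15 bits"
assert decode(encode(s)) == s
print(len(s) * 8, "bits plain vs", len(encode(s)), "bits packed")   # 248 vs 225
```

The break-even point is easy to see: the scheme wins as long as fewer than one character in eight falls outside the chosen set.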

I wasn't thinking about compressing the texts on the calculator, but rather on a PC. The calculator would be able to read the compressed format without uncompressing it. (Mind you, I have no experience in PC programming, so I don't know who'd write the compression program.)


Last edited by Guest on 09 Jun 2003 09:51:15 am; edited 1 time in total
JoeImp
Enlightened


Active Member


Joined: 24 May 2003
Posts: 747

Posted: 09 Jun 2003 02:26:41 pm    Post subject:

Just tell me what you want done and what the output will be, and I'll get started on it. I'm assuming we're going to be using C++; does anyone else want to use something different? We should maybe get a forum topic on this project if we are really going to get to work on it.
NETWizz
Byte by bit


Bandwidth Hog


Joined: 20 May 2003
Posts: 2369

Posted: 09 Jun 2003 04:13:06 pm    Post subject:

I do not know exactly what we need to do, but I do know that it will be complicated.

I think the simplest compression would be to use bits instead of bytes, since we will not need a whole byte for each character.
Spyderbyte


Advanced Member


Joined: 29 May 2003
Posts: 372

Posted: 09 Jun 2003 04:27:28 pm    Post subject:

I would think 6 bits would be plenty: 36 letters/numbers and 28 other symbols. All you would really need beyond those is punctuation anyway.

Spyderbyte
NETWizz
Byte by bit


Bandwidth Hog


Joined: 20 May 2003
Posts: 2369

Posted: 09 Jun 2003 04:39:55 pm    Post subject:

Yes, 6 bits would be enough.
John Barrus


Member


Joined: 25 May 2003
Posts: 131

Posted: 09 Jun 2003 07:46:40 pm    Post subject:

Hey thanks for the offer Joe!
But first, we should get a couple of things straight:

I'm not sure that we could do 6 bits: I think we'd definitely want 26 uppercase and 26 lowercase letters, 10 numbers, "." and ",", plus a sign bit thingy (anything else?). All those would require at least 7 bits. We could cut out things like Z and X since they are almost never capitalized, but... or we could just leave the numbers off and hope that the compressed text is never math class notes.
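A quick budget check on that character list (reading "upper and lower case letters" as 26 of each, and counting the sign bit thingy as one escape code):

```python
# Does John's proposed character set fit in 6 bits?
letters = 26 + 26   # uppercase + lowercase
digits = 10
punctuation = 2     # "." and ","
escape = 1          # the escape / "sign bit thingy" code
total = letters + digits + punctuation + escape
print(f"{total} symbols; 6 bits gives {2 ** 6}, 7 bits gives {2 ** 7}")
# 65 symbols; 6 bits gives 64, 7 bits gives 128
```

65 symbols overflow a 6-bit code by exactly one, which is why every cut discussed here (capital Z and X, the digits) buys so much: dropping any two symbols makes 6 bits work.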

I guess it is possible to do 6 bits, if we were really careful about which ones were included...
John Barrus


Member


Joined: 25 May 2003
Posts: 131

Posted: 09 Jun 2003 11:19:30 pm    Post subject:

Well, I looked in the TI dev guide for the characters and their hex addresses and, much to my disappointment, the characters are not in any convenient order. Maybe we could just re-index the characters so that the important 60 or so are at the front and all the rest follow? It probably wouldn't affect the size of the reader by more than about 500 bytes.
Page 2 of 3