Many of us are familiar with the ticalc.org CD from a while ago. With the magic of modern technology, you too can build a CD (well, DVD) containing the ticalc.org file archives.

The main thing enabling this is the WARC of the site that I built a few months ago. I used the following short Python script to extract files from that WARC:
Code:
import os
import warc

GROUPS = ('/pub/', '/archives/files/', '/images/', '/style.css', '/global.js', '/mfunctions.js')
GROUPS = ['http://www.ticalc.org' + s for s in GROUPS]

def extract_files(filename):
    f = warc.WARCFile(filename)
    for record in f:
        if record.type == 'response':
            if any(map(record.url.startswith, GROUPS)) or record.url == 'http://www.ticalc.org/':
                relpath = record.url[len('http://www.ticalc.org/'):]
                response_code = record.payload.readline()
                # Skip redirects and whatnot
                if not response_code.startswith('HTTP/1.1 200'):
                    continue

                print relpath
                if len(relpath) == 0 or relpath[-1] == '/':
                    relpath += 'index.html'
                (dirpath, _) = os.path.split(relpath)
                dirpath = 'out/' + dirpath
                if not os.path.exists(dirpath):
                    os.makedirs(dirpath)

                # Skip past response header (everything up to the first blank line)
                for line in record.payload:
                    if line == '\r\n':
                        break
                # Write response body; use a distinct name so the WARC handle 'f' isn't shadowed
                with open('out/' + relpath, 'wb') as out:
                    out.write(record.payload.read())

if __name__ == '__main__':
    import sys
    extract_files(sys.argv[1])
(Requires the IA WARC library.)

Running this script emits about 170 thousand files into the 'out' directory, containing the file archives and a few top-level pages. Because I don't rewrite any links, you'll want to browse it with a web server (rather than directly from the filesystem), such as Python's http.server module.
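
For example, here is a minimal sketch of serving the 'out' directory locally with http.server. It assumes Python 3.7 or later; the make_server name, the port, and the loopback address are my own choices for illustration, not anything the archive requires.

```python
import functools
import http.server


def make_server(directory="out", port=8000):
    """Build an HTTP server that serves files from `directory` on localhost."""
    # SimpleHTTPRequestHandler accepts a `directory` keyword (Python 3.7+),
    # so there is no need to chdir into the extracted tree first.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling make_server().serve_forever() and opening http://127.0.0.1:8000/ should then show the front page, because the extractor saves directory URLs as index.html and that is what the handler serves for a directory request.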

Building an actual disc image from that directory is pretty straightforward, for example with mkisofs:

Code:
$ mkisofs -iso-level 3 -udf -V "ticalc.org August 2014" -o ticalc.org.iso out
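
Before burning anything, it may be worth checking that the extracted tree actually fits on the disc. This is a rough sketch only: the tree_size and fits_on_dvd helpers are hypothetical names, the 4.7 GB figure is the nominal single-layer DVD capacity, and filesystem overhead in the image is ignored.

```python
import os

# Nominal single-layer DVD capacity in bytes (an assumption; ignores ISO/UDF overhead).
DVD_BYTES = 4_700_000_000


def tree_size(root):
    """Return (file_count, total_bytes) for all files under `root`."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            count += 1
            total += os.path.getsize(os.path.join(dirpath, name))
    return count, total


def fits_on_dvd(root, capacity=DVD_BYTES):
    """Return True if the files under `root` total no more than `capacity` bytes."""
    _count, total = tree_size(root)
    return total <= capacity
```

Running tree_size('out') also gives a quick sanity check on the file count mentioned above.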

Result:
Rad stuff.
Someone elsewhere wanted the zip files from ticalc.org, so it was convenient that I'd already done this before. However, the script above doesn't work in Python 3 because the warc library seems to be unmaintained. I've updated it to work with Python 3 using the warcio library.

Code:
#!/usr/bin/env python3
# ticalc.org archive extractor
#
# Given a WARC of ticalc.org, writes the contents of the file archives out to files.
# Specify the path to a WARC on the command line, or if none is specified the
# WARC from https://archive.org/details/ticalc-2014-08 will be streamed.

import os
import shutil
from warcio.archiveiterator import ArchiveIterator

GROUPS = ('/pub/', '/archives/files/', '/images/', '/style.css', '/global.js', '/mfunctions.js')
GROUPS = ['http://www.ticalc.org' + s for s in GROUPS]

def extract_files(archive):
    for record in archive:
        if record.rec_type == 'response':
            url = record.rec_headers.get_header('WARC-Target-URI')
            if any(map(url.startswith, GROUPS)) or url == 'http://www.ticalc.org/':
                relpath = url[len('http://www.ticalc.org/'):]
                # Skip redirects and whatnot
                if not record.http_headers.get_statuscode() == '200':
                    continue

                print(relpath)
                if len(relpath) == 0 or relpath[-1] == '/':
                    relpath += 'index.html'
                (dirpath, _) = os.path.split(relpath)
                dirpath = 'out/' + dirpath
                os.makedirs(dirpath, exist_ok=True)

                # Write response body
                with open('out/' + relpath, 'wb') as f:
                    shutil.copyfileobj(record.content_stream(), f)

if __name__ == '__main__':
    import sys
    args = sys.argv[1:]
    if args:
        # WARC data is binary, so the file must be opened in binary mode
        stream = open(args[0], 'rb')
    else:
        import requests
        stream = requests.get('https://archive.org/download/ticalc-2014-08/ticalc.org-20140813.warc.gz', stream=True).raw
    extract_files(ArchiveIterator(stream))
Ten years on from my last capture of ticalc.org as a whole, there's been some recent talk that the site seems unwell: for example, poll results pages show an error instead of the actual poll results, and it's been nearly 6 months since any files were accepted into the archives. So I grabbed a new archive: https://archive.org/details/ticalc-2024-04.

Improvements in user-friendly web archiving tooling make it easier for anybody to access the contents of this new archive, since I've built a WACZ archive that can be easily loaded by tools like ReplayWeb.page or pywb. After downloading ticalc-20240405.wacz and loading it into ReplayWeb.page, you can browse the archive right in your browser (including full-text search!).


It seems like ReplayWeb.page might not be totally happy with a large (several-gigabyte) archive like this one: if I search for "Doors CS" and follow links to the DCS 7.4 page in the archives, it claims there isn't an archived copy, but by inspecting the underlying WARC and indexes I can see that there is a stored copy. So, uh, take it with a grain of salt if a viewing tool claims a page isn't present in the archive. The file extraction tool I shared earlier in this thread should work fine as well, but it's somewhat less convenient to use.
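
If you want to double-check a viewer's claim yourself, something like the following sketch can search the index inside the WACZ. It rests on several assumptions: that the WACZ is a ZIP file with CDXJ index files under indexes/, that each index line has the form "key timestamp {json}" with a "url" field in the JSON, and that the files are plain or gzip-compressed. The find_captures name is just for illustration.

```python
import gzip
import io
import json
import zipfile

# Assumed index file suffixes; a given WACZ may use only some of these.
INDEX_SUFFIXES = ('.cdx', '.cdxj', '.cdx.gz', '.cdxj.gz')


def find_captures(wacz, needle):
    """Yield (timestamp, url, status) for index entries whose URL contains `needle`.

    `wacz` is a path or a binary file object for the WACZ (a ZIP archive).
    """
    with zipfile.ZipFile(wacz) as archive:
        for name in archive.namelist():
            # Assumption: index files live under indexes/ inside the WACZ.
            if not (name.startswith('indexes/') and name.endswith(INDEX_SUFFIXES)):
                continue
            with archive.open(name) as raw:
                stream = gzip.open(raw) if name.endswith('.gz') else raw
                for line in io.TextIOWrapper(stream, encoding='utf-8'):
                    # Assumed CDXJ layout: "<sort key> <timestamp> <json object>"
                    parts = line.rstrip('\n').split(' ', 2)
                    if len(parts) != 3:
                        continue
                    try:
                        entry = json.loads(parts[2])
                    except ValueError:
                        continue  # not a JSON payload; skip the line
                    if not isinstance(entry, dict):
                        continue
                    url = entry.get('url', '')
                    if needle in url:
                        yield parts[1], url, entry.get('status')
```

For instance, list(find_captures('ticalc-20240405.wacz', 'ticalc.org/pub/')) would list whatever captures the index records under that path, regardless of what the viewer reports.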
That's not good news. Thank you for creating an archive! Who all are meant to maintain the site?
  