Many of us are familiar with the ticalc.org CD from a while ago. With the magic of modern technology, you too can build a CD (well, DVD) containing the ticalc.org file archives.

The main thing enabling this is the WARC of the site I built a few months ago. I used the following short Python script to extract files from the WARC:
Code:
import os
import warc

GROUPS = ('/pub/', '/archives/files/', '/images/', '/style.css', '/global.js', '/mfunctions.js')
GROUPS = ['http://www.ticalc.org' + s for s in GROUPS]

def extract_files(filename):
    f = warc.WARCFile(filename)
    for record in f:
        if record.type == 'response':
            if any(map(record.url.startswith, GROUPS)) or record.url == 'http://www.ticalc.org/':
                relpath = record.url[len('http://www.ticalc.org/'):]
                response_code = record.payload.readline()
                # Skip redirects and whatnot
                if not response_code.startswith('HTTP/1.1 200'):
                    continue

                print relpath
                if len(relpath) == 0 or relpath[-1] == '/':
                    relpath += 'index.html'
                (dirpath, _) = os.path.split(relpath)
                dirpath = 'out/' + dirpath
                if not os.path.exists(dirpath):
                    os.makedirs(dirpath)

                # Skip past the response headers, which end at the first blank line
                for line in record.payload:
                    if line == '\r\n':
                        break
                # Write response body
                with open('out/' + relpath, 'wb') as out:
                    out.write(record.payload.read())

if __name__ == '__main__':
    import sys
    extract_files(sys.argv[1])
(Requires the IA WARC library.)
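
To make the URL-to-path mapping in the script easier to see, here is a minimal sketch of just that step, pulled out so it can be tried without a WARC on hand. The names BASE and url_to_relpath are mine for illustration and don't appear in the script itself; the logic is meant to mirror what the script does inline.

```python
BASE = 'http://www.ticalc.org/'  # illustrative constant, same prefix the script strips


def url_to_relpath(url):
    """Map a ticalc.org URL to the relative path written under out/."""
    relpath = url[len(BASE):]
    # The site root and directory-style URLs are stored as index.html
    if relpath == '' or relpath.endswith('/'):
        relpath += 'index.html'
    return relpath
```

So a directory URL such as http://www.ticalc.org/pub/ ends up at out/pub/index.html, while a plain file URL keeps its own name.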

Running this script emits about 170 thousand files into the 'out' directory, containing the file archives and a few top-level pages. Because I don't rewrite any links, you'll want to browse it with a web server (rather than directly from the filesystem), such as Python's http.server module.
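
For the web server part, something like the following should do. This is a rough sketch assuming Python 3.7 or newer (for the directory argument and ThreadingHTTPServer); the make_server helper, the localhost binding and port 8000 are my own choices, not anything the extraction script depends on.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory='out', port=8000):
    """Build a server that maps URL paths onto files under `directory`."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    # Bind to localhost only; this is meant for browsing on your own machine.
    return ThreadingHTTPServer(('127.0.0.1', port), handler)

# To browse the archive: make_server().serve_forever()
```

With that running, http://127.0.0.1:8000/ serves out/index.html, and the site's absolute links (like /pub/) resolve against the extracted tree.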

Building an actual disc image from that directory is pretty straightforward, for example with mkisofs:

Code:
$ mkisofs -iso-level 3 -udf -V "ticalc.org August 2014" -o ticalc.org.iso out

Result:
Rad stuff.
Someone elsewhere wanted the zip files from ticalc.org, so it was convenient that I'd already done this. The script above doesn't work in Python 3, though, because the warc library seems to be unmaintained. I've updated it to use the warcio library instead, which does work in Python 3.

Code:
#!/usr/bin/env python3
# ticalc.org archive extractor
#
# Given a WARC of ticalc.org, writes out the file archives contents to files.
# Specify the path to a WARC on the command line, or if none is specified the
# WARC from https://archive.org/details/ticalc-2014-08 will be streamed.

import os
import shutil
from warcio.archiveiterator import ArchiveIterator

GROUPS = ('/pub/', '/archives/files/', '/images/', '/style.css', '/global.js', '/mfunctions.js')
GROUPS = ['http://www.ticalc.org' + s for s in GROUPS]

def extract_files(archive):
    for record in archive:
        if record.rec_type == 'response':
            url = record.rec_headers.get_header('WARC-Target-URI')
            if any(map(url.startswith, GROUPS)) or url == 'http://www.ticalc.org/':
                relpath = url[len('http://www.ticalc.org/'):]
                # Skip redirects and whatnot
                if record.http_headers.get_statuscode() != '200':
                    continue

                print(relpath)
                if len(relpath) == 0 or relpath[-1] == '/':
                    relpath += 'index.html'
                (dirpath, _) = os.path.split(relpath)
                dirpath = 'out/' + dirpath
                os.makedirs(dirpath, exist_ok=True)

                # Write response body
                with open('out/' + relpath, 'wb') as f:
                    shutil.copyfileobj(record.content_stream(), f)

if __name__ == '__main__':
    import sys
    args = sys.argv[1:]
    if args:
        # WARC data is binary, so the file must not be opened in text mode
        stream = open(args[0], 'rb')
    else:
        import requests
        stream = requests.get('https://archive.org/download/ticalc-2014-08/ticalc.org-20140813.warc.gz', stream=True).raw
    extract_files(ArchiveIterator(stream))
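
If all you're after is the zip files, you can pick them out of the extracted tree afterwards. A minimal sketch, assuming the script above has already populated 'out'; find_zips is a hypothetical helper of mine, not part of the extractor.

```python
import os


def find_zips(root='out'):
    """Yield the path of every .zip file below `root`."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            # Match case-insensitively in case some archives are named .ZIP
            if name.lower().endswith('.zip'):
                yield os.path.join(dirpath, name)
```

Looping over find_zips() and printing each path gives a list you can feed to whatever copies or repacks the files.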
  