We currently host three public-resource wikis on Cemetech:
  • WikiPrizm: Casio Prizm technical documentation
  • Doors CS: Doors CS documentation, both technical and user-facing
  • Learn: mostly programming-related documentation, much of it republished from other sites

They're okay resources for users, but for me as an administrator they're an annoyance: the MediaWiki software underpinning them needs maintenance (periodic updates, etc.) and spam is a large problem. We changed wiki accounts to require manual approval several years ago, which stopped the frequent spam, but the cost is a significantly higher barrier for new users who want to improve the documentation.

For these reasons, I've been experimenting with some tools to convert a MediaWiki data dump into a static site built from Markdown files (which can be stored in a git repository). This is easier to host because it only requires a regular web server (no application server involved), is no less accessible for new contributors and could be more accessible depending on the platform on which the repository is hosted, and I think it's nicer to use both as a consumer of the content and as an author.
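
For anyone curious what the first step of such a conversion might look like, here is a minimal Python sketch that pulls page titles and wikitext out of a MediaWiki XML export. It is not the actual tooling used here; the names `iter_pages` and `local_name` are made up for illustration, and it assumes the dump follows the usual export layout (`<page>` elements holding a `<title>` and one or more `<revision>`/`<text>` elements).

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(xml_source):
    """Yield (title, wikitext) for each <page> in a MediaWiki XML export.

    If a page has several revisions, the text of the last one wins.
    """
    for _, elem in ET.iterparse(xml_source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = None
        text = ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # free memory; dumps can be large


# Hypothetical two-revision page, just to exercise the function.
sample = io.BytesIO(
    b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
    b"<page><title>Reading_Input</title>"
    b"<revision><text>old text</text></revision>"
    b"<revision><text>new text</text></revision>"
    b"</page></mediawiki>"
)
pages = list(iter_pages(sample))
for title, text in pages:
    print(title, len(text))
```

Each wikitext body could then be handed to a converter such as `pandoc -f mediawiki -t markdown` and written out as a Markdown file for the static site generator.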

I think this tooling is just about good enough to start actually using, so here's what the WikiPrizm data looks like when run through this: https://cemetech.gitlab.io/wikiprizm/. The corresponding git repository is https://gitlab.com/cemetech/wikiprizm.

What I'd like from everybody else is feedback on whether this seems like a good solution and how usable this new system seems. If I get positive feedback (or none at all, which I'll have to assume is positive) then I'll be comfortable moving ahead with bringing the existing wikis down and converting them with this tool, then hosting the new version at the same location as the old wikis.

Of particular note, I know the page organization is somewhat messy right now: some pages aren't categorized very well, and some categories behave weirdly. That shouldn't be very difficult to fix later, but because the wiki-to-markdown conversion is one-way I don't want to make those data fixes until the wikis are turned down. Otherwise, any changes I make might need to be made again later after importing the final version of the wiki's data.
I've had a bit of a look around the Markdown WikiPrizm instance.

Most of my points are probably obvious, I just want to make sure you're aware:
    - There are a few formatting characters visible in the Markdown version. For example, the main page has a bunch of '<b>' and '<big>' tags visible on the page. Another example is on the GetKey page, where a few of the paragraphs end with a visible '\'.
    - Underscores are included in some (but not all) links. MediaWiki uses underscores for spaces and they seem to have been kept in some instances for whatever reason.
    - Similar to the random underscores, a few titles have incorrect capitalisation and random spaces in them. For example: 'PrintCXY' is now 'Print Cxy'.
    - Some links are rendered weirdly. For example, in the G3A File Format page, there are a whole bunch of ^ symbols where WikiPrizm had superscript references.
    - The theme feels harder to read. This can be fixed later, but the new theme has less spacing and less well-defined sections than the original. It's also not as wide, which I personally dislike.
    - It will also be important to consider how changes are made to the wiki.
    You probably have a plan, but it isn't obvious to me how one would edit a page. There is a button on the bottom, but it currently just goes to GitLab and complains about a page not existing. Would editing/creating a page involve creating a merge request? Would this then require administration?


Overall, if the original wiki was hard to maintain, I guess it's worth making the change - but it still has a while to go before being a good replacement.

Also, would the Markdown version, when completed, be published to prizm.cemetech.net?
If so, could there be a simple redirect to prevent (or reduce the amount of) broken old links?
dr-carlos wrote:
There are a few formatting characters visible in the Markdown version. For example, the main page has a bunch of '<b>' and '<big>' tags visible on the page. Another example is on the GetKey page, where a few of the paragraphs end with a visible '\'.
Some links are rendered weirdly. For example, in the G3A File Format page, there are a whole bunch of ^ symbols where WikiPrizm had superscript references.
This seems like a limitation of wikitext handling, but the weird behaviors are rare enough that they can be fixed manually. I have changed it to make Pandoc not try to emit superscripts because Hugo's default markdown engine doesn't support them.

Quote:
Underscores are included in some (but not all) links. MediaWiki uses underscores for spaces and they seem to have been kept in some instances for whatever reason.
I haven't found any examples of this offhand. Can you point to one?

Quote:
Similar to the random underscores, a few titles have incorrect capitalisation and random spaces in them. For example: 'PrintCXY' is now 'Print Cxy'.
The way MediaWiki handles page titles makes this annoying because it's impossible to differentiate between underscores that should be underscores and those that should be spaces. I made the generator emit explicit page titles that exactly match the MediaWiki ones because that seems most accurate, so now everything has underscores and automatic word breaks are gone.

Quote:
The theme feels harder to read. This can be fixed later, but the new theme has less spacing and less well-defined sections than the original. It's also not as wide, which I personally dislike.
This is easy to adjust if you have specific proposals; the stylesheet is separate from the content.

Quote:
You probably have a plan, but it isn't obvious to me how one would edit a page. There is a button on the bottom, but it currently just goes to GitLab and complains about a page not existing. Would editing/creating a page involve creating a merge request? Would this then require administration?
The edit page doesn't go anywhere yet because the XML dump is still the source of truth. Once we check in the markdown files it'll go to the right place.
For editing, I expect to be generous with granting people commit permissions: the wiki already requires manual account approval, so it's not really any change in requiring that people be granted permission to edit.

Quote:
Also, would the Markdown version, when completed, be published to prizm.cemetech.net?
If so, could there be a simple redirect to prevent (or reduce the amount of) broken old links?

Yes, that's the intent. Redirects are slightly more difficult but can probably be done with a hack.
This looks really nice so far, good work :)
Tari wrote:
This seems like a limitation of wikitext handling, but the weird behaviors are rare enough that they can be fixed manually. I have changed it to make Pandoc not try to emit superscripts because Hugo's default markdown engine doesn't support them.

Makes sense. I normally use HTML with Hugo because of these kinds of limitations - would this be helpful/possible in this situation? (I haven't looked at how you set it up)

Tari wrote:
I haven't found any examples of this offhand. Can you point to one?

This was mostly referring to the underscores that you later say you included intentionally.
For example, on the Prizm Programming Portal you can see that the underscores are included in page titles but not in links with custom text.

Tari wrote:
The way MediaWiki handles page titles makes this annoying because it's impossible to differentiate between underscores that should be underscores and those that should be spaces. I made the generator emit explicit page titles that exactly match the MediaWiki ones because that seems most accurate, so now everything has underscores and automatic word breaks are gone.

Yep, I understand. I guess that's the right call to make - this will just require manually changing page names.

On a side note, will changing page names (e.g. removing underscores) require manually changing all links as well?

Tari wrote:
This is easy to adjust if you have specific proposals; the stylesheet is separate from the content.

Great. I'll make an issue or MR once the content has been improved.

Tari wrote:
The edit page doesn't go anywhere yet because the XML dump is still the source of truth. Once we check in the markdown files it'll go to the right place.

Ah, okay. When do you think the markdown files will be added to the repo?

Tari wrote:
For editing, I expect to be generous with granting people commit permissions: the wiki already requires manual account approval, so it's not really any change in requiring that people be granted permission to edit.

Okay, that makes sense. You've probably already thought of this, but I think it's probably best if there is some kind of semi-regular backup (as it is much easier to delete the whole project in one commit than it was when using MediaWiki).

Tari wrote:
Yes, that's the intent. Redirects are slightly more difficult but can probably be done with a hack.

Great. Obviously the problem with redirects is changes in page titles, but it would be good (given that most pages with links to the wiki probably aren't being updated) to try.
Made some more changes and I'm pretty happy with the results now. Of note, I implemented another pandoc filter that turns sequences of inline code with hard line breaks into code blocks; those sequences seem to be a weird quirk of wikitext and how pandoc interprets it. That means we get this in the markdown:
Code:
blah blah blah

    int foo;
    int bar;
(a regular indented code block) rather than:
Code:
blah blah blah

`int foo;`\
`int bar;`\
which looks worse and is more awkward to edit.
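
To make the idea concrete, here is a rough Python sketch of that kind of rewrite, working on pandoc's JSON representation of a document. It is an illustration under assumptions and not the filter actually used: the function names are hypothetical, and it only looks at top-level paragraphs (nothing nested inside lists or block quotes).

```python
def inline_code_run(inlines):
    """Return the code lines if `inlines` is nothing but Code elements
    separated by hard line breaks (a trailing break is allowed).
    Return None for anything else."""
    lines = []
    expect_code = True
    saw_break = False
    for node in inlines:
        if expect_code and node["t"] == "Code":
            lines.append(node["c"][1])  # "c" is [attributes, text]
            expect_code = False
        elif not expect_code and node["t"] == "LineBreak":
            saw_break = True
            expect_code = True
        else:
            return None
    return lines if lines and saw_break else None


def convert_block(block):
    """Turn a qualifying Para into a CodeBlock; leave other blocks alone."""
    if block["t"] == "Para":
        lines = inline_code_run(block["c"])
        if lines is not None:
            empty_attr = ["", [], []]  # no id, classes, or key/value pairs
            return {"t": "CodeBlock", "c": [empty_attr, "\n".join(lines)]}
    return block


def convert_document(doc):
    """Apply the rewrite to the top-level blocks of a pandoc JSON document."""
    doc["blocks"] = [convert_block(b) for b in doc["blocks"]]
    return doc
```

To run something like this as a real JSON filter (`pandoc --filter`), a script would read the document from stdin with `json.load`, call `convert_document`, and write the result to stdout with `json.dump`.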

dr-carlos wrote:
Tari wrote:
This seems like a limitation of wikitext handling, but the weird behaviors are rare enough that they can be fixed manually. I have changed it to make Pandoc not try to emit superscripts because Hugo's default markdown engine doesn't support them.

Makes sense. I normally use HTML with Hugo because of these kinds of limitations - would this be helpful/possible in this situation? (I haven't looked at how you set it up)
You can always write HTML inside the markdown, but having pandoc do that when translating from wikitext isn't easy. There are so few actual uses of that formatting in the wiki that I don't think it really matters.

Quote:
On a side note, will changing page names (e.g. removing underscores) require manually changing all links as well?
You can create aliases in page front matter to add redirects, so I'd expect that to be used most often when moving pages around. Links between pages should usually use the ref shortcode, which by default causes a build-time error when the target doesn't exist, so it's easy to find links that you break as well (the error is turned off right now because the wiki has some broken links).
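
As a rough illustration of the alias approach, this Python sketch builds a content file whose front matter carries an alias for the page's old location. The helper name `page_with_alias` and the paths are hypothetical; it emits JSON front matter only because that is easy to generate safely, and Hugo accepts JSON as well as YAML and TOML.

```python
import json


def page_with_alias(title, old_url_path, body):
    """Build a Hugo content file that also answers at old_url_path.

    Hugo generates a redirect from every path listed under "aliases"
    to wherever the page ends up living.
    """
    front_matter = {"title": title, "aliases": [old_url_path]}
    return json.dumps(front_matter, indent=2) + "\n\n" + body


# Hypothetical page that moved; the body links to another page through
# the ref shortcode so a broken target can be caught at build time.
content = page_with_alias(
    "Reading Input",
    "/Tutorials/Reading_Input/",
    'See also [GetKey]({{< ref "GetKey.md" >}}).\n',
)
print(content)
```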

Quote:
Tari wrote:
The edit page doesn't go anywhere yet because the XML dump is still the source of truth. Once we check in the markdown files it'll go to the right place.

Ah, okay. When do you think the markdown files will be added to the repo?
Once I don't want to make any more automated changes in the conversion script, which seems like it should be soon.

Quote:
You've probably already thought of this, but I think it's probably best if there is some kind of semi-regular backup (as it is much easier to delete the whole project in one commit than it was when using MediaWiki).
It's trivial to revert a commit, so there's nothing special to be done here. The main branch should be protected against force pushes so somebody can't wipe out all the history, but that's all it needs.

Quote:
Great. Obviously the problem with redirects is changes in page titles, but it would be good (given that most pages with links to the wiki probably aren't being updated) to try.
I implemented a hack that should work, where there's a page called index.php (since that's what handles things on mediawiki) that pulls out the target page name and looks up the new path to that page. Some pages don't work as expected right now because Hugo (annoyingly) strips some non-ASCII characters from output paths; they'll probably need to be renamed, but we can also update the JSON file that drives the redirector later (it's generated by the conversion script as well, so it'll get checked in once the markdown gets checked in).
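
The lookup logic amounts to something like the following Python sketch. This is only an illustration: the real redirector presumably runs in the browser, the function names are made up, and the shape of the JSON data (old page title mapped to new path) is an assumption.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def old_title(url):
    """Extract the MediaWiki page title from an old-style wiki URL.

    Handles both /index.php?title=Foo and /index.php/Foo forms and
    returns None when neither matches.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    if "title" in query:
        return query["title"][0]
    prefix = "/index.php/"
    if parts.path.startswith(prefix):
        return unquote(parts.path[len(prefix):])
    return None


def new_path(url, redirects, fallback="/"):
    """Map an old wiki URL to its new path, or to `fallback` if unknown."""
    title = old_title(url)
    if title is None:
        return fallback
    # MediaWiki treats spaces and underscores in titles as equivalent.
    return redirects.get(title.replace(" ", "_"), fallback)


# Assumed shape of the generated JSON data: old title -> new path.
redirects = {"Reading_Input": "/Tutorials/Reading_Input/"}
target = new_path("https://prizm.cemetech.net/index.php?title=Reading_Input", redirects)
print(target)
```

Falling back to the front page for unknown titles is just one option; a "page not found" notice would work equally well.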
Seems good.
Let us know when people can start editing the Markdown.
As you'll have noticed by now if you've looked at https://prizm.cemetech.net/, I've flipped things over to the new system and done some manual cleanup by reorganizing pages and fixing assorted weird formatting that got mangled in the conversion.
Tari wrote:
As you'll have noticed by now if you've looked at https://prizm.cemetech.net/, I've flipped things over to the new system and done some manual cleanup by reorganizing pages and fixing assorted weird formatting that got mangled in the conversion.

This looks a lot better, thanks!
Sorry, but I have to say this is much worse. Some of the old links don't redirect properly, the search is sometimes unreliable, and much of the formatting is messed up, so I have to use archive.org to see what a page was originally meant to say (EDIT: this doesn't work, they didn't archive all the pages). It also seems like it would be harder to make changes, since they have to be manually approved (though it does reduce the initial barrier a bit), etc.

I imagine with some more changes it can be at least as good as the original, but I think for now it's not ready to replace the original.
Those things might be fixable, but without examples of what you think is broken it's difficult to say.

As for making changes, as I described earlier in this thread, most contributors will probably only need approval the first time, which isn't really a change from before.
Sorry, I should have been more specific.

Here are some issues with the new site:

- I think this has already been mentioned but there are some formatting issues:

Some tables like the one on https://prizm.cemetech.net/Syscalls/Serial/Serial_Open/ are broken
The links inside the code block at https://prizm.cemetech.net/Tutorials/Reading_Input/ are broken

- The search is a bit wonky:

If you want to get to the "Reading Input" page, typing "input" or even "Reading Input" doesn't work. You have to type in "reading" or "Reading_Input". If the search was a bit more forgiving of this sort of thing it would be easier to find things.

- The categorization is a little confusing

Clicking Syscalls doesn't bring up a full list of syscalls, even if it appears to at first. Instead some of them are hidden behind submenus. I'm not sure but I think the original allowed pages to be in multiple categories so you could view a whole list without expanding the pages.

- The issue I mentioned about redirects seems to be fixed now

Could you keep up a read-only copy of the original site, please? It would help with fixing some of the issues, like the formatting ones, by providing a reference; maybe I could make some PRs to fix things. Or maybe I can reconstruct it from the exported XML.
Heath wrote:
- I think this has already been mentioned but there are some formatting issues:

Some tables like the one on https://prizm.cemetech.net/Syscalls/Serial/Serial_Open/ are broken
The links inside the code block at https://prizm.cemetech.net/Tutorials/Reading_Input/ are broken
The serial table seems to be the only one broken in that way; caused by nested tables translating badly. I've reformatted it so it looks reasonable again. Links inside code blocks are a thing that kind of works in wiki markup but not at all in markdown so I've simply removed the links inside code blocks on the input tutorial page.

Quote:
- The search is a bit wonky:

If you want to get to the "Reading Input" page, typing "input" or even "Reading Input" doesn't work. You have to type in "reading" or "Reading_Input". If the search was a bit more forgiving of this sort of thing it would be easier to find things.
This is probably a side effect of pages having wrong titles (because mediawiki doesn't allow spaces in titles). Having demonstrated how to correct titles with underscores, I can now search for "input" and get "Reading Input" as a result.

Quote:
- The categorization is a little confusing

Clicking Syscalls doesn't bring up a full list of syscalls, even if it appears to at first. Instead some of them are hidden behind submenus. I'm not sure but I think the original allowed pages to be in multiple categories so you could view a whole list without expanding the pages.
Autogenerating a listing is probably doable with some templating work.

Quote:
Could you keep up a read-only copy of the original site please? It would help with fixing some of the issues like the formatting ones so that there is a reference, maybe I could make some PRs to fix things. Or maybe I can reconstruct it from the exported XML
I've pointed http://old.prizm.cemetech.net/ at the old instance, though no promises about how long it'll stay up (not that I have any plans to take it down at this point). Importing the XML into another mediawiki instance is another pretty easy option.
I've gotten around to doing the same sort of conversion for the DCS wiki as I did for WikiPrizm, now live at https://dcs.cemetech.net/ backed by the git repository at https://gitlab.com/cemetech/dcswiki. The old version remains available for the foreseeable future at http://old.dcs.cemetech.net/.

I took the liberty of doing a lot of reorganization of the Doors CS wiki, removing obsolete information (hints about Doors CE 9, since it's basically cancelled) and splitting it into obvious divisions of the user's manual and developer documentation (itself split neatly into color and monochrome sections).

Here's a side-by-side visual comparison of the frontpage:
  1. How difficult would it be to restore a site image (e.g. the chevrons)?
  2. I need to edit Doors CS 7.4 into that.
KermMartian wrote:
How difficult would it be to restore a site image (e.g. the chevrons)?
So easy it's already done.
  