I have a hobby that’s probably a little unusual.
I like to seek out interesting historical texts – papers, essays, even books – whose copyright has expired, and repackage them using modern technology. That (currently) means using:
- semantic XHTML5 to mark up texts;
- SVG 1.1 to produce diagrams and charts;
- MathML 3.0 to describe mathematical equations; and
- EPUB 3 to package everything into ebooks that can be read on most devices.
I also take care to use the accessibility features these technologies provide.
Most of the texts I repackage are already widely available online, for example from Project Gutenberg. However, they are often in very poor condition. Many were scanned and converted using OCR technology, which can introduce peculiar ‘typos’ depending on the typeface and how ‘messy’ the original documents are (for example, I have seen the word ‘the’ misread as ‘tire’). Worse, they almost always omit the diagrams and formulae, and sometimes even the tables. What they do include is usually marked up poorly, with no regard for semantics, which can make reading difficult – if not impossible – for people with accessibility needs.
My process is first to find an intact copy of an original printing of the text if at all possible, or at the very least a version that has been correctly formatted and presented (usually a physical copy, or a set of scans of one). Then I reproduce the text in proper, semantic markup. This is important, because it means that screen readers used by people with visual impairments can read the text correctly. Next, I style the text to reproduce the original appearance as closely as is reasonably practical. Where I don’t have an original document to work from, I try to reproduce the appropriate ‘feel’. Finally, I package it all up in an EPUB 3 ebook, with some EPUB 2 compatibility features to accommodate older or less capable reading software.
In all cases, the original text is – of course – in the public domain, and everything else in the ebook that is not part of the original text – the structure, the cover image, the layout, etc. – is released under a Creative Commons Attribution + ShareAlike licence. Licence details for each book can be found within the book.
Most of the books have an associated GitHub project, which you can find linked from the book’s information page or from within the book itself. If you find any problems in one of these books – such as typos or display issues – please file an issue there describing the problem, and I’ll fix it in a future version.
There’s also a list of potential future books. If you see something there you’d like to encourage me to work on sooner, or if you have another suggestion in mind, feel free to contact me and let me know.