There are many ways to get to an ePUB. This post covers just one way of building them, the method we’ve used. It may not be the best or most efficient, and it requires a number of software packages. If you are interested in what you are getting for your money from an ePUB conversion site, though, they will be doing something along these lines (not necessarily using the same packages or applying the same checks).
Because ePUBs can be read on many different devices, readers, apps, phones and computers, our starting point was to make the text as clean as possible. Before we started, we made sure the book used as few layouts and formats as it could. We’d done this already when building the files for the paperback and Kindle formats: we made sure the paragraph formats were consistent, and that tabs, fonts, indents, spacing etc used as few different types as possible and were applied the same way throughout.
Then we looked for a tool to do the conversion to ePUB. The one we decided to use was QuarkXpress 9 (which we upgraded to 184.108.40.206; it has to be at least 9.2). It was a disappointing tool as it required workarounds, but it did do the job in the end, and with practice it is actually quicker than the number of steps listed below implies.
The books converted with the approach below are English-language fiction novels with only a couple of images. The aim was to put in as little html as needed and, as far as possible, to use the default settings in the ereader, so that any settings the ereader lets the user update and control will be reflected in your book.
- start with as clean a book text as you can get to
- go through the layouts and see how many different paragraph and format types you need. We had the following:
  - heading page: title text
  - heading page: subtitle text
  - heading page: author byline
  - heading page: company title
  - title verso page: text
  - copyright page: text
  - chapters: heading
  - chapters: para one (no indent)
  - chapters: main pages (indent)
  - chapters: end para (indent, plus space after)
  - chapters: special text for a letter
  - figure: for pictures and logos
- give each format a name from the standard QuarkXpress layouts allowed in style.css (in the folder \QuarkXpress9\XTensions\DigitalPublishing\Templates\css). There aren’t many, and they can’t be changed within QuarkXpress (although with a workaround they can be changed so that the changes are used during the export, but not in the display). The choices are:
  - body
  - byline
  - figure
  - figure-caption
  - figure-credit
  - indented-para
  - pullquote
  - title1
  - title2
  - chapter-name
  - headline1
  - headline2
- assign one of these standard formats to each of the formats you want in the document
- QuarkXpress style.css also contains text formats which we kept as they were: bold, strikethrough, strikethrough-and-underline, superscript, subscript, superior
- amend the QuarkXpress style.css file to give the format you want for each of the layouts, set the indent, spacing, font size, weight etc – this will only be applied when you export the file to an ePUB – when you look at it in QuarkXpress it will show the standard layouts so it won’t look as you want it to until you export it.
- load your text into QuarkXPress
- split the text into chapters (articles and components), and fix any errors during the load – for instance cutting and pasting your text into the reflow view will lose some formatting like italics so reset those if you have/want them
- load your cover (if you want one, some ebook loaders want the cover separate to the epub file)
- go through all the text and allocate the format you want (that will link to your new format in style.css) to each of the paragraphs, or text you want to control. This is a manual process that requires going through the whole file.
- set up the metadata for the file
- enter the table of contents information, or select the data to be used for the table of contents (we use the article names)
- export the file to ePUB – this will use your new style.css
- check this layout in as many ePUB viewers as you can
- get some ePUB reading devices and test it in these too. Calibre (http://calibre-ebook.com/) is a useful tool for loading your manually built ePUB onto many different ereaders
- we found a few problems with the generated file that needed a further step to fix them:
  - if underlined text included non-underlined spaces, these didn’t export correctly and had to be fixed manually afterwards by amending the html
  - embedded non-breaking characters didn’t always render correctly, depending upon the ereader, so we removed them
  - if there was a change of font (or anything in a span) followed by punctuation, then that punctuation could wrap around to the next line; we resolved this by moving the punctuation inside the span
  - not all ereaders render centered text the same way, so we had to set centered in the style.css format and also wrap all of the centered text in a span set to centered
- the metadata and text weren’t quite how we wanted them, so we used Sigil (https://code.google.com/p/sigil/) to finish the job:
  - add more metadata settings
  - remove any position set to absolute (in toc.css)
  - add html links to any web addresses contained in the text
  - add descriptions to any image files
  - remove numbers in the toc.css style for the toc entries
  - remove style="padding-left:30px;" which was added into the html for all formats
  - change the title from h1 to h3
  - remove the additional css files (horizontal.css and vertical.css)
  - add a span format for centered text
  - in the first text file (Flow_2.html) amend body from <body class="body"> to <body class="body" id="startpos">
  - in the content.opf file in the ePUB, add a control for the first text file: <reference href="Text/Flow_2.html" title="1" type="text" />
  - find all centered text and add the span command around your text
- depending upon the distributor service you use, it may be necessary to remove the cover image from the file, as some like to handle this separately
- next validate the ePUB file:
  - use Sigil’s option for FlightCrew validation to check the file
  - use Sigil’s options for W3C validation checks
  - use the online ePubValidator (http://validator.idpf.org/) to check your file format is ok
- amend anything that comes up in the checks. If necessary go back to the QuarkXpress file and amend it there, re-export it and reapply the changes post export
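Because the post-export changes have to be reapplied after every re-export, a few of the purely mechanical ones can be scripted. The following is only a minimal sketch, not what we actually did (we made the changes by hand in Sigil). It assumes the exported html contains the padding attribute and the body tag exactly as quoted in the steps above; the function names are made up for illustration, and it works on the text of one html file at a time.

```python
import re

# Assumption: the unwanted attribute appears as quoted in the steps above,
# give or take whitespace after the colon and the trailing semicolon.
PADDING_ATTR_RE = re.compile(r'\s+style="padding-left:\s*30px;?"')

# Punctuation that directly follows a closing span, with no space in between.
TRAILING_PUNCT_RE = re.compile(r'</span>([.,;:!?]+)')


def strip_inline_padding(xhtml: str) -> str:
    """Remove the inline padding-left style added to every format."""
    return PADDING_ATTR_RE.sub("", xhtml)


def pull_punctuation_into_span(xhtml: str) -> str:
    """Move punctuation that follows a span inside it, so it cannot wrap alone."""
    return TRAILING_PUNCT_RE.sub(r"\1</span>", xhtml)


def add_start_anchor(xhtml: str) -> str:
    """Add id="startpos" to the body tag (intended for the first text file only)."""
    return xhtml.replace('<body class="body">', '<body class="body" id="startpos">', 1)


def clean_exported_xhtml(xhtml: str) -> str:
    """Apply the two fixes that are safe to run on every text file."""
    return pull_punctuation_into_span(strip_inline_padding(xhtml))
```

For example, `clean_exported_xhtml('<p style="padding-left:30px;"><span class="bold">No</span>, thanks</p>')` gives `<p><span class="bold">No,</span> thanks</p>`. It is worth checking the result in an ereader afterwards, as a regular expression cannot tell whether moving the punctuation changes how it should be styled.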
Even though we checked and rechecked the files, we still had a final error, which rather annoyingly showed up on one of the ereaders we’d actually used to validate the files, where it had shown no problem. This was because the ePUB goes through the formatter for the ereader delivery/retail site after it is uploaded, and that formatter may add to or amend the files slightly. This is why the centered text needs more than one approach.
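Since the centered text relies on both the style.css format and a span, it can help to check that no centered paragraph has lost its span. This is a rough sketch only: the set of paragraph classes treated as centered and the span class name `centered` are assumptions for illustration (use whatever names your own style.css has), and it assumes the centered text sits in `<p>` tags.

```python
import re

# Assumptions: which paragraph classes are centered in style.css, and the
# class name given to the extra span. Adjust both to match your own files.
CENTERED_PARA_CLASSES = {"title1", "title2", "chapter-name"}
CENTERED_SPAN_CLASS = "centered"

PARA_RE = re.compile(
    r'<p\b[^>]*\bclass="([^"]*)"[^>]*>(.*?)</p>', re.DOTALL | re.IGNORECASE
)
SPAN_RE = re.compile(r'<span\b[^>]*\bclass="([^"]*)"', re.IGNORECASE)


def paragraphs_missing_centered_span(xhtml: str) -> list[str]:
    """Return centered paragraphs that contain no span with the centered class."""
    missing = []
    for match in PARA_RE.finditer(xhtml):
        para_classes = set(match.group(1).split())
        if not para_classes & CENTERED_PARA_CLASSES:
            continue  # not a centered format, nothing to check
        span_classes = set()
        for span in SPAN_RE.finditer(match.group(2)):
            span_classes.update(span.group(1).split())
        if CENTERED_SPAN_CLASS not in span_classes:
            missing.append(match.group(0))
    return missing
```

Running this over the text of each html file lists the paragraphs that still need the span adding. It does not prove the text will be centered on every ereader, so testing on devices is still needed.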
Although we weren’t very impressed with needing to use a workaround to get a useable ePUB file out of QuarkXpress, in the end it was pretty straightforward to do, and with practice relatively quick as well.
You don’t especially need QuarkXpress to generate the ePUB file. We went that route as it is an industry-standard package, but we found that the ePUB files generated by the donation-based package Calibre were also pretty clean (with little spurious html that could be rendered incorrectly by ereaders), although they did have much larger css files. In fact Calibre got one thing right that QuarkXpress didn’t: non-underlined spaces embedded next to underlined text were correct in Calibre.
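To compare the css in two exports of the same book, the stylesheet sizes can be read straight out of each ePUB, since an ePUB is a zipped file. A minimal sketch using Python’s standard zipfile module; the function name is ours, and the only assumption is that stylesheets have a .css extension.

```python
import zipfile


def css_sizes(epub):
    """Return {file name: uncompressed size in bytes} for each .css file.

    `epub` can be a path to an .epub file or an open binary file object.
    """
    with zipfile.ZipFile(epub) as archive:
        return {
            info.filename: info.file_size
            for info in archive.infolist()
            if info.filename.lower().endswith(".css")
        }
```

Calling `css_sizes` on each export and comparing the totals shows how much styling each tool generated, which is one rough indication of how much there is to go wrong on an ereader.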
Also, if you only have one or two ePUBs to convert and you do want to use QuarkXpress (or Adobe InDesign), have a look at their 30-day trial offers, or at offers on eBay first, as they are expensive packages. Both now also offer their latest packages from the cloud (SaaS), but again this is pretty expensive.
Good luck with your conversions.
- absolute position: an html command that gives a specific location for the text/picture
- article, component: QuarkXpress terms for subsections of the publication
- centered: text/pictures that appear in the middle of the page (using American spelling)
- css: cascading style sheet, a common holding place for the html commands used to control the layout
- embed: contained within
- ePUB: electronic publication, a zipped file containing files for each book chapter and layout files that make up an e-book
- html: command language within <> brackets that controls the layout of webpages and ePUB files
- html links: the web page address for the page being referred to
- h1, h3: html commands for the main heading and the (normally) smaller heading no 3
- metadata: the collection of values that are used to describe the ePUB file, its title, description, ISBN etc
- SaaS: software as a service, software that is held on the cloud
- span: an html command that controls layout for a small set of text within the commands
- toc: table of contents
Originally posted on wordpress.com on 17 August 2014 and amended 25 September 2014.
Although we don’t use QuarkXpress to build our ePubs anymore (we now use Jutoh), we’ve migrated this post from wordpress.com because it was the one we referred back to the most. We’ve kept this in case we need the procedure again.