What is Code?

A proposal by Tech-1

Challenge

Tech-1 has been tasked with converting the Bloomberg feature What is Code? into a functioning e-book.

Approach

In considering our approach to the task, we have built a clear picture of who we believe our primary and secondary markets are, and have developed our specifications from there.

We envision this text being read and used primarily as reference material by people who work with software developers in a professional setting and who wish to understand both the work being carried out and the people involved in it. To a lesser extent, we also see this text as an informative overview of code and coding that could interest a more casual observer. With these two audiences in mind, we have chosen the following specifications for the e-book:

Format — Epub

This format is well established, has broad appeal and opens access to the text across a wide range of devices. It also allows for colour images (importance discussed below), which we consider more important than the “flashy” features offered by the epub3 format. We have deliberately chosen to eschew epub3 because it limits access to a select few devices, which is an important consideration given that our target audiences are not necessarily early adopters of technology.

File Size — 4MB goal

In researching file sizes for various e-books, we found sources citing a range of 300KB to 4MB, although Amazon allows file sizes of up to 650MB. Given the image-heavy nature of the original webpage, the raw epub file we will be building on was slow to transfer in our trial run, so we will aim to reduce the file size for convenience. The file currently sits at 5MB. We intend to delete some of the images but are interested in keeping many of them, so if the trade-off is between file size and a strong visual presence on a reader, we will reassess our goal.

Visual Elements — Strong presence but still-images only and fewer than webpage

Our experience of reading the original text was that the visual elements, whether still, moving or interactive, helped both with comprehension of the concepts being explained and with alleviating some of the work of reading a heavy text. With this in mind, and given our chosen file format, we will aim to retain the visual nature of the webpage. Elements that will not translate directly will be either converted to still images or removed entirely, and we will also remove some that we do not consider strong supporters of the text. This choice also reflects the time and resource limitations we face. We will take into account the recommended image sizes for e-book readers; the original web-destined image files are already small, which helps us achieve our file-size goal.

All changes to and deletions of images will be completed in the HTML before converting to the epub file format, as we have found that the process is simpler at that stage and the images are stored in only one location. The task will be carried out as follows:

  • Create a detailed list of all images present in the original webpage
  • Categorise the list into ‘delete’, ‘alter’ and ‘keep as is’
  • Remove the ‘delete’ category of images from the HTML document and accompanying folders
  • Convert the moving images into stills by extracting frames from GIFs or screenshotting where appropriate
  • Insert the stills into the file in place of their predecessors
  • Convert the file as a whole to epub format to assess changes and adapt as necessary
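
The first two steps above could be partly automated. What follows is a minimal Python sketch under stated assumptions, not a confirmed part of the workflow: it lists every image referenced in an HTML string and flags GIFs as likely candidates for the ‘alter’ category. The sample markup, the file names and the list_images helper are all hypothetical; in practice the HTML would be read from the saved webpage.

```python
from html.parser import HTMLParser


class ImageLister(HTMLParser):
    """Collect the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def list_images(html):
    """Return the src of each image found in the given HTML string."""
    parser = ImageLister()
    parser.feed(html)
    parser.close()
    return parser.sources


# Hypothetical sample markup, not taken from the real page.
sample = '<p>Intro</p><img src="images/a.gif"><img src="images/b.png" alt="x"/>'

for src in list_images(sample):
    # GIFs are only flagged here; the delete/alter/keep decision stays manual.
    status = "alter?" if src.lower().endswith(".gif") else "review"
    print(f"{src}\t{status}")
```

The tab-separated output can be pasted into a spreadsheet, where the group can fill in the final category for each image by hand.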

Sidenotes — Transform into endnotes

The group agreed that many of the sidenotes throughout the text added texture, through both extra information and some humour. With this in mind, we would like to retain as many as possible, excluding any corrections, and change them into endnotes linked within the text. To ensure ease of navigation, we will also include a ‘return button’ to take a reader from an endnote back to the passage of text they were originally reading. This will be carried out as follows:

  • Use regular expressions to differentiate notes from corrections using the identifier <a class="footref correx">
  • Identify the notes in a similar way and move them to an endnotes section, again using regular expressions
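
As an illustration of the first step, here is a minimal Python sketch that separates note markers from correction markers. It rests on an assumption that has not been checked against the full page: that every sidenote marker is an <a> tag whose class list contains "footref", and that corrections additionally carry "correx". The sample markup, the href values and the classify_markers helper are hypothetical.

```python
import re

# Assumption: note and correction markers are <a> tags with a double-quoted
# class attribute containing "footref"; corrections also contain "correx".
FOOTREF = re.compile(r'<a\b[^>]*\bclass="([^"]*\bfootref\b[^"]*)"[^>]*>')


def classify_markers(html):
    """Return (notes, corrections) as lists of matching opening <a> tags."""
    notes, corrections = [], []
    for match in FOOTREF.finditer(html):
        classes = match.group(1).split()
        target = corrections if "correx" in classes else notes
        target.append(match.group(0))
    return notes, corrections


# Hypothetical sample markup, not taken from the real page.
sample = (
    '<p>Text<a class="footref" href="#n1">1</a> more text'
    '<a class="footref correx" href="#c1">*</a></p>'
)

notes, corrections = classify_markers(sample)
print("notes:", notes)
print("corrections:", corrections)
```

Splitting the class attribute into separate names, rather than searching for the literal string "footref correx", means a marker is still recognised if its classes appear in a different order.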

Table of Contents

We are keen to include a functioning table of contents for two reasons: first, the text may be used as reference material; second, it is a long, dense text, so we foresee easy navigation as an important tool for a reader. This will be carried out as follows:

  • Use Sigil to create header hierarchy metadata within the index file and generate the TOC from it, using <h1> for chapters and <h2> for subsections
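
Sigil builds the TOC itself, so no code is needed for this step. The minimal Python sketch below only illustrates the <h1>/<h2> hierarchy it relies on, and could serve as a quick check that the headings are tagged as intended before the file is opened in Sigil. It does not reproduce Sigil's behaviour; the heading text and the build_toc helper are hypothetical.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, title) pairs for every <h1> and <h2> element."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._level = None  # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2"):
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            # Join the pieces and collapse any runs of whitespace.
            title = " ".join("".join(self._text).split())
            self.entries.append((self._level, title))
            self._level = None


def build_toc(html):
    """Return the headings of an HTML string as (level, title) pairs."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return collector.entries


# Hypothetical sample markup, not the real chapter titles.
sample = "<h1>Chapter One</h1><p>text</p><h2>A <em>sub</em>section</h2>"

for level, title in build_toc(sample):
    print("    " * (level - 1) + title)
```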

Short Paragraph Length

Again considering readability, we intend to retain the short paragraph lengths of the original text.  The experience of reading dense text on an e-reader can be unpleasant, so the paragraph breaks are another visual aid that we consider to be important.

  • use a RegEx to find

General Styling

We aim to remain true to the original webpage in the style and formatting choices for our e-book. The original webpage feels energetic and informal, clear and to the point. Although the webpage incorporates creative illustrative and multimedia elements, the text itself is simply styled, mirroring the straightforward writing. In our e-book, we’ve chosen to use Helvetica, a clear, no-nonsense font with a modern sensibility, for our headings (14 pt) and subheadings (12 pt). We’re pairing this with 11 pt Palatino for the body of the text, a standard e-book combination. We will centre headings and subheadings on the page and set them alongside their section and chapter numbers. (Currently, the numbers are set one line above the headings themselves. This leaves a lot of blank space, which looks odd on most e-readers.)

There are a few hyperlinks in the text, mostly linking to sources. We will find these links using RegEx (perhaps by searching for “href” or “http”?) and add the sources to the end along with the other endnotes. We will also use a regular expression to remove the numerous Twitter logos that are currently peppered throughout the e-book (search for: ‘<img src="images/twitter-blk.png"’ and replace with nothing).
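
Both searches can be sketched in Python as follows. This is an illustration under assumptions rather than a tested recipe: it assumes the links use double-quoted href attributes beginning with http or https, and that each Twitter logo is a complete <img> tag pointing at images/twitter-blk.png. Unlike a search for the opening of the tag alone, the pattern here matches the whole tag through to its closing ">", so no stray attributes are left behind. The sample markup, the example.com address and both helper names are hypothetical.

```python
import re

# Matches a whole <img ...> tag whose src is the Twitter logo file.
TWITTER_LOGO = re.compile(r'<img\b[^>]*\bsrc="images/twitter-blk\.png"[^>]*>')

# Captures the address of each external link (assumes double-quoted href).
EXTERNAL_LINK = re.compile(r'<a\b[^>]*\bhref="(https?://[^"]+)"')


def strip_twitter_logos(html):
    """Remove every Twitter logo image tag from the HTML string."""
    return TWITTER_LOGO.sub("", html)


def collect_sources(html):
    """Return each external link address once, in order of first appearance."""
    seen = []
    for url in EXTERNAL_LINK.findall(html):
        if url not in seen:
            seen.append(url)
    return seen


# Hypothetical sample markup, not taken from the real page.
sample = (
    '<p>See <a href="http://example.com/a">this</a>'
    '<img src="images/twitter-blk.png" alt="">'
    ' and <a href="http://example.com/a">again</a>.</p>'
)

print(strip_twitter_logos(sample))
print(collect_sources(sample))
```

The list returned by collect_sources could then be appended to the endnotes, with duplicates already removed.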

We will centre all images and set their captions in 8 pt Helvetica Narrow. To centre the images, we will need to use a regular expression to wrap each image in a <div align="middle"> tag. We’ve tried out a few variations on how to do this most efficiently, and so far, our best RegEx is as follows:

FIND: <img src=(.+)>
REPLACE WITH: <div align="middle"> <img src=$1> </div>

Unfortunately, this expression centres most, but not all, of the images in the document. Further finessing of the expression and/or fine-tuning after the find and replace will be required for our final product, but it is a start. We have also found that we cannot rely on regular expressions when editing in Calibre, because most of the complex expressions aren’t searchable. For editing the bulk of the HTML (including the styling of the images), we’ll need to use Notepad++ (or the Mac equivalent) and then preview the result in a browser.
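
One possible cause of the missed images, not yet confirmed against the real file, is the pattern itself. The greedy ".+" runs to the last ">" on a line, so two images on the same line are treated as one, and "." does not match a line break, so a tag split across lines is skipped. The minimal Python sketch below compares the pattern above with an alternative that matches each tag separately. The sample markup and the centre_images helper are hypothetical.

```python
import re

# The pattern quoted above: "." cannot cross a line break, and the greedy
# ".+" runs to the last ">" on the line.
GREEDY = re.compile(r'<img src=(.+)>')

# Alternative: [^>]* stops at the first ">" and also matches line breaks,
# so each tag is matched separately even when it spans two lines.
IMG_TAG = re.compile(r'<img\b[^>]*>')


def centre_images(html):
    """Wrap every <img ...> tag in its own centring <div>."""
    return IMG_TAG.sub(
        lambda m: f'<div align="middle">{m.group(0)}</div>', html
    )


# Hypothetical markup: two images on one line, then a tag split over two lines.
sample = '<p><img src="a.png"><img src="b.png"></p>\n<img\n src="c.png">'

print(len(GREEDY.findall(sample)))   # number of matches for the quoted pattern
print(len(IMG_TAG.findall(sample)))  # number of matches for the alternative
print(centre_images(sample))
```

Two limitations remain. A ">" inside an attribute value would still end the match early, and running the replacement twice would wrap each image twice, so it should be applied once to a fresh copy of the HTML.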

We will make all other style changes using CSS in Calibre, so that we can preview how changes to the stylesheet affect the e-book as a whole. Adjusting the formatting with CSS only (and not HTML) should save time and ensure that formatting is uniform throughout the book. Alice will work independently on the stylesheet while the rest of the team tackles the HTML. After the HTML has been perfected, she will test her styling with the updated book to ensure that her formatting still works after these updates. We’ve allotted time in the schedule (below) to reconcile any unwanted style changes that occur.

Delegation of Tasks

Roles:

Erik — Chief Information Officer
  • First pass at text (removal of corrections and other excluded elements)
  • Creation of table of contents
  • Deletion and replacement of images as appropriate
Roshane — Marketing Director
  • Creation of marketing and distribution plan
  • CIO adviser and coding support
Zoe — Creative Director
  • Breakdown of existing visual elements (with support from Ali)
  • Transformation of moving and interactive elements into still images
  • Collaboration with CIO on images in final product
Alice — Style
  • Creation and monitoring of CSS document
  • Formatting and readability of final product
Ali — Communications Director
  • Working document control and distribution
  • Report creation and editing
  • Liaison with Marketing and Creative Directors
Lynn — Designer
  • Visual styling of documents (including report and presentation slides)
  • Creation of book cover
  • Support to Alice

Predicted Challenges

Challenges are likely to arise as we work through each stage.  At this time we foresee the following as potential issues that we will need to negotiate:

  • Limited time
  • Limited experience in some areas
  • Competing priorities for all group members and difficulty coordinating schedules
  • Social loafing and/or over-work of some members
  • Needing to accept the limitations of the project and the possibility of not achieving our original goals

Existing Resources

Our current resources stand as follows:

  • Existing knowledge of group members in various areas
  • Online pool of coding resources (created by PUB607 class)
  • Other members of PUB607 class
  • Juan
  • Book cover designer (through Lynn)

Timeline

The project will be completed and delivered according to the following schedule:

Gantt Chart