A quick primer on floppy disks

I've always been fascinated by floppy disks, from the crazy stories of Steve Wozniak designing the Disk II controller using a handful of logic chips and carefully-timed software, to the amazing tricks to create - and break - copy protection recently popularised by 4am.

I'm going to be writing a few articles about data preservation and copy protection, but first we need a short primer.

Media sizes

There were all sorts of attempts at creating disk sizes, but these were the major players:

  • 8" - The grand-daddy of them all but not used on home PC's. They did look cool with the IMSAI 8080 in Wargames tho.
  • 5.25" - 1976 saw the 5.25" disk format appear from Shugart Associates soon to be adopted by the BBC Micro and IBM PC with 360KB being the usual capacity for double-sided disks.
  • 3" - Hitachi developed the 3" drive which saw some 3rd-party solutions before being adopted by the Oric and Tatung Einstein. Matsushita licenced it's simpler cheaper version of the drive to Amstrad where it saw use on the CPC, PCW and Spectrum +3.
  • 3.5" - Sony developed this around 1982 and it was quickly adopted by the PC, Mac, Amiga, Atari ST and many third-party add-ons.

Sides

Almost all disks are double-sided, but many drives are single-sided to reduce manufacturing costs. In the case of 5.25" disks, some were made with the ability to be flipped, or kits were available to turn them into flippable disks. The 3" disk had this built in. Double-sided drives write to both sides using two heads, while single-sided drives require you to flip the disk to reach the other side.

An interesting artifact of this is that while you can read single-sided disks in a double-sided drive by flipping them as usual, a single-sided drive has challenges with disks written by a double-sided drive.

While the second side can be physically read, the data is effectively backwards because the head underneath sees the disk rotate in the opposite direction. Flux-level imagers can read these and, in theory, invert the image to compensate; computers back in the day had little chance. There is also the complication that most formats interleave data between the heads for read speed rather than writing one side and then the other. Long story short: read double-sided disks on a double-sided drive.

Tracks

Each side of the disk has its surface broken down into a number of concentric rings known as tracks, starting at track 0 on the outside and working their way in. 40 tracks is typical for earlier, lower-density media and 80 for higher density, depending on both the drive itself and the designation of the media.

Some disks provided an extra hole so the drive itself could identify whether the media was high density, while others, like Amstrad's 3" media, simply had a different coloured label even though the media itself was identical.

Some custom formats and copy protection systems pushed this number up to 41 or 42 tracks, so it's always worth imaging at least one extra track to make sure it's unformatted and you're not losing anything. Conversely, some machines, such as the C64, used fewer tracks - 35 as standard.

Sectors

Finally we have sectors, which are segments of a track. Typically a disk will have 9 or 10 sectors per track, all the same size, although some machines have more or fewer. Each sector is typically a power of 2 in length - 128 bytes through 1024 bytes (1KB) is typical, although some copy protection pushes this higher. Each sector has an ID number, and while they might be numbered sequentially they are often written out of order to improve speed in cases where the host machine can't process one read before the next sector whizzes by. By writing the sectors out of order we can optimize the layout, at least for the standard DOS/OS that will be processing them, in a technique called interleaving - a small sketch of which follows below.
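As an illustration of how an interleaved ordering can be produced (a sketch only - the sector count and interleave factor here are made-up values, not any particular machine's format):

// Sketch: lay out sector IDs around a track using a simple interleave factor.
// Values are illustrative rather than matching any specific disk format.
function interleave(sectorCount: number, factor: number): number[] {
  const layout: number[] = new Array(sectorCount).fill(-1)
  let position = 0
  for (let id = 0; id < sectorCount; id++) {
    while (layout[position] !== -1) position = (position + 1) % sectorCount // skip occupied slots
    layout[position] = id
    position = (position + factor) % sectorCount
  }
  return layout
}

console.log(interleave(9, 1)) // [0, 1, 2, 3, 4, 5, 6, 7, 8] - no interleave
console.log(interleave(9, 2)) // [0, 5, 1, 6, 2, 7, 3, 8, 4] - 2:1 interleave

With a 2:1 interleave the host gets a whole extra sector's worth of rotation to process each read before its next logical sector arrives under the head.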

Encoding

Floppy disks themselves can store only magnetic charges that are either on or off. You might imagine the computer would map a binary 1 to a magnetic charge and a 0 to no charge, but this immediately causes problems:

  • Timing - drives rotate the disk at slightly different speeds, and too few changes in the data mean we lose sync
  • Strong bits - too many on-bits together create such a strong magnetic charge that it leaks over to neighbouring areas
  • Weak bits - too many off-bits together leave such a weak magnetic charge that we pick up background noise

In order to avoid these problems, encoding schemes map the computer's binary 1s and 0s into on-disk sequences. Two of the simpler-to-explain ones are:

  • FM - Stores 0 as 10 and 1 as 11 on the disk, giving 50% efficiency
  • GCR - Stores a nibble (4 bits) as one of 16 approved 5-bit sequences on disk, giving 80% efficiency

Other schemes use different tables or smarter clocking rules - MFM, the most popular, only writes a clock bit between two zero data bits - to ensure flux transitions are spaced further apart, meaning you can write the data at twice the density and still be within the tolerances of the disk head's ability to spot transitions. A small sketch of the simpler FM scheme follows below.
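To make the FM mapping above concrete, here is a tiny sketch (illustrative only) that expands a byte into the cell pattern FM would put on disk:

// Sketch of FM encoding: each data bit becomes two on-disk bits - a clock bit
// (always 1 in FM) followed by the data bit itself, so 0 -> 10 and 1 -> 11.
function fmEncodeByte(byte: number): string {
  let cells = ''
  for (let bit = 7; bit >= 0; bit--) {
    cells += ((byte >> bit) & 1) === 1 ? '11' : '10'
  }
  return cells
}

console.log(fmEncodeByte(0xa5)) // '1110111010111011' - 16 on-disk bits for 8 data bits

Half of every pair is a clock bit, which is where FM's 50% efficiency figure comes from.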

Controller

Off-the-shelf controller chips added cost to a disk system, and so some systems - notably the Apple ][ and Amiga - handled the job with their own custom logic and software. This gave rise to some interesting disk formats and incredible copy-protection mechanisms.

Meanwhile companies like Western Digital and NEC produced dedicated floppy controller chips such as the WD1770 (BBC Master), WD1772 (Atari ST), WD1793 (Beta Disk), VL1772 (Disciple/+D) and the NEC 765A (Spectrum, Amstrad), which traded that flexibility for simpler integration.

Finally there were general-purpose processors which were repurposed for controlling the floppy such as the Intel 8271 (BBC Micro) or MOS 6502 (inside the Commodore 64's 1541 drive).

Copy protection

Many people think computers are all digital and so the only way to protect information is via encryption and obfuscation. While both techniques are used in copy protection, floppy disks themselves exist in our analogue world and are open to all sorts of tricks that make them harder to copy, from exploiting weak bits to creating tracks so long they wrap back onto themselves. Check out PoC||GTFO issue 0x10 for some of the crazy techniques used on the Apple ][.

Adding reading time to Nuxt3 content

I've been using Nuxt2 quite a bit for my sites (including this one) and am now starting to use Nuxt3 for a few new ones. I'm finding the docs lacking in many places and confusing in others, so I hope to post a few more tips in the coming weeks.

Today we have the "reading time" indicator popularised by sites like Medium.

Nuxt3

In Nuxt3, if you are using the content module, create a new file called /server/plugins/reading-time.ts with the following contents:

import { visit } from 'unist-util-visit'

// Assumed average reading speed in words per minute
const wpm = 225

export default defineNitroPlugin((nitroApp) => {
  // Runs once Nuxt Content has parsed each file into its AST
  nitroApp.hooks.hook('content:file:afterParse', (file) => {
    if (file._id.endsWith('.md') && !file.wordCount) {
      file.wordCount = 0
      // Walk the parsed body and count the words in every text node
      visit(file.body, (n: any) => n.type === 'text', (node) => {
        file.wordCount += node.value.trim().split(/\s+/).length
      })
      file.minutes = Math.ceil(file.wordCount / wpm)
    }
  })
})

This is a little more convoluted than in Nuxt2 because there we were able to look at the plain text and set a property before parsing (beforeParse); in Nuxt3, however, properties set that early do not persist all the way through to the page.

Nuxt2

For completeness, in case you are still using Nuxt2, the equivalent there was to modify nuxt.config.js to add this block:

export default {
  // ...
  hooks: {
    "content:file:beforeInsert": document => {
      if (document.extension === ".md" && document.minutes === undefined) {
        document.minutes = readingTime(document.text);
      }
    }
  },
  // ...
}

Then place this function at the end of the file:

export const readingTime = (text) => {
  const wpm = 225;
  const words = text.trim().split(/\s+/).length;
  return Math.ceil(words / wpm);
}

Displaying it

Now, anywhere on your page where you are displaying content from those markdown files you can simply do:

<p v-if="post.minutes">{{ post.minutes }} minutes</p>

I've chosen a reading speed of 225 words per minute, but obviously reading speed is highly subjective.

One flexible alternative would be to record the number of words an article has and then, in the front end, divide it by a user-configurable value. It's probably not worth the effort on a small site where somebody reads a single article, but if you're putting out a lot of related content that a visitor might browse through, it could be a nice touch - a sketch of the idea follows below.
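For example (a rough sketch - the useReadingTime composable and wherever you persist the preference are hypothetical, but wordCount is the property the plugin above already records):

import { computed, ref, type Ref } from 'vue'

// Hypothetical composable: divide the stored word count by a user-adjustable
// words-per-minute value rather than baking the minutes in at build time.
const wordsPerMinute = ref(225) // load/save this from the user's preference

export function useReadingTime(wordCount: Ref<number>) {
  return computed(() => Math.ceil(wordCount.value / wordsPerMinute.value))
}

A component would call useReadingTime with the post's wordCount and render the result in place of post.minutes.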

Estimating JSON size

I've been working on a system that heavily uses message queuing (RabbitMQ via MassTransit specifically) and occasionally the system needs to deal with large object graphs that must be processed differently - either broken into smaller pieces of work, or serialized to an external source with a pointer put into the message instead.

The first idea was to serialize all messages to a MemoryStream, but unfortunately this has some limitations, specifically:

  • For smaller messages the entire stream is duplicated due to the way the MassTransit interface works
  • For larger messages we waste a lot of memory

For short-lived HTTP requests this is generally not a problem, but for long-running message-queue processing systems we want to be a bit more careful about GC pressure.

This led me to two possible solutions:

LengthOnlyStream

This method is 100% accurate, taking into consideration JSON attributes etc., and yet requires only a few bytes of memory.

Basically it involves a new Stream class that does not record what is written, merely the length of it.

Pros & cons

  • + 100% accurate
  • + Can work with non-JSON serialization too
  • - Still goes through the whole serialization process

Source

internal class LengthOnlyStream : Stream {
    long length;

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => length;
    public override long Position { get; set; }
    public override void Flush() { }
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public void Reset() => length = 0;
    public override long Seek(long offset, SeekOrigin origin) => 0;
    public override void SetLength(long value) => length = value;
    // Discard the data and just track how many bytes were written
    public override void Write(byte[] buffer, int offset, int count) => length += count;
}

Usage

You can now get the actual serialized size with:

using var countStream = new LengthOnlyStream();
JsonSerializer.Serialize(countStream, damien, typeof(Person), options);
var size = countStream.Length;

JsonEstimator

In our particular case we don't need to be 100% accurate and instead would like to minimize the amount of work done as the trade-off.

This is where JsonEstimator comes into play:

Pros & cons

  • - Not 100% accurate (ignores JSON attributes)
  • + Fast and efficient

Source

public static class JsonEstimator
{
    public static long Estimate(object obj, bool includeNulls) {
        if (obj is null) return 4;
        if (obj is Byte || obj is SByte) return 1;
        if (obj is Char) return 3;
        if (obj is Boolean b) return b ? 4 : 5;
        if (obj is Guid) return 38;
        if (obj is DateTime || obj is DateTimeOffset) return 35;
        if (obj is Int16 i16) return i16.ToString(CultureInfo.InvariantCulture).Length;
        if (obj is Int32 i32) return i32.ToString(CultureInfo.InvariantCulture).Length;
        if (obj is Int64 i64) return i64.ToString(CultureInfo.InvariantCulture).Length;
        if (obj is UInt16 u16) return u16.ToString(CultureInfo.InvariantCulture).Length;
        if (obj is UInt32 u32) return u32.ToString(CultureInfo.InvariantCulture).Length;
        if (obj is UInt64 u64) return u64.ToString(CultureInfo.InvariantCulture).Length;
        if (obj is String s) return s.Length + 2;
        if (obj is Decimal dec) {
            // Count digits left of the decimal point; the scale byte below gives digits to the right
            var left = Math.Truncate(Math.Abs(dec)).ToString(CultureInfo.InvariantCulture).Length;
            var right = BitConverter.GetBytes(decimal.GetBits(dec)[3])[2];
            return right == 0 ? left : left + right + 1;
        }
        if (obj is Double dou) return dou.ToString(CultureInfo.InvariantCulture).Length;
        if (obj is Single sin) return sin.ToString(CultureInfo.InvariantCulture).Length;
        if (obj is IDictionary dict) return EstimateDictionary(dict, includeNulls);
        if (obj is IEnumerable enumerable) return EstimateEnumerable(enumerable, includeNulls);

        return EstimateObject(obj, includeNulls);
    }

    static long EstimateEnumerable(IEnumerable enumerable, bool includeNulls) {
        long size = 0;
        foreach (var item in enumerable)
            size += Estimate(item, includeNulls) + 1; // ,
        return size > 0 ? size + 1 : 2;
    }

    static readonly BindingFlags publicInstance = BindingFlags.Instance | BindingFlags.Public;

    static long EstimateDictionary(IDictionary dictionary, bool includeNulls) {
        long size = 2; // { }
        bool wasFirst = true;
        foreach (var key in dictionary.Keys) {
            var value = dictionary[key];
            if (includeNulls || value != null) {
                if (!wasFirst)
                    size++;
                else
                    wasFirst = false;
                size += Estimate(key, includeNulls) + 1 + Estimate(value, includeNulls); // :,
            }
        }
        return size;
    }

    static long EstimateObject(object obj, bool includeNulls) {
        long size = 2;
        bool wasFirst = true;
        var type = obj.GetType();
        var properties = type.GetProperties(publicInstance);
        foreach (var property in properties) {
            if (property.CanRead && property.CanWrite) {
                var value = property.GetValue(obj);
                if (includeNulls || value != null) {
                    if (!wasFirst)
                        size++;
                    else
                        wasFirst = false;
                    size += property.Name.Length + 3 + Estimate(value, includeNulls);
                }
            }
        }

        var fields = type.GetFields(publicInstance);
        foreach (var field in fields) {
            var value = field.GetValue(obj);
            if (includeNulls || value != null) {
                if (!wasFirst)
                    size++;
                else
                    wasFirst = false;
                size += field.Name.Length + 3 + Estimate(value, includeNulls);
            }
        }
        return size;
    }
}

Usage

var size = JsonEstimator.Estimate(damien, true);

Wrap-up

I also had the chance to play with Benchmark.NET and to look at various optimizations of the estimator (using a Stack<T> instead of recursion, reducing foreach allocations, and a simpler switch that did not involve pattern matching), but none of them yielded positive results on my test set - I likely need a bigger set of objects to see the real benefits.

Regards,

[)amien

Using variable web fonts for speed

Webfonts are now ubiquitous across the web to the point where most of the big players even have their own typefaces and the web looks a lot better for it.

Unfortunately the problem remains that either the browser has to wait before it draws anything while it fetches the font, or it renders without the font and then re-renders the page once the font is available. Neither is a great solution, and Google's PageSpeed will hit you for either, so what is an enterprising web developer to do?

I was tasked with exactly this problem while redesigning my wife's web site - MKG Marketing Inc. Being brought in right at the start, while design and technologies were still on the table, gave me enough control to mitigate this quite well.

What we need to do is reduce the amount of time it takes to get this font on the screen and then decide whether we want to go with the block or swap approach.

Variable fonts are a boon

The first opportunity here was to go with a good variable font.

Variable fonts - unlike traditional static fonts - have a number of axes, which you can think of as sliders that push and pull the design.

The most obvious one is how light or bold a font is - rather than just, say, a light (200), a normal (400) and a bold (700), the browser can make a single font file perform all these roles and anything in between. Another might be how narrow or wide a font is, eliminating the need for separate condensed or expanded fonts; another might be how slanted or italic the design is, or even whether a font is a serif or a sans-serif... and plenty of small serifs in between.

This means that instead of having to load three or more web fonts we need just one, which will decrease our page load time.

Getting a variable font

Google Fonts helpfully has an option to "Show only variable fonts"; however, it is important to note that while Google can show you them, it will not serve the actual variable font to your site! Instead it uses the variable font to make a set of static fonts to serve up, which defeats a lot of the point of using them.

Anyway, feel free to browse their site for a variable font that looks great on your site. You can even use their CSS temporarily to see what it would look like. Once you're happy, it's time to head into the font information to see who created the font and where you can get the actual variable font file from.

For example, the rather nice Readex Pro says in its about section to head over to their GitHub repository to contribute.

Unlike most open source code projects, binaries for fonts often get checked in, as users don't typically have the (sometimes commercial, such as Glyphs or FontLab) tool-chain to build them. This is good for us: inside the /fonts/variable/ folder we see two .ttf files. You will note that one filename includes [wght], which means it contains the weight axis. The second in this case is [HEXP,wght], where HEXP is for "hyper-expansion". Choose the one that has just the axes you need.

You can also check the download link from Google Fonts - that often includes the full variable font they are using, though it will probably contain much more than you want. Alternatively, some fonts such as Recursive even have an online configurator that lets you choose which options you want and provides a ready-to-go, fully-subset and compressed woff2 (so you can skip straight to serving the font!)

Preparing the font

Now, you could just load that .ttf file up somewhere, but there are still some optimizations we can do. We want to shrink this file down as much as possible, and there are two parts to that.

Firstly we want to strip out everything we don't need, a process known as subsetting. In the case of this font, for example, if I were using it on damieng.com I could strip out the Arabic characters, as I do not write anything in Arabic (browsers will fall back to an installed font that does should they need to, such as when somebody enters Arabic into your contact form).

Subsetting the font

The open source fontTools comes to the rescue here. I installed it in WSL but Linux and other operating systems should be the same:

pip install fonttools brotli zopfli

The subset command has a number of options to choose what to keep. Unfortunately there is no easy language option, but you can specify Unicode ranges with --unicodes=, text files full of characters with --text-file=, or simply provide a list of characters with --text=. If you go the Unicode range route then Unicodepedia has some good coverage of what you'll need.

For this site we just needed Latin and the Latin-1 supplement, so:

fonttools subset ReadexPro[wght].ttf --unicodes=U+0020-007F,U+00A0-00FF --flavor=woff2

Which produces a ReadexPro[wght].subset.woff2 weighing in at just 16KB (down from 188KB).

Serving the font

Now at this point you could just upload the woff2 file to your CDN and reference it from your CSS.

Your CSS would look like this:

@font-face {
  font-family: "ReadexPro";
  src: url(/fonts/readexpro[wght].subset.woff2) format("woff2");
  font-weight: 100 900;
}

The important bit here is font-weight: 100 900 which lets the browser know that this is a variable font capable of handling all the font weights from 100 to 900. You can do this with other properties too:

Axes     CSS property   Named example        Numeric example
[slnt]   font-style     normal oblique       normal oblique 30deg 50deg
[wdth]   font-stretch   condensed expanded   50% 200%
[wght]   font-weight    thin heavy           100 900

Do not instead create a @font-face per weight all linking to the same URL, as browsers are often not smart enough to de-duplicate the requests and you're back to effectively waiting for multiple resources to load.

Now, this might be fast enough - you could measure it and test - but this font is only 16KB so there's one more trick we can do.

With the URL in the CSS the browser must first request the HTML, then the CSS, and only then the font. If you're smart you'll have them all on the same host so at least there will be no extra DNS lookups slowing things down. (Also note that browsers no longer reuse locally-cached copies of popular files from well-known CDNs, in order to provide better security, so it really is worth hosting your own again - especially if you have a CDN such as Cloudflare or CloudFront in front of your site.)

But we can do better!

Base64 the font into the CSS!

By Base64-encoding the font you can drop it directly into the CSS, removing a whole other HTTP request and wait! Simply encode your woff2 file using either an online tool like Base64Encode.org (click "click here to select a file", choose your .woff2, then click Encode and download the encoded text file) or a command-line tool such as base64. I used WSL again with:

base64 readexpro[wght].subset.woff2 -w0 | clip.exe

Which stuffs the base64 encoded version into the clipboard ready to be pasted.

Now change your @font-face replacing the url with a data section, like this:

@font-face {
  font-family: "ReadexPro";
  font-display: block;
  font-weight: 100 900;
  src: url(data:application/x-font-woff;charset=utf-8;base64,d09GMgABAAAAAEK0ABQAAAAAgZAAAEJFAAEz
  [truncated for this blog post]AQ==) format("woff2");
}

And hey presto! You can now happily slip that font-display: block in as well, as the font is always going to be available at the same time as the CSS :)

Hope that helps!

Regards

Damien