Just how big IS a terabyte?

December 21, 2016

Being an old guy in computing terms, I still remember the occasion when one of my colleagues told me that they’d bought a new PC with a gigabyte hard drive.  That’s 1 GB.  1,000 MB.  It was, I believe, back in 1994.  They’d got themselves a new home machine with a 60 MHz Pentium (built on an 800 nm process) and a 1 GB hard disk.  How they gloated.  How envious was I, stuck with my 500 MB disk and a 486DX2-66 processor?

These days, we have 512 GB SD cards, which are merely the size of a postage stamp, available for only GBP 314.  A terabyte version has recently been announced by SanDisk.

The sheer size of today’s storage devices means that people no longer appreciate just how vast those stores are!

Here comes the science part: An SI terabyte, which is a usual measure of persistent storage, is 1,000,000,000,000 (10¹²) bytes.  Each byte is 8 bits, and can encode a single character in ASCII or EBCDIC schemes – let’s ignore internationalisation and multi-byte Unicode for now as that just complicates things.

Note that this is distinct from a CS tebibyte, which is 1,099,511,627,776 (2⁴⁰) bytes and is the more usual measurement for memory.
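To make the gap between the two units concrete, here is a minimal Groovy sketch (the variable names are mine, purely for illustration) that works out both byte counts and how much bigger the binary unit is.

```groovy
// Decimal (SI) terabyte versus binary tebibyte, both in bytes.
long terabyte = 1000000000000L          // 10^12
long tebibyte = 1L << 40                // 2^40

long difference = tebibyte - terabyte
double ratio = tebibyte / (double) terabyte

println "Terabyte:   ${terabyte} bytes"
println "Tebibyte:   ${tebibyte} bytes"
println "Difference: ${difference} bytes"
println "A tebibyte is " + String.format('%.2f', (ratio - 1) * 100) + "% larger"
```

So a tebibyte is nearly 10% larger than a terabyte, which is worth remembering when a "1 TB" disk appears to come up short.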

So, to put these numbers into some perspective, just how big IS a terabyte?

It’s quite common to talk about it in terms of hours of music or video but that is really kind of misleading… sure you can imagine a stack of 213 Blu-ray movie discs, two and a half years of continual MP3 music or nearly 2 weeks of worldwide tweets but what does that actually mean in more visceral terms?

Let’s work it out in terms of space and time using, whilst they still exist as physical artefacts, traditional paperback books.  For this example, let’s use Tolstoy’s War and Peace to generate the numbers. We’ll do time first.

War and Peace has a word count of 587,285.  It’ll depend on the particular translation, I guess, but let’s use that as a working number for now.

The average typing speed for a typical person is around 40 words per minute (but obviously more for a professional typist).  That means that it would take someone, maybe you, around 245 hours to type a copy of War and Peace.  We can convert time into money too – the minimum wage is  around GBP 7 per hour, so the opportunity cost of the time taken for that typing would be around GBP 1,715.  That represents about 6 weeks of work effort.

The English language has around 4.79 letters per word on average but don’t forget the spaces and punctuation too – so make it 5.79 characters per word; 5.79 x 587,285 = 3,400,380 bytes.  A Project Gutenberg copy of War and Peace is 3,359,550 bytes so that’s quite a good correlation.  Let’s just use 3.4 MB to make the numbers easier.
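The single-copy numbers are easy to check. This is a rough Groovy sketch rather than anything definitive; it assumes 5.79 characters per word (4.79 letters plus one for the space or punctuation), which is the figure that reproduces the 3,400,380 bytes quoted above, and the variable names are my own.

```groovy
// One copy of War and Peace: typing time, labour cost and size in bytes.
int words = 587285               // word count used in the post
int typingWpm = 40               // average typing speed, words per minute
int hourlyWage = 7               // GBP per hour, roughly the minimum wage
double charsPerWord = 5.79       // assumption: 4.79 letters + 1 space/punctuation

double hours = words / (typingWpm * 60.0d)
long roundedHours = Math.round(hours)
long cost = roundedHours * hourlyWage
long bytes = Math.round(words * charsPerWord)

println "Typing time: ${roundedHours} hours"
println "Labour cost: GBP ${cost}"
println "Size:        ${bytes} bytes"
```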

A terabyte is 1,000,000 MB so it could hold 294,118 copies of War and Peace.  As calculated above for a single copy and multiplied up, that represents around  72,058,910 hours of human effort and so a labour value of about GBP 504,412,370.  (You can see why photocopiers took off…)

Those 294,118 copies would hold about 368,529,854 pages (see below), each with roughly 470 words on average.  Every adult in the UK could have 7 pages dedicated to them.

A terabyte is worth over 500 million pounds sterling of minimum wage workers’ typing effort and, at 8 hours a day for 260 days a year, would take 34,644 years to complete.
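Scaling the single-copy figures up to a terabyte is just a few multiplications. Here’s a minimal Groovy sketch, assuming the rounded inputs used above (3.4 MB and 245 hours per copy, GBP 7 per hour, 8-hour days and 260 working days a year); the variable names are mine.

```groovy
// How many copies fit in a terabyte, and what typing them would cost.
long copies = Math.round(1000000 / 3.4d)      // 1 TB = 1,000,000 MB, 3.4 MB per copy
long hoursPerCopy = 245
long hourlyWage = 7                           // GBP

long totalHours = copies * hoursPerCopy
long labourValue = totalHours * hourlyWage

long workingHoursPerYear = 8 * 260
long years = Math.round(totalHours / (double) workingHoursPerYear)

println "Copies per TB: ${copies}"
println "Typing hours:  ${totalHours}"
println "Labour value:  GBP ${labourValue}"
println "Working years: ${years}"
```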

Best get some help to finish that before lunch.

The adult population of the UK is about 52 million so, if they all mucked in, they should get the job done in just under an hour and a half by typing some 3,325 words each, which just so happens to be around 7 pages of content. How convenient.
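As a sanity check on the "everyone mucks in" scenario, here is a small Groovy sketch. It assumes exactly 52 million typists and the word and page counts used earlier, so it lands a few words away from the rounded figure quoted above; the names are only illustrative.

```groovy
// Share a terabyte of typing across the UK adult population.
long totalWords = 294118L * 587285L      // copies per TB x words per copy
long totalPages = 368529854L             // copies per TB x average page count
long typists = 52000000L                 // assumption: 52 million adults exactly

double wordsEach = totalWords / (double) typists
double minutesEach = wordsEach / 40      // at 40 words per minute
double pagesEach = totalPages / (double) typists

println String.format('%.0f words each, about %.0f minutes, %.1f pages',
                      wordsEach, minutesEach, pagesEach)
```

That is a little over 3,300 words, just under an hour and a half, and about 7 pages per person.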

It’s also worth noting that it would take 34.6 years of work for that average typist to fill a single GB, which holds 294 copies of the book.  This is why, when everything was held as plain ASCII text files, even 500 MB felt cavernous to most people!

People read, on average, at about 250 words per minute so it’d take around 5.6 years to just read a GB of text, and 5,623 years to read a TB. (Assuming a 9-5 job, 5 days a week.)

So that’s time, and by extrapolation, money.  Now for space.

Copies of War and Peace differ in physical size and page count depending on extraneous supporting content and the typeface used.  Here are three from Amazon; we’ll take an average of them as our working numbers.

ISBN-13          Pages   Dimensions (cm)      Volume (cm³)
978-1853260629   1,024   12.2 x 5.6 x 19.8    1,352.74
978-0140447934   1,440   13.0 x 6.1 x 19.8    1,570.14
978-0099512246   1,296   13.1 x 5.5 x 21.6    1,556.28
Averaged         1,253   12.8 x 5.7 x 20.4    1,493.19

We’ll just assume that the extraneous text is not material to the argument and stick with the 3.4 MB data per edition.

So the printed information density of a War and Peace paperback book is around 2,277 bytes per cubic centimetre, or 2.28 KB/cm³.  This equates to about 2.28 GB/m³.  (You can see why electronic storage is so popular…)

There are 1,000 GB in 1 TB, so a TB in such a printed form would take up around 439 cubic metres and neatly contain those 294,118 copies of War and Peace.  That’s a cube 7.6 metres (or about 25 feet) on each side.
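The density and volume figures can be reproduced in a few lines. This is just a sketch in Groovy under the same assumptions as above (3.4 MB per copy and the averaged 1,493.19 cm³ paperback); the variable names are mine.

```groovy
// Information density of a paperback and the volume of a printed terabyte.
double bytesPerCopy = 3.4e6              // 3.4 MB per copy
double volumePerCopyCm3 = 1493.19        // averaged paperback volume

double bytesPerCm3 = bytesPerCopy / volumePerCopyCm3
double gbPerM3 = bytesPerCm3 * 1e6 / 1e9 // 1,000,000 cm3 per m3, 10^9 bytes per GB
double m3PerTb = 1000 / gbPerM3          // 1,000 GB per TB
double cubeSide = Math.cbrt(m3PerTb)     // side of an equivalent cube, in metres

println String.format('%.0f bytes/cm3, %.2f GB/m3', bytesPerCm3, gbPerM3)
println String.format('1 TB = %.0f m3, a cube %.1f m on each side', m3PerTb, cubeSide)
```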

It’s still a little abstract so let’s make it more concrete.

A community swimming pool is generally about 25m long and 13m wide. It obviously varies depending on the number of lanes but this is roughly what you’d expect.  Depth varies too but, for this, imagine a 2m depth throughout – no shallow end or diving area!

That means that a terabyte of books would fill such a community pool to a depth of 1.35 metres or nearly 4.5 feet.  Each pool could hold the equivalent of around 1.5 TB in printed form.

A 6 TB disk, now commonly available for under GBP 200, holds enough data that, if printed in a paperback novel form, would fill 4 standard swimming pools, or a 50m Olympic pool to a depth of 2.1 metres.
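For anyone who wants to try other pool sizes, here is a hedged Groovy sketch of the swimming pool sums. It assumes 439 m³ per printed terabyte, the 25m x 13m x 2m community pool described above and a 50m x 25m Olympic pool (the 25m width is my assumption, as the text only gives the length).

```groovy
// Printed terabytes versus swimming pools.
double m3PerTb = 439                       // volume of 1 TB in paperback form
double poolArea = 25 * 13                  // community pool surface, m2
double poolVolume = poolArea * 2           // 2 m deep throughout

double depthForOneTb = m3PerTb / poolArea  // metres of books in one pool
double tbPerPool = poolVolume / m3PerTb

double sixTbVolume = 6 * m3PerTb
double poolsForSixTb = sixTbVolume / poolVolume
double olympicDepth = sixTbVolume / (50 * 25)   // assumed 50 m x 25 m pool

println String.format('1 TB fills a community pool to %.2f m', depthForOneTb)
println String.format('One pool holds about %.1f TB', tbPerPool)
println String.format('6 TB fills %.1f pools, or an Olympic pool to %.1f m',
                      poolsForSixTb, olympicDepth)
```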

Hopefully that gives you a feel for just how much data a terabyte, or even a mere gigabyte, actually is, when it represents textual information rather than high-definition video or music content.

As an aside (and this is highly speculative and could be utterly wrong in so many, many ways), just how small a volume can a TB be stored in? Well, it’s looking like a holographic universe (if that’s an actual thing) stores one bit in an area one Planck length per side, or 2.56E-70 m².  If we want to store 8E12 bits we’d need 20.48E-58 m², or a sphere at least 2.55E-29 metres in diameter surrounding the information.  That’s the size of the black hole formed by a mass of 8.9 g packed into a singularity, or something like that anyway. Of course, we’d likely also need sufficient information to be held to fully describe the substrate containing the data, which would push out the size very substantially.  As the man said, “there’s plenty of room at the bottom”.

Rolling the Groovy Dice

January 6, 2013

Bit of a throwaway post but it might save someone 5 minutes…  Nothing to do with Project Euler.

I needed to have a quick way of rolling an arbitrary number of polyhedral dice including the ability to take a number of the highest from all those rolled.

I did this in Groovy by adding an overloaded roll() instance method to Integer along with a helper isPolyhedralDie() method (but only to catch me making some sort of basic typo in the calling code).

// Guard against typos in the calling code: only accept the usual dice.
Integer.metaClass.isPolyhedralDie = { ->
  (delegate as Integer) in [ 3, 4, 6, 8, 10, 12, 20, 100 ]
}

// 6.roll() rolls a single die.
Integer.metaClass.roll = { ->
   delegate.roll(1)
}

// 6.roll(4, 3) rolls 4 dice and sums the best 3; best defaults to all dice.
Integer.metaClass.roll << { dice, best = dice ->
   assert delegate.isPolyhedralDie() && (best > 0) && (dice >= best)
   def ( sides, rnd ) = [ delegate, new Random() ]
   (1..dice).collect { rnd.nextInt(sides) + 1 }
            .sort().reverse()[0..<best].sum()
}

// Generate 4x 3/4d6 character attribute sets

4.times {
  def r = (1..6).collect { 6.roll(4,3) }
  r.countBy { it }.sort { a, b -> b.key <=> a.key  }
   .each { k, v -> print "${v}x $k, " }
  println "Sum = ${r.sum()}"
}

Usage:

Integer#roll()                   // 6.roll() = roll 1d6
Integer#roll(noOfDice)           // 6.roll(3) = roll 3d6
Integer#roll(noOfDice, noOfBest) // 6.roll(4,3) = roll 4d6 take 3
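If you’d rather not touch Integer’s metaClass, the same "roll N, keep the best K" idea can be sketched as a plain script method. This is a hypothetical variant rather than what I actually used: the rollBest name and the optional Random parameter are my own additions, the latter so that a seeded generator gives repeatable results when testing.

```groovy
// Hypothetical helper: roll `dice` dice with `sides` sides and
// return the sum of the `best` highest results.
int rollBest(int sides, int dice, int best, Random rnd = new Random()) {
    assert sides > 1 && best > 0 && dice >= best
    def rolls = (1..dice).collect { rnd.nextInt(sides) + 1 }
    rolls.sort().reverse().take(best).sum() as int
}

// Seeded, so the same total comes out on every run.
def total = rollBest(6, 4, 3, new Random(42))
println "4d6 take 3 = $total"
```

Passing the Random in makes the helper easy to test, at the cost of losing the rather pleasing 6.roll(4,3) syntax.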

A 360° View of 3D

September 21, 2011

The time has come at last to splash out on one of the new-fangled plasma or LED TV sets.

My ancient 28" Panasonic QuintrixF box with the excellent Tau flat-screen, having done 10 years of faithful service (albeit with a couple of repairs during that time), has started to show tell-tale signs of impending failure – the Dolby Surround circuitry lost the centre channel a year ago and the picture does an occasional wobble at the top.

I would get it repaired but I’ve a feeling that finding parts might be problematic now and, besides, there are some excellent new TVs on the market (or so I’m told) that should do the job AND they’re 3D capable too.

I’ve put off buying a new TV over the last few years, even when true 1080p HD came out as the picture quality on these sets, even with all their HD-ness, just wasn’t up to the QuintrixF Tau screen. When it came to watching standard definition (SD) video on them, they couldn’t hold a candle to my Panasonic CRT TV.

But it’s time to look around.  I’m a bit of a sceptic when it comes to the marketing hype of consumer electronics and like to see the products in real-life use.

The up-market screens from pretty much all the major manufacturers are very capable now, though I’d argue that their HD picture quality is no better than that of my current TV, and they are much improved at up-scaling SD signals so as to avoid that blurry image especially prevalent on low-bitrate channels.

Buying such an up-market screen generally means getting a 3D-capable set as the manufacturers pair their best panels with the best electronics.  I imagine that this seriously over-weights the statistics of people "buying into" 3D technology over the last year or so.

With this in mind, I’ve spent the last few months pestering friends and relatives who did take the plunge last Christmas and bought a 3D-capable set back then.  So, having now watched quite a bit of 3D footage courtesy of their hospitality, I’ve made some observations of the medium.

Apart from the technical issues, such as faded colours, image cross-talk, the physical restrictions of passive 3D glasses and the costs of active 3D glasses that people often talk about, there are four perceptual problems of 3D that I’ve noticed.  This is just my opinion, but it’s my blog, so here they are:

Lack of Near-Field Parallax
I’m not sure if this is what it is really called but it seems quite descriptive of the effect.  When an object is in the foreground and is apparently close to the viewer, if that viewer moves their head they’d expect the object to move in relation to the background, revealing new areas and obscuring others.  Obviously as the picture is actually flat and only presents a static 3D view this doesn’t happen and spoils the immersive experience.

This isn’t such a problem with more distant objects as a viewer expects much less parallax to occur between such an object and the background.  The best way around this seems to be to sit still and watch TV as if you’re in the cinema – maybe some people do this but, personally, I don’t.

Assuming an Infinite Depth of Field
In standard 2D content the director typically denotes the main subject of the scene by focusing on them with a shallow depth of field. The background appears deliberately out of focus but that’s OK as it’s pretty much how our eyes work.

In 3D content there seems to be a preference to demonstrate its 3D nature by having an infinite depth of field with everything in focus, even when a "special" 3D effect is used, as is the wont in the traditional realm of 3D horror films.  This practice makes the next two effects seem worse than they might otherwise be.

Billboarding
I don’t believe that this is a correct term but it’s quite descriptive of the apparent effect.  In short, current 3D technologies don’t seem to be capable of providing a completely smooth depth transition between objects in a scene.  I don’t know if this is a feature of the cameras or the TVs.

This shortcoming leads to "billboarding" – in that it appears that a 2D picture of a 3D scene has been pasted onto a billboard which is then set at a distance into the scene. This technique is used in video game development to reduce the computational complexity of rendering.

The scene is seen to be composed of a set of these "planes" rather than being smoothly graduated.  In the worst cases, this effect is even seen on single objects. For example, on one demo the viewer was taken across the Ponte Sant’Angelo in Rome (which I recommend visiting sometime) and there were close-ups of the 17th-century angelic statues.

Unfortunately, although the picture was excellent, the statues appeared dislocated. Where a hand was pointing “out” of the screen it was disjointed from the forearm, which was on another "plane", and this extended to the shoulder, wings, etc. It was as if it was a Channel 4 ident with pieces of stone suspended on invisible wires and visible as a solid body only when viewed from a specific angle.

Focal Distance Adjustment
This seems to be sort of related to the "Infinite Depth of Field" issue I’ve mentioned above.  In order to "believe" a 3D scene, it seems that my eyes have to be fooled into assuming that the objects are a certain distance away.  When the scene shifts so that the "focus" is now apparently at a different distance but remains in focus without my eyes having to re-adjust, it seems wrong. At best this just destroys the illusion of 3D until I "lock in" again but, at worst, it makes me feel a little seasick!

So what do I believe this means for 3D TV?

The technology will continue to improve so the technical concerns will probably be consigned to history within the next few generations of the systems themselves.

The limitations of physical equipment will be solved as, I imagine, should the "billboarding" effect described above – unless this is actually some form of physiological limitation of human cognitive processing.

The issue with lack of near field parallax will probably drive the adoption of VERY large 60”+ screens that can be placed farther away from the viewer so that the problem is just less pronounced.  Unless the TV can generate a private 3D image for each viewer and use some form of individual eye tracking I don’t see how else this might be tackled.

The other issues are really a matter of directing style.  The modern way of shooting 2D TV seems to be to use close-in cameras with rapidly changing viewpoints in order to engage the viewer and make them feel like they’re actually part of the action.  What works well for 2D simply doesn’t work for 3D.  The rapidly changing perspective just leads to disorientation and destroys the immersive 3D experience.

I believe that, for 3D, viewers need to be treated more like a theatre audience as passive onlookers onto a scene.  Viewpoints need to be established and held in order for the audience to “lock in” to the scene’s perspective.  This means that the 2D version of a film won’t just be a single-eye image of a 3D film but, for a large part, a differently shot piece of work.  Obviously this will push up production costs.  I don’t like to think that we’ll be seeing 2D movies going the same way as black and white movies and only being shot as art nouveau retro pieces.

In short, 3D holds a lot of promise, especially in the gaming market where the scene is being generated on the fly, but for general viewing I’m really not convinced that there’s a great need at the moment (sport may be an exception though) until the content production industry works out how to handle the artistic differences between 2D and 3D in order to get the most from both mediums.  There is some fantastic 3D content out there but not enough to warrant buying a 3D set specifically to see it.

I was interested to see that JetBrains, creators of the splendid IntelliJ IDEA IDE amongst other things, are working on their own “better-Java-than-Java” language, Kotlin.

Ostensibly this is to resolve some of the “issues” that the Java language has, whilst being simpler than the currently strongest competitor, Scala.  Kotlin is going to be supported as a first-class citizen in the IDEA IDE from around the end of 2011.

There’s a lot of talk on various fora about the need (or lack of) for introducing a new language when others like Scala, Groovy and Clojure are becoming more established.  This is similar to the fuss back in April 2011 when Gavin King “leaked” the news that Red Hat were working on their “Ceylon” JVM-hosted language.  The public launch of Ceylon is likely to be in the same sort of timescale.

Most of the chatter seems to be about the benefits around the features of the languages, semantic overhead of learning, efficiency of execution, dilution of Java skillsets, etc.  All are interesting and sound discussions.

What I’m not seeing is anyone talking about the pure commercial value of a company such as JetBrains or Red Hat launching their own languages focused on their own ecosystems.  I don’t believe these guys are doing it for the love of the Java community. They’re doing it for cold, hard cash… and why not?

It’s interesting that JetBrains are saying that Kotlin is intended for “industrial use”.  By this I’m expecting the real value-add features (which must be compelling) to be supported in their paid-for “Ultimate” edition of IDEA rather than the free “Community” version and they’re targeting the “enterprise” market.

They would need to convince “corporate” development managers of the benefits of Kotlin, allay their concerns about the risks of selecting a single-vendor language and get it embedded into their companies in order to monetise the investment in developing it.

That’s not to say they’re wrong to try – look at how successful VB6 was and that’s a single company language supported by a strong IDE!

They’ll have to be quick though.

Java 7 has just been released and Java 8 is due in late 2012.  Whilst J7 addresses quite a few “niggles” with Java (including reducing some noise around Generics and the equivalent of C#’s “using” statement) and adds some good features to the JVM, it’s not the complete “Project Coin” package.

The “big thing” being sold in all these new languages is Lambdas/Closures, which won’t come until J8.  (You can see the split of features here.) This means that they’ll have 12 months to get some traction before the “official” Java language supports this.

Personally, I don’t believe that having closures in an OO imperative language is that important.  Useful, of course, but not the end of the world if they’re lacking.  There didn’t seem to be much demand for them in Java before Ruby on Rails became popular in 2006 and there was an outbreak of scripting-envy across the Java community.

Of course, with the JVM bytecode instruction invokedynamic being added in Java 7, dynamic languages such as JRuby should become much more performant so, if you really want closures on the JVM, why not just use that?

I think it’s disappointing that the official Java/JVM ecosystem has been on a bit of a hiatus for the last 5 years (Java 6 was released in 2006) but it’s kind of understandable given the situation with Sun and Oracle.  It’s also disappointing that Coin was split across two releases as this will delay adoption too and, in the Java world, allow third-parties to fill the perceived gap with semi-official solutions in the interim.

It will be interesting to see how this all plays out.  I do hope it doesn’t distract JetBrains or Red Hat/JBoss from continuing to provide world-class platform support for the official Java stack in their products.

The Tau of Pi

April 14, 2011

I came across this interesting article – The Tau Manifesto.

It’s proposed that the stalwart constant of geometry – Pi (π) – be replaced by a new, more natural constant – Tau (τ) – which is simply Circumference/Radius and so is equivalent to 2π. They make quite a convincing argument as to why this should be the case.

Those multiples of 2π in equations always made me twitchy… and now I realise why!

IE9 previewed on BBC.co.uk

September 15, 2010

Great. Now Internet Explorer can consume 100% of my PC’s resources. Like it doesn’t do that already…

Hello world!

July 24, 2009

Welcome to my blog!

This blog is going to be about computing topics and what I think about them.

Over the last 9 months I’ve been asked repeatedly about my opinions on various things. Like do I prefer Agile over Waterfall as a development methodology? What do I think about dynamic languages in an “enterprise” environment? How do I think Cloud Computing is going to change the game?

I figured that I haven’t really been following the true DRY (Don’t Repeat Yourself) principle by, well, repeating myself!

I’ll try and capture my thoughts here and so make them generally available. That way, even people that haven’t asked the questions will get to know what I’d answer if they’re sufficiently curious.