Posts tagged with analytics

Notes on Edward Tufte’s Presenting Data and Information

Here are my notes from today’s event by renowned statistician Edward Tufte, author of The Visual Display of Quantitative Information and Envisioning Information. They are primarily for my own reference but perhaps of interest to others.

A dramatic start

No announcement, no preamble. The lights went out and a visually striking video showing a representation of music started. Conversations were immediately hushed and devices put away. An effective technique to get attention and signal an absolute start.

Charts and tables

  • Sorting: Find a sort for your data that makes sense. Treat it as another axis and don’t waste it with the alphabet.
  • Sparse columns: Remove sparsely populated columns from tables. Special events should be specially annotated.
  • Linking lines: Always annotate them to describe the interaction. Prefer verbs over nouns, as nouns are merely a taxonomy.

Information does not fit in a tree. The web is successful because Tim Berners-Lee understood this and made links the connections between content. His original proposal was famously annotated “Vague, but exciting”.

Data

Content is not clean. Data that shows behavior in a perfect way has likely been manipulated.

Human beings over-detect clusters and conspiracies. They find links between unrelated events especially in sequences (serial correlation). Sports commentators given any series of scores will develop a false narrative to explain it. They’ll find a reason for 7 wins in a row despite random data producing such sequences.
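The claim about streaks is easy to check. Below is a minimal Python sketch, not something shown at the event: it simulates seasons of pure coin-flip results and counts how many contain a run of at least 7 wins. The season length of 82 games, the 50% win chance and the function names are assumptions made for illustration.

```python
import random


def longest_streak(results):
    """Length of the longest run of consecutive wins (True values)."""
    best = current = 0
    for won in results:
        current = current + 1 if won else 0
        best = max(best, current)
    return best


def share_of_seasons_with_streak(games=82, streak=7, trials=10_000, seed=1):
    """Fraction of simulated coin-flip seasons with a win streak of at least `streak`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        season = [rng.random() < 0.5 for _ in range(games)]
        if longest_streak(season) >= streak:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    share = share_of_seasons_with_streak()
    print(f"{share:.1%} of purely random seasons contain a 7-win streak")
```

No skill, momentum or narrative is modelled here, yet a noticeable share of seasons still contain such a streak.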

Self-monitoring is a farce because people can’t keep their own score. Once something is measured it becomes a target and will be subsequently gamed and fudged as needed.

You can make many models to fit any data you are given. A model may work well for past and current data, but how long it keeps working is highly variable. This effect is referred to as shrinkage – no model lasts forever.
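As a rough illustration of a model that fits the past perfectly and then fails, here is a small Python sketch. The data is made up (a true trend of y = 2x plus a little invented noise), and the two models, a polynomial forced through every point and a plain least-squares line, are my choice of example rather than anything from the talk.

```python
def interpolate(xs, ys, x):
    """Evaluate at x the polynomial that passes through every (xs, ys) point (Lagrange form)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns a function of x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return lambda x: mean_y + slope * (x - mean_x)


# Hypothetical data: the true trend is y = 2x, plus invented noise.
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3]
xs = list(range(8))
ys = [2 * x + e for x, e in zip(xs, noise)]

line = fit_line(xs, ys)
worst_past_error = max(abs(interpolate(xs, ys, x) - y) for x, y in zip(xs, ys))

print("polynomial, worst error on past data:", worst_past_error)
print("true trend at the next point:", 2 * 8)
print("polynomial prediction at the next point:", interpolate(xs, ys, 8))
print("line prediction at the next point:", line(8))
```

The polynomial reproduces every past point exactly but lands far from the trend one step later, while the simpler line stays close.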

Big data is not a substitute for traditional data collection and analysis. Google famously assumed it was when they created Google Flu Trends, which tried to spot the spread of flu based on search terms. It has been seriously criticized by Forbes and the New York Times.

Conflict

Do not jump to conflict or character assassination. Your motives are likely no better (or worse).

How many nice comments does it take to wipe out a bad one? Ten… a hundred?

There is evil in the world but it probably does not exist in your day-to-day life.

A deck of slides

A deck is inefficient. It is easy for the presenter but hard for the audience, who are waiting for something they can use: “a diamond in the swamp”. Slow reveals further reduce the information density, and people will check out when it gets low.

Prefer spatially adjacent data (a document) over temporally stacked (slides). The often-cited limit of 7±2 items was for temporal retention so limiting a page to this number of items is actually the opposite of what that research was telling us. We can cope with much more data if it is all on-page together.

Meetings and presentations

Do not be afraid of paper.

Prepare a document in advance but do not send it. Instead, spend 30 minutes at the start of the meeting reading it in silence (known as a study hall). People can read faster than you can talk and can go back and forth as needed, skipping what they already know, and latecomers are less disruptive. Amazon famously uses this with its 6-page narrative memo system.

Never go meta in your presentation – stick to the content. Respect your audience and do not presume to know them or you may find yourself pandering or having low expectations. Instead present the data to the best of your ability. Many complicated things are explained to millions of people all the time. You can’t teach if you have low expectations. Negativity and positivity are self-fulfilling.

Does your audience understand and trust you? Credibility is eroded not just by lying but by cherry picking. Evidence of cherry picking includes data too good to be true and hiding the source of the data behind excuses such as copyright, proprietary information or other secrets. Why would a conclusion be open when the data needs to be secret? It’s likely a misrepresentation of the data for their own ends.

Note a few words when somebody asks you a question to make sure your answer stays on topic. If you don’t know the answer be honest but suggest where you would start looking for the answer. Never heckle or waste time correcting minutiae.

Doctor’s trip

A trip to the doctor’s office is a presentation. Write down your list before you go in. Make them listen, because they normally interrupt after 22 seconds and consider each item individually. That way you’ll give up before you reach the end of your list and they may not see the connected pattern of the whole.

Documents

Every document needs an abstract. It should spell out as simply as possible:

  1. What the problem is
  2. Who cares
  3. What the solution is

If you can’t write this then you don’t have a document and you’re not saying anything.

LaTeX

Real scientists use LaTeX. There are thousands of templates including official ones for well-known journals. Online tools like Overleaf can reduce the barrier to entry. LaTeX code appears like this:

\documentclass{article}
\title{My presentation matters}
\begin{document}
\maketitle
\section*{Introduction}
Sample of LaTeX
\end{document}

R is another alternative but it’s considered hard even by people who use LaTeX.

Reading

We are taught to read to extract facts to pass exams at school. We need to practice reading for enjoyment, reading to spot new information, to extract what we want, to form new opinions and ideas, to loot & hack.

Immediately skip words you don’t understand: there won’t be a test – you’re not at school.

Design

Design does not belong to ‘other people’. Support thinking with analytical design and do whatever it takes to explain the data.

Why do bird books use illustrations? Because the authors want to help you spot the birds and using art they exaggerate the differences as well as produce a generic version of the bird.

Nature magazine has some of the best designed visualizations around. Openness, pride and space constraints all help (DNA only got 1.5 pages). The New York Times also often produces interesting visualizations of data.

User interface

Use the ideas proven by large successful sites on the web. Do not be swayed by arguments that your users won’t understand. Millions of users already do.

Touch is the next generation of user interface. It allows the chrome (interface junk) to be jettisoned. No scroll bars, no buttons, no cursor, no zoom. These are pure information experiences, and they came not from academia, finance or medicine but from the consumer space.

“The future of interface design… is information design.” Edward Tufte – Seattle, August 4 2015

The original UI metaphors at Xerox PARC on the Alto were built around a single document. Instead we have application-owned silos of data. The elegance was lost because companies want to control the content you create with their tools. They isolate your content so they can profit.

Hierarchies are still used for web design because they mimic the organization paying the bill. Organizations see themselves this way and do not focus on what their customers need. Famous examples include the Treasury Department burying tax forms 7 levels deep despite being a top user request and the XKCD strip about university web sites. People on the inside have a skewed perspective of what the outside is.

The density of user interfaces is increasing which allows for richer visualizations especially when combined with animation or video. It is hard to get right.

[)amien

WordPress 2.5 notes from the field

It’s been a few weeks since I upgraded to WordPress 2.5 and whilst the upgrade went well it hasn’t been all plain sailing.

New and changed

Manage files removed

The Manage > Files administration function has been silently removed despite still appearing in the on-line documentation.

If you don’t have FTP access this was the only way to modify your .htaccess and other important files without installing additional software. I can understand such a powerful function could be abused but an option to turn it back on or even a warning about its removal in the upgrade notes/read-me would have been useful.

Hopefully somebody will repackage the old functionality into a plug-in.

New look administration

I’m not sold on the new administration theme look but Dean J Robinson has crafted a plug-in called Fluency that refines the styling.

Generally usability in the administration user interface is better, the write post screen makes a lot more sense and the category & tag management is more logical.

Auto update plug-ins

I love this new feature: it shows you when a new version of a plug-in is available and then lets you update automatically. It isn’t without issues, however. Specifically, it:

  1. overwrites modified local versions without warning
  2. may go wrong (it “upgraded” my wp-PostViews to wp-DBManager)
  3. does not provide a roll-back or revert facility

Still it’s a v1 feature and I’m sure it’ll get some refinement in 2.6.

Compatibility

Live search broken (Redoable theme)

The Ajax-enabled live search is broken in Redoable although the normal search works. There have been no signs of an update in quite a while despite an un-patched HTML injection vulnerability.

Gregarious

This social bookmarking plug-in is broken in 2.5 and the author is too busy to fix it right now. The search for an alternative led me to Joost and his Sociable plug-in.

I’m not 100% happy with the output from Sociable right now but that can wait until my new theme is finished.

Other tweaks

These weren’t caused by the 2.5 upgrade but once you start messing with one thing there’s always that urge to fix other things you know aren’t right.

Google Analytics

Whilst testing my site I found XHTML compliance was broken. It turns out that Joost’s Google Analytics for WordPress incorrectly rewrites the hyperlinks when they parent another element such as an image. Putting on my PHP hacking hat I managed to fix it by editing the plug-in and changing the last line of ga_parse_link to:

return '<a '.$matches[1].'href="'.$matches[2].'//' .$matches[3].'"'.' '.$coolBit.$matches[4].'>'.$matches[5].'</a>';

Subscriber counts

My feed subscriber counts have been bugging me for a while as they jump up and down seemingly at random. In fact, it can be traced to two things:

  1. If you base it on less than 5 days of activity you’ll get the weekend dip
    Solution: Count 7 days’ worth of individual subscribers
  2. Google Reader etc. sometimes stop passing the subscriber count
    Solution: Use the single most recent multi-subscriber header for each service/feed in the last 30 days

If you want to do the same you need to edit the Feed Statistics plug-in and change the query in the how_many_subscribers function to:

$q = "SELECT subscribers, CASE WHEN subscribers = 1 THEN identifier ELSE CONCAT(identifier, feed) END AS ident FROM ".$table_prefix."feed_subscribers  WHERE ((subscribers = 1 AND
	date > '".date("Y-m-d H:i:s", time() - (60 * 60 * 24 * get_option("feed_statistics_expiration_days")))."' ) OR
	(subscribers = 1 AND LOCATE('###',identifier) != 0 AND
	date > '".date("Y-m-d H:i:s", time() - (60 * 60 * 24 * get_option("feed_statistics_expiration_days") * 3))."' ) OR
	(subscribers > 1 AND date > '".date("Y-m-d H:i:s", time() - (60 * 60 * 24 * 30 * 3))."' ) )
	ORDER BY ident ASC, date DESC";

This should help but it’s not ideal: individual readers are counted more than once if they change IP, and groups are counted as one if they share an IP/NAT.
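For anyone not using the Feed Statistics plug-in, the two counting rules from the list above can be sketched outside of SQL. This is a hedged Python illustration only: the row layout (identifier, feed, subscribers, seen_at) and the function name are hypothetical and do not match the plug-in's actual table or code.

```python
from datetime import datetime, timedelta


def count_subscribers(rows, now, individual_days=7, aggregator_days=30):
    """Estimate feed subscribers from request log rows.

    rows: iterable of (identifier, feed, subscribers, seen_at) tuples, where
    subscribers is 1 for an individual reader and greater than 1 when a
    service such as Google Reader reports a count in its request header.
    """
    individual_cutoff = now - timedelta(days=individual_days)
    aggregator_cutoff = now - timedelta(days=aggregator_days)

    individuals = set()  # distinct individual readers seen in the last week
    latest = {}          # (identifier, feed) -> (seen_at, subscribers)

    for identifier, feed, subscribers, seen_at in rows:
        if subscribers == 1:
            if seen_at > individual_cutoff:
                individuals.add(identifier)
        elif subscribers > 1 and seen_at > aggregator_cutoff:
            key = (identifier, feed)
            # Keep only the most recent reported count per service/feed.
            if key not in latest or seen_at > latest[key][0]:
                latest[key] = (seen_at, subscribers)

    return len(individuals) + sum(count for _, count in latest.values())
```

Counting a full week of individuals smooths out the weekend dip, and keeping the latest reported header per service means a few days of missing counts no longer drag the total down.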

[)amien

Google nuggets

As a frequent user of many of Google’s various services I keep finding new tips and tricks for getting the most out of them. Here are a few to share:

Gmail extra features

Gmail are constantly introducing new features; however, the localised user-interfaces often lag behind and fail to expose the options. Set your language to “English (US)” to get access to the latest options.

Current options revealed include: Vacation responder, integration with Google Chat, Send mail as an alternative address (not just a reply-to) and web clips.

Gmail compose with mailto:

Many web sites use the standard mailto: syntax to indicate email addresses. This works great for local mail applications such as Outlook and Thunderbird but fails miserably with web based mail providers.

Ideally web browsers would handle this but in the meantime a GreaseMonkey script is available called MailtoComposeInGmail that will internally rewrite any mailto: links to head over to your Compose option in Gmail :)

If you don’t have GreaseMonkey already installed give it a shot. It effectively gives your machine the ability to customize the page just for you by running small scripts on your own computer.
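To show the kind of rewrite involved in turning a mailto: link into a Gmail compose link, here is a small Python sketch. It is not the MailtoComposeInGmail script itself, which runs as JavaScript in the browser. The compose URL and its parameters (view=cm, fs, to, su, body, cc, bcc) are an assumption based on commonly seen Gmail links rather than a documented API, and the function name is made up.

```python
from urllib.parse import parse_qs, unquote, urlencode, urlsplit

# Assumed Gmail compose endpoint; not an officially documented interface.
GMAIL_COMPOSE = "https://mail.google.com/mail/?"


def mailto_to_gmail(link):
    """Rewrite a mailto: link as a Gmail compose URL; other links pass through unchanged."""
    parts = urlsplit(link)
    if parts.scheme.lower() != "mailto":
        return link

    fields = parse_qs(parts.query)  # note: treats "+" as a space
    params = {"view": "cm", "fs": "1", "to": unquote(parts.path)}

    # Map mailto header fields onto the assumed Gmail parameter names.
    for mailto_key, gmail_key in (("subject", "su"), ("body", "body"),
                                  ("cc", "cc"), ("bcc", "bcc")):
        if mailto_key in fields:
            params[gmail_key] = fields[mailto_key][0]

    return GMAIL_COMPOSE + urlencode(params)


if __name__ == "__main__":
    print(mailto_to_gmail("mailto:someone@example.com?subject=Hello%20there"))
```

A user script would apply the same idea to every link on the page as it loads.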

Gmail on your own domain name

Google have just opened registrations for getting the Gmail interface, account management and mailing lists on your own domain name. This will no doubt mean MX records have to change and it will be interesting to know what SMTP/POP options they offer. Will let you know if I get in!

Google Analytics availability

If you, like me, have been itching to get into analytics but are still waiting for Google to re-open sign-ups then there is something you can do. Find somebody who has an account and get them to add your web site to their monitoring – they can monitor up to 10. Add the snippet to your site and wait.

They can then grant access to your reports and statistics information to your Google account so you can browse at your pleasure.

If you want to know when you visit a site being tracked by Analytics there’s yet another GreaseMonkey script to do the job.

Keep tabs on the Labs

Google’s engineers are always coming up with something new to play with. Sometimes it will become a new product, sometimes a feature inside an existing one. Either way there are usually a few interesting or useful nuggets to play with.

Google Suggest is very cool and the Site Flavoured Google Search is rather interesting.

Google Video

Currently one of the items in the labs but certainly soon to graduate with its selection of home videos, commercial TV shows and great clips. The commercial stuff is US-only at the moment and costs $1.99 a show but there are plenty of amazing free clips to enjoy.

There is also a GreaseMonkey script to hack round the US-only limitation by firing it through Google’s own translator… Let’s see how long that works for.

Google Maps

They’ve just improved the detail of their UK aerial data once again. It’s a shame Guernsey isn’t covered despite a local company having the information, which is hidden away from all but those willing to pay. The route finding options are pretty useful and this is going to just get better and better.

The 3D real-time Windows application Google Earth is, well, incredible. I wonder how long before they add real-time weather simulation and predictive ones based on forecasts.

Google Search operators

This is pretty simple old stuff but you’d be amazed at how few people still seem to use/know about it.

To search for an exact phrase put quotes around it. For example "sql server" will get you pages about Microsoft’s SQL Server product, while sql server without the quotes will return pages with the word sql and the word server anywhere on the page. In this example, that means pages about Oracle, MySQL, DB2 and every other SQL-compatible database known to man.

The other option worth its weight in gold is the - operator. This looks for pages without a term or phrase. For example Leopard -Apple -OS will let you find results for leopards as opposed to getting back lots of pages about the next version of Apple’s operating system.
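If you build searches from a script, the two operators combine easily. The helper below is a minimal Python sketch with a made-up name: it only assembles the query text and does not call any Google API.

```python
def build_query(terms=(), phrases=(), exclude=()):
    """Assemble a search string from plain terms, exact phrases and excluded terms."""
    parts = list(terms)
    # Exact phrases are wrapped in straight double quotes.
    parts += [f'"{phrase}"' for phrase in phrases]
    # Excluded items get a leading minus; multi-word exclusions are quoted too.
    parts += [f'-"{item}"' if " " in item else f"-{item}" for item in exclude]
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query(phrases=["sql server"]))
    print(build_query(terms=["Leopard"], exclude=["Apple", "OS"]))
```

The two calls reproduce the examples above: the quoted phrase search for SQL Server and the Leopard search without the Apple results.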

Hope you found something useful,

GrinGod points out that Google have acquired MeasureMap which provides much more Blogger-friendly statistics than Google Analytics. Sign up for when they’re ready to roll!

[)amien