2023, retrospective, media, accessibility, hyperaudio

2023 was a difficult year for me, with family issues meaning I spent more time in the UK than usual. As a result I worked less, and when my freelance contracts ended in April I spent much of the rest of the year on a sabbatical, of sorts.

In previous years, moving from developer, to project manager, to CTO for a period meant I had spent increasingly less time writing code. It tends to be the natural way of things for many, but I had always kept my hand in – maintaining small open source libraries and applications as part of the Hyperaudio Project.

The year began well, working with long-time friend and associate Laurian Gridinoc and Dave Bevan from the BBC – hitting our stride with the BBC News Labs project called Audiogram. This was the second iteration of a prototype – an app to create small accessible animations and video clips with captions overlaid, designed to be consumed with or without audio – perfect for social media. True to form, the basis of the application was a timed interactive transcript – output from the BBC’s speech-to-text system.
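To make the idea of a timed interactive transcript concrete, here is a minimal sketch of how one might be represented and queried. The `TimedWord` shape and the `activeWordIndex` helper are assumptions made for illustration; they are not the BBC's speech-to-text format or Audiogram's actual code.

```typescript
// Hypothetical shape for one timed word; real speech-to-text output will differ.
interface TimedWord {
  text: string;
  start: number; // seconds from the start of the media
  end: number; // seconds from the start of the media
}

// Returns the index of the word being spoken at `time`, or -1 if `time`
// falls outside every word (before the first, after the last, or in a gap).
// Assumes `words` is sorted by start time and that words do not overlap.
function activeWordIndex(words: TimedWord[], time: number): number {
  let lo = 0;
  let hi = words.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const word = words[mid];
    if (time < word.start) {
      hi = mid - 1; // the target is earlier in the transcript
    } else if (time >= word.end) {
      lo = mid + 1; // the target is later in the transcript
    } else {
      return mid; // start <= time < end
    }
  }
  return -1;
}

// Made-up sample data, for illustration only.
const sampleWords: TimedWord[] = [
  { text: "Hello", start: 0.0, end: 0.4 },
  { text: "and", start: 0.5, end: 0.6 },
  { text: "welcome", start: 0.6, end: 1.1 },
];
```

With a structure like this, a player could call `activeWordIndex(sampleWords, currentTime)` as playback progresses and highlight the matching word. In the other direction, clicking a word would seek the media to that word's `start`.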

A flow diagram of the Audiogram 2 application

The Audiogram application produced a number of keyframes derived from neatly split captions that light up karaoke-style. Options are made available to customise each keyframe, including colours, fonts, waveforms and image overlay. Once finished the Audiogram could be exported in various useful formats.
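As a rough illustration of how timed words might be split into neat captions, the sketch below groups words greedily up to a character limit and keeps the timing of each group. The `Caption` type, `splitIntoCaptions` and the `maxChars` parameter are hypothetical names chosen for this example. Audiogram's real splitting logic is not described here and may work quite differently.

```typescript
// Hypothetical types for illustration only.
interface TimedWord {
  text: string;
  start: number; // seconds
  end: number; // seconds
}

interface Caption {
  words: TimedWord[];
  start: number; // start of the first word
  end: number; // end of the last word
}

function toCaption(words: TimedWord[]): Caption {
  return { words, start: words[0].start, end: words[words.length - 1].end };
}

// Greedily packs words into captions of at most `maxChars` characters,
// counting one space between words. A single word longer than `maxChars`
// still gets a caption of its own.
function splitIntoCaptions(words: TimedWord[], maxChars: number): Caption[] {
  const captions: Caption[] = [];
  let current: TimedWord[] = [];
  let length = 0;

  for (const word of words) {
    const joined =
      current.length === 0 ? word.text.length : length + 1 + word.text.length;
    if (current.length > 0 && joined > maxChars) {
      captions.push(toCaption(current)); // close the full caption
      current = [word];
      length = word.text.length;
    } else {
      current.push(word);
      length = joined;
    }
  }
  if (current.length > 0) captions.push(toCaption(current));
  return captions;
}

function captionText(caption: Caption): string {
  return caption.words.map((w) => w.text).join(" ");
}
```

Because each caption keeps its individual words and their timings, a renderer could highlight words one at a time within the caption to get the karaoke-style effect.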

Initial view of an audiogram, within the Audiogram 2 application

Modularity was key – the aim being to easily add effects. This project deserves a blog post of its own, but this video should illustrate the functionality.

Bad Idea Factory is a collective of chaotic creatives using technology to make people 🤔. Or so the website says. I’m happy to say that I’ve been part of the collective since its conception, five years ago. BIF encourages its members to work on side projects that combine fun and technology in new and exciting ways.

Skyppy is a project we began in 2021 which uses a speech segmenter to allow people to remove parts of a YouTube video – for comic effect. We can, for example, remove a person’s voice from an interview and leave the pregnant pauses. I collaborated with my good friend Marco Scarselli on this project – he worked on the backend while I worked on the front end. I’m glad to say that we finally finished work in early 2023 and were featured in the infamous B3ta Newsletter. So far people have created over 100 videos, which have been viewed over 1,500 times.
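One way to picture the removal step: given the segments a speech segmenter has flagged for removal, work out which parts of the video are left to play. The sketch below is an assumption about how that could be done, not Skyppy's actual code. `Segment` and `keepSegments` are hypothetical names.

```typescript
// Hypothetical type: a span of media time in seconds.
interface Segment {
  start: number;
  end: number;
}

// Given the segments to remove and the total duration, returns the
// segments that remain, in playback order. Overlapping or unsorted
// input is tolerated, and segments are clamped to [0, duration].
function keepSegments(removed: Segment[], duration: number): Segment[] {
  const sorted = removed.slice().sort((a, b) => a.start - b.start);
  const kept: Segment[] = [];
  let cursor = 0; // end of the removed material seen so far

  for (const seg of sorted) {
    const start = Math.max(0, Math.min(seg.start, duration));
    const end = Math.max(0, Math.min(seg.end, duration));
    if (start > cursor) kept.push({ start: cursor, end: start });
    cursor = Math.max(cursor, end);
  }
  if (cursor < duration) kept.push({ start: cursor, end: duration });
  return kept;
}
```

For a 10 second clip with removals at 2 to 4, 3 to 6 and 8 to 10 seconds, this leaves 0 to 2 and 6 to 8. A player could then skip ahead whenever playback reaches the end of a kept segment.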

A screenshot of the Skyppy web app in action

The rest of the year was spent on more “serious” open source projects. I’m a big fan of the 80-20 rule, or Pareto Principle. In the context of Software Engineering I interpret this to mean that you can achieve 80% of the functionality of an application with 20% of the effort, although I think it is probably possible to achieve 90-10.

On larger projects I’m often managing the development of applications that use various frameworks, libraries and build systems reliant on complicated cloud infrastructure, but when working on my own side projects I like to dial all that down and use “vanilla” JavaScript, HTML and CSS, no backend and no build system. Maybe I’m showing my age, but I miss the simplicity and immediacy of 90s web development. I don’t miss the inconsistencies in browser implementations of the 90s, but nowadays we’re seeing the web platform steadily mature, with new and useful features being rolled out across browsers.

Since most people have their browsers set to auto-update, I feel that, with side projects at least, I can develop for the latest browsers. This allows me to create Progressive Web Apps that use newer technology such as Web Components, Service Workers, the Notifications API, Local Storage and IndexedDB, resulting in modular, installable web applications such as the Hyperaudio Lite Editor.

The Hyperaudio Lite Editor

Another advantage of the web platform’s maturation is its inclusion of key functionality such as the Drag and Drop API, the Dialog Element or just plain scrolling. I’m hoping that these APIs coupled with new improvements in the CSS spec allowing better scoping will mean we’ll be able to use increasingly fewer external libraries as 2024 progresses.

There does seem to be momentum building against the complexity and sometimes faddish nature of modern frameworks and libraries, in favour of a simpler approach.

The Hyperaudio Lite library is 22KB and the Hyperaudio Lite Editor weighs in at 334KB zipped. This lightness can translate into performance. Keeping in mind that Chrome’s Lighthouse scores are not the be-all and end-all for measuring the quality of a web app, I’m nonetheless happy to say that the Hyperaudio Lite Editor achieves 100% scores across the board.

Hyperaudio Lite (and the associated WordPress plugin) also supports a number of media platforms, including YouTube, SoundCloud, Vimeo and Spotify.

More recently we have been experimenting with WebAssembly and AI to implement speech-to-text algorithms within the browser – managing to build a version of OpenAI’s Whisper into the Hyperaudio Lite Editor itself. Sure, it’s slow and not as accurate as some models, but it’s completely free. We've also integrated with Deepgram's speech-to-text service, which provides 45,000 minutes of transcription for free.

Needless to say I’m very excited to see the progress in the field of AI, especially as applied to media and natural language processing. When I studied AI as part of the Master’s degree I took in “Knowledge Based Systems”, it was all theoretical, with very little concrete application possible. Again, this was the 90s :) It’s great to see that the fundamentals of concepts such as neural networks still apply. This excitement is tempered by a certain cautiousness around the ethics of AI, and I hope to be able to use these new technologies in positive ways. More about that later.

It’s uplifting to see people contributing to all the libraries in the Hyperaudio repositories and to see the number of stars slowly rise. But stars don’t pay bills, so aside from setting up a Patreon account for Hyperaudio I’ve decided to try to sell a “pro” version of the Hyperaudio WordPress plugin. This pro version combines the standard (and free) Hyperaudio Plugin with the (also free) Hyperaudio Lite Editor to give users a seamless way to create Interactive Transcripts in WordPress, rather than using the two separate tools. It remains open source, but I’m hoping that people will still buy it and that the proceeds will subsidise the development of the Hyperaudio Project. To this end I’ve been working behind the scenes on hyperaudio.site to showcase all this technology.

Another open source project I helped manage was the OpenEditor. Sponsored by WTTW/WFMT, the OpenEditor was a response to WFMT’s desire to bring transcript editing in-house. The idea is that it can be installed on an AWS instance, with Amplify Studio being used for user management and permissions while using Amazon Transcribe and Transcoding services to produce the transcripts and media. OpenEditor is a much more comprehensive solution than the Hyperaudio Lite Editor – including project and file management and search across multiple transcripts. Entirely developed by Laurian Gridinoc, OpenEditor is a testament to what can be achieved when developing closely with a client over a number of years. A shout out to Allison Schein, whose vision made all this possible.

People who know me know that I’ve been involved with Mozilla for a while now. I was a Knight Mozilla OpenNews fellow back in 2013, and I’m still in touch with many of the people I met as part of that fellowship. I’m also a big fan of the Mozilla Festival, having attended them all and been involved in various ways, from facilitator to session "Wrangler" and, more recently, running a series of initiatives to make content more accessible through captions, transcripts and translations, culminating in this year’s website mozfest.hyper.audio. More details about this platform are at hyper.audio/conferences.

A screenshot of a Mozilla Festival transcript of the Hyperaudio for Conferences platform

It’s validating to see many of the above projects (and even successful businesses such as Trint) leverage Hyperaudio – technology we started work on early last decade. It really helps justify our attempts to break media out of the 21st century by making it more web-like, discoverable and accessible.

This leads me to the conclusion of this retrospective, with the happy announcement that I will be joining TheirStory as CTO starting January 2nd. I’ve known Zack Ellis, the founder and CEO, since we met at a TextAV meetup in London several years ago, and we have remained in touch ever since. TheirStory is a platform to facilitate the end-to-end collecting, preserving and sharing of oral history. One of my first tasks will be to integrate Hyperaudio technology into the platform. I’ll also be overseeing initiatives to integrate emerging AI-based technology. I’ll miss having the spare time to work full-time on the Hyperaudio libraries but will continue developing them as time allows. We're intending to implement an open source strategy with TheirStory, so all in all it’s a great match and a role that I’m very much looking forward to.