Monday, 18 February 2013

Lee Going Perceptual : Part One

Welcome to the start of another exciting adventure into the weird and wonderful world of cutting edge software development. Beyond these walls lurk strange unfathomable creatures with unusual names, ready to tear the limbs off any unsuspecting coder careless enough to stray from the path.



For me, this is a journey for off-piste adventurers, reckless pioneers and unconventional mavericks. We're off the chain and aiming high so prepare yourself, it's gonna get messy!

A Bit About Me

My name is Lee Bamber, co-founder of The Game Creators (circa 1999). I have been programming since the age of nine and, thanks to my long tenure as a coder, my brands now include DarkBASIC, The 3D Gamemaker, FPS Creator, App Game Kit and the soon-to-be-developed FPSC-Reloaded. I have been invited by Intel to participate in a challenge to exploit cutting edge technologies, which as we say in the UK is right up my cup of tea.


The Challenge



The basic premise of the challenge is to put six elite developers (and me) into a 'room' and ask them to create an app in seven weeks that takes full advantage of a convertible Ultrabook, a gesture camera, or both. We will be judged on our coding choreography by a panel of hardened industry experts who expect everything and forgive nothing. At the end, one coder will be crowned the 'Ultimate Coder' and the rest will be consigned to obscurity.



The Video Blog

I will be posting video versions of my blog every week to communicate with expansive arm gestures and silly accents what I cannot describe in text. Hopefully amusing, sometimes informative, certainly low-budget.




The Real Blog


I do like to blog, but a video of me can be too much for the mind to cope with, so I have provided copious amounts of text as a safer alternative. In these pages you will learn about the technical details of the project, any useful discoveries I make and the dangers to be avoided.


The Idea

For this challenge I want to create a new kind of Web Cam software, perhaps even the sort of app you would find bundled with the hardware when you buy the product. Hardware manufacturers only bundle software that has mass market appeal and shows off the best of what their device has to offer. Rather than shoe-horn the technology into something I was already doing, or come up with crazy ideas around what I could do with these wonderful new toys, I wanted to produce a relevant app. An app that users want, something that relates this new hardware to the needs of the human user, not the other way around. If my app can fix an existing problem, or improve a situation, or open a new door, then I will have created a good app.


The Perceptual Computing Myth

Forget the movies! That scene out of such and such was not designed with good computer interaction in mind, it was created to entertain. We all know large physical keyboards are better for writing blogs than virtual keyboards or voice dictation. Simple fact. Ask Hollywood for a futuristic keyboard and they'd replace it with a super-intelligent robot, writing the blog for you and correcting your metaphors.

In the real world, we like stuff that 'just works'. The better it works for us, the more we like it. The keyboard works so well we've been using it for over 140 years, but only for writing text. You would not, for example, use it to peel potatoes. Similarly, we would not use Perceptual Interfaces to write a blog, nor would we use them to point at something right in front of our nose, we'd just reach out and touch it.

Context is king, and just as you would not chop tomatoes on your touch tablet, there will be many scenarios where you would not employ Perceptual Computing. What those scenarios are, and to what degree this new technology will improve our lives, remains to be seen. What I do know is that app developers are very much on the front line and the world is watching!


The Development Setup

To create my masterpiece, I have a few tools at my disposal. My proverbial hammer will be the programming language DarkBASIC Professional. It was designed for rapid software development and has all the commands I need to create anything I can dream up.

I will be using an Ivy Bridge-based Desktop PC running at 4.4GHz for the main development and a Creative Gesture Camera device for camera and depth capture.




The Gesture Camera & SDKs

I have created a quick un-boxing video of the Perceptual device I will be using, which comes with a good sized USB cable and handy mounting arm which sits very nicely on my ageing Sony LCD.



The SDKs used will be the Intel Perceptual Computing SDK Beta 3 and the accompanying Nuance Dragon voice SDK.


The Convertible Ultrabook

To test my app for final deployment and for usage scenarios, I will be using the new Lenovo Ideapad Yoga 13. This huge yet slim 13 inch Ultrabook converts into a super fast touch tablet, and it will be interesting to see how many useful postures I can bend the Ultrabook into over the course of this competition.  Here is a full un-boxing video of the device.



I also continued playing with the Yoga 13 after the un-boxing and had a great time with the tablet posture. I made a quick video so you can see how smooth and responsive this form factor was. Very neat.






The State Of Play


As I write this, there is no app, no design and no code. I have a blank sheet of paper and a few blog videos. The six developers I am competing against are established, seasoned and look extremely dangerous. My chances of success are laughable, so given this humorous outcome, I'm just going to close my eyes and start typing. When I open them in seven weeks, I'll either have an amazing app or an amazing lemon.



My Amazing Lemon

Allow me now, with much ado, to get to the point. The app I am going to create for you today will be heralded as the next generation of Web Cam software. Once complete, other webcam software will appear flat and slow by comparison. It will revolutionise remote communication, and set the standard for all web camera software.

The basic premise will be to take the depth information captured from the Gesture Camera and convert it to a real-time 3D mesh. The app will then use the colour information from the regular camera output to create a texture for the 3D mesh. These assets are then streamed to a client app running on another computer where the virtual simulation is recreated. By controlling the quantity of data streamed, a reliable visual connection can be maintained where equivalent video streaming techniques would fail. Additionally, such 3D constructs can be used to produce an augmented virtual environment for the protagonists, effectively extracting the participants from the real world and placing them in artificial environments.

Such environments could include board rooms for serious teleconferencing or school rooms for remote teaching. Such measures also protect privacy by allowing you to control the degree to which you replace the actual video footage, from pseudo realistic 3D to completely artificial. You could even use voice recognition to capture your voice and submit the transcript to those watching your webcam feed, protecting your identity further.
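
To make the depth-to-mesh idea a little more concrete, here is a minimal sketch in Python (not the DarkBASIC Professional code the app itself will use). It assumes the depth frame has already been read into a plain grid of numbers, which is not how any particular SDK delivers it. The function name depth_to_mesh and the step parameter are hypothetical and exist only for illustration. The sketch builds one vertex per sampled depth pixel, gives each vertex a texture coordinate into the colour image, and joins neighbouring vertices into triangles. Raising step skips pixels, which is one crude way to control how much data would need streaming.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]
UV = Tuple[float, float]
Triangle = Tuple[int, int, int]


def depth_to_mesh(
    depth: List[List[float]], step: int = 1
) -> Tuple[List[Vertex], List[UV], List[Triangle]]:
    """Build a textured grid mesh from a 2D depth grid (hypothetical helper).

    depth: rows of depth values, one per pixel (assumed already captured).
    step:  sample every Nth pixel; larger values mean less data to stream.
    """
    height, width = len(depth), len(depth[0])
    rows = range(0, height, step)
    cols = range(0, width, step)

    vertices: List[Vertex] = []
    uvs: List[UV] = []
    for y in rows:
        for x in cols:
            # Pixel position gives X and Y, the depth reading gives Z.
            vertices.append((float(x), float(-y), float(depth[y][x])))
            # UVs in the 0..1 range index into the colour camera image.
            uvs.append((x / max(width - 1, 1), y / max(height - 1, 1)))

    n_rows, n_cols = len(rows), len(cols)
    triangles: List[Triangle] = []
    for r in range(n_rows - 1):
        for c in range(n_cols - 1):
            i = r * n_cols + c
            # Two triangles per grid cell.
            triangles.append((i, i + 1, i + n_cols))
            triangles.append((i + 1, i + n_cols + 1, i + n_cols))
    return vertices, uvs, triangles
```

Calling depth_to_mesh(frame, step=2) on the same frame produces roughly a quarter of the vertices, which is the kind of dial a streaming app could turn when the connection gets rough. A real implementation would also need to discard invalid depth readings and convert pixels to real-world units, neither of which is attempted here.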

And that's just the start. With real-time access to the depth information of the caster, you can use facial tracking to work out which part of the 3D scene the speaker is interested in. The software would then rotate the camera to focus in on that area, much like you would in real life. Your hand position and gestures could be used to call up pre-prepared material for the web cast such as images, bullet points and video footage without having to click or hunt for the files. Using voice recognition, you could bring likely material to the foreground as you speak, and use gestures to throw that item into the meeting for the rest of the group to see.
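
As a rough illustration of the camera-follows-face idea, the Python sketch below maps the position of a tracked face within the camera frame to yaw and pitch angles for a virtual camera, then eases towards them so the view does not jitter. Everything here is an assumption for illustration: the function names look_angles and ease, the field-of-view defaults, and the notion that the face tracker hands back a pixel coordinate for the face centre. It also uses face position as a stand-in for "where the speaker is interested", which is a simplification of real head-pose tracking.

```python
from typing import Tuple


def look_angles(
    face_x: float,
    face_y: float,
    frame_w: float,
    frame_h: float,
    h_fov_deg: float = 60.0,  # assumed horizontal field of view
    v_fov_deg: float = 45.0,  # assumed vertical field of view
) -> Tuple[float, float]:
    """Map a face centre (in pixels) to yaw/pitch in degrees.

    A face in the middle of the frame gives (0, 0); a face at the right
    edge gives a yaw of half the horizontal field of view.
    """
    yaw = (face_x / frame_w - 0.5) * h_fov_deg
    # Image Y grows downwards, so flip it to make "up" a positive pitch.
    pitch = (0.5 - face_y / frame_h) * v_fov_deg
    return yaw, pitch


def ease(current: float, target: float, factor: float = 0.2) -> float:
    """Move a fraction of the way from current to target each frame."""
    return current + (target - current) * factor
```

Each frame, the app would call look_angles with the latest face position and then ease the camera's current yaw and pitch towards the result. A smaller factor gives a lazier, smoother camera; a larger one snaps to the speaker more quickly.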

Current web cam and web casting technologies use the camera in an entirely passive way. All interaction is done with keyboard and mouse. In the real world you don't communicate with other humans by pressing their buttons and rolling them around on the floor (er, most of the time). You stand at arm's length and you just talk, you move your arms and you exchange ideas. This is how humans want things to work.

By using Perceptual Computing technology to enable this elevated form of information exchange, we get closer to bridging the gap between how humans communicate face to face and how they communicate with other humans through computers.




Signing Off

Note to judges: quality development is one part inspiration and ten parts iteration. If you feel my blog is too long, too short, too wordy, too nerdy or too silly, or my app is too ugly, too confusing, too broken or too irrelevant, I insist you comment and give me your most candid response. I work with a team who prize brutal honesty, with extra brutal. Not only can I handle criticism, I can fold it like origami into a pleasing shape.

Congratulations! You have reached the end of my blog post.  Do pass go, do collect £200 and do come back next week to hear more tantalising tales of turbulence and triumph as I trek through trails of tremendous technology.

NOTE: This blog is also published officially on the IDZ site at: http://software.intel.com/en-us/blogs/2013/02/17/ultimate-coder-challenge-ii-lee-going-perceptual-part-one






3 comments:

  1. Hi! The post on depth data was cool!
    Is it possible to detect eyes using the depth stream in the Creative cam?

  2. You can detect eye sockets for sure, based on head location, but to detect pupil direction you would need to augment the depth data stream with the RGB stream. Definitely worth experimenting with, as passive awareness of where the user is looking would be extremely useful in today's desktop computing!

  3. Hi! How do you get the color data of a pixel from the RGB stream?
