Humans in the Loop with AI
Part 1
When we marvel at the future of technology and AI, it's super cool to think about how much better technology will be. This or that will improve our quality of life. But what exactly is a better quality of life? Technology has improved, but have we improved alongside it or merely become reliant? To break down this idea, I’m going to talk about image generation and interactive machine learning.
When NVIDIA released GauGAN in 2019, it was an amazing breakthrough in photorealistic image generation. With this tool, a user could paint a rough sketch of colors with their mouse, and GauGAN would transform that sketch into a photorealistic landscape. One could create amazing landscapes and waterfalls with simple strokes of a blue and green paintbrush. The tool wasn’t perfect at the time (images had artifacts and impossible features), but it gave anyone the ability to paint photorealistically without explicit training (for the human).
Fast forward to the last year, and image generation has been completely dominated by text prompts and DALL·E 2. Given just a few simple words, DALL·E 2 can return a high-resolution image that combines whatever subjects, styles, and techniques the user can imagine. A person can paint in any style they wish. The technology for image generation has certainly improved. But do humans improve alongside it?
If painting an image by hand is a traditional human skill that can be improved, then we’ve ceded more and more of that effort and creativity to the machine. DALL·E 2 requires not skill but authority. The human in the loop simply needs to authorize a text prompt. This authority doesn’t come from artistic confidence or domain experience, but from wishful desire. With “tools” like DALL·E 2, we communicate what we “wish” for in natural language. A more clearly communicated wish results in a “better” image. Unlike NVIDIA’s GauGAN, where performance and creativity are expressed through the user’s paintbrush, here the crafting of what we desire in the visual domain is confounded with text communication. In this case, AI agents become wish-granting genies rather than tools that facilitate empowered interaction.
In a more technologically advanced future, as AI agents become increasingly capable, humans should become increasingly capable as well. Rather than just allowing humans to easily wish with authority, AI agents should empower humans in ways that broaden capability and offer new challenges. That means speaking a common language with humans in a clear feedback loop. The onus of future advancement falls on both man and machine.
Part 2
Ten things that can benefit from an interactive AI mindset
1. Learning topics in education that are traditionally difficult to understand (maths/sciences) in a more personalized/intuitive way
2. Musical accompaniment for a human performer that can learn and respond to the player, their habits, their style, and the piece
3. Recommender systems that aren’t just one-directional (Netflix -> person watching), where the user could ask for a suggestion and then settle on one through some trial-and-error interaction
4. Speech-to-text in which the system accounts for real-time user correction, so that it turns a spoken idea into text rather than just transcribing pure speech audio
5. Image processing tools and agents in the medical domain to help doctors make well-informed analyses and diagnoses
6. More accurate background removal in images
7. Cooking AI that gives real-time feedback on cooking performance and improves cooking ability through parameterization and AI assistance
8. Iterative music generation with tools like MusicLM, with feedback and control over the output, the process, and corrective parameters
9. Generative AI agents, generative composition, ChatGPT, and image generation, which would benefit from (or even require) interactive machine learning for faster training and more human-like results
10. Self-driving cars and how they are trained, not just with data and computer vision but also with human intuition in difficult scenarios. Also, would self-driving cars be expressive in their driving style? Some people drive faster or slower, or accelerate and turn more quickly or more gently. Would self-driving cars drive like their drivers, in their style, instead of just safely?