OpenAI's ChatGPT Unveils Game-Changing Updates: Voice and Image Recognition

OpenAI, the company behind the highly popular ChatGPT, has rolled out two major updates to its viral app: voice interaction and image recognition. The new features change how users interact with ChatGPT and mark another milestone in the evolution of artificial intelligence technology.

Voice Interaction: ChatGPT now supports voice interaction. Users can choose from five lifelike synthetic voices and hold real-time conversations with the chatbot, much like a phone call. The feature relies on two distinct models. Whisper, OpenAI’s existing speech-to-text model, converts spoken words into text, which the chatbot then processes. A new text-to-speech model then turns ChatGPT’s responses into spoken language.
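For readers who think in code, the flow described above (speech in, text to the chatbot, speech out) can be sketched roughly as three chained stages. This is only an illustration under stated assumptions: the class name `VoicePipeline`, the stub functions, and the voice label `voice-1` are all hypothetical, and none of this reflects OpenAI's actual implementation or API. The stubs stand in for the real models so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Hypothetical three-stage voice loop; all names are illustrative."""
    transcribe: Callable[[bytes], str]        # speech-to-text stage (the role Whisper plays)
    respond: Callable[[str], str]             # chatbot stage: text in, text out
    synthesize: Callable[[str, str], bytes]   # text-to-speech stage: (text, voice) -> audio
    voice: str = "voice-1"                    # assumed label for one of the selectable voices

    def turn(self, audio_in: bytes) -> bytes:
        """Run one conversational turn: audio in, audio out."""
        text = self.transcribe(audio_in)
        reply = self.respond(text)
        return self.synthesize(reply, self.voice)


# Stand-in stages so the sketch runs without any real model.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")              # pretend the audio bytes are already text


def fake_respond(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str, voice: str) -> bytes:
    return f"[{voice}] {text}".encode("utf-8")  # pretend this is rendered audio


pipeline = VoicePipeline(fake_transcribe, fake_respond, fake_synthesize)
audio_out = pipeline.turn(b"hello")
print(audio_out.decode("utf-8"))
```

Keeping each stage as a separate callable mirrors the article's point that two distinct models sit on either side of the chatbot: either one could be swapped out without touching the other.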

During a recent demonstration, OpenAI’s product manager, Joanne Jang, showcased ChatGPT’s synthetic voices. The voices were created by training the text-to-speech model on the voices of hired actors. OpenAI prioritized creating voices that users could comfortably listen to for extended periods. This update not only enhances the conversational experience but also opens the door to customization, which could let users create their own voices in the future.

Notably, OpenAI is sharing this text-to-speech model with several other companies, including Spotify. The music streaming giant is employing this synthetic voice technology to translate celebrity podcasts into multiple languages, utilizing synthetic versions of the podcasters’ voices.

Image Recognition: ChatGPT can now also answer questions about images. This feature, first teased when GPT-4 (the model powering ChatGPT) was revealed in March, is now available to the broader public. Users can upload images to the app and ask about their content, which significantly expands the chatbot’s utility.

Raul Puri, a scientist working on GPT-4, demonstrated this image recognition feature during a recent demo. He uploaded a picture of a child’s math homework, circled a Sudoku-like puzzle, and asked ChatGPT for the solution, receiving the correct steps in response. Puri also utilized the feature to troubleshoot his fiancée’s computer, uploading screenshots of error messages and seeking guidance from ChatGPT.

Moreover, ChatGPT’s image recognition capability has been tested by Be My Eyes, a company offering an app for individuals with visual impairments. Users can upload images of their surroundings and request descriptions from human volunteers or, thanks to a partnership with OpenAI, from ChatGPT.

While these updates represent significant advancements in AI technology, OpenAI acknowledges the potential risks. Combining models introduces a new level of complexity and necessitates addressing possible misuses. Certain safeguards are in place, such as restricting questions about images of private individuals and preventing malicious queries.

Despite these challenges, OpenAI is confident in the safety of ChatGPT’s updates and believes they will offer a valuable and versatile tool for users. As with any technological innovation, it’s essential to strike a balance between functionality, accessibility, and responsible usage, and OpenAI is committed to navigating these challenges effectively.
