Soon, ChatGPT will be able to discuss user-shared photos and hold ‘back-and-forth’ voice conversations in a choice of five voices.
The features will roll out to ChatGPT Plus and Enterprise subscribers in the coming weeks.
OpenAI said on Monday that ChatGPT is gaining support for voice interactions and image recognition. The company’s AI-powered chatbot will soon be able to understand images that users upload or share and provide context or related information, on all platforms where it is available. In the ChatGPT mobile app, it will also be able to hold back-and-forth conversations using OpenAI’s Whisper speech recognition software and a new text-to-speech (TTS) technology from the company that is said to produce “human-like” audio.
The voice conversations feature will be available on iOS and Android via an opt-in setting, while the new image recognition capability will be available on all platforms. There is no information yet on whether these features will eventually reach free-tier users in addition to ChatGPT Plus and Enterprise subscribers.
Voice conversations can be enabled in ChatGPT by selecting the Voice Conversations checkbox under Settings > New Features. You can then choose among five voices; OpenAI says it worked with professional voice actors to create them. The ChatGPT app transcribes your spoken questions into text that the chatbot can process, and the chatbot’s responses are converted into audio using the company’s new TTS technology.
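The flow described above (speech to text, text to a chatbot reply, reply to audio) can be illustrated with a minimal sketch. It is not OpenAI’s implementation: `voice_turn`, `VoiceTurn`, and the three `fake_*` stand-ins are hypothetical names invented for illustration, and the stand-ins only pass text around as bytes rather than performing real speech recognition or synthesis.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurn:
    """One round of a voice conversation (hypothetical structure)."""
    transcript: str     # what the user said, as text
    reply_text: str     # the chatbot's textual answer
    reply_audio: bytes  # the answer rendered as audio


def voice_turn(
    audio_in: bytes,
    transcribe: Callable[[bytes], str],   # speech-to-text stage
    respond: Callable[[str], str],        # chatbot stage
    synthesize: Callable[[str], bytes],   # text-to-speech stage
) -> VoiceTurn:
    """Chain the three stages in the order the article describes."""
    transcript = transcribe(audio_in)
    reply_text = respond(transcript)
    reply_audio = synthesize(reply_text)
    return VoiceTurn(transcript, reply_text, reply_audio)


# Stand-in stages (assumptions, not real models): they treat the
# "audio" as UTF-8 text so the sketch runs without any external service.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_respond(text: str) -> str:
    return f"You asked: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


turn = voice_turn(
    b"how do I fix a bike chain?",
    fake_transcribe,
    fake_respond,
    fake_synthesize,
)
print(turn.reply_text)
```

Passing the stages in as functions keeps the sketch independent of any particular speech or language model; a real app would substitute actual services for the stand-ins.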
ChatGPT is not the only service that will use OpenAI’s new TTS technology. On Monday, Spotify launched an AI-based voice translation tool for podcast creators that automatically translates podcasts from English into French, German, and Spanish. According to the streaming service, the tool is being tested with a select group of podcasters, and translated episodes will be available to all users wherever Spotify is available.
According to OpenAI, the new image recognition feature is powered by the company’s multimodal GPT-3.5 and GPT-4 models and can analyze the images and text found in photos, screenshots, and documents. To get insights from ChatGPT, users can either take a new photo or share an existing image from their phone.
According to OpenAI, ChatGPT will also let users share multiple images and discuss them with the chatbot. If you want it to focus on a particular part of an image, you can mark that area with the built-in drawing tool. For instance, if you circle a bicycle chain that has come off in a photo you shared, ChatGPT may be able to suggest ways to fix it.