OpenAI Enables ChatGPT to See, Hear and Speak

OpenAI has announced a significant update to ChatGPT, set to roll out soon, that adds voice and image capabilities, ushering in a new era of multimodal interaction with the AI chatbot.

Users will soon be able to hold spoken conversations with ChatGPT. Whether settling a family debate or telling a bedtime story, ChatGPT is ready to act as a conversational companion.

Once the feature is released, mobile users will need to enable voice chat in settings, then tap the headphone icon on the main screen. The update brings five distinct voices, created in collaboration with professional voice actors and powered by an advanced text-to-speech model. Whisper, OpenAI's speech recognition system, transcribes spoken words into text.

Visual interactions have also been introduced, enabling users to show ChatGPT images for help with everyday tasks. Image understanding is powered by the multimodal capabilities of GPT-3.5 and GPT-4, meaning the model doesn't just view images but applies language reasoning to them.

OpenAI has stated that safety remains a priority and that deployment of the new features will be gradual, so that risks can be continually assessed and mitigated. In a blog post, the company added that the essence of these features is to assist in daily life.

The update will roll out in waves over approximately the next two weeks, with Plus and Enterprise users gaining access first.
