The May 21, 2024 article entitled “IT leaders look beyond LLMs for gen AI needs” reported that “With the generative AI gold rush in full swing, some IT leaders are finding generative AI’s first-wave darlings — large language models (LLMs) — may not be up to snuff for their more promising use cases.”  The article included these comments:

 LLMs, with their advanced ability to comprehend and generate text, have become a near stand-in for generative AI in the collective mindshare. Along with code-generating copilots and text-to-image generators, which leverage a combination of LLMs and diffusion processing, LLMs are at the core of most generative AI experimentation in business today.

But not all problems are best solved using LLMs, some IT leaders say, opening the door to a next wave of multimodal models that go beyond language to deliver more purposeful results — for example, to handle dynamic tabular data stored in spreadsheets and vector databases, video, and audio data.

Multimodal foundation models combine multiple modes, such as text, audio, image, and video, and are capable of generating captions for images or answering questions about images, according to IDC’s Market Glance: Generative Foundation AI Models. Examples include Google Gato, OpenAI GPT-4o, Microsoft LLaVA, Nvidia NeVA, Vicuna, BLIP2, and Flamingo, IDC notes.

What do you think?

First published at