AI Is Breaking Free Of Token-Based LLMs By Upping The Ante To Large Concept Models That Devour Sentences And Adore Concepts

Generative AI based on LLMs might be old hat. New approaches are brewing. One is the advent of large concept models (LCMs). Here's the inside scoop on the future of AI.

Large concept models (LCMs) offer some exciting prospects.

In today’s column, I explore an intriguing new advancement for generative AI and large language models (LLMs) consisting of moving beyond contemporary words-based approaches to sentence-oriented approaches.

The extraordinary deal is this. You might be vaguely aware that most LLMs currently focus on words and accordingly generate responses on a word-at-a-time basis. Suppose that instead of looking at the world via individual words, we could use sentences as a core element. Whole sentences come into AI, and complete sentences are generated out of AI.

To do this, the twist is that sentences are reducible to underlying concepts, and those computationally ferreted-out concepts become the esteemed coinage of the realm for this groundbreaking architectural upheaval of conventional generative AI and LLMs. The new angle radically becomes that we then design, build, and field so-called large concept models (LCMs) in lieu of old-fashioned large language models.

Let’s talk about it.

This analysis of an innovative AI breakthrough is part of my ongoing Forbes column coverage on the latest in AI including identifying and explaining various impactful AI complexities (see the link here). For my coverage of the top-of-the-line OpenAI ChatGPT o1 and o3 models and their advanced reasoning functionality, see the link here and the link here.

There is an ongoing concern in the AI community that perhaps AI researchers and AI developers are treading too much of the same ground right now. We seem to have landed on an impressive architectural contrivance for how to shape generative AI and LLMs, and few want to depart from the success so far attained.

If it isn’t broken, don’t fix it.

The problem is that not everyone concurs that the prevailing architecture isn’t actually broken. By broken, and to quickly clarify, I mean that the issue is more one of limitations and constraints than of something inherently being wrong. A strong and vocal viewpoint is that we are hitting the topmost thresholds of what contemporary LLMs can accomplish. There isn’t much left in the gas tank, and we are soon to hit a veritable wall.

As such, there are brave souls who are seeking alternative architectural avenues. Exciting but a gamble at the same time. They might hit the jackpot and discover the next level of AI. Fame and fortune await. On the other hand, they might waste time on a complete dead-end. Smarmy cynics will call them foolish for their foolhardy ambitions. It could harm your AI career and knock you out of getting that sweet AI high-tech freewheeling job you’ve been eyeing for the longest time.

I continue to give airtime to those who are heads-down seriously aiming to upset the apple cart. For example, my analysis of the clever chain-of-continuous thought approach for LLMs merits dutiful consideration, see the link here. Another exciting possibility is the neuro-symbolic or hybrid AI approach that marries artificial neural networks (ANNs) with rules-based reasoning, see my discussion at the link here.

There is no doubt in my mind that a better mousetrap is still to be found, and all legitimate new-world explorers should keep sailing the winds of change. May your voyage be fruitful.

Doing A Deep Dive On Upheaval

The approach I’ll be identifying this time around has to do with the existing preoccupation with words.

Actually, it might be more appropriate to say a preoccupation with tokens. When you enter words into a prompt, those words are converted into numeric values referred to as tokens. The rest of the AI processing computationally crunches on those numeric values or tokens, see my detailed description of how this works at the link here. Ultimately, the AI-generated response is in token format and must be converted back into text so that you get a readable answer.
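To make the round trip from text to tokens and back a bit more tangible, here is a minimal sketch. It assumes a toy word-level vocabulary built on the spot; production LLMs typically use subword tokenizers with far larger vocabularies, and the function names `build_vocab`, `encode`, and `decode` are my own illustrative choices rather than any particular library's API.

```python
# Toy word-level tokenizer. Real LLMs typically use subword schemes,
# so treat this purely as an illustration of the text <-> number round trip.

def build_vocab(texts):
    """Assign each distinct lowercase word an integer ID, in order of first appearance."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(text, vocab):
    """Convert text into a list of numeric token IDs."""
    return [vocab[word] for word in text.lower().split()]

def decode(token_ids, vocab):
    """Convert numeric token IDs back into readable text."""
    id_to_word = {token_id: word for word, token_id in vocab.items()}
    return " ".join(id_to_word[token_id] for token_id in token_ids)

vocab = build_vocab(["plan a road trip", "avoid toll roads"])
ids = encode("plan a road trip", vocab)
print(ids)                 # [0, 1, 2, 3]
print(decode(ids, vocab))  # plan a road trip
```

Everything the model does between `encode` and `decode` happens on the numbers, not on the words themselves.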

In a sense, you give words to AI, and the AI gives you words in return (albeit via the means of tokenization).

Do we have to do things that way?

No, there doesn’t seem to be a fundamental irrefutable law of nature that says we must confine ourselves to a word-at-a-time focus. Feel free to consider alternatives. Let your wild thoughts flow.

Here is an idea. Imagine that whole sentences were the unit of interest. Rather than parsing and aiming at single words, we conceive of a sentence as our primary unit of measure. A sentence is admittedly a collection of words. No disagreement there. The gist is that the sentence is seen as a sentence. Right now, a sentence happens to be treated as a string of words.

Give the AI a sentence, and you get back a generated sentence in return.

Boom, drop the mic.

Sentences Beget Concepts Which Beget Answers

Making sense of sentences is a bit of a head-scratcher. How do you look at an entire sentence and identify what the meaning or significance of the sentence is?

Aha, let’s assume that sentences are representative of concepts. Each sentence will embody one or more concepts. If you closely inspect a sentence, perhaps you can ferret out the set of concepts that underlie it. The beauty is that we can then work with concepts and for the moment set aside sentences.

Yes, the steps are that we will take an entered sentence, computationally squeeze out the implied concepts, and we will then use those identified concepts as essentially our “tokens” (well, they will be numeric constructs that we can relate to other numeric constructs that are also concepts).

After playing around with the concepts in their numeric configuration, the goal is to produce an output-bound set of numeric-represented concepts that captures what the AI has derived as an answer or response. People want to see text-based answers, so the outbound set of concepts needs to get turned back into text, consisting of sentences, and subsequently presented to the user.

Give the AI a sentence, the sentence gets juiced into concepts, the concepts are used to do processing, and the resultant answer that is in concepts gets converted back into a sentence.

Some More Meat On The Bones

All the heavy lifting takes place in something we will anoint as a concepts-only space. This is a mathematical and computational multi-dimensional structure that relates concepts in numeric formats to other concepts in numeric formats.
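One way to picture concepts relating to other concepts in numeric form is to treat each concept as a vector and measure how closely two vectors point in the same direction. The sketch below assumes hand-picked three-dimensional vectors that I made up for illustration; a real concept space would have far more dimensions and would be learned from data, not typed in by hand.

```python
import math

def cosine_similarity(u, v):
    """Return the cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical concept vectors, chosen by hand purely for illustration.
CONCEPT_SPACE = {
    "take a scenic road trip": [0.9, 0.1, 0.2],
    "drive along the coast": [0.8, 0.2, 0.3],
    "file a tax return": [0.1, 0.9, 0.1],
}

def nearest_concept(name, space):
    """Find the other concept whose vector is most similar to the named one."""
    query = space[name]
    others = {key: vec for key, vec in space.items() if key != name}
    return max(others, key=lambda key: cosine_similarity(query, others[key]))

print(nearest_concept("take a scenic road trip", CONCEPT_SPACE))
# drive along the coast
```

Related ideas sit close together in the space, and unrelated ones sit far apart, without any reference to the specific words that expressed them.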

A quite nifty consequence arises. If this is done carefully, to some degree the language being used for the sentences is merely plug-and-play.

Here’s why. The usual generative AI or LLM data training gets somewhat trapped in the mainstay natural language used for data training, such as using English language content on the Internet as the scanned data. You can at times push the AI toward other languages, which has an interesting twist all its own, see my discussion at the link here.

Anyway, since the core of this alternative concepts-based approach has to do with concepts in numeric formats, the sentences coming in and going out can be switched rather readily to whichever language you prefer. This isn’t a no-brainer, though, and added work would be required. The general point is that the universality across languages is an intriguing potential bonus. One assumption is that concepts are universal.

Do you think that’s the case or do the language and concepts go tightly hand-in-hand (see my coverage of that debate at the link here)?
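To make the plug-and-play idea concrete, here is a deliberately tiny sketch. It assumes that concepts can be represented by language-neutral labels (in a real system they would be numeric embeddings) and that each language has its own decoder, modeled here as a simple lookup table. The labels, the tables, and the function names are all hypothetical, and the sketch bakes in the very assumption under debate, namely that concepts are universal.

```python
# Hypothetical concept labels standing in for numeric concept embeddings.
SENTENCE_FOR_CONCEPT = {
    "en": {
        "SCENIC_ROUTE": "Take the scenic route.",
        "AVOID_TOLLS": "Avoid toll roads.",
    },
    "fr": {
        "SCENIC_ROUTE": "Prenez la route panoramique.",
        "AVOID_TOLLS": "Évitez les routes à péage.",
    },
}

reasoning_calls = 0  # counts how often the concept-level processing runs

def reason_over_concepts(input_concepts):
    """Stand-in for the concept-level model; here it simply passes concepts through."""
    global reasoning_calls
    reasoning_calls += 1
    return list(input_concepts)

def decode(concepts, language):
    """Turn concepts into sentences using the chosen language's lookup table."""
    table = SENTENCE_FOR_CONCEPT[language]
    return " ".join(table[concept] for concept in concepts)

# The concept-level processing happens once...
output_concepts = reason_over_concepts(["SCENIC_ROUTE", "AVOID_TOLLS"])

# ...and the same output concepts are decoded into two languages.
english = decode(output_concepts, "en")
french = decode(output_concepts, "fr")
print(english)          # Take the scenic route. Avoid toll roads.
print(french)           # Prenez la route panoramique. Évitez les routes à péage.
print(reasoning_calls)  # 1
```

The design choice worth noticing is that swapping languages touches only the decoder, while the concept-level processing is left alone.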

Unpacking The Suitcase

You are now pretty much at the 30,000-foot level, which is sufficient to do some assessment of this innovation.

First, we’ve covered that there are presumably three ways to see the world:

(a) Rudimentary at-words level. Word-by-word processing involving the analysis of individual words.

(b) Step up at a sentence level. Sentence-level processing involving analyzing entire sentences.

(c) Concepts-level processing. Identify and deal with concepts rather than words and sentences per se.

Second, allow me to introduce to you six major steps that embody the sentences-and-concepts approach:

(1) User enters a sentence. Take in a sentence from the user that is composed of one or more words.

(2) Concept encoder examines the sentence. Use a concept encoder to figure out what concepts might mathematically and computationally underpin the text-based sentence provided in step #1.

(3) Feed those concepts into a Large Concept Model (LCM). Feed those speculated identified concepts from step #2 into the Large Concept Model for computational processing.

(4) LCM generates responsive concepts. The LCM produces as internal outputs a set of concepts as a result of processing the fed-in presumed identified concepts from step #3.

(5) Concept decoder produces text-based answers for the user. Use a concept decoder that turns the LCM-provided set of concepts from step #4 into a text-based sentence of words for presentation to the user.

(6) User is shown the generated sentence. Show the user the text-based sentence composed as a result of step #5.

I hope that helps to lay out the overarching precepts.
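The six steps above can be sketched as a small pipeline skeleton. This is only a sketch of the data flow under my own assumptions: the function `run_lcm_pipeline` and the four `toy_` stand-ins are hypothetical, and the stand-ins do something trivial (counting words) so that the example runs without any trained model. A real encoder, LCM, and decoder would be neural networks.

```python
from typing import Callable, List

Concept = List[float]  # a concept is represented as a list of numbers

def run_lcm_pipeline(
    user_text: str,
    segment: Callable[[str], List[str]],
    encode: Callable[[str], Concept],
    lcm: Callable[[List[Concept]], List[Concept]],
    decode: Callable[[Concept], str],
) -> str:
    sentences = segment(user_text)                       # step 1: take in the sentences
    input_concepts = [encode(s) for s in sentences]      # step 2: concept encoder
    output_concepts = lcm(input_concepts)                # steps 3 and 4: LCM in, LCM out
    return " ".join(decode(c) for c in output_concepts)  # step 5: concept decoder

# Toy stand-ins that only demonstrate the flow of data between the steps.
def toy_segment(text: str) -> List[str]:
    return [s.strip() + "." for s in text.split(".") if s.strip()]

def toy_encode(sentence: str) -> Concept:
    return [float(len(sentence.split()))]  # "concept" = word count of the sentence

def toy_lcm(concepts: List[Concept]) -> List[Concept]:
    return [[sum(c[0] for c in concepts)]]  # one output concept: the total

def toy_decode(concept: Concept) -> str:
    return f"The input contained {int(concept[0])} words."

answer = run_lcm_pipeline(
    "Plan a road trip. Avoid toll roads.",
    toy_segment, toy_encode, toy_lcm, toy_decode,
)
print(answer)  # step 6: The input contained 7 words.
```

Note that the text exists only at the two ends of the pipeline; everything in the middle operates on the numeric concepts.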

Example Of LLM Versus LCM At Work

Allow me to share with you a quick example of how a large concept model or LCM would work. I’ll make the example simple to concentrate on the crux of things. It is show-and-tell time.

Suppose I am aiming to put together a road trip. Everyone loves road trips. You can log into most of the major generative AI and LLMs to ask for assistance in planning a road trip. Use whichever you like, such as OpenAI ChatGPT, Anthropic Claude, Microsoft Copilot, Google Gemini, Meta Llama, etc.

If I used a conventional contemporary LLM, this is what it might look like:

My entered prompt: “Plan a road trip with scenic stops. Avoid toll roads. Limit driving to under 5 hours per day.”

Generative AI response: “Day 1: Drive 4.5 hours to Mountain View State Park, stopping at Scenic Overlook Point. Day 2: Drive 3.5 hours to Lakeside Resort, avoiding toll routes and enjoying a scenic drive by the coast.”

What happened under the hood?

The LLM would word-by-word convert my prompt “Plan a road trip…” into a series of numeric tokens covering the words that were entered.

The tokens would be run through the model’s transformer architecture, with its encoder or decoder layers, attention mechanisms, and so on.

The tokens generated would eventually get converted back into words. The result as you can see is the response by the AI that says “Day 1: Drive 4.5 hours to…”
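The one-token-at-a-time flavor of that generation can be sketched with a toy lookup table. This is a drastic simplification that I am assuming for illustration: the table `BIGRAM_PROBS` is invented, it conditions only on the single previous word, and it always picks the most probable next word. An actual LLM conditions on the whole preceding sequence and computes its probabilities with a large neural network.

```python
# Invented next-word probabilities, keyed by the previous word.
BIGRAM_PROBS = {
    "<start>": {"drive": 0.7, "plan": 0.3},
    "drive": {"to": 0.8, "slowly": 0.2},
    "to": {"the": 0.9, "a": 0.1},
    "the": {"lake": 0.6, "coast": 0.4},
    "lake": {"<end>": 1.0},
}

def generate(max_tokens=10):
    """Greedily emit one token at a time until an end marker or the length limit."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        distribution = BIGRAM_PROBS.get(tokens[-1])
        if distribution is None:
            break  # no known continuation for this word
        next_token = max(distribution, key=distribution.get)
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return " ".join(tokens[1:])

print(generate())  # drive to the lake
```

Each pass through the loop produces exactly one more token, which is the word-at-a-time behavior that the sentence-level approach aims to move beyond.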

Internal Processing By An LCM

In contrast, let’s show what the LCM’s steps would consist of (I’ll include the steps midstream, so they are easier to envisage):

My entered prompt: “Plan a road trip with scenic stops. Avoid toll roads. Limit driving to under 5 hours per day.”

Internal — Input Segmentation detects three distinct sentences: (1) "Plan a road trip with scenic stops.", (2) "Avoid toll roads.", (3) "Limit driving to under 5 hours per day."

Internal — Encode each sentence into a concept embedding: (1) "Plan a road trip with scenic stops." → Concept A, (2) "Avoid toll roads." → Concept B, (3) "Limit driving to under 5 hours per day." → Concept C

Internal — Process via the LCM: The sequence of concepts is fed into the LCM, consisting of Concept A, Concept B, and Concept C. The originally entered sentences are no longer being used at this juncture; instead, the underlying identified concepts are being used.

Internal — LCM generates pertinent concepts: Based on the fed-in concepts, the LCM produces a set of internal output concepts accordingly, which in this case consists of Concept D and Concept E.

Internal — Decoder converts concepts into text: The concepts generated by the LCM are turned into suitable text sentences, Concept D = "Day 1: Drive 4.5 hours to Mountain View State Park, stopping at Scenic Overlook Point.", and Concept E = "Day 2: Drive 3.5 hours to Lakeside Resort, avoiding toll routes and enjoying a scenic drive by the coast."

Generative AI response: “Day 1: Drive 4.5 hours to Mountain View State Park, stopping at Scenic Overlook Point. Day 2: Drive 3.5 hours to Lakeside Resort, avoiding toll routes and enjoying a scenic drive by the coast.”

The output of the LCM could be the same as the output produced by the LLM. I’m not saying this is always the case, and in fact, it would rarely be the case. You can say the same about different LLMs, namely that the output produced by a given LLM such as ChatGPT won’t likely be identical to the output from Claude.
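The segmentation and encoding steps in the walkthrough above can be sketched in a few lines. Please take this as a rough sketch under my own assumptions: `split_sentences` uses a naive punctuation rule that would stumble on abbreviations, and `toy_embed` is a hashed bag-of-words stand-in that I made up to show a sentence of any length becoming a fixed-size vector. It captures word overlap, not meaning, whereas a genuine concept encoder is a trained neural network.

```python
import hashlib
import math
import re

def split_sentences(text):
    """Naively split text after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]

def toy_embed(sentence, dim=8):
    """Map a sentence to a fixed-size unit vector via hashed word counts (toy only)."""
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", sentence.lower()):
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0  # each word votes for one dimension
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

prompt = ("Plan a road trip with scenic stops. Avoid toll roads. "
          "Limit driving to under 5 hours per day.")
sentences = split_sentences(prompt)
concepts = [toy_embed(sentence) for sentence in sentences]

for label, sentence in zip("ABC", sentences):
    print(f"Concept {label} <- {sentence}")
print(len(concepts), len(concepts[0]))  # 3 8
```

Three sentences of different lengths go in, and three vectors of identical size come out, which is what lets the downstream processing treat each sentence as a single unit.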

Handy AI Research About Large Concept Models

I hope that you find this to be a fascinating topic.

For the nitty-gritty details of the LCM approach, take a good look at a newly released research study entitled “Large Concept Models: Language Modeling in a Sentence Representation Space” by Loïc Barrault, Paul-Ambroise Duquenne, Maha Elbayad, Artyom Kozhevnikov, Belen Alastruey, Pierre Andrews, Mariano Coria, Guillaume Couairon, Marta R. Costa-jussà, David Dale, Hady Elsahar, Kevin Heffernan, João Maria Janeiro, Tuan Tran, Christophe Ropers, Eduardo Sánchez, Robin San Roman, Alexandre Mourachko, Safiyyah Saleem, Holger Schwenk, arXiv, December 12, 2024, which made these salient points (excerpts):

“Current best practice for large-scale language modeling is to operate at the token level, i.e. to learn to predict the next tokens given a sequence of preceding tokens. There is a large body of research on improvements of LLMs, but most works concentrate on incremental changes and do not question the main underlying architecture.”

“In this paper, we have proposed a new architecture, named a Large Concept Model (LCM), which substantially differs from current LLMs in two aspects: 1) all modeling is performed in a high-dimensional embedding space instead of on a discrete token representation, and 2) modeling is not instantiated in a particular language or modality, but at a higher semantic and abstract level.”

“We define a concept as an abstract atomic idea. In practice, a concept would often correspond to a sentence in a text document, or an equivalent speech utterance. We posit that a sentence is an appropriate unit to achieve language independence, in opposition to single words.”

“The input is first segmented into sentences, and each one is encoded with SONAR to achieve a sequence of concepts, i.e., sentence embeddings. This sequence of concepts is then processed by a Large Concept Model (LCM) to generate at the output a new sequence of concepts.”

“Finally, the generated concepts are decoded by SONAR into a sequence of subwords. It is important to highlight that the unchanged sequence of concepts at the output of the LCM can be decoded into other languages or modalities without performing again the whole reasoning process.”

This is the kind of thinking outside the box we need to have for venturing beyond the norm of present-day generative AI and LLMs.

Where We Boldly Must Next Go

Is a posited LCM approach the winner-winner chicken dinner?

Don’t know, no one can say for sure at this time.

Should we be avidly coming up with new ideas and stretching the boundaries of what we are doing at the keystones or base roots of AI?

Absolutely.

Innovation is the watchword. It is too early to toss in the towel and declare that existing AI architectures are the end-all. We must be more open-minded.

If the prevailing conventional approach to AI is a potential dead-end (trolls – I’m not saying it is, just asking a reasonable question, thank you), I think we would all prefer that we have backup approaches already underway. Sitting around with a blank stare and saying woe is us just ought not to be a strategy. Multiple paths and a Darwinian battle involving creative ideas and novel AI approaches would seem a more prudent route.

Final thoughts for now.

A line widely attributed to Albert Einstein goes like this: “Creativity is intelligence having fun.”

Let loose with your creativity if you are in the AI field. Don’t be myopic. Reach for the sky. As you do so, please have a modicum of practicality and do not go off the deep end.

The last word on creativity goes to an equally popular quote that is also commonly credited to Einstein: “Creativity is contagious. Pass it on.”

Okay, enough said, the creativity baton is being passed along to you, so let’s all get to work.

Lance Eliot
