Let’s Start Thinking Of Breathtaking Ways To Leverage Generative AI Far Beyond What We Are Doing Right Now
In today’s column, I explore the rising clamor that we are woefully underutilizing generative AI and large language models (LLMs).
This might come as quite a surprise since the use of generative AI seems to be just about everywhere and continues to rapidly expand. There are reportedly 250 million weekly active users of OpenAI ChatGPT, and hundreds of millions more, perhaps billions, when including the likes of Anthropic Claude, Google Gemini, Meta Llama, and other major generative AI apps.
But the rub is this.
It’s not how many people are using generative AI; it’s the way in which generative AI has been set up to be used.
The primary approach that nearly everyone uses is that generative AI takes in essay-like text and produces text-based responses, or possibly images and video. That is the norm. Generative AI and large language models are trained on data reflecting the patterns of human language and the way that humans write.
Maybe we should be identifying something else to pattern on. Perhaps we can reach far beyond just everyday natural language. The sky is the limit, or shall we say limitless.
Does that catch your attention and offer some intrigue?
Let’s talk about it.
This analysis of an innovative proposition is part of my ongoing Forbes.com column coverage on the latest in AI including identifying and explaining various impactful AI complexities (see the link here).
The Push To Go Outside The Box
A modern-day luminary in the AI field named Andrej Karpathy began quite an online conversation and debate when he posted a tweet on X that said this (posting on September 14, 2024, per @karpathy):
- “It’s a bit sad and confusing that LLMs (‘Large Language Models’) have little to do with language; It’s just historical. They are highly general-purpose technology for statistical modeling of token streams. A better name would be Autoregressive Transformers or something. They don’t care if the tokens happen to represent little text chunks. It could just as well be little image patches, audio chunks, action choices, molecules, or whatever. If you can reduce your problem to that of modeling token streams (for any arbitrary vocabulary of some set of discrete tokens), you can ‘throw an LLM at it’.
- “Actually, as the LLM stack becomes more and more mature, we may see a convergence of a large number of problems into this modeling paradigm. That is, the problem is fixed at that of ‘next token prediction’ with an LLM, it’s just the usage/meaning of the tokens that changes per domain. If that is the case, it’s also possible that deep learning frameworks (e.g. PyTorch and friends) are way too general for what most problems want to look like over time. What’s up with thousands of ops and layers that you can reconfigure arbitrarily if 80% of problems just want to use an LLM? I don’t think this is true, but I think it’s half true.”
I’d like to walk you through the underlying proposition.
You might want to grab a glass of fine wine and find a quiet spot to sit and mull over the significance of what this is all about.
Tokens And Pattern Matching Are The Key
Currently, when you enter a prompt into generative AI, the words that you input are converted into a numeric format referred to as tokens. For example, suppose the sentence was “The dog barked” and that we had beforehand assigned the number 23 to the word “The”, 51 to the word “dog” and 18 to “barked”. The tokenized version of the sentence “The dog barked” would be those numbers shown in the sequence of 23, 51, and 18.
Next, after that conversion from text to numbers, the numbers or tokens are used within the generative AI to figure out what the output will be. A long series of computations is undertaken. At the tail end of the processing, and before you see any text output, the resultant numbers might be, say, 10, 48, and 6, where let’s assume that 10 stands for the word “Yes”, 48 for the word “it”, and 6 for the word “did”. Thus, making use of the inputs 23, 51, and 18 gives us the numbers 10, 48, and 6, which is shown to you as “Yes it did”.
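To make that round trip concrete, here is a minimal sketch in Python using the same made-up vocabulary as above. The token numbers (23, 51, 18 and 10, 48, 6) mirror the illustration and are not what any real tokenizer would produce, and the tokenize/detokenize helpers are my own illustrative stand-ins for far more sophisticated machinery.

```python
# Toy tokenizer sketch: a made-up vocabulary mapping words to token IDs.
# The specific numbers mirror the illustrative example above and are not
# drawn from any real tokenizer.
vocab = {"The": 23, "dog": 51, "barked": 18, "Yes": 10, "it": 48, "did": 6}
inverse_vocab = {token_id: word for word, token_id in vocab.items()}

def tokenize(text: str) -> list[int]:
    """Convert a whitespace-separated sentence into a list of token IDs."""
    return [vocab[word] for word in text.split()]

def detokenize(token_ids: list[int]) -> str:
    """Convert a list of token IDs back into a sentence."""
    return " ".join(inverse_vocab[token_id] for token_id in token_ids)

prompt_tokens = tokenize("The dog barked")   # [23, 51, 18]
# ... the model's long series of computations would happen here ...
response_tokens = [10, 48, 6]                # pretend this is the model's output
print(prompt_tokens)                         # [23, 51, 18]
print(detokenize(response_tokens))           # Yes it did
```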
For a more detailed explanation of tokenization, see my discussion at the link here.
By and large, the premise of generative AI and large language models is that when someone enters a set of sequenced tokens (via text-based words), a response can be computed that will consist of some other set of sequenced tokens (which is then converted into text-based words). In my example, I entered the three sequenced words consisting of “The dog barked” and I got a response of three sequenced words saying, “Yes it did”. My sequence of words “The dog barked” was converted into numeric tokens, run through a gauntlet of mathematical and computational processes, and the result produced was numeric tokens that after conversion into text-based words was “Yes it did.”
How does the AI calculate the words or tokens that form the response?
The general principle is that by doing extensive data training on how humans write, it is feasible to figure out how to take in tokens and generate or produce tokens that fit the patterns of human writing. Usually, this data training is undertaken by scanning vast amounts of text found on the Internet, including essays, stories, narratives, poems, and so on. It turns out that humans make use of patterns in how they write, and the computational pattern matching can pretty much pick up on those patterns.
That’s why generative AI seems fluent. It is computationally mimicking human writing. Doing so requires a lot of examples of human writing to identify those patterns. I’ve discussed that some worry we won’t be able to make dramatic advances in generative AI because there might not be enough available human writing to pattern on; see my analysis at the link here.
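To give a flavor of what picking up on patterns can mean at its very simplest, here is a bare-bones sketch of a frequency-based next-token predictor. Real LLMs learn vastly richer patterns with neural networks containing billions of parameters; this toy merely counts which token tends to follow which, which is the crudest imaginable form of next-token prediction, and the training text is invented for illustration.

```python
from collections import Counter, defaultdict

# Bare-bones next-token predictor: count which token follows which in the
# training text, then predict the most frequently seen follower. Real LLMs
# learn far richer patterns, but the spirit is the same: given the tokens
# so far, predict the next token.
training_text = "the dog barked at the cat and the dog ran"
tokens = training_text.split()

follower_counts = defaultdict(Counter)
for current_token, next_token in zip(tokens, tokens[1:]):
    follower_counts[current_token][next_token] += 1

def predict_next(token: str) -> str:
    """Return the most frequently observed follower of the given token."""
    followers = follower_counts[token]
    return followers.most_common(1)[0][0] if followers else "<unknown>"

print(predict_next("the"))   # 'dog' -- it followed 'the' most often
print(predict_next("dog"))   # 'barked' (ties broken by first-seen order)
```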
Lean Into Pattern Matching As The Crux
It is time to think outside the box.
Are you ready?
Set aside the natural language aspects. Put that at the edge of your thinking. Don’t let it cloud your judgment.
What we really have going on is a kind of statistical predictor that can take in a set of tokens and produce as output a set of other tokens. Within the computational pattern matching is a type of mapping between the sequence of tokens that is encountered and the next tokens that ought to be predicted to come out.
The existing perspective is that this is useful for natural languages such as English, German, French, etc. Indeed, generative AI is customarily based on and referred to as large language models or LLMs. Why? Because the computational pattern matching is focused on natural languages, forming a model of what our everyday languages entail. After several initial years of trying this, AI researchers realized that you need lots of data to do proficient pattern matching and modeling. In the early days of generative AI, the models weren’t very good, partially due to a lack of scaling up.
At a macroscopic level, assume we need three crucial elements for our predictor mechanism:
- (1) Something that we can convert into tokens.
- (2) Some pattern that associates the inputs with the outputs.
- (3) Enough of the material to sufficiently pattern on.
If any of those assumed elements are unavailable or don’t exist, we are somewhat up a creek without a paddle. Allow me to elaborate on each of the three and why they are respectively vital.
It could be that we cannot convert into tokens whatever it is that we want to use. That’s a problem. We won’t be able to use our prediction models that are based on tokens (as an aside, we could potentially devise models that use something other than tokens).
Another sour possibility is that there aren’t any patterns to be found within the arrangement of the tokens. If there aren’t any patterns, the model can’t make useful predictions. It could be that the patterns are so hard to find that our existing pattern-identifying techniques won’t crack open the secret sauce. It could also be that there just aren’t any patterns at all, period, end of story.
Finally, the likelihood of finding patterns and reliably making predictions is often based on having lots and lots of whatever it is that we are trying to pattern on. If all you have is a drop in the bucket, the odds are it won’t be enough to garner a big picture. Things will be askew.
Throwing The Amazing Predictor At Whatever Works
Okay, now that we have those three elements in mind, we need to start finding new avenues worth venturing into.
I want you to take a moment and put your mind to hard work:
- The Big Question: What else is there, other than natural language, that provides a source of something that can be converted into tokens, contains patterns, and exists in sufficient volume that we can reasonably pattern-match on it?
And, of course, it has to be something we would want an AI system to process for us.
There must be a buck to be made or some justifiable reason why we would go to the trouble to toss AI at it. I suppose you might do it for kicks, but given the cost of churning out this type of AI, there should be a pot of gold at the end of the rainbow, one way or another.
Thinking, thinking, thinking.
Keep your thinking cap on and your mind activated.
You already know that we can do this with natural languages in terms of taking as input text and producing as output some associated text. The same can be said about audio. Generative AI is being used already to take as input audio, convert it into tokens, identify patterns based on available large volumes of audio, and produce audio outputs. Likewise, video is yet another mode, though video is a lot harder to deal with than text or audio. See my coverage of multi-modal generative AI at the link here.
I’m sure that you know that coding or programming is already under the microscope for generative AI and LLMs. This is an interesting angle because though coding is text-based, it is not quite a natural language per se. You could argue that coding is an artificial language and not a conventional natural language. The beauty though is that it can be converted into tokens, patterns can be identified, and there is a lot of code out there to data train for pattern matching purposes.
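As a quick illustration that program code tokenizes just like prose, here is a small sketch that runs a code snippet through a publicly available tokenizer. It assumes the open-source tiktoken package is installed, which the article itself does not mention, and the exact token IDs you see will depend entirely on the tokenizer chosen.

```python
# Illustrative only: code is just another token stream to an LLM.
# Assumes the open-source `tiktoken` package is installed (pip install tiktoken);
# the specific token IDs depend on the tokenizer chosen.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

code_snippet = "def add(a, b):\n    return a + b"
token_ids = encoding.encode(code_snippet)

print(token_ids)                    # a short list of integers
print(encoding.decode(token_ids))   # round-trips back to the original code
```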
Sorry that I keep telling you about possibilities that are already known or taken. Still, it is good to know about them so that you aren’t trying to reinvent the wheel.
Ideas Being Floated In These Early Days
I will share with you some additional possibilities that are generally underway but still in the early stages of exploration:
- Game playing. You can use the same precepts to get AI to play games. Moves are actions that can be described and converted into tokens. Patterns can be identified. And since records of lots of played games can be collected, data is plentiful (a toy sketch follows this list).
- Stock market predictions. Consider stock prices as potential tokens. If you want to include other factors, such as the status of the economy, those can be similarly tokenized. Patterns can presumably be found, and lots of data is available.
- Molecular structure predictions. Take the shapes or structures of molecules and convert them into tokens. There are patterns to be found. Lots of data is available.
- Route optimizations. Routing of traffic is essential and currently tends to be solved via symbolic or traditional mathematical means. The traffic parameters could be tokenized, patterns figured out, and lots of such data would be available for this.
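As the toy sketch promised in the game-playing bullet, here is one way to see a non-language domain as a token stream: treat game moves as the tokens and reuse the same frequency-counting idea from earlier. The handful of chess openings below are hard-coded purely for illustration; a serious effort would train a far more capable model on millions of recorded games.

```python
from collections import Counter, defaultdict

# Toy sketch: treat game moves as tokens and predict the next move by
# frequency, echoing the earlier next-token idea. The "games" below are
# a tiny hard-coded sample for illustration only.
recorded_games = [
    ["e4", "e5", "Nf3", "Nc6", "Bb5"],   # Ruy Lopez opening
    ["e4", "e5", "Nf3", "Nc6", "Bc4"],   # Italian Game opening
    ["e4", "c5", "Nf3", "d6"],           # Sicilian Defense
]

next_move_counts = defaultdict(Counter)
for game in recorded_games:
    for move, reply in zip(game, game[1:]):
        next_move_counts[move][reply] += 1

def predict_next_move(move: str) -> str:
    """Return the most frequently observed reply to the given move."""
    replies = next_move_counts[move]
    return replies.most_common(1)[0][0] if replies else "<no data>"

print(predict_next_move("e4"))    # 'e5' -- seen twice versus 'c5' once
print(predict_next_move("Nc6"))   # 'Bb5' (ties broken by first-seen order)
```

The same shape of idea carries over to the other bullets: stock prices, molecular structures, or traffic parameters become the vocabulary, and the prediction target becomes the next price move, the next structural element, or the next routing decision.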
Those are paths that are seriously being pursued. You are encouraged to jump in and help out. They are still cooking those meals, and the results are not yet finalized. There is ample room to make progress.
Okay, your homework is this.
Think about fields of one kind or another that may have not yet been explored for applying a generative AI or LLM-like capability. If you happen to be a domain expert in that field, you have a leg-up on this. I say that because you hopefully already know whether there are patterns afoot, you know why using AI for predictions would be valuable in that arena, and you possibly know if or where data can be found.
An added twist is this.
If there aren’t known patterns, you might be onto something especially enriching. Here’s the deal. If no one has yet found patterns, it could be that they just haven’t looked the right way. Prior efforts to find patterns might not have had the kind of computational power and pattern matching that we have with contemporary generative AI and LLMs.
The domain might be a sleeper. It is waiting for the right person to have the right vision. The heretofore unknown patterns could be unlocked via the right use of generative AI and LLMs or similar technology. I assure you that if that were the case, you might be in line for big bucks, big fame, and maybe even one of those vaunted Nobel prizes.
Isn’t that worth taking some dedicated time and attention to think about?
Yes, I would certainly say so, and I wish you the best of luck and urge you to get cracking. You can do it.