Turns out that probability density functions are pretty good at doing human-type stuff.
Something I've been playing with a lot lately is language models. Over the past few months, as I've paid more and more attention to OpenAI's ChatGPT and Anthropic's Claude, I've been very impressed.
The thing I find most impressive, and scariest, is how incredibly far a model of token probabilities can get you toward seeming sentient. ChatGPT is remarkably good at seeming sentient. It can summarize things very well, answer complex questions, grade your homework, solve your homework, and now it even supports function calling, so developers can hook it up to arbitrary functions.
The Go-Fast is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
Claude is similar, though less hyped. The recently released Claude 2 supports native PDF uploads, so yesterday I uploaded a few papers to it. I chose papers that I cited in my dissertation: papers I know very well, so I would be able to tell when/if Claude was hallucinating.
The one I'll focus on here is Peng and Xiong (2006). Rather than summarize it myself, I'll let Claude do the work:
Here is a summary of the key points from the paper:
The paper develops a model to study how investors allocate their limited attention when learning about asset fundamentals. The model has a representative investor who holds a portfolio of assets with dividends determined by market, sector, and firm-specific factors.
The investor has a limited amount of attention to process information about these factors each period. The investor allocates attention optimally across the factors to minimize the uncertainty about next period's dividend payments.
The model shows the investor displays "category learning" behavior, allocating more attention to market and sector factors than firm-specific factors. In extreme cases, the investor ignores all firm-specific information.
When combined with investor overconfidence, this category learning behavior leads to three implications:
1. Stock return correlations are higher than fundamental correlations
2. Sectors with more informative stock prices have lower average return correlations
3. Return correlations decrease over time as information technology improves
The model also shows cross-sectional implications for return predictability driven by overreaction to information and neglected information. The degree of predictability in a sector depends on the investor's attention allocation across firms.
Overall, the model demonstrates how limited investor attention and behavioral biases can affect asset price dynamics, return comovement, and return predictability in systematic ways. The results match several empirical findings that are hard to explain with standard models.
This is all very accurate. I would have a hard time writing a summary this clear and succinct. Like many PhD types, I sometimes have difficulty reducing a complex idea to a simple one, but Claude does it with little effort.
This brings me back to the point I find so strange and intoxicating. All these large language models (LLMs) do is try to predict the next token in a sequence. That's it. Each one is a giant, messy function that is really good at approximating the joint density of language. When you condition that joint density on partial information, like the text of a paper, it becomes very good at predicting which tokens are likely to come next given that context.
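To make "predict the next token" a bit more concrete, here's a toy sketch. It is nothing like a real LLM: it assumes a tiny made-up corpus and a bigram model that estimates P(next word | previous word) by counting word pairs, whereas an actual LLM conditions on the whole context with a neural network holding billions of parameters. The shape of the idea is the same, though. The chain rule factorizes the joint density of a sequence into a product of next-token conditionals, and generating text is just evaluating that conditional, appending a token, and repeating. All names here (`fit_bigrams`, `generate_sampled`, etc.) are my own, for illustration only.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical toy corpus, loosely echoing the paper summary above.
CORPUS = (
    "the investor allocates attention to the market . "
    "the investor ignores the firm . "
    "the market moves the firm ."
).split()


def fit_bigrams(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_probs(counts, prev):
    """Estimate P(next | prev) from counts (assumes prev appeared in the corpus)."""
    total = sum(counts[prev].values())
    return {tok: c / total for tok, c in counts[prev].items()}


def log_prob(counts, tokens):
    """Chain rule under the bigram assumption:
    log P(w_2..w_n | w_1) = sum_t log P(w_t | w_{t-1})."""
    return sum(
        math.log(next_token_probs(counts, prev)[nxt])
        for prev, nxt in zip(tokens, tokens[1:])
    )


def generate_greedy(counts, start, max_len=8):
    """Predict, append, repeat, always taking the most likely next token."""
    out = [start]
    while len(out) < max_len and counts[out[-1]]:
        probs = next_token_probs(counts, out[-1])
        out.append(max(probs, key=probs.get))
    return out


def generate_sampled(counts, start, max_len=8, seed=0):
    """Same loop, but draw the next token from P(next | prev) instead."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < max_len and counts[out[-1]]:
        probs = next_token_probs(counts, out[-1])
        tokens = list(probs)
        out.append(rng.choices(tokens, weights=[probs[t] for t in tokens])[0])
    return out


if __name__ == "__main__":
    counts = fit_bigrams(CORPUS)
    print(" ".join(generate_greedy(counts, "the")))
    print(" ".join(generate_sampled(counts, "the", seed=1)))
    print(math.exp(log_prob(counts, "the investor ignores the firm".split())))
```

With this corpus, the greedy version returns "the investor allocates attention to the investor allocates" (ties go to whichever continuation was seen first), getting stuck in a loop. That's a small illustration of why generation usually samples from the conditional distribution rather than always taking the single most likely token.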
At some level, this is fine and not surprising. We have predictive models of stuff everywhere. In finance, we try to predict or explain returns. In macroeconomics, you might try to predict or understand employment. Elsewhere, we commonly build small models that try to explain a tiny part of the probability distribution governing a system, because modeling the whole system is hard and we often don't have enough data.
The realization that so much of this is a probabilistic system is cool. I’m a probability guy. I like thinking of things in terms of models, density functions, etc. Language turns out to be something that big-ass density functions, with billions of parameters, can capture better and better.
The thing that is so mind-bending about LLMs is that they make me question a lot of what makes us human. LLMs are so good at seeming human, and all they do is basic math: once you have a density function, you just keep evaluating it. Predict, observe. Predict, observe. And so on.
Is this all we do as people? I find myself thinking this, unprompted, multiple times a day. I often think of myself as a very well-trained multimodal model. I receive visual input. I segment it into shapes and attach labels to those shapes: dog, bus stop, red hat.
When someone asks me something like "How was your day yesterday?", I typically respond with something like "Oh, it was good, I had work and then in the evening I went out to dinner!" In some sense this could be cast as a probabilistic model: I observed what I did last night, and I know from past experience that people typically relay what they did last night when asked this question.
Large language models are super primitive right now. They are wrong a LOT. They make shit up constantly if the answer isn't in their context window, or if your prompt is bad, etc. But they seem really human! A big reason for this is that seeming human is a pretty good approximation of what humans actually do.
My therapist and I have a pretty routine discussion, maybe once a month. I'll describe something I did at a party or with my partner or friends, explain how I had a good time and did the things I would have wanted myself to do, and occasionally slip up and say something like:
I am very good at seeming like a person!
To which my lovely therapist responds (paraphrased):
Cameron, you are a person. There's no seeming here.
I bring this up because I find the idea of "seeming" human interesting. If we're just a jumble of pretty good models that understand probability fairly well, then I wouldn't be terribly surprised if a density-fitter like ChatGPT were able to seem human.
As time goes on, I'm increasingly convinced that we'll see more and more interesting applications of ML tools that approximate human behavior reasonably well. Admittedly, language is one of the easier behaviors to approximate, especially since we've been producing written language for thousands of years and there's a shitload of data. For other modalities, like vision and audio, we're producing more and more data, and the models are only going to get better.
The models will improve, and they’ll get more dazzling. Imagine if you fixed the hallucination problem in LLMs. One possible solution is to let LLMs interface with a knowledge graph and use the graph to produce accurate information. You can then let the knowledge graph update over time while the LLM refers to it, just as a person might refer to their memory or notes.
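The shape of that design can be sketched in a few lines. Everything here is hypothetical: `KnowledgeGraph` and `grounded_answer` are names I made up for illustration, the "LLM" is replaced by a plain string template, and a real system would need entity linking, a proper graph store, and an actual model to phrase the answer. The point is just the structure: answers come only from stored facts, the system admits when it has none, and the graph can be updated without retraining anything.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Hypothetical minimal triple store: (subject, relation) -> set of objects."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add_fact(self, subject, relation, obj):
        # New facts can be added at any time, with no model retraining.
        self._edges[(subject, relation)].add(obj)

    def lookup(self, subject, relation):
        # .get avoids inserting empty entries for unknown keys.
        return sorted(self._edges.get((subject, relation), set()))


def grounded_answer(kg, subject, relation):
    """Stand-in for an LLM call that may only use retrieved facts."""
    facts = kg.lookup(subject, relation)
    if not facts:
        # Admit the gap instead of making something up.
        return f"No stored facts for ({subject}, {relation})."
    return f"{subject} {relation}: " + "; ".join(facts) + "."


if __name__ == "__main__":
    kg = KnowledgeGraph()
    paper = "Peng and Xiong (2006)"
    kg.add_fact(paper, "models", "limited investor attention")
    print(grounded_answer(kg, paper, "models"))
    print(grounded_answer(kg, paper, "predicts"))
    # The graph updates over time; the next answer reflects the new fact.
    kg.add_fact(paper, "predicts", "return correlations above fundamental correlations")
    print(grounded_answer(kg, paper, "predicts"))
```

The second call returns the "No stored facts" message rather than a guess; after the new fact is added, the same question gets a grounded answer.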
This has the capacity to be truly incredible. Can you imagine a super accurate LLM? One with access to a knowledge graph constructed from all academic papers? I could ask it what its expectations would be for a certain experiment, or how people have thought about a model. It’d be like having a very senior academic expert on hand to discuss your work with. That would be amazing.
I’m super interested in AI now. I did my PhD in financial economics but I’ve always been more of a stats/computer guy. There’s something about this moment we’re in that is simply amazing to me. I’m going to be following it more closely, and I’m spending my personal time learning more and more about deep learning (via fast.ai). I would be interested in pursuing AI research more aggressively after my postdoc at Stanford finishes, if only because I believe strongly that following your interests is never a bad idea.
If you are one of those people who are unimpressed with ChatGPT or whatever, or you want people to stop talking about it, I’m sorry. This stuff is big and amazing, and there’s a very good reason why people won’t shut up. LLMs are not just some neural net that recognizes a dog or a banana. The technology is getting more amazing as time goes on, and I think it behooves many of us to think a little more deeply about how it will impact our lives going forward. It’s going to be cool as heck, guaranteed.
The Go-Fast is a hobby project and will always be free. Subscribe if you like it!