Dan Joldzic, CFA: Natural Language Processing in a Big Data World
By Paul McCaffrey
“We are living in a Big Data World and no single analyst or team of analysts can capture all the information on their positions.” — Dan Joldzic, CFA
Big data, artificial intelligence (AI), machine learning, natural language processing (NLP).
For several years now, we’ve heard how these technologies will transform investment management. Taking their cue, firms have invested untold capital in research in hopes of converting these trends into added revenue.
Yet for many of us, these technologies and what they can bring to the investment process remain cloaked in mystery. And that mystery has evoked existential fears: What do these developments portend for the future of human advisers? Who will pay a human to do what technology can do for free? And what about the risk of overfitting, or the black box effect? If an application generates alpha — or fails to — and we can’t explain why, we are hardly helping our firms, our clients, or ourselves.
Nevertheless, despite such trepidations, the value-add of these technologies has been made clear. AI pioneers have leveraged these innovations and generated impressive results, particularly when these technologies function in tandem with human guidance and expertise.
With that in mind, we wanted to zero in for a closer, granular look at some of the more noteworthy and successful iterations of AI-driven applications in investment management. And that brought us to Alexandria Technology and its use of NLP. Alexandria has been at the leading edge of NLP and machine learning applications in the investment industry since it was founded by Ruey-Lung Hsiao and Eugene Shirley in 2012. The firm’s AI-powered NLP technology analyzes enormous quantities of financial text that it distills into potentially alpha-generating investment data.
For a window into the firm’s methods and philosophy and for insight on progress in the financial technology space more generally, we spoke with Alexandria CEO Dan Joldzic, CFA.
What follows is a lightly edited transcript of our conversation.
CFA Institute: First off, for the uninitiated, how would you define artificial intelligence and natural language processing?
Dan Joldzic, CFA: Natural language processing (NLP) is the classification of text, where the goal is to extract information from the text. Text classification can be done using rule-based approaches or artificial intelligence. So, the AI component is not necessary for NLP.
Rule-based approaches are basically hard-coding rules or phrases to look up within text. This is also known as a dictionary approach. For example, if I want to extract sentences with revenue, I can simply look for the word “revenue” as a rule.
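The revenue example above can be sketched in a few lines of Python. This is a toy illustration of the dictionary idea, not Alexandria's system; the keyword list and function name are invented here:

```python
# A rule-based (dictionary) approach: a human curates the keyword list,
# and the system extracts any sentence that mentions one of the terms.
KEYWORDS = {"revenue", "sales"}  # hand-built "dictionary"

def extract_sentences(text: str, keywords=KEYWORDS) -> list[str]:
    """Return sentences containing at least one dictionary keyword."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [s for s in sentences if any(k in s.lower() for k in keywords)]

report = "Revenue grew 5% this quarter. Headcount was flat. Sales in Europe slowed."
print(extract_sentences(report))
# Matches the first and third sentences; "Headcount was flat" is ignored
# because no human ever added "headcount" to the dictionary.
```

The limitation is visible immediately: the system knows only what a researcher has typed into the list.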
With a rule-based approach, a word or phrase needs to be manually introduced into the dictionary by a human researcher. With AI approaches, you are, in essence, allowing the software to create its own dictionary. The machine detects words that occur together in sentences to form phrases, and then which phrases occur within the same sentence to form context. It provides for a much deeper understanding of text.
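The co-occurrence idea can be illustrated with a minimal sketch: bigrams that repeat across sentences get promoted to phrases, with no human-built dictionary involved. This is a deliberately simplified stand-in for the statistical phrase learning described above, with invented names and thresholds:

```python
from collections import Counter

def learn_phrases(sentences: list[str], min_count: int = 2) -> set[tuple[str, str]]:
    """Promote word pairs that co-occur at least min_count times to phrases."""
    bigrams = Counter()
    for s in sentences:
        words = s.lower().split()
        bigrams.update(zip(words, words[1:]))  # adjacent word pairs
    return {pair for pair, n in bigrams.items() if n >= min_count}

calls = [
    "gross margin expanded this quarter",
    "we expect gross margin to compress",
    "operating costs rose sharply",
]
print(learn_phrases(calls))  # {('gross', 'margin')} — learned from repetition alone
```

No one told the system that "gross margin" is a unit of meaning; it emerged from the data. Production systems extend the same principle to learn which phrases co-occur within sentences to form context.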
What attracted you to the AI / NLP space in general and to Alexandria in particular?
Data analysis is just one of the things I really like to do. Prior to Alexandria, I was a quantitative research analyst at AllianceBernstein, where exploring data was part of my day-to-day. When it came to NLP, the one thing that was really exciting was exploring new types of data. Text classification was a new type of data set that I hadn't worked with before, so there were all of these possibilities I couldn't wait to dig into.
As for Alexandria, I was fortunate enough to meet our chief scientist, Dr. Ruey-Lung Hsiao, who was doing incredible classification work on genomic sequencing. And if he could build systems to classify DNA, I was fairly certain we could do a great job classifying financial text.
How can NLP applications inform the investment process? Where are they applied and where have they had the most success?
We are living in a Big Data World and no single analyst or team of analysts can capture all the information on their positions. Natural language processing can first help by reading and analyzing massive amounts of text information across a range of document types that no analyst team can read on their own. Capturing this information and standardizing the text for companies, subject matter, and even sentiment becomes the first step. The next step is identifying if the text has value. Once text is transformed to data, you can begin to see which sources can predict future price movements and which ones are noise. This allows analysts to use the good sources to improve performance, and potentially cut costs on the non-performing sources.
Let’s take two examples: First, let’s say you’re running one of your NLP applications on an earnings call. What are you looking for? What are the potential red flags or green flags you hope to uncover?
The goal of our NLP is to identify fundamentally driven information. It is not enough for a company spokesperson or CEO to say, “Our company is the best” or “We think we are doing really well.” We focus on statements that impact a company’s bottom line. Are costs rising? Are they rising more or less than expected? It is not enough to look at statements in isolation. You need to focus on the context. For example, “Our revenue was down 10% for the quarter, which is much better than we were expecting.” Many, if not most, current NLP systems may misconstrue this as a negative phrase in isolation. But it is in fact a positive phrase, if one accurately comprehends the context.
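The revenue example shows why isolated word counting fails. The toy sketch below contrasts a bag-of-words scorer with one that checks for an expectations cue; the lexicons and cue phrases are invented for illustration and are far cruder than a real contextual model:

```python
# Toy lexicons (illustrative only; real systems learn these from data).
NEGATIVE = {"down", "decline", "loss"}
POSITIVE = {"growth", "beat", "record"}
CONTRAST_CUES = {"better than we were expecting", "better than expected"}

def naive_score(sentence: str) -> int:
    """Bag-of-words scoring: each word judged in isolation."""
    words = sentence.lower().replace(",", "").split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def contextual_score(sentence: str) -> int:
    """If an expectations cue appears, the comparison to guidance dominates."""
    if any(cue in sentence.lower() for cue in CONTRAST_CUES):
        return 1
    return naive_score(sentence)

s = "Our revenue was down 10% for the quarter, which is much better than we were expecting."
print(naive_score(s), contextual_score(s))  # → -1 1
```

The naive scorer sees only "down" and calls the sentence negative; the context-aware scorer recovers the positive reading, which is the behavior the answer above describes.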
Same question but now the NLP is analyzing a Wall Street Bets–type message board. What do you have your eye out for?
For one, our NLP had to learn a new language of emoji. You don’t come across rocket ships and moons and diamonds in earnings calls. So emojis need to be incorporated into our NLP’s contextual understanding. In addition, slang and sarcasm are much more prevalent in chat rooms. So you cannot use a direct interpretation of a given word or phrase. But here again is where context matters.
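Mechanically, incorporating emoji can start with something as simple as a sentiment mapping. The mapping below is invented for illustration; real systems would learn emoji meaning from context the same way they learn words:

```python
# Hypothetical emoji lexicon for chat-room text (values invented here).
EMOJI_SENTIMENT = {"🚀": 1, "🌙": 1, "💎": 1, "📉": -1}

def emoji_score(message: str) -> int:
    """Sum the sentiment of any known emoji in the message."""
    return sum(EMOJI_SENTIMENT.get(ch, 0) for ch in message)

print(emoji_score("holding 💎🙌 to the 🌙 🚀🚀"))  # → 4
```

A static table like this cannot handle sarcasm, of course, which is why the contextual machinery discussed above still has to do the heavy lifting.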
Without necessarily naming names, can you walk me through an example of how Alexandria’s NLP was applied in an investment context and uncovered a hidden source of alpha?
The real power of NLP and big data is capturing information on a large panel of companies, countries, or commodities. So not naming specific names becomes a very good application, in that we don’t have to start with a pre-conceived company to explore. We can apply our NLP on something like 500 companies in the S&P or 1,000 companies in the Russell and identify positive trends within a subset of companies. We have found that the top 100 companies with positive statements in the S&P 500 outperform the index by over 7% per annum.
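The panel exercise described above can be sketched as: score each company, rank, and compare the forward return of the top-ranked subset against the universe average. The data below is invented toy data, not Alexandria's results, and the 7% figure quoted above comes from their research, not from this sketch:

```python
# Invented toy panel: ticker -> (sentiment_score, next_period_return).
universe = {
    "AAA": (0.9, 0.04),
    "BBB": (0.7, 0.03),
    "CCC": (0.1, -0.02),
    "DDD": (-0.5, -0.01),
}

def top_vs_universe(data: dict, top_n: int = 2) -> float:
    """Average return of the top_n names by sentiment minus the universe average."""
    ranked = sorted(data.items(), key=lambda kv: kv[1][0], reverse=True)
    top_avg = sum(r for _, (_, r) in ranked[:top_n]) / top_n
    universe_avg = sum(r for _, (_, r) in data.items()) / len(data)
    return top_avg - universe_avg

print(round(top_vs_universe(universe), 4))  # spread of top names over the universe
```

Run at scale (500 or 1,000 names, many periods), this is the shape of the test that separates predictive sources from noise.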
And this is just scratching the surface. We work with a wide range of investors, from the most prominent investment managers and hedge funds in the world to smaller boutiques. Our clients are able to find alpha for a wide range of asset classes across various trading horizons. Whether they are short-term focused or long-term, fundamental, quantamental, or quantitative, the alpha potential is real and measurable. We work with all our clients to ensure they are realizing the maximum improvement in alpha and information ratios within their specific investment approach.
NLP applications in investing have moved from the obvious applications, on earnings calls, financial statements, etc., to assessing sentiment in chat rooms and on social media. What do you see as the next frontier in NLP in investing?
It is still early innings for NLP applications. We started with news in 2012 based on the idea that everyone is paying for news in some form and using 1% or less of their news spend. Dow Jones publishes 20,000-plus articles per day, so it was very hard to capture all that information before NLP. Calls and filings were a necessary expansion because of the deep insight you get on companies from these documents. We still have a lot more to go with social media. At the moment, we are mostly capturing chat rooms that are geared toward investing. There is a much larger discussion happening about a company’s products and services that is not captured in these investing rooms. The larger the panel you start to capture, the more insight you can have on a company, before it even makes it to Wall Street Bets.
Tele-text is another information-rich source. Bloomberg or CNBC telecasts are not analyzed for information value. Is the panel discussion on a given company or theme really helpful? We can actually measure if it is.
Beyond that, firms have so much internal text that we would expect to have a lot of value, from email communication to servicing calls or chats.
And what about concerns that these applications could render human advisers obsolete? How do you see these applications replacing / complementing human advisers?
Our systems are more automated intelligence than artificial intelligence. We are trying to learn from domain experts and apply their logic to a much larger panel of information. Our systems need analysts and advisers to continue to identify new themes and trends in markets.
And as to the concern of making human advisers obsolete, we are not the investment manager or investment process on our own. We serve as an input and enhancement to our clients’ various investment strategies. We do not replace what they do. Quite the opposite, we enhance what they already do and help them do it better from both an efficiency standpoint and from a risk and return perspective.
In short, we are a tool to help investment professionals, not replace them.
And for those who are interested in pursuing a career in this space, what advice do you have for them? What type of person and what type of skills are required to succeed in the space?
I think it is fair to say that you need to be analytical, but more than that, I have found intellectual curiosity becomes a big differentiator with engineers. There are many ways to solve a problem, and there are various open-source tools you can use for NLP.
There are engineers who will use open-source tools without really understanding them. They get some data and go right into the analytics. The engineers we have found to be more successful think about how the NLP is operating, and how it can be made better, before going straight to the analytics. So it really takes curiosity and creativity. This is not simply a math problem. There is some art involved.
Anything I haven’t asked that I should have?
I think one potential question would be: Are people actually using these tools? The short answer is yes, but we are still in the early days of adoption. At first, NLP and big data were a natural fit for systematic strategies, but there is still some reluctance as far as how these tools can be trusted. The response is fairly simple, in that we have tools to allow for transparency where you can check the accuracy of the classification. The next question then becomes, How does this work so well? That can be harder to explain at times, but we are using very accurate classification systems to extract insights from text, which tends to be from a fundamental perspective.
But NLP is not just a quantitative tool. Discretionary users can get even more insight on the companies or industries they cover and also screen the larger sector or universe that is not at the top of their conviction list. One response we hear from time to time is: “You can’t possibly know more about a company than I do.” We would never claim we do, but once you turn text to data, you can start plotting trends over time to help inform decisions. To your earlier question, we will never replace the deep knowledge these analysts have, but we can be a tool to leverage that knowledge on a larger scale.
Thanks so much, Dan.