As the winner of NFAIS’ (National Federation of Advanced Information Services) Miles Conrad Award, Martin (“Marty”) Kahn, chairman of Code Ocean, gave the annual Miles Conrad Memorial Lecture at the NFAIS 2019 Annual Conference. Kahn has been an investor and executive in the information industry for some 40 years, starting with the online database vendor BRS in the 1980s. He was chairman of Ovid Technologies and of OneSource Information Services and CEO of ProQuest. He has been responsible for a variety of important initiatives that have shaped the information industry. After the lecture, we sat down for a conversation about information services’ past, present, and future. This NewsBreak is an edited and condensed version of our conversation.
Dave Shumaker: Marty, congratulations on receiving the Miles Conrad Award. One thing I noted from your lecture is that you haven’t always been in the information business. You’ve engaged in other industries as well. So what drew you to the information business in the first place?
Marty Kahn: Thanks, Dave. What drew me in was the goal of improving access to information for medical professionals. I started out in medical publishing in the late 1970s and early ’80s. At that time, it was evident that the print model of disseminating medical information wasn’t keeping up with the needs of the community. I was aware that Lexis had already emerged in the legal information business and was having a great deal of success. I thought that if lawyers, who were generally considered pretty technophobic, were embracing online full-text information retrieval, then why wouldn’t doctors, healthcare professionals, and bioscientists, who were perceived as more technically oriented? So I became keen to exit print publishing and develop better solutions for getting information to our customers.
I’ve been involved with other industries as well, but I’ve found the information business more satisfying. The information companies I’ve been involved with have had an important role in organizing, curating, and distributing content. Some have emphasized software; others have emphasized content. But both software and content have played a role in all of them.
Shumaker: One of the principles that you highlighted in your lecture was to put opportunities before efficiencies, or in other words, think in terms of seeking opportunities, rather than solving problems. Aside from your company, Code Ocean, where do you see opportunities in the information industry today?
Kahn: I’ve been struck by how little the delivery of business and professional information has been disrupted, compared to the consumer information industry. The delivery of general information has been so profoundly disrupted that it’s hard to reconstruct what the world was like before Google and all the rest of the new services. But in business and professional information, many of the incumbents are still doing pretty much the same things they did 20 or 25 years ago. In scholarly communication, open access still accounts for only a small percentage of everything published. The move toward preprints is a relatively small change. In the business information sector, people still pay a lot of money to vendors like Bloomberg, Thomson Financial, and others for business and financial information.
So I think there’s an opportunity for someone to create a major disruption in the business, professional, and scholarly information systems. If I were going to guess what the big change is going to look like, it might involve a set of standards, and an open network that people around the world can contribute to, with an authority or some other mechanism that enforces a degree of standardization. It might be something like the Apple App Store, only for business and professional purposes. Imagine an App Store for financial information, where anybody could submit, say, a model for stock analysis that “quants”—quantitative investment analysts—might use.
While it would be like the App Store in some ways, it would have much higher technical requirements. The reason the App Store works is that Apple maintains standards. You trust that an app you find there will work, that it won’t have any viruses, and that it will adhere to certain user interface principles. Nothing like that exists today for scholarly, business, and professional needs. If someone can invent it, it will open up the market and increase the pace of innovation. Users of information will have alternatives to the established sources. It’s hard to say what the specific apps would be. After all, how many of us could have imagined, just a few years ago, all the apps we have on our phones today? But I think the absence of this kind of marketplace has inhibited innovation.
Shumaker: That really would be disruptive. At the same time, as you mentioned, the general news and consumer information marketplace has been seriously disrupted in the past 20 years. The profession of journalism has been decimated. It seems like there’s a lot of work going on to “solve the problem” of misinformation and disinformation. Is there an opportunity to be found in the general news business?
Kahn: Think of the innovations that disrupted print journalism. Google figured out how to sell ads on its search results page and display news content to the reader for free, and it didn’t share the revenue with the media that were creating the content. Even before that, Craigslist killed classified newspaper advertising, and that was half of the revenue of the newspapers. That was a perfect case of free replacing paid.
Airbnb contributed to disrupting the newspapers too. In a way, it took over the “rooms to rent” part of the classified ads. But it reimagined the service as part of the sharing economy. The newspapers could have extended their traditional “rooms to rent” and done what Airbnb did, but they never realized that they could. That’s the kind of fresh thinking that finds opportunities and disrupts the traditional model.
Once again, I think that if we look at it as a problem, and not an opportunity, we inhibit our ability to innovate. If there’s going to be a solution, it will come from people who imagine it as a completely fresh opportunity. I can’t say what it might look like. Whatever the solution will be for news and consumer information, it will be something so revolutionary that we may not even realize it.
Shumaker: Of course, Airbnb added enforcement of standards when they did it, just as you were advocating for professional content a moment ago.
Kahn: Right. When I look back, there are things, like Airbnb, that work well that I never expected. Wikipedia’s another example. It’s an open system, almost all voluntary, but it has rigid protocols, enforces standards, and maintains a trust network. It might be a model for finding the opportunity in business and professional information too.
Shumaker: Let’s turn to your current venture, Code Ocean. You describe it as “a cloud-based computational reproducibility platform.” How is it different from, say, GitHub, and what’s the opportunity there?
Kahn: Code Ocean is meeting a need for people who might use GitHub, but aren’t really GitHub’s target audience. GitHub is really for software engineers—professional coders. It’s a repository for code—that’s the free service, where millions of people deposit code. But its business model—how it makes money—is by selling a more sophisticated set of capabilities that companies use to manage development projects and do version control.
Code Ocean is optimized for a different group of users: people who write code but aren’t primarily coders—like researchers or anyone who uses computational methods in their work. Our users are financial analysts, researchers in “hard” and social sciences—even fantasy football players. When these folks use GitHub, they often have problems. To use code, they have to download it, and often when they do that, they have problems getting it to run. They may have a different version of the language. There are dependencies. They may not have the right environment. The first thing Code Ocean does is to provide the environment, so they can run the code in the cloud. They don’t have to download the code, and they don’t have to have the right environment locally.
Shumaker: When I first researched Code Ocean, it wasn’t clear to me whether it also supported storing data. Does it?
Kahn: Yes. Both data and code can be stored in a container, called a “compute capsule.” By the way, we’re using open, standard technologies to manage all this. We use the same underlying open source Git protocols that GitHub and other services use. We add Docker for managing containers, and we also use Jupyter, another popular open source tool.
We are also able to link with published articles based on the code and data. Most of the content on our website is linked. For example, we have an arrangement with IEEE to add a widget into articles in its database that embeds code into the published article. We also assign a DOI to the compute capsule (the code and data) that’s independent of the DOI for the article. Over time, we may find that the usage pattern of the code and data differs from the usage of the published article. The separate DOI will allow us to track that.
Shumaker: So what you wind up with is a network of research artifacts, all connected—code, data, publications?
Kahn: Yes, and we’ve taken on the hardest and most important challenge first, which is interconnecting Git, Docker, and Jupyter and getting them to work together in a way that is transparent to the user. We’re doing this to enable researchers to modify code and run it with their data, or to use their code to analyze someone else’s data. This goes far beyond reading published articles. The articles report their findings in words, but the real research is in the code, the data, and the analysis.
One more thing: In making the code and data reusable, we’re also addressing the so-called reproducibility crisis in research.
Shumaker: How so?
Kahn: When researchers say that they can’t reproduce someone else’s study results, they’re not necessarily saying the results are not true. Sometimes that’s the case, of course. But often, they’re really saying that they just don’t know if the results are correct or not. The problem is that the code used in the analysis isn’t available, or the data aren’t available, so the analysis can’t be re-done. By providing usable code and data, Code Ocean can change that.
Shumaker: Marty, before we close, I’d like to ask about the role of librarians. When you were CEO of ProQuest, in an article published in the February 2011 issue of Information Today, you were quoted as saying that ProQuest is “going to rise or fall on what happens to libraries.” Back then, your company dealt primarily with librarians. Is the same true for Code Ocean? What’s happening to the role of librarians in the new networked world of research code, data, and publications?
Kahn: It was true for ProQuest then, and it’s still true today—90 to 99 percent of its dealings are with librarians. Code Ocean is much less a library product. When we go to an institution, we talk with research deans and information technology people—and also with the librarians. From time to time, we meet digital librarians who see themselves as outfitting a toolkit for researchers. But, unlike ProQuest, it’s a toolkit that they themselves aren’t likely to use.
Sometimes a university’s office of research drives the interest. The research deans are interested because Code Ocean makes it possible to verify that work was done and when it was done. Often, researchers create compute capsules as they are working, as part of their workflow. Each capsule is retained, and it provides a record of their work as of a given time. The IT people see their job as building a range of functionality for the researchers. They need to figure out where to host the functions that the researchers need, and we provide a way for them to optimize this functionality. So there are several groups we talk to, and they have different interests. But in some ways, the librarians are the right buyers in the institution: they have a process, a method for purchasing tools, and they can bring these interests together.
Photo of Marty Kahn courtesy of Code Ocean