Over the next 2 years, Dow Jones, a subsidiary of News Corp. (www.newscorp.com), will be moving the bulk of its online content to the NoSQL software platform of MarkLogic Corp. Factiva will be the first of three services on the Dow Jones digital network to shift; the others, WSJ.com and Dow Jones Financial Services, will follow. At present, no specific dates have been set for the moves, but Factiva should have moved over “well before the end of 2013,” according to Alisa Bowen, head of product at Dow Jones. The Factiva segment of Dow Jones services sees this as a significant investment in the future of a service begun late in the last century as Dow Jones News Retrieval.
At this point, there are no specifics on how the new platform will change existing services. However, the move promises a unified database encompassing a full range of content and formats, running on sophisticated, standardized software. That software comes from an established software house with significant experience in dealing with publishers and media leaders, which augurs well for the future of Factiva and other Dow Jones services.
The new platform provides standardized search technology and should give customers access to a deeper, broader search capability and the ability to mine and analyze content more quickly and effectively. It should also enhance the user experience for all of Dow Jones’ global offerings. MarkLogic supports more than 200 languages in its existing services, with 14 languages receiving advanced language support, including stemming, tokenization, and collation rules, to enable more precise, language-specific search.
Bowen pointed out, “Standardized search technology not only provides a better experience for our customers, but also gives us a superior platform for developing new content applications and products across the company.”
Georgene Huang, head of Factiva, added, “We are exceptionally well positioned to manage ever-growing volumes and types of content ranging from traditional publishers to social media.”
Factiva contains more than 36,000 sources at present, including (of course) The Wall Street Journal in all its versions, the Dow Jones financial newswires, interactive graphics, trade journals, and other sources of interest to business and information professionals. Its archives are massive.
Launched in 1996, WSJ.com is a leading web provider of business and financial news, analysis, and information for consumers. It is the flagship site of The Wall Street Journal Digital Network. An early adopter of the online premium content business model, the site currently has more than 1 million subscribers while averaging more than 40 million total monthly visitors worldwide. Internationally, there are now 11 editions of WSJ.com in eight languages, including German, Chinese, Japanese, Portuguese, and Spanish. International traffic has increased from 15% of The Wall Street Journal’s total traffic in 2008 to 35% this year so far. Visitors have more than doubled over the past 5 years. WSJ.com is also home to WSJ Live, the company’s digital video initiative, which generates more than 4 hours of live video each day (including breaking news, live shows, and special features) and an archive of on-demand content.
Dow Jones Financial Services provides key content and access for institutional sales and traders, wealth managers, investment bankers and managers, online traders, private bankers, and investors. It provides real-time news, commentary, and analysis, drawing on exclusive content generated by 2,100 journalists located around the globe. Financial Services includes News Analytics, WSJ Realtime, Dow Jones Newswires, and Dow Jones Private Markets.
Bowen explained the choice of MarkLogic: “We considered it the best in class at search technology with a rapid deployment approach, broad capabilities for users, and the ability to process vast quantities of news and information very quickly.” Founded more than a decade ago, MarkLogic has built a next-generation enterprise NoSQL (Not Only SQL) database software platform that serves many industries worldwide, with special strength in handling Big Data.
For more information on MarkLogic's customers, go to www.marklogic.com/customers/. In the Case Studies section, you can view the services it has offered to publishers such as Condé Nast, Elsevier, Oxford University Press, and Reed Business Information. You can even see videos on two publisher clients, the Press Association and McGraw Hill. If you want to test the software yourself, MarkLogic offers a free version at http://developer.marklogic.com/express.
It is too early to tell what the change will mean to users specifically. Decades of adding features, improving components, inserting software fixes, and tweaking will be replaced by an integrated, XML-document-based service that is fast, agile, and scalable. Computer archaeologists may recall IBM STAIRS, the software upon which the earliest service was built. The new platform will also support formats other than XML (e.g., image, audio, video, application-specific documents, and other rich file formats).
As for what to expect, Bowen emphasized that although the database would be integrated, the search experience would vary by product because the users themselves differ. “For example, Factiva has tools for the business professional with complex queries. For them the emphasis and priority is actively managing very customized search queries. At the other extreme, WSJ.com emphasizes news and current awareness for consumers. For them, searching is tuned to more recent and breaking news. Consumers have less interest in or time to customize queries; instead good search results demand the most common denominator and common topics. The different channels will be engineered for different results though both use the same platform.”
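To make the idea of one platform with channel-specific results concrete, here is a minimal sketch in Python. It is purely illustrative and is not how Dow Jones or MarkLogic implement ranking: the `Hit` record, the two ranking functions, the 24-hour half-life, and the sample data are all hypothetical. It shows the same set of hits ordered by pure relevance for a research-style channel and by relevance discounted for age for a news-style channel.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Hit:
    title: str
    relevance: float     # hypothetical text-match score from the shared index, 0..1
    published: datetime


def rank_research(hits):
    """Research-style channel (assumption): order by relevance only."""
    return sorted(hits, key=lambda h: h.relevance, reverse=True)


def rank_news(hits, now, half_life_hours=24.0):
    """News-style channel (assumption): relevance halves every `half_life_hours`."""
    def score(h):
        age_hours = (now - h.published).total_seconds() / 3600
        return h.relevance * 0.5 ** (age_hours / half_life_hours)
    return sorted(hits, key=score, reverse=True)


# Arbitrary fixed timestamp and made-up hits, for illustration only.
now = datetime(2013, 1, 1, 12, 0)
hits = [
    Hit("In-depth industry report", 0.9, now - timedelta(days=30)),
    Hit("Breaking earnings story", 0.6, now - timedelta(hours=2)),
]

print([h.title for h in rank_research(hits)])
print([h.title for h in rank_news(hits, now)])
```

With these sample hits, the month-old report leads the research ordering, while the two-hour-old story leads the news ordering, even though both orderings draw on the same underlying scores.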
One definite planned addition is social media as an option, including blogs and Twitter. When asked about all those abbreviations used in tweets, Bowen admitted that they only “use keywords and ticker symbols, and twittersphere handles, but work on building a taxonomy for company names and geography is a work in progress. The real issue of social media is the huge volume and rapid pace, creating huge noise. Some of our customers, like public relations, want everything, while others only want well established sources. So we’ll only make it an option.”
Matt Turner, chief technologist, media solutions at MarkLogic, described the process involved in working with clients. The clients and MarkLogic staff work together to decide which elements of the NoSQL platform will apply to which levels of service. Ultimately, he said, “there will be a huge reduction in complexity and a huge increase in functionality.” He considered the process a form of partnership.
On a personal note, if Factiva is looking for suggestions, I noticed that the NoSQL searching package includes a favorite option of mine, one I only recall as working in LexisNexis: capitalization-sensitive searching. Although Dow Jones may assume that everybody has memorized its ticker symbols, it’s nice to be able to find “Apple-the-company” by just using an initial capital instead of mucking through an apple orchard. Initial caps find proper nouns of all types (e.g., the month of May instead of Mother-May-I). Just a thought.
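For readers curious how capitalization-sensitive matching might behave, here is a minimal Python sketch of the general idea. It is not the MarkLogic or LexisNexis implementation; the `matches` and `search` helpers and the sample sentences are hypothetical. The assumed rule is simple: a query term containing any capital letter must match exactly as typed, while an all-lowercase term matches regardless of case.

```python
import re


def matches(term: str, text: str) -> bool:
    """Return True if `term` occurs in `text` as a whole word.

    Assumed rule: a term with any uppercase letter is matched
    case-sensitively; an all-lowercase term ignores case.
    """
    flags = 0 if any(c.isupper() for c in term) else re.IGNORECASE
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags) is not None


def search(term, docs):
    """Return the documents that contain `term`, in their original order."""
    return [d for d in docs if matches(term, d)]


# Made-up sample sentences, for illustration only.
docs = [
    "Apple reported record quarterly revenue.",
    "The orchard sold every apple it picked.",
    "Sales rose in May, analysts said.",
    "You may want to check the ticker symbol.",
]

print(search("Apple", docs))  # capitalized query: exact-case matches only
print(search("apple", docs))  # lowercase query: matches either case
```

A rule this simple cannot tell a proper noun from an ordinary word that happens to start a sentence, so a production system would need more than capitalization alone.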