Michigan State University

Digital Text: Collection Development Guidelines

Anticipated Trends

Given that digital text is a preferred medium for knowledge encoding, storage, and transmission, interest in computational analysis of text from a wide variety of subject fields will likely grow. Technical barriers are few, as there is relative consensus on which file formats and markup support analysis.

At present, content providers vary widely in their ability to serve digital text in a manner amenable to computational analysis. The added costs associated with text mining research vary accordingly. Models encountered thus far include a price quoted per project, a quota on the number of requests per defined time span (e.g., xxxx transactions per weekend), access to an application programming interface (API) included with standard acquisition, and permission to pull subscribed resources by a user-defined method (e.g., wget, curl).
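As an illustration of the request-quota model, the following is a minimal sketch of how a researcher's script might stay within a provider's limit. It is not drawn from any provider's terms or API: the names RequestQuota and fetch_all, the limit and window values, and the fetch callable are all hypothetical, and a real provider may count requests differently.

```python
import time
from collections import deque


class RequestQuota:
    """Hypothetical helper: allow at most `limit` requests in any
    span of `window_seconds` (a sliding-window count)."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the logic can be tested
        self._stamps = deque()      # times of requests still inside the window

    def _expire(self, now):
        # Drop request times that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()

    def try_acquire(self):
        """Record a request and return True if the quota allows it."""
        now = self.clock()
        self._expire(now)
        if len(self._stamps) < self.limit:
            self._stamps.append(now)
            return True
        return False

    def remaining(self):
        """Number of requests still available in the current window."""
        self._expire(self.clock())
        return self.limit - len(self._stamps)


def fetch_all(urls, quota, fetch):
    """Fetch each URL while quota remains; return (results, deferred).

    `fetch` is any callable taking a URL and returning its content
    (an assumption; it stands in for an API call or a wget/curl-style pull).
    """
    results, deferred = {}, []
    for url in urls:
        if quota.try_acquire():
            results[url] = fetch(url)
        else:
            deferred.append(url)    # retry these in a later window
    return results, deferred
```

A script built this way would retry the deferred URLs once the window has passed, rather than exceeding the agreed number of transactions.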

To meet demand in a scalable manner, content providers will move toward making digital text available via an application programming interface (API).

Restrictive terms of use and copyright claims will continue to pose the most significant challenges to computational analysis of text at scale.
