Data is a format and not a subject. The creation of digital data crosses all fields of study. Subjects related to the university's research strengths and subjects of local interest are prioritized.
A. Chronology of the subject
There are no chronological restrictions, subject to other relevant policies. Materials on both current and historical topics are considered.
B. Languages of the resources collected
English-language materials are primarily collected. Materials in other languages will be considered, subject to a determination of the special skills needed to process the data and any associated costs.
C. Geography of the subject
There are no geographic restrictions, subject to other relevant policies. Materials related to MSU, the local area, and Michigan are a particular emphasis.
D. Format of the resources collected
Data will be evaluated based on size and data file characteristics. Large, complex, and heterogeneous data file collections are more resource-intensive and will require careful consideration of available resources. As a general guideline, data under 100 MB or 100 files is not considered resource-intensive, whereas data over 100 MB or 100 files is. Standalone data sets are less resource-intensive than a thematic collection of many data files.
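The size guideline above can be illustrated with a minimal sketch. It is not a tool used by the Libraries. It assumes that a deposit counts as resource-intensive when either its total size or its file count exceeds the threshold, that 100 MB means 100,000,000 bytes, and that a deposit exactly at a threshold is not resource-intensive. The policy does not settle any of these points, and the function names are hypothetical.

```python
from pathlib import Path

# Assumption: decimal megabytes (1 MB = 1,000,000 bytes).
SIZE_LIMIT_BYTES = 100 * 1000 * 1000
# Threshold on the number of files, taken from the guideline.
FILE_LIMIT = 100


def is_resource_intensive(total_bytes: int, file_count: int) -> bool:
    """Return True when either threshold is exceeded.

    Assumption: exceeding one threshold is enough, and values exactly
    at a threshold are treated as not resource-intensive.
    """
    return total_bytes > SIZE_LIMIT_BYTES or file_count > FILE_LIMIT


def summarize_directory(root: str) -> tuple[int, int]:
    """Return (total size in bytes, number of files) under root, recursively."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    return total_bytes, len(files)
```

A depositor could call `is_resource_intensive(*summarize_directory("my_dataset"))` for a first estimate, where `my_dataset` is a placeholder path. The result is only a starting point, since complexity and heterogeneity of the files also affect the evaluation.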
Data must be complete and ready for distribution in its final or most useful form. Data will be preserved in the fidelity received; therefore, high-quality files are preferred. Data files may be reformatted for access as needed. File formats should meet the recommended file formats for the content type as stipulated by the Digital Curation unit. Formats that do not meet the recommendations, including proprietary and obsolete formats, may limit the Libraries’ ability to preserve data. Processing of outdated file formats may incur additional costs that affect selection feasibility.
E. Date of publication of resources collected
Both current and historical data have potential value. Legacy data collections and new publications may be considered. The key factors are expected use and resources required to prepare data for the collection.
F. Authorship and Intellectual Property
Data collected under this policy is expected to be authored by at least one MSU researcher. The author must hold or secure the copyright. Proprietary data will not be considered. The author must sign a Depositor Agreement acknowledging the rights and responsibilities of the author and the Libraries.
Authors should understand that they may be required to work with the Libraries to facilitate adding their data to the collection and should be available to assist as needed. The deposit and curation process required to add digital data files to the collection is complex and may require ongoing communication until ingest is complete.
G. Documentation and Data Quality
Data should meet general quality standards established by disciplinary norms, including provision of adequate documentation and metadata, or the documentation required to produce metadata. The Libraries are not able to provide editorial or peer review of the data.
Adequate documentation is required for data re-use and cataloging purposes. The data file(s) should be accompanied by documentation necessary for interpretation and re-use. A completed “readme” file may be requested of data authors. Documentation should include a bibliography of related publications in order to provide additional context for the data.
Data are intended for public open access. Confidential and sensitive information will not be collected. It is preferred to make data available for immediate access. Short embargo periods may be considered as part of an active curation plan with the goal of making the data publicly accessible.
I. Selection Responsibility
Based on the author, content, and subject, the appropriate subject selector librarian will evaluate the collection fit. Consultation with relevant specialist librarians will help to determine collection fit and feasibility based on additional criteria such as format and technical requirements and anticipated costs and resource requirements.