Be specific about your topic so that you can narrow your search, but be flexible enough to tailor your needs to existing sources.
This is what you should be able to define:
Social Unit: This is the population that you want to study.
It can be...
Time: This is the period of time you want to study.
Things to think about...
Space: Geography or place.
There are two main types of geographic classifications...
Remember to define your topic with enough flexibility to adapt to available data!
Data is not available for every conceivable topic. Some data is hidden (behind a paywall, for example), uncollected, or otherwise unavailable. Be prepared to try alternative data.
Look within a data archive that collects within the general subject area that you are searching for.
Ask yourself: Who might collect and publish this type of data?
Then visit the organization’s website and see if you're right! Or, search for them as an author in the library catalog.
These are some of the main types of data producers:
The government collects data to aid in policy decisions and is the largest producer of data overall. For example, the U.S. Census Bureau, Federal Election Commission, Federal Highway Administration, and many other agencies collect and publish data. To better understand the structure of government agencies, read the U.S. Government Manual and browse FedStats. Government data is generally free and publicly available, but some of it may require access through library resources or special requests.
Many independent non-commercial and nonprofit organizations collect and publish data that supports their social platform. For example, the International Monetary Fund, United Nations, World Health Organization, and many others collect and publish data. For more information about NGOs, visit Duke Libraries NGO Research Guide. Data from NGOs may be free or fee-based. The library subscribes to many NGO data resources, so be sure to check the library’s e-resources pages or catalog.
Academic research projects funded by public and private foundations create a wealth of data. For example, the Michigan State of the State Survey, Panel Study of Income Dynamics, American National Election Studies, and many other research projects collect and publish data. Much of this type of data is free and publicly available, but may require access through library resources. Access to data from smaller original research projects may depend on contacting individual researchers.
Commercial firms collect and publish data as a paid service to clients or to sell broadly. Examples include marketing firms, pollsters, trade organizations, and business information providers. This information is almost always fee-based and may not be available for public release. The library does subscribe to some commercial data services, particularly through the business library.
Search for research studies based on secondary analysis of publicly available data sets.
Unfortunately, citation of research data is often incomplete. Sometimes the best you will get is the title of the data set used, but check to see if the data or a related publication is cited and follow it up. Don't make the same mistake when you publish: cite your data.
Search for statistics and follow them to the source.
Try the search strategies for statistics detailed on the "Finding Statistics" tab of this guide. Where does the statistic you find come from? Can you track it down to the source survey or other data set?
Knowing when to call in reinforcements is important.
Depending on which search strategy you used, you may have already found the dataset file download link directly on a website. Or, you may have just a reference/citation to a dataset or producer. Here are some common ways to find the dataset files themselves.
Once you’ve chosen a data set that you believe will work, evaluate it carefully. Is it appropriate? Does it come from an authoritative source? Does it fit your needs? Does it cover your Where, When, and Who or What requirements? Are you willing to compromise your requirements or manipulate the data to fit your needs? Always read the documentation and codebook to ensure that the analysis you are planning really measures what you want it to.
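For data sets that come as a simple table, part of this evaluation can be done with a quick script. The following is a minimal sketch in Python, not a prescribed method: the column names (`county`, `year`, `age_group`), the required years and places, and the sample rows are all hypothetical stand-ins for your own Where, When, and Who or What requirements. It only checks coverage; it does not replace reading the documentation and codebook.

```python
import csv
import io

# Hypothetical requirements: the Who/What, When, and Where of your topic.
REQUIRED_COLUMNS = {"county", "year", "age_group"}  # assumed column names
REQUIRED_YEARS = range(2010, 2013)                  # 2010 through 2012
REQUIRED_PLACES = {"County A", "County B"}          # placeholder place names


def check_coverage(rows, fieldnames):
    """Return a list of gaps between the data and the requirements."""
    problems = []

    # Who/What: are the variables you need present at all?
    missing_cols = REQUIRED_COLUMNS - set(fieldnames)
    if missing_cols:
        problems.append(f"missing columns: {sorted(missing_cols)}")
        return problems  # cannot check time or place without the columns

    # When: does the data span every year you need?
    years = {int(row["year"]) for row in rows}
    missing_years = sorted(set(REQUIRED_YEARS) - years)
    if missing_years:
        problems.append(f"missing years: {missing_years}")

    # Where: does the data include every place you need?
    places = {row["county"] for row in rows}
    missing_places = sorted(REQUIRED_PLACES - places)
    if missing_places:
        problems.append(f"missing places: {missing_places}")

    return problems


# Made-up sample data standing in for a downloaded file.
SAMPLE = """county,year,age_group,count
County A,2010,18-24,10
County A,2011,18-24,12
County B,2010,18-24,4
"""

reader = csv.DictReader(io.StringIO(SAMPLE))
rows = list(reader)
gaps = check_coverage(rows, reader.fieldnames)
print(gaps)
```

An empty list means the table covers the stated requirements. Any entries in the list point to where you may need to compromise or look for an alternative source.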
The Center for Statistical Training and Consulting (CSTAT) is the primary unit on campus that assists with data analysis. CSTAT is a professional service and research unit that aims to support research and provide training and consulting in statistics for faculty, staff and graduate students.
For additional on-campus resources, please consult the Data Analysis page.
This is a short list of tutorials for learning more about the technicalities of secondary data manipulation and codebooks.