Chapter 4 Importing Data
Real economic analysis requires real data. This chapter covers how to bring data into R from files on your computer and from external sources like the Federal Reserve.
4.1 Folders and File Paths
Before importing data, you need to set up an organized folder structure and understand how R finds files.
4.1.1 Setting Up a Project Folder
Create a main folder for your economics work. You can put this in Documents, Dropbox, or Google Drive. Inside your main folder, create two subfolders:
EC102/
├── programs/ # Your R scripts go here
└── data/ # Your data files go here
On Mac, you can create these in Finder. On Windows, use File Explorer. Or create them from R:
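One way to do this from R is sketched below, using base R's dir.create(). It assumes you want the EC102 folder created inside your current working directory; adjust the paths if you keep your work elsewhere.

```r
# Create the main folder and both subfolders.
# recursive = TRUE also creates EC102/ if it does not exist yet;
# showWarnings = FALSE keeps R quiet if the folders are already there.
dir.create("EC102/programs", recursive = TRUE, showWarnings = FALSE)
dir.create("EC102/data", recursive = TRUE, showWarnings = FALSE)

# Confirm that both folders exist
dir.exists(c("EC102/programs", "EC102/data"))
```

Running the block a second time is harmless, because existing folders are left untouched.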
4.1.2 The Working Directory
R always operates from a “working directory”—a folder on your computer that R treats as its home base. To see your current working directory:
getwd()
## [1] "/Users/calvinackley/Dropbox/textbooks/intro_to_R"
When you specify a file name without a full path, R looks in this directory.
4.1.3 Absolute vs Relative Paths
There are two ways to specify file locations:
Absolute paths give the complete location from the root of your file system:
- Windows: "C:/Users/yourname/Documents/EC102/data/myfile.csv"
- Mac/Linux: "/Users/yourname/Documents/EC102/data/myfile.csv"
Relative paths specify location relative to the working directory:
- "data/myfile.csv" looks for a data folder inside the working directory
- "../other_folder/myfile.csv" goes up one level, then into other_folder
Relative paths are generally preferred because they make your code portable—it will work on different computers as long as the folder structure is the same.
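The relationship between the two kinds of path can be sketched with a few base R functions. The file name myfile.csv is only the placeholder used above, so file.exists() will most likely return FALSE on your machine.

```r
# Build a relative path; file.path() inserts the separators for you
rel_path <- file.path("data", "myfile.csv")
rel_path

# The absolute location R would actually look in:
# the working directory followed by the relative path
abs_path <- file.path(getwd(), rel_path)
abs_path

# Check whether the file is really there before trying to read it
file.exists(rel_path)
```

Checking with file.exists() first is a quick way to tell a wrong path apart from a problem with the file itself.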
4.2 Importing CSV Files
CSV (comma-separated values) files are the most common data format. Let’s practice by downloading and importing a US macroeconomic dataset that we’ll use throughout this book.
First, download the example data to your data folder:
download.file(
  "https://raw.githubusercontent.com/ackleycb/intro_to_R/main/data/us_macrodata.csv",
  "data/us_macrodata.csv"
)
Now import it using read.csv() from base R:
# Basic import
macro <- read.csv("data/us_macrodata.csv")
# With options
macro <- read.csv("data/us_macrodata.csv",
  header = TRUE, # First row contains column names
  stringsAsFactors = FALSE # Keep text as character, not factor
)
# View the first few rows
head(macro)
The dataset contains annual US economic data from 1871-2024, including CPI, GDP, unemployment, and other key indicators.
The tidyverse provides read_csv(), part of the readr package, which is typically faster on large files and returns a tibble:
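A minimal sketch of the readr version follows. It assumes the readr package is installed and that data/us_macrodata.csv was downloaded in the step above.

```r
# install.packages("readr") # Only needed once (readr is part of the tidyverse)
library(readr)

# Assumes data/us_macrodata.csv was downloaded in the step above
macro <- read_csv("data/us_macrodata.csv")

# read_csv() returns a tibble and reports the column types it guessed
head(macro)
```

If read_csv() guesses a column type incorrectly, its col_types argument lets you specify the types yourself.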
4.3 Importing Excel Files
Excel files (.xlsx or .xls) require the readxl package. The same macroeconomic data is available as an Excel file:
# Download the Excel version (note: mode = "wb" is required for binary files)
download.file(
  "https://raw.githubusercontent.com/ackleycb/intro_to_R/main/data/us_macrodata.xlsx",
  "data/us_macrodata.xlsx",
  mode = "wb"
)
Now import it:
install.packages("readxl") # Only needed once
library(readxl)
# Basic import (reads first sheet)
macro <- read_excel("data/us_macrodata.xlsx")
# View the result
head(macro)
For Excel files with multiple sheets:
# See what sheets are available
excel_sheets("data/multi_sheet_file.xlsx")
# Specify which sheet by name
data <- read_excel("data/multi_sheet_file.xlsx", sheet = "2023")
# Or by sheet number
data <- read_excel("data/multi_sheet_file.xlsx", sheet = 2)
# Skip rows (useful when headers aren't in row 1)
data <- read_excel("data/multi_sheet_file.xlsx", skip = 3)
4.5 Importing Stata Files
Economics research often uses Stata (.dta files). The haven package reads these:
install.packages("haven") # Only needed once
library(haven)
data <- read_dta("data/survey_results.dta")
haven also reads SPSS (.sav) and SAS (.sas7bdat) files:
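The sketch below uses haven's read_sav() and read_sas(). So that it runs without any external files, it first writes R's built-in mtcars dataset to a temporary .sav file. That round trip is only for illustration, and the SAS file name in the comment is a placeholder.

```r
library(haven)

# Write a built-in dataset to a temporary SPSS file,
# so the example runs without external data
sav_path <- tempfile(fileext = ".sav")
write_sav(mtcars, sav_path)

# Read the SPSS file back in
spss_data <- read_sav(sav_path)
head(spss_data)

# SAS files are read the same way; this path is a placeholder
# sas_data <- read_sas("data/survey_results.sas7bdat")
```

With a real extract, you would pass the path of your own .sav or .sas7bdat file instead.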
4.6 Importing from FRED
The Federal Reserve Economic Data (FRED) database contains thousands of economic time series. You can download data directly into R using an API.
4.6.1 Getting a FRED API Key
- Go to https://fred.stlouisfed.org/docs/api/api_key.html
- Create a free account
- Request an API key (it’s free and instant)
4.6.2 Using the Helper Function
This book provides a helper file with simplified functions for downloading FRED data. First, download and source it:
# Download the helper file (only need to do this once)
download.file(
  "https://raw.githubusercontent.com/ackleycb/intro_to_R/main/R/econ_data_helpers.R",
  "econ_data_helpers.R"
)
# Load the helper functions
source("econ_data_helpers.R")
Now you can download FRED data easily:
# First time: include your API key
cpi <- get_fred("CPIAUCSL", api_key = "your_api_key_here")
# After the key is set, just use the series ID
unemployment <- get_fred("UNRATE")
gdp <- get_fred("GDP", start_date = "2000-01-01")
The function returns a data frame with three columns: date, series_id, and value.
4.6.3 Common FRED Series
Here are some frequently used series for economics courses:
| Series ID | Description |
|---|---|
| CPIAUCSL | Consumer Price Index (All Urban Consumers) |
| UNRATE | Unemployment Rate |
| GDP | Gross Domestic Product |
| FEDFUNDS | Federal Funds Rate |
| DGS10 | 10-Year Treasury Rate |
| PAYEMS | Total Nonfarm Payrolls |
To see a full list of common series included in the helper file:
You can search for more series at https://fred.stlouisfed.org/
4.7 Importing from the Census Bureau
The tidycensus package provides access to US Census Bureau data, including the decennial census and the American Community Survey (ACS).
4.7.1 Getting a Census API Key
- Go to https://api.census.gov/data/key_signup.html
- Fill out the form to request a free API key
- The key will be emailed to you
4.7.2 Using tidycensus
install.packages("tidycensus") # Only needed once
library(tidycensus)
# Set your API key (only need to do this once per session)
census_api_key("your_api_key_here")
Download population data by state from the ACS:
# Get total population by state (2022 5-year ACS)
state_pop <- get_acs(
  geography = "state",
  variables = "B01003_001", # Total population
  year = 2022
)
head(state_pop)
Download median household income by county:
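A minimal sketch of the county-level request follows. It assumes your Census API key has already been set with census_api_key(), and that ACS variable B19013_001 (median household income in the past 12 months) is the measure you want.

```r
library(tidycensus)

# Assumes census_api_key() has already been called in this session
county_income <- get_acs(
  geography = "county",
  variables = "B19013_001", # Median household income (past 12 months)
  year = 2022
)

# One row per county, with the estimate and its margin of error
head(county_income)
```

To limit the download to a single state, get_acs() also accepts a state argument, for example state = "CA".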
4.8 Importing from IPUMS
IPUMS (Integrated Public Use Microdata Series) provides harmonized census and survey microdata from around the world. The ipumsr package helps import IPUMS extracts.
4.8.1 Creating an IPUMS Extract
Unlike the other data sources, IPUMS requires you to:
- Create an account at https://www.ipums.org/
- Select your variables and samples through their web interface
- Submit an extract request
- Download the data files when ready
IPUMS provides both a data file (.dat or .csv) and a DDI codebook file (.xml) that describes the data structure.
4.8.2 Using ipumsr
install.packages("ipumsr") # Only needed once
library(ipumsr)
# Read an IPUMS extract (you need both the data file and DDI file)
ddi <- read_ipums_ddi("data/usa_00001.xml")
ipums_data <- read_ipums_micro(ddi)
# View the data
head(ipums_data)
# See variable labels
ipums_var_info(ddi)
The DDI file contains important metadata including variable labels and value codes, which ipumsr uses to properly format your data.
4.8.3 IPUMS Data Collections
IPUMS offers several data collections useful for economics:
| Collection | Description |
|---|---|
| IPUMS USA | US Census and ACS microdata |
| IPUMS CPS | Current Population Survey |
| IPUMS International | Census data from 100+ countries |
| IPUMS Time Use | American Time Use Survey |
| IPUMS Health Surveys | NHIS and MEPS data |
4.9 Importing from the World Bank
The WDI package provides access to the World Bank’s World Development Indicators—a comprehensive database of international economic and social statistics.
4.9.1 Using WDI
Download GDP per capita for multiple countries:
install.packages("WDI") # Only needed once
library(WDI)
# GDP per capita (current US$) for selected countries, 2000-2023
gdp_data <- WDI(
  country = c("US", "CN", "DE", "JP", "BR"),
  indicator = "NY.GDP.PCAP.CD",
  start = 2000,
  end = 2023
)
head(gdp_data)
Download multiple indicators at once:
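A sketch of such a request follows. It assumes the WDI package is installed and reuses two indicator codes from this chapter's table of common indicators. Passing a character vector to indicator should return one column per indicator.

```r
library(WDI)

# GDP per capita and total population for the same countries and years
wdi_data <- WDI(
  country = c("US", "CN", "DE", "JP", "BR"),
  indicator = c("NY.GDP.PCAP.CD", "SP.POP.TOTL"),
  start = 2000,
  end = 2023
)

# Each indicator appears as its own column, named by its code
head(wdi_data)
```

Because both indicators arrive in one data frame, you can compare them by country and year without merging separate downloads.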
4.9.3 Common World Bank Indicators
| Indicator | Description |
|---|---|
| NY.GDP.PCAP.CD | GDP per capita (current US$) |
| NY.GDP.MKTP.KD.ZG | GDP growth (annual %) |
| SP.POP.TOTL | Population, total |
| SP.DYN.LE00.IN | Life expectancy at birth |
| SL.UEM.TOTL.ZS | Unemployment (% of labor force) |
| FP.CPI.TOTL.ZG | Inflation, consumer prices (annual %) |
4.11 Exercises
- Set up your project folder structure with EC102/programs and EC102/data subfolders. Download both the CSV and Excel versions of the US macrodata file to your data folder.
- Import us_macrodata.csv using read.csv(). Use head(), str(), and summary() to explore the data. How many years of data are included?
- Import us_macrodata.xlsx using read_excel(). Verify that it contains the same data as the CSV version.
- Get a FRED API key and download the unemployment rate (UNRATE) from 2000 to present. Compare the values to the unemployment column in the macrodata file.
- Using the macrodata, find the years with the highest and lowest inflation rates. (Hint: use which.max() and which.min() on the inflation column.)
- Use WDI to download GDP per capita for the G7 countries (US, UK, France, Germany, Italy, Japan, Canada) from 2010 to 2022. Which country had the highest GDP per capita in 2022?
- Use WDIsearch() to find the indicator code for “inflation” and download inflation data for Brazil, Argentina, and Mexico from 2000-2023.