Chapter 4 Importing Data

Real economic analysis requires real data. This chapter covers how to bring data into R from files on your computer and from external sources like the Federal Reserve.

4.1 Folders and File Paths

Before importing data, you need to set up an organized folder structure and understand how R finds files.

4.1.1 Setting Up a Project Folder

Create a main folder for your economics work. You can put this in Documents, Dropbox, or Google Drive. Inside your main folder, create two subfolders:

EC102/
├── programs/    # Your R scripts go here
└── data/        # Your data files go here

On Mac, you can create these in Finder. On Windows, use File Explorer. Or create them from R:

# Change the path to match where you want your folder
dir.create("~/Documents/EC102/programs", recursive = TRUE)
dir.create("~/Documents/EC102/data", recursive = TRUE)

4.1.2 The Working Directory

R always operates from a “working directory”—a folder on your computer that R treats as its home base. To see your current working directory:

getwd()
## [1] "/Users/calvinackley/Dropbox/textbooks/intro_to_R"

When you specify a file name without a full path, R looks in this directory.
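
The working directory can also be changed from code with setwd(). The following is a minimal sketch that uses a temporary folder as a stand-in for your project folder; the folder and file names (project_dir, example.csv) are assumptions made for illustration only.

```r
# A temporary folder stands in for your project folder (e.g. ~/Documents/EC102)
project_dir <- file.path(tempdir(), "EC102")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)

# setwd() changes the working directory and invisibly returns the previous one
old_wd <- setwd(project_dir)

# A relative path is now resolved inside project_dir
writeLines("year,value", "data/example.csv")
found <- file.exists("data/example.csv")
print(found)

# Restore the previous working directory
setwd(old_wd)
```

In your own scripts you would call setwd() once with the path to your EC102 folder, then use relative paths such as "data/myfile.csv" from there on.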

4.1.3 Absolute vs Relative Paths

There are two ways to specify file locations:

Absolute paths give the complete location from the root of your file system:

  • Windows: "C:/Users/yourname/Documents/EC102/data/myfile.csv"
  • Mac/Linux: "/Users/yourname/Documents/EC102/data/myfile.csv"

Relative paths specify location relative to the working directory:

  • "data/myfile.csv" looks for a data folder inside the working directory
  • "../other_folder/myfile.csv" goes up one level, then into other_folder

Relative paths are generally preferred because they make your code portable—it will work on different computers as long as the folder structure is the same.

4.1.4 Tips for File Paths

  • Use forward slashes / even on Windows (R handles the conversion)
  • Avoid spaces in folder and file names when possible
  • Keep your data files organized in a data subfolder of your project
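
The tips above can be tried out with a few base R path functions. This is a small sketch, not a required step; the file name is the one used later in this chapter, and nothing is read from disk.

```r
# file.path() joins components with "/" on every platform
csv_path <- file.path("data", "us_macrodata.csv")
print(csv_path)

# In a regular string a Windows backslash must be doubled,
# because a single "\" starts an escape sequence
win_style <- "data\\us_macrodata.csv"

# Replacing the backslashes shows both spellings name the same relative path
same_path <- identical(gsub("\\", "/", win_style, fixed = TRUE), csv_path)
print(same_path)

# basename() and dirname() split a path back into its parts
print(basename(csv_path))
print(dirname(csv_path))
```

Building paths with file.path() avoids typing separators by hand, which is one less place for a typo.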

4.2 Importing CSV Files

CSV (comma-separated values) files are the most common data format. Let’s practice by downloading and importing a US macroeconomic dataset that we’ll use throughout this book.

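First, download the example data to your data folder. The relative path below assumes your working directory is the project folder (EC102), so that the data subfolder already exists: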

download.file(
  "https://raw.githubusercontent.com/ackleycb/intro_to_R/main/data/us_macrodata.csv",
  "data/us_macrodata.csv"
)

Now import it using read.csv() from base R:

# Basic import
macro <- read.csv("data/us_macrodata.csv")

# With options
macro <- read.csv("data/us_macrodata.csv",
                  header = TRUE,           # First row contains column names
                  stringsAsFactors = FALSE # Keep text as character, not factor
                  )

# View the first few rows
head(macro)

The dataset contains annual US economic data from 1871-2024, including CPI, GDP, unemployment, and other key indicators.

The readr package, loaded as part of the tidyverse, provides read_csv(). It is usually faster on large files, keeps text as character by default, and returns a tibble:

library(tidyverse)
macro <- read_csv("data/us_macrodata.csv")
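
If you rerun a script often, you may not want to download the same file every time. One possible approach is sketched below. The function download_if_missing() is a hypothetical helper written for this example; it is not part of base R or of this book's helper file.

```r
# Hypothetical helper: download a file only when it is not already on disk
download_if_missing <- function(url, destfile, mode = "w") {
  # Create the destination folder if it does not exist yet
  dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
  if (file.exists(destfile)) {
    message("Already downloaded: ", destfile)
    return(invisible(FALSE))
  }
  download.file(url, destfile, mode = mode)
  invisible(TRUE)
}

# Usage with the URL from this section (uncomment to run):
# download_if_missing(
#   "https://raw.githubusercontent.com/ackleycb/intro_to_R/main/data/us_macrodata.csv",
#   "data/us_macrodata.csv"
# )
```

The function returns FALSE when the file was already present and TRUE after a fresh download, so a script can tell which case occurred.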

4.3 Importing Excel Files

Excel files (.xlsx or .xls) require the readxl package. The same macroeconomic data is available as an Excel file:

# Download the Excel version (note: mode = "wb" is required for binary files)
download.file(
  "https://raw.githubusercontent.com/ackleycb/intro_to_R/main/data/us_macrodata.xlsx",
  "data/us_macrodata.xlsx",
  mode = "wb"
)

Now import it:

install.packages("readxl")  # Only needed once
library(readxl)

# Basic import (reads first sheet)
macro <- read_excel("data/us_macrodata.xlsx")

# View the result
head(macro)

For Excel files with multiple sheets:

# See what sheets are available
excel_sheets("data/multi_sheet_file.xlsx")

# Specify which sheet by name
data <- read_excel("data/multi_sheet_file.xlsx", sheet = "2023")

# Or by sheet number
data <- read_excel("data/multi_sheet_file.xlsx", sheet = 2)

# Skip rows (useful when headers aren't in row 1)
data <- read_excel("data/multi_sheet_file.xlsx", skip = 3)

4.4 Importing Text Files

For tab-delimited or other text files, use read.table() or read_delim():

# Tab-delimited file
data <- read.table("data/output.txt",
                   header = TRUE,
                   sep = "\t")  # Tab separator

# Using tidyverse
library(tidyverse)
data <- read_delim("data/output.txt", delim = "\t")
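
If you do not have a tab-delimited file at hand, you can create a small one to experiment with. The sketch below writes a toy table to a temporary file and reads it back; the values are placeholders, not real statistics.

```r
# Build a tiny tab-delimited file so the import can be tried without real data
toy <- data.frame(year = c(2001, 2002, 2003),
                  label = c("a", "b", "c"))
txt_path <- tempfile(fileext = ".txt")
write.table(toy, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)

# read.table() needs header and sep spelled out ...
from_table <- read.table(txt_path, header = TRUE, sep = "\t")

# ... while read.delim() is a base R shortcut with those settings as defaults
from_delim <- read.delim(txt_path)

print(from_table)
print(all.equal(from_table, from_delim))
```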

4.5 Importing Stata Files

Economics research often uses Stata (.dta files). The haven package reads these:

install.packages("haven")  # Only needed once
library(haven)

data <- read_dta("data/survey_results.dta")

haven also reads SPSS (.sav) and SAS (.sas7bdat) files:

spss_data <- read_sav("data/survey.sav")
sas_data <- read_sas("data/analysis.sas7bdat")

4.6 Importing from FRED

The Federal Reserve Economic Data (FRED) database, maintained by the Federal Reserve Bank of St. Louis, contains hundreds of thousands of economic time series. You can download data directly into R through the FRED API.

4.6.1 Getting a FRED API Key

  1. Go to https://fred.stlouisfed.org/docs/api/api_key.html
  2. Create a free account
  3. Request an API key (it’s free and instant)

4.6.2 Using the Helper Function

This book provides a helper file with simplified functions for downloading FRED data. First, download and source it:

# Download the helper file (only need to do this once)
download.file(
  "https://raw.githubusercontent.com/ackleycb/intro_to_R/main/R/econ_data_helpers.R",
  "econ_data_helpers.R"
)

# Load the helper functions
source("econ_data_helpers.R")

Now you can download FRED data easily:

# First time: include your API key
cpi <- get_fred("CPIAUCSL", api_key = "your_api_key_here")

# After the key is set, just use the series ID
unemployment <- get_fred("UNRATE")
gdp <- get_fred("GDP", start_date = "2000-01-01")

The function returns a data frame with three columns: date, series_id, and value.
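
Typing an API key directly into a script makes it easy to share the key by accident. One common alternative is to keep it in an environment variable, sketched below. The variable name FRED_API_KEY is an assumption made for this example; the helper file may not read it on its own, so the key is still passed explicitly.

```r
# Store the key in an environment variable for this session.
# "FRED_API_KEY" is a name chosen for this sketch.
Sys.setenv(FRED_API_KEY = "your_api_key_here")

# Read it back wherever it is needed
fred_key <- Sys.getenv("FRED_API_KEY")

# Stop early with a clear message if the key is missing
if (!nzchar(fred_key)) {
  stop("FRED_API_KEY is not set")
}

# Then pass it to the book's helper (requires econ_data_helpers.R to be sourced):
# cpi <- get_fred("CPIAUCSL", api_key = fred_key)
```

To keep the key across sessions, you can instead add a line of the form FRED_API_KEY=your_api_key_here to the .Renviron file in your home folder, which R reads at startup.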

4.6.3 Common FRED Series

Here are some frequently used series for economics courses:

Series ID    Description
CPIAUCSL     Consumer Price Index (All Urban Consumers)
UNRATE       Unemployment Rate
GDP          Gross Domestic Product
FEDFUNDS     Federal Funds Rate
DGS10        10-Year Treasury Rate
PAYEMS       Total Nonfarm Payrolls

To see a full list of common series included in the helper file:

show_fred_series()

You can search for more series at https://fred.stlouisfed.org/

4.6.4 Downloading Multiple Series

To download several series at once:

# Download CPI, unemployment, and GDP together
macro_data <- get_fred_multiple(c("CPIAUCSL", "UNRATE", "GDP"))

This returns all series combined in “long” format, which is useful for plotting and analysis.
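
To illustrate what "long" format means, the sketch below builds a toy data frame and converts it to "wide" format with base R. It assumes the combined result has the same three columns described earlier (date, series_id, value); the numbers are placeholders, not real FRED values.

```r
# Toy stand-in for a long-format result: one row per date and series
long <- data.frame(
  date      = as.Date(c("2020-01-01", "2020-02-01", "2020-01-01", "2020-02-01")),
  series_id = c("UNRATE", "UNRATE", "FEDFUNDS", "FEDFUNDS"),
  value     = c(1, 2, 10, 20)
)

# Long format makes it easy to pull out one series ...
unrate <- long[long$series_id == "UNRATE", ]

# ... while reshape() spreads the series into one column each ("wide" format)
wide <- reshape(long, idvar = "date", timevar = "series_id", direction = "wide")

# reshape() prefixes the new columns with "value."; strip that prefix
names(wide) <- sub("^value\\.", "", names(wide))
print(wide)
```

Wide format is convenient when you want to compute with two series side by side, for example wide$UNRATE - wide$FEDFUNDS.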

4.7 Importing from the Census Bureau

The tidycensus package provides access to US Census Bureau data, including the decennial census and the American Community Survey (ACS).

4.7.1 Getting a Census API Key

  1. Go to https://api.census.gov/data/key_signup.html
  2. Fill out the form to request a free API key
  3. The key will be emailed to you

4.7.2 Using tidycensus

install.packages("tidycensus")  # Only needed once
library(tidycensus)

# Set your API key (only need to do this once per session)
census_api_key("your_api_key_here")

Download population data by state from the ACS:

# Get total population by state (2022 5-year ACS)
state_pop <- get_acs(
  geography = "state",
  variables = "B01003_001",  # Total population
  year = 2022
)

head(state_pop)

Download median household income by county:

# Median household income for counties in California
ca_income <- get_acs(
  geography = "county",
  state = "CA",
  variables = "B19013_001",  # Median household income
  year = 2022
)

4.7.3 Finding Census Variables

The Census Bureau uses codes for variables. To search for variables:

# Load the list of available ACS variables
acs_vars <- load_variables(2022, "acs5")

# View it (or search in RStudio's viewer)
View(acs_vars)
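
The variable table is large, so filtering it by keyword is often quicker than scrolling. The sketch below shows the idea on a hand-made stand-in table that uses the column names name, label, and concept; it has only two rows, and the labels are shortened for illustration.

```r
# Hand-made stand-in for the table returned by load_variables()
acs_vars_demo <- data.frame(
  name    = c("B01003_001", "B19013_001"),
  label   = c("Estimate!!Total", "Estimate!!Median household income"),
  concept = c("Total Population", "Median Household Income"),
  stringsAsFactors = FALSE
)

# Keep rows whose concept mentions "income", ignoring upper/lower case
hits <- acs_vars_demo[grepl("income", acs_vars_demo$concept, ignore.case = TRUE), ]
print(hits$name)

# With the real table the same filter would be:
# acs_vars[grepl("income", acs_vars$concept, ignore.case = TRUE), ]
```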

4.8 Importing from IPUMS

IPUMS (Integrated Public Use Microdata Series) provides harmonized census and survey microdata from around the world. The ipumsr package helps import IPUMS extracts.

4.8.1 Creating an IPUMS Extract

Unlike the other data sources, IPUMS requires you to:

  1. Create an account at https://www.ipums.org/
  2. Select your variables and samples through their web interface
  3. Submit an extract request
  4. Download the data files when ready

IPUMS provides both a data file (.dat or .csv) and a DDI codebook file (.xml) that describes the data structure.

4.8.2 Using ipumsr

install.packages("ipumsr")  # Only needed once
library(ipumsr)

# Read an IPUMS extract (you need both the data file and DDI file)
ddi <- read_ipums_ddi("data/usa_00001.xml")
ipums_data <- read_ipums_micro(ddi)

# View the data
head(ipums_data)

# See variable labels
ipums_var_info(ddi)

The DDI file contains important metadata including variable labels and value codes, which ipumsr uses to properly format your data.

4.8.3 IPUMS Data Collections

IPUMS offers several data collections useful for economics:

Collection              Description
IPUMS USA               US Census and ACS microdata
IPUMS CPS               Current Population Survey
IPUMS International     Census data from 100+ countries
IPUMS Time Use          American Time Use Survey
IPUMS Health Surveys    NHIS and MEPS data

4.9 Importing from the World Bank

The WDI package provides access to the World Bank’s World Development Indicators—a comprehensive database of international economic and social statistics.

4.9.1 Using WDI

install.packages("WDI")  # Only needed once
library(WDI)

Download GDP per capita for multiple countries:

# GDP per capita (current US$) for selected countries, 2000-2023
gdp_data <- WDI(
  country = c("US", "CN", "DE", "JP", "BR"),
  indicator = "NY.GDP.PCAP.CD",
  start = 2000,
  end = 2023
)

head(gdp_data)

Download multiple indicators at once:

# GDP per capita and life expectancy for all countries
world_data <- WDI(
  country = "all",
  indicator = c("NY.GDP.PCAP.CD", "SP.DYN.LE00.IN"),
  start = 2020,
  end = 2022
)

4.9.2 Finding World Bank Indicators

Use WDIsearch() to find indicator codes:

# Search for indicators related to unemployment
WDIsearch("unemployment")

# Search for GDP indicators
WDIsearch("gdp per capita")

4.9.3 Common World Bank Indicators

Indicator            Description
NY.GDP.PCAP.CD       GDP per capita (current US$)
NY.GDP.MKTP.KD.ZG    GDP growth (annual %)
SP.POP.TOTL          Population, total
SP.DYN.LE00.IN       Life expectancy at birth
SL.UEM.TOTL.ZS       Unemployment (% of labor force)
FP.CPI.TOTL.ZG       Inflation, consumer prices (annual %)

4.10 Checking Your Import

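After importing data, always verify it loaded correctly. The checks below use macro, the data frame imported in Section 4.2; substitute the name of your own data frame:

# Dimensions (number of rows and columns)
dim(macro)

# First few rows
head(macro)

# Structure (column names and types)
str(macro)

# Summary statistics
summary(macro)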
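
Beyond these basic checks, a few targeted tests can catch common import problems. The sketch below uses a toy data frame with a deliberate flaw, a number column that was read as text because of a stray "n/a" entry. The values are placeholders, not real statistics.

```r
# Toy data frame: the rate column arrived as text because of "n/a"
toy <- data.frame(
  year = c(2001, 2002, 2003),
  rate = c("4.7", "n/a", "6.0"),
  stringsAsFactors = FALSE
)

# Column types: a numeric column showing up as "character" is a warning sign
col_types <- sapply(toy, class)
print(col_types)

# Converting exposes the entries that are not numbers (they become NA)
rate_num <- suppressWarnings(as.numeric(toy$rate))
n_bad <- sum(is.na(rate_num))
print(n_bad)

# Duplicate keys: each year should appear only once in annual data
n_dup_years <- sum(duplicated(toy$year))
print(n_dup_years)
```

When a check like this finds non-numeric entries, it is usually better to fix the import (for example with the na.strings argument of read.csv()) than to patch the values afterwards.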

4.11 Exercises

  1. Set up your project folder structure with EC102/programs and EC102/data subfolders. Download both the CSV and Excel versions of the US macrodata file to your data folder.

  2. Import us_macrodata.csv using read.csv(). Use head(), str(), and summary() to explore the data. How many years of data are included?

  3. Import us_macrodata.xlsx using read_excel(). Verify that it contains the same data as the CSV version.

  4. Get a FRED API key and download the unemployment rate (UNRATE) from 2000 to present. Compare the values to the unemployment column in the macrodata file.

  5. Using the macrodata, find the years with the highest and lowest inflation rates. (Hint: use which.max() and which.min() on the inflation column.)

  6. Use WDI to download GDP per capita for the G7 countries (US, UK, France, Germany, Italy, Japan, Canada) from 2010 to 2022. Which country had the highest GDP per capita in 2022? (Hint: WDI expects ISO country codes, so the United Kingdom is "GB", not "UK".)

  7. Use WDIsearch() to find the indicator code for “inflation” and download inflation data for Brazil, Argentina, and Mexico from 2000-2023.