LegEx is driven by an underlying MySQL database with three primary tables containing information about members, bills, and bill actions. The following codebook describes the data as available for download as well as the output from the Python scripts included in this repository. Any issues with existing data should be reported to
Each of these tables is available to download separately as a text file, and can be linked together based on the relationships outlined in the table below:
In narrative form, each member has a unique thomas id number, and there is one record for each member in each cong. The bills table also identifies the member by SpThomasID (same as thomas in the members table). The idNew field in bills is linked to the billID variable in actions.
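As a minimal sketch of these links, the following uses Python's built-in sqlite3 module with a few hand-made rows. The table names (members, bills, actions), the sample values, and the status labels are assumptions for illustration; only the column names come from this codebook. It also assumes that bills carries a Cong column (listed among the common variables below) so that the sponsor's record for the matching Congress is selected.

```python
import sqlite3

# In-memory database with hypothetical rows (not real LegEx data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE members (id INTEGER PRIMARY KEY, thomas TEXT, cong INTEGER, last TEXT);
CREATE TABLE bills   (idNew INTEGER PRIMARY KEY, Cong INTEGER, SpThomasID TEXT, ShortTitle TEXT);
CREATE TABLE actions (actionID INTEGER PRIMARY KEY, billID INTEGER, acted_at TEXT, status TEXT);

-- One member record per Congress served.
INSERT INTO members VALUES (1, '00001', 112, 'Doe');
INSERT INTO members VALUES (2, '00001', 113, 'Doe');
INSERT INTO members VALUES (3, '00002', 113, 'Roe');

INSERT INTO bills VALUES (10, 113, '00001', 'Example Act');

INSERT INTO actions VALUES (100, 10, '2013-01-03', 'introduced');
INSERT INTO actions VALUES (101, 10, '2013-02-01', 'referred');
""")

# bills -> members on (SpThomasID, Cong); bills -> actions on idNew = billID.
query = """
SELECT m.last, b.ShortTitle, a.acted_at, a.status
FROM bills AS b
JOIN members AS m ON m.thomas = b.SpThomasID AND m.cong = b.Cong
JOIN actions AS a ON a.billID = b.idNew
ORDER BY a.acted_at
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Joining on the thomas id alone would return one row per Congress in which the sponsor served, which is why the Congress is included in the join condition here.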
These variables are common across all three tables:
bill: bill number
Cong: congress of bill
billtype or BillType: type of bill or resolution
- Bills (binding, signed by President)
hr – House Bill
s – Senate Bill
- Resolutions (non-binding, not signed)
hres – House Simple Resolutions
hconres – House Concurrent Resolutions
hjres – House Joint Resolutions
sres – Senate Simple Resolutions
sconres – Senate Concurrent Resolutions
sjres – Senate Joint Resolutions
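The bill type codes above can be turned into a small lookup when filtering downloaded data. This is only a sketch: the dictionary and function names are hypothetical, and the bill/resolution grouping simply mirrors the list in this codebook.

```python
# Grouping copied from the codebook list; names here are hypothetical.
BILL_TYPES = {
    "hr": ("House", "bill"),
    "s": ("Senate", "bill"),
    "hres": ("House", "resolution"),
    "hconres": ("House", "resolution"),
    "hjres": ("House", "resolution"),
    "sres": ("Senate", "resolution"),
    "sconres": ("Senate", "resolution"),
    "sjres": ("Senate", "resolution"),
}


def describe_billtype(billtype):
    """Return (chamber, kind) for a code, ignoring case and surrounding spaces."""
    key = billtype.strip().lower()
    if key not in BILL_TYPES:
        raise ValueError(f"unknown bill type: {billtype!r}")
    return BILL_TYPES[key]


print(describe_billtype("hr"))
print(describe_billtype("SJRES"))
```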
The members table contains one record for each Member of Congress for every term in which they served. All data are from the congress-legislators GitHub repository, unless otherwise noted.
id: primary key
ICPSR: an identifier for each legislator commonly used, though not a link for this data.
thomas: the congress.gov identifier for each legislator. This is the primary identifier used in this database for a member.
govtrack: the GovTrack identifier for each member
first: member’s first name
last: member’s last name
gender: member’s gender (F, M)
type: rep (Representative) or sen (Senator)
start: date member took office in current term
end: date member left office in current term
district: numeric identifier for house districts, 0 for Senate
class: Senate indicator of which election cycle member is in. Years given in reference to the current terms below, but can be extrapolated back by subtracting units of six years.
1: Reelected in 2012
2: Up for reelection in 2014
3: Up for reelection in 2016
party: Party of legislator
simpleParty: Generic party identification (
DW1: numeric representation approximating ideology, the first dimension score from Poole and Rosenthal’s DW-Nominate.
[0,1] A 1 indicates that the listed DW1 value is an estimate based on party mean for the given Congress because the actual DW1 value was not available for this member.
[0,1] Was member the chair of any committee in a given congress?
[0,1] Was member the ranking minority member of any committee in a given congress?
LeadCham: [0,1] Did member lead the chamber during a given congress (House Speaker or Senate Majority Leader)?
[0,1] Was member the chair of any subcommittee in a given congress?
[0,1] Was member the ranking minority member of any subcommittee in a given congress?
[0,1] Was any of this member’s data manually updated? Most manual updates were due to odd names or missing id numbers.
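The class field in the list above gives one reference year per Senate class and says earlier cycles follow by subtracting six years at a time. A short sketch of that arithmetic, with hypothetical function and constant names:

```python
# Reference years taken from the class description in this codebook.
CLASS_REFERENCE_YEAR = {1: 2012, 2: 2014, 3: 2016}


def class_election_years(senate_class, count=3):
    """Return `count` election years for a Senate class, reference year first."""
    base = CLASS_REFERENCE_YEAR[senate_class]
    return [base - 6 * k for k in range(count)]


print(class_election_years(2))  # [2014, 2008, 2002]
```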
Bills: All bill data was scraped from thomas.loc.gov via the congress GitHub project, with exceptions noted below.
idNEW: primary key, also the link with actions as billID
IntrDate: Date bill was introduced.
ShortTitle: preferred title to display; when NULL, use OfficialTitle
OfficialTitle: second choice for title (much longer)
PopTitle: not often used and not very descriptive, but provided by the underlying data.
SpThomasID: the Thomas (congress.gov) id for the bill’s sponsor, also the link to the members table in combination with
SpDist: All characteristics of bill sponsor (also available in
UpdatedAt: Provided by data source, indicates date the bill’s record was most recently updated.
CoSpThID: a comma-delimited list of thomas ids for cosponsors
MinorBill: a filter for bills considered “minor” in nature so that they may be excluded (post office bills, land transfers, etc).
compLaw: when not NULL, indicates a companion bill that this bill was “folded into”. This bill should be considered law at the same time as the identified companion bill. Scraped separately from Thomas for the 103rd-112th congresses.
Major: (only partially available for current Congress) Policy Agendas major topic code, from the Congressional Bills Project.
Minor: (only partially available for current Congress) Policy Agendas minor topic code, from the Congressional Bills Project.
[0,1] Was sponsor chair of committee of referral?
[0,1] Was sponsor ranking minority member of committee of referral?
[0,1] Was sponsor a member of a committee of referral?
[0,1] Was sponsor a subcommittee chair for a committee of referral?
[0,1] Was sponsor a subcommittee ranking minority member for a committee of referral?
[0,1] Was bill sponsored by a member of the majority?
[0,1] Is this bill/resolution from the Senate?
commRefs: A comma-delimited list of committees the bill was referred to, using congress.gov committee abbreviations described below.
URL: URL for official beta.congress.gov page for bill.
1= bill] Is this a bill?
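Several bills fields above need light handling before use: ShortTitle falls back to OfficialTitle when NULL, and CoSpThID and commRefs are comma-delimited lists. The sketch below shows one way to do this in Python. The helper names and the sample row are hypothetical, and it assumes NULL arrives as None or an empty string, which depends on how the text file is loaded.

```python
def split_list(value):
    """Split a comma-delimited field into a list; None or '' gives an empty list."""
    if not value:
        return []
    return [item.strip() for item in value.split(",") if item.strip()]


def display_title(bill):
    """Prefer ShortTitle; fall back to OfficialTitle when ShortTitle is missing."""
    return bill.get("ShortTitle") or bill.get("OfficialTitle")


# Hypothetical row; all values are placeholders, not real LegEx data.
bill = {
    "ShortTitle": None,
    "OfficialTitle": "A bill to do something.",
    "CoSpThID": "00010, 00020",
    "commRefs": "AAA,BBB",
}

print(display_title(bill))
print(split_list(bill["CoSpThID"]))
print(split_list(bill["commRefs"]))
```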
Note: Committee data comes from the following sources, and was linked to bill referral data in order to create the bill-level variables. A crosswalk (.csv) is available which provides various labeling schemes used for congressional committees by different organizations.
- Full Committee Membership Data, 93rd-112th: Charles Stewart’s Congressional Data
- Full and Sub Committee Membership Data, 113th: congress-legislators github
- Subcommittee membership data, 101-105th: Congressional Bills Project
- Subcommittee membership data, 106th-112th: Congressional Quarterly
The actions table contains the same data that is listed on the “all actions” page for a bill at beta.congress.gov, but not every action is included. In particular, we did not capture subcommittee actions.
actionID: primary key for each bill action
billID: sequential bill identifier, tied to billsNEW
acted_at: date action occurred
loc: location of action (described in separate pdf) NOTE: Calendar actions are only available from the 97th Congress to present for the Senate and 101st to present for the House.
status: type of action (described in separate pdf)
actno: for diagnostic purposes, indicates sequential number of action in
subbill: this is an integer indicator that notes when a bill “splits off” into multiple sub-bills by being referred to multiple committees. It is only applied to actions at the committee level. The first committee continues in the 0 subbill, while additional referrals will lead to subbills 1, 2, 3, etc.
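To follow a bill that was referred to several committees, actions can be grouped by subbill. The following is a sketch only: the function name and sample rows are hypothetical, the status labels are placeholders (the real codes are described in the separate pdf), and it assumes actno orders actions within a bill.

```python
from collections import defaultdict


def actions_by_subbill(actions):
    """Group action rows by subbill and sort each group by actno."""
    groups = defaultdict(list)
    for action in actions:
        groups[action["subbill"]].append(action)
    return {sub: sorted(rows, key=lambda r: r["actno"]) for sub, rows in groups.items()}


# Hypothetical actions for one bill referred to two committees.
actions = [
    {"billID": 10, "actno": 3, "subbill": 1, "status": "referred"},
    {"billID": 10, "actno": 2, "subbill": 0, "status": "referred"},
    {"billID": 10, "actno": 4, "subbill": 0, "status": "reported"},
]

grouped = actions_by_subbill(actions)
for sub, rows in grouped.items():
    print(sub, [r["actno"] for r in rows])
```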