990 Data Lake

The first iteration of GivingTuesday’s 990 Data Infrastructure project is a collaborative data lake of clean, standardized 990 data in XML format. This is the rawest form of 990 data in GivingTuesday’s 990 Data Infrastructure. If you do not need the raw data and would benefit from processed data, we recommend consulting the data marts first for a more accessible option.

The following table outlines the hierarchy of folders and content for the raw data layer. You can access these folders of the data lake through an AWS account or the command line (CLI):
Main bucket link:
https://us-east-1.console.aws.amazon.com/s3/buckets/gt990datalake-rawdata
Region: us-east-1 (N. Virginia)
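
As a rough illustration of how the bucket name and region above relate to the two access routes, the sketch below prints the browser (console) link and the s3:// form that the AWS CLI expects. The function names console_url and bucket_uri are hypothetical helpers introduced here for illustration; only the bucket name, region, and link come from this page.

```shell
BUCKET="gt990datalake-rawdata"   # bucket name taken from the link above
REGION="us-east-1"               # region stated above

# Console URL for browsing the bucket in a web browser (hypothetical helper).
console_url() {
  printf 'https://%s.console.aws.amazon.com/s3/buckets/%s\n' "$REGION" "$BUCKET"
}

# S3 URI for the same bucket, the form used on the command line (hypothetical helper).
bucket_uri() {
  printf 's3://%s\n' "$BUCKET"
}

console_url
bucket_uri
```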


1. Access Raw 990 Data Lake via AWS Account

The following steps outline how to access the 990 Data Lake using an Amazon Web Services (AWS) account. This will lead you directly into the AWS S3 bucket where you can access raw 990 e-filed data and indices to navigate to the desired 990 XMLs.

Important Note!

This option requires you to open a free AWS account. You will not be charged for using the Data Lake, but AWS requires your billing information to validate the account. You can find out more at aws.amazon.com/free.

Step-by-step guide to accessing data via AWS account:

2. Click “Create Amazon Web Services (AWS) account”

3. Enter the required details and follow all the prompts. Note that AWS will request your credit card information for validation purposes.

4. After creating your AWS account, log in. This will take you to the AWS console landing page.

5. While logged into your AWS account, open the following link in a new tab:


2. Access Raw 990 Data Lake via Command Line (Terminal)

The following steps outline how to access the Data Lake with command line tools (CLI). This option is recommended for advanced users; otherwise, we recommend that you access the Data Lake directly via an AWS account.

Important Note!

This option requires the AWS CLI tools to be installed and available in your terminal. If you do not have the AWS CLI tools installed, follow these instructions. This option allows you to access the files programmatically.
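
A quick way to confirm the requirement above is to check whether the aws command is on your PATH before starting. The following is a minimal sketch; have_cmd is a hypothetical helper name, while command -v and aws --version are standard.

```shell
# Return success if the named command is available on PATH (hypothetical helper).
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd aws; then
  # Print the installed AWS CLI version.
  aws --version
else
  echo "AWS CLI not found; install it before continuing" >&2
fi
```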

Step-by-step guide to accessing data via Command Line Terminal:

1. Open your terminal

2. To list the contents of the main bucket, type the following:

aws s3 ls gt990datalake-rawdata --no-sign-request

Note: The --no-sign-request parameter lets you read the public bucket without AWS credentials. You can learn more about this parameter here

3. For any bucket subdirectory, use a similar command with the URL from the table above.

4. To download contents from a bucket to your local computer, use the following command:

aws s3 cp s3://gt990datalake-rawdata/{FromTableAbove} yourlocalpath --no-sign-request

An example for downloading the index would be:

aws s3 cp s3://gt990datalake-rawdata/Indices/990xmls/index_all_years_efiledata_xmls_created_on_2023-10-29.csv index.csv --no-sign-request
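
If you run the download from a script, you may want to skip it when a local copy already exists. The sketch below wraps the example above in a small function; fetch_index is a hypothetical helper name, the default file name index.csv follows the example, and the bucket name is assumed to be gt990datalake-rawdata as in the main bucket link.

```shell
BUCKET="gt990datalake-rawdata"   # assumed bucket name, as in the main bucket link
INDEX_KEY="Indices/990xmls/index_all_years_efiledata_xmls_created_on_2023-10-29.csv"

# Download the index unless the destination file already exists (hypothetical helper).
fetch_index() {
  dest="${1:-index.csv}"   # local file name; defaults to index.csv
  if [ -f "$dest" ]; then
    echo "using existing $dest"
  else
    aws s3 cp "s3://$BUCKET/$INDEX_KEY" "$dest" --no-sign-request
  fi
}
```

For example, running fetch_index index.csv followed by head -n 5 index.csv downloads the file if needed and shows its first few lines.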

For additional commands, visit the AWS CLI documentation.
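
As one illustration of what else the CLI can do, the sketch below lists a single folder and previews a whole-folder download with aws s3 sync. The helper names s3_uri, ls_folder, and preview_folder are hypothetical, and the folder Indices/990xmls/ is taken from the index example above. The --dryrun flag makes sync print the planned copies without transferring any data.

```shell
BUCKET="gt990datalake-rawdata"   # bucket name from the main bucket link

# Build the s3:// URI for a folder (prefix) inside the bucket (hypothetical helper).
s3_uri() {
  printf 's3://%s/%s\n' "$BUCKET" "${1#/}"   # drop a leading slash if present
}

# List the contents of one folder without AWS credentials (hypothetical helper).
ls_folder() {
  aws s3 ls "$(s3_uri "$1")" --no-sign-request
}

# Preview a folder download into a local directory; nothing is transferred
# while --dryrun is present (hypothetical helper).
preview_folder() {
  aws s3 sync "$(s3_uri "$1")" "$2" --no-sign-request --dryrun
}
```

For example, ls_folder Indices/990xmls/ lists that folder, and preview_folder Indices/990xmls/ ./indices shows what a download into ./indices would copy. Removing --dryrun from the function performs the actual download.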