Access Raw 990 Data Lake via AWS Account
The following steps outline how to access the 990 Data Lake using an Amazon Web Services (AWS) account. This takes you directly to the AWS S3 bucket, where you can access the raw e-filed 990 data along with the indices used to locate the 990 XML files you need.
Important Note!
This option requires you to open a free AWS account. You will not be charged for using the Data Lake, but AWS requires you to enter billing information to validate the account. You can find out more at aws.amazon.com/free.
Step-by-step guide to accessing data via an AWS account:
1. Visit aws.amazon.com
2. Click “Create Amazon Web Services (AWS) account”
3. Enter the required details and follow the prompts. Note that AWS will request your credit card information for validation purposes.
4. After creating your AWS account, log in. This takes you to the AWS Console landing page.
5. While logged into your AWS account, open the following link in a new tab:
Access Raw 990 Data Lake via Command Line (Terminal)
The following steps outline how to access the Data Lake with the AWS Command Line Interface (CLI). This option is recommended for advanced users; other users may prefer to access the Data Lake directly through an AWS account.
Important Note!
This option requires the AWS CLI to be installed on your computer. If you do not have the AWS CLI installed, follow these instructions. This option lets you access the files programmatically.
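If you want programmatic access without installing the CLI, the listing that `aws s3 ls ... --no-sign-request` performs can be sketched with Python's standard library by calling the S3 ListObjectsV2 REST endpoint anonymously. This is a minimal sketch, not an official client: it assumes the bucket allows unsigned listing (which the `--no-sign-request` command in the steps below relies on) and that the HTTPS host has the same form as the sample XML URL at the end of this page. The helper names `build_list_url`, `parse_listing`, and `list_prefix` are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: virtual-hosted HTTPS endpoint, same form as the sample XML URL.
BUCKET_URL = "https://gt990datalake-rawdata.s3.amazonaws.com/"
# XML namespace used in S3 ListBucketResult responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def build_list_url(prefix="", token=None):
    """Build an unsigned ListObjectsV2 request URL for one 'directory' level."""
    params = {"list-type": "2", "prefix": prefix, "delimiter": "/"}
    if token:
        params["continuation-token"] = token  # resume a paginated listing
    return BUCKET_URL + "?" + urllib.parse.urlencode(params)


def parse_listing(xml_bytes):
    """Return (subdirectories, [(key, size)], next continuation token or None)."""
    root = ET.fromstring(xml_bytes)
    dirs = [p.text for p in root.findall("s3:CommonPrefixes/s3:Prefix", S3_NS)]
    files = [
        (c.find("s3:Key", S3_NS).text, int(c.find("s3:Size", S3_NS).text))
        for c in root.findall("s3:Contents", S3_NS)
    ]
    truncated = root.findtext("s3:IsTruncated", default="false", namespaces=S3_NS) == "true"
    token = root.findtext("s3:NextContinuationToken", namespaces=S3_NS) if truncated else None
    return dirs, files, token


def list_prefix(prefix=""):
    """Print every subdirectory and file under `prefix`, following pagination."""
    token = None
    while True:
        # Assumption: the bucket permits anonymous (unsigned) list requests.
        with urllib.request.urlopen(build_list_url(prefix, token)) as resp:
            dirs, files, token = parse_listing(resp.read())
        for d in dirs:
            print("PRE", d)
        for key, size in files:
            print(size, key)
        if token is None:
            break
```

For example, `list_prefix("Indices/")` should print the subdirectories and files under the Indices/ prefix, roughly what `aws s3 ls s3://gt990datalake-rawdata/Indices/ --no-sign-request` shows.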
Step-by-step guide to accessing data via the command line:
1. Open your terminal
2. To list the contents of the main bucket, type:
aws s3 ls s3://gt990datalake-rawdata --no-sign-request
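Note: The --no-sign-request parameter tells the CLI not to sign the request with AWS credentials, so this public bucket can be read without logging in. You can learn more about the --no-sign-request parameter here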
3. For any subdirectory of the bucket, use a similar command with the path from this table.
4. To download contents from the bucket to your local computer, use the following command:
aws s3 cp s3://gt990datalake-rawdata/{FromTableAbove} yourlocalpath --no-sign-request
An example for downloading the index would be:
aws s3 cp s3://gt990datalake-rawdata/Indices/990xmls/index_all_years_efiledata_xmls_created_on_2023-10-29.csv index.csv --no-sign-request
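Step 4 can also be sketched in Python without the CLI. Because the bucket is public, an s3://gt990datalake-rawdata/<key> path appears to map to an HTTPS URL of the same form as the sample link in the browser section, and the object can be streamed to disk in chunks so a multi-gigabyte index never has to fit in memory. This is a sketch that assumes the URL mapping holds for every key; `s3_uri_to_https` and `download` are hypothetical helper names.

```python
import shutil
import urllib.parse
import urllib.request


def s3_uri_to_https(uri):
    """Turn s3://<bucket>/<key> into a public virtual-hosted HTTPS URL.

    Assumption: the object is publicly readable at <bucket>.s3.amazonaws.com,
    as the sample XML link for gt990datalake-rawdata suggests.
    """
    parsed = urllib.parse.urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri}")
    key = parsed.path.lstrip("/")
    # quote() keeps '/' as a path separator and escapes unsafe characters.
    return f"https://{parsed.netloc}.s3.amazonaws.com/{urllib.parse.quote(key)}"


def download(uri, local_path, chunk_size=1 << 20):
    """Stream the object to local_path in 1 MiB chunks without loading it whole."""
    with urllib.request.urlopen(s3_uri_to_https(uri)) as resp, open(local_path, "wb") as out:
        shutil.copyfileobj(resp, out, chunk_size)
```

For example, `download("s3://gt990datalake-rawdata/Indices/990xmls/index_all_years_efiledata_xmls_created_on_2023-10-29.csv", "index.csv")` mirrors the CLI example in step 4.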
For additional commands, see the AWS CLI documentation.
Access Raw 990 Data Lake via Browser
The following steps outline how to access the Data Lake directly through your browser. This is a multi-step process: you download sample indices (tables that document all the files in the dataset), then use them to reconstruct the links to the desired files, which are hosted on AWS.
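Reconstructing a link can be sketched as filling in a URL template. The template below is inferred from the single sample URL at the end of this section (EfileData/XmlFiles/<id>_public.xml), so treat it as an assumption rather than a documented rule; when an index row already carries a URL field, using that value directly is safer. `xml_url_for` is a hypothetical helper.

```python
# Assumption: inferred from one sample link, not a documented naming rule.
XML_URL_TEMPLATE = (
    "https://gt990datalake-rawdata.s3.amazonaws.com/"
    "EfileData/XmlFiles/{file_id}_public.xml"
)


def xml_url_for(file_id):
    """Build the public XML URL from the numeric prefix of a filing's file name."""
    # Accept ints or strings and drop stray whitespace copied from an index.
    return XML_URL_TEMPLATE.format(file_id=str(file_id).strip())
```

For example, `xml_url_for("201943299349100509")` produces the sample URL shown in step 3.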
Important Note!
This option allows you to download sample indices directly through your browser. Please note that these are multi-gigabyte files downloaded directly onto your device. For more efficient access to the datasets, use a free Amazon Web Services account via the “Access via AWS” option on the home page.
Step-by-step guide to accessing data via browser:
1. Download the sample index you want by clicking on one of the following links:
Link to download CSV file (note: this is a 2.1 GB file):
Link to download JSON file (note: this is an 8 GB file):
2. Open the index and use the URL field from any row to download the individual XML file for that row.
3. For example, open this sample URL in your browser to view or download an individual XML file:
https://gt990datalake-rawdata.s3.amazonaws.com/EfileData/XmlFiles/201943299349100509_public.xml
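Because the CSV index is 2.1 GB, it is usually easier to stream it row by row than to open it in a spreadsheet. The sketch below reads the index with Python's `csv` module and yields the URL field from each row. The column header is assumed to be exactly `URL` (step 2 only says "URL Field"), so it is a parameter you may need to change; `iter_xml_urls` is a hypothetical helper name.

```python
import csv
import itertools


def iter_xml_urls(csv_lines, url_column="URL", limit=None):
    """Yield non-empty URL values from an index, reading one row at a time.

    Assumption: the index header contains a column named `url_column`.
    `limit` caps how many URLs are returned (None means all of them).
    """
    reader = csv.DictReader(csv_lines)
    # Reading fieldnames consumes only the header line, so this check is cheap.
    if url_column not in (reader.fieldnames or []):
        raise KeyError(f"column {url_column!r} not found; columns are {reader.fieldnames}")
    urls = (row[url_column] for row in reader if row.get(url_column))
    return itertools.islice(urls, limit)
```

For example, `with open("index.csv", newline="", encoding="utf-8") as f: print(list(iter_xml_urls(f, limit=5)))` would print the first five XML URLs, each of which can be opened like the sample URL in step 3.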