Data formats, groups and sources in geospatial technology; main data portals of NOAA, NASA, USGS, GIS Clearinghouses
================================================================
Basic Common GIS Formats
Below is a list of common and less common data formats used in GIS analysis. Some formats are used mainly in GIS (e.g. shapefiles, coverages) and some are used mainly in remote sensing and earth science (e.g. NetCDF, HDF). Modern software makes it much easier to integrate these data.
Abbreviations used:
ESRI: Environmental Systems Research Institute, Redlands, CA
USGS: United States Geological Survey
NOAA: National Oceanic and Atmospheric Administration
NASA: National Aeronautics and Space Administration
Related to ArcGIS software (ESRI Inc.):
- Coverage (vector), original ESRI format, not much in use
- Interchange Format (or Export Format, .e00), not much in use
- Shapefile (vector), widely used by ESRI and other GIS software
- ESRI GRID (raster), still in use in many ArcGIS applications
- ESRI Geodatabases (file, .gdb, and personal, .mdb), standard compact format
USGS:
- Digital Line Graph (DLG) (vector), can still be found on many USGS sites, not much in general use
- Spatial Data Transfer Standard (SDTS), can still be found on many USGS sites, not much in general use
- Digital Raster Graphic (DRG) (raster), can still be found on many USGS sites, not much in general use
NOAA, NASA:
- NetCDF (.nc)
- HDF (.hdf, .h5, .hdf5)
Both formats are very common in meteorological, oceanographic and other earth science data applications.
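If you are not sure which of these formats a downloaded file is in, the first few bytes usually tell you. Below is a minimal Python sketch, not an official tool; the function name sniff_format is made up for illustration. It assumes the file signature sits at the very start of the file: HDF5 files begin with the 8 bytes \x89HDF\r\n\x1a\n, classic NetCDF files begin with CDF followed by a version byte, and HDF4 files begin with the bytes 0E 03 13 01. NetCDF-4 files are stored as HDF5, so this check cannot tell those two apart.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"   # first 8 bytes of an HDF5 file
HDF4_SIGNATURE = b"\x0e\x03\x13\x01"    # first 4 bytes of an HDF4 file


def sniff_format(path):
    """Guess the container format from the first bytes of a file.

    Hypothetical helper for illustration; it only looks at the start of
    the file and does not validate the rest of the contents.
    """
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(HDF5_SIGNATURE):
        return "HDF5 (possibly NetCDF-4)"
    # Classic NetCDF: 'CDF' plus a version byte (1, 2 or 5).
    if head[:3] == b"CDF" and head[3:4] in (b"\x01", b"\x02", b"\x05"):
        return "NetCDF classic"
    if head.startswith(HDF4_SIGNATURE):
        return "HDF4"
    return "unknown"
```

For example, sniff_format("precip.nc") would return one of the four strings above; "precip.nc" is just a placeholder file name.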
General standard raster/image formats:
Various image formats: .bil, .jpg, .tiff, .sid, etc. These are very commonly used to store raster data in ArcGIS and imagery from NASA and USGS (e.g. Landsat data).
================================================================
ESRI FORMATS:
Coverage (vector format) and GRID (raster format):
Both data formats have separate folders for spatial data and attribute tables.
Example: an ESRI coverage named “towns” stores the municipal boundaries of a township.
Its structure consists of two sub-folders: /info and /towns
/info is a standard folder for storing non-spatial information (i.e. attribute tables or “what is there?”)
/towns is a standard folder containing spatial information (i.e. features; or “where is it?”)
A directory containing both /info and /towns is called a “workspace”.
A workspace can hold multiple coverages and grids; however, the /info directory is shared by all of them and stores their attribute data.
Copying, renaming and deleting datasets can be done only through the ArcInfo interface or ArcCatalog.
NO DRAGGING AND MOVING of these directories!!! You will lose the internal links between the data, and the software will not be able to work with them.
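The workspace layout described above can be checked without touching the data. Here is a minimal, read-only Python sketch; the function names is_workspace and list_datasets are made up for illustration, and the check relies only on folder names (an /info sub-folder plus sibling folders), not on the internal files. It never moves, renames or deletes anything, which must still be done through ArcInfo or ArcCatalog.

```python
from pathlib import Path


def is_workspace(folder):
    """Treat a folder as a workspace if it contains an 'info' sub-folder."""
    return (Path(folder) / "info").is_dir()


def list_datasets(folder):
    """Return the sub-folders other than 'info' (candidate coverages/grids).

    This only reads folder names; it does not verify that each folder
    really is a coverage or grid.
    """
    return sorted(
        p.name for p in Path(folder).iterdir()
        if p.is_dir() and p.name.lower() != "info"
    )
```

For a workspace holding only the “towns” coverage, list_datasets would return ["towns"].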
================================================================
EXAMPLE OF FILE MANAGER FOLDER WITH WORKSPACE AND COVERAGES/GRIDS.


What is the workspace name?
What is the name of the first coverage or grid in the file manager window?
Why do we consider this subdirectory a workspace?
================================================================
SHAPEFILE FORMAT
Shapefile characteristics:
- Vector format
- Has separate files for spatial and non-spatial (attribute table) data
- Does not follow or maintain topological rules
- Does not automatically calculate arc lengths and polygon areas
- Needs a projection definition file (.prj) that stores the spatial reference of your data; you can still view the data without it, but the software will issue a warning message and you will not be able to use the data properly together with other data that have a spatial reference defined.
Example: ESRI shapefile that stores municipal boundaries and is called “towns”.
towns.shp (spatial component, main file)
towns.shx (index file)
towns.dbf (non-spatial component, attribute dBase table, can be opened with Excel)
towns.prj (projection definition (spatial reference) file)
The minimum number of shapefile components needed to view it in ArcGIS is three: .dbf, .shp, .shx.
Files can be zipped (WinZip), RARed (WinRAR), dragged and moved together, but NOT SEPARATED!!!
When compressing a shapefile, make sure that ALL relevant files are inside your .zip or .rar file.
Especially critical are .dbf, .shp and .shx.
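The "keep all components together" rule can be automated. Below is a minimal Python sketch using only the standard library; the function names (shapefile_parts, missing_parts, zip_shapefile) are made up for illustration, and the sketch assumes that every file starting with the shapefile's base name plus a dot (e.g. towns.*) belongs to it. It refuses to build the archive when .shp, .shx or .dbf is missing, and it reports a missing .prj without treating it as fatal.

```python
import zipfile
from pathlib import Path

REQUIRED = (".shp", ".shx", ".dbf")   # needed to view the shapefile at all
ARCHIVES = {".zip", ".rar"}           # never pack an old archive into the new one


def shapefile_parts(shp_path):
    """Return every file that shares the shapefile's base name (towns.*)."""
    shp = Path(shp_path)
    return sorted(
        p for p in shp.parent.iterdir()
        if p.is_file()
        and p.name.startswith(shp.stem + ".")
        and p.suffix.lower() not in ARCHIVES
    )


def missing_parts(shp_path):
    """List the required components, plus .prj, that are not present."""
    present = {p.suffix.lower() for p in shapefile_parts(shp_path)}
    return [ext for ext in REQUIRED + (".prj",) if ext not in present]


def zip_shapefile(shp_path, zip_path):
    """Zip all components together; refuse if a required one is missing."""
    missing = [ext for ext in missing_parts(shp_path) if ext in REQUIRED]
    if missing:
        raise FileNotFoundError(f"missing shapefile components: {missing}")
    parts = shapefile_parts(shp_path)   # collect before the archive is created
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for part in parts:
            zf.write(part, arcname=part.name)
    return parts
```

For the example above, zip_shapefile("towns.shp", "towns.zip") would pack towns.shp, towns.shx, towns.dbf and towns.prj (if present) into one archive.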
================================================================
EXAMPLE OF FILE MANAGER FOLDER WITH SHAPEFILE

Name the spatial and non-spatial components of the shapefile.
Name the shapefile component that stores the spatial reference.
================================================================
INTERCHANGE FILE (aka Export):
1. Packed (archive-like) version of an ESRI coverage or grid, similar to TAR, ZIP, RAR, etc.
2. Combines in one file (.e00) the spatial data folders and the attribute tables from the /info folder.

3. Helps transfer COVERAGES or GRIDS easily over the internet.
4. Contains only ESRI coverages or grids.
5. Has to be IMPORTED to produce a coverage or grid again. See below.

Example: ESRI export file that stores municipal boundaries and is called “towns”.
towns.e00
It can be moved, dragged, etc. outside of ESRI interfaces; however, it has to be converted back to a coverage or grid via ArcCatalog, ArcMap, ArcTools or ArcInfo.
Interchange files are rarely used now, but you can still find them on some government servers.
================================================================
ESRI GEODATABASES:
- Consist of data imported from a variety of sources, such as shapefiles, coverages, grids, text files, images, etc. Think of a geodatabase as a closet for your data.
- Compact storage and handling of datasets, easy to transfer
- Very efficient for data processing, because a set of rules and properties can be defined and applied to all data within them.
File Geodatabases – Stored as folders in a file system (.gdb). Each dataset is held as a file that can scale up to 1 TB in size.
Personal Geodatabases – All datasets are stored within a Microsoft Access data file (.mdb), which is limited in size to 2 GB. They are more vulnerable to corruption, since they depend on (and are only supported by) Microsoft Windows file system management and security; however, they are very compact and can be easily transferred as a single file.
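The difference in storage (a .gdb folder versus a single .mdb file) is easy to see from the file system. Below is a minimal Python sketch; the function name geodatabase_kind is made up for illustration, and it judges only by the extension and by whether the path is a folder or a file. A .mdb file can be any Microsoft Access database, so a "personal geodatabase" answer here is only a guess.

```python
from pathlib import Path


def geodatabase_kind(path):
    """Guess the geodatabase type from the path alone (name and file/folder)."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".gdb" and p.is_dir():
        return "file geodatabase"        # a folder of files
    if suffix == ".mdb" and p.is_file():
        return "personal geodatabase"    # one Access file (assumed, not verified)
    return "not a geodatabase"
```

This also explains why a file geodatabase has to be zipped as a whole folder before transfer, while a personal geodatabase can be sent as one file.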
Additional Resources:
On ArcCatalog and Managing Databases:
================================================================
GEODATABASE EXAMPLE

================================================================
GENERAL GIS DATA PORTALS
==================================================================
PORTALS FOR ENVIRONMENTAL DATA ACCESS:
Precipitation:
- For continuous precipitation data in tropical and subtropical zones use TRMM (data available only through 2015): http://trmm.gsfc.nasa.gov/
Temperature:
Global Topography:
Soils:
Land Cover:
Climatic Data:
Population Data:
Stream Data:
NASA DAACs, GPM, HDF & NetCDF links:
DAAC: Distributed Active Archive Centers
EOS: Earth Observing System
EOSDIS: Earth Observing System Data and Information System
GPM: Global Precipitation Measurement
IMERG: Integrated Multi-satellitE Retrievals for GPM
HDF: Hierarchical Data Format
NetCDF: Network Common Data Form
================================================================
- The majority of GIS data come in compressed or archived formats: .rar, .zip, .gz, .tar, etc.; NASA stores data mainly in .hdf (HDF) and .nc (NetCDF).
- You need to learn the available tools for handling .rar, .zip and .tar files.
- Most of the time I use WinRAR because so far it has proved to be the best at handling compressed binary data (I have never had corrupted files using WinRAR); download it from here: http://www.rarlab.com/download.htm
- When you download a compressed file, keep in mind that you first need to uncompress it and place its contents in a specific directory (e.g. D:/yuri/data).
- Besides regular GIS and image formats, ArcGIS can recognize .txt, .csv, .xls and .xlsx files. It does not recognize compressed formats and does not uncompress your downloaded compressed data!!! (I have had students spend literally hours trying to read .rar files in ArcGIS :)
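The uncompress-first step can also be scripted. Below is a minimal Python sketch using the standard library's shutil.unpack_archive; the function name unpack_download is made up for illustration. It handles .zip and .tar archives (including .tar.gz), but NOT .rar, which the Python standard library does not support, so you still need WinRAR or a similar tool for those. For an unsupported format, unpack_archive raises shutil.ReadError.

```python
import shutil
from pathlib import Path


def unpack_download(archive_path, target_dir):
    """Unpack a downloaded archive into target_dir and list the extracted files.

    Hypothetical helper for illustration. The target directory is created
    if it does not exist; paths are returned relative to it.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(str(archive_path), str(target))
    return sorted(
        p.relative_to(target).as_posix()
        for p in target.rglob("*") if p.is_file()
    )
```

For example, unpack_download("towns.zip", "D:/yuri/data") would place the contents in D:/yuri/data, and only then should you point ArcGIS at that directory ("towns.zip" is a placeholder name).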