Simple example about how to read a MODIS HDF file using python and the pyhdf library (Note: To download automatically a MODIS granule on your local repertory, see Download MODIS granule HDF files from lads using python):
Note 1: the following tutorial has been written with python 2.7. To migrate the code to python 3.+, it is necessary to replace the print statement by a print function ( means print x to print(x) in the code) and iteritems() was removed in python3, so replace iteritems() by items() instead (at the bottom of the page you can find the code written in python 3).
Note 2: I work with the python anaconda distribution (if it helps you can find an installation guide here).
For example, let's consider the following MODIS HDF file : "MYD06_L2.A2007219.2010.006.2014053202546.hdf". To open the file:
from pyhdf.SD import SD, SDC
file_name = 'MYD06_L2.A2007219.2010.006.2014053202546.hdf'
file = SD(file_name, SDC.READ)
print file.info()
First line import pyhdf library. The command "print file.info()" gives (127, 14) meaning that the file contains 127 Scientific Datasets (SDS) to get the name of all SDS:
How to print all SDS names ?
datasets_dic = file.datasets()
for idx,sds in enumerate(datasets_dic.keys()):
print idx,sds
return
0 Cloud_Top_Height_Nadir_Night
1 Single_Scatter_Albedo_Ice
2 Radiance_Variance
3 Brightness_Temperature
4 Sensor_Azimuth
. .
. .
. .
123 Cloud_Top_Height
124 Cloud_Effective_Emissivity_Nadir
125 Cloud_Effective_Emissivity
126 Cloud_Effective_Radius_Uncertainty_1621
How to get the data ?
Since we know the exact name of all SDS, we can get the data stored in one SDS easily, example with the cloud_top_temperature_1km:
sds_obj = file.select('cloud_top_temperature_1km') # select sds
data = sds_obj.get() # get sds data
print data
return
[[11758 11565 10858 ..., 12100 12299 11902]
[11758 11565 10667 ..., 12100 12299 12100]
[11758 11565 10858 ..., 11902 12299 12100]
...,
[11820 11820 11820 ..., 12475 12633 12633]
[11643 11820 11820 ..., 12633 12633 12781]
[11496 11820 11820 ..., 12633 12633 12781]]
How to get the SDS attributes ?
For the cloud top temperature at 1km the data are formatted. So it is necessary to transform the data by using the information stored in the sds attributes. To read the attributes of the cloud_top_temperature_1km SDS:
import pprint
pprint.pprint( sds_obj.attributes() )
return
{'Cell_Across_Swath_Sampling': [1, 1354, 1],
'Cell_Along_Swath_Sampling': [1, 2030, 1],
'Geolocation_Pointer': 'External MODIS geolocation product',
'Parameter_Type': 'Output',
'_FillValue': -999,
'add_offset': -15000.0,
'long_name': 'Cloud Top Temperature at 1-km resolution from LEOCAT, Temperature from Ancillary Data at Retrieved Cloud Top Pressure Level',
'scale_factor': 0.009999999776482582,
'units': 'K',
'valid_range': [0, 20000]}
One can see that the cloud_top_temperature_1km SDS has two important attributes: 'add_offset' and 'scale_factor' that we can extract:
for key, value in sds_obj.attributes().iteritems():
print key, value
if key == 'add_offset':
add_offset = value
if key == 'scale_factor':
scale_factor = value
to then transform the data and get the cloud top temperature at 1km:
data = (data - add_offset) * scale_factor
print data
return
[[ 267.57999402 265.64999406 258.57999422 ..., 270.99999394 272.9899939
269.01999399]
[ 267.57999402 265.64999406 256.66999426 ..., 270.99999394 272.9899939
270.99999394]
[ 267.57999402 265.64999406 258.57999422 ..., 269.01999399 272.9899939
270.99999394]
...,
[ 268.19999401 268.19999401 268.19999401 ..., 274.74999386
276.32999382 276.32999382]
[ 266.42999404 268.19999401 268.19999401 ..., 276.32999382
276.32999382 277.80999379]
[ 264.95999408 268.19999401 268.19999401 ..., 276.32999382
276.32999382 277.80999379]]
How to extract info from a byte ?
A more complex case where multiple information are stored in one SDS such as in the "Cloud_Mask_1km" SDS (see MODIS C6 cloud optical products User Guide page 115).
sds_obj = file.select('Cloud_Mask_1km') # select sds
data = sds_obj.get()
print data.shape
return
(2030, 1354, 2)
2 bytes here (1byte = 8 bits)
Lets consider we want to read the "Cloud Mask Status Flag", "Cloud Mask Cloudiness Flag" and "Day/Night Flag" from the Cloud_Mask_1km:
import numpy as np
def bits_stripping(bit_start,bit_count,value):
bitmask=pow(2,bit_start+bit_count)-1
return np.right_shift(np.bitwise_and(value,bitmask),bit_start)
i = 576 # random pixel
j = 236
status_flag = bits_stripping(0,1,data[i,j,0])
day_flag = bits_stripping(3,1,data[i,j,0])
cloud_mask_flag = bits_stripping(1,2,data[i,j,0])
print 'Cloud Mask determined: ', status_flag
print 'Cloud Mask day or night: ', day_flag
print 'Cloud Mask: ', cloud_mask_flag
return
Cloud Mask determined: 1
Cloud Mask day or night: 1
Cloud Mask: 0
meaning (1:Determined; 1:Day; 0:Confident Cloudy ). Explanations: the bits_stripping function allows to extract information for a given byte and needs in inputs: the position of the first bit and the number of bits used (Warning: python uses 0-based indexing ! meaning bits are indexes: 0,1,2,3,4,5,6,7 for 1 byte ). For exemple to get "day_flag": first bit position 3 and uses 1 bit: "bits_stripping(3,1,data[i,j,0] "
Example of source code in python 2.7:
#!/usr/bin/env python
from pyhdf.SD import SD, SDC
file_name = 'MYD06_L2.A2007219.2010.006.2014053202546.hdf'
file = SD(file_name, SDC.READ)
print file.info()
#----------------------------------------------------------------------------------------#
# print SDS names
datasets_dic = file.datasets()
for idx,sds in enumerate(datasets_dic.keys()):
print idx,sds
#----------------------------------------------------------------------------------------#
# get data
sds_obj = file.select('cloud_top_temperature_1km') # select sds
data = sds_obj.get() # get sds data
print data
#----------------------------------------------------------------------------------------#
# get attributes
import pprint
pprint.pprint( sds_obj.attributes() )
for key, value in sds_obj.attributes().iteritems():
print key, value
if key == 'add_offset':
add_offset = value
if key == 'scale_factor':
scale_factor = value
print 'add_offset', add_offset, type(add_offset)
print 'scale_factor', scale_factor, type(scale_factor)
data = (data - add_offset) * scale_factor
print data
#----------------------------------------------------------------------------------------#
# Exemple with Cloud_Mask_1km
sds_obj = file.select('Cloud_Mask_1km') # select sds
data = sds_obj.get()
pprint.pprint( sds_obj.attributes() )
print data.shape
#----------------------------------------------------------------------------------------#
# Extract info from a byte
import numpy as np
def bits_stripping(bit_start,bit_count,value):
bitmask=pow(2,bit_start+bit_count)-1
return np.right_shift(np.bitwise_and(value,bitmask),bit_start)
i = 576 # random pixel
j = 236
status_flag = bits_stripping(0,1,data[i,j,0])
day_flag = bits_stripping(3,1,data[i,j,0])
cloud_mask_flag = bits_stripping(1,2,data[i,j,0])
print 'Cloud Mask determined: ', status_flag
print 'Cloud Mask day or night: ', day_flag
print 'Cloud Mask: ', cloud_mask_flag
Example of source code in python 3:
#!/usr/bin/env python
from pyhdf.SD import SD, SDC
file_name = 'MYD06_L2.A2007219.2010.006.2014053202546.hdf'
file = SD(file_name, SDC.READ)
print( file.info() )
#----------------------------------------------------------------------------------------#
# print SDS names
datasets_dic = file.datasets()
for idx,sds in enumerate(datasets_dic.keys()):
print( idx,sds )
#----------------------------------------------------------------------------------------#
# get data
sds_obj = file.select('cloud_top_temperature_1km') # select sds
data = sds_obj.get() # get sds data
print( data )
#----------------------------------------------------------------------------------------#
# get attributes
import pprint
pprint.pprint( sds_obj.attributes() )
for key, value in sds_obj.attributes().items():
print( key, value )
if key == 'add_offset':
add_offset = value
if key == 'scale_factor':
scale_factor = value
print( 'add_offset', add_offset, type(add_offset) )
print( 'scale_factor', scale_factor, type(scale_factor) )
data = (data - add_offset) * scale_factor
print( data )
#----------------------------------------------------------------------------------------#
# Exemple with Cloud_Mask_1km
sds_obj = file.select('Cloud_Mask_1km') # select sds
data = sds_obj.get()
pprint.pprint( sds_obj.attributes() )
print( data.shape )
#----------------------------------------------------------------------------------------#
# Extract info from a byte
import numpy as np
def bits_stripping(bit_start,bit_count,value):
bitmask=pow(2,bit_start+bit_count)-1
return np.right_shift(np.bitwise_and(value,bitmask),bit_start)
i = 576 # random pixel
j = 236
status_flag = bits_stripping(0,1,data[i,j,0])
day_flag = bits_stripping(3,1,data[i,j,0])
cloud_mask_flag = bits_stripping(1,2,data[i,j,0])
print( 'Cloud Mask determined: ', status_flag )
print( 'Cloud Mask day or night: ', day_flag )
print( 'Cloud Mask: ', cloud_mask_flag )
References
Link | WebSite |
---|---|
MODIS C6 cloud optical products user guide | NASA |
pyhdf | sourceforge |
python pprint dictionary on multiple lines | stackoverflow |
Iterating over dictionaries using for loops in Python | stackoverflow |
How to print a dictionary line by line in Python? | stackoverflow |
Error “ 'dict' object has no attribute 'iteritems' ” | stackoverflow |