How to read a MODIS HDF4 file using python and pyhdf ?


Simple example about how to read a MODIS HDF file using python and the pyhdf library (Note: To download automatically a MODIS granule on your local repertory, see Download MODIS granule HDF files from lads using python):

Note 1: the following tutorial has been written with python 2.7. To migrate the code to python 3.+, it is necessary to replace the print statement by a print function ( means print x to print(x) in the code) and iteritems() was removed in python3, so replace iteritems() by items() instead (at the bottom of the page you can find the code written in python 3).

Note 2: I work with the python anaconda distribution (if it helps you can find an installation guide here).

For example, let's consider the following MODIS HDF file : "MYD06_L2.A2007219.2010.006.2014053202546.hdf". To open the file:

from pyhdf.SD import SD, SDC

file_name = 'MYD06_L2.A2007219.2010.006.2014053202546.hdf'
file = SD(file_name, SDC.READ)

print file.info()

First line import pyhdf library. The command "print file.info()" gives (127, 14) meaning that the file contains 127 Scientific Datasets (SDS) to get the name of all SDS:

How to print all SDS names ?

datasets_dic = file.datasets()

for idx,sds in enumerate(datasets_dic.keys()):
    print idx,sds

return

0 Cloud_Top_Height_Nadir_Night
1 Single_Scatter_Albedo_Ice
2 Radiance_Variance
3 Brightness_Temperature
4 Sensor_Azimuth
. .
. .
. .
123 Cloud_Top_Height
124 Cloud_Effective_Emissivity_Nadir
125 Cloud_Effective_Emissivity
126 Cloud_Effective_Radius_Uncertainty_1621

How to get the data ?

Since we know the exact name of all SDS, we can get the data stored in one SDS easily, example with the cloud_top_temperature_1km:

sds_obj = file.select('cloud_top_temperature_1km') # select sds

data = sds_obj.get() # get sds data
print data

return

[[11758 11565 10858 ..., 12100 12299 11902]
 [11758 11565 10667 ..., 12100 12299 12100]
 [11758 11565 10858 ..., 11902 12299 12100]
 ..., 
 [11820 11820 11820 ..., 12475 12633 12633]
 [11643 11820 11820 ..., 12633 12633 12781]
 [11496 11820 11820 ..., 12633 12633 12781]]

How to get the SDS attributes ?

For the cloud top temperature at 1km the data are formatted. So it is necessary to transform the data by using the information stored in the sds attributes. To read the attributes of the cloud_top_temperature_1km SDS:

import pprint

pprint.pprint( sds_obj.attributes() )

return

 {'Cell_Across_Swath_Sampling': [1, 1354, 1],
  'Cell_Along_Swath_Sampling': [1, 2030, 1],
  'Geolocation_Pointer': 'External MODIS geolocation product',
  'Parameter_Type': 'Output',
  '_FillValue': -999,
  'add_offset': -15000.0,
  'long_name': 'Cloud Top Temperature at 1-km resolution from LEOCAT, Temperature from Ancillary Data at Retrieved Cloud Top Pressure Level',
  'scale_factor': 0.009999999776482582,
  'units': 'K',
  'valid_range': [0, 20000]}

One can see that the cloud_top_temperature_1km SDS has two important attributes: 'add_offset' and 'scale_factor' that we can extract:

for key, value in sds_obj.attributes().iteritems():
    print key, value
    if key == 'add_offset':
        add_offset = value  
    if key == 'scale_factor':
        scale_factor = value

to then transform the data and get the cloud top temperature at 1km:

data = (data - add_offset) * scale_factor
print data

return

[[ 267.57999402  265.64999406  258.57999422 ...,  270.99999394  272.9899939
269.01999399]
 [ 267.57999402  265.64999406  256.66999426 ...,  270.99999394  272.9899939
270.99999394]
 [ 267.57999402  265.64999406  258.57999422 ...,  269.01999399  272.9899939
270.99999394]
 ..., 
 [ 268.19999401  268.19999401  268.19999401 ...,  274.74999386
276.32999382  276.32999382]
 [ 266.42999404  268.19999401  268.19999401 ...,  276.32999382
276.32999382  277.80999379]
 [ 264.95999408  268.19999401  268.19999401 ...,  276.32999382
276.32999382  277.80999379]]

How to extract info from a byte ?

A more complex case where multiple information are stored in one SDS such as in the "Cloud_Mask_1km" SDS (see MODIS C6 cloud optical products User Guide page 115).

sds_obj = file.select('Cloud_Mask_1km') # select sds

data = sds_obj.get()
print data.shape

return

(2030, 1354, 2)

2 bytes here (1byte = 8 bits)

Lets consider we want to read the "Cloud Mask Status Flag", "Cloud Mask Cloudiness Flag" and "Day/Night Flag" from the Cloud_Mask_1km:

import numpy as np

def bits_stripping(bit_start,bit_count,value):
    bitmask=pow(2,bit_start+bit_count)-1
    return np.right_shift(np.bitwise_and(value,bitmask),bit_start)

i = 576 # random pixel
j = 236

status_flag = bits_stripping(0,1,data[i,j,0]) 
day_flag = bits_stripping(3,1,data[i,j,0]) 
cloud_mask_flag = bits_stripping(1,2,data[i,j,0])

print 'Cloud Mask determined: ',  status_flag
print 'Cloud Mask day or night: ',  day_flag
print 'Cloud Mask: ',  cloud_mask_flag

return

Cloud Mask determined:  1
Cloud Mask day or night:  1
Cloud Mask:  0

meaning (1:Determined; 1:Day; 0:Confident Cloudy ). Explanations: the bits_stripping function allows to extract information for a given byte and needs in inputs: the position of the first bit and the number of bits used (Warning: python uses 0-based indexing ! meaning bits are indexes: 0,1,2,3,4,5,6,7 for 1 byte ). For exemple to get "day_flag": first bit position 3 and uses 1 bit: "bits_stripping(3,1,data[i,j,0] "

Example of source code in python 2.7:

#!/usr/bin/env python

from pyhdf.SD import SD, SDC

file_name = 'MYD06_L2.A2007219.2010.006.2014053202546.hdf'

file = SD(file_name, SDC.READ)

print file.info()

#----------------------------------------------------------------------------------------#
# print SDS names

datasets_dic = file.datasets()

for idx,sds in enumerate(datasets_dic.keys()):
    print idx,sds

#----------------------------------------------------------------------------------------#
# get data

sds_obj = file.select('cloud_top_temperature_1km') # select sds

data = sds_obj.get() # get sds data

print data

#----------------------------------------------------------------------------------------#
# get attributes

import pprint

pprint.pprint( sds_obj.attributes() )

for key, value in sds_obj.attributes().iteritems():
    print key, value
    if key == 'add_offset':
        add_offset = value  
    if key == 'scale_factor':
        scale_factor = value

print 'add_offset', add_offset, type(add_offset)
print 'scale_factor', scale_factor, type(scale_factor)

data = (data - add_offset) * scale_factor

print data

#----------------------------------------------------------------------------------------#
# Exemple with Cloud_Mask_1km

sds_obj = file.select('Cloud_Mask_1km') # select sds

data = sds_obj.get()

pprint.pprint( sds_obj.attributes() )

print data.shape

#----------------------------------------------------------------------------------------#
# Extract info from a byte

import numpy as np

def bits_stripping(bit_start,bit_count,value):
    bitmask=pow(2,bit_start+bit_count)-1
    return np.right_shift(np.bitwise_and(value,bitmask),bit_start)

i = 576 # random pixel
j = 236

status_flag = bits_stripping(0,1,data[i,j,0]) 
day_flag = bits_stripping(3,1,data[i,j,0]) 
cloud_mask_flag = bits_stripping(1,2,data[i,j,0])

print 'Cloud Mask determined: ',  status_flag
print 'Cloud Mask day or night: ',  day_flag
print 'Cloud Mask: ',  cloud_mask_flag

Example of source code in python 3:

#!/usr/bin/env python

from pyhdf.SD import SD, SDC

file_name = 'MYD06_L2.A2007219.2010.006.2014053202546.hdf'

file = SD(file_name, SDC.READ)

print( file.info() )

#----------------------------------------------------------------------------------------#
# print SDS names

datasets_dic = file.datasets()

for idx,sds in enumerate(datasets_dic.keys()):
    print( idx,sds )

#----------------------------------------------------------------------------------------#
# get data

sds_obj = file.select('cloud_top_temperature_1km') # select sds

data = sds_obj.get() # get sds data

print( data )

#----------------------------------------------------------------------------------------#
# get attributes

import pprint

pprint.pprint( sds_obj.attributes() )

for key, value in sds_obj.attributes().items():
    print( key, value )
    if key == 'add_offset':
        add_offset = value  
    if key == 'scale_factor':
        scale_factor = value

print( 'add_offset', add_offset, type(add_offset) )
print( 'scale_factor', scale_factor, type(scale_factor) )

data = (data - add_offset) * scale_factor

print( data )

#----------------------------------------------------------------------------------------#
# Exemple with Cloud_Mask_1km

sds_obj = file.select('Cloud_Mask_1km') # select sds

data = sds_obj.get()

pprint.pprint( sds_obj.attributes() )

print( data.shape )

#----------------------------------------------------------------------------------------#
# Extract info from a byte

import numpy as np

def bits_stripping(bit_start,bit_count,value):
    bitmask=pow(2,bit_start+bit_count)-1
    return np.right_shift(np.bitwise_and(value,bitmask),bit_start)

i = 576 # random pixel
j = 236

status_flag = bits_stripping(0,1,data[i,j,0]) 
day_flag = bits_stripping(3,1,data[i,j,0]) 
cloud_mask_flag = bits_stripping(1,2,data[i,j,0])

print( 'Cloud Mask determined: ',  status_flag )
print( 'Cloud Mask day or night: ',  day_flag )
print( 'Cloud Mask: ',  cloud_mask_flag )

References