Landscape change analysis with MOLUSCE - methods and algorithms

Material from GIS-Lab



Version of 12:26, 16 August 2013

General structure of the plugin

MOLUSCE consists of several parts. The most important are:

  • GUI modules (implement user interface)
  • Utility modules
    • Data Provider (provides procedures of reading/writing raster data and similar utility functions)
    • Cross Tabulation (provides functions for creating contingency tables)
    • Sampler (provides sampling procedure)
  • Algorithmic modules:
    • Area Analysis (provides procedures for detecting changes and creating change maps)
    • Modeling (provides submodules for modeling the relation between input and output data)
    • Simulation (provides the land change simulation procedure)
    • Validation (provides statistical functions and procedures for validating simulation results)

The paper describes the internal structure of the most important utility and algorithmic modules. The description is valid for plugin versions <= 1.x.x

Utility modules

Data Provider

The module provides a data structure for internal storage of raster data. It uses numpy masked arrays as the data store, but to prevent low-level manipulation of the data by the user, the module provides special methods for data access (this allows the internal structure to be changed in the future if the need arises). The most important methods are:
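As an illustration of why masked arrays are a convenient store for raster bands, the following minimal numpy sketch (independent of the plugin's API; the no-data value is made up) masks no-data pixels so that statistics skip them automatically:

```python
import numpy as np

# Hypothetical 2x2 band with a no-data value of -9999; masking those
# pixels makes aggregate statistics ignore them automatically.
band = np.ma.masked_equal(
    np.array([[1.0, -9999.0],
              [3.0,     4.0]]),
    -9999.0,
)

mean = band.mean()  # computed over the 3 valid pixels only
```

Any operation on the masked array (mean, std, min/max) works only on valid pixels, which is exactly the behaviour needed for rasters with no-data values.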

  • Creation and storing methods
    • reading data from file, for example
      r = Raster(filename, maskVals = [...])
      The command creates a new raster variable r and reads data from filename into internal storage. If the file contains no-data values, those pixels are stored as masked pixels. A user can specify an additional list of no-data values via the maskVals argument. The initialization procedure uses gdal utilities and can read all data types supported by gdal.
    • creating new raster, for example
      r = Raster()
      r.create([band1Array, band2Array, ...], geodata=...)
      A user can also create a raster variable directly from masked arrays.
    • saving data, for example
      r.save(filename)
      A raster can be saved to a file; the method uses gdal utilities.
  • Access to the data and data manipulation
    • getBand method allows the user to read a single band of the raster from internal storage. For example, if r holds 3-band raster data, the user can read a band:
      band1 = r.getBand(1)
    • setBand method is the opposite of getBand. It replaces a band with a new array of pixels, for example
      r.setBand(band, bandNumber)
    • getBandsCount method returns the number of bands stored in the raster variable:
      count = r.getBandsCount()
    • getBandGradation and getBandStat methods provide statistics of the raster's bands (min/max values, standard deviation and the list of unique values stored in a band).
    • normalize and denormalize methods are used by some algorithmic procedures. For example, Multi-layer Perceptron or Logistic Regression predictors usually perform better if the input data are normalized. To normalize the data, the training procedure must use statistical information about the bands.
    • getGeodata method reads geometry- and geography-related information from the raster variable. A raster variable stores information about pixel sizes (dx and dy), the data origin, the projection of the raster data and so on. All of this information is encapsulated in a special object - geodata. The user can read the information via the method:
      geodata = r.getGeodata()
    • setGeoData method is the opposite of getGeodata. It can be used, for example, when creating a new raster variable.
  • Comparing geodata objects
    • geoDataMatch compares the geodata (raster sizes, projections and geo transform objects) of one raster with another geodata object. The method is useful for most operations involving two different rasters: for example, the module can perform an arithmetic operation on two rasters only if they have equal pixel size, origin and projection.
    • geoTransformMatch compares the geo transform objects of two rasters.
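The idea behind the normalize/denormalize pair mentioned above can be sketched in plain numpy as zero-mean, unit-variance scaling; this is an illustration of the technique only, not the plugin's actual implementation:

```python
import numpy as np

def normalize(band):
    """Scale a band to zero mean and unit variance, returning the
    statistics needed to reverse the transformation later."""
    mean, std = band.mean(), band.std()
    return (band - mean) / std, (mean, std)

def denormalize(band, stats):
    """Reverse normalize() using the saved statistics."""
    mean, std = stats
    return band * std + mean

raw = np.array([10.0, 20.0, 30.0, 40.0])
norm, stats = normalize(raw)
restored = denormalize(norm, stats)
```

Saving the statistics alongside the normalized data is what makes denormalization possible after prediction.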

Sampler

Sampler is the module that performs the sampling procedure. A sample is a set of input data and the corresponding output data that is to be predicted by a model.

A sample contains:

  • coordinates of the pixel,
  • input data (consists of 2 parts):
    • state is data read from a 1-band raster that contains the initial states (categories). Categories are split into a set of dummy variables.
    • factors is a list of rasters (possibly multiband) that explain the transition between states (categories).
  • output data is read from a 1-band raster that contains the final states.
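The splitting of categories into dummy variables works roughly like this (a hypothetical to_dummy helper for illustration, not part of the plugin's API):

```python
import numpy as np

def to_dummy(state_pixel, categories):
    """Encode a categorical state value as dummy variables: one slot
    per category, 1.0 for the pixel's category, 0.0 elsewhere."""
    return np.array([1.0 if state_pixel == c else 0.0 for c in categories])

categories = [1, 2, 3]              # land-use classes in the state raster
encoded = to_dummy(2, categories)   # -> array([0., 1., 0.])
```

The encoding lets models such as a Multi-layer Perceptron treat categories as independent inputs rather than as an arbitrary numeric scale.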

In the simplest case we have a pixel-by-pixel model. In that case:

sample = np.array(
            ([Dummy_variables_for_pixel_from_state_raster], [pixel_from_factor1, ..., pixel_from_factorN], pixel_from_output_raster),
            dtype=[('state', float, 1),('factors',  float, N), ('output', float, 1)]
        )

But we can use a moving window to collect samples; then the input data contain several pixels (for example, 3x3=9) for every raster band. For example, if we use a 1-pixel neighbourhood (3x3 moving window):

sample = np.array(
            ( [Dummy1_1st-pixel_from_state_raster,..., DummyK_1st-pixel_from_state_raster, ..., DummyK_9th-pixel_from_state_raster],
              [1st-pixel_from_factor1, ..., 9th-pixel_from_factor1, ..., 1st-pixel_from_factorN..., 9th-pixel_from_factorN],
              pixel_from_output_raster
            ),
            dtype=[('state', float, 9*DummyVariablesCount),('factors',  float, 9*N), ('output', float, 1)]
)

Samples are stored as named tuples:

(Count_Of_Samples, 
  dtype=[
    ('coords', float, 2), 
    ('state', float, Count_Of_Dummy_Vars*Moving_Win_Size),
    ('factors',  float, Moving_Win_Size*Summary_Band_Count), 
    ('output', float, 1)
   ]
)
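In plain numpy terms, the store described above corresponds to a structured array. The following sketch uses made-up sizes (5 samples, a 3x3 window, 2 dummy variables and 3 factor bands in total) to show the layout; the scalar 'output' field stands in for the original's ('output', float, 1):

```python
import numpy as np

count = 5    # Count_Of_Samples
win   = 9    # Moving_Win_Size (3x3 window)
dummy = 2    # Count_Of_Dummy_Vars
bands = 3    # Summary_Band_Count (factor bands over all rasters)

samples = np.zeros(
    count,
    dtype=[
        ('coords',  float, (2,)),
        ('state',   float, (dummy * win,)),
        ('factors', float, (win * bands,)),
        ('output',  float),
    ],
)

# Each record can be filled field by field:
samples['coords'][0]  = (10.5, 20.5)           # pixel coordinates
samples['state'][0]   = np.zeros(dummy * win)  # dummy-encoded state window
samples['factors'][0] = np.ones(win * bands)   # factor window values
samples['output'][0]  = 1.0                    # final state
```

Each field can also be read as a whole column (e.g. samples['output']), which is convenient when passing training data to a model.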

But this structure is internal, and the user should use special methods to create samples and to retrieve the stored data. The module has the following functions for data manipulation:

  • setTrainingData method takes several arguments (input and output rasters, sampling mode, desired number of samples, etc.) and performs the sampling. One of the most important parameters is mode; it can be one of:
    • "All": take all pixels.
    • "Random": take a random subset of pixels; the number of samples in the data equals the requested count.
    • "Stratified": undersampling of major categories and/or oversampling of minor categories.
  • getData
  • saveSamples
  • getData
  • saveSamples
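The three sampling modes can be illustrated with plain numpy index sampling. This is a sketch of the idea only; the output array, its class balance and the sample counts are made up, and the plugin's actual implementation may differ:

```python
import numpy as np

rng = np.random.default_rng(0)
output = np.array([0] * 90 + [1] * 10)   # imbalanced output categories

# "All": every pixel becomes a sample.
all_idx = np.arange(output.size)

# "Random": draw a fixed number of pixel indices uniformly.
random_idx = rng.choice(output.size, size=20, replace=False)

# "Stratified": draw the same number of samples from each category,
# undersampling the major class and oversampling the minor one.
per_class = 10
strat_idx = np.concatenate([
    rng.choice(np.flatnonzero(output == c), size=per_class, replace=True)
    for c in np.unique(output)
])
```

With uniform random sampling the minor category may be almost absent from the training set; stratified sampling guarantees every category is represented equally.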