MOLUSCE - technical manual
Terminology used in this document
- MOLUSCE is a QGIS plugin for Land Use Change Evaluation. In very general terms, the plugin performs the following process:
- Takes a raster of land use categories for a past time, a raster of land use categories for the present time and rasters of explanatory variables (factors).
- Trains a model that predicts land use changes.
- Predicts future land use changes using the model, the current state of land use and the current factors.
- Model is an algorithm that is used for prediction of land use changes (transitions).
- State raster is a one-band raster that contains land use categories encoded in the raster's pixels.
- Input state raster is a one-band raster that contains land use categories of the past. It is an input for the model.
- Factor rasters are rasters of explanatory variables. The rasters can be one-band or multi-band. They are inputs for the model.
- Output state raster is a one-band raster that contains land use categories of the present. It is the desired result of the model's prediction (a user trains the model to predict (input raster, factors) -> output raster).
- Transition is information about a land use change. Every change (for example, forest -> non-forest) can be viewed as a transition of land use categories (encoded as pixels) from the Input state raster to the Output state raster.
- Change map is a one-band raster that stores information about transitions.
General structure of the plugin
MOLUSCE consists of several parts. The most important are:
- GUI modules (implement user interface)
- Utility modules
- Data Provider (provides procedures of reading/writing raster data and similar utility functions)
- Cross Tabulation (provides functions for creating contingency tables)
- Sampler (provides sampling procedure)
- Algorithmic modules:
- Area Analysis (provides procedures for detecting changes and creating change maps)
- Modeling (provides submodules for modeling the relation between input and output data)
- Simulation (provides the procedure of land change simulation)
- Validation (provides statistical functions and procedures for validating simulation results)
This document describes the internal structure of the most important utility and algorithmic modules. The description is valid for plugin versions <= 1.x.x.
Utility modules
Utility modules are used by many other modules of the plugin. This section discusses the most important of them, DataProvider and Sampler.
Data Provider
The module provides a data structure for internal storage of raster data. It uses numpy masked arrays as the data store. To prevent low-level manipulation of the data by the user, the module provides special methods for data access (this allows the internal structure to be changed in the future if the need arises). The most important methods are:
- Creation and storing methods
- reading data from file, for example
r = Raster(filename, maskVals = [...])
The command creates a new raster variable r and reads data from filename into internal storage. If the file contains no-data values, those pixels are stored as masked pixels. A user can specify an additional list of no-data values via the maskVals variable. The initialization procedure uses gdal utilities and can read all data types that are supported by gdal.
- creating new raster, for example
r = Raster()
r.create([band1Array, band2Array, ...], geodata=...)
A user can create a raster variable from masked arrays.
- saving data, for example
r.save(filename)
A raster can be saved into a file; the method uses gdal utilities.
- Access to the data and data manipulation
- getBand method allows the user to read a single band of the raster from internal storage. For example, if r is a variable holding 3-band raster data, the user can read a band:
band1 = r.getBand(1)
- setBand method is the opposite of getBand. It replaces a band with a new array of pixels, for example
r.setBand(band, bandNumber)
- getBandsCount method returns the number of bands stored in the raster variable:
count = r.getBandsCount()
- getBandGradation and getBandStat methods provide statistics of the raster's bands (min/max values, standard deviation and the list of unique values stored in the band).
- normalize and denormalize methods are used by some algorithmic procedures. For example, Multi-layer perceptron or Logistic Regression predictors usually perform better if the input data are normalized. To normalize the data, the training procedure must use some statistical information (see, for example, getBandStat).
- getGeodata method reads geometry- and geography-related information from the raster variable. A raster variable stores information about pixel sizes (dx and dy), data origin, projection of the raster data and so on. All of this information is encapsulated in a special object, geodata. The user can read the information via the method:
geodata = r.getGeodata()
- setGeoData method is the opposite of the getGeodata method. It can be used, for example, when creating a new raster variable.
- Comparing geodata objects
- geoDataMatch compares the geodata (raster sizes, projections and geo transform objects) of one raster with another geodata object. The method is useful for most raster operations between two different rasters. For example, the module can perform an arithmetic operation on two rasters if they have equal pixel size, origin and projection.
- geoTransformMatch compares the geo transform objects of two rasters.
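The normalization and geodata checks described above can be illustrated with a small plain-Python sketch. It is not the plugin's code. Masked pixels are represented by None instead of a numpy masked array, the z-score formula (subtract the mean, divide by the standard deviation) is an assumption about what normalize does, and the names band_stat, normalize, denormalize, geodata_match and the dictionary keys are hypothetical. The transform tuple follows the six-number layout of a GDAL geo transform (origin x, pixel width, row rotation, origin y, column rotation, pixel height).

```python
import math

def band_stat(band):
    """Mean and population standard deviation of the unmasked pixels."""
    values = [v for row in band for v in row if v is not None]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {"mean": mean, "std": std}

def normalize(band, stat):
    """Z-score every unmasked pixel; masked pixels (None) stay masked."""
    return [[None if v is None else (v - stat["mean"]) / stat["std"] for v in row]
            for row in band]

def denormalize(band, stat):
    """Inverse of normalize, using the same statistics."""
    return [[None if v is None else v * stat["std"] + stat["mean"] for v in row]
            for row in band]

def geodata_match(a, b, tol=1e-9):
    """Compare raster size, projection and geo transform of two geodata dicts."""
    same_size = (a["xSize"], a["ySize"]) == (b["xSize"], b["ySize"])
    same_proj = a["proj"] == b["proj"]
    same_transform = (
        len(a["transform"]) == len(b["transform"])
        and all(abs(x - y) <= tol for x, y in zip(a["transform"], b["transform"]))
    )
    return same_size and same_proj and same_transform

# A 2x2 band with one masked (no-data) pixel.
band = [[1.0, 3.0], [None, 5.0]]
stat = band_stat(band)
normalized = normalize(band, stat)
restored = denormalize(normalized, stat)

# Two geodata records that differ only in the raster origin.
g1 = {"xSize": 2, "ySize": 2, "proj": "EPSG:4326",
      "transform": (0.0, 30.0, 0.0, 0.0, 0.0, -30.0)}
g2 = dict(g1, transform=(15.0, 30.0, 0.0, 0.0, 0.0, -30.0))
```

The statistics are computed once and reused for denormalize, which is why a training procedure has to keep them alongside the model. In the sketch, geodata_match(g1, g2) is false because the origins differ, so pixel-wise arithmetic between such rasters would not be meaningful.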
Sampler
Sampler is the module that performs the sampling procedure. A sample is a set of input data and the corresponding output data that has to be predicted by a model.
A sample contains:
- coordinates of the pixel,
- input data (consists of 2 parts):
- state is data read from a 1-band raster that contains initial states (categories). Categories are split into a set of dummy variables.
- factors is a list of rasters (they can be multiband) that explain the transition between states (categories).
- output data is read from a 1-band raster that contains final states.
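As an illustration of how a category can be split into dummy variables, the following sketch encodes K categories with K-1 indicator values and treats the last category as the all-zero baseline. This is one common scheme and is assumed here. The function name and the choice of baseline are hypothetical, and the plugin's own encoding may differ.

```python
def to_dummy(category, categories):
    """Encode `category` as len(categories) - 1 indicator values.

    The last entry of `categories` is the baseline and maps to all zeros.
    """
    if category not in categories:
        raise ValueError("unknown category: %r" % (category,))
    return [1.0 if category == c else 0.0 for c in categories[:-1]]

# Three land use codes, e.g. 1 = forest, 2 = agriculture, 3 = urban.
categories = [1, 2, 3]
state_vectors = [to_dummy(c, categories) for c in categories]
# state_vectors == [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
```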
Samples are stored in a numpy structured array with named fields:
np.zeros(Count_Of_Samples, dtype=[
    ('coords', float, 2),
    ('state', float, Count_Of_Dummy_Vars*Moving_Win_Size),
    ('factors', float, Moving_Win_Size*Summary_Band_Count),
    ('output', float, 1)
])
This structure is internal, however, and the user should use special methods to create samples and to retrieve the stored data. The module has the following functions for data manipulation:
- setTrainingData method takes several arguments (input and output rasters, sampling mode, desired number of samples, etc.) and performs sampling. One of the most important parameters is mode; it can be one of the following:
- "All". This mode stores all pixels of the input and output rasters.
- "Random". This mode stores randomly selected samples: if the output raster (target raster) has N pixels, then every pixel has a selection probability of 1/N.
- "Stratified". This mode allows undersampling of major categories and/or oversampling of minor categories. If C is the desired number of samples and K is the number of output categories, then the sampling procedure selects K random groups of pixels of equal size C/K. Every group contains pixels of one particular category.
- getData function returns the stored array of samples.
- saveSamples method saves samples as a point shapefile (*.shp). Every sample is saved as a point.
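The "Stratified" mode can be sketched in plain Python as follows. This is an illustration rather than the plugin's implementation. The function name, the list-of-rows raster, the fixed seed and the choice to draw with replacement when a category has fewer than C/K pixels (oversampling) are all assumptions.

```python
import random

def stratified_sample(output_raster, count, seed=0):
    """Pick count // K pixel coordinates for each category of the output raster.

    output_raster: list of rows with category codes (None = masked pixel).
    Returns a dict mapping each category to a list of (row, col) tuples.
    """
    rng = random.Random(seed)

    # Group the coordinates of unmasked pixels by their category.
    pixels_by_category = {}
    for i, row in enumerate(output_raster):
        for j, category in enumerate(row):
            if category is not None:
                pixels_by_category.setdefault(category, []).append((i, j))

    group_size = count // len(pixels_by_category)   # C / K
    samples = {}
    for category, pixels in pixels_by_category.items():
        if len(pixels) >= group_size:
            # Major category: undersample without replacement.
            samples[category] = rng.sample(pixels, group_size)
        else:
            # Minor category: oversample by drawing with replacement.
            samples[category] = [rng.choice(pixels) for _ in range(group_size)]
    return samples

# Category 1 covers nine pixels, category 2 only two; one pixel is masked.
raster = [
    [1, 1, 1, 1],
    [1, 1, 1, 2],
    [1, 1, None, 2],
]
samples = stratified_sample(raster, count=6)
```

With C = 6 and K = 2, each category contributes three samples, so the rare category 2 is represented as strongly as the dominant category 1.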
Algorithmic modules
Area Analysis
The main purpose of the area analysis module is change map calculation. The main methods of the module are makeChangeMap and getChangeMap. The first of them encodes transitions and creates the change map. The module uses the following scheme of transition encoding:
It can be represented as
Code_Of_Initial_Class * Count_Of_Classes + Code_Of_Final_Class
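A worked example of this encoding, assuming that class codes are consecutive indices 0 .. Count_Of_Classes - 1 (the function names are illustrative and not part of the plugin):

```python
def encode_transition(initial, final, class_count):
    """Transition code = initial * class_count + final."""
    return initial * class_count + final

def decode_transition(code, class_count):
    """Inverse of encode_transition: returns (initial, final)."""
    return divmod(code, class_count)

def make_change_map(initial_raster, final_raster, class_count):
    """Encode the transition of every pixel of two equally sized rasters."""
    return [
        [encode_transition(a, b, class_count) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(initial_raster, final_raster)
    ]

# 3 classes: 0 = forest, 1 = agriculture, 2 = urban
initial = [[0, 0], [1, 2]]
final = [[0, 1], [1, 2]]
change_map = make_change_map(initial, final, class_count=3)
# change_map == [[0, 1], [4, 8]]
```

With three classes, the codes 0, 4 and 8 mark pixels whose class did not change, and every other code identifies one specific transition. For example, decode_transition(1, 3) returns (0, 1), that is, forest -> agriculture.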