Topic 8: Localised Geospatial StatisticsDr. Kam Tin Seong
Assoc. Professor of Information SystemsSchool of Computing and Information Systems,
Singapore Management University2020-5-1 (updated: 2021-07-05)1 / 37

Content

Introducing Localised Geospatial Analysis
- Local Indicators of Spatial Association (LISA)
Cluster and Outlier Analysis
- Local Moran and Local Geary
- Moran scatterplot
- LISA Cluster Map
Hot Spot and Cold Spot Areas Analysis
- Getis and Ord’s G-statistics
Case Studies

2 / 37

Localised Geospatial Analysis

A collection of spatial statistical analysis methods for analysing the location related tendency (clusters or outliers) in the attributes of geographically referenced data (points or areas).
Can be indecies decomposited from their global measures such as local Moran's I, local Geary's c, and Getis-Ord Gi*.
These spatial statistics are well suited for:
- detecting clusters or outliers;
- identifying hot spot or cold spot areas;
- assessing the assumptions of stationarity; and
- identifying distances beyond which no discernible association obtains.

3 / 37

Local Indicator of Spatial Association (LISA)

A subset of localised geospatial analysis methods.
Any spatial statistics that satisfies the following two requirements (Anselin, L. 1995):
- the LISA for each observation gives an indication of the extent of significant spatial clustering of similar values around that observation;
- the sum of LISAs for all observations is proportional to a global indicator of spatial association.

4 / 37

What is geographically referenced attributes?

Attributes that are recorded based on a geographical entity such as postal code, postal area, census block, district, state, province, and country
- informal geographical entities includes regular grids or hexagons.
These geographical entities can be in either point or polygon features.
The attributes can be in absolute counts (i.e. number of people age 65 and above) or rates (i.e. proportion of population age 65 and above).
It is univariate in nature.

5 / 37

Detecting Spatial Clusters and Outliers

Given a set of geospatial features (i.e. points or polygons) and an analysis field, the spatial statistics identify spatial clusters of features with high or low values. The tool also identifies spatial outliers.
local Moran's I is the most popular spatial statistical method used, other methods include local Geary's c.
In general, the analysis will calculate a local statistic value, a z-score, a pseudo p-value, and a code representing the cluster type for each statistically significant feature. The z-scores and pseudo p-values represent the statistical significance of the computed index values.

6 / 37

Local Moran's I

Given a geographically referenced attribute field, X the formula of local Moran's I is:

7 / 37

Local Moran and Moran’s I

The summation of local Moran is

Moran’s I

8 / 37

Test statistics of Local Moran

9 / 37

Interpretation of Local Moran

An outlier: significant and negative if location i is associated with relatively low values in surrounding locations.
A cluster: significant and positive if location i is associated with relatively high values of the surrounding locations.
In either instance, the p-value for the feature must be small enough for the cluster or outlier to be considered statistically significant.
The commonly used alpha-values are 0.1, 0.05, 0.01, 0.001 corresponding the 90%, 95, 99% and 99.9% confidence intervals respectively.

10 / 37

Interpretation of Local Moran and Scatterplot

11 / 37

Other forms of LISA

Local Geary

12 / 37

Detecting hot and cold spot areas

Given a set of geospatial features (i.e. points or polygons) and an analysis field, the spatial statistics tell you where features with either high (i.e. hot spots) or low values (cold spots) cluster spatially.
The spatial statistic used is called Getis-Ord Gi* statistic (pronounced G-i-star).

13 / 37

Getis-Ord Gi*

Getis and Ord (1992) define the local G and G∗ statistics for region i (i=1,···,n) as

14 / 37

Getis-Ord Gi*

For variable x to be nonpositive and the weight matrix W(d) to be nonbinary the G and G* statistics are defined as

15 / 37

Interpretation of Getis-Ord Gi and Gi*

A hot spot area: significant and positive if location i is associated with relatively high values of the surrounding locations.
A cold spot area: significant and negative if location i is associated with relatively low values in surrounding locations.

16 / 37

Fixed weighting scheme

Things to consider if fixed distance is used:

All features should have at least one neighbour.
No feature should have all other features as neighbours.
Especially if the values for the input field are skewed, you want features to have about eight neighbors each.
Might produce large estimate variances where data are sparse, while mask subtle local variations where data are dense.
In extreme condition, fixed schemes might not be able to calibrate in local areas where data are too sparse to satisfy the calibration requirements (observations must be more than parameters).

17 / 37

Adaptive weighting schemes

Adaptive schemes adjust itself according to the density of data
- Shorter bandwidths where data are dense and longer where sparse.
- Finding nearest neighbors are one of the often used approaches.

18 / 37

Best practice guidelines

Results are only reliable if the input feature class contains at least 30 features.
The input field mst be in continuous data type such as a count, rate, or other numeric measurement, no categorical attribute field is allowed.

19 / 37

Best practice guidelines

Select an appropriate spatial weighting method

The polygon contiguity method is effective when polygons are similar in size and distribution, and when spatial relationships are a function of polygon proximity (the idea that if two polygons share a boundary, spatial interaction between them increases).
- When you select a polygon contiguity conceptualization, you will almost always want to select row standardization for tools that have the Row Standardization parameter.
The fixed distance method works well for point data. It is often a good option for polygon data when there is a large variation in polygon size (very large polygons at the edge of the study area and very small polygons at the center of the study area, for example), and you want to ensure a consistent scale of analysis.

20 / 37

Best practice guidelinesSelect an appropriate spatial weighting methodThe inverse distance method is most appropriate with continuous data or to model processes where the closer two features are in space, the more likely they are to interact/influence each other.Be warned that with this method, every feature is potentially a neighbour of every other feature, and with large datasets, the number of computations involved will be enormous. 

21 / 37

Best practice guidelines

Select an appropriate spatial weighting method

The k-nearest neighbours method is effective when you want to ensure you have a minimum number of neighbors for your analysis.
- Especially when the values associated with your features are skewed (are not normally distributed), it is important that each feature is evaluated within the context of at least eight or so neighbors (this is a rule of thumb only).
- When the distribution of your data varies across your study area so that some features are far away from all other features, this method works well.
- Note, however, that the spatial context of your analysis changes depending on variations in the sparsity/density of your features.
- When fixing the scale of analysis is less important than fixing the number of neighbors, the k-nearest neighbours method is appropriate.

22 / 37

Futher guide on selecting a fixed-distance band value

Select a distance based on what you know about the geographic extent of the spatial processes promoting clustering for the phenomena you are studying.
Use a distance band that is large enough to ensure all features will have at least one neighbor, or results will not be valid.
Try not to get stuck on the idea that there is only one correct distance band. Reality is never that simple. Most likely, there are multiple/interacting spatial processes promoting observed clustering.
Select an appropriate distance band or threshold distance.
- All features should have at least one neighbour.
- No feature should have all other features as a neighbour.
- Especially if the values for the input field are skewed, each feature should have about eight neighbours.

23 / 37

In colclusion

Spatial statistics methods are not a blackbox. Before performing the analysis, a geospatial analyst should consider the followings:

What is the geographical question?
What is the geospatial feature?
What is the analysis field?
Which conceptualization of spatial relationships is appropriate?

24 / 37

Case Study 1: Area-based

Reference: https://ink.library.smu.edu.sg/sis_research/4331/

25 / 37

Objectives

Using micro-level event data of armed conflicts in Africa, this study aims to show how a data-driven geospatial analytics approach can be used reveal useful spatio-temporal pattern of the conflict events,
Demonstrating how a reproducible research can be conducted by using R Markdown, Rstudio and other appropriate R packages, and
Sharing the findings and more importantly, the approaches we used to the practice political researchers so that they are confident to conduct similar studies by themselves.

26 / 37

Violence against civilians in Africa, 1997-2017

]

27 / 37

LISA Cluster Map for Phase I - from 1997 to 2003

28 / 37

LISA Cluster Map Phase II - from 2004 to 2010

29 / 37

LISA Cluster Map Phase III - from 2011 to 2017

30 / 37

Case Study 2: Point-based

Reference: https://ink.library.smu.edu.sg/sis_research/4376/

31 / 37

Objectives

This project aims to contribute new knowledge towards the study of electricity consumption and its analyses in two ways.

Firstly, we will analyze electricity consumption using mainly two spatial analysis methods, Local Indicators of Spatial Association (LISA) and Geographically-Weighted Principal Components Analysis (GWPCA) to discover spatial patterns on electricity consumption patterns.
Secondly, this project aims to be a proof-of-concept of using R to achieve an end-to-end analytics solution, from data cleaning and preparation all the way to visualization of results. The visualizations should not only present the analysis results in a clear manner but also prompt users to do their own exploration of electricity consumption in Singapore and derive insights that fit their purpose.

32 / 37

Exploratory Data Analysis

Boxplots are used to reveal the statistical distributions of electricity consumption by dwelling types.

33 / 37

LISA Map of average electricity consumptions

At the national level

34 / 37

LISA Map of average electricity consumptions

At the planning area level

35 / 37

LISA Map of average electricity consumptions

At the local level

36 / 37

Reference

Anselin, L. (1995). "Local indicators of spatial association – LISA". Geographical Analysis, 27(4): 93-115.
Getis, A. and Ord, J.K. (1992) “The analysis of spatial association by use of distance statistics”. Geographical Analysis, 24(3): 189-206.
Ord, J.K. and Getis, A. (2010) “Local spatial autocorrelation statistics: Distributional issues and an application”. Geographical Analysis, 27(4): 286-306.

↑, ←, Pg Up, k	Go to previous slide
↓, →, Pg Dn, Space, j	Go to next slide
Home	Go to first slide
End	Go to last slide
Number + Return	Go to specific slide
b / m / f	Toggle blackout / mirrored / fullscreen mode
c	Clone slideshow
p	Toggle presenter mode
t	Restart the presentation timer
?, h	Toggle this help

Topic 8: Localised Geospatial Statistics

Dr. Kam Tin SeongAssoc. Professor of Information Systems

School of Computing and Information Systems,Singapore Management University

2020-5-1 (updated: 2021-07-05)

Content

Localised Geospatial Analysis

Local Indicator of Spatial Association (LISA)

What is geographically referenced attributes?

Detecting Spatial Clusters and Outliers

Local Moran's I

Local Moran and Moran’s I

Test statistics of Local Moran

Interpretation of Local Moran

Interpretation of Local Moran and Scatterplot

Other forms of LISA

Detecting hot and cold spot areas

Getis-Ord Gi*

Getis-Ord Gi*

Interpretation of Getis-Ord Gi and Gi*

Fixed weighting scheme

Adaptive weighting schemes

Best practice guidelines

Best practice guidelines

Select an appropriate spatial weighting method

Best practice guidelines

Select an appropriate spatial weighting method

Best practice guidelines

Select an appropriate spatial weighting method

Futher guide on selecting a fixed-distance band value

In colclusion

Case Study 1: Area-based

Objectives

Violence against civilians in Africa, 1997-2017

LISA Cluster Map for Phase I - from 1997 to 2003

LISA Cluster Map Phase II - from 2004 to 2010

LISA Cluster Map Phase III - from 2011 to 2017

Case Study 2: Point-based

Objectives

Exploratory Data Analysis

LISA Map of average electricity consumptions

At the national level

LISA Map of average electricity consumptions

At the planning area level

LISA Map of average electricity consumptions

At the local level

Reference

Content

Help

Dr. Kam Tin Seong
Assoc. Professor of Information Systems

School of Computing and Information Systems,
Singapore Management University