Introducing Localised Geospatial Analysis
Cluster and Outlier Analysis
Hot Spot and Cold Spot Areas Analysis
Case Studies
A collection of spatial statistical analysis methods for analysing location-related tendencies (clusters or outliers) in the attributes of geographically referenced data (points or areas).
These can be indices decomposed from their global measures, such as local Moran's I, local Geary's c, and Getis-Ord Gi*.
These spatial statistics are well suited for:
A subset of localised geospatial analysis methods.
Any spatial statistic that satisfies the following two requirements (Anselin, L. 1995):
the LISA for each observation gives an indication of the extent of significant spatial clustering of similar values around that observation;
the sum of LISAs for all observations is proportional to a global indicator of spatial association.
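The second requirement can be checked numerically. The sketch below assumes local Moran's I in Anselin's form and a small hypothetical data set (five areas in a row with binary adjacency weights); none of the names or values come from the slides. Under these assumptions the sum of the local values divided by the sum of the weights reproduces the global Moran's I.

```python
def local_morans_i(values, weights):
    """Local Moran's I, assumed form: I_i = (z_i / m2) * sum_j w_ij * z_j."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]        # deviations from the mean
    m2 = sum(d * d for d in z) / n        # second moment of the deviations
    return [
        z[i] / m2 * sum(weights[i][j] * z[j] for j in range(n) if j != i)
        for i in range(n)
    ]


def global_morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    s0 = sum(weights[i][j] for i, j in pairs)
    cross = sum(weights[i][j] * z[i] * z[j] for i, j in pairs)
    return n / s0 * cross / sum(d * d for d in z)


# Hypothetical data: five areas in a row, each neighbouring the adjacent ones.
values = [10.0, 12.0, 11.0, 3.0, 2.0]
weights = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]

local_i = local_morans_i(values, weights)
s0 = sum(sum(row) for row in weights)     # sum of all weights (zero diagonal)
print(sum(local_i) / s0, global_morans_i(values, weights))
```

The factor of proportionality here is the sum of the weights; with row-standardised weights it would equal the number of observations.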
Attributes that are recorded based on a geographical entity such as postal code, postal area, census block, district, state, province, and country
These geographical entities can be in either point or polygon features.
The attributes can be absolute counts (e.g. number of people aged 65 and above) or rates (e.g. proportion of the population aged 65 and above).
It is univariate in nature.
Given a set of geospatial features (i.e. points or polygons) and an analysis field, the spatial statistics identify spatial clusters of features with high or low values. The tool also identifies spatial outliers.
Local Moran's I is the most popular spatial statistical method used; other methods include local Geary's c.
In general, the analysis will calculate a local statistic value, a z-score, a pseudo p-value, and a code representing the cluster type for each statistically significant feature. The z-scores and pseudo p-values represent the statistical significance of the computed index values.
Given a geographically referenced attribute field X, the formula of local Moran's I is:
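A commonly cited form, following Anselin (1995), is given here as a reference. The notation (w_ij for the spatial weight between locations i and j, m_2 for the second moment) is assumed rather than taken from the slides, and some software packages use a slightly different variance term in the denominator.

$$
I_i = \frac{x_i - \bar{X}}{m_2} \sum_{j=1,\, j \neq i}^{n} w_{ij}\,(x_j - \bar{X}),
\qquad
m_2 = \frac{1}{n} \sum_{k=1}^{n} (x_k - \bar{X})^2
$$

Here x_i is the attribute value at location i, \bar{X} is the mean of the attribute, and n is the number of features.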
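An outlier: the statistic is significant and negative, meaning the value at location i is dissimilar to the values in surrounding locations (a high value among low values, or a low value among high values).
A cluster: the statistic is significant and positive, meaning the value at location i is similar to the values in surrounding locations (a high value among high values, or a low value among low values).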
In either instance, the p-value for the feature must be small enough for the cluster or outlier to be considered statistically significant.
The commonly used alpha-values are 0.1, 0.05, 0.01 and 0.001, corresponding to the 90%, 95%, 99% and 99.9% confidence levels respectively.
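The pseudo p-value and cluster type can be illustrated with a conditional permutation test. This is a minimal sketch under stated assumptions, not the procedure of any particular software: the value at location i is held fixed, the remaining values are shuffled, and the share of permutations with a local Moran's I at least as extreme (in absolute value) is reported. The function names, the two-sided comparison, the 999 permutations and the sample data are all assumptions made for illustration.

```python
import random


def local_moran(values, weights, i):
    """Local Moran's I for feature i (assumed form with m2 = sum(z^2) / n)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n
    return z[i] / m2 * sum(weights[i][j] * z[j] for j in range(n) if j != i)


def pseudo_p_value(values, weights, i, permutations=999, seed=0):
    """Hold x_i fixed, shuffle the other values, count extreme outcomes."""
    rng = random.Random(seed)
    observed = abs(local_moran(values, weights, i))
    others = values[:i] + values[i + 1:]
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(others)
        shuffled = others[:i] + [values[i]] + others[i:]
        if abs(local_moran(shuffled, weights, i)) >= observed:
            extreme += 1
    return (extreme + 1) / (permutations + 1)


def cluster_type(values, weights, i, alpha=0.05):
    """Label feature i as high-high, low-low, high-low or low-high."""
    if pseudo_p_value(values, weights, i) > alpha:
        return "not significant"
    n = len(values)
    mean = sum(values) / n
    own = "high" if values[i] > mean else "low"
    lag = sum(weights[i][j] * (values[j] - mean) for j in range(n) if j != i)
    around = "high" if lag > 0 else "low"
    return own + "-" + around


# Hypothetical data: 20 features on a line, neighbours within 6 steps.
values = [100.0] * 10 + [0.0] * 10
weights = [[1 if 0 < abs(i - j) <= 6 else 0 for j in range(20)]
           for i in range(20)]
print(pseudo_p_value(values, weights, 0), cluster_type(values, weights, 0))
```

High-high and low-low labels correspond to clusters, while high-low and low-high labels correspond to outliers.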
Given a set of geospatial features (i.e. points or polygons) and an analysis field, the spatial statistics tell you where features with either high (i.e. hot spots) or low values (cold spots) cluster spatially.
The spatial statistic used is called the Getis-Ord Gi* statistic (pronounced G-i-star).
Getis and Ord (1992) define the local G and G∗ statistics for region i (i=1,···,n) as
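For reference, the definitions usually quoted from Getis and Ord (1992) are written out here. The notation is assumed: w_ij(d) is a binary weight equal to 1 when location j lies within distance d of location i and 0 otherwise, and x_j is the attribute value at location j.

$$
G_i(d) = \frac{\sum_{j \neq i} w_{ij}(d)\, x_j}{\sum_{j \neq i} x_j},
\qquad
G_i^{*}(d) = \frac{\sum_{j=1}^{n} w_{ij}(d)\, x_j}{\sum_{j=1}^{n} x_j}
$$

The only difference between the two is that G* includes the value at location i itself (w_ii = 1), while G excludes it.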
When the variable x is not restricted to positive values and the weight matrix W(d) is not restricted to binary weights, the G and G* statistics are defined as
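The standardised forms usually attributed to Ord and Getis are sketched here as a reference. The notation is assumed rather than taken from the slides, so it is worth checking against the original paper before relying on it.

$$
G_i(d) = \frac{\sum_{j \neq i} w_{ij}(d)\, x_j - W_i\, \bar{x}(i)}
{s(i) \left\{ \left[ (n-1)\, S_{1i} - W_i^{2} \right] / (n-2) \right\}^{1/2}},
\qquad
G_i^{*}(d) = \frac{\sum_{j} w_{ij}(d)\, x_j - W_i^{*}\, \bar{x}}
{s \left\{ \left[ n\, S_{1i}^{*} - W_i^{*2} \right] / (n-1) \right\}^{1/2}}
$$

Here W_i = \sum_{j \neq i} w_{ij}(d) and S_{1i} = \sum_{j \neq i} w_{ij}^2(d), with \bar{x}(i) and s(i) the mean and standard deviation of x computed without location i. The starred quantities W_i^{*} and S_{1i}^{*} include the term j = i, and \bar{x} and s are computed over all n locations. Both statistics are then interpreted as z-scores.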
A hot spot area: significant and positive if location i is associated with relatively high values of the surrounding locations.
A cold spot area: significant and negative if location i is associated with relatively low values in surrounding locations.
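A minimal sketch of how Gi* z-scores separate hot spots from cold spots follows. It assumes the standardised Gi* form with weights that include the feature itself; the function name and the nine-feature sample are hypothetical and chosen only to make the pattern obvious.

```python
import math


def getis_ord_gi_star(values, weights):
    """Standardised Gi* z-score for every feature (weights include self)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for i in range(n):
        row = weights[i]
        w_sum = sum(row)
        w_sq = sum(w * w for w in row)
        numerator = sum(w * x for w, x in zip(row, values)) - mean * w_sum
        denominator = s * math.sqrt((n * w_sq - w_sum * w_sum) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Hypothetical data: nine features on a line; each feature's neighbourhood
# is itself plus the features immediately to its left and right.
values = [9.0, 8.0, 9.0, 5.0, 5.0, 5.0, 1.0, 2.0, 1.0]
weights = [[1 if abs(i - j) <= 1 else 0 for j in range(9)] for i in range(9)]

scores = getis_ord_gi_star(values, weights)
for i, z in enumerate(scores):
    label = "hot spot" if z > 1.96 else "cold spot" if z < -1.96 else "-"
    print(i, round(z, 2), label)
```

With binary weights the denominator becomes zero when a feature has every feature as a neighbour, so the sketch would fail with a division error for a distance band that large.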
Things to consider if fixed distance is used:
All features should have at least one neighbour.
No feature should have all other features as neighbours.
Especially if the values for the input field are skewed, you want features to have about eight neighbors each.
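These checks are easy to automate. The sketch below assumes point features with planar coordinates and Euclidean distance; the function names and sample points are hypothetical. It reports features with no neighbours, features that have every other feature as a neighbour, and the mean neighbour count for a candidate distance.

```python
import math


def neighbour_counts(points, distance):
    """Number of other points within the given distance of each point."""
    counts = []
    for i, (xi, yi) in enumerate(points):
        count = sum(
            1
            for j, (xj, yj) in enumerate(points)
            if j != i and math.hypot(xi - xj, yi - yj) <= distance
        )
        counts.append(count)
    return counts


def check_distance_band(points, distance):
    """Flag isolated and saturated features for a fixed distance band."""
    counts = neighbour_counts(points, distance)
    n = len(points)
    return {
        "isolated": [i for i, c in enumerate(counts) if c == 0],
        "saturated": [i for i, c in enumerate(counts) if c == n - 1],
        "mean_neighbours": sum(counts) / n,
    }


# Hypothetical points: a tight square plus one remote feature.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (5, 5)]
print(check_distance_band(points, 1.5))   # too small: feature 4 is isolated
print(check_distance_band(points, 10.0))  # too large: every feature saturated
```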
Fixed schemes might produce large estimate variances where data are sparse, while masking subtle local variations where data are dense.
In extreme conditions, fixed schemes might not be able to calibrate in local areas where data are too sparse to satisfy the calibration requirements (observations must be more than parameters).
Adaptive schemes adjust themselves according to the density of data.
Shorter bandwidths where data are dense and longer where sparse.
Finding the nearest neighbours is one of the most commonly used approaches.
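The adaptive behaviour can be seen in a small k-nearest-neighbour sketch. It assumes planar point coordinates and Euclidean distance; the function name and the sample points are hypothetical. The distance to the k-th neighbour acts as the local bandwidth, which is short where points are dense and long where they are sparse.

```python
import math


def knn_neighbours(points, k):
    """For each point, return its k nearest neighbours as (index, distance)."""
    neighbours = []
    for i, (xi, yi) in enumerate(points):
        others = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        neighbours.append([(j, d) for d, j in others[:k]])
    return neighbours


# Hypothetical points: three close together, two far apart.
points = [(0, 0), (1, 0), (0, 1), (10, 0), (20, 0)]
knn = knn_neighbours(points, k=2)

# The distance to the k-th neighbour is the adaptive bandwidth at each point.
bandwidths = [nbrs[-1][1] for nbrs in knn]
for i, (nbrs, bw) in enumerate(zip(knn, bandwidths)):
    print(i, [j for j, _ in nbrs], bw)
```

Ties in distance are broken by feature index in this sketch, which is an arbitrary choice.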
Results are only reliable if the input feature class contains at least 30 features.
The input field must be a continuous data type such as a count, rate, or other numeric measurement; categorical attribute fields are not allowed.
The polygon contiguity method is effective when polygons are similar in size and distribution, and when spatial relationships are a function of polygon proximity (the idea that if two polygons share a boundary, spatial interaction between them increases).
The fixed distance method works well for point data. It is often a good option for polygon data when there is a large variation in polygon size (very large polygons at the edge of the study area and very small polygons at the center of the study area, for example), and you want to ensure a consistent scale of analysis.
The k-nearest neighbours method is effective when you want to ensure you have a minimum number of neighbors for your analysis.
Especially when the values associated with your features are skewed (are not normally distributed), it is important that each feature is evaluated within the context of at least eight or so neighbors (this is a rule of thumb only).
When the distribution of your data varies across your study area so that some features are far away from all other features, this method works well.
Note, however, that the spatial context of your analysis changes depending on variations in the sparsity/density of your features.
When fixing the scale of analysis is less important than fixing the number of neighbors, the k-nearest neighbours method is appropriate.
Select a distance based on what you know about the geographic extent of the spatial processes promoting clustering for the phenomena you are studying.
Use a distance band that is large enough to ensure all features will have at least one neighbor, or results will not be valid.
Try not to get stuck on the idea that there is only one correct distance band. Reality is never that simple. Most likely, there are multiple/interacting spatial processes promoting observed clustering.
Select an appropriate distance band or threshold distance.
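One way to find a lower bound for the distance band is to take the largest nearest-neighbour distance in the data set: any band at least that long gives every feature at least one neighbour. The sketch below assumes point features with planar coordinates and Euclidean distance; the function name and sample points are hypothetical.

```python
import math


def minimum_distance_band(points):
    """Smallest distance at which every point has at least one neighbour."""
    nearest = []
    for i, (xi, yi) in enumerate(points):
        nearest.append(
            min(
                math.hypot(xi - xj, yi - yj)
                for j, (xj, yj) in enumerate(points)
                if j != i
            )
        )
    # The most isolated point determines the lower bound.
    return max(nearest)


# Hypothetical points; the last one is the most isolated.
points = [(0, 0), (3, 4), (3, 5), (10, 5)]
print(minimum_distance_band(points))
```

This value is only a floor. The distance finally chosen should still reflect what is known about the spatial process being studied.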
Spatial statistics methods are not a black box. Before performing the analysis, a geospatial analyst should consider the following:
What is the geographical question?
What is the geospatial feature?
What is the analysis field?
Which conceptualization of spatial relationships is appropriate?
Using micro-level event data of armed conflicts in Africa, this study aims to show how a data-driven geospatial analytics approach can be used to reveal useful spatio-temporal patterns of the conflict events,
Demonstrating how reproducible research can be conducted by using R Markdown, RStudio and other appropriate R packages, and
Sharing the findings and, more importantly, the approaches we used with practising political researchers so that they are confident to conduct similar studies by themselves.
This project aims to contribute new knowledge towards the study of electricity consumption and its analyses in two ways.
Firstly, we will analyze electricity consumption using mainly two spatial analysis methods, Local Indicators of Spatial Association (LISA) and Geographically-Weighted Principal Components Analysis (GWPCA), to discover spatial patterns in electricity consumption.
Secondly, this project aims to be a proof-of-concept of using R to achieve an end-to-end analytics solution, from data cleaning and preparation all the way to visualization of results. The visualizations should not only present the analysis results in a clear manner but also prompt users to do their own exploration of electricity consumption in Singapore and derive insights that fit their purpose.
Boxplots are used to reveal the statistical distributions of electricity consumption by dwelling types.
Anselin, L. (1995) "Local indicators of spatial association – LISA". Geographical Analysis, 27(2): 93-115.
Getis, A. and Ord, J.K. (1992) "The analysis of spatial association by use of distance statistics". Geographical Analysis, 24(3): 189-206.
Ord, J.K. and Getis, A. (1995) "Local spatial autocorrelation statistics: Distributional issues and an application". Geographical Analysis, 27(4): 286-306.