Friday, April 25, 2014

Liquid wealth?

Creating a 3D map with QGIS to visualize the spatial relation between raster and vector data. 

Kenya has the highest gross domestic product per capita in the East African Community, and yet it only reaches a value that is less than 2 per cent of the GDP of my home country, Germany.

I wanted to learn more about the well-heeled and the poor in Kenya. The platform OpenData Kenya offers detailed statistics that also include information on poverty. The Database County Data Sheet Indicators 2009 provides various indicators that are related more or less to wealth and poverty. I decided to join that database to a county Shapefile to examine the distribution of wealth by looking at each county's poverty rate. According to that database, the poverty line is defined by 1562 KSH (~18$) in rural areas and 2913 KSH (~33.5$) in urban areas per month.

Distribution of poverty by Kenyan counties
Poverty is distributed unevenly across the country. Especially the north and some counties close to the coast seem to struggle, whereas the counties south of the equator are much wealthier.
Certainly, numerous factors cause that distribution. For example, Kenya’s north is close to the current “lost countries” South Sudan and Somalia, and Ethiopia is also struggling economically. The region thus faces frequent security issues and is less attractive to investors and entrepreneurs.

But there is one factor that I wanted to highlight in this project. Through my work around Kisumu, I talk a lot to people who earn their income from horticulture on family-owned farms. This business depends strongly on rain, so the availability of water is obviously important for avoiding poverty. To tackle the question of whether there really is a relation between precipitation and poverty, you can easily run some statistical tests, such as calculating the correlation and creating a scatter plot.

As you can see, there is at least some correlation, although it is pretty weak. Nevertheless, the relation becomes stronger as soon as you disregard some of the extreme values, which are mainly the counties with large cities like Mombasa, Nairobi and Kisumu.
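The correlation check described above can be sketched in a few lines of Python. This is only a minimal illustration: the county labels and the precipitation and poverty values are made up and are not taken from the OpenData Kenya database, and the function name `pearson` is my own choice.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up example values: (annual precipitation in mm, poverty rate in %).
# "City X" stands in for an urban county that behaves like an extreme value.
counties = {
    "County A": (200, 85),
    "County B": (400, 70),
    "County C": (800, 50),
    "County D": (1200, 40),
    "County E": (1600, 30),
    "City X": (500, 20),
}

rain_all = [rain for rain, _ in counties.values()]
poverty_all = [poverty for _, poverty in counties.values()]
r_all = pearson(rain_all, poverty_all)

# Same calculation, but without the urban extreme value.
rural = {name: v for name, v in counties.items() if name != "City X"}
r_rural = pearson([v[0] for v in rural.values()], [v[1] for v in rural.values()])

print(f"all counties: r = {r_all:.2f}")
print(f"without City X: r = {r_rural:.2f}")
```

With these invented numbers, the negative correlation gets stronger once the urban county is left out, which mirrors the effect described above.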

Anita Graser’s blog inspired me once again to use a plugin that she presented. In one of her recent posts, she introduced the mighty Qgis2threejs tool. She shows how to create a 3D map including trees, buildings, a basemap and a DEM (Digital Elevation Model). According to the values of the raster data (in Anita’s post the DEM), the overlaying vector data will be distorted to include hills and holes, but you can also use that tool to visualize other raster data.
Our initial map of poverty by county can thus be combined with raster data on precipitation. You just have to add the styled county Shapefile and the appropriate raster tiles and run the tool. It will create a working HTML document that presents your interactive 3D map.

To finalize my project, I used my terrible HTML skills, and I must admit that I am quite proud of the result. You can see that there is nearly no precipitation, especially in the poor northern counties like Turkana and Marsabit. I also like that you can see how the rainfall increases slightly close to the coast because of the marine influence. I think it is a great tool to visualize possible relations between raster data and vector data.
Below is an animated GIF that highlights which regions experience the lowest rainfall.

Counties with highest poverty are challenged by low precipitation

Summarizing my results, I must admit that the assumption of a strong connection between wealth and precipitation cannot be proven because there are, as already mentioned, a lot of factors that influence poverty. Nevertheless, I think a visualization can help to create awareness, and it will stick in your head.

Wednesday, April 16, 2014


How to use the Network Analyst to determine service areas for reanimation of patients with asystoles. Click here to download the final result before proceeding.

When a heart stops beating (asystole), there are only a few minutes left for reanimation. This last effort to bring someone back to life is much more successful when a defibrillator is used or when it is done by experts. Recently, several defibrillators were installed in Cologne that can be operated even by an amateur and should increase the survival rate in case of an asystole.
Nevertheless, the success of a reanimation is strongly connected to time. While an instant treatment can save the patient’s life in about 90 per cent of the cases, the survival rate decreases by 7 to 10 per cent per minute. Therefore, it is crucial to be close to either one of these public defibrillators or an ambulance.

The impressive open data site Offene Daten: Köln provides geographers with a bunch of useful geodata for the whole of Cologne. Besides detailed vector data for roads and addresses (a point Shapefile with 157,415 features!), you will also find a dataset on the defibrillators. I only had to geocode the sites of the ambulances, which in most cases are also the city’s fire stations.

To prepare the analysis, the road data had to be extended because the research should also include ambulance sites that are close to, but not inside, Cologne’s municipal area. Thus, I digitized some of the major roads that an ambulance would use when it is called into the city. To digitize the access roads, QGIS and the OpenStreetMap WMS turned out to be convenient tools. It is important to create a new feature for every junction. Otherwise you will experience problems in the analysis afterwards.

Digitizing access roads with OSM WMS

After preparing the road data, I moved to ArcMap because I still struggle with the QGIS pgRouting plugin. ArcMap includes the Network Analyst, which is quite handy for this kind of analysis.
Before starting to create service areas for the defibrillators and the ambulances, you will have to build a road Network Dataset that can then be used in the Network Analyst. By creating the service areas, the Network Analyst will take different locations (e.g. fire stations and defibrillators) and the road network to calculate polygons that show the accessibility based on your defined distances.

Creating a Network Dataset with ArcMap

According to the literature, an asystole cannot be survived for more than 8 to 10 minutes in most cases. Therefore, I decided to estimate the distances that you can cover on foot and by ambulance in this period. For the defibrillators, it seemed reasonable to assume a running speed that decreases from around 18 to 11 km/h, and for the ambulances I decided to assume an average speed of 50 km/h. You should not forget that using a defibrillator usually includes not only running to the defibrillator but also rushing back. Thus, the covered distance for the defibrillators must be halved. For the ambulance, you can also include one minute for the emergency call and the departure of the vehicle.
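To make these assumptions concrete, here is a rough sketch of how the break distances per minute could be calculated. The speeds, the halving for the way back and the one-minute delay come from the paragraph above; the linear decrease of the running speed over nine minutes and the function names are my own assumptions for this illustration.

```python
def runner_reach_m(minutes, v_start_kmh=18.0, v_end_kmh=11.0, total_minutes=9):
    """One-way distance in metres to a defibrillator that can be fetched
    and brought back within `minutes` minutes.

    Assumption: the running speed drops linearly from v_start_kmh in the
    first minute to v_end_kmh in the last of `total_minutes` minutes.
    """
    covered = 0.0
    for minute in range(minutes):
        fraction = minute / (total_minutes - 1)
        speed_kmh = v_start_kmh + (v_end_kmh - v_start_kmh) * fraction
        covered += speed_kmh * 1000 / 60  # metres run in this minute
    # The runner has to return to the patient, so only half counts.
    return covered / 2


def ambulance_reach_m(minutes, speed_kmh=50.0, delay_min=1):
    """Distance in metres an ambulance covers within `minutes` minutes,
    after `delay_min` minutes for the emergency call and departure."""
    driving_minutes = max(0, minutes - delay_min)
    return driving_minutes * speed_kmh * 1000 / 60


for minute in range(1, 10):
    print(minute, round(runner_reach_m(minute)), round(ambulance_reach_m(minute)))
```

The printed values can be used as break values for the service areas. Changing the default parameters is an easy way to test how sensitive the result is to these assumptions.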

When calculating the service areas considering the coverage per minute, you will have polygons that indicate where help arrives within 1 to 9 minutes.

Service areas of defibrillators and ambulances

Right now, there is only one major flaw. Because the defibrillators’ service areas overlap those of the ambulances, some areas are coloured red although they are much better covered by the ambulance. To fix this issue, you will have to merge the data and either set a hierarchy for displaying the areas (high preference for minute 1, low preference for minute 8) or clip and intersect some data.

Both Service Area Layers have to be combined to receive the final result

Initially, I planned another style of visualization, which is why I did not bother that much about how to present the polygons. Because a point Shapefile is available that includes every address in Cologne, I decided to symbolize the addresses according to the appropriate service area. With this kind of presentation, viewers can zoom in and identify their own house or even search for it in a web-based map.

The addresses can be extended by an attribute that indicates the minutes until help arrives by performing a spatial join. Using the merged service areas and the point Shapefile, it is possible to join only the minimum value of an attribute to the target layer. This option resolves the problem of overlapping polygons with different values and stores the minimal arrival time for each address, regardless of whether it is served by a public defibrillator or the ambulance.
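The idea behind this join can be illustrated without any GIS software. The following sketch is not what ArcMap does internally; it simply tests for every address point which service area polygons contain it and keeps the smallest number of minutes. The coordinates, the polygons and the function names are invented for this example.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting test: is the point (x, y) inside the polygon?
    `polygon` is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def min_arrival(point, service_areas):
    """Smallest arrival time of all service areas containing the point,
    or None if the point is not covered at all."""
    x, y = point
    hits = [minutes for minutes, polygon in service_areas
            if point_in_polygon(x, y, polygon)]
    return min(hits) if hits else None


# Invented service areas as (minutes, polygon) pairs that partly overlap.
service_areas = [
    (3, [(0, 0), (4, 0), (4, 4), (0, 4)]),  # e.g. a defibrillator area
    (1, [(1, 1), (2, 1), (2, 2), (1, 2)]),  # a faster area inside it
    (6, [(3, 3), (8, 3), (8, 8), (3, 8)]),  # e.g. an ambulance area
]

addresses = {"Address 1": (1.5, 1.5), "Address 2": (3.5, 3.5), "Address 3": (10, 10)}
for name, point in addresses.items():
    print(name, min_arrival(point, service_areas))
```

Addresses that fall into several overlapping polygons get the lowest value, and addresses outside of every polygon stay without a value.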

For the final styling I switched back to QGIS because I feel much more comfortable and powerful using the QGIS Print Composer. Due to storage limitations and diverse issues, my plans to also publish the data in a web map are not yet implemented but I hope to find a satisfying solution very soon.

Above is my not really satisfying attempt to put the data into an acceptable web map. I guess I will still have to work on this issue. At least it offers you the option to identify single features. If you go to the official web map, you can search for your address using the input field on the upper right.

To enjoy the whole map, I recommend downloading this PDF file and zooming in.

You can see that only a few locations are seriously endangered. The city centre is well covered by ambulances and defibrillators. It would be interesting to play a little bit with the variables for the coverage of the ambulances and defibrillators. Possibly my assumptions over- or underestimate reality. Moreover, it is likely that I missed one or more ambulance sites.

Thursday, April 3, 2014

Contributing to OpenStreetMap

Google Maps is great when it comes to the accuracy of its maps. It offers the best satellite imagery and a deeply detailed set of vector data. As long as you want to check a route, see your house from above or look for whales in the ocean, you will be satisfied with the provided service. As soon as you want to use the data for GIS projects, you will be lost because Google's Terms of Use prohibit you from utilizing their data.
Therefore, OpenStreetMap is a great alternative to get global data for your GIS project. OSM data can be used by everyone for everything because all contributions are under the Open Database License. Several GIS applications like QGIS and ArcMap can be extended by plugins that enable you to extract data from OSM.

Although OSM does not provide satellite imagery, its vector data gets better every day. Everyone can contribute and start digitizing data. To see current changes, I recommend visiting Show Me The Way.

Because I intend to use GIS data of my current home base Kisumu, Kenya, I realized that Google offers much better data in the study area. In Nyalenda, a slum close to Kisumu's city center, motorcycle taxis are crucial for public transport. The gravel roads are muddy and cluttered with potholes, which is why there is little competition from regular taxis or Tuk Tuks.

For the Bachelor's thesis, I want to identify service deserts according to the motorcycle taxi stages and the road network. As you can see, Google already has some of the major gravel roads in Nyalenda digitized.

OSM obviously lacks a lot of these roads. And the ones that are already in Nyalenda have been digitized by me in a test run.

Screenshot of OSM to show current status

Digitizing data for OSM is very user-friendly. As soon as you have created your OSM account, you can start contributing. For beginners, it is advisable to just use the embedded editor and complete the short introductory tutorial.

The OSM vector data is drawn on top of the Bing satellite imagery, which is used as a template. After adding some lines that cover undigitized roads, you can add attribute values like surface, name and so on.

Digitizing road in Nyalenda with the embedded editor

For sure, digitizing is very time consuming, but I like the idea of contributing to such an influential project. It trains your skills and sometimes you can help yourself, like I do in Nyalenda.

I hope that I will find time to continue the work for Nyalenda soon. You can check my progress here: OSM Nyalenda

Saturday, February 8, 2014

Finding murders with OpenStreetMap

Since the nationwide suspension of capital punishment ended in 1976, 1365 people have been executed in the US. Only 13 convicts were female, and a majority of 87 per cent died by lethal injection. An alarming 34 per cent were black, which is around 2.5 times the share that would be expected from the distribution of the population. Most executions were carried out in Texas (509).

The Death Penalty Information Center (DPIC) offers a detailed database including all executions and the necessary details to map the cruelty. After an informal mail correspondence, I received approval to use the database for non-commercial purposes as long as I mention the source. Their site does not give clear copyright information, so I recommend asking for permission before you publish something derived from their data.

DPIC’s database does not provide coordinates but county names, which is why it is a good source to show how you can geocode your data when you only have place names. Nominatim is a great tool that uses OpenStreetMap data for geocoding and reverse geocoding. The best thing about it is that Nominatim has an API, so you can do the geocoding with a script. My programming skills are very rudimentary, but I had experienced assistance in developing a working Python script for a former project. After adapting a few lines in the script, it was easy to send my county names and state names to Nominatim and get back the lat and lon values for each county’s weighted centre. There are several scripts that you can use with Nominatim or Google Maps in case you also want to avoid a lot of manual work when geocoding. You will find them online.
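For readers who want to try this themselves, here is a minimal sketch of such a geocoding script. It is not the script I used; the function names and the User-Agent string are placeholders. It relies on the public Nominatim search endpoint with its structured `county`, `state` and `country` parameters and reads the `lat` and `lon` fields of the first hit.

```python
import json
import urllib.parse
import urllib.request

NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"


def build_query_url(county, state, country="United States"):
    """Build a structured Nominatim search URL for one county."""
    params = {
        "county": county,
        "state": state,
        "country": country,
        "format": "json",
        "limit": 1,
    }
    return NOMINATIM_URL + "?" + urllib.parse.urlencode(params)


def parse_first_hit(response_text):
    """Return (lat, lon) of the first result, or None if nothing was found."""
    hits = json.loads(response_text)
    if not hits:
        return None
    return float(hits[0]["lat"]), float(hits[0]["lon"])


def geocode_county(county, state):
    """Send one request to Nominatim. The User-Agent below is a placeholder
    and should be replaced by something that identifies your own project."""
    request = urllib.request.Request(
        build_query_url(county, state),
        headers={"User-Agent": "county-geocoding-example"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_first_hit(response.read().decode("utf-8"))


# Example without network access: inspect the URL that would be requested.
print(build_query_url("Harris County", "Texas"))
```

When you loop over a whole table, please pause between the calls to `geocode_county` (for example with `time.sleep(1)`), because the public Nominatim server is a shared service with a usage policy that limits the request rate.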

Using Python to geocode (input: red, output: green)

After data cleaning and saving the worksheet to a .csv file, QGIS’ Add Delimited Text Layer tool can easily import the execution data and you can save it to a point Shapefile.

Adding lat lon data to QGIS
I wanted to present the data in a catchy way so I adopted some of Anita Graser's ideas. Her blog helps a lot when learning about QGIS functions and to get some inspiration for your projects.

To make the map as patriotic as possible, you can use the new blending functions in the Print Composer and add an American flag. Anita describes how to do that in her post on making vintage maps. Moreover, I chose to also use the newly introduced Data defined properties to present each jail in relation to the number of executed convicts. Anita explains that function in her project about meteorite landings.

Data defined properties styling

Finally, we need to create a legend. Since QGIS 2.0.1, you can use HTML tags to arrange your annotations. My HTML skills are as rudimentary as my programming skills, so I only added some basic text styling.  

I don’t know much about the death penalty in the US, so I was quite surprised that there is a huge difference between the West and the East Coast. I am interested in the factors that caused this heterogeneity. The main reason is probably the unequal distribution of the population, but there could also be historic reasons related to the colonization of the States and still persistent differences in people's attitudes, or it might just be a coincidence of legal regulations. I guess it is a mixture of everything. Let’s just hope that there will be more blue hatched states and fewer red circles in the future.

Monday, February 3, 2014

Oh, hello Asiafrica!

How to use Wikipedia to create animated maps.

Geodata is everywhere, but there is one major obstacle that prevents us from constructing beautiful maps with it. Many databases contain all the elements needed to provide geodata, but lots of them are not intended to supply users with geographic information and attributes in a way that can easily be used for mapping. There are thousands of ideas for possibly amazing maps that could be created if there was sufficient data or time to extract and prepare it.

If you want to produce geodata by yourself, a look at Wikipedia can be very helpful. There are lists of earthquakes, westernmost points of countries and so on that already contain lat and lon values. You can easily import these using the Add Delimited Text Layer function in QGIS. Moreover, there are lists that contain geographical information that you can symbolize by a country’s or city’s name. Using these lists for GIS is sometimes a little bit tricky. For example, the list of terrorist attacks would need some geocoding, whereas data that is linked to a whole country can be visualized using the Join function.

The recent project about where the bad guys vote made me interested in creating another cartogram. Inspired by the demographic animation of Germany I decided to also work on a project that shows a development within a certain period. 

The Wikipedia list of countries by past and future population provides the population of nearly every country from 1950 to 2050 in five-year intervals. Thus, it is possible to generate 21 cartograms to visualize the whole Wikipedia list in one animation.
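The number of cartograms follows directly from the interval. As a quick check, and as a possible starting point for naming the exported frames, the years can be listed like this (the file name pattern is just a hypothetical example):

```python
# All years covered by the list: 1950 to 2050 in five-year steps.
years = list(range(1950, 2051, 5))

# Hypothetical file names for the exported cartogram images.
frame_names = [f"cartogram_{year}.png" for year in years]

print(len(years))
print(frame_names[0], frame_names[-1])
```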

To import the list we can take the link of the demanded Wikipedia site, open Excel and paste it into the File Name text box just like we would open any other file stored on our computer. The whole site will be opened and look a little bit clumsy but it is now easy to extract the list data in a new worksheet and we can prepare the data to join it with a global dataset that includes country names. I decided to take data from Natural Earth.

After saving the joined data to a new Shapefile, we can start creating a cartogram with ScapeToad for every year. It will take some time!
With 21 cartograms, we have a good number of frames for our animation. After styling and exporting each cartogram via QGIS’ Print Composer, the cartograms can be compiled into an animated GIF by using an image manipulation programme like GIMP.

Lots of Layers and a lot of handwork for styling and export

Creating an animated GIF with GIMP

It took me some time to figure out a decent styling. In fact, I do not like this colourful presentation of the continents, but it turned out to be the most reasonable styling. First, I tried to use one graduated colour for all continents, but it gave the impression that there is some kind of order or importance among the continents. Thus, I ended up using the colourful symbolization.

As you can see, the animation illustrates the decrease of population in Europe compared to Asia and then at around 2015 also in relation to Africa. It seems to be only a matter of time until the Europeans actually need something that is more forceful than Frontex to “protect” against refugees crossing the Mediterranean Sea.

Monday, January 20, 2014

Where the bad guys vote

The German party NPD is the best-known actor in nationwide politics that you can define as radical right-wing. The followers of this party are distributed heterogeneously across the country, which means that the NPD has much “better” results in the eastern part of Germany and especially in the countryside. Usually you can observe that in big cities small parties like the ecological Die Grünen and the internet phenomenon Piratenpartei tend to score better percentages than in the countryside.

I decided to visualize the results of the last federal election (Bundestagswahl 2013) with a focus on the bad guys which will here be defined by their vote for the NPD. The website offers detailed data related to the bygone election. For the visualization we need geodata which contains the information of geometry and attributes. Thus, we can use the provided Shapefile data of the 299 election districts. This data can be connected to our attribute data, which is the table of the election results that is in CSV-format. Both datasets are downloadable on the mentioned website.

After data cleaning with Excel or OpenOffice, the CSV table can be imported into QGIS and then joined to the election districts by using the Join function and connecting the data through the election district ID. We can now quickly check our intuition about the heterogeneous distribution of the party’s followers by visualizing the results with the graduated symbology option in QGIS.
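What the Join function does can also be sketched in plain Python. The following example is only an illustration of the principle: the column names, district names and vote counts are invented and do not correspond to the official election data.

```python
import csv
import io

# Invented excerpt of a results table in CSV format (semicolon separated).
results_csv = """district_id;valid_votes;npd_votes
1;150000;1500
2;120000;3600
"""

# Invented attribute table of the district Shapefile.
districts = [
    {"district_id": 1, "name": "District A"},
    {"district_id": 2, "name": "District B"},
    {"district_id": 3, "name": "District C"},
]


def join_results(districts, csv_text):
    """Attach the NPD share in per cent to every district via its ID.
    Districts without a matching CSV row get None, like an empty join."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    results_by_id = {int(row["district_id"]): row for row in reader}
    joined = []
    for district in districts:
        record = dict(district)
        row = results_by_id.get(district["district_id"])
        if row is None:
            record["npd_share_pct"] = None
        else:
            record["npd_share_pct"] = 100.0 * int(row["npd_votes"]) / int(row["valid_votes"])
        joined.append(record)
    return joined


joined = join_results(districts, results_csv)
for record in joined:
    print(record["name"], record["npd_share_pct"])
```

A share in per cent like `npd_share_pct` is a more suitable value for a graduated symbology than the absolute number of votes, because the districts differ in size.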

As we can see, the data seems to reflect the thesis of an allocation of bad guys in the eastern part of Germany.

We could now stop our work and finish with a clear and simple output that supports our thesis. Nevertheless, this kind of visualization is quite boring and will not necessarily catch our audience’s attention.

Therefore, I decided to create a cartogram. Cartograms distort the geometry of each feature according to an attribute value. There are several beautiful cartograms that, for example, show the size of each country according to its population, wealth, beer consumption and so on.

By using the awesome GIS programme ScapeToad 1.1, we can easily use the number of eligible voters and the number of votes for the NPD to create two cartograms. The cartogram based on the electorate shows the size of each election district according to the number of eligible voters; the other one shows the size in relation to the actual votes for the NPD.
ScapeToad also offers to generate a distorted grid for every cartogram and to export the results in Shapefile format. Thus, we can now finish our map by working on the styling and using the new Print Composer. I decided to colour the election districts with a high population to highlight major cities, so we can examine the differences in distortion between cities and the countryside.


Our result shows each election district’s importance for the nationwide outcome and, in comparison, the importance of each district according to the NPD votes. The right map shows the blown-up eastern part of the country quite well. Moreover, I find it interesting to study the major cities. For example, Munich, Hamburg, Freiburg and Cologne seem to have fewer problems with the bad guys than Dresden and Leipzig, but also Berlin.

Download map in high quality