May 28, 2014

Data is Not Sterile: What is Geospatial Data Made From?

When I first started working with GIS and GPS data, I held two basic truths. One, all GIS was equal to ESRI ArcGIS. And two, all the data I needed was neatly organized in a database somewhere on a server - I just needed to find the server. Three months later I moved to the Dadaab refugee camps and discovered no software, no data, and no server. The head camp architect for the UN had never even heard of GIS.

At the time, I struggled to find a solution. In one project I was tasked with producing site plans for some buildings for Save the Children. I conducted an array of interviews in the camps to select the sites. Then I used a satellite phone to get GPS coordinates, from which I extrapolated distances and drew vector maps in AutoCAD. I overlaid the AutoCAD layer on a scanned topo map. The vectors could also be exported to ArcGIS upon return to my US university, since I obviously didn't have the money for a personal ESRI suite. Technically, the solution worked well enough, but I encountered another unforeseen challenge.

The remaining problem is the quality of the data. What is the combination of objectivity and subjectivity that goes into the creation of a single POI? How do we measure its value, and how do we design the data collection to maximize that value?

For years I continued to search for strategies to create GIS data in places where it was unavailable. I experimented with walking papers, proposed ideas to software development friends, and wrestled with ambiguity. I worked on this problem for years in Egypt and was never happy with the outcome. When I discovered the mobile application Fulcrum sometime in 2010 or 2011, my eyes were opened to the world of mobile data collection. Suddenly the technical side of the problem was solved. I could geolocate any survey. How you design the survey for the creation of spatial data is another matter.

The quality of the data is a continual obsession of mine. Working in dangerous environments, or even across multiple cultures, creates special problems. For example, if I were asked to rank the quality of infrastructure in Somalia, I would personally label all of it as poor. There has been barely any development in decades but lots of bombs and bullets. In my eyes, as an American urban planner and designer, every road in Somalia is a nightmare. But does that judgement offer any value? Does it do any service to an external analyst or a local project manager?


Because of the demanding conditions, it is more important to rank the data according to the values of the local population. In the eyes of a person who has lived in Somalia for a lifetime, how does one road compare against another? It is through this local-level comparison that the POI earns a higher level of value to the analyst. To continue the road example, I can now use this data to estimate the scheduling of work or to select where to start, such as the area in most need or the quick fix. Obviously, from my perspective everything needs improvement, but now I can adapt my project to the local context for improved success. The population will recognize that the development is starting with the worst road - or going for the quick fix - and this understanding generates support. Working in dangerous conditions, there is no such thing as too much support, regardless of the endeavor.
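As a rough illustration of the road example, the sketch below compares an outsider's ratings with those of a long-term resident. The road names, the 1-to-5 scale, and every score are hypothetical, invented for this sketch rather than taken from any real survey.

```python
# Hypothetical ratings on a 1 (worst) to 5 (best) scale; all values are invented.
outsider_ratings = {"Road A": 1, "Road B": 1, "Road C": 1, "Road D": 1}
local_ratings = {"Road A": 2, "Road B": 5, "Road C": 1, "Road D": 4}


def distinct_levels(ratings):
    """Count how many different scores appear; 1 means no road can be told apart."""
    return len(set(ratings.values()))


def rank_worst_first(ratings):
    """Order roads from lowest to highest score (ties broken by road name)."""
    return sorted(ratings, key=lambda road: (ratings[road], road))


print(distinct_levels(outsider_ratings))
print(distinct_levels(local_ratings))
print(rank_worst_first(local_ratings))
```

The outsider's scores collapse to a single level, which gives a project manager nothing to prioritize with. The resident's scores separate the roads, so the same list can be read worst-first to find the area in most need, or from the other end to find a possible quick fix.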

All budding cartographers must realize that no GIS data is founded on a universal set of standards. Every POI is connected to a body of perceptions, values, and judgements. When we look at the collective data, we are looking at a story about a place, and we are looking through the eyes of whoever assembled that story.

You might argue that some data is somehow void of this conflict. Census data, for example, seems fairly objective. But this is not the case - instead, census data is established by opening the story creation to all participants. By means of the aggregate we get closer to objectivity, but the deeper you drill into the data, the more ambiguity will present itself. While some questions might seem objective (how many children do you have?), their simplicity is deceptive. Another question - how many people live in your household? - will not bring the authorities crashing in if the respondent answers "14" in a 1-bedroom house. But will a respondent be honest enough to admit "14" if that is the case? It is unlikely. Social values, paranoia, and personal psychology will inform a respondent's answer. The closer you get to the person, the closer you get to uncertainty.

Unfortunately, in higher education for geography, planning, design, and other cartography-related fields, there has been little focus on data creation. It is seen as a purely technical process. Yet I argue that students should begin their GIS studies by building the data before learning about the variety of GIS tools for analysis (note: variety, not just ESRI). Only by building the data will students learn to recognize the subtleties of its composition and become more critical of their own work.

A nuanced understanding of the data will contribute to deeper levels of insight into the data set and ultimately to a broader understanding of other data components, such as the importance of metadata and data shelf-life. After all, GIS data is a snapshot of a given moment in an ever-changing world. Only by understanding its creation can we realize its mortality and, ultimately, how to leverage its death. There is definitely something called "bad data," yet I'd argue that a more common data affliction is "poorly understood." This problem isn't difficult to fix: you just need to start building the data yourself.
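One way to make metadata and shelf-life concrete is to store them with each point. This is only a minimal sketch under my own assumptions: the `POI` class, its field names, the 90-day shelf-life, and the sample record (including its placeholder coordinates) are all hypothetical and do not come from any particular GIS tool or project.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class POI:
    name: str
    lat: float
    lon: float
    collected_on: date    # when the observation was made
    collected_by: str     # whose judgement the record reflects
    shelf_life_days: int  # assumed period during which the record is trusted

    def is_stale(self, today: date) -> bool:
        """True once the record is older than its assumed shelf-life."""
        return today > self.collected_on + timedelta(days=self.shelf_life_days)


# Invented sample record with placeholder coordinates.
well = POI("Borehole 3", 0.05, 40.31, date(2014, 1, 15), "local enumerator", 90)
print(well.is_stale(date(2014, 3, 1)))
print(well.is_stale(date(2014, 5, 28)))
```

Recording who collected a point and when keeps the story behind it attached to the data, and an explicit shelf-life turns the snapshot's "mortality" into something an analyst can check before relying on it.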