Connecting PostGIS to Leaflet using PHP

For a few years now, I’ve been building wikimaps that rely on a PostgreSQL/PostGIS database to store geographic data and Leaflet to display that data on a map. These two technologies have increasingly become the industry standard open-source front- and back-end web mapping tools, used together by such behemoths as OpenStreetMap and CartoDB. While you can use a go-between such as GeoServer Web Feature Service (WFS) to connect the two, the simplest, most flexible, and most reliable way I’ve found to connect data to map is through a little PHP script that essentially formats the queries and lets PostGIS and JavaScript do all the heavy lifting (note that my opinion on this has changed since I wrote my series of tutorials on web mapping services two years ago).

It occurred to me recently that I should share my basic technique, and I did so for UW-Madison Cartography students in a short presentation as part of our Cart Lab Education Series. This blog post is essentially a transcription of that tutorial. It assumes you have already installed PostgreSQL with the PostGIS extension and the pgAdminIII GUI (I highly recommend installing all three through the Stack Builder), and possess a working understanding of SQL queries, HTML, JavaScript, and Leaflet.js. I will gently introduce some PHP; this shouldn’t be too painful if you already have a bit of background in JS.

Let’s get started, shall we?

I have provided the tutorial sample code on GitHub. A colleague just introduced me to the wonders of Adobe Brackets, so let’s use it to take a look at the directory tree first:

Directory Tree

As you can see, I’ve provided a data folder with a complete shapefile of some example data I had lying around. This open-access dataset covers frac sand mines and facilities in western Wisconsin, and comes from the Wisconsin Center for Investigative Journalism. The first step is getting the data into a PostGIS-enabled database using pgAdminIII’s PostGIS Shapefile and DBF Loader (enabling this plug-in is slightly tricky; I recommend these instructions). After you have created or connected to your PostGIS database, select the loader plug-in from the pgAdminIII Plugins menu. Click “Add File”, navigate to the data directory, and select the shapefile. Make sure you change the number under the SRID column from 0 to 26916, the EPSG code for the NAD83 / UTM Zone 16N projection. PostGIS will require this projection information to perform spatial queries on the data. Once you have changed this number, click “Import”.

PostGIS Shapefile and DBF Loader

With your table created, we can now move to the fun part—code! For formatting clarity, I have only included screenshots of the code below, and will issue a reminder that the real deal is posted on GitHub here. I’ll only briefly touch on the index.html and style.css files. Within index.html are links to the jQuery, jQuery-ui, and Leaflet libraries. I am mainly using jQuery to facilitate easy AJAX calls and jQuery-ui to create autocomplete menus for one of the form input text boxes. Leaflet of course makes the map. There are two divs in the body, one for the map and one for a simple form. The most useful thing to point out here is the name attributes of the text input elements, which will become important when constructing the SQL queries to the database.

html snippet

Style.css contains basic styles for placing the map and form side-by-side on the page, and bears no further mention.

main.js snippet

Turning to main.js (above), I have defined three global variables. The first, map, is for the Leaflet map. The second, fields, is an array of field names corresponding to some of the many attribute fields in my fracsandsites table in the database; this is the attribute data I want to see in the pop-ups on the map (other fields may be added). The third variable, autocomplete, is an empty array that will hold feature names retrieved from the database for use in the autocomplete list.

The screenshot above shows the first two functions defined after the global variables, with a $(document).ready call to the initialize function. This function sets the map height based on the browser’s window height, then creates a basic Leaflet map centered on Wisconsin with a simple Acetate tileset for the basemap. It then issues a call to the getData function. Here’s where the fun really begins.
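
Since the actual code appears here only in screenshots, below is a minimal sketch of the globals and the initialize function as described above. Treat it as a hedged reconstruction rather than the repo code: the tile URL is a placeholder, and the map center and zoom are just “roughly Wisconsin.”

var map, //the Leaflet map
    fields = ["gid", "createdby", "featname", "feattype", "status", "acres"], //attribute fields to request
    autocomplete = []; //feature names retrieved for the autocomplete list

function initialize(){
    //size the map div to the browser window
    $("#map").height($(window).height());

    //basic Leaflet map centered on Wisconsin
    map = L.map("map").setView([44.8, -90.4], 7);

    //placeholder tile URL standing in for the Acetate basemap
    L.tileLayer("http://{s}.tiles.example.com/{z}/{x}/{y}.png").addTo(map);

    getData();
};

$(document).ready(initialize);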

The jQuery.ajax method is a very simple substitute for a whole lot of ugly XMLHttpRequest native code. It can take its data either as a URL-encoded parameter string or as a JavaScript object; I’m using the latter because it is neater. You can include any parameters, but the important part is to think about what you need out of the DOM to create the SQL query that’s going to grab your data. I’m designating the table name and the fields here, although you could also hard-code both in the PHP if you don’t need them to be dynamic.
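
Sketched out, the getData function might look something like the following; the parameter names (table, fields) are the ones discussed above, and mapData is the success callback covered further down.

function getData(){
    $.ajax("getData.php", {
        data: {
            table: "fracsandsites", //the database table to query
            fields: fields //the global array of attribute field names
        },
        success: mapData //callback that will build the GeoJSON and map it
    });
};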

OK, let’s flip over and see what’s going on in getData.php…

php snippet

If you’re not used to seeing PHP code, some things here may look a bit odd. The first two lines declare that what follows is PHP code for the interpreter and enable some feedback on any I/O errors that occur. PHP is very picky about requiring semicolons at the end of each statement that isn’t a control structure (open or closing curly brace), and a syntax error will cause the whole thing to fail silently despite line 2. Lines 5-9 assign the database credentials to variables, which are denoted with the dollar sign (unlike JS, there is no var keyword equivalent). Make sure to change these to your own database credentials. On line 11, the $conn variable is assigned a pg_connect object, which connects to the database using the parameters provided above. Note that in PHP, there is a difference between double and single quotes: both denote a string, but when using double quotes you can put variables directly into the string without concatenation and they will be recognized as variables by the interpreter, rather than as string literals. The following if statement tests the integrity of the connection and quits with an error if it fails.
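
In code form, the connection section might look roughly like this (a sketch with placeholder credentials rather than the exact file):

<?php
//enable error reporting so problems don't fail completely silently
ini_set("display_errors", "On");
error_reporting(E_ALL);

//database credentials--replace these with your own
$server = "localhost";
$port = "5432";
$dbname = "geodata"; //placeholder database name
$user = "postgres";
$password = "secret"; //placeholder password

//double quotes let the variables interpolate directly into the string
$conn = pg_connect("host=$server port=$port dbname=$dbname user=$user password=$password");

if (!$conn) {
    echo "Not connected to the database!";
    exit;
}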

One important thing to note here is that for this to work, you must already have PHP installed and enable the php_pgsql extension by uncommenting it in your php.ini file, which is stored in your PHP directory (probably somewhere in Program Files if you’re on a PC). You can get PHP here.

Lines 18 and 19 retrieve the data sent over from the $.ajax method in the JS. $_GET is a special designated variable in PHP that is an array of parameters and associated values submitted to the server with a GET header (there is also one for the POST header). In PHP, an array is analogous to both an object and an array in JavaScript; it’s just that the latter form uses zero-based sequential integers as keys. In this case, we can think of the $_GET array as just like the AJAX data object, with the exact same keys and values (table with the string value "fracsandsites" and fields with its array of string values). Line 18 assigns the first to a new PHP $table variable and line 19 assigns the second to a $fields variable.

Since $fields is another array, to use it in a SQL query its values must be joined into one comma-separated string. The foreach loop on line 23 does this, assigning each array index to the variable $i and each value to the variable $field. Within the loop, each field name is concatenated to the $fieldstr variable (the . is PHP’s concatenation operator), preceded by l. because the SQL statement will assign the alias l to the table name (the reason will become clear later).

After all fields have been concatenated, a final piece is added to the $fieldstr: ST_AsGeoJSON(ST_Transform(l.geom,4326)). This is the first bit of code we’ve seen that is specifically meant for PostGIS. We want to extract the geometry for each feature in the table in a form that’s usable to Leaflet, and that form is GeoJSON. Fortunately for us—and what makes PostGIS so easy to use for this purpose—PostGIS has a native method to translate geometry objects stored in the database into GeoJSON-formatted strings. ST_AsGeoJSON can simply take the geometry column name as its parameter, but in order for the data to work on a Leaflet map, it has to be transformed into the WGS84 coordinate reference system (unprojected lat/long coordinates). For this purpose, PostGIS gives us ST_Transform, which takes the geometry column name and the SRID of the CRS into which we want to transform it (In this case, the familiar-to-web-mappers 4326).
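
Putting the last few paragraphs together, the query-building portion of getData.php might look roughly like this sketch (the column names follow the example dataset):

//parameters sent by the $.ajax call
$table = $_GET["table"];   //"fracsandsites"
$fields = $_GET["fields"]; //array of attribute field names

//join the field names into one comma-separated string, aliased to l
$fieldstr = "";
foreach ($fields as $i => $field) {
    $fieldstr .= "l." . $field . ", ";
}

//add the geometry, reprojected to WGS84 and returned as a GeoJSON string
$fieldstr .= "ST_AsGeoJSON(ST_Transform(l.geom,4326))";

//the base query; the l alias becomes important for the join added later
$sql = "SELECT " . $fieldstr . " FROM " . $table . " l";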

At this point, we now have all of the components of our first SQL query (line 31). If you were to print (or echo in PHP parlance) the whole thing without the variables, you would see

$sql = "SELECT l.gid, l.createdby, l.featname, l.feattype, l.status, l.acres, ST_AsGeoJSON(ST_Transform(l.geom,4326)) FROM fracsandsites l";

And, in fact, if you copied everything inside the quotes into the SQL editor in pgAdminIII, you would get a solid response of those attributes from all features in the table. Go ahead and do it. DO IT NOW!

sql editor output

For now, I’m going to skip the next few lines (we’ll come back to them later) and wrap up my PHP with this:

PHP snippet

Line 45 sends the query to the database using the pg_query method, assigns the response to the variable $response, and tests that a response came back. The while loop on lines 51-56 retrieves each table row from the $response object (note: this is emphatically not an array; hence the use of the pg_fetch_row method) and echoes each attribute value, with the attribute values separated by comma-spaces and the rows separated by semicolons. As previously mentioned, PHP’s echo command “prints” data, in this case by sending it back to the browser in the XMLHttpRequest response object.
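
Here is a sketch of that wrap-up section; the comma-space and semicolon separators are mirrored in the JavaScript that parses the response below.

//send the query to the database and test for a response
$response = pg_query($conn, $sql);
if (!$response) {
    echo "Query failed.";
    exit;
}

//echo each row back to the browser: every attribute value is followed
//by a comma-space, and every row is terminated by a semicolon
while ($row = pg_fetch_row($response)) {
    foreach ($row as $value) {
        echo $value . ", ";
    }
    echo ";";
}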

At this point we can go back to the browser and look at what we have. If you’re using Firebug, by default it will log all AJAX calls in the console, and you can see the response once it’s received. You should be able to see something like this:

Response in the console

Now all we have to do is process this data through a bit of JavaScript and stick it on the map. Easy-peasy. I’ll start with the first part of the mapData callback function:

js snippet

Lines 39-44 remove any existing layers from the Leaflet map, which isn’t really necessary at this stage but will become useful later when we implement dynamic queries using the HTML input form. For now, skip down to Line 47 and notice that we are starting to build ourselves a GeoJSON object from scratch. This is really the easiest way to get this feature data into Leaflet. If you need to be reminded of the exact formatting, open any GeoJSON file in a text editor, or start making one in geojson.io. Once we have a shell of a GeoJSON with an empty features array, the next step is to go ahead and split up the rows of data using the trailing comma-space and semicolon used in getData.php to designate the end of each row. Since these are also hanging onto the end of the last row, once the data is split into an array we need to pop off the last value of the array, which is an empty string. Now, if you console.log the dataArray, you should see:

dataArray in console

Now, for each row, we need to correctly format the data as a GeoJSON feature:

js snippet

Each value of the dataArray is split by the comma-spaces into its own array of attribute values and geometry. We create the GeoJSON feature object. The geometry is in the last value in the feature array (d), which we access using the length of the fields array since that array is one value shorter than d and therefore its length matches the last index of d. properties is assigned an empty object, which is subsequently filled with attribute names and values by the loop on lines 69-71. The if statement on lines 74-76 tests whether the feature name is in the autocomplete array, and if not, adds it to the autocomplete array. Finally, the new feature is pushed into the GeoJSON features array. Lines 82-84 activate the autocomplete list on the text input for the feature name in the query form. If you were to print the GeoJSON to the console and examine it in the DOM tab, you should see:

the geojson in the DOM tab
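
Pulling those pieces together, the mapData callback might look roughly like the sketch below. The layer-removal test, the ", ;" row separator, and the input[name=featname] selector are assumptions consistent with the description above, not necessarily the exact repo code.

function mapData(data, status){
    //remove existing feature layers, keeping the tile basemap
    //(testing for L.TileLayer is one way to do this)
    map.eachLayer(function(layer){
        if (!(layer instanceof L.TileLayer)){
            map.removeLayer(layer);
        };
    });

    //an empty GeoJSON shell to fill with features
    var geojson = {
        "type": "FeatureCollection",
        "features": []
    };

    //split the response into rows and pop off the trailing empty string
    var dataArray = data.split(", ;");
    dataArray.pop();

    //build one GeoJSON feature per row
    dataArray.forEach(function(row){
        var d = row.split(", "); //attribute values plus the geometry string
        var feature = {
            "type": "Feature",
            "geometry": JSON.parse(d[fields.length]), //geometry is the last value
            "properties": {}
        };
        //fill the properties object with attribute names and values
        for (var i = 0; i < fields.length; i++){
            feature.properties[fields[i]] = d[i];
        };
        //add the feature name to the autocomplete array if it isn't there yet
        if (autocomplete.indexOf(feature.properties.featname) === -1){
            autocomplete.push(feature.properties.featname);
        };
        geojson.features.push(feature);
    });

    //activate the autocomplete list on the feature name text input
    $("input[name=featname]").autocomplete({
        source: autocomplete
    });

    //finally, add the GeoJSON to the map with L.geoJson (sketched below)
};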

Now that we have our GeoJSON put together, we can go ahead and use L.geoJson to stick it on the map.

js snippet

I won’t go through all of this because it should be familiar code to anyone who has created GeoJSON overlays with Leaflet before. If you’re unfamiliar, I recommend starting with the Using GeoJSON with Leaflet tutorial.
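
For completeness, here is one way the end of mapData might put the features on the map; this is a hedged sketch, with illustrative styling and popup content rather than the repo’s exact choices.

var featureLayer = L.geoJson(geojson, {
    pointToLayer: function(feature, latlng){
        //draw point features as simple circle markers
        return L.circleMarker(latlng, {
            radius: 6,
            fillColor: "#FF7800",
            color: "#000",
            weight: 1,
            fillOpacity: 0.8
        });
    },
    onEachFeature: function(feature, layer){
        //build simple popup content from the attribute properties
        var popupContent = "";
        for (var prop in feature.properties){
            popupContent += "<b>" + prop + ":</b> " + feature.properties[prop] + "<br>";
        };
        layer.bindPopup(popupContent);
    }
}).addTo(map);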

This gets us through bringing the data from the database table to the initial map view. But what’s exciting about this approach is how dynamic and user-interactive you can make it. To give you just a small taste of what’s possible, I’ve included the simplest of web forms with which a user can build a query. If you’re at all familiar with SQL queries through database software, ArcMap, etc. (and you should be if you’ve gotten this far in this tutorial), you know how powerful and flexible they can be. When you’re designing your own apps, think deeply about how to harness this power through interface components that the most novice of users can understand. As a developer, you gain power through giving it to users.

As previously mentioned, the form element in the index.html file contains two text inputs with unique name attributes. The first of these is designated for distance (in kilometers), and the second is for the name of an anchor feature. We will use these values to perform a simple buffer operation in PostGIS, finding all features within the specified distance of the anchor feature. Ready to go? OK.

In index.html, the value of the form’s action attribute is "javascript:submitQuery()". This calls the submitQuery function in main.js. Here is that function:

js snippet

We use jQuery’s serializeArray method to get the values from the form inputs. This returns an array of objects, each of which contains the name and value of one input. Then, instead of creating the data object inline with the AJAX data key, we create it as a variable so we can add the serialized key-value pairs to it. This is done through the forEach loop, which takes each object in the formdata array and assigns the name value as a data key and the value value as a data value. Get it? Good. (If not, just console.log the data object after the loop).
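
A sketch of submitQuery along those lines; "distance" and "featname" are the name attributes of the two form inputs.

function submitQuery(){
    //start with the same parameters used for the initial load
    var data = {
        table: "fracsandsites",
        fields: fields
    };

    //add each form input's name/value pair to the data object
    var formdata = $("form").serializeArray();
    formdata.forEach(function(input){
        data[input.name] = input.value; //e.g. data.distance = "10", data.featname = "..."
    });

    //send the new query to getData.php, reusing the mapData callback
    $.ajax("getData.php", {
        data: data,
        success: mapData
    });
};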

With the data object put together, it’s time to issue a new $.ajax call to getData.php. Let’s flip back over and take another look at that. Everything is the same except now we have a few more $_GET parameters to deal with and a different query task. Hence the if statement on lines 34-40:

php snippet

The if statement tests for the presence of the featname parameter in the list of parameters sent through AJAX. If it exists, that parameter’s value gets assigned to the $featname variable and the distance parameter value, multiplied by 1000 to convert kilometers to meters, gets assigned to the $distance variable.

Now for the hard part. Remember our simple SQL statement in which we gave the table and all of its attributes an alias (l) for no apparent reason? Well, the reason is that we now have to concatenate SQL code for a table join onto it. Whenever you join a table to itself in PostgreSQL, each “side” of the join needs its own alias. Since the initial table reference is on the left side of the JOIN operator, I assigned the original table the alias l, for left, and the joined table r, for right. Obvious, huh? Well, maybe not. In any case, the principle is that although both sides of the join reference the same table, Postgres will look at them like they are different tables. This is a LEFT JOIN, meaning that the output will come from the table on the left, and the table on the right is used for comparison.

There are two parts to the comparison here: the ON clause and the WHERE clause. The ST_DWithin statement following ON specifies that output from the left table will be rows (features) within the user-given distance of rows (features) from the right table; since our table is stored in a UTM projection, the distance units will be meters (if it were stored as another CRS, say WGS84, we would have to use ST_Transform on each table’s geometry for it to work). The WHERE clause narrows the right-hand comparison to a single feature: the one named by the user in the input form. Translating to English, you could read this as, “Give me the specified attribute values and geometry for all of the features in the left table within my specified distance of the feature I named in the right table.” Or something like that.
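
In code, that branch might look something like the sketch below. Note that it sits above the pg_query call shown earlier, and that a production app should use parameterized queries (for example pg_query_params) rather than string concatenation to guard against SQL injection.

//if a feature name was sent, add the self-join for the distance query
if (isset($_GET["featname"])) {
    $featname = $_GET["featname"];
    $distance = $_GET["distance"] * 1000; //kilometers to meters

    //l is the original table, r the joined copy used for comparison
    $sql .= " LEFT JOIN " . $table . " r";
    $sql .= " ON ST_DWithin(l.geom, r.geom, " . $distance . ")";
    $sql .= " WHERE r.featname = '" . $featname . "'";
}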

OK, that’s the biggest headache of the whole demo, and it’s over. The features that get returned from this query now go back to the mapData function in main.js. The map.eachLayer loop that removes existing layers from the map now has a purpose: get rid of the original features so only the returned features are shown. The new features are plunked into a new homemade GeoJSON and onto the map through L.geoJson. Here’s an example using a query for all sites within 10 km of the Chippewa Sands Company Processing Plant:

screenshot of query results

That’s it. There’s lots more you should learn about data security (particularly with web forms), PDO objects, error prevention and debugging, etc. before going live with your first app. But if you’ve gotten through this entire tutorial, congratulations—you’re on your way to designing killer user-friendly database-centered web maps.

Update 3/31/2017: I have been getting a lot of comments on this blog post recently requesting help with some error or other a reader is experiencing while trying to implement this tutorial. While I’m flattered the tutorial is getting a lot of attention, I am also very busy with work and family and unfortunately don’t have time to work through users’ issues with the code. Thus, I will no longer be responding to comments on this post. Keep in mind that the parameters and properties used in the examples above are tailored to the example dataset, and many will need to be altered if you’re implementing your own app. Also check that the right PHP extensions are enabled and your database connection info and credentials check out. For further assistance, I highly recommend using StackOverflow, W3Schools, and the PostgreSQL, PostGIS, and PHP documentation.

Leaflet Draw: Implementing Custom Tools

As software developers, I think we can get caught up in the power and elegance of our own creations and fail to consider the importance of explaining their inner workings in a way that is understandable to those who were not intimately involved in their creation. Another way of saying this: teaching is hard. We don’t always know what we know. This has been on my mind this week as I’ve found myself struggling to learn a number of very useful but not-very-well-documented tools by a popular web cartography startup.

Over the past few months, I have been working on an interesting project: building a Leaflet-based wikimap of herding routes in eastern Senegal for use by academics, NGOs, and government officials examining land use conflicts between farmers and herders in the area. Because the users will not be computer experts, I am particularly concerned about not making skill-based assumptions and am trying to carefully think through the interface and how to make it as simple and novice-friendly as possible, while also providing a powerful suite of analysis tools.

A screenshot of my latest wikimap project, showing my custom tools interface.

Leaflet Draw tools

Some of these are measurement tools. The best out-of-the-box tools for drawing vectors and measuring lengths and areas on a Leaflet map are included in the Leaflet Draw library. This library has become the standard drawing plug-in for Leaflet, used for such apps as geojson.io, and for good reason: it’s lightweight, elegant, and functionally versatile. Unfortunately for me, all of the documentation in the README is geared toward using the toolbar that is integrated into the library (left). What to do if I need to design my own toolbar? While developers at Mapbox and CartoDB are super good at reverse-engineering and editing tools for their own needs, I’m still at an API-reading and Google-for-examples skill level. Plus I didn’t really want to modify the library itself to fit my needs; I just wanted to make use of its internal structure in a way that didn’t require the vertical toolbar.

There IS a way to do this, but it took me several hours to figure out, mainly because tapping into the library’s innards isn’t covered in the API.  I’ll paste the code I came up with below, then walk through it. First, for measuring distances:

function measure(){
  var stopclick = false; //to prevent more than one click listener

  var polyline = new L.Draw.Polyline(map, {
    shapeOptions: {
      color: '#FFF'
    }
  });
  polyline.enable();

  //user affordance
  $("button[name=measure] span").html(messages.beginmeasure);
	
  //listeners active during drawing
  function measuremove(){
    $("button[name=measure] span").html(messages.distance + polyline._getMeasurementString());
  };
  function measurestart(){
    if (stopclick == false){
      stopclick = true;
      $("button[name=measure] span").html(messages.distance);
      map.on("mousemove", measuremove);
    };
  };
  function measurestop(){
    //reset button
    $("button[name=measure] span").html(messages.measure);
    //remove listeners
    map.off("click", measurestart);
    map.off("mousemove", measuremove);
    map.off("draw:drawstop", measurestop);
  };

  map.on("click", measurestart);
  map.on("draw:drawstop", measurestop);
}
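
Before walking through it, here is the kind of button markup the function assumes. This line is hypothetical, but it is consistent with the onclick attribute and the jQuery selectors ($("button[name=measure] span")) used in the code above:

<button name="measure" onclick="measure()"><span>Measure distance</span></button>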

First, I should point out that the measure() function is called by the onclick attribute of the “Measure distance” <button>. Immediately, I define a boolean variable, stopclick, which will be used to ensure that the following code won’t set duplicate event listeners, which can get messy. Then, to start drawing the measure line, I use:

var polyline = new L.Draw.Polyline(map, {
  shapeOptions: {
    color: '#FFF'
  }
});
polyline.enable();

That’s it. Just creating a new object instance and calling the .enable() method (which isn’t shown in the API) starts up the drawing tool. But it’s not really obvious to the user what to do next, and I also want to display the distance inside the interface button to make it extra easy to see. So the first thing to do is tell the user how to use the draw tool:

//user affordance
$("button[name=measure] span").html(messages.beginmeasure);

Here I’m referencing a separate object, messages, that holds every word of human language that appears anywhere in the interface. In this case, messages.beginmeasure holds the string "Click map to begin". I do this because the map must be bilingual, and storing my interface strings in a separate object with a copy of each in French (the official language of Senegal) and one in English will facilitate switching between the languages.
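
As a point of reference, a hypothetical English-only version of that object, limited to the strings these two measurement functions use, might look like:

var messages = {
    beginmeasure: "Click map to begin",
    distance: "Distance: ",
    measure: "Measure distance",
    area: "Area: ",
    measureArea: "Measure area"
};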

Next, I have three event listener handlers:

//listeners active during drawing
function measuremove(){
  $("button[name=measure] span").html(messages.distance + polyline._getMeasurementString());
};
function measurestart(){
  if (stopclick == false){
    stopclick = true;
    $("button[name=measure] span").html(messages.distance);
    map.on("mousemove", measuremove);
  };
};
function measurestop(){
  //reset button
  $("button[name=measure] span").html(messages.measure);
  //remove listeners
  map.off("click", measurestart);
  map.off("mousemove", measuremove);
  map.off("draw:drawstop", measurestop);
};

The first handler, measuremove(), simply updates the button contents with the measurement string constantly as the user moves their cursor across the map. Notice I had to use an internal function, _getMeasurementString(), to get at this info, which is unfortunate. The library really should make a simple getMeasurement() method available and publish it in the API. But it doesn’t. Oh well.

The next handler, measurestart(), makes sure that we’re not trying to pull measurement data before the user starts drawing, because that gets messy and starts throwing errors in the console. I have to use a click listener on the map to trigger this, but only want to trigger it once instead of each time the user clicks on the map. Hence stopclick. The handler only executes its code if stopclick hasn’t been altered from false to true yet, and within that code it sets stopclick to true. It’s going to change the contents of the button to say "Distance: " and apply a mousemove listener with the measuremove() handler discussed above.

Finally, we have the measurestop handler. This is going to reset the button contents to the original "Measure distance" string, then remove the three event listeners added within the measure() function so we don’t place any duplicate listeners.

Then finally:

map.on("click", measurestart);
map.on("draw:drawstop", measurestop);

The listeners above should be fairly self-explanatory: when to start measuring (when the user clicks on the map) and when to stop. Ah! With the second listener, we finally get to use something that’s actually specified in the API: the "draw:drawstop" map event. Well, okay, the initial Polyline options are listed in the API too. But not a lot else.

I got all done making this linear measure go, then decided: why stop there? Wouldn’t it be nice for my users to be able to measure areas too? Like, the area of a farmer’s field or the size of a village, perhaps? So I did one for area too:

function measureArea(){
  var stopclick = false; //to prevent more than one click listener

  var polygon = new L.Draw.Polygon(map, {
    showArea: true,
    allowIntersection: false,
    shapeOptions: {
      color: '#FFF'
    }
  });
  polygon.enable();

  //user affordance
  $("button[name=measureArea] span").html(messages.beginmeasure);
	
  //listeners active during drawing
  function measuremove(){
    $("button[name=measureArea] span").html(messages.area + polygon._getMeasurementString());
  };
  function measurestart(){
    if (stopclick == false){
      stopclick = true;
      $("button[name=measureArea] span").html(messages.area);
      map.on("mousemove", measuremove);
    };
  };
  function measurestop(){
    //reset button
    $("button[name=measureArea] span").html(messages.measureArea);
    //remove listeners
    map.off("click", measurestart);
    map.off("mousemove", measuremove);
    map.off("draw:drawstop", measurestop);
  };

  map.on("click", measurestart);
  map.on("draw:drawstop", measurestop);
};

This one’s very similar to the first one, except for the options and a few different messages. One important thing about the options that, again, the API doesn’t tell you: if you want the Polygon tool to show the area measurement, you have to set allowIntersection to false. That took digging through the source code to figure out.

I hope my multiple hours of trial and error can help prevent the same headache for someone else. And, ideally, I hope these holes in the Leaflet Draw documentation get filled. It really is a powerful set of tools. Happy playing!

Measure Area
Leaflet Draw custom area measurement tool

D3 Selection Mysteries… Solved?

Edit: It was brought to my attention that I should acknowledge the excellent resource for learning D3 provided by Scott Murray in his book Interactive Data Visualization for the Web and in his AlignedLeft blog tutorials. I can’t say enough about how well-written and beginner-friendly the book and tutorials are. This post is simply meant to extend the understanding I got from reading Murray’s blog post on Binding Data, which I highly recommend reading first.

The selectAll-data-enter sequence used to create multiple selections in D3 has always been a bit mysterious to me. I even went so far as to call .enter() “one of the great mysteries of the universe” while teaching it. But it’s really not all that mysterious. I don’t know why I didn’t think to do this until now, but a simple series of console.log statements can reveal the stages of multiple selection creation and data binding.

1. Empty selection:

var provinces = map.selectAll(".provinces");
console.log(provinces);

screenshot1

Here we have an empty selection. Note that it is simply an empty set of nested arrays. I’m still not sure why this is a nested array instead of a single-layer array, but I’m sure there’s a good reason. In any case, the inner array appears to be what’s important.

2. Adding the Data:

var provinces = map.selectAll(".provinces")
    .data(topojson.feature(europe, europe.objects.FranceProvinces).features);
console.log(provinces);

screenshot2

When an array of data is added to the selection, a number of blank slots matching the number of elements in the data array are created in the selection array.

3. Binding the Data:

var provinces = map.selectAll(".provinces")
    .data(topojson.feature(europe, europe.objects.FranceProvinces).features)
    .enter();
console.log(provinces);

screenshot3

What do each of those data objects look like?

screenshot4

Enter binds the data to the selection, so that each element in the selection array is now an object holding its datum within the __data__ property.

4. Appending an Element:

var provinces = map.selectAll(".provinces")
    .data(topojson.feature(europe, europe.objects.FranceProvinces).features)
    .enter()
    .append("g")
    .attr("class", "provinces");
console.log(provinces);

screenshot5

So what’s in each <g> element?

screenshot6

Notice how the datum (__data__) has been attached as a property of the element. This is what gets passed every time a method further down in the block calls an anonymous function with a d parameter.

Smell that? That smells like… understanding.

Bonus! Here’s the video of Mike Bostock’s awesome keynote at FOSS4G2014 back in September:

Ten Rules for Coding with D3

Anyone familiar with JavaScript who has tried their hand at D3 knows that coding in it is a little, well, different. For instance, take this snippet of code included in the dummy example I wrote for the UW-Madison Geography 575 lab my students were just assigned:

var provinces = map.selectAll(".provinces")
    .data(topojson.feature(europe, europe.objects.FranceProvinces).features)
    .enter()
    .append("g")
    .attr("class", "provinces")
    .append("path")
    .attr("class", function(d) { return d.properties.adm1_code })
    .attr("d", path)
    .style("fill", function(d) {
        return choropleth(d, colorize);
    })
    .on("mouseover", highlight)
    .on("mouseout", dehighlight)
    .on("mousemove", moveLabel)
    .append("desc")
    .text(function(d) {
        return choropleth(d, colorize);
    });

Now, someone who is familiar with jQuery or Leaflet will probably recognize the method chaining, and some of the methods may even look familiar. But what’s really going on here is somewhat more complex than the syntax lets on. This fall, I’ve had to put a lot of attention into figuring out how to teach this powerful data visualization library to Cartography majors, many of whom had never written a line of JavaScript before taking the class. Fortunately, I’ve had some great tools at my disposal, including Scott Murray’s excellent book, Mike Bostock’s thorough API documentation, and the awesome D3 Examples Gallery. In making use of these resources, it has come to my attention that there’s a set of unwritten but generally agreed-upon conventions for D3 code that go beyond those of ordinary JavaScript. I’ve also decided that there are a few practices that may not be used universally by D3 programmers but help make the workings of the code more clear for newbies, and therefore should become standard practice. Finally, while teaching this week, I found myself inventing a bit of terminology and combining it with other words defined by Mike Bostock to describe D3 coding to students. It dawned on me that sharing my own set of D3 rules via a blog post might be useful to others who are in the process of making heads or tails of the library, so I humbly offer these up as suggestions.

D3 Code Rules

Chain syntax is not a new term; it refers to the syntax pioneered by jQuery that allows you to piggyback methods in sequence. D3 raises method chaining to an art form, resulting in chains that can get quite long and unwieldy. As Scott Murray puts it, “Both I and your optometrist highly recommend putting each method on its own indented line.” As in the code above, this formatting practice is used universally in the examples posted to the D3 Gallery to make the code neat and understandable.

Rule 1: Put each new method on its own indented line.


When writing the lab tutorial, I took to calling these chunks of chained methods code blocks or just blocks, which makes sense given a) their nice rectangular gestalt and b) Bostock’s bl.ocks.org site, a viewer for code saved on Gist. I recognize that Bostock may have meant “blocks” as a synonym for “Gists,” that is, whole snippets of sharable code; but I think it works better as a term for the segments of chained methods within the code. Since he didn’t explicitly define what a block is, I am taking the liberty to do so in the way that’s most useful to me.

Two things about blocks have already been conceptual snags for my students. The one I expected and hopefully inoculated them against was misplaced semicolons. Since JavaScript is conveniently sloppy and lets you get away with not placing a semicolon at the end of a statement in unminified code, beginners tend to think that semicolon placement doesn’t really matter. One of my most common errors in writing D3 is to tack more methods on to the end of a block I finished earlier and accidentally forget to move the semicolon, which of course breaks the code because now you have orphan methods that don’t reference anything. For instance:

var provinces = map.selectAll(".provinces")
    .data(topojson.feature(europe, europe.objects.FranceProvinces).features)
    .enter()
    .append("g")
    .attr("class", "provinces"); //SEMICOLON FAIL
    .append("path")
    .attr("class", function(d) { return d.properties.adm1_code });

The code above will break at .append("path") because that method now references nothing, since the semicolon above it ended the block.

Rule 2: If your code breaks, look for a wayward semicolon.


The second conceptual snag, which was less anticipated, was the struggle it’s taking for students to get what the methods actually reference, and even how they can tell which methods belong to D3 versus native JavaScript or some other library. It’s true that lots of these methods—.on, .append, .attr, etc.—are written the same way in multiple code libraries. I’ve found myself explaining that you have to reach backwards through the sequence of methods to find the original operand (the thing being operated on) and determine how it was created or selected. Understanding the flow of the script is one of the hardest things for a beginning web developer to learn, and stepping through the code forwards and backwards is a good way to become more familiar with it. (One of the most popular mini-assignments I give my students is to comment every line of a code snippet). It’s like the game Mousetrap, or any other Rube Goldberg machine for those of a different generation. Find the first stimulus in the reaction chain and you should be able to see whether that operand starts with d3, $, L, or just a plain JavaScript object/value. This also will determine what methods are available to use to manipulate that object.

Rule 3: The methods depend on how the operand was created.


In D3, the operand is often either a selection or a new element. Selection is a D3 term defined by Bostock as “an array of [markup] elements pulled from the current document.” A new element is a markup element added to the document. .select and .selectAll create a new selection (.select puts only one element in the array), while .append and .insert create new elements. The methods that follow an operand and do things to it Bostock calls operators. Thus, a code block may contain several operands, with each operator referencing the most recently selected or created element, e.g.:

var provinces = map.selectAll(".provinces") //FIRST OPERAND--SELECTION
    .data(topojson.feature(europe, europe.objects.FranceProvinces).features) //OPERATOR ON SELECTION
    .enter() //OPERATOR ON SELECTION
    .append("g") //SECOND OPERAND--NEW ELEMENT
    .attr("class", "provinces") //OPERATOR ON NEW ELEMENT

This can result in confusion if too many new elements are created in a single block. It is a good idea to create only one new element with each block, so you know what the variable assigned to the block is referencing and can easily access it again without creating a new selection. You can always pick up the selection and add on to it in a new block.

The code above violates this principle; I wrote it before I had solidified my own practices. So let’s fix it:

var provinces = map.selectAll(".provinces") //SELECTION
    .data(topojson.feature(europe, europe.objects.FranceProvinces).features)
    .enter()
    .append("g") //NEW ELEMENT
    .attr("class", "provinces")
    .append("path") //NEW ELEMENT
    .attr("class", function(d) { return d.properties.adm1_code })
    .attr("d", path)
    .style("fill", function(d) {
        return choropleth(d, colorize);
    })
    .on("mouseover", highlight)
    .on("mouseout", dehighlight)
    .on("mousemove", moveLabel)
    .append("desc") //NEW ELEMENT
    .text(function(d) {
        return choropleth(d, colorize);
    });

…changes to…

var provinces = map.selectAll(".provinces") //SELECTION
    .data(topojson.feature(europe, europe.objects.FranceProvinces).features)
    .enter()
    .append("g") //NEW ELEMENT
    .attr("class", "provinces");

var provincesPath = provinces.append("path") //NEW ELEMENT
    .attr("class", function(d) { return d.properties.adm1_code })
    .attr("d", path)
    .style("fill", function(d) {
        return choropleth(d, colorize);
    })
    .on("mouseover", highlight)
    .on("mouseout", dehighlight)
    .on("mousemove", moveLabel);

var provincesDesc = provincesPath.append("desc") //NEW ELEMENT
    .text(function(d) {
        return choropleth(d, colorize);
    });

Sure, it’s a little longer, but now we have three variables instead of one, each referencing its own set of elements in the selection. All three of these blocks reference the same selection, and since this is a .selectAll selection, the methods in each will apply iteratively using the same data given to the selection in the first block (see this page of the API for more info on selections; or read the simplified explanation in the book).

Rule 4: Create only one new element (or element set) per block.


Notice that I assigned each block to its own variable, which I didn’t have to do for the code to work at this stage. Again, the variable will reference the last operand (selection or new element) in the block if operators are called on it in the future. I find that assigning each block a variable makes it easier to reference the operand as needed, both in future code and in tutorials that explain the code. In this sense, the variable each block is assigned to functions as the name of the block. For instance, if I am working with a student having difficulties, I can say something like, “take a look at your provinces block” or “check the syntax of the provincesPath block.”

Rule 5: Assign each block to a logical variable (the block’s ‘name’).


It sometimes happens that you need to create a new selection of elements that were placed in the document or otherwise reference those elements for styling with CSS. If you have a lot of elements being created by D3, inspecting the document can get confusing. To keep things consistent between the various parts of the DOM, I usually assign each new element a class name that is the same as the name of the block that creates it. That way, I know where the elements I create are coming from in the code.

One of the blocks above (the provinces block) does this; the other two new blocks do not. In the case of the provincesPath block, I needed to assign unique class names to each element in the array based on the data, as those class names are used later in the code to link these path elements to other elements in other graphics. At the time I wrote it, I didn’t think to give it two class names (separated by a space), but that is a logical solution. The desc element set probably should also get a class, now that it’s in its own block. Let me fix these issues now:

var provincesPath = provinces.append("path")
    .attr("class", function(d) { 
        return d.properties.adm1_code + " provincesPath"; //ADDED A SECOND CLASS
    })
    .attr("d", path)
    .style("fill", function(d) {
        return choropleth(d, colorize);
    })
    .on("mouseover", highlight)
    .on("mouseout", dehighlight)
    .on("mousemove", moveLabel);

var provincesDesc = provincesPath.append("desc")
    .text(function(d) {
        return choropleth(d, colorize);
    })
    .attr("class", "provincesDesc"); //NEW CLASS

Rule 6: Assign each new element a class name identical to the block name.


Using element classes (as opposed to ids) is especially important with D3, since you need multiple elements with identical names to use .selectAll and create a multiple-element selection. But what about using .selectAll to create an empty selection? An empty selection (again, Bostock’s term, though poorly explained in the API) happens when .selectAll is applied to a selector that does not yet exist in any elements in the document. One of the cognitively challenging concepts in D3, it essentially creates a placeholder in the DOM for elements-to-be. The provinces block above starts by creating an empty selection; it applies the “.provinces” selector, which does not match any existing elements at the time .selectAll is called. The elements (new <g> tags) are actually created three lines down and assigned their class name on the line below that. So why bother feeding .selectAll a selector in the first place? It actually does work to omit the selector, i.e.:

var provinces = map.selectAll(".provinces")

//WORKS THE SAME AS

var provinces = map.selectAll()

//WHEN CREATING EMPTY SELECTIONS

But here’s the problem: what if you call this method inside a function that could be used both to create new elements and to reset the matching elements if they already exist? Without the selector, you’ll be stuck just creating more identical elements rather than grabbing any existing ones from the document to manipulate. Aside from this “just in case” scenario, there is something to be said here once again for human semantics—the selector links the .selectAll statement visually to the elements that will be created later in the block.

Rule 7: Always pass the block’s name as a class selector to the .selectAll method, even when creating an empty selection.


Making groovy visualizations is all about how you style the elements on the page. The great advantage of D3 is that it gives you massive power to dynamically assign and modify the positioning, size, color, effects, animations, etc. of the elements you use it to create based on the data you pass to it. In many instances, though, there may be some elements that do not need to be modified after they are created, and others for which it is helpful to have a default style that can be overridden by user interaction. In these instances, it makes sense just to assign the element(s) a class and use CSS to create some static styles. For instance:

//IN THE SCRIPT

var countries = map.append("path")
    .datum(topojson.feature(europe, europe.objects.EuropeCountries))
    .attr("class", "countries")
    .attr("d", path);

//IN A CSS STYLESHEET

.countries {
    fill: #fff;
    stroke: #ccc;
    stroke-width: 2px;
}

Rule 8: Assign static or default styles using a CSS stylesheet.


When styling dynamically, of course, you want to assign styles in your code blocks. SVG graphics can be styled by passing the style rules as either attributes or in-line CSS styles. You might think (as I did when I started) that passing the styles as individual attributes would take precedence over in-line CSS rules assigned to a single style attribute, but in fact it’s the other way around. For instance:

    .style("fill", function(d) {
        return choropleth(d, colorize);
    })

//OVERRIDES

    .attr("fill", function(d) {
        return choropleth(d, colorize);
    })

Things can get confusing if you assign a style rule as a style in one place and then try to re-assign it as an attribute in another. Thus, it’s best to pick one or the other, and style generally seems more appropriate to me. Note that this does not apply to element x/y positions or path d strings, which are only available as attributes.

Rule 9: For dynamic inline style rules, use .style instead of .attr.


Through all of these recommendations, I haven’t really touched on the data that is going into the element creation and manipulation. D3 works with data in the form of arrays. The combination of .select and .datum executes the operators following it once, treating the data passed to .datum as a single data point (or datum). The combination of .selectAll, .data, and .enter primes the selection to execute the following operators once for each value in the array that is passed to .data.
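
A minimal side-by-side of the two patterns, reusing the variables from the earlier examples:

//.append + .datum: the operators run once, on a single datum
var countries = map.append("path")
    .datum(topojson.feature(europe, europe.objects.EuropeCountries))
    .attr("class", "countries")
    .attr("d", path);

//.selectAll + .data + .enter: the operators run once per value in the data array
var provinces = map.selectAll(".provinces")
    .data(topojson.feature(europe, europe.objects.FranceProvinces).features)
    .enter()
    .append("path")
    .attr("class", "provinces")
    .attr("d", path);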

The three main data types for single values in JavaScript are Number (e.g., 42), String (e.g., “the answer to life, the universe, and everything”), and Boolean (e.g., true or false). As a weakly typed language, JavaScript doesn’t make you declare the data type of variables and lets you play fast and loose with the different datatypes. But since the outcome may differ for certain operations depending on the data type, it’s best to pay close attention to what type you are passing the data as (console.log(typeof d)), and force-type it before use if necessary.
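
A quick illustration of why the type matters (plain JavaScript, nothing D3-specific):

var value = "42";               //attribute values loaded from a CSV arrive as strings
console.log(typeof value);      //"string"
console.log(value + 8);         //"428" -- string concatenation, not addition
console.log(Number(value) + 8); //50 -- force-typed to a Number first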

Rule 10: Make sure the data are the correct type for the operations using them.


There is lots more that can be said—and hopefully will be said—about coding with D3. For instance, I haven’t even mentioned generator functions—functions that return other functions—which deserve a whole blog post to themselves. These rules and terms are suggestions, but I realize every developer has their own style and there could even be logic errors in mine. I don’t really care whether you start using what I’ve defined here. Rather, my take-home message is this: we should be thinking not only in terms of how to make sense of D3 ourselves, but also how to teach it to others in a logical and consistent fashion. I am sure I’ll come up with more ideas about this over the next few weeks of teaching experience, and I hope that others add theirs on as well.

Whither the Wikimap?

Rationale

This post is complementary to my identically-named conference talk for NACIS, the slideshow for which is above. I have tried to distill the content of the talk as best I could here.

I did my master’s thesis project on creating a wikimap to be a form of “online participatory mapping” that could hopefully empower people living in an area facing the possibility of a large-scale open-pit mining project to visualize the resources and values that could be impacted. The project taught me several things, the key one being, don’t try to do participatory mapping on your own as a master’s thesis. Beyond reinforcing the knowledge that I tend to bite off more than I can chew, it made me question some of the hypotheses and outright assumptions of those who herald the ‘democratization’ supposedly wrought by web mapping technologies and VGI, claims that now strike me as perhaps a bit over-enthusiastic. Here is my attempt to put into words, in brief, what came out of my project and some new angles from which to look at wikimaps as they are studied further.

1. Ideas

You might say this all started with the diametric opposite of a free, open-source, open-data project: the UK Ordnance Survey. Back in 2004, a bloke named Steve Coast got frustrated that he had to pay for OS data and got a group of people together to drag GPS units around London and upload the data to a public website. Thus was born OpenStreetMap, which is now the oldest and biggest wikimap of them all. The ‘crowdsourcing’ movement inspired by Wikipedia found geographic expression in 2006 with Wikimapia, the first attempt to map literally anything and everything on the internet. A year later, people started bludgeoning each other in Kenya over the outcome of a tense election, and Ushahidi was born to map eyewitness accounts of the violence. The organization has since spun off its wikimap platform as Crowdmap, which has been used to assist responders in several high-profile calamities.

2. Hypotheses

To observers of Web 2.0, the common thread of these kind of maps seems to be empowerment. Even before OSM hit the web, some cartographers and geographic information scientists were predicting that increased interactivity would drive people-powered mapping on the internet. In 2003, Michael Wood wrote for The Cartographic Journal that mapping was being restored to its rightful position as universal birthright because “now [the user] can truly interact with the map (and with its data sources) and become, in the process, a map-maker also.” Four years later, Michael Goodchild rocked the GIS world by inventing the dubious term “Volunteered Geographic Information” and rebranding humanity as “six billion sensors.” For their part, social scientists have declared that web maps with local knowledge can help Indigenous and other marginalized groups “facilitate the reappropriation of contested places” when used in concert with participatory methods.

3. Applications

Because I wanted to do something neat and helpful for my thesis, and because I was just bull-headed enough to believe I could learn enough code to pull it off, I centered my project around creating a wikimap of a rural watershed in northern Wisconsin, with the goal of allowing users to add stories and multimedia and tag landscape values on the map. The design process involved sitting down with local community members to explore the uses they saw in such an application and seeking feedback from the participants on multiple prototypes. I held a few workshops that were not as well attended as I had hoped but nonetheless produced the bulk of the data that ultimately ended up on the map. I wrote into the code functions to track types of interactions that users were engaging in, which formed the backbone of my analysis. I had no real way to gauge how effective the map was at creating dialogue in the area, but I did garner some valuable insights into how users used the map.

Based on what I learned during my thesis project, I started working on a second wikimap, this one covering a hot resource extraction issue in a different part of the state. I sought to modify and improve the interface and symbolization strategies, vary the tools on offer for different levels of user commitment, and connect with social media. As it stands, this can generously be called a work in progress. Truthfully, I ran out of motivation to finish it over the summer, but still hope to pick it back up by the end of the year and have it out before spring.

4. Findings

How do people use wikimaps? Of all the questions I had hoped my thesis to answer, this one got the best results—and not what I expected given all the talk about user empowerment. Most of those who visited the map did one or both of two things: they engaged with the basemap, exploring by panning and zooming and changing layers, or they engaged with the volunteered data by looking at the names, content, and characteristics of the marked sites. In fact, only thirteen percent of my users did any contributing to the map whatsoever, and of those only half made it their primary task when they visited the site.

Now, one might chalk this up to my little project being on the wrong side of the adoption chasm, but it turns out to be not too far off the mark of what other research says about the biggest, oldest, most well-used of wikimaps. UIUC grad Nama Budhathoki did a dissertation on OpenStreetMap in 2010 and found that only 30% of registered users had ever contributed a single thing (to say nothing of all the guest visits), and of those, less than two-thirds contributed more than once. Further, those who contribute are mostly “educated and tech savvy males with some prior experience in geospatial technology… predominantly from Europe and North America.” Clearly, this falls short of the expectation that wikimaps will bring about an egalitarian mapping paradise.

One possible bright spot may be the use of web maps along with in-person guidance by a facilitator in a process that is monitored and controlled from within the community they are specialized for. A colleague of mine at UW-Madison has been doing participatory mapping in the same area where I located my thesis, but with a more exclusive focus on supporting residents of the local Native American community (mine included the surrounding, predominantly White communities). Using her preexisting status as a leader of a program for tribal youth, Jessie Conaway had the kids interview elders on the reservation about places that were important to them, then added these stories to a simple web map as well as physical maps that could be conversation pieces for presentations by tribal members. The process fostered intergenerational understanding as well as gorgeous maps. This project seems to have been a success—although thanks to old-fashioned cooperation and relationship-building, not so much to shiny Web 2.0 technology.

5. Questions

So where do we go from here? In academia, we seem to focus on building upon our specialty, with ample disincentive to critique the founding principles of our own research, however much we might critique its current state. I’ll go out on a limb and argue this is probably what’s going on with the title and conclusion of the paper Budhathoki co-published with Zorica Nedovic-Budic: “How to motivate different players in VGI?” This is one question, and it may be legitimate. The easiest possible answer, if it were true, would be by tweaking the design of future wikimaps so as to have an easier-to-use interface, better UI controls, nicer symbols, etc. Perhaps implementing user rating systems would lend trust to the volunteered data while connecting to social media might give the map a chance to “go viral.” I aim to play with these variables as I work toward completing my ongoing project.

But at a deeper level, this all seems like fiddling around with the shell and missing the nut of the problem. My critique of the aforementioned question-title, if I might offer one, is that it assumes that the cause of low participation lies with the motivation (or lack thereof) of non-White non-GIS professional non-European-or-North-Americans, and further implies that simply motivating these players is the right goal to have. I’m not convinced of this. I’d rather we mapping professionals back up a step and ask whether wikimaps ever can be universalized. That is, might we come to recognize that our emancipating projects will never appeal to everyone? Doing so does not inherently remove the worth of wikimaps, but it could allow us to explore their limitations and boundaries so as to recognize just who our projects are serving and how, and thus act more honestly in their creation.

Looking at wikimaps this way allows us to be similarly critical of the data they seek to visualize. Can we really call this data “democratizing?” Many of the bold claims about VGI and democracy have yet to be tested scientifically. With recent revelations that mobile device users tend to unwittingly contribute locational data to commercial and government actors with dubious interests, VGI may walk a thin line between empowerment and surveillance. Wikimaps and other VGI applications may actually inherently reinforce power imbalances, since anyone may contribute but the tech-savvy professionals are still ultimately in control of the uses to which the data are put.

Coda

I hope that this post (and talk) do not dissuade more people from making wikimaps. In fact, we need more wikimaps to test whether and where and when they are actually a good thing, and how to make them better for these situations. Clearly, people get something out of OpenStreetMap, Wikimapia, and Ushahidi. Clearly, users were interested in seeing what my thesis wikimap could teach them. But let’s not go in with the assumption that our projects are going to appeal to everyone we want them to, or that we can create a level playing field for all users to contribute equally. And I think that for honesty’s sake, we need to stop the talk of democracy and empowerment coming from Web 2.0 until we have something actual to show for it.

Putting Homelessness on the Map

We cartographers know that maps have the power to make features of the landscape visible or invisible in the public consciousness. A bike map emphasizes bike routes but doesn’t show you, say, forested and non-forested areas, while a USGS topo map shows you where a platoon could hide from choppers in the summer but not which streets have low traffic volume.

Some maps try to make visible not things, but people. The census provides a rich trove of demographic data to visualize, putting various categories of people “on the map” and powerfully shaping the way we think of the communities we inhabit. In the past few years, a popular way to do this has been the dot map, in which one dot equals a certain number of people and the dots are randomly placed within the census blocks where the corresponding people live.
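For the technically curious, that placement step is easy to sketch. Below is a minimal, hypothetical JavaScript example of dot-density placement; it assumes the Turf.js library is loaded, and both the pop property name and the one-dot-per-25-people ratio are purely illustrative (this is a sketch of the general technique, not the Applied Population Lab’s actual workflow).

// A minimal sketch of dot-density placement, assuming Turf.js is loaded
// and `block` is a GeoJSON Polygon feature. The `pop` property and the
// PEOPLE_PER_DOT ratio are illustrative, not taken from the APL map.
var PEOPLE_PER_DOT = 25;

function dotsForBlock(block) {
    var target = Math.round(block.properties.pop / PEOPLE_PER_DOT);
    var bounds = turf.bbox(block); // [minX, minY, maxX, maxY]
    var dots = [];
    // Rejection sampling: draw random points within the bounding box and
    // keep only those that actually fall inside the block polygon.
    while (dots.length < target) {
        var candidate = turf.randomPoint(1, { bbox: bounds }).features[0];
        if (turf.booleanPointInPolygon(candidate, block)) {
            dots.push(candidate);
        }
    }
    return turf.featureCollection(dots);
}

A racial dot map would run something like this once per racial category per block, coloring each category’s dots differently; the random placement within blocks is what gives the map its smooth, blended look.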

This trend follows the work of radical cartographer Bill Rankin, who points out that more traditional choropleth maps “make a strong visual argument. They make it seem that human beings are naturally divided into relatively homogenous groups, separated by sharply-defined boundaries.” Obviously, in the real world this isn’t the case; people blend with each other across space. It’s all the more powerful, then, to see the smooth transitions of racial groups more accurately represented, and still note startlingly sharp divisions between neighborhoods. It’s one thing to talk about America’s persisting racial segregation problem in abstract terms; it’s quite another to see it writ large in Crayola-box color, as in the map of Madison below, put together in 2011 by the UW Applied Population Lab.

Madison’s racial dot map, showing the city’s population as colored dots by race

But herein lies a rub: maps like this might better represent a phenomenon, but they can only be as truthful about it, at most, as the underlying data. Census maps of any kind may serve to make certain people invisible—those without addresses. The nonprofit organization Porchlight estimates that over 3,500 people each year experience homelessness in Madison. Can a map like this tell us something about who the homeless are?

I was struck by this question about two years ago, when I first saw the APL map at a statewide conference of GIS professionals. Staring, mesmerized, at the poster, I suddenly noticed a surprising incongruity.

Why is that block blue?
The blue block, zoomed in

In a sea of green dots representing white people, one census block was populated almost entirely by blue, representing African-Americans. Living in a U.S. city, you come to expect such segregation at the neighborhood level, but a single block? What was going on here? This block in particular happened to be across the street from the apartment I was living in at the time, and I knew that it was not a very residential one, but was home to businesses and light industry. I thought through my mental map of what was there. A Goodyear garage. A warehouse? A post office. It struck me that there was a USPS carrier annex somewhere along that side of East Washington Avenue quite close to my apartment, and I came to the conclusion that all those blue dots must represent mailing addresses that few people with homes would use, i.e., P.O. Boxes.

Post Office Annex, 700 Block of East Washington Avenue

This isn’t quite the end of the story. What prompted me to go back and write about this was a conversation I had in the car the other day with my fiancée and her mother, a sociologist, in which I pointed out this phenomenon. In seeking to verify what I thought I was seeing so I could write about it intelligently, I discovered that I had misremembered the exact location of the post office; in fact, it was one block over from the peculiar census block. I also called the annex, and it turns out it does not have post office boxes or accept general delivery. So I used Google Maps to survey what actually is on the mystery block that might account for non-resident addresses, and it turns out to be:

Salvation Army Social Services Building, 600 Block of East Washington Avenue

I called up the Salvation Army, and the woman I spoke to informed me that yes, they did have “quite a few” people who received mail there; she wasn’t quite sure how many, but estimated 20-50 at any given time. But I count 131 dots on the block. There is also a 32-unit upscale condo development, completed in 2008, on the back side of the block facing Mifflin Street, although with units going for $190-$400K during the big real estate crunch, I’m going to guess they weren’t quite full up when census workers came around in 2010. Perhaps the Salvation Army’s social workers were helping people who didn’t usually get mail there fill out census forms as well, using the building’s address? It also seems likely that many of those 20-50 individuals have dependent children who would have been reported on the forms. If we had the individual addresses on the forms, we could eliminate the condo-dwellers and see for sure who was relying on an institutional address. One thing we can say with some confidence, though, is that most people in the latter category were black. And that says something about who is homeless in Madison.

There is no “I” in internet

I am beginning my maps blog with a polemic against faulty capitalization on the internet… of “the internet.”

(Why on a maps blog? I’m a grad student who makes a lot of web maps, and my faculty advisor is always on my case about capitalizing the internet. Not that I’m blaming him–this is a widespread bad convention.)

Yes, the AP Stylebook, the New York Times, and every academic who thinks they know how to spell capitalize the word “Internet,” according to Wikipedia and this blog post. But I respectfully disagree with them.

Supposedly, “Internet” must be capitalized because it is a proper noun: the name of a particular, unique entity, as distinguished from generic “internets” or “internetworks.” But the internet isn’t a name; it’s a thing, and most things that are written with “the” preceding them are common nouns. I don’t go shopping at the Grocery Store and I don’t go to the Beach, even though those are particular places. I might have only one grocery store or one beach I ever go to, and I might mean the same one each time I say it, but that doesn’t give them the right to capitalization.

I might go to The Beach if I mean the waterpark by that name near where I grew up, but in this case “The” is also capitalized. For that matter, if I’m at a concert of The Shins or The White Stripes, I’m not at a concert of the Shins or the White Stripes, because the former two are band names and the latter two are wrongly capitalized common nouns.

So take that, the internet. I’m bringing you down a peg.

More fun posts to come.