Using msLandscape to create customized landscape configurations for simulation with ms

msLandscape overview

msLandscape is a toolbox to make customized landscape-scale simulations with the coalescent simulator ms easier and faster. msLandscape consists of an R package, Python scripts, and a bash shell script wrapper around ms that work seamlessly together, so you can focus on building your landscape instead of needing to manually write ms instruction files (referred to here as ‘flag files’) or cobbling together data conversions.

The msLandscape R package contains three R functions:

msLandscape_networkPlotter - This quickly visualizes any landscape configuration (population locations and their migration connections) contained in an ms flag file, including information about the number of individuals sampled from each population and the migration rate between each pair of populations.
msLandscape_convertToEEMSDiffs - This is a utility function to finish the conversion of simulated data from ms into the form pairwise differences format required for EEMS. Use of this function is detailed in the ‘msLandscape data conversion’ tutorial.
msLandscape_layerPNGImages - This is a utility script to create composite images of two or more .png images that are stacked on top of each other using transparency (average color for each pixel across all images) to allow the features of all of the images to be visible relative to each other. This is meant to allow comparisons of EEMS, SpaceMix, or un-PC output under different scenarios. This works best with relatively few images. Use of this function is detailed in the ‘msLandscape layer images’ tutorial.

This overview will only work with the msLandscape_networkPlotter and three of its accompanying scripts:

msLandscape_create_ms_flagFile.py
msLandscape_cleanup_ms_flagFile.py
msLandscape_ms_multipleSimulationWrapper.sh

To follow along easily with exactly what is written below, you will need to make these scripts executable. On Linux or MacOS, this can be done by running chmod u+x <scriptName> in a Terminal window from within the directory containing the scripts, replacing <scriptName> with the name of the script (e.g. chmod u+x msLandscape_create_ms_flagFile.py).

Alternatively, instead of pre-pending the ./ to the scripts in order to run them as is done below (and which needs to be run in the directory that contains the scripts), the Python scripts (ending in .py) can be run using e.g. python msLandscape_create_ms_flagFile.py, and the bash shell script (ending in .sh) can be run using e.g. bash msLandscape_ms_multipleSimulationWrapper.sh Of course you can run the scripts from a different directory by providing the full path to the scripts when using them.

A minor note: If you are running these commands directly within the R markdown document, make sure to change the path to each of the Python scripts after the cd command in each of the {bash} code chunks. The cd commands in the code chunks are not printed to the screen when the .Rmd file is knit into its html output.

Let’s get started!

We start with the Python script to build the landscape we want and to automatically generate the instruction (or flag) file for ms that represents that landscape.

This landscape is built using hexagonal ‘tiles’ of one sampled (focal) population surrounded by six unsampled (ghost) populations. Let us take a look at how we generate one of these ‘tiles’:

./msLandscape_create_ms_flagFile.py -r 1 -c 1

## 
## Copy the following into a plain text file and use as a flag file for ms
## using the '-f' flag (e.g. ms nsam howmany -f <path to the plain text flag file>)
## 
## -s 1 -I 7 0 0 0 10 0 0 0 0.0 -m 1 5 3.0 -m 5 1 3.0 -m 1 4 3.0 -m 4 1 3.0 -m 1 2 3.0 -m 2 1 3.0 -m 2 4 3.0 -m 4 2 3.0 -m 2 3 3.0 -m 3 2 3.0 -m 3 4 3.0 -m 4 3 3.0 -m 7 4 3.0 -m 4 7 3.0 -m 6 4 3.0 -m 4 6 3.0 -m 5 4 3.0 -m 4 5 3.0 -m 3 7 3.0 -m 7 3 3.0 -m 7 6 3.0 -m 6 7 3.0 -m 6 5 3.0 -m 5 6 3.0
## 
## 
## This is the ASCII representation of the landscape that is simulated
## using the above ms simulation configuration:
## 
##            *          
##        :   :   :      
##     :      :      :   
## *          :          *
## :   :      :      :   :
## :      :   @   :      :
## :         @0@         :
## :      :   @   :      :
## :   :      :      :   :
## *          :          *
##     :      :      :    
##       :    :   :       
##            *           
## 
## 
## The coordinates for the focal (sampled) populations are
## (listed for populations from left to right, top to bottom):
## 
## 1 1

We have just created our first landscape! The output from the script has three parts that printed to the screen:

The first part is the actual ms instruction information encoded as a series of flags that tells ms what landscape configuration and specifications we want to simulate. We will come back to this later.
The second part is an ASCII representation of the landscape we just created. Because we just simulated a single hexagonal tile here, that is what is represented, with '*' representing unsampled (ghost) populations, the
```
 @
@0@
 @
```
representing the single sampled (focal) population for this tile, and the lines of ':' representing each pairwise migration connection between populations. This ASCII representation allows us to quickly verify whether the landscape that we have created matches what we were expecting, but we will work with more flexible way to plot the landscape below.
The coordinates of the focal population on the landscape - this really is minimal for a single population but becomes important then the landscape contains many focal populations and this information is used for un-PC, EEMS, and SpaceMix analysis after the ms simulation.

Expanding our landscape

Let us make this landscape a little larger by adding additional hexagonal tiles to it. We can use the ‘-r’ flag to specify how many rows of tiles we want for the landscape, and use the ‘-c’ flag to specify the number of columns of tiles we want for the landscape. We will make it have five rows and two columns:

./msLandscape_create_ms_flagFile.py -r 5 -c 2

## 
## Copy the following into a plain text file and use as a flag file for ms
## using the '-f' flag (e.g. ms nsam howmany -f <path to the plain text flag file>)
## 
## -s 1 -I 44 0 0 0 0 0 10 10 0 0 0 0 0 0 10 10 0 0 0 0 0 0 10 10 0 0 0 0 0 0 10 10 0 0 0 0 0 0 10 10 0 0 0 0 0 0.0 -m 1 8 3.0 -m 8 1 3.0 -m 1 6 3.0 -m 6 1 3.0 -m 1 2 3.0 -m 2 1 3.0 -m 2 6 3.0 -m 6 2 3.0 -m 2 3 3.0 -m 3 2 3.0 -m 3 6 3.0 -m 6 3 3.0 -m 10 6 3.0 -m 6 10 3.0 -m 9 6 3.0 -m 6 9 3.0 -m 8 6 3.0 -m 6 8 3.0 -m 9 8 3.0 -m 8 9 3.0 -m 3 10 3.0 -m 10 3 3.0 -m 3 7 3.0 -m 7 3 3.0 -m 3 4 3.0 -m 4 3 3.0 -m 4 7 3.0 -m 7 4 3.0 -m 4 5 3.0 -m 5 4 3.0 -m 5 7 3.0 -m 7 5 3.0 -m 12 7 3.0 -m 7 12 3.0 -m 11 7 3.0 -m 7 11 3.0 -m 10 7 3.0 -m 7 10 3.0 -m 5 12 3.0 -m 12 5 3.0 -m 9 17 3.0 -m 17 9 3.0 -m 9 14 3.0 -m 14 9 3.0 -m 9 10 3.0 -m 10 9 3.0 -m 10 14 3.0 -m 14 10 3.0 -m 10 11 3.0 -m 11 10 3.0 -m 11 14 3.0 -m 14 11 3.0 -m 19 14 3.0 -m 14 19 3.0 -m 18 14 3.0 -m 14 18 3.0 -m 17 14 3.0 -m 14 17 3.0 -m 11 19 3.0 -m 19 11 3.0 -m 11 15 3.0 -m 15 11 3.0 -m 11 12 3.0 -m 12 11 3.0 -m 12 15 3.0 -m 15 12 3.0 -m 12 13 3.0 -m 13 12 3.0 -m 13 15 3.0 -m 15 13 3.0 -m 21 15 3.0 -m 15 21 3.0 -m 20 15 3.0 -m 15 20 3.0 -m 19 15 3.0 -m 15 19 3.0 -m 13 21 3.0 -m 21 13 3.0 -m 21 20 3.0 -m 20 21 3.0 -m 16 24 3.0 -m 24 16 3.0 -m 16 22 3.0 -m 22 16 3.0 -m 16 17 3.0 -m 17 16 3.0 -m 17 22 3.0 -m 22 17 3.0 -m 17 18 3.0 -m 18 17 3.0 -m 18 22 3.0 -m 22 18 3.0 -m 26 22 3.0 -m 22 26 3.0 -m 25 22 3.0 -m 22 25 3.0 -m 24 22 3.0 -m 22 24 3.0 -m 25 24 3.0 -m 24 25 3.0 -m 18 26 3.0 -m 26 18 3.0 -m 18 23 3.0 -m 23 18 3.0 -m 18 19 3.0 -m 19 18 3.0 -m 19 23 3.0 -m 23 19 3.0 -m 19 20 3.0 -m 20 19 3.0 -m 20 23 3.0 -m 23 20 3.0 -m 28 23 3.0 -m 23 28 3.0 -m 27 23 3.0 -m 23 27 3.0 -m 26 23 3.0 -m 23 26 3.0 -m 20 28 3.0 -m 28 20 3.0 -m 25 33 3.0 -m 33 25 3.0 -m 25 30 3.0 -m 30 25 3.0 -m 25 26 3.0 -m 26 25 3.0 -m 26 30 3.0 -m 30 26 3.0 -m 26 27 3.0 -m 27 26 3.0 -m 27 30 3.0 -m 30 27 3.0 -m 35 30 3.0 -m 30 35 3.0 -m 34 30 3.0 -m 30 34 3.0 -m 33 30 3.0 -m 30 33 3.0 -m 27 35 3.0 -m 35 27 3.0 -m 27 31 3.0 -m 31 27 3.0 -m 27 28 3.0 -m 28 27 3.0 -m 28 31 3.0 -m 31 28 3.0 -m 28 29 3.0 -m 29 28 3.0 -m 29 31 3.0 -m 31 29 3.0 -m 37 31 3.0 -m 31 37 3.0 -m 36 31 3.0 -m 31 36 3.0 -m 35 31 3.0 -m 31 35 3.0 -m 29 37 3.0 -m 37 29 3.0 -m 37 36 3.0 -m 36 37 3.0 -m 32 40 3.0 -m 40 32 3.0 -m 32 38 3.0 -m 38 32 3.0 -m 32 33 3.0 -m 33 32 3.0 -m 33 38 3.0 -m 38 33 3.0 -m 33 34 3.0 -m 34 33 3.0 -m 34 38 3.0 -m 38 34 3.0 -m 42 38 3.0 -m 38 42 3.0 -m 41 38 3.0 -m 38 41 3.0 -m 40 38 3.0 -m 38 40 3.0 -m 42 41 3.0 -m 41 42 3.0 -m 41 40 3.0 -m 40 41 3.0 -m 34 42 3.0 -m 42 34 3.0 -m 34 39 3.0 -m 39 34 3.0 -m 34 35 3.0 -m 35 34 3.0 -m 35 39 3.0 -m 39 35 3.0 -m 35 36 3.0 -m 36 35 3.0 -m 36 39 3.0 -m 39 36 3.0 -m 44 39 3.0 -m 39 44 3.0 -m 43 39 3.0 -m 39 43 3.0 -m 42 39 3.0 -m 39 42 3.0 -m 36 44 3.0 -m 44 36 3.0 -m 44 43 3.0 -m 43 44 3.0 -m 43 42 3.0 -m 42 43 3.0
## 
## 
## This is the ASCII representation of the landscape that is simulated
## using the above ms simulation configuration:
## 
##            *                     *          
##        :   :   :             :   :   :      
##     :      :      :       :      :      :   
## *          :          *          :          *
## :   :      :      :   :   :      :      :   :
## :      :   @   :      :      :   @   :      :
## :         @0@         :         @0@         :
## :      :   @   :      :      :   @   :      :
## :   :      :      :   :   :      :      :   :
## *          :          *          :          *           
##     :      :      :   :   :      :      :   :   :       
##       :    :   :      :      :   :   :      :      :    
##            *          :          *          :          *
##            :   :      :      :   :   :      :      :   :
##            :      :   @   :      :      :   @   :      :
##            :         @0@         :         @0@         :
##            :      :   @   :      :      :   @   :      :
##            :   :      :      :   :   :      :      :   :
##            *          :          *          :          *
##        :   :   :      :      :   :   :      :      :    
##     :      :      :   :   :      :      :   :   :       
## *          :          *          :          *           
## :   :      :      :   :   :      :      :   :
## :      :   @   :      :      :   @   :      :
## :         @0@         :         @0@         :
## :      :   @   :      :      :   @   :      :
## :   :      :      :   :   :      :      :   :
## *          :          *          :          *           
##     :      :      :   :   :      :      :   :   :       
##       :    :   :      :      :   :   :      :      :    
##            *          :          *          :          *
##            :   :      :      :   :   :      :      :   :
##            :      :   @   :      :      :   @   :      :
##            :         @0@         :         @0@         :
##            :      :   @   :      :      :   @   :      :
##            :   :      :      :   :   :      :      :   :
##            *          :          *          :          *
##        :   :   :      :      :   :   :      :      :    
##     :      :      :   :   :      :      :   :   :       
## *          :          *          :          *           
## :   :      :      :   :   :      :      :   :
## :      :   @   :      :      :   @   :      :
## :         @0@         :         @0@         :
## :      :   @   :      :      :   @   :      :
## :   :      :      :   :   :      :      :   :
## *          :          *          :          *
##     :      :      :       :      :      :    
##       :    :   :            :    :   :       
##            *                     *           
## 
## 
## The coordinates for the focal (sampled) populations are
## (listed for populations from left to right, top to bottom):
## 
## 5 1
## 5 3
## 4 2
## 4 4
## 3 1
## 3 3
## 2 2
## 2 4
## 1 1
## 1 3

Now we can see from the ASCII representation that each row of tiles is staggered relative to the rows above and below it. (The staggering pattern can be inverted using the -s no option in the command above). The list of focal population locations now has the coordinates for each of the 10 focal populations. We also see that there are many more flags included in the ms instruction compared to before, and together they are getting too big to keep displaying on the screen.

Saving the output as files

If we do not want to print the output to the screen any longer, we can save it to files instead using the -o option. With the -o option, we specify the prefix stem for the output files we want made and there will be three separate files created:

the ms flag file
the ASCII representation
the coordinates of the focal populations

OK, let’s try it here:

./msLandscape_create_ms_flagFile.py -r 5 -c 2 -o writeOut

We see no output printed to screen now, but there are three new files in our directory, each with the ‘writeOut’ prefix stem that we specified using the -o option:

writeOut_haploidSamples_ms_nsam_100_msFlagFile.txt
writeOut_ASCII_tilePlotFile.txt
writeOut_popnCoordinatesFile.txt

Two notes about the ms flag file output:

The ‘haploidSamples’ part of the ms flag file name denotes that it will simulate haploid individuals (i.e. the number of simulated chromosomes is the same as the number of individuals to simulate). Simulating haploid samples is the default. This can be changed using the -d option to signify whether the ms flag file should simulate diploid individuals using -d yes (i.e. the number of simulated chromosomes is twice the number of indiviudals) or haploid by using -d no or just omitting the -d flag like we did above).
The ‘ms_nsam_100’ part of the ms flag file name gives the total number of individuals to be simulated by the flag file, in this case 100. This is the <nsam> value to use with ms when creating the simulation using this ms flag file. If we had used the -d yes flag above to simulate diploids, this <nsam> value would have been 200 instead.

Plotting the landscape configuration encoded by ms flag files

Let us now use the ms flag file to try out the more flexible network graph-based landscape plotter using the msLandscape_networkPlotter function in the msLandscape R package. This function only requires the path to the ms flag file in order to plot to the plot window in R (see below for an example that saves the plot as a pdf to a file).

msLandscape::msLandscape_networkPlotter(msFlagFileName = "~/msLandscape-master/msLandscape_toolboxScripts/writeOut_haploidSamples_ms_nsam_100_msFlagFile.txt")

Now we can see the landscape configuration specified by that long ms flag file!

The populations are denoted by symbols:

Sampled (focal) populations have green-filled symbols with the size of the symbol proportional to the number of individuals that will be sampled from each population (this will be half the number of chromosomes sampled if the flag file is for diploids).
Unsampled (ghost) populations have gray-filled symbols that are all the same size.

The migration connection between each pair of populations is denoted by a line segment, with the width of the segment proportional to the specified migration rate between that pair of populations. Any migration connections in the ms flag file that are uni-directional (i.e. only specified in one direction between the population pair) are denoted by a pink line segment (See the ‘Identifying missing migration connections’ section below for an example).

Now instead of trying to edit or proofread the ms flag file directly, which can be tedious and error-prone, we can now visually check it much faster to see if it needs any changes.

Each population (symbol) in this graph is labelled by its population number in the ms flag file (the script generating the ms flag file creates population #1 in the upper left corner of the landscape and then numbers populations from left-to-right and top-to-bottom).

As we see in the graph above, these labels CAN be reflected and/or rotated compared to how they are defined in the ms flag file, as well as what we would expect based on the ASCII representation (compare the pattern of tiling between the first row of the graph above with that in the ASCII representation). Because the orientation of the graphs can shift compared to what we might expect, these population labels can be important to initially orient ourselves to how the landscape configuration has been plotted.

These population labels also make it easy to specify specific populations to edit (we will get to that soon).

Note: the landscape configuration for a given ms flag file should plot in the same orientation regardless of whether it is plotted with or without labels. This allows us to turn the labels on to determine how the landscape has been plotted and then turn the labels off if we need to verify details of the landscape configuration that the labels may obscure.

Let’s see how to turn these population labels off:

msLandscape::msLandscape_networkPlotter(msFlagFileName = "~/msLandscape-master/msLandscape_toolboxScripts/writeOut_haploidSamples_ms_nsam_100_msFlagFile.txt", addPopnLabels = FALSE)

This makes it easier to see the five rows and two columns of hexagonal tiles that we simulated (each has a focal population in its center).

It is also apparent that the tiles are not exactly evenly shaped hexagons. This is an artifact of the physical model of springs and repelling magnets that the network plotter uses to determine the landscape configuration based only on the information about populations and their connections that is contained in the ms flag file. This underlying physical model is what makes network plotting so versatile here - it can correctly find the landscape configuration for pretty much any landscape specified in the ms flag file.

While we are working with the population numbers turned off, let us revisit the script that creates the ms flag file and look at its -g option. This flag creates ‘ring(s)’ of hexagonal tiles comprised only of ghost populations (i.e. the center population of the tile is a ghost population instead of a focal population) around the entire array of tiles that contain focal populations.

These rings of ghost population tiles act to enclose the full landscape of focal tiles in a border of additional ghost populations; this may better match the reality of population sampling on a landscape where the number of sampled populations is generally low compared to the species distribution. The number of ghost population tile rings to generate is specified by a number following the -g option. Let us encircle the landscape we have been using with one ring of ghost tiles using -g 1 and write the output directly to file:

./msLandscape_create_ms_flagFile.py -r 5 -c 2 -g 1 -o writeOut_withGhostRing

And now we plot the resulting landscape just like we did before (without population numbers):

msLandscape::msLandscape_networkPlotter(msFlagFileName = "~/msLandscape-master/msLandscape_toolboxScripts/writeOut_withGhostRing_haploidSamples_ms_nsam_100_msFlagFile.txt", addPopnLabels = FALSE)

Alright! Even though using the rings of ghost population tiles is desireable in most situations, it can make the population numbering a bit cluttered, and so we will stick with our original landscape for the next stage of working with these ms flag files: ‘sculpting’ them! (Note: the sculpting process works exactly the same regardless of whether the landscape has rings of ghost population tiles or not.)

‘Sculpting’ and otherwise editing ms flag files

Now that we have a rectangular lanscape configuration, we can think of it like a sculptor approaches a block of stone - it is our raw material that starts larger than our desired landscape shape; we can remove populations and change the sampling of focal populations as we work to create a landscape configuration that better matches the shape we need.

Let’s see how this works.

First, we will re-plot our original landscape using the population labels:

msLandscape::msLandscape_networkPlotter(msFlagFileName = "~/msLandscape-master/msLandscape_toolboxScripts/writeOut_haploidSamples_ms_nsam_100_msFlagFile.txt", addPopnLabels = TRUE)

Say we want to remove populations from the lower left and the upper right corners of the landscape, for example because in our study system those areas correspond to the ocean, and we know the organism we are simulating cannot live there. To remove populations we need to create a plain text file (e.g. using TextEdit on a Mac) that lists the populations that we want to remove by using each population’s number label from our landscape configuration graph, with one population per line. The population numbers can be given in any order. Note that this fully removes the specified populations from the landscape, including all of their migration connections.

For example, let us start with removing the focal populations #7 and #38, and the ghost population #12. Here are the contents of our file to do this:

12  
7  
38

We will save this as ‘msLandscapeDemo_populationsToRemove1.txt’ (Copies of each of the population editing files used in this tutorial are also in the msLandscape_toolboxScripts/data directory of the msLandscape GitHub repository.)

Now we use the Python script called ‘ghostLands_msFlagFileCleanUp.py’ to take care of editing the ms flag file for us, using the -f option to provide the name of the ms flag file we want to edit, the -e option to provide the name of the file containing the population numbers to remove that we just made, and the -p option to provide the original file of focal population coordinates (generated automatically by the script that made our original ms flag file). Here we go:

./msLandscape_cleanup_ms_flagFile.py -f writeOut_haploidSamples_ms_nsam_100_msFlagFile.txt -e ./data/msLandscapeDemo_populationsToRemove1.txt -p writeOut_popnCoordinatesFile.txt

## Removing populations from the ms flag file.
## The popnCoordsFile has been updated to remove the previous focal populations that are now specified as ghost populations after editing.

Alright, now let’s plot the new flag file to see what the landscape looks like now that we have removed those three populations. In the plotting call below, note that the file name of the ms flag file after this screening/editing has changed in two ways:

We removed two focal populations that each had 10 individuals sampled (by default; this can be changed using the -n option when creating our landscape), so our number of samples for ms is now 80 instead of 100 like it was before, and that is updated in the screened ms flag file name.
The ms flag file name now includes the word ‘screened’ along with the number of times it has been screened through the Python script (currently one). This prevents the screened ms flag file from overwriting the initial ms flag file. In case we removed more populations from the landscape than we wanted with this screening, we can easily go back to the original ms flag file and start again. As we will see below, this also acts as a version numbering system for each incremental screen of the ms flag file.

And let’s plot our newly screened ms flag file:

msLandscape::msLandscape_networkPlotter(msFlagFileName = "~/msLandscape-master/msLandscape_toolboxScripts/writeOut_haploidSamples_ms_nsam_80_screened_1_time_msFlagFile.txt", addPopnLabels = TRUE)

Even though this looks a bit weird, it did what we wanted. Let us see how to verify this-

First, removing these populations caused the landscape configuration to be plotted as a mirror image compared to what we had before (i.e. population #1 is now in the upper left corner of the landscape as opposed to the upper right corner). The network plotter will sometimes do this and will also rotate the landscape compared to what we might expect (as it has also done here), which is why labelling the populations with their numbers can be important to orient ourselves.

Alright, now that we are oriented the next thing to note is that the screening script always keeps the population numbers sequential. However for it to do this when populations have been deleted, as in our case here, it has to change some of the population numbers compared to our original graph. Before we had 44 populations, but now because we removed 3 populations, the population numbering stops at 41 instead. For instance, comparing the previous landscape configuration with the current one (after mentally flipping it), we can see for instance that current population #40 is the same as population #43 before.

Now, let’s verify that the correct populations have been removed:

To start with, we see that previous population #38 has been removed (along with all of its migration connections) as we specified in our editing file. Now there is a ring of ghost populations (#30-32 and #37-39) around its previous location.
We also specified previous population #7 to be removed, which would have resulted in a similar ring of ghost populations, except that we also removed previous population #12, a ghost population that ‘anchored’ one side of this ring of ghost populations to the rest of the landscape. Without previous population #12, the remaining ghost populations (current populations #4 and #5) are only connected to current population #3, forming a line of ghost populations.
Let us also make sure the screening did what we expect to the file containing the coordinates of the focal populations. This file was automatically generated by the initial landscape construction script, and contains one line per focal population. The first entry for each line is the y coordinate for that population on the landscape, and the second entry is the x coordinate, so an entry ‘5 3’ would position the population at Cartesian coordinates (3,5). These coordinates are 1-based and positive, with (1,1) being in the bottom left corner of the landscape. Here is what the file looked like before the screening:

In order, these lines refer to locations of previous focal population numbers: #6, #7, #14, #15, #30, #31, #38, and #39. Because we removed focal populations #7 and #38, we would expect the second and the second to last entries of the original population coordinates to also be removed from the focal population coordinates file after screening. Let’s make sure that happened. Here is the file after screening (in the file named “writeOut_haploidSamples_ms_nsam_80_screened_1_time_popnCoordsFile.txt”):

Good, They were removed.

Now we will keep fine tuning the editing of the landscape:

We want to remove the ghost populations remaining in the top right and the bottom left corners of the landscape. So we will start making a new text file for populations to edit from this current landscape configuration and enter those ghost population numbers:

We save this file as ‘msLandscapeDemo_populationsToRemove2.txt’

Let’s remove these (Note - because this is the second screening, for it to work correctly we need to give the screening script both the ms flag file and the population coordinates file that have already been screened by it once - i.e. have ‘screened_1_time’ in their file names):

./msLandscape_cleanup_ms_flagFile.py -f writeOut_haploidSamples_ms_nsam_80_screened_1_time_msFlagFile.txt -e ./data/msLandscapeDemo_populationsToRemove2.txt -p writeOut_haploidSamples_ms_nsam_80_screened_1_time_popnCoordsFile.txt

## Removing populations from the ms flag file.
## The popnCoordsFile has been updated to remove the previous focal populations that are now specified as ghost populations after editing.

And then we will plot the new landscape configuration to see how it looks (Note - now we are plotting the ms flag file that has been screened twice - i.e. has ‘screened_2_times’ in its file name):

msLandscape::msLandscape_networkPlotter(msFlagFileName = "~/msLandscape-master/msLandscape_toolboxScripts/writeOut_haploidSamples_ms_nsam_80_screened_2_times_msFlagFile.txt", addPopnLabels = TRUE)

Good, the ring and the the line of ghost populations have now been removed like we wanted (Note: the graph has also flipped back to its original orientation).

Changing the sampling of focal populations

We are pretty happy now with the overall shape of the landscape, but we still want to do two more things:

We want to remove the ghost populations #16 and #27 to clean up the edges of the landscape
We want to change the number of individuals that are sampled from some of the current focal populations.

We already know how to accomplish our first goal, and the good news is we can use the same population screening file and the same screening script to accomplish our second goal at the same time.

To do this, we start by creating another plain text file of populations to remove:

16
27

OK. Now we want to change the number of individuals sampled from population #9 from its current value of 10 individuals (the default) to 35 individuals. We do this by adding a line with two entries to the file to edit that we started above. These entries can be separated by a space or a tab. On this line, the first entry is the population number and the second entry is the new number of individuals we want sampled from that population, so in this case the line will be:

9 35

And we add that to our text file with populations to edit:

16
27
9 35

A note about this:

Entries for populations to delete and entries for populations to change the sampling can be interspersed in this file and they do not have to be in a particular order. For clarity and for the script to run correctly, there should be only one entry (line) per population (either deleting that population or changing the number of individuals sampled).

Changing the number of individuals sampled can also be used to turn a previous focal population into a ghost population by setting its new sampling to 0. Let’s suppose we do not want population #25 to be a focal population, but we still want it simulated as a ghost on the landscape with all of its migration connections. We will use the line:

25 0

in the editing file to turn it into a ghost population instead.

Let us also change the sampling of population #18 to 22 individuals.

Our final editing file then looks like this:

We then save this file as ‘msLandscapeDemo_populationsToRemove3.txt’

Let us run it (using the ‘…screened_2_times…’ ms flag file and population coordinates file with our new ‘msLandscapeDemo_populationsToRemove3.txt’ file) -

./msLandscape_cleanup_ms_flagFile.py -f writeOut_haploidSamples_ms_nsam_80_screened_2_times_msFlagFile.txt -e ./data/msLandscapeDemo_populationsToRemove3.txt -p writeOut_haploidSamples_ms_nsam_80_screened_2_times_popnCoordsFile.txt

## Removing populations from the ms flag file.
## The popnCoordsFile has been updated to remove the previous focal populations that are now specified as ghost populations after editing.

And then let’s plot the new landscape configuration to see how it looks (Note - now we are now plotting the ms flag file that has been screened three times):

msLandscape::msLandscape_networkPlotter(msFlagFileName = "~/msLandscape-master/msLandscape_toolboxScripts/writeOut_haploidSamples_ms_nsam_107_screened_3_times_msFlagFile.txt", addPopnLabels = TRUE)

Great! We can see two things immediately:

The size of the green dot marking focal populations #9 and #18 has changed in proportion to the new number of individuals sampled from each.
Current population #24 (previous population #25) is now a ghost population (has a small gray dot marking its location as opposed to a green dot) as we wanted.

Because we changed a previous focal population into a ghost population here, this is also updated in the population coordinates file (‘writeOut_haploidSamples_ms_nsam_107_screened_3_times_popnCoordsFile’), which now has seven rows (one for each remaining focal population) as we can see:

Identifying missing migration connections

As mentioned above, when there is only one migration connection specified between a pair of populations (uni-directional connection), instead of the usual two (bi-directional), then the network plotter represents each uni-directional connection with a pink line instad of with the gray line of a bi-directional connection. Uni-directional connections are usually a result of manually removing migration connections in the ms flag file but only deleting one of the two migration connections between each population pair. Note: uni-directional connections do not cause ms to fail, they just are not usually desired in the flag file.

Let’s look at this on a larger landscape using the included ms flag file: ‘msLandscape_missingConnections_haploidSamples_ms_nsam_160_msFlagFile.txt’

This file is located in the msLandscape_toolboxScripts/data directory of the msLandscape GitHub repository.

msLandscape::msLandscape_networkPlotter(msFlagFileName = "~/msLandscape-master/msLandscape_toolboxScripts/data/msLandscape_missingConnections_haploidSamples_ms_nsam_160_msFlagFile.txt",addPopnLabels = FALSE)

With the population labels turned off, we can easily see that there are two pairwise connections that are only specified by a single migration connection (denoted by the pink lines).

Now that we know there are missing migration connections, let us turn the population labelling on and see which populations have these uni-directional connections:

msLandscape::msLandscape_networkPlotter(msFlagFileName = "~/msLandscape-master/msLandscape_toolboxScripts/data/msLandscape_missingConnections_haploidSamples_ms_nsam_160_msFlagFile.txt",addPopnLabels = TRUE)

OK. Now we can see the uni-directional connections connect:

Populations #23 and #35
Populations #107 and #116

(Note: The labels show up a bit cramped on this small plot; when working with the graph interactively in R, you can make the plot window larger and that makes the labels clearer).

Let us see how these uni-directional connections look in the ms flag file that the network plot represents.

In the current flag file, these appear as:

... -m 35 23 3.0 ...

and

... -m 107 116 3.0 ...

Where each ‘-m’ flag represents one direction of the connection from population #35 to population #23 (top) and from population #107 to population #116 (bottom). These are the only times these pairs of population numbers occur after a ‘-m’ flag in this file. The ‘3.0’ is the specified migration rate for that connection, and the ‘…’ just implies that there are other contents in the file.

This is in contrast with a bi-directional migration connection, for example between populations #10 and #18:

... -m 10 18 3.0 -m 18 10 3.0 ...

Now we can see how these bi-directional connections are set up - they specify migration both from population #10 to population #18 as well as from population #18 to population #10. To fix the uni-directional migration connections, we would manually add the following two ‘-m’ flags to the flag file:

... -m 23 35 3.0 ...

and

... -m 116 107 3.0 ...

Saving plots directly to file

By default, the plots created by the msLandscape_networkPlotter function print to the screen. However it is also possible to save them directly to a file (in pdf format).

Let’s do that here:

msLandscape::msLandscape_networkPlotter(msFlagFileName = "~/msLandscape-master/msLandscape_toolboxScripts/data/msLandscape_missingConnections_haploidSamples_ms_nsam_160_msFlagFile.txt",addPopnLabels = FALSE, savePlotToFile = TRUE, outputFileName = "~/msLandscape-master/msLandscape_toolboxScripts/msLandscape_networkPlotter_autoSavedPlot.pdf", plotWidth = 6, plotHeight = 6)

## quartz_off_screen 
##                 2

This makes a plot that is six inches wide and six inches tall, and saves it to the file (or path + file) that we specified. (We do not worry about that quartz_off_screen message.)

Running the ms simulations

OK, now we will return to our ‘sculpted’ landscape and will use it to create the coalescent simulations with ms. msLandscape includes a small wrapper script around ms that enables the generation of multiple, independent simulations from the same ms flag file. Each of these independent simulations will return slightly different results due to the stochastic nature of coalescent simulations, but each is consistent with the landscape features specified by the ms flag file. In order to ensure each of these multiple simulations is independent, Python is used within the wrapper script to generate random number seeds that are used for each ms invocation.

The arguments to the msLandscape_ms_multipleSimulationWrapper.sh script are as follows (and are also printed by running the script with no input):

Usage: bash msLandscape_ms_multipleSimulationWrapper.sh <nsam> <howmany> <msFlagFile> <numItersToRun> <outputFileStem>

Where -

<nsam> is directly passed to ms and is the number of chromosomes to simulate. This is the number that appears after ‘ms_nsam_’ in the ms flag file name that we made above (e.g. our ms flag file is ‘writeOut_haploidSamples_ms_nsam_107_screened_3_times_msFlagFile.txt’ and therefore the <nsam> value is 107). This cannot be changed for any given ms flag file because it represents the specified sampling of chromosomes that is specified after the -I flag in the ms flag file.
<howmany> is directly passed to ms and is the number of markers (loci) to simulate. This can be any desired value.
<msFlagFile> is the name of our ms flag file, in this case ‘writeOut_haploidSamples_ms_nsam_107_screened_3_times_msFlagFile.txt’
<numItersToRun> is the number of independent simulation iterations that we want run with this ms flag file. This can be any desired value.
<outputFileStem> is the user entered file name prefix that will be used to construct the output file for each independent ms simulation. Files are named <outputFileStem>_Iter_##.msout where <outputFileStem> is replaced with its user entered value and ## is replaced by the iteration number for the given simulation output.

Note: The arguments must appear in exactly this order (separated by spaces) for the script to work correctly.

Now we can put this all together to create the simulations. Running these simulations is simple as long as ms is on our $PATH:

Note - the code block below does not run in the .Rmd file

$ bash msLandscape_ms_multipleSimulationWrapper.sh 107 100 writeOut_haploidSamples_ms_nsam_107_screened_3_times_msFlagFile.txt 3 msLandscape_msTrialSimulations
Output name is: msLandscape_msTrialSimulations_Iter_1.msout
Output name is: msLandscape_msTrialSimulations_Iter_2.msout
Output name is: msLandscape_msTrialSimulations_Iter_3.msout

That’s it! We have just made the landscape simulations from our ‘sculpted’ landscape in the ms flag file. Because we specified three independent simulations to be run (each with 100 markers), if we look in our directory after running this command, we will see the three new .msout files that contain the ms simulation results:

msLandscape_msTrialSimulations_Iter_1.msout
msLandscape_msTrialSimulations_Iter_2.msout
msLandscape_msTrialSimulations_Iter_3.msout

These ms simulation results can then be used for any downstream analysis, including conversion to the data formats required for:

smartPCA followed by unPC
EEMS
SpaceMix

That is accomplished by other scripts in the msLandscape toolbox and are demonstrated in the ‘msLandscape data conversion’ tutorial.

Limitations to what this workflow can do automatically

Although this work flow is versatile and should streamline building and proofing landscape-scale ms flag files, there are some important limitations to what it can do automatically:

In cases where the migration rate is varied across the landscape (e.g. migration barriers like the scenario we tested from Figure 2B), changes to the migration rate between each pair of affected populations need to be made manually. After manual editing (which can include the deletion of migration connections), the plotting script will automatically plot the revised landscape with the thickness of the line connecting each population pair being proportional to the specified migration rate. If only one of the two required -m flag entries for each population pair has been accidentally deleted during editing, that connection will be plotted as a pink line.
If additional flags are desired in the ms flag file in (e.g. the flags necessary to simulate long distance migration), then those need to be added manually (they can be anywhere before the -I flag (e.g. a -t flag or after the ‘0.0’ entry following the -I flag population sampling entries - see the next bullet). Both the plotter script and the screening script disregard any additional flags, meaning the screening script keeps any additional flags in its screened output and the plotter script only plots the landscape configuration as normal (it cannot plot long distance migration).
In the case of any manual editing, the ‘0.0’ entry in the ms flag file that immediately follows all -I flag population sampling entries must remain in that position. This enables correct parsing of other optional, manually added flags in addition to -m flags, like -es and -ej flags that simulate long distance migration.
Although the screening script can also change the number of individuals sampled from ghost populations, which therefore turns them into new focal populations, it is not smart enough to add the locations of these newly created focal populations to the population coordinates file. The coordinates file is important for analysis using un-PC, EEMS, or SpaceMix after running the ms simulations. If focal populations are newly created in this way, their coordinates will need to be manually added to this file, and the coordinates must be given in order within the file so that the coordinates for the top left population occurs first in the file, and the coordinates for the bottom right population occurs last in the file, and the coordinates are given in the order they occur on the landscape from left to right and top to bottom.
TIP If the plotter script fails to run on an ms flag file after manual editing, try running it through the screening script using only the -f option (i.e. requesting no edits). The screening script takes the ms flag file apart and re-assembles it in a standard way, which may resolve the problem.