# 7. Guide to individual tools

## 7.1. Options common to multiple tools

Several tools share the following parameters:

• Input, Output

Shape layers for input and output

• Grade separation

Select fields for grade separation. If given, links will only be deemed connected if
1. they share the same endpoints (x,y,z)
2. they share the same grade separation value at those endpoints

Grade separation can also be used to emulate multilayer/multimode networks. For example, use values of 0,1,2… for grade separation on roads, 10,11,12… for railways, etc.
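
The connectivity rule above can be sketched in a few lines of Python. This is an illustration only, not sDNA's implementation: the dictionary keys (`id`, `start`, `end`, `start_gs`, `end_gs`) are hypothetical, and coordinates are compared exactly, ignoring any tolerance.

```python
from collections import defaultdict

def connected_ends(links):
    """Group link ends that share both a position and a grade separation value.

    Each link is a dict with hypothetical keys: 'id', 'start' and 'end'
    (x, y, z tuples), and 'start_gs' / 'end_gs' (grade separation values).
    """
    junctions = defaultdict(list)
    for link in links:
        # An end joins a junction only if position AND grade value both match.
        junctions[(link["start"], link["start_gs"])].append((link["id"], "start"))
        junctions[(link["end"], link["end_gs"])].append((link["id"], "end"))
    # Keep only the keys shared by two or more link ends.
    return {key: ends for key, ends in junctions.items() if len(ends) > 1}

links = [
    {"id": "road_a", "start": (0, 0, 0), "end": (10, 0, 0), "start_gs": 0, "end_gs": 0},
    {"id": "road_b", "start": (10, 0, 0), "end": (20, 0, 0), "start_gs": 0, "end_gs": 0},
    # A railway starting at the same point, but with grade value 10.
    {"id": "rail_c", "start": (10, 0, 0), "end": (10, 10, 0), "start_gs": 10, "end_gs": 0},
]
result = connected_ends(links)
```

Here the two roads meet at (10, 0, 0), while the railway stays disconnected from them because its grade separation value differs.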

• Analysis Metric

Select metric for analysis: Euclidean, angular, custom, other (see Distance Metric for details).

Custom metrics require selection of a custom metric data field.

Some metrics provided are preset forms of hybrid metric designed for specific network types (pedestrian, vehicle, cycle, public transport). You can inspect the message output of sDNA to discover exactly what formula they use. Some presets include user variables, which have a default value but can be changed in advanced config; see preset metric variables.

• Weighting, Origin weight, Destination weight

Select Weighting (link, length or polyline) and the data to use. If origweightformula or destweightformula are supplied in advanced config, they override origin and destination weight.

• Radii

Enter one or more radii to use as radius of analysis (see Radius for details). By default these are Euclidean and expressed in the spatial units of the data, which should usually be metres if your data is projected correctly. If entering several, separate them by commas. Radius ‘n’ can be used to specify global analysis, limited only by the size of the network (but computationally intensive). Example: 100, 200, 400, 800, 1600, n
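
To make the radius list format concrete, here is a minimal Python sketch of how such a string could be read. The function name `parse_radii` is hypothetical, and representing the global radius n as infinity is an assumption made for illustration; sDNA's own parsing may differ.

```python
import math

def parse_radii(text):
    """Parse a comma-separated radius list such as '100, 200, n'."""
    radii = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray commas
        if token.lower() == "n":
            radii.append(math.inf)  # assumption: global radius as infinity
        else:
            radii.append(float(token))
    return radii

radii = parse_radii("100, 200, 400, 800, 1600, n")
```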

• Continuous space

Select whether continuous space analysis is used.

If this mode is chosen, measures for each radius exclude links contained in the next smallest radius. This allows multivariate analysis across radii while keeping cross-correlation of measures to a minimum.

• Disable lines

Allows discarding of certain polylines from the analysis. This is useful for testing multiple design options, or for switching on and off parts of the network such as cycle paths.

This parameter takes an expression which, if it evaluates to a nonzero result for a given polyline, will disable that polyline. In the simplest case, this can be a field name such as walk_net (which would disable all lines for which the walk_net field is nonzero) or a combination such as walk_net||cycle_net (which would disable both the walking and cycling network - useful for a vehicle analysis).
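
The effect of an expression such as walk_net||cycle_net can be mimicked in Python as below. This is a sketch of the logic only, not of sDNA's expression parser; the helper name `is_disabled` is hypothetical, and treating a missing field as zero is an assumption.

```python
def is_disabled(polyline, field_names):
    """Mimic 'a||b||...': disabled if any listed field is nonzero."""
    return any(polyline.get(name, 0) != 0 for name in field_names)

polylines = [
    {"id": 1, "walk_net": 1, "cycle_net": 0},  # footpath
    {"id": 2, "walk_net": 0, "cycle_net": 1},  # cycle path
    {"id": 3, "walk_net": 0, "cycle_net": 0},  # ordinary road
]
# Keep only the lines that survive a vehicle analysis.
kept = [p["id"] for p in polylines if not is_disabled(p, ["walk_net", "cycle_net"])]
```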

• Origin Destination Matrix file

Optionally allows weights to be determined by an Origin-Destination (OD) matrix. The file must be formatted correctly; see Creating a zone table or matrix file. All geodesic and destination weights are replaced by values read from the matrix. The matrix is defined between sets of zones; polylines must contain fields to indicate their zone.

In the case of sDNA Integral, Two phase betweenness is disabled, because use of a two phase model for determining geodesic and destination weights conflicts with use of an OD matrix to determine these.

## 7.2. Individual tool details

### Preparation

#### Prepare network

Prepares spatial networks for analysis by checking and optionally repairing various kinds of error.

Note that the functions offered by sDNA Prepare are only a small subset of those needed for preparing networks. A good understanding of Network Preparation is needed, and other (free) tools can complement sDNA Prepare.

The errors fixed by sDNA Prepare are:

• endpoint near misses (the XY and Z tolerances specify how close two endpoints must be to count as a near miss)
• duplicate lines
• traffic islands (requires a traffic island field set to 0 for no island and 1 for island). Traffic island lines are straightened; if doing so creates duplicate lines then these are removed.
• split links. Note that fixing split links is no longer necessary as of sDNA 3.0, so this is not done by default.
• isolated systems

Optionally, numeric data can be preserved through a prepare operation by providing the desired field names, separated by commas, to the parameters Absolute data to preserve and Unit length data to preserve.

#### Individual Line Measures

Outputs connectivity, bearing, euclidean, angular and hybrid metrics for individual polylines.

This tool can be useful for checking and debugging spatial networks. In particular, connectivity output can reveal geometry errors.

### Analysis

#### Integral Analysis

sDNA Integral is the core analysis tool of sDNA. It computes several flow, accessibility, severance and efficiency measures on networks. Full details of the analysis are given in Analysis: what the results mean and Analysis: full specification.

Integral allows output of various groups of measures to be switched on and off.

#### Specific Origin Accessibility Maps

Outputs accessibility maps for specific origins, including metric between each origin-destination, Euclidean path length and absolute diversion (difference between Euclidean path length and crow flight path length, similar to circuity, notated here as ‘Div’).
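
As a worked illustration of absolute diversion, the sketch below computes path length minus crow flight distance for a path given as a list of vertices. It is a simplification: the function names are hypothetical, and sDNA measures between links on the network rather than between arbitrary vertex lists.

```python
import math

def path_length(points):
    """Sum of straight segment lengths along a path of (x, y) vertices."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def diversion(points):
    """Absolute diversion: path length minus crow flight distance."""
    crow_flight = math.dist(points[0], points[-1])
    return path_length(points) - crow_flight

# An L-shaped route: 3 units east, then 4 units north.
div = diversion([(0, 0), (3, 0), (3, 4)])
```

For this route the path length is 7 and the crow flight distance is 5, so the diversion is 2.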

The accessibility map tool also allows a list of origin polyline IDs to be supplied (separated by commas). Leave this parameter blank to output maps for all origins.

If outputting “maps” for multiple origins, these will be output in the same feature class as overlapping polylines. It may be necessary to split the result by origin link ID in order to display results correctly.

#### Integral from OD Matrix (assignment model)

A simplified version of sDNA Integral geared towards use of an external Origin Destination matrix. Note that several other tools (including Integral) allow Origin Destination matrix input as well.

The file must be formatted correctly; see Creating a zone table or matrix file. All geodesic and destination weights are replaced by values read from the matrix. The matrix is defined between sets of zones; polylines must contain text fields to indicate their zone.

#### Skim Matrix

Skim Matrix outputs a table of inter-zonal mean distance (as defined by whichever sDNA Metric is chosen), allowing high spatial resolution sDNA models of accessibility to be fed into existing zone-based transport models.

### Geometries

The geometry tools output individual geometries used in an integral analysis. These may be useful either for visualization, or for exporting to external analysis tools. For example, you could join geodesics to a pollution dataset to estimate exposure to pollution along everyday travel routes.

#### Convex Hulls

Outputs the convex hulls of network radii used in Integral Analysis.

The convex hulls tool also allows a list of origin polyline IDs to be supplied (separated by commas). Leave this parameter blank to output hulls for all origins.

#### Geodesics

Outputs the geodesics (shortest paths) used by Integral Analysis.

The geodesics tool also allows a list of origin and destination polyline IDs to be supplied (separated by commas). Leave the origin or destination parameter blank to output geodesics for all origins or destinations. (Caution: this can produce a very large amount of data).

#### Network Radii

Outputs the network radii used in Integral Analysis.

The network radii tool also allows a list of origin polyline IDs to be supplied (separated by commas). Leave this parameter blank to output radii for all origins.

### Calibration

sDNA Learn and Predict provide a way to calibrate sDNA outputs against measured variables (flows, house prices, etc). Currently they offer bivariate regression with Box-Cox transformation. Multiple predictor variables (the outputs of sDNA) can be tested to see which gives the best cross-validated correlation with the target variable.

#### Learn

sDNA Learn selects the best model for predicting a target variable, then computes GEH and cross-validated $$R^2$$. If an output model file is set, the best model is saved and can be applied to fresh data using sDNA Predict.

Available methods for finding models are:

• Single best variable - performs bivariate regression of target against all variables and picks single predictor with best cross-validated fit
• Multiple variables - regularized multivariate lasso regression
• All variables - regularized multivariate ridge regression (may not use all variables, but will usually use more than lasso regression)

Candidate predictor variables can either be entered as field names separated by commas, or alternatively as a regular expression. The latter follows Python regex syntax. A wildcard is expressed as .*, thus, Bt.* would test all Betweenness variables (which in abbreviated form begin with Bt) for correlation with the target.
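
The following Python sketch shows how a pattern such as Bt.* picks out candidate fields. The field names are made up for illustration, and requiring the pattern to match the whole field name (`fullmatch`) is an assumption; sDNA Learn may apply the pattern differently.

```python
import re

def select_fields(field_names, pattern):
    """Return the field names that match a Python regular expression."""
    regex = re.compile(pattern)
    # Assumption: the pattern must match the entire field name.
    return [name for name in field_names if regex.fullmatch(name)]

fields = ["BtE400", "BtE800", "MADE400", "Lnk"]  # hypothetical field names
betweenness_fields = select_fields(fields, "Bt.*")
```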

Box-Cox transformations can be disabled, and the parameters for cross-validation can be changed.

Weighting lambda weights data points by $$\frac{y^\lambda}{y}$$, where $$y$$ is the target variable. Setting to 1 gives unweighted regression. Setting to around 0.7 can encourage selection of a model with better GEH statistic, when used with traffic count data. Setting to 0 is somewhat analogous to using a log link function to handle Poisson distributed residuals, while preserving the model structure as a linear sum of predictors. Depending on what you read, the literature can treat traffic count data as either normally or Poisson distributed, so something in between the two is probably safest.
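
The weighting formula is easy to check numerically. The sketch below evaluates $$\frac{y^\lambda}{y}$$ for a few target values; the function name `regression_weights` is hypothetical and the target values must be positive.

```python
def regression_weights(y_values, lam):
    """Weight each data point by y**lam / y (y must be positive)."""
    return [y ** lam / y for y in y_values]

counts = [50, 100, 200]  # made-up traffic counts
unweighted = regression_weights(counts, 1)    # every weight equals 1
inverse = regression_weights(counts, 0)       # weights equal 1/y
in_between = regression_weights(counts, 0.7)  # weights equal y**-0.3
```

With lambda below 1, larger counts receive smaller weights, so low-flow links have relatively more influence on the fit.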

Ridge and Lasso regression can cope with multicollinear predictor variables, as is common in spatial network models. The techniques can be interpreted as frequentist (adding a penalty term to prevent overfit); Bayesian (imposing a hyperprior on coefficient values); or a mild form of entropy maximization (that limits itself in the case of overspecified models). More generally it’s a machine learning technique that is tuned using cross-validation. The $$r^2$$ values reported by learn are always cross-validated, giving a built-in test of effectiveness in making predictions.

Regularization Lambda allows manual input of the minimum and maximum values for regularization parameter $$\lambda$$ in ridge and lasso regression. Enter two values separated by a comma. If this field is left blank, the software attempts to guess a suitable range, but is not always correct. If you are familiar with the theory of regularized regression you may wish to inspect a plot of cross-validated $$r^2$$ against $$\lambda$$ to see what is going on. The data to do this is saved with the output model file (if specified), with extension .regcurve.csv.

#### Predict

Predict takes an output model file from sDNA Learn, and applies it to fresh data. For example, suppose we wish to calibrate a traffic model, using measured traffic flows at a small number of points on the network.

• First run a Betweenness analysis at a number of radii using Integral Analysis.
• Use a GIS spatial join to join Betweenness variables (the output of Integral) to the measured traffic flows.
• Run Learn on the joined data to select the best variable for predicting flows (where measured).
• Run Predict on the output of Integral to estimate traffic flow for all unmeasured polylines.

## 7.3. Advanced configuration and command line options

sDNA supports a wide variety of options for customizing the analysis beyond what is shown in the user interface. All of these are accessed through the advanced config system.

Advanced config options are specified in a long string with options separated by semicolons (;) like this:

nohull;probroutethreshold=1.2;skipzeroweightorigins


This is an example of an advanced config for sDNA Integral, which means

• Don’t compute convex hull
• Problem route threshold = 1.2
• Skip zero weight origins
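
The structure of an advanced config string can be illustrated with a short Python sketch. This is not sDNA's parser: the function name is hypothetical, flags without a value are stored as `True` by assumption, and the duplicate check merely echoes the error message quoted in this section.

```python
def parse_advanced_config(config):
    """Split 'a;b=1;c' into a dict of options (illustrative only)."""
    options = {}
    for item in config.split(";"):
        item = item.strip()
        if not item:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = item.partition("=")
        key = key.strip()
        if key in options:
            raise ValueError(f"Keyword specified multiple times: {key}")
        options[key] = value.strip() if sep else True
    return options

options = parse_advanced_config("nohull;probroutethreshold=1.2;skipzeroweightorigins")
```

Splitting on the first equals sign matters for options such as zonesums, whose values contain further equals signs.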

When calling sDNA Integral Analysis and Prepare Network from the command line (Using sDNA from the command line), the entire configuration is specified as an advanced config. Therefore, the advanced config options include some which are usually set via the graphical interface. If these options are given as advanced config in the sDNA graphical interface, an error (“Keyword specified multiple times”) will result.

### Advanced config options for sDNA Prepare

| Option | Description |
|---|---|
| startelev= | Name of field to read start elevation from |
| endelev= | Name of field to read end elevation from |
| island= | Name of field to read traffic island information from. Anything other than zero will be treated as traffic island |
| islandfieldstozero= | Specifies additional data fields to set to zero when fixing traffic islands (used for e.g. origin or destination weights) |
| data_unitlength= | Specifies numeric data to be preserved by sDNA prepare (preserves values per unit length, averages when merging links) |
| data_absolute= | Specifies numeric data to be preserved by sDNA prepare (preserves absolute values, sums when merging links) |
| data_text= | Specifies text data to be preserved (merges if identical, concatenates with semicolon otherwise) |
| xytol= | Manual override xy tolerance for fixing endpoint connectivity |
| ztol= | Manual override z tolerance for fixing endpoint connectivity |
| merge_if_identical= | Specifies data fields which can only be merged if identical, i.e. split links will not be fixed if they differ (similar to ‘dissolve’ GIS operation) |

xytol and ztol are manual overrides for tolerance. sDNA, running on geodatabases from the command line or ArcGIS, will read tolerance values from each feature class as appropriate. sDNA running in QGIS or on shapefiles will use a default tolerance of 0, as shapefiles do not store tolerance information; a manual override is necessary to set a tolerance on shapefiles.

### Advanced config options for sDNA Integral and geometry tools

sDNA Convex Hulls, Network Radii, Geodesics and Accessibility Map are all different interfaces applied to sDNA Integral, so will in some cases accept these options as well.

| Option | Default | Description |
|---|---|---|
| startelev= | | Name of field to read start elevation from |
| endelev= | | Name of field to read end elevation from |
| metric= | angular | Metric – angular, euclidean, custom or one of the presets |
| origweight= | | Name of field to read origin weight from |
| destweight= | | Name of field to read destination weight from |
| origweightformula= | | Expression for origin weight (overrides origweight) |
| destweightformula= | | Expression for destination weight (overrides destweight) |
| weight= | | Name of field to read weight from. Applies weight field to both origins and destinations. |
| zonesums= | | Expressions to sum over zones (see zone sums below) |
| lenwt | | Specifies that weight field is per unit length |
| custommetric= | | Specified field name to read custom metric from |
| xytol= | | Manual override xy tolerance for fixing endpoint connectivity. |
| ztol= | | Manual override z tolerance for fixing endpoint connectivity. |
| outputgeodesics | | Output geometry of all pairwise geodesics in analysis (careful – this can create a lot of data) |
| outputdestinations | | Output geometry of all pairwise destinations in analysis (careful – this can create a lot of data). Useful in combination with origins for creating a map of distance/metric from a given origin. |
| outputhulls | | Output geometry of all convex hulls in analysis |
| origins= | | Only compute selected origins (provide feature IDs as comma separated list). Useful in conjunction with outputgeodesics, outputdestinations, outputhulls, outputnetradii. |
| destinations= | | Only compute selected destinations (ditto) |
| nonetdata | | Don’t output any network data (used in conjunction with geometry outputs) |
| pre= | | Prefix text of your choice to output column names |
| post= | | Postfix text of your choice to output column names |
| nobetweenness | | Don’t calculate betweenness (saves a lot of time) |
| nojunctions | | Don’t calculate junction measures (saves time) |
| nohull | | Don’t calculate convex hull measures (saves time) |
| outputsums | | Output sum measures SAD, SCF etc as well as means MAD, MCF etc. |
| probroutes | | Output measures of problem routes – routes which exceed the length of the radius |
| forcecontorigin | | Force origin link to be handled in continuous space, even in a discrete analysis. Prevents odd results on very long links. |
| nqpdn= | 1 | Custom numerator power for NQPD equation |
| nqpdd= | 1 | Custom denominator power for NQPD equation |
| skipzeroweightorigins | | Skips calculation of any output measures for origins with zero weight. Saves a lot of time if many such origins exist. |
| skipzeroweightdestinations | 1 | Zero weight destinations are skipped by default. Note this will exclude them from geometry outputs; if this is not desired behaviour then set skipzeroweightdestinations=0 |
| skiporiginifzero= | | Specified field name. If this field is zero, the origin will be skipped. Allows full customization of skipping origins. |
| skipfraction= | 1 | Set to value n, skips calculation for (n-1)/n origins. Effectively the increment value when looping over origins. |
| skipmod= | 0 | Chooses which origins are calculated if skipfraction is not 1. Effectively the initial value when looping over origins: every skipfractionth origin is computed starting with the skipmodth one. |
| nostrictnetworkcut | | Don’t constrain geodesics to stay within radius. This will create a lot more ‘problem routes’. Only alters behaviour of betweenness measures (not closeness). |
| probrouteaction= | ignore | Take special action for problem routes that exceed the radius by a factor greater than probroutethreshold. Can be set to ignore, discard or reroute. Reroute changes geodesic to shortest Euclidean path. Only alters betweenness output, not closeness. |
| probroutethreshold= | 1.2 | Threshold over which probrouteaction is taken. Note this does not affect computation of probroutes measures, which report on all routes which exceed the radius length regardless of this setting. |
| outputdecomposableonly | | Output only measures which are decomposable i.e. can be summed over different origins (useful for parallelization) |
| linkcentretype= | Angular for angular analysis, Euclidean otherwise | Override link centre types – angular or Euclidean |
| lineformula= | | Formula for line metric in hybrid analysis (see below) |
| juncformula= | 0 | Formula for junction turn metric in hybrid analysis (see below) |
| bidir | | Output betweenness for each direction separately |
| oneway= | | Specified field name to read one way data from (see Interpretation of one way data below) |
| vertoneway= | | Specified field name to read vertical one way data from (see Interpretation of one way data below) |
| oversample= | 1 | Number of times to run the analysis; results given are the mean of all runs. Useful for sampling hybrid metrics with random components. |
| odmatrix | | Read OD matrix from input tables (a 2d table must be present) |
| zonedist= | euc | Set expression to determine how zone weights are distributed over links in each zone, or 0 to skip distribution (all lines receive entire zone weight) |
| intermediates= | | Set expression for intermediate link filter. Geodesics are discarded unless they pass through link where expression is nonzero. |
| disable= | | Set expression to switch off links (links switched off when expression evaluates nonzero) |
| outputskim | | Output skim matrix file |
| skimorigzone | | Origin zone field (must be text) for skim matrix |
| skimdestzone | | Destination zone field (must be text) for skim matrix |
| skimzone | | Skim matrix zone field for both origin and destination (sets both skimorigzone and skimdestzone) |
| datatokeep= | | List of field names for data to copy to output |
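
The interaction of skipfraction and skipmod can be pictured as a strided loop over origins. The Python sketch below is one reading of the description in the table, not sDNA's code: the function name is hypothetical and zero-based counting of origins is an assumption.

```python
def origins_to_compute(origin_ids, skipfraction=1, skipmod=0):
    """Every skipfraction-th origin, starting with the skipmod-th one."""
    # Assumption: origins are counted from zero.
    return origin_ids[skipmod::skipfraction]

all_origins = list(range(10))
sample = origins_to_compute(all_origins, skipfraction=3, skipmod=1)

# Running once per skipmod value covers every origin exactly once,
# which is one way to split an analysis into independent jobs.
jobs = [origins_to_compute(all_origins, 3, m) for m in range(3)]
```

Under these assumptions, a run with skipfraction=3 and skipmod=1 on ten origins computes origins 1, 4 and 7.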

### Preset metric variables

A number of preset metrics are provided. These are special cases of hybrid metrics, sometimes with a fairly complex formula. To inspect the formula for a given metric, run Individual Line Measures with the metric selected, and inspect the message output where the full formula will be shown.

The CYCLE_ROUNDTRIP metric, as the name implies, measures a round trip to take account of hills in both directions.

Certain variables within the preset metric formulae can be changed by assigning to them in advanced config. To date, the list is:

| Metric | Variable | Default | Meaning |
|---|---|---|---|
| Cycle | aadtfield | aadt | Name of data field containing annual average daily vehicle traffic estimate |
| Cycle | t | 0.04 | Tendency of cyclists to avoid vehicle traffic |
| Cycle | a | 0.3 | Tendency of cyclists to avoid angular change |
| Cycle | s | 0.5 | Tendency of cyclists to avoid slope |

### Interpretation of one way data

One way data is interpreted as follows:

• 0 – traversal allowed in both directions (so long as vertoneway allows this too)
• positive number – forward traversal only
• negative number – backward traversal only

Forwards/backwards are taken with respect to the direction in which the link is drawn in the network (ordering of points in the data).

Vertical one way data is interpreted as follows:

• 0 – traversal allowed in both directions (so long as oneway allows this too)
• positive number – upward traversal only
• negative number – downward traversal only

Upward/downward are deduced by measuring the endpoints of the link only. In the event that these have the same elevation/height and this leads to ambiguity, sDNA will print an error message and exit.

If conflicting oneway and vertoneway data are provided, sDNA will print an error message and exit. Note that if either field is zero, the other is permitted to override it without conflict.
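
The rules above can be combined into a single function, sketched here in Python. This is an interpretation of the text rather than sDNA's implementation: the function name, argument names and error messages are hypothetical, and raising an exception stands in for sDNA printing an error and exiting.

```python
def allowed_directions(oneway, vertoneway, start_z, end_z):
    """Return the allowed traversal directions as a set of strings."""
    both = {"forward", "backward"}
    if oneway == 0:
        horizontal = both
    else:
        horizontal = {"forward"} if oneway > 0 else {"backward"}

    if vertoneway == 0:
        vertical = both
    else:
        if start_z == end_z:
            # Up and down cannot be told apart on a level link.
            raise ValueError("ambiguous vertoneway on a level link")
        forward_is_up = end_z > start_z
        wants_up = vertoneway > 0
        vertical = {"forward"} if forward_is_up == wants_up else {"backward"}

    allowed = horizontal & vertical
    if not allowed:
        raise ValueError("conflicting oneway and vertoneway data")
    return allowed

# A link drawn uphill (0 m to 5 m) that may only be traversed downward.
downhill_only = allowed_directions(0, -1, 0.0, 5.0)
```

Because a zero in either field allows both directions, the set intersection lets the other field decide without conflict.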

### Creating a zone table or matrix file

sDNA can read custom zone data, that is, data attached to zones rather than individual lines in the network. This can come from

• one-dimensional zone tables: provide the zone files to sDNA’s inputs, and then reference the variables in expressions in the same way as you would use network data. This performs a function similar to a database join, to link zonal data to individual polylines. See Zone Data and Zone Sums.
• a custom origin-destination (OD) matrix: provide sDNA with a two-dimensional table and it will override all other weights

One dimensional tables can be provided in list format, and two dimensional tables can be provided in list or matrix format. The list format allows for sparse data, that is, data need not be given for all zones, and is assumed to be zero where not given.

All tables must be saved in CSV (comma separated) format.

#### 1d table in list format

| list | 1 | | |
|---|---|---|---|
| zone | houses | jobs | schools |
| westeros | 2000 | 4000 | 4 |
| royston vasey | 1800 | 7 | 1 |
| mordor | 600 | 10000 | 0 |
| narnia | 2100 | 500 | 3 |

A 1d table in list format must have

• list and 1 in the header row
• zone field name and data names in the second row. The network must contain a text field with name matching the zone field name (in this case “zone”)
• zones and data below

#### 2d table in list format

| list | 2 | |
|---|---|---|
| zone | zone | flow |
| shire | rivendell | 4 |
| rivendell | mordor | 6 |
| mordor | shire | 2 |
| gondor | mordor | 10000 |
| mordor | gondor | 5000 |

A 2d table in list format must have

• list and 2 in the header row
• origin and destination zone field names followed by data names in the second row. The network must contain a text field with name matching the zone field names. In this case, the origin and destination zones are drawn from the same set so these are both named “zone”. Different sets of zones for origin and destination are supported however (e.g. for use with census residential and workplace zones).
• zones and data below

#### 2d table in matrix format

| matrix | zone | zone | | |
|---|---|---|---|---|
| flow | shire | rivendell | gondor | mordor |
| shire | 0 | 4 | 0 | 0 |
| rivendell | 0 | 0 | 0 | 6 |
| gondor | 0 | 0 | 0 | 10000 |
| mordor | 2 | 0 | 5000 | 0 |

This table shows the same data as the 2d table in list format above

A 2d table in matrix format must have

• matrix in the first line followed by the origin zone field name then the destination zone field name. The network must contain a text field with name matching the zone field names. In this case, the origin and destination zones are drawn from the same set so these are both named “zone”. Different sets of zones for origin and destination are supported however (as with 2d list tables above).
• the second row starts with the name of the data, then the name of each destination zone
• the left column from row 3 downwards contains the name of each origin zone
• the remainder of the matrix contains the data
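
To show how the list and matrix layouts relate, the Python sketch below converts a 2d list table into matrix form using the standard `csv` module. It is an illustration under assumptions: the function name is hypothetical, zones are ordered by first appearance (the order in the file is presumed not to matter), and whether sDNA expects short rows to be padded with empty cells is not covered here.

```python
import csv
import io

LIST_TABLE = """list,2
zone,zone,flow
shire,rivendell,4
rivendell,mordor,6
mordor,shire,2
gondor,mordor,10000
mordor,gondor,5000
"""

def list_to_matrix(list_csv_text):
    """Convert a 2d table in list format to matrix format (as CSV text)."""
    rows = [row for row in csv.reader(io.StringIO(list_csv_text)) if row]
    if rows[0][:2] != ["list", "2"]:
        raise ValueError("expected a 2d table in list format")
    orig_field, dest_field, data_name = rows[1][:3]
    flows = {(o, d): v for o, d, v in (r[:3] for r in rows[2:])}
    # Origin and destination zone sets may differ, so collect them separately.
    orig_zones = list(dict.fromkeys(o for o, _ in flows))
    dest_zones = list(dict.fromkeys(d for _, d in flows))

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["matrix", orig_field, dest_field])
    writer.writerow([data_name] + dest_zones)
    for o in orig_zones:
        # Pairs missing from the sparse list are written as zero.
        writer.writerow([o] + [flows.get((o, d), "0") for d in dest_zones])
    return out.getvalue()

matrix_text = list_to_matrix(LIST_TABLE)
```

The missing pairs become explicit zeros, reflecting the statement that sparse list data is assumed to be zero where not given.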

### Zone Data and Zone Sums

Zone data is accessed in the same way as field data in expressions, described below. The following computes origin weights by multiplying zoneweight (taken from a table provided to sDNA) with the euclidean length of each polyline:

origweightformula = zoneweight * euc


Using zonesums in sDNA Integral’s advanced config, it is possible to sum data over network zones. This is useful for controlling how zonal weights are distributed over polylines. The example config that follows illustrates three things:

• How to use multiple zone schemes. It assumes two zonal variables are provided; residential_weight is defined for each zone in res_zone, and retail_weight is defined for each zone in ret_zone. In each case, the zone file will specify the fieldname which tells sDNA which zone each polyline belongs to.

• How to compute multiple zone sums. These are specified in the form sum1=expr1@zonefield1,sum2=expr2@zonefield2,…. The config creates two zone sum fields, eucsum which is the total Euclidean length in each residential zone (res_zone), and linksum which is the total link count in each retail zone (ret_zone).

• How to distribute zonal weights over the zones. origweightformula distributes the residential_weight zonal variable evenly over network length in each residential zone, while destweightformula distributes the retail_weight zonal variable evenly over links in each retail zone. (Note that polylines may constitute partial links, hence the use of FULLlf.) The names origzonefield and destzonefield in the config appear to stand for the zone fields of the two schemes, res_zone and ret_zone in this example:

zonesums = eucsum=euc@origzonefield, linksum=FULLlf@destzonefield; origweightformula = residential_weight*proportion(euc,eucsum); destweightformula = retail_weight*proportion(FULLlf,linksum)


The proportion(x,y) function divides x by y, which is useful to work out what proportion of zone weight is found in the current link. It correctly handles the special cases where the zone contains no weight.
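
A small Python sketch may help to show what proportion does when distributing a zone weight by length. It follows the behaviour described in the expression reference (0 when x=y=0, error when x>0 and y=0); raising for any nonzero x with y=0 is an assumption, and the lengths and weight are made-up numbers.

```python
def proportion(x, y):
    """Divide x by y, handling zones that contain no weight."""
    if y == 0:
        if x == 0:
            return 0.0
        # Assumption: any nonzero x over a zero sum is treated as an error.
        raise ZeroDivisionError("nonzero share of a zone whose sum is zero")
    return x / y

# Two polylines in one residential zone, with Euclidean lengths 25 and 75.
lengths = [25.0, 75.0]
eucsum = sum(lengths)        # plays the role of the zone sum
residential_weight = 200.0   # zonal weight to distribute

orig_weights = [residential_weight * proportion(euc, eucsum) for euc in lengths]
```

The resulting weights add up to the zonal weight, with each polyline receiving a share in proportion to its length.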

Note that origweightformula and destweightformula are always computed in discrete, rather than continuous space.

### Expression reference

| Operator (in reverse order of precedence) | Name | Example | Meaning |
|---|---|---|---|
| , | Statement separator | a,b,c | Do a, then b, then output c |
| = | Assignment | _a=b | Set _a equal to b |
| ?: | If-then-else | p?x:y | If p then x else y |
| && | Logical and | a&&b | a and b |
| \|\| | Logical or | a\|\|b | a or b |
| <= | Less than or equal | a<=b | a is less than or equal to b |
| >= | Greater than or equal | a>=b | a is greater than or equal to b |
| != | Not equal | a!=b | a is not equal to b |
| == | Equal | a==b | a is equal to b |
| > | Greater than | a>b | a is greater than b |
| < | Less than | a<b | a is less than b |
| + | Addition | a+b | a plus b |
| - | Subtraction | a-b | a minus b |
| * | Multiplication | a*b | a times b |
| / | Division | a/b | a divided by b |
| ^ | Exponentiation | a^b | a to the power of b |
| () | Parentheses | 2*(x+1) | add one to x then multiply by 2 |

Builtin functions

| Function | Meaning |
|---|---|
| sin(x), cos(x), tan(x), asin(x), acos(x), atan(x), sinh(x), cosh(x), tanh(x), asinh(x), acosh(x), atanh(x) | Trigonometric functions of x (in radians) |
| log2(x) | Logarithm of x base 2 |
| log10(x), log(x) | Logarithm of x base 10 |
| ln(x) | Logarithm of x base e |
| exp(x) | e to the power of x |
| sqrt(x) | Square root of x |
| sign(x) | -1 if x is negative, else 1 |
| rint(x) | x rounded to nearest integer |
| abs(x) | Absolute value of x |
| min(a,b,c,…), max(a,b,c,…), sum(a,b,c,…), avg(a,b,c,…) | Minimum, maximum, sum and average of all arguments |
| trunc(x,l,u) | Truncate x to the range [l,u] (including endpoints) |
| randnorm(m,s) | Random number drawn from normal distribution with mean m and standard deviation s |
| randuni(l,u) | Random number drawn from uniform distribution on range [l,u] |
| proportion(x,y) | Divides x by y. Returns 0 if x=y=0 and stops calculation with error if x>0 and y=0. Useful for distributing zonal weights over links. |

Random numbers are generated from Mersenne Twister mt19937 algorithm. “Mersenne Twister: A 623-dimensionally equidistributed uniform pseudo-random number generator”, Makoto Matsumoto and Takuji Nishimura, ACM Transactions on Modeling and Computer Simulation: Special Issue on Uniform Random Number Generation, Vol. 8, No. 1, January 1998, pp. 3-30.
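
For readers who want to experiment with these functions outside sDNA, rough Python equivalents of trunc, randnorm and randuni are sketched below. Python's `random` module is also based on the Mersenne Twister, but the seeding and sampling methods are not claimed to match sDNA's, so the numbers produced will differ. The explicit `rng` argument is an addition made here for reproducibility.

```python
import random

def trunc(x, lower, upper):
    """Clamp x to the closed range [lower, upper]."""
    return max(lower, min(x, upper))

def randnorm(rng, mean, sd):
    """Draw from a normal distribution with the given mean and sd."""
    return rng.gauss(mean, sd)

def randuni(rng, lower, upper):
    """Draw from a uniform distribution between lower and upper."""
    return rng.uniform(lower, upper)

rng = random.Random(42)  # fixed seed so that a run can be repeated
# A noisy length-based cost, kept within plus or minus 20% of the true length.
euc = 100.0
noisy_cost = trunc(euc * randnorm(rng, 1.0, 0.1), 0.8 * euc, 1.2 * euc)
```

A hybrid metric with a random component like this is the kind of case where the oversample option, which averages several runs, becomes useful.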

Constants

| Name | Meaning |
|---|---|
| inf | infinity |
| pi | pi |

Variables

| Name | Meaning |
|---|---|
| ang | Angular change |
| euc | Euclidean distance |
| hg | Height gain |
| hl | Height loss |
| FULLang | Angular change for entire polyline |
| FULLeuc | Euclidean distance for entire polyline |
| FULLhg | Height gain for entire polyline |
| FULLhl | Height loss for entire polyline |
| FULLlf | Link fraction for entire polyline |
| _x (where x is any name) | Temporary variable (initialized to 0) |
| x (where x is any name not used as function or other value) | Field data on polyline |

Any variable can be assigned to with =, but the new value will only affect the current formula being evaluated (assigning to ang will not change the shape of the network, for example!). It is recommended to use only temporary variables of the form _x as targets for assignment.