# SSP Solver

*Markov decision processes* (MDPs) are stochastic systems widely used to model situations combining stochastic and nondeterministic behaviour, in which a cost must be optimised to achieve a goal (e.g., an agent interacting with its environment).
An MDP consists of states, actions, and probability distributions over successor states determined by the current state of the system and the action selected.
*Strategies* for MDPs prescribe which action to choose in each state, possibly depending on the states previously visited in the system.
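To make these definitions concrete, here is a minimal sketch (purely illustrative, not SSP Solver's internal representation) of an MDP as nested Python dicts, together with a memoryless strategy, i.e., one that ignores the previously visited states:

```python
import random

# A toy two-state MDP: transitions[state][action] maps each successor
# to its probability. This layout is illustrative only.
transitions = {
    "s": {"a": {"s": 0.5, "t": 0.5}, "b": {"t": 1.0}},
    "t": {"stop": {"t": 1.0}},
}

# A memoryless strategy picks one enabled action per state.
strategy = {"s": "b", "t": "stop"}

def step(state, rng):
    """Sample a successor of `state` under the strategy."""
    dist = transitions[state][strategy[state]]
    successors, probs = zip(*dist.items())
    return rng.choices(successors, weights=probs)[0]

rng = random.Random(0)
print(step("s", rng))  # action "b" leads to "t" with probability 1
```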

SSP Solver is a tool written in `python3` that synthesises strategies for different Stochastic Shortest Path problems in MDPs, following a specification of the system in the form of a `yaml` file.
The algorithms implemented are based on those presented in “Variation on the stochastic shortest path problem”.

## Setup

SSP Solver's dependencies can be installed with pip:
`pip3 install -r requirements.txt`

Note that GLPK (the GNU Linear Programming Kit) is also required.

## System specification

You can encode your systems in a `yaml` file with the following syntax:

```
mdp:
  states:
    # for each state s of the system
    - name: <name of a state s>
      enabled actions:
        # for each enabled action α of the state s
        - name: <name of an enabled action α of s>
          transitions:
            # for each transition s ➞ α ➞ s'
            - target: <name of an α-successor s' of s>
              probability: <probability to go from s to s' by choosing α>
  actions:
    # for each action in the system
    - name: <name of an action α>
      weight: <weight of α> # strictly positive weight
```
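A parsed specification can be sanity-checked before solving. The sketch below assumes the `yaml` file has been loaded into an equivalent Python dict (e.g., with PyYAML's `yaml.safe_load`); the dict literal mirrors the schema above, and the hypothetical `validate` helper checks that every transition distribution sums to 1 and that every action weight is strictly positive:

```python
# A spec mirroring the yaml schema above, as if loaded with yaml.safe_load.
spec = {
    "mdp": {
        "states": [
            {"name": "s",
             "enabled actions": [
                 {"name": "a",
                  "transitions": [
                      {"target": "s", "probability": 0.5},
                      {"target": "t", "probability": 0.5},
                  ]},
             ]},
            {"name": "t",
             "enabled actions": [
                 {"name": "a",
                  "transitions": [{"target": "t", "probability": 1}]},
             ]},
        ],
        "actions": [{"name": "a", "weight": 2}],
    }
}

def validate(spec, eps=1e-9):
    """Check that distributions sum to 1 and weights are strictly positive."""
    for state in spec["mdp"]["states"]:
        for action in state["enabled actions"]:
            total = sum(t["probability"] for t in action["transitions"])
            assert abs(total - 1.0) < eps, f"bad distribution in {state['name']}"
    for action in spec["mdp"]["actions"]:
        assert action["weight"] > 0, f"non-positive weight for {action['name']}"

validate(spec)  # raises AssertionError on an invalid spec
```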

You can find some examples in the `examples` directory.

## Strategy synthesis

The MDPs considered are in a configuration where every action has a strictly positive cost. The solver is able to synthesise strategies optimising

- the probability of reaching a set of target states (reachability problem)
- the expected cost of reaching a set of target states (Stochastic Shortest Path Expectation problem)
- the probability of reaching a set of target states within a bounded cost (Stochastic Shortest Path Percentile problem)

The algorithms synthesising these strategies make use of *linear programming*.

### Reachability problem

The reachability solver synthesises a strategy maximising the probability of reaching a subset of target states.

#### Usage

```
python3 solvers/reachability.py <yaml_file> t1 t2 ... tn
```

where T = {t1, t2, …, tn} is the set of target states.

#### Example

For example, consider our example MDP: we are interested in synthesising a strategy that maximises the probability of reaching the state v. We run

```
python3 solvers/reachability.py examples/mdp2.yaml v
```

The output is the **value** of each state, i.e., the maximal probability of reaching the target states:

```
Optimal solution :
v[s] = 0.0568182 v[t] = 0
v[u] = 0.125 v[v] = 1
v[w] = 0.0454545 v[x] = 0
v[y] = 0
```

Moreover, a representation of the strategy (depicted in red) is generated:
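The solver computes these values with linear programming; as an illustration of the same fixed point, the values can also be approximated by value iteration. The sketch below does this on a hypothetical three-state MDP (not `examples/mdp2.yaml`, whose contents are not shown here): v is fixed to 1 on targets, and every other state takes the maximum, over its enabled actions, of the probability-weighted sum of its successors' values.

```python
# Max reachability probabilities by value iteration on a toy MDP
# (a hypothetical example, not examples/mdp2.yaml).
transitions = {
    "s": {"a": {"s": 0.5, "t": 0.25, "u": 0.25}},
    "t": {"loop": {"t": 1.0}},
    "u": {"loop": {"u": 1.0}},
}
targets = {"t"}

v = {s: (1.0 if s in targets else 0.0) for s in transitions}
for _ in range(200):  # iterate until (approximate) convergence
    for s in transitions:
        if s in targets:
            continue
        v[s] = max(
            sum(p * v[succ] for succ, p in dist.items())
            for dist in transitions[s].values()
        )
print(v["s"])  # approaches 0.5: from s, the runs reaching t and u are balanced
```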

### Stochastic shortest path: Expectation problem

The sspe solver synthesises a strategy minimising the expected cost of reaching a subset of target states.

#### Usage

```
python3 solvers/sspe.py <yaml_file> t1 t2 ... tn
```

where T = {t1, t2, …, tn} is the set of target states.

#### Example

For example, consider our example MDP: we are interested in synthesising a strategy that minimises the expected cost of reaching the state t. We run

```
python3 solvers/sspe.py examples/mdp2.yaml t
```

The output is the **value** of each state, i.e., the minimal expected cost to reach the target states:

```
Optimal solution :
v[s] = 19 v[t] = 0
v[u] = 24 v[v] = inf
v[w] = 24 v[x] = inf
v[y] = inf
```

Moreover, a representation of the strategy (depicted in red) is generated:
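Analogously, the minimal expected costs satisfy a fixed point: v is 0 on targets, infinite where the target is unreachable, and elsewhere every state takes the minimum, over its enabled actions, of the action's weight plus the probability-weighted sum of successor values. A value-iteration sketch on a hypothetical MDP (again, not `examples/mdp2.yaml`):

```python
# Min expected cost to target by value iteration on a toy MDP.
# transitions[state][action] -> successor distribution; weight -> action cost.
transitions = {
    "s": {"a": {"s": 0.5, "t": 0.5}, "b": {"t": 1.0}},
    "t": {"stop": {"t": 1.0}},
}
weight = {"a": 1, "b": 3, "stop": 1}
targets = {"t"}

v = {s: (0.0 if s in targets else float("inf")) for s in transitions}
for _ in range(200):  # iterate until (approximate) convergence
    for s in transitions:
        if s in targets:
            continue
        v[s] = min(
            weight[a] + sum(p * v[succ] for succ, p in dist.items())
            for a, dist in transitions[s].items()
        )
print(v["s"])  # converges to 2: repeating the cheap action a beats action b
```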

### Stochastic shortest path: Percentile problem

The sspp solver synthesises a strategy maximising the probability of reaching a subset of target states from a given state within a fixed cost bound.

#### Usage

```
python3 solvers/sspp.py <yaml_file> s0 l b t1 t2 ... tn
```

where s0 is the initial state, l is the cost threshold, b is the probability threshold, and T = {t1, t2, …, tn} is the set of target states. Thus, the synthesised strategy satisfies Prmax(s0 ⊨ ♢T | cost of paths <= l) >= b. Note that if you don’t want to involve the probability threshold b in the synthesis, just set it to 0.

#### Example

For example, consider the following MDP.

We are interested in synthesising a strategy that maximises the probability of reaching the state t within a cost of 8 from the state s. We run

```
python3 solvers/sspp.py examples/simple_mdp.yaml s 8 0 t
```

The output is the **value** of each state, i.e., the maximal probability of reaching the target states within a cost of 8:

```
v[(s, 0)] = 0.75 v[(t, 3)] = 1
v[(u, 3)] = 0.5 v[(u, 8)] = 0
v[(s, 5)] = 0.5 v[(t, 8)] = 1
v[⊥] = 0
```

Note that each state considered by the strategy is a tuple formed by the current state of the system and the current cost of the path. The state ⊥ represents the situation in which the cost of the current path has exceeded 8. This means that the strategy records the cost accumulated along the execution. A representation of the strategy (depicted in red) is then generated:
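The tuple states shown above suggest the standard unfolding: pair each system state with the cost accumulated so far, send any path whose cost exceeds the threshold to the sink ⊥, and compute maximal reachability probabilities on the unfolded MDP. A sketch of this construction on a hypothetical weighted MDP (not `examples/simple_mdp.yaml`, whose contents are not shown here), using memoised recursion over the (state, cost) pairs:

```python
from functools import lru_cache

# A hypothetical weighted MDP, not examples/simple_mdp.yaml.
transitions = {
    "s": {"a": {"t": 0.5, "u": 0.5}, "b": {"t": 1.0}},
    "t": {"stop": {"t": 1.0}},
    "u": {"back": {"s": 1.0}},
}
weight = {"a": 2, "b": 9, "stop": 1, "back": 3}
targets, bound = {"t"}, 8

@lru_cache(maxsize=None)
def value(state, cost):
    """Max probability of reaching a target with total cost <= bound,
    starting from `state` with `cost` already accumulated."""
    if cost > bound:        # plays the role of the sink state ⊥
        return 0.0
    if state in targets:
        return 1.0
    return max(
        sum(p * value(succ, cost + weight[a]) for succ, p in dist.items())
        for a, dist in transitions[state].items()
    )

print(value("s", 0))  # 0.75: action b alone (cost 9) would exceed the budget
```

The recursion terminates because every action has a strictly positive weight, so the accumulated cost grows along every cycle and eventually exceeds the bound.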