Skip to content

Latest commit

 

History

History
555 lines (400 loc) · 15.8 KB

README.md

File metadata and controls

555 lines (400 loc) · 15.8 KB

Hands on with Sat4j @ SAT+SMT 2019

Here are the files needed to experiment a few features of Sat4j.

Researchers and seats

Can we sit m researchers on m-1 seats?

More precisely, we consider that

  • Each researcher should have a seat
  • Each seat cannot host more than a researcher

CNF encoding

CNF encoding has to be done using the Dimacs format.

Each variable is represented by a number. Each literal is represented by a signed number. A clause is represented by a sequence of signed integer terminated by 0.

$x_1 \lor x_2 \lor \lnot x_3$ is represented in Dimacs by 1 2 -3 0.

It is also mandatory to provide in the header of the file the number of variables and the number of clauses.

p cnf <nbvars> <nbclauses>

To represent the simple formula $(x_1 \lor x_2 \lor \lnot x_3) \land (\lnot x_1 \lor x_2) \land (x_2 \lor x_3) \land \lnot x_2$ with 3 variables and 4 clauses, we can use the following Dimacs file:

p cnf 3 4
1 2 -3 0
-1 2 0
2 3 0
-2

Such files are not meant to be generated by hand, but generated using a script.

See this python's script to generate the CNF in Dimacs format for the researchers and seats problem.

To generate some instances of the problem for various m, we can use the script that way:

$ ./sittingresearchers.py >sitting4.cnf
$ ./sittingresearchers.py 10 >sitting10.cnf
$ ./sittingresearchers.py 20 >sitting20.cnf

Those formulas are tiny for current SAT solvers:

$head -1 sitting*.cnf
==> sitting10.cnf <==
p cnf 90 415

==> sitting20.cnf <==
p cnf 380 3630

==> sitting4.cnf <==
p cnf 12 22

However, the last one is already out of reach for modern SAT solvers without specific counting capabilities.

To run Sat4j on those problems:

$ java -jar org.sat4j.core.jar sitting4.cnf
$ java -jar org.sat4j.core.jar sitting10.cnf

CNF encoding using the opb format

The same encoding can be provided to PB solvers. Each clause is simply translated to the corresponding PB constraint, without coefficient, and a threshold of 1.

The following Python's script does this encoding for you.

$ ./sittingresearcherscnfopb.py >sitting4cnf.opb
$ ./sittingresearcherscnfopb.py 10 >sitting10cnf.opb
$ ./sittingresearcherscnfopb.py 20 >sitting20cnf.opb

To run Sat4j on those problems:

$ java -jar sat4j-pb.jar sitting4cnf.opb
$ java -jar sat4j-pb.jar sitting10cnf.opb
$ java -jar sat4j-pb.jar CuttingPlanes sitting20cnf.opb

You can check that CuttingPlanes proof system as implemented in Sat4j (Generalized Resolution) does not help in this case.

Pseudo-Boolean encoding (with PB constraints)

The pseudo-Boolean encoding can be done using the PB Evaluation format.

It is similar in spirit with the Dimacs format.

The header has to provide the number of variables and the constraints found in the file.

A variable is represented by xI where I is an integer.

Sat4j supports the use of literals, using ~xI notation. This is useful for representing clauses in the OPB format.

Else one has to use $\lnot x_1 = 1 - x_1$ in the OPB context.

A PB constraint has signed coefficients in front of the variables (or literals) in the LHS.

The competition requires the inequality to be >=, but Sat4j supports also <= and =.

The threshold of the constraint is a signed integer.

The constraint is terminated by ;.

So $x_1 \lor x_2 \lor \lnot x_3$ is represented by the PB constraint $x_1 + x_2 + \overline{x_3} \geq 1$. In OPB format, it will be represented by

+1 x_1 +1 x_2 +1 ~x_3 >= 1;

in Sat4j format, or by

+1 x_1 +1 x_2 -1 x_3 >= 0;

in the strict evaluation format.

To represent the simple formula $3x_1 + 2x_2 + \overline{x_3} \geq 3 \land x_1 + x_2 + x_3 <= 1$, Sat4j can read the following opb file:

* variables= 3 *constraints= 2
+3 x_1 +2 x_2 +1 ~x_3 >= 3;
+1 x_1 +1 x_2 +1 x_3 <= 1;

which is equivalent to the following one in the strict PB evaluation format:

* variable= 3 *constraint= 2
+3 x_1 +2 x_2 -1 x_3 >= 2;
-1 x_1 -1 x_2 -1 x_3 >= -1;

See this python's script to generate the PB constraints in Sat4j PB format for the researchers and seats problem.

To generate some instances of the problem for various m, we can use the script that way:

$ ./sittingresearchersopb.py >sitting4.opb
$ ./sittingresearchersopb.py 10 >sitting10.opb
$ ./sittingresearchersopb.py 20 >sitting20.opb

Those formulas are tiny for current PB solvers, and much more compact than the CNF encoding.

$head -1 sitting*.opb
==> sitting10.opb <==
* #variable= 90 #constraint= 19

==> sitting20.opb <==
* #variable= 380 #constraint= 39

==> sitting4.opb <==
* #variable= 12 #constraint= 7

To run Sat4j on those problems:

$ java -jar sat4j-pb.jar sitting4.opb
$ java -jar sat4j-pb.jar sitting10.opb
$ java -jar sat4j-pb.jar CuttingPlanes sitting20.opb

By default, the resolution proof system is used. Only the solver using the CuttingPlanes proof system can solve quickly the problem with m=20.

Retrieving cardinality constraints

Sat4j is able to retrieve cardinality constraints semantically from a CNF input.

You can check that it allows the CuttingPlanes solver to solve quickly the CNF encoding of our problem for m=20.

$ java -jar sat4j-pb.jar DetectCards sitting20.cnf 

When looking at the output of Sat4j, one can see the 3610 clauses have been replaced by 19 cardinality constraints.

c #constraints  20
c constraints type 
c org.sat4j.pb.constraints.pb.OriginalHTClausePB => 20
c 20 constraints processed.
c launching cardinality constraint revelation process
c remaining constraints: 0/3610
c cardinality constraints found (preprocessing): 19
c cardinality search time (preprocessing): 82ms
c found 19 at-most cardinality constraint of degree 1 and size 20
c solver contains 39 constraints

How to produce Sat4j search trace

Sat4j can provide for small examples the trace of the search performed to get the solution.

The trace is provided as a dot file which can then by rendered using graphviz.

The following commands allow to produce the traces for the small m=4 example with the default proof system (Resolution) and the cutting planes proof system (Hooker's generalized resolution).

$ java -jar sat4j-sat.jar -d sitting4res.dot sitting4.cnf
$ java -jar sat4j-sat.jar -d sitting4cp.dot -s CuttingPlanes sitting4.opb

The traces have been rendered as image for your convenience:

Sat4j on the fly visualization and control

To visualize some information about the solver internal state and be able to manually change the settings of the solver on the fly, you can run the following command to launch Sat4j ``remote control'':

$ java -jar sat4j-sat.jar -remote -r sitting20.cnf 

Your turn: sudoku

In the previous exercises, you saw the file format for CNF or OPB files.

Sudoku is a simple puzzle with an n^2xn^2 grid where all the cells should contain one of 1 to n^2 mark, and each row, column, and nxn block should contain all the marks.

So the following constraints have to be expressed:

  • each cell contain exactly one value between 1 and n^2
  • each value appears only once in each row
  • each value appears only once in each column
  • each value appears only once in each nxn block

This puzzle can be expressed with Boolean variables $x_{rcv}$ meaning that in row r, column c, the value is v.

Case 4x4 (toy one)

Here is an example of filled in 4x4 sudoku:

2 3  1 4
1 4  3 2

4 1  2 3
3 2  4 1

We need thus 4x4x4 = 64 variables to model this puzzle.

However, it is probably easier to consider using variables numbered 111 to 444. As such, you will have to declare 444 variables in your OPB file header.

Check that solving the constraints without hints gives you a valid 4x4 sudoku. You can use the following script to interpret and display the solution given by the SAT solver.

$ java -jar sat4j-pb.jar sudoku4.opb | ./decodesudoku4.py 
32 14
14 32

21 43
43 21

Case 9x9 (usual one)

Solve the following sudoku 9x9 problem using either clauses or cardinality constraints:

We will use the following convention to make it easy to check the constraints and be able to interpret the solution of the solver.

Each variable will be numbered 111 to 999 with the following meaning: each number rcv will represent as first digit the row, as second digit the column and third digit the value.

So variable 123 will mean: in row 1, column 2, the value is 3.

You will need to add at the end of the file the hints:

CNF:

111 0
193 0
237 0
242 0
256 0
274 0
288 0
314 0
349 0
353 0
365 0
396 0
423 0
444 0
458 0
472 0
524 0
531 0
546 0
569 0
573 0
636 0
678 0
689 0
715 0
727 0
738 0
754 0
792 0
843 0
887 0
912 0
995 0

OPB:

+1 x111 >= 1;
+1 x193 >= 1;
+1 x237 >= 1;
+1 x242 >= 1;
+1 x256 >= 1;
+1 x274 >= 1;
+1 x288 >= 1;
+1 x314 >= 1;
+1 x349 >= 1;
+1 x353 >= 1;
+1 x365 >= 1;
+1 x396 >= 1;
+1 x423 >= 1;
+1 x444 >= 1;
+1 x458 >= 1;
+1 x472 >= 1;
+1 x524 >= 1;
+1 x531 >= 1;
+1 x546 >= 1;
+1 x569 >= 1;
+1 x573 >= 1;
+1 x636 >= 1;
+1 x678 >= 1;
+1 x689 >= 1;
+1 x715 >= 1;
+1 x727 >= 1;
+1 x738 >= 1;
+1 x754 >= 1;
+1 x792 >= 1;
+1 x843 >= 1;
+1 x887 >= 1;
+1 x912 >= 1;
+1 x995 >= 1;

Do not forget to add those 33 additional constraints in the header of the file.

Once you have generated your CNF or OPB file, you can check if you solve that puzzle by decoding the solution returned by the SAT solver using that script.

Playing with PB, PBO and MAXSAT

Subset sum

Subset sum is an NP-complete problem.

Some examples of subset sum instances can be found here.

Is it possible to find a subset of 15,22,14,26,32,9,16,8 which sums to 53?

Is it possible to find s subset of 267,493,869,961,1000,1153,1246,1598,1766,1922 which sums to 5842?

Note that such problem can be expressed using a single constraint in Sat4j OPB format.

Knapsack problem

Another famous combinatorial problem is the Knapsack problem.

Given some items with a specific weight and specific value, find the way to pack your items in your luggage with a given weight capacity to maximize the value of the items.

You can find some instances of this problem [here)(https://people.sc.fsu.edu/~jburkardt/datasets/knapsack_01/knapsack_01.html).

This is no longer a decision problem but an optimization problem.

The OPB format allows you to define an objective function to minimize. If you have a maximization problem, you just need to minimize the negation of the objective function.

The objective function must be given after the head but before the constraints, using the same format as the LHS of and LPB constraint.

* #variable= 3 #constraint= 1
min: +4 x1 +3 x2 +1 x3;
+1 x1 +1 x2 +1 x3 >= 1;

Given 10 items with the following weights and values:

weights = 23,31,29,44,53,38,63,85,89,82

values  = 92,57,49,68,60,43,67,84,87,72

Which items should you choose for a capacity of 165?

MAXSAT

We got back to our initial Researchers and Seat problem.

Suppose that there are more participants than the available seats. You may still want to assign the available seats to some researchers, even if you cannot satisfy everybody.

In that case, it means that we will violate the constraint ``Each researcher should have a seat". Obviously, we would like to violate that constraint for a minimum number of researchers.

We enter the MAXSAT world!

An unbiased solution using partial MAXSAT

We still want to ensure that each seat is assigned to at most one researcher. This is called a hard constraint.

Let's consider that all the researchers are equal. Thus the constraint asserting that each research should have a seat can be assigned a uniform penalty (say 1). Those constraints become soft.

The WCNF format for expressing MAXSAT problem is quite close to Dimacs one.

The header mentions the format wcnf instead of cnf. It must also contain an additional information: the top weight, representing the weight of the hard constraints.

The clauses are prefixed by a weight:

  • hard clauses are prefixed by the top weight
  • soft clauses are prefixed by any weight smaller than the top clause

In the example below, the top weight is 10. The first two clauses are hard, and the remaining two clauses are soft, with a weight of 1.

p wcnf 3 4 10
10 1 2 -3 0
10 -1 2 0
1 2 3 0
1 -2

It is quite simple to modify the script generating a Dimacs file to generate a Partial MAXSAT formula:

  • the script now takes too parameters, m the number of researchers and n the number of seats. By varying n you can check how many clauses must be violated to assign the available seats;
  • the preamble now starts with p wcnf instead of p cnf;
  • the top weight 10 is used to define hard clauses;
  • the at least one constraint is made soft by adding a weight of 1;
  • the at most one constraints are made hard by adding a weight of 10.

A small script to decode the answer is provided to check the assignment.

$ ./sittingresearchersmaxsat.py 10 7 >maxsat10-7.wcnf
$ java -jar sat4j-maxsat.jar maxsat10-7.wcnf |./whoisseated.py maxsat10-7.wcnf
R3S6 R5S7 R6S2 R7S5 R8S1 R9S3 R10S4

The solver assigned the seats to the researchers 5 to 10, plus researcher 3.

Note that the head of department, researcher 1, is not seated!

A biased solution using weighted partial MAXSAT

If all the constraints are not equally relaxable, it is possible to give them different weights. In that case, we enter the Weighted Partial MAXSAT world.

Let's consider that the order of the numbers assigned to the researchers should be respected, i.e. that R1 should be the first to be seated, then R2, R3 and so on.

We can just assign a different weight to each at least constraint: here $m-i+1$ where i is the number associated to the researcher.

This is exactly what does this updated script.

Relaxing PB constraints with Weighted Boolean Optimization

It is also possible to relax PB constraints: the PB evaluation provides a format for that, called Weighted Boolean Optimization.

The header should be extended with additional information:

  • #soft= the number of soft constraint
  • mincost= the minimal cost of a soft constraint
  • maxcost= the maximal cost of a soft constraint
  • sumcost= the sum of the costs of all soft constraints

A new specific line soft: indicates the threshold of cumulated cost that makes the formula unsat.

Finally, the constraints can be prefixed by a weight, given between square brackets.

In the following example, the second constraint is soft, with a weight of 1.

* variables= 3 *constraints= 2 #soft= 1 mincost= 1 maxcost= 1 sumcost= 1
soft: 10 ;
+3 x_1 +2 x_2 +1 ~x_3 >= 3;
[1] +1 x_1 +1 x_2 +1 x_3 <= 1;

The following script generates an unbiased version of the researchers and seats problem with relaxable constraints as a WBO problem.

That problem can be solved by Sat4j.

$ ./sittingresearcherswbo.py 10 6 >sitting10-6.wbo
$ java -jar sat4j-pb.jar sitting10-6.wbo
$ java -jar sat4j-pb.jar CuttingPlanes sitting10-6.wbo

One can note that in this case, the optimal solution is found quicker when using the Cutting Planes proof system than when using the default resolution proof system.