Tough SAT Project

Henry Yuen and Joseph Bebel

News

Introduction

Welcome to ToughSAT. This site will generate boolean CNF formulas that encode "difficult" problems. For example, you can generate a boolean CNF formula whose satisfying assignment encodes non-trivial factors of an input number. Finding the satisfying assignment for this boolean formula is at least as hard as factoring the input number, a problem which is believed to be difficult for classical computers.

We hope this site is both useful as an education tool (for students learning about NP-completeness and complexity theory, say), or for testing SAT solvers, or for computer science research. Please feel free to contact either of us with comments, questions, suggestions.

The boolean CNF formulas are output in the DIMACS format, which is the standard format that many SAT solvers use. Examples of SAT solvers: MiniSat, Sat4J, RSat, UBCSat, and many others.

Types of Problems

Getting started with MiniSat

Choose a Problem Type, set your parameters, and click "Generate". Save the file to the same folder as your MiniSat executable (for Windows users, we have provided a zip file containing MiniSat and the requisite cygwin libraries here). In the command line, execute the following command:

MiniSat.exe <downloaded SAT file> <output file>

The output file will contain the satisfying assignment to the boolean formula (if it exists).

Other

Author and current maintainer: Joseph Bebel (bebel AT usc DOT edu)

Author: Henry Yuen (hyuen AT DOMAIN.EDU except replace DOMAIN.EDU with usc.edu :)

This site is powered by Python. Questions? Comments? E-mail us!

 

Problem Type:

Factoring (Karatsuba)

Usage: This will generate a CNF formula whose satisfying assignment encodes two non-trivial factors of the product Factor1*Factor2 (if there exist any).

For a source of big primes, consult this website.

This method uses Karatsuba's algorithm for multiplication to reduce the CNF size relative to the classical or schoolbook algorithm.

Include factors in DIMACS file

Discussion

If Factor1 and Factor2 are n-bit numbers, then the product Factor1*Factor2 is (at most) a 2n-bit number. In order to create a formula that encodes this instance of the factoring problem, observe that factorization is the inverse of multiplication. First, we construct the boolean multiplication function as a boolean circuit of AND, OR, and NOT gates using Karatsuba's algorithm.

Second, we implement the boolean indicator function I: {0,1}^2n \rightarrow {0,1} where

I(x) = \begin{cases} 1 \text{if }x = \text{Factor1}\cdot\text{Factor2} \\ 0 \text{otherwise} \end{cases}

again as a boolean circuit. By concatenating the output of the first circuit with the second circuit, we obtain a boolean circuit implementing the function I \circ f: {0,1}^n \times {0,1}^n \rightarrow {0,1} which evaluates to 1 only on input two factors of Factor1*Factor2. As a computational problem, this is an instance of CircuitSAT (a NP-complete language).

To convert this CircuitSAT instance to a CNF-SAT instance, we use an algorithm similar to one described here (http://www.scottaaronson.com/democritus/lec6.html). Our algorithm allows more than 3 literals per clause, which reduces the total number of clauses and variables in the final CNF-SAT instance. The algorithm can also construct a standard 3SAT instance by restricting the number of literals per clause to 3, at the expense of additional variables and clauses.

Finally, we add on extra clauses that prohibit the constant 1 from being a factor, because we are only interested in non-trivial factorizations of Factor1*Factor2.

Factoring

Usage: This will generate a CNF formula whose satisfying assignment encodes two non-trivial factors of the product Factor1*Factor2 (if there exist any).

For a source of big primes, consult this website.

3CNF - every clause will have strictly 3 literals.

Discussion

If Factor1 and Factor2 are n-bit numbers, then the product Factor1*Factor2 is (at most) a 2n-bit number. In order to create a formula that encodes this instance of the factoring problem, observe that factorization is the inverse of multiplication. First, we construct the boolean multiplication function as a boolean circuit of AND, OR, and NOT gates using the standard, grade-school multiplication algorithm.

Second, we implement the boolean indicator function I: {0,1}^2n \rightarrow {0,1} where

I(x) = \begin{cases} 1 \text{if }x = \text{Factor1}\cdot\text{Factor2} \\ 0 \text{otherwise} \end{cases}

again as a boolean circuit. By concatenating the output of the first circuit with the second circuit, we obtain a boolean circuit implementing the function I \circ f: {0,1}^n \times {0,1}^n \rightarrow {0,1} which evaluates to 1 only on input two factors of Factor1*Factor2. As a computational problem, this is an instance of CircuitSAT (a NP-complete language).

To convert this CircuitSAT instance to a CNF-SAT instance, we use an algorithm similar to one described here (http://www.scottaaronson.com/democritus/lec6.html). Our algorithm allows more than 3 literals per clause, which reduces the total number of clauses and variables in the final CNF-SAT instance. The algorithm can also construct a standard 3SAT instance by restricting the number of literals per clause to 3, at the expense of additional variables and clauses.

Finally, we add on extra clauses that prohibit the constant 1 from being a factor, because we are only interested in non-trivial factorizations of Factor1*Factor2.

A similar multiplier-based Factoring-to-3SAT reduction using the Chinese Remainder theorem, rather than a grade-school multiplication algorithm, is in the works. The multiplication algorithm is described here (http://arxiv.org/PS_cache/cs/pdf/9809/9809117v2.pdf)

Factoring 2

Usage: This will generate a CNF formula whose satisfying assignment encodes two non-trivial factors of the Target (if there exist any)..

3CNF - every clause will have strictly 3 literals.

 

Random Subset Sum

Usage: This will generate a random subset sum instance (i.e.the ground set is of size Set Size and whose elements are random positive integers less than Max Number, and the target number is the sum of a random subset of the ground set), and encode the instance as a boolean CNF formula. A satisfying assignment to the formula will encode a subset of the ground set that adds up to the target number.

3CNF - every clause will have strictly 3 literals (otherwise can have up to 4).

Manual Subset Sum

Usage: You can enter your own ground set and target number here.

3CNF - every clause will have strictly 3 literals (otherwise can have up to 4).

Random k-SAT

Usage: This will generate a random k-SAT boolean formula with the specified number of variables and clauses

3CNF - every clause will have strictly 3 literals (otherwise can have up to k).

Commentary

For a rough estimate of the transition from an easy-to-satisfy instance to hard-to-satisfy instance of a k-CNF boolean formula, we can use the following formula: c = log(2k/(2k - 1))-1, where k is the number of literals per clause, c is the critical clause-to-variables ratio, and the logarithm is taken base 2. A given k-CNF instance with a clause-to-variables ratio less than this critical value is very likely to be satisfiable, and one with a ratio higher than this critical value is unlikely to be satisfiable.

Note, the clause-to-variables ratio is simply a heuristic, and not a hard indicator of the number of satisfying assignments.

Here are two plots comparing the difficulty of 4-CNF instances (as measured by the time it takes to solve these instances on MiniSat) against the clause-to-variables ratio:

In both plots there's clearly a peak around the clause-to-variables ratio = 10 (the above estimate give critical threshold c ~ 10.7). However, the vertical axis on the right figure exhibits a much larger range - this is due to the instances on the right hand side were random 4-CNF formulas converted to 3-CNF (so the clause-to-variables ratio reflects that of the converted 3-CNF formulas), whereas the instances on the left-hand side were 4-CNF formulas that remained 4-CNF.

 

Cocktail SAT

Usage: This will generate a composite boolean formula that has several sub-problems embedded in it.

Factoring
Random Subset Sum
Literals per clause:
3CNF - every clause will have strictly 3 literals.