Introduction
Welcome to ToughSAT. This site will generate boolean CNF formulas that encode "difficult" problems. For example, you can generate a boolean CNF formula whose satisfying assignment encodes non-trivial factors of an input number. Finding the satisfying assignment for this boolean formula is at least as hard as factoring the input number, a problem which is believed to be difficult for classical computers.
We hope this site is both useful as an education tool (for students learning about NP-completeness and complexity theory, say), or for testing SAT solvers, or for computer science research. Please feel free to contact either of us with comments, questions, suggestions.
The boolean CNF formulas are output in the DIMACS format, which is the standard format that many SAT solvers use. Examples of SAT solvers: MiniSat, Sat4J, RSat, UBCSat, and many others.
Types of Problems
Getting started with MiniSat
Choose a Problem Type, set your parameters, and click "Generate". Save the file to the same folder as your MiniSat executable (for Windows users, we have provided a zip file containing MiniSat and the requisite cygwin libraries here). In the command line, execute the following command:
The output file will contain the satisfying assignment to the boolean formula (if it exists).
Other
This site is powered by Python. Questions? Comments? E-mail us at either hyuen AT DOMAIN.EDU or bebel AT DOMAIN.EDU - except replace DOMAIN.EDU with usc.edu :)
Problem Type:
Factoring
Usage: This will generate a CNF formula whose satisfying assignment encodes two non-trivial factors of the product Factor1*Factor2 (if there exist any).
For a source of big primes, consult this website.
Discussion
If Factor1 and Factor2 are n-bit numbers, then the product
Factor1*Factor2 is (at most) a 2n-bit number. In order to create a
formula that encodes this instance of the factoring problem, observe
that factorization is the inverse of multiplication. First, we
construct the boolean multiplication function
as a boolean circuit of AND, OR, and NOT
gates using the standard, grade-school multiplication algorithm.
Second, we implement the boolean indicator function
where

again as a boolean circuit. By concatenating the output of the first
circuit with the second circuit, we obtain a boolean circuit
implementing the function
which evaluates to 1 only on input two factors of
Factor1*Factor2. As a computational problem, this is an instance of
CircuitSAT (a NP-complete language).
To convert this CircuitSAT instance to a CNF-SAT instance, we use an algorithm similar to one described here (http://www.scottaaronson.com/democritus/lec6.html). Our algorithm allows more than 3 literals per clause, which reduces the total number of clauses and variables in the final CNF-SAT instance. The algorithm can also construct a standard 3SAT instance by restricting the number of literals per clause to 3, at the expense of additional variables and clauses.
Finally, we add on extra clauses that prohibit the constant 1 from being a factor, because we are only interested in non-trivial factorizations of Factor1*Factor2.
A similar multiplier-based Factoring-to-3SAT reduction using the Chinese Remainder theorem, rather than a grade-school multiplication algorithm, is in the works. The multiplication algorithm is described here (http://arxiv.org/PS_cache/cs/pdf/9809/9809117v2.pdf)
Factoring 2
Usage: This will generate a CNF formula whose satisfying assignment encodes two non-trivial factors of the Target (if there exist any)..
Random Subset Sum
Usage: This will generate a random subset sum instance (i.e.the ground set is of size Set Size and whose elements are random positive integers less than Max Number, and the target number is the sum of a random subset of the ground set), and encode the instance as a boolean CNF formula. A satisfying assignment to the formula will encode a subset of the ground set that adds up to the target number.
Manual Subset Sum
Random k-SAT
Usage: This will generate a random k-SAT boolean formula with the specified number of variables and clauses
Commentary
For a rough estimate of the transition from an easy-to-satisfy instance to hard-to-satisfy instance of a k-CNF boolean formula, we can use the following formula: c = log(2k/(2k - 1))-1, where k is the number of literals per clause, c is the critical clause-to-variables ratio, and the logarithm is taken base 2. A given k-CNF instance with a clause-to-variables ratio less than this critical value is very likely to be satisfiable, and one with a ratio higher than this critical value is unlikely to be satisfiable.
Note, the clause-to-variables ratio is simply a heuristic, and not a hard indicator of the number of satisfying assignments.
Here are two plots comparing the difficulty of 4-CNF instances (as measured by the time it takes to solve these instances on MiniSat) against the clause-to-variables ratio:
In both plots there's clearly a peak around the clause-to-variables ratio = 10 (the above estimate give critical threshold c ~ 10.7). However, the vertical axis on the right figure exhibits a much larger range - this is due to the instances on the right hand side were random 4-CNF formulas converted to 3-CNF (so the clause-to-variables ratio reflects that of the converted 3-CNF formulas), whereas the instances on the left-hand side were 4-CNF formulas that remained 4-CNF.
Cocktail SAT
Usage: This will generate a composite boolean formula that has several sub-problems embedded in it.