Glorified Summery: Automated Theorem Provers

Introduction

In mathematics creating or coming up with a proof is a difficult procedure. The proof we do come up with can contain errors and mistakes that need to be checked and verified for. Computerisation of mathematical proofs could potentially speed up the process of generating and verifying the correctness of proofs. Computerisation of mathematics proofs begins by defining a formal language that can represent complex mathematical statements.

The most common such representation is First Order Logic (FOL). First order logic is also called predicate logic or predicate calculus (Wikipedia The Free Encyclopedia, First Order Logic, 2024). First order logic describes a meta-language using which you could describe the actual language you want to use. First order language used constant symbols, variable symbols (there can be free variable and bound variable), function symbols of n-ary, predicates symbols of n-ary, and logical symbols and syntax and semantic rules defining what makes a well formed statement. For example, the mathematical statement of additive inverse would be represented as

\[\forall x . \exists y . x + y = 0\]

The statement is read as for all x there exists y such that their sum is zero. Here, \(\forall\), \(\exists\) are logical symbols, \(0\) is a constant symbols, \(x\) and \(y\) are bound variable symbols, \(+\) is a binary function in infix notation, and \(=\) can be thought of as a predicate. First order logic requires an additional information regarding the proving machinism, that describes how to construct new valid well formed mathematical statements from existing ones. This proving machinism is in the form of rules of inference (Wikipedia The Free Encyclopedia, Rules of Inference, 2024). An example for a rule of inference commonly used in first order logic is Modus Ponen

\[(p) \land (p \to q) \implies q\]

Given that; if \(p\) then \(q\), and we know that \(p\) is true, then it follows that \(q\) must be true.

First order logic may not be required to represent some simple statements where we could only use Zeroth Order Logic also called propositional logic (Wikipedia The Free Encyclopedia, Propositional calculus, 2024). Similarly, first order logic might be inadequate to represent more complex mathematical statements where Second Order Logic or Higher Order Logic may be using (Wikipedia The Free Encyclopedia, Higher Order Logic, 2024). For example, there is no way in first order logic to identify the set of all cubes and tetrahedrons. But the existence of this set can be asserted in second-order logic as

\[\exists P. \forall x . (P(x) \leftrightarrow Cube(x) \lor Tet(x))\]

Another representation is Type Theory. A Type Theory is the formal presentation of a specific type system (Wikipedia The Free Encyclopedia, Type Theory, 2024). Type theory is the academic study of type systems. Some type theories serve as alternatives to set theory as a foundation of mathematics. Two influential type theories that have been proposed as foundations are:

Typed λ-calculus of Alonzo Church (Helmut Brandl, 2024)
Intuitionistic type theory of Per Martin-Löf (Hall, J. G., ed. Intuitionistic Type Theory)

Given a formal definition for what constitutes as a valid well formed mathematical statement with syntax and semantics to represent it, along with formal definition to construct proofs, we could develop computer programs to verify and generate proofs.

Examples of Automated Theorem Provers that use Formal Logic are

Vampire (Vampire, 2024)
E (The E Theorem Prover, 2024)
Z3 Theorem Prover (GitHub, The Z3 Theorem Prover, 2024)

Examples of Automated Theorem Provers that use Type Theory are

LEAN (Programming Language and Theorem Prover - LEAN, 2024)
Coq (The Coq Proof Assistant, 2024)

LEAN and Coq are not fully automated provers, but can be used to verify proofs and works as an computer assistant for mathematicians using it.

Some of the Methods for Proving

Proof by Refutation

The negation of the statement trying to be proved is appended to the set of axioms along with necessary assumptions, then checking if the resulting set of formulas is unsatisfiable. If it is, then the statement being proved is a logical consequence of the axioms and the assumptions. A proof of unsatisfiability of a negation of formula is sometimes called a refutation of this formula, so such proofs are commonly referred to as proofs by refutation.

The system appends the negation of the statement trying to be proved into the set of axioms, and applied rules of inferences on the axioms till it derives false. If we successfully end up deriving false, then the statement is indeed a consequence of the axioms.

Vampire using proof by Refutation algorithm in its provers. (Kovács, L. & Voronkov, A, 2013)

Resolution is a rule of inference leading to a refutation-complete theorem-proving technique for sentences in zeroth and first order logic. For propositional logic, systematically applying the resolution rule acts as a decision procedure for formula unsatisfiability, solving the (complement of the) Boolean satisfiability problem. (Wikipedia The Free Encyclopedia, Resolution (logic), 2024)

Resolution rule of inference is stated as

\[(p \lor q) \land (\lnot p \lor q) \implies q\]

SLD resolution (Selective Linear Definite clause resolution) is the basic inference rule used in logic programming. It is a refinement of resolution, which is both sound and refutation complete for Horn clauses. (Wikipedia The Free Encyclopedia, Selective Linear Definite clause resolution, 2024)

Model Elimination

The mnemonic label “model elimination” is used because the procedure seeks the truth-functionally contradictory statement associated with the procedures of creating new statements using the axioms by developing statements already “false” under some of the possible truth assignments over its atomic components. The procedure strives to form new statements which increase the percentage of the truth assignments yielding false for the statement. (Loveland, D.W. ,1968)

Method of Analytic Tableau

An analytic tableau is a tree structure computed for a logical formula, having at each node a sub-formula of the original formula to be proved or refuted. Computation constructs this tree and uses it to prove or refute the whole formula. The tableau method can also determine the satisfiability of finite sets of formulas of various logics. It is the most popular proof procedure for modal logics.

A method of truth trees contains a fixed set of rules for producing trees from a given logical formula, or set of logical formulas. Those trees will have more formulas at each branch, and in some cases, a branch can come to contain both a formula and its negation, which is to say, a contradiction. In that case, the branch is said to close. If every branch in a tree closes, the tree itself is said to close. In virtue of the rules for construction of tableaux, a closed tree is a proof that the original formula, or set of formulas, used to construct it was itself self-contradictory, and therefore false. Conversely, a tableau can also prove that a logical formula is tautologous: if a formula is tautologous, its negation is a contradiction, so a tableau built from its negation will close (Colin Howson, 1997).

Quantifier Elimination

The idea behind quantifier elimination is to rewrite the statements containing quantifiers into statements that do not contain any quantifiers. It is easier to deal with statements without quantifiers, proving such statements is an easier task. Informally, a quantified statement “\(\forall x\) such that \(…\)” can be viewed as a question “When is there an \(x\) such that \(…\) ?”, and the statement without quantifiers can be viewed as the answer to that question (Wikipedia The Free Encyclopedia, Quantifier elimination, 2024). For example, the statement “quadratic polynomial has a real root if and only if its discriminant is non-negative” can be represented as

\[\exists x \in R . (a \neq 0 \land ax^2 +bx + c = 0)\]

using quantifiers, and

\[a \neq 0 \land b^2 -4ac > 0\]

without any quantifiers.

Similar idea can be applied when statements are represented in higher order logic. Statements in higher order logic can be transformed or reduced into simpler statements of first order logic. (Manfred Kerber. 1991)

SMT Solvers

There is substantial overlap between first-order automated theorem provers and SMT (# Satisfiability Modulo Theories) solvers and Boolean satisfiability problem. Generally, automated theorem provers focus on supporting full first-order logic with quantifiers, whereas SMT solvers focus more on supporting various theories (interpreted predicate symbols). Automated Theorem Provers excel at problems with lots of quantifiers, whereas SMT solvers do well on large problems without quantifiers. (Wikipedia The Free Encyclopedia, Satisfiability modulo theories, 2024)

Discoveries using Automated Theorem Provers

Robbins Conjecture

Robbins algebra is an algebra containing a single binary operation, usually denoted by \(\lor\) (or), and a single unary operation usually denoted by \(\lnot\) (not) satisfying the following axioms:

for all \(a\), \(b\) & \(c\)

Associativity: \(a \lor (b \lor c) = (a \lor b) \lor c\)
Commutativity: \(a \lor b = b \lor a\)
Robbins equation: \(\lnot (\lnot (a \lor b) \lor \lnot (a \lor \lnot b)) = a\)

For many years, it was conjectured, but unproven, that all Robbins algebras are Boolean algebras. (Robbins Algebras Are Boolean, 2024) William McCune proved the conjecture in 1996, using the automated theorem prover EQP (EQP: Equational Theorem Prover, 2024).

Four Colour Theorem

The four colour theorem, or the four colour map theorem, states that no more than four colours are required to colour the regions of any map so that no two adjacent regions have the same colour.

The theorem is a stronger version of the five color theorem, which can be shown using a significantly simpler argument. Although the weaker five color theorem was proven already in the 1800s, the four color theorem resisted until 1976 when it was proven by Kenneth Appel and Wolfgang Haken.

The Appel-Haken proof proceeds by analyzing a very large number of reducible configurations. This was improved upon in 1997 by Robertson, Sanders, Seymour, and Thomas who have managed to decrease the number of such configurations to 633 – still an extremely long case analysis. In 2005, the theorem was verified by Georges Gonthier using a general-purpose theorem-proving software. (K. Appel. W. Haken, 1977)

Formalisation of Fermat’s Last Theorem

Fermat’s Last Theorem states that no three positive integers \(a\), \(b\), and \(c\) satisfy the equation \(a^n + b^n = c^n\) for any integer value of \(n\) greater than 2. The cases \(n = 1\) and \(n = 2\) have been known since antiquity to have infinitely many solutions. The proposition was first stated as a theorem by Pierre de Fermat around 1637 in the margin of a copy of Arithmetica. After 358 years of effort by mathematicians, the first successful proof was released in 1994 by Andrew Wiles and formally published in 1995.

The Fermat’s Last Theorem Project is a project to formalise the entire proof of Fermat’s Last Theorem. The project is an open project, where anyone could contribute. The LEAN programming language which is also a proof assist-er is used for this process of formalisation. The project is hosted on GitHub. The proof depends on multiple areas of mathematics and the current database of theorems in LEAN is inadequate. The project intends to formalise multiple branches of mathematics over next couple of years. The project will be funded by Engineering & Physical Sciences Research Council for the first 5 years. (The Fermat’s Last Theorem Project, Lean community blog, 2024)

Limitation

Definitions

Completeness: A formal theory (that is rules of inference and starting set of axioms) is said to be complete if we can prove or prove the negation of any statement that can be expressed in that system.

Consistent: A formal theory is said to be consistent if it is free of contradictions. That is, the theory should not be able to prove a statement and also it negation. In such a system, it is possible to prove any and all statements to be true statement.

Decidable: A formal theory is said to be decidable if there exists an algorithm that takes in a statement in the system as its input and returns true or false depending on if the statement is true under the theory or false.

David Hilbert wanted to come up with a formal theory of mathematics that is complete, consistent and decidable, and could express entirety of mathematics. (Stanford Encyclopedia of Philosophy Archive, Hilbert’s Program, 2024)

Gödel’s incompleteness theorems by Kurt Gödel, showed that any system complex enough to do arithmetic on natural numbers cannot be complete, consistent and decidable at the same time. (Kurt Gödel, 1931)

The first incompleteness theorem states that no consistent system of axioms whose theorems can be listed by an effective procedure (i.e. an algorithm) is capable of proving all truths about the arithmetic of natural numbers. For any such consistent formal system, there will always be statements about natural numbers that are true, but that are unprovable within the system.

The second incompleteness theorem, shows that no system can demonstrate its own consistency, i.e. a system cannot prove that it is itself free of contradictions.

Alan Turing showed that having a decidable algorithm is equivalent to solving the halting problem. (Alan Turing, 1936)

Example Usage

Solving a exercise problem from Discrete Mathematics Textbook

Problem Statement: Determine whether this argument is valid: If Superman were able and willing to prevent evil, he would do so. If Superman were unable to prevent evil, he would be impotent; if he were unwilling to prevent evil, he would be malevolent. Superman does not prevent evil. If Superman exists, he is neither impotent nor malevolent. Therefore, Superman does not exist. (Kenneth Rosen, 2011)

The problem can be solved using zeroth order logic (propositional logic). We first need to represent the question in notations of formal logic.

Let the following be constant symbols representing an atomic statement from the problem statement

\(a\): “Superman is able to prevent evil”
\(b\): “Superman is willing to prevent evil”
\(c\): “Superman is impotent”
\(d\): “Superman is malevolent”
\(e\): “Superman prevents evil”
\(f\): “Superman exists”

Now, the assumptions or the axioms can be written as follows:

\((a \land b) \to e\): “If Superman were able and willing to prevent evil, he would do so”
\(\lnot e \to c\): “If Superman were unable to prevent evil, he would be impotent”
\(\lnot b \to d\): “if he were unwilling to prevent evil, he would be malevolent”
\(\lnot e\): “Superman does not prevent evil.”
\(f \to (\lnot c \land \lnot d)\): “If Superman exists, he is neither impotent nor malevolent”

And the conclusion would be:
\(\lnot f\): “Superman does not exist”

Solving using Vampire

Vampire uses TPTP (TPTP, 2024) syntax to represent mathematical statements. The above mathematical statements will be written as:

fof(id1, axiom,
(a & b) => e).

fof(id2, axiom,
~e => c).

fof(id3, axiom,
~b => d).

fof(id4, axiom,
~e).

fof(id5, axiom,
f => (~c & ~d)).

fof(res, conjecture, ~f).

And running Vampire on the above

...
% Refutation found. Thanks to Tanya!
% SZS status Theorem for superman
% SZS output start Proof for superman
2. ~e => c [input]
4. ~e [input]
5. f => (~d & ~c) [input]
6. ~f [input]
7. ~~f [negated conjecture 6]
8. f [flattening 7]
11. f => ~c [pure predicate removal 5]
12. c | e [ennf transformation 2]
13. ~c | ~f [ennf transformation 11]
14. c | e [cnf transformation 12]
15. ~e [cnf transformation 4]
16. ~c | ~f [cnf transformation 13]
17. f [cnf transformation 8]
19. 1 <=> e [avatar definition]
23. 2 <=> c [avatar definition]
26. 1 | 2 [avatar split clause 14,23,19]
27. ~1 [avatar split clause 15,19]
29. 3 <=> f [avatar definition]
32. ~3 | ~2 [avatar split clause 16,23,29]
33. 3 [avatar split clause 17,29]
34. $false [avatar sat refutation 26,27,32,33]
% SZS output end Proof for superman
...

Solving using logic

Logic is a python library written by Vipul Cariappa (GitHub, Logic, 2024), that uses rules of inference to construct the proof. Logic presently only works with zeroth order logic, therefore can be applied in this case. But the algorithm used is not complete.

from logic import Proposition, IMPLY, prove


# Creating propositional variables
a = Proposition("a") # "Superman is able to prevent evil"
b = Proposition("b") # "Superman is willing to prevent evil"
c = Proposition("c") # "Superman is impotent"
d = Proposition("d") # "Superman is malevolent"
e = Proposition("e") # "Superman prevents evil"
f = Proposition("f") # "Superman exists"

# encoding assumptions
assumptions = [
    IMPLY(a & b, e),
    IMPLY(~e, c),
    IMPLY(~b, d),
    ~e,
    IMPLY(f, ~c & ~d),
]

# encoding conclusion
conclusion = ~f

# printing assumptions
print("Assumptions:")
for i in assumptions:
    print(i)

# printing conclusion
print(f"Conclusion: {conclusion}")

# generating proof
proof, truth = prove(assumptions, conclusion)
assert truth == True # checking if it could be proved

# printing proof
print(proof)

Output:

Assumptions:
(((a ∧ b)) → (e))
((¬ (e)) → (c))
((¬ (b)) → (d))
¬ (e)
((f) → ((¬ (c) ∧ ¬ (d))))
Conclusion: ¬ (f)
                           c                 Modus Ponens {((¬ (e)) → (c)), ¬ (e)}    
                   ¬ (¬ (c))                   Not Of Not {c}                         
     (¬ (¬ (c)) ∨ ¬ (¬ (d)))                     Addition {¬ (¬ (c)), ¬ (¬ (d))}      
         ¬ ((¬ (c) ∧ ¬ (d)))              De'Morgen's Law {(¬ (¬ (c)) ∨ ¬ (¬ (d)))}   
                       ¬ (f)                Modus Tollens {((f) → ((¬ (c) ∧ ¬ (d)))), ¬ ((¬ (c) ∧ ¬ (d)))}

Using Predicates in Vampire

Encoding arithmetic over natural number using Peano Axioms (Wikipedia The Free Encyclopedia, Peano Axioms, 2024) to prove \(2 = 1 + 1\)

fof(
    identity_add,
    axiom,
    ! [X] : add(X, zero) = X
).

fof(
    identity_mul,
    axiom,
    ! [X] : mul(X, zero) = zero
).

fof(
    op_add,
    axiom,
    ! [X, Y] : add(X, succ(Y)) = succ(add(X, Y))
).

fof(
    op_mul,
    axiom,
    ! [X, Y] : mul(X, succ(Y)) = add(X, mul(X, Y))
).

fof(
    successor,
    axiom,
    ! [X, Y] : (succ(X) = succ(Y)) => (X = Y)
).

fof(
    successor_neq_zero,
    axiom,
    ! [X] : ~(succ(X) = zero)
).

fof(
    two_eqs_one_plus_one,
    conjecture,
    succ(succ(zero)) = add(succ(zero), succ(zero))
).

Output

...
% Refutation found. Thanks to Tanya!
% SZS status Theorem for peano
% SZS output start Proof for peano
1. ! [X0] : add(X0,zero) = X0 [input]
3. ! [X0,X1] : add(X0,succ(X1)) = succ(add(X0,X1)) [input]
7. succ(succ(zero)) = add(succ(zero),succ(zero)) [input]
8. ~succ(succ(zero)) = add(succ(zero),succ(zero)) [negated conjecture 7]
11. succ(succ(zero)) != add(succ(zero),succ(zero)) [flattening 8]
16. add(X0,zero) = X0 [cnf transformation 1]
18. add(X0,succ(X1)) = succ(add(X0,X1)) [cnf transformation 3]
22. succ(succ(zero)) != add(succ(zero),succ(zero)) [cnf transformation 11]
31. succ(succ(zero)) != succ(add(succ(zero),zero)) [superposition 22,18]
32. succ(succ(zero)) != succ(succ(zero)) [forward demodulation 31,16]
33. $false [trivial inequality removal 32]
% SZS output end Proof for peano
...

Using LEAN to prove sum of two even numbers is even

LEAN is based on Type Theory. LEAN does not provide a proof, but is a interactive proof assistant, that can detect errors in proof and can verify the proof. If the LEAN compiler does not give any error it means that the proof is a valid proof.

The proof of sum of two even numbers is even in LEAN looks as follows:

def is_even (a : Nat) := ∃ b, a = 2 * b

theorem even_plus_even (h1 : is_even a) (h2 : is_even b) : is_even (a + b) :=
  Exists.elim h1 (fun w1 (hw1 : a = 2 * w1) =>
  Exists.elim h2 (fun w2 (hw2 : b = 2 * w2) =>
    Exists.intro (w1 + w2)
      (calc a + b
        _ = 2 * w1 + 2 * w2 := by rw [hw1, hw2]
        _ = 2 * (w1 + w2)   := by rw [Nat.mul_add])))

Conclusion

Automated theorem provers have significantly advanced the landscape of mathematical proof verification and generation. By leveraging formal logic, type theory, and various inference techniques, automated theorem provers offer robust tools for tackling complex mathematical problems. The development of systems like Vampire, E, Z3, LEAN, and Coq exemplifies the diverse approaches and strengths within the field. Notable achievements, such as the resolution of the Robbins Conjecture and the formal verification of the Four Colour Theorem and formalisation of Fermat’s Last Theorem, highlight the practical and theoretical impact of these technologies. As automated theorem provers continue to evolve, their integration with modern computational frameworks promises to further enhance their efficiency and applicability, paving the way for new discoveries and the formalisation of more complex mathematical theories.

References

Wikipedia The Free Encyclopedia, First Order Logic, viewed 17th July 2024, (https://en.wikipedia.org/wiki/First-order_logic)
Wikipedia The Free Encyclopedia, Propositional calculus, viewed 17th July 2024, (https://en.wikipedia.org/wiki/Propositional_logic)
Wikipedia The Free Encyclopedia, Higher-Order Logic, viewed 17th July 2024, (https://en.wikipedia.org/wiki/Higher-order_logic)
Wikipedia The Free Encyclopedia, Type Theory, viewed 17th July 2024, (https://en.wikipedia.org/wiki/Type_theory)
Helmut Brandl, 2024, Typed Lambda Calculus / Calculus of Constructions
Hall, J. G., ed. Intuitionistic Type Theory: Notes by Giovanni Sambin of a series of lectures given in Padova, June 1980. By Per Martin-Löf (1980). Digital Edition (2021).
Wikipedia The Free Encyclopedia, Rules of Inference, viewed 17th July 2024, (https://en.wikipedia.org/wiki/Rule_of_inference)
Vampire, viewed 17th July 2024, (https://vprover.github.io/index.html)
The E Theorem Prover, viewed 17th July 2024, (https://wwwlehre.dhbw-stuttgart.de/~sschulz/E/E.html)
GitHub, The Z3 Theorem Prover, viewed 17th July 2024, (https://github.com/Z3Prover/z3)
Programming Language and Theorem Prover - LEAN, viewed 17th July 2024, (https://lean-lang.org/)
The Coq Proof Assistant, viewed 17th July 2024, (https://coq.inria.fr/)
Kovács, L. & Voronkov, A. (2013). First-Order Theorem Proving and Vampire . In: Sharygina, N., Veith, H. (eds) Computer Aided Verification. CAV 2013. Lecture Notes in Computer Science, vol 8044. Springer, Berlin, Heidelberg, (https://doi.org/10.1007/978-3-642-39799-8_1)
Wikipedia The Free Encyclopedia, Resolution (logic), viewed 17th July 2024, (https://en.wikipedia.org/wiki/First-order_resolution)
Loveland, D.W. (1968). Mechanical Theorem-Proving by Model Elimination. In: Siekmann, J.H., Wrightson, G. (eds) Automation of Reasoning. Symbolic Computation. Springer, Berlin, Heidelberg, (https://doi.org/10.1007/978-3-642-81955-1_8)
Colin Howson, Logic with Trees, An Introduction to Symbolic Logic, 1997 http://ndl.ethernet.edu.et/bitstream/123456789/77275/1/29.pdf
Wikipedia The Free Encyclopedia, Quantifier elimination, viewed 17th July 2024, (https://en.wikipedia.org/wiki/Quantifier_elimination)
Manfred Kerber. 1991. How to prove higher order theorems in first order logic. In Proceedings of the 12th international joint conference on Artificial intelligence - Volume 1 (IJCAI’91). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 137–142.
Wikipedia The Free Encyclopedia, Satisfiability modulo theories, viewed 17th July 2024, (https://en.wikipedia.org/wiki/Satisfiability_modulo_theories)
Wikipedia The Free Encyclopedia, Selective Linear Definite clause resolution, viewed 17th July 2024, (https://en.wikipedia.org/wiki/SLD_resolution)
The Fermat’s Last Theorem Project, Lean community blog, viewed 17th July 2024, (https://leanprover-community.github.io/blog/posts/FLT-announcement/)
EQP: Equational Theorem Prover, viewed 17th July 2024, (https://www.cs.unm.edu/~mccune/eqp/)
Robbins Algebras Are Boolean, viewed 17th July 2024, (https://www.cs.unm.edu/~mccune/papers/robbins/)
K. Appel. W. Haken. “Every planar map is four colorable. Part I: Discharging.” Illinois J. Math. 21 (3) 429 - 490, September 1977. https://doi.org/10.1215/ijm/1256049011
Kenneth Rosen, Discrete Mathematics and Its Applications, McGraw-Hill Education, 2011
TPTP, viewed 17th July 2024, (https://www.tptp.org/)
GitHub, Logic, Vipul-Cariappa, viewed 17th July 2024, (https://github.com/Vipul-Cariappa/logic)
Wikipedia The Free Encyclopedia, Peano Axioms, viewed 17th July 2024, (https://en.wikipedia.org/wiki/Peano_axioms)
Alan Turing, “On Computable Numbers, With An Application To The Entscheidungsproblem”
Stanford Encyclopedia of Philosophy Archive, Hilbert’s Program, viewed 17th July 2024, (https://plato.stanford.edu/archives/spr2023/entries/hilbert-program)
Kurt Gödel, On Formally Undecidable Propositions of Principia Mathematica and Related Systems I