GT GA Notes: NP
Consider watching this before you start!
Definitions (NP1)
How do we prove that a problem is computationally difficult, meaning that it is hard to devise an efficient algorithm that solves it on all inputs?
To do this, we prove that the problem is NP-complete.
- Define what NP is
- What exactly it means for a problem to be NP-complete
We will also look at reductions and formalize that concept.
Computational complexity
- What does NP-completeness mean?
- What is
or mean? - How do we show that a problem is intractable?
- unlikely to be solved efficiently
- Efficient means polynomial in the input size
- To show this, we are going to prove that it is NP complete
- unlikely to be solved efficiently
P = NP
NP = class of all search problems. (Some courses talk about decision problems instead, which require a witness for particular instances, but that's out of scope here.)
P = class of search problems that are solvable in polynomial time.
Roughly, NP is the class of problems where we can efficiently verify solutions, so P is a subset of NP: if I can generate a solution in polynomial time, then I can also verify one in polynomial time.
So if P = NP, then whenever I can verify a solution efficiently, I can also generate one efficiently. If P $\neq$ NP, there are search problems whose solutions can be verified, but not found, in polynomial time.
Search Problems:
- Form: Given instance $I$ (input),
  - Find a solution $S$ for $I$ if one exists
  - Output NO if $I$ has no solutions
- Requirement: To be a search problem,
  - given an instance $I$ and a solution $S$, we can verify that $S$ is a solution to $I$ in polynomial time (polynomial in $|I|$)
Let’s take a look at some examples
SAT problem
Input: Boolean formula $f$ in CNF with $n$ variables and $m$ clauses
Output: Satisfying assignment if one exists, NO otherwise
What is the running time to verify a SAT solution?
SAT $\in$ NP:
Given $f$ and an assignment to $x_1, \dots, x_n$:
- $O(n)$ time to check that a clause is satisfied (a clause has at most $2n$ literals)
- $O(nm)$ total time to verify all $m$ clauses
Therefore, SAT is a search problem and therefore SAT is in NP.
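To make this verification concrete, here is a minimal Python sketch (the clause representation and function name are my assumptions, not from the lecture): a CNF formula is a list of clauses, each clause a list of non-zero integers, where `k` stands for $x_k$ and `-k` for $\bar{x}_k$.

```python
# Hedged sketch: verify a proposed SAT assignment in polynomial time.
def verify_sat(clauses, assignment):
    """Return True iff every clause has at least one satisfied literal.
    assignment[k] is the truth value of variable x_k."""
    for clause in clauses:                                   # m clauses
        if not any((lit > 0) == assignment[abs(lit)] for lit in clause):
            return False                                     # unsatisfied clause
    return True

# Example: f = (x1 OR not x2) AND (x2 OR x3)
print(verify_sat([[1, -2], [2, 3]], {1: True, 2: True, 3: False}))   # True
```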
Colorings
K-colorings problem:
Input: Undirected graph $G = (V, E)$ and integer $k > 0$
Output: Assign each vertex a color from $\{1, \dots, k\}$ so that no edge has both endpoints of the same color; output NO if no such $k$-coloring exists

K-coloring $\in$ NP:
Given $G$ and a coloring, in $O(n + m)$ time we can check that every color is in $\{1, \dots, k\}$ and that the endpoints of every edge have different colors.
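A similar sketch for the coloring check (the vertex and edge representations are assumptions): walk the edge list once and confirm colors are in range and endpoints differ.

```python
# Hedged sketch: verify a proposed k-coloring in O(n + m) time.
def verify_k_coloring(n, edges, k, color):
    """color[v] is the color (1..k) assigned to vertex v in 0..n-1."""
    if any(not 1 <= color[v] <= k for v in range(n)):
        return False                       # some vertex uses an illegal color
    return all(color[u] != color[v] for u, v in edges)

print(verify_k_coloring(3, [(0, 1), (1, 2)], 2, [1, 2, 1]))   # True
```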
MST
Input: Undirected graph $G = (V, E)$ with positive edge weights $w(e)$
Output: Spanning tree $T$ of minimum total weight

Is MST in the class P? Can you generate a solution in polynomial time?
- MST is a search problem
- Can clearly find a solution: run Kruskal's or Prim's

Is MST in class NP? Can you verify a given solution in polynomial time?
- Run BFS/DFS to check that $T$ is a spanning tree
- To check that it is of minimum weight, run Kruskal's or Prim's and check that $T$ has the same total weight
  - They may output a different tree, but it will have the same minimum weight
  - Same runtime as Kruskal's/Prim's, $O(m \log n)$
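A sketch of this verification, assuming the graph and the proposed tree $T$ are weighted edge lists over vertices 0..n-1 and that $T$ only uses edges of $G$ (the helper names are mine, not from the notes):

```python
# Hedged sketch: verify a proposed MST by checking T is a spanning tree and
# that its weight matches the weight found by Kruskal's algorithm.

def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]      # path halving
        x = parent[x]
    return x

def union(parent, x, y):
    rx, ry = find(parent, x), find(parent, y)
    if rx == ry:
        return False                       # already in the same component
    parent[rx] = ry
    return True

def kruskal_weight(n, edges):
    """Weight of a minimum spanning tree (assumes the graph is connected)."""
    parent, total = list(range(n)), 0
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        if union(parent, u, v):
            total += w
    return total

def verify_mst(n, edges, T):
    if len(T) != n - 1:
        return False                       # a spanning tree has exactly n-1 edges
    parent = list(range(n))
    if not all(union(parent, u, v) for u, v, _ in T):
        return False                       # a failed union means T has a cycle
    return sum(w for _, _, w in T) == kruskal_weight(n, edges)
```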
Knapsack
Input: $n$ objects with integer values $v_1, \dots, v_n$, integer weights $w_1, \dots, w_n$, and capacity $B$
Output: subset $S$ of objects with
- Total weight $\sum_{i \in S} w_i \le B$
- Maximum total value $\sum_{i \in S} v_i$
Two variants of knapsack, with and without repetition.
Is Knapsack in the class NP?
Given a solution $S$, the first thing to check is that the total weight is within the capacity: $\sum_{i \in S} w_i \le B$, which takes $O(n)$ time.
The second thing to check is that $\sum_{i \in S} v_i$ is the maximum possible total value; the only known way to do that is to solve the problem, e.g. via the $O(nB)$ dynamic program.
- That runtime is not polynomial in the input size (the input size is polynomial in $n$ and $\log B$, since $B$ is represented in $\log B$ bits).
- So Knapsack is not known to be in NP.
Saying "Knapsack is in NP" is incorrect: we do not know how to show that knapsack is in NP. Is knapsack not in NP? We cannot prove that at the moment either; if we could prove that knapsack is not in NP, that would imply that P $\neq$ NP. So, as far as we know right now, knapsack is not known to be in the class NP.
Is Knapsack in the class P? Also not known.
Similarly, Knapsack is not known to be in the class P. There might be a polynomial-time algorithm for knapsack; P might equal NP, in which case knapsack would lie in P (and in NP).
There is a simple variant of the knapsack problem which is in the class NP, what we are going to have to do is drop the optimization (max) and add in another input parameter.
- That additional input parameter is a goal for the total value, and we then check whether the total value of our subset is at least that goal.
- We can do binary search on that additional parameter, the goal for the total value.
Knapsack-Search
Input: weights $w_1, \dots, w_n$, values $v_1, \dots, v_n$, capacity $B$, and goal $g$
Output: subset $S$ with $\sum_{i \in S} w_i \le B$ and $\sum_{i \in S} v_i \ge g$
If no such subset exists, output NO.

Notice that in the original version we are maximizing the sum of the total value; here we are simply finding a subset whose total value is at least the goal $g$.

Well, we can simply do binary search over this goal $g$ to recover the maximum value.
How many rounds of binary search are we going to have to run? The maximum achievable value is clearly the total of all the items available: $V = \sum_{i=1}^{n} v_i$.
So we need to do binary search between 1 and $V$, which takes $O(\log V)$ rounds; that is polynomial in the input size, since the values already take about $\log V$ bits to write down.

Now this version is in NP; we need to check a proposed solution in poly-time:
Given the input and a subset $S$, check that $\sum_{i \in S} w_i \le B$ and $\sum_{i \in S} v_i \ge g$.
Both of the above can be checked in $O(n \log v_{\max})$ time.
Special note: you might be confused why the sum can be computed this fast; we are just adding $n$ numbers of at most $\log v_{\max}$ bits each, which is polynomial in the input size (unlike the $O(nB)$ dynamic program).
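Both pieces can be sketched directly; `knapsack_search` below is a hypothetical black-box solver for the search version (all names here are assumptions, not from the notes).

```python
# Hedged sketch: the O(n) verifier for Knapsack-Search, plus binary search on
# the goal g to recover the optimal value from a search-version solver.

def verify_knapsack_search(weights, values, B, g, S):
    """S is a list/set of item indices; check total weight <= B and value >= g."""
    return sum(weights[i] for i in S) <= B and sum(values[i] for i in S) >= g

def max_knapsack_value(weights, values, B, knapsack_search):
    """knapsack_search(weights, values, B, g) returns a feasible subset of
    value >= g, or None if no such subset exists (hypothetical oracle)."""
    lo, hi, best = 0, sum(values), []
    while lo <= hi:                        # O(log V) iterations, V = sum(values)
        g = (lo + hi) // 2
        S = knapsack_search(weights, values, B, g)
        if S is not None:
            best, lo = S, g + 1            # value >= g is achievable, aim higher
        else:
            hi = g - 1                     # g is too ambitious
    return best
```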
P=NP revisited
- P stands for polynomial time.
- NP stands for nondeterministic polynomial time
- problems that can be solved in poly-time on a nondeterministic machine
- Nondeterministic means the machine is allowed to guess at each step
The whole point of this is to tell you that NP does not mean not-polynomial time; it means nondeterministic polynomial time.
So recall that:
- NP = all search problems
- P = search problems that can be solved in poly-time
The question now is: are there problems that lie in NP but do not lie in P? That is, is $P \neq NP$?
NP-completeness
A problem is NP-complete if it is in NP and every problem in NP reduces to it.
- NP-complete problems are the hardest problems in the class NP.

If $P \neq NP$, then no NP-complete problem can be solved in polynomial time.
The contrapositive of this statement is:
If an NP-complete problem can be solved in poly-time, then all problems in NP can be solved in poly-time (i.e. $P = NP$).
Hence, to show that a problem such as SAT is NP-complete, we have to show that if there is a polynomial-time algorithm for SAT, then there is a polynomial-time algorithm for all problems in the class NP.
- To do that, we are going to take every problem in the class NP, and for each of these problems have a reduction to SAT.
- If we can solve SAT in polynomial time, we can solve every such problem in class NP in polynomial time.
SAT is NP-complete
SAT is NP-complete means:
- SAT $\in$ NP
- If we can solve SAT in poly-time, then we can solve every problem in NP in poly-time.
  - Reduce all problems in NP to SAT!

If $P \neq NP$, then there is no poly-time algorithm for SAT.
- Because if we could solve SAT in poly-time, then we could solve all problems in NP in polynomial time, i.e. $P = NP$.
- Or: if you believe that nobody knows how to prove that $P = NP$, then nobody knows a polynomial-time algorithm for SAT.

It might occur to you that a polynomial-time algorithm for SAT does not exist; otherwise we would have proven $P = NP$.
Reduction
For problems A and B, a reduction of A to B can be represented as $A \to B$.
It means we are showing that B is at least as hard computationally as A, i.e. if we can solve B, then we can solve A.
If we can solve problem B in poly-time, then we can use that algorithm to solve A in poly-time.
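Schematically, a reduction consists of two polynomial-time maps, one on instances and one on solutions; a tiny sketch of how they combine (the names `f`, `h`, and `solve_B` are placeholders, not from the notes):

```python
# Hedged sketch of using a reduction A -> B:
#   f(I): maps an instance I of A to an instance of B       (poly-time)
#   h(S, I): maps a solution S of f(I) back to one of I     (poly-time)
#   solve_B: a hypothetical poly-time algorithm for B

def solve_A(I, f, h, solve_B):
    S = solve_B(f(I))          # solve the transformed instance
    if S is None:              # f(I) has no solution => I has no solution
        return None
    return h(S, I)
```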
How to do a reduction
Colorings $\to$ SAT
Suppose there is a poly-time algorithm for SAT; we use it to get a poly-time algorithm for Colorings.
You can take any problem in NP, such as Colorings, and reduce it to SAT.
So we need to define two poly-time functions:
- $f$: an input $I$ for Colorings $\to$ an input $f(I)$ for SAT
- $h$: a solution $S$ for $f(I)$ $\to$ a solution $h(S)$ for Colorings on $I$
Need to prove that if $S$ is a solution to $f(I)$ then $h(S)$ is a solution to $I$, and that if $f(I)$ has no solution then $I$ has no solution.
NP-completeness proof
To show: Independent Sets is NP-complete
- Independent Sets (IS) $\in$ NP: a solution can be verified in polynomial time
- Reduction of every problem A in NP to Independent Sets

Suppose we know SAT is NP-complete, i.e. every problem in NP already reduces to SAT.
Instead, to show IS is NP-complete, we only need:
- IS $\in$ NP
- SAT $\to$ IS
3SAT (NP2)
The fact that SAT is NP-complete is known as the Cook-Levin Theorem; it was proved independently in 1971 by Stephen Cook and Leonid Levin.
Karp '72: 21 other problems are NP-complete.
3SAT
Input: Boolean formula $f$ in CNF with $n$ variables and $m$ clauses, where each clause has at most 3 literals
Output: Satisfying assignment if one exists, NO otherwise.

Now, to show that 3SAT is NP-complete, we need to show:
- 3SAT $\in$ NP
- SAT $\to$ 3SAT

Why do we care about this? Because if we can do this, then:
The implication is that if we have a polynomial-time algorithm for 3SAT, then we have a polynomial-time algorithm for every problem in NP, because we can reduce every problem in NP to 3SAT.
3SAT in NP
Given a 3SAT input $f$ and an assignment to $x_1, \dots, x_n$:
To verify this assignment is a satisfying assignment:
- For each clause $C$, in $O(1)$ time we can check that at least one literal in $C$ is satisfied.
  - It is $O(1)$ because there is a fixed number of literals (at most 3) per clause.
- $O(1)$ time per clause, and there are $m$ clauses, so it takes $O(m)$ total time.
SAT reduces to 3SAT
To do so, we need to take an input $f$ for SAT and:
- Create an input $f'$ for 3SAT
- Transform a satisfying assignment for $f'$ into a satisfying assignment for $f$, so that an assignment satisfies $f'$ if and only if the corresponding assignment satisfies $f$

Example:
We can re-write a clause with four literals, say $C = (x_1 \vee x_2 \vee x_3 \vee x_4)$, as $C' = (x_1 \vee x_2 \vee y) \wedge (\bar{y} \vee x_3 \vee x_4)$ using one new variable $y$.
The claim is: $C$ is satisfiable $\iff$ $C'$ is satisfiable.

Forward direction: take a satisfying assignment for $C$:
- if $x_1$ or $x_2$ is satisfied, then set $y = F$
- if $x_3$ or $x_4$ is satisfied, then set $y = T$

Backward direction: take a satisfying assignment for $C'$:
- if $y = T$, then $x_3$ or $x_4$ must be true (to satisfy the second clause of $C'$), so $C$ is satisfied
- if $y = F$, then $x_1$ or $x_2$ must be true (to satisfy the first clause of $C'$), so $C$ is satisfied
Big Clauses
Given a clause $C = (a_1 \vee a_2 \vee \dots \vee a_k)$ with $k > 3$ literals:
- Create $k - 3$ new variables $y_1, \dots, y_{k-3}$ and replace $C$ by
  $C' = (a_1 \vee a_2 \vee y_1) \wedge (\bar{y}_1 \vee a_3 \vee y_2) \wedge (\bar{y}_2 \vee a_4 \vee y_3) \wedge \dots \wedge (\bar{y}_{k-3} \vee a_{k-1} \vee a_k)$
- Note that these new variables are distinct for each clause, so if you have 2 clauses you will have two sets of new variables.
- So if you have $m$ clauses with at most $k$ literals each, then you have $O(km)$ new variables in total.
Now we need to show that $C$ is satisfiable $\iff$ $C'$ is satisfiable.

Forward implication: Take an assignment to $x_1, \dots, x_n$ satisfying $C$; we extend it to the new variables so that $C'$ is satisfied.
Let $a_i$ be a satisfied literal of $C$ (at least one exists since $C$ is satisfied).
- Since $a_i$ is satisfied, the clause of $C'$ containing $a_i$ is satisfied.
  - e.g. if $a_i = \bar{x}_3$ and $x_3 = F$, then $a_i$ is satisfied.
  - It is the clause which is the term $(\bar{y}_{i-2} \vee a_i \vee y_{i-1})$.
- Set $y_1 = \dots = y_{i-2} = T$ to satisfy the clauses before it.
- Set $y_{i-1} = \dots = y_{k-3} = F$ to satisfy the rest.
Reverse implication: take an assignment to the $x$'s and $y$'s satisfying $C'$; we claim the restriction to the $x$'s satisfies $C$.
- Suppose at least one literal $a_i$ is satisfied; then we can just use the original assignment on the $x$'s, and $C$ is satisfied.
- Suppose otherwise that no literal $a_i$ is satisfied; we will show it is then not possible to satisfy $C'$:
  - From the first clause $(a_1 \vee a_2 \vee y_1)$, we must have $y_1 = T$.
  - From the second clause $(\bar{y}_1 \vee a_3 \vee y_2)$, we must have $y_2 = T$.
  - Continuing in this way, every clause forces the next $y$ to be true, so finally $y_{k-3} = T$.
  - From the last clause $(\bar{y}_{k-3} \vee a_{k-1} \vee a_k)$, we need $a_{k-1}$ or $a_k$ to be true, but no literal of $C$ is set to true, so it is all False. So this last clause, and hence the entire $C'$, evaluates to False.
SAT reduces to 3SAT cont
```
SAT -> 3SAT
Consider f for SAT
Consider f' for 3SAT:
    For each clause C in f:
        if |C| <= 3 then:
            add C to f'
        if |C| > 3 then:
            create k-3 new variables
            and add C' as defined before
```
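The same transformation as runnable Python, under my assumed clause representation (literals as signed integers, auxiliary variables numbered after the original $n$); this is a sketch, not the lecture's code:

```python
# Hedged sketch of the SAT -> 3SAT clause transformation described above.
def sat_to_3sat(clauses, n):
    """clauses: lists of ints (k for x_k, -k for its negation); n = #variables.
    Returns (new_clauses, total_vars) of an equisatisfiable 3SAT formula."""
    out, next_var = [], n
    for C in clauses:
        k = len(C)
        if k <= 3:
            out.append(list(C))
            continue
        ys = list(range(next_var + 1, next_var + k - 2))       # k-3 new variables
        next_var += k - 3
        out.append([C[0], C[1], ys[0]])                         # (a1 v a2 v y1)
        for i in range(1, k - 3):
            out.append([-ys[i - 1], C[i + 1], ys[i]])            # (~y_i v a_{i+2} v y_{i+1})
        out.append([-ys[-1], C[k - 2], C[k - 1]])                # (~y_{k-3} v a_{k-1} v a_k)
    return out, next_var

# A single clause of length 4 becomes two clauses of length 3:
print(sat_to_3sat([[1, 2, 3, 4]], 4))    # ([[1, 2, 5], [-5, 3, 4]], 5)
```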
It remains to show that $f$ is satisfiable $\iff$ $f'$ is satisfiable.
- Then, we need to show, given a satisfying assignment to $f'$, how to map it back to a satisfying assignment to $f$.
  - That will be straightforward because we just ignore the auxiliary variables; the setting of the original variables will give us a satisfying assignment to the original formula.
Correctness of SAT->3SAT
Forward direction: Given a satisfying assignment to $x_1, \dots, x_n$ for $f$, construct a satisfying assignment for $f'$:
- If $|C| \le 3$ then we simply use the same assignment (nothing changes); otherwise:
  - Remember that for each such $C$, we define a new set $y_1, \dots, y_{k-3}$ of variables.
  - There is an assignment to these new variables so that $C'$ is satisfied (set the $y$'s based on a satisfied literal, as shown above).

Backward direction: Given a satisfying assignment for $f'$:
- For each clause $C$ of $f$, at least one literal in $C$ is satisfied.
  - Otherwise there would be no way to satisfy the corresponding sequence of clauses in $f'$.
- Ignore the new variables and just look at the setting on the original $n$ variables.
Satisfying assignment
Now, how do we get the mapping back?
That is, we need to transform our satisfying assignment for $f'$ (the 3SAT formula) back into one for $f$.
We can ignore the assignment for the new variables and keep the assignment for the original variables the same; then we get a satisfying assignment for $f$.
Size of problem?
- $f'$ has $O(nm)$ variables in the worst case (the original $n$ plus $O(n)$ new variables per clause).
- We are also replacing every clause by $O(n)$ clauses.
- So we also have $O(nm)$ clauses.

But this is OK! Because the size of $f'$ is polynomial in the size of $f$, the reduction still runs in polynomial time.
Graph Problems (NP3)
- Independent sets
- Clique
- Vertex cover
Independent Set
For an undirected graph $G = (V, E)$, a subset $S \subseteq V$ is an independent set if no two vertices of $S$ are connected by an edge: for all $x, y \in S$, $(x, y) \notin E$.
Trivial to find small independent sets:
- Empty set
- Singleton vertex
The challenging problem is to find a maximum independent set.
Question: The maximum independent set problem is known to be in NP, True or False?
False:
- If you give me a solution, there is no known way to verify that it is, in fact, of maximum size.
- The only way to verify that it is of maximum size is if I can solve this problem in polynomial time.
  - That only holds if $P = NP$.
- Assuming $P \neq NP$, the max independent set problem is not in the class NP.
- It is a similar scenario to the optimization version of the knapsack problem.
Max independent set
Input: undirected graph $G = (V, E)$ and goal $g$
Output: independent set $S$ with $|S| \ge g$ if one exists, NO otherwise
Theorem: The independent set problem is NP-complete.
Need to show:
- Independent Set $\in$ NP
  - Given input $G$ and $g$ and a proposed solution $S$, verify that $S$ is a solution in polynomial time:
    - in $O(n^2)$ time we can check all pairs $x, y \in S$ and verify $(x, y) \notin E$
    - in $O(n)$ time we can check that $|S| \ge g$
- Need to show that the independent set problem is at least as hard as every problem in the class NP.
  - Take something which is known to be at least as hard as everything in the class NP, such as SAT or 3SAT.
  - Show a reduction from that known NP-complete problem to this new problem.
  - Show that 3SAT $\to$ IS.

3SAT is considered instead of SAT because it is easier to work with.
3SAT -> IS
Consider a 3SAT input $f$ with variables $x_1, \dots, x_n$ and clauses $C_1, \dots, C_m$; we construct a graph $G$ and a goal $g$.
Idea: For each clause $C_i$, create a vertex for each literal appearing in $C_i$.
- Since there are $m$ clauses with at most 3 literals each, there are at most $3m$ vertices in the graph.
- Going to add edges to encode each clause.
- Add additional edges between vertices in different clauses in order to ensure a consistent assignment.
Clause edges
Two types of edges: clause edges and variable edges.
Clause edges: for each clause $C_i$, add an edge between every pair of vertices corresponding to literals of $C_i$, so the (up to 3) vertices of a clause form a triangle.
- If another clause shares a literal of $C_i$, we will have another vertex corresponding to that literal, i.e. multiple vertices corresponding to the same literal.

Independent set
Since the vertices of a clause are pairwise connected, every independent set has at most one vertex per clause. So in order to achieve our goal of an independent set of size at least $g = m$, we need exactly one vertex per clause.
This one vertex per clause will correspond to a satisfied literal in that clause. Now there may be other satisfied literals in the clause due to other copies of that literal in other clauses, but this property that we have exactly one vertex per clause will ensure that we have at least one satisfied literal in every clause; therefore this independent set can be related to a satisfying assignment for the original formula.
Now, what happens if I take an independent set containing a vertex labeled $\bar{x}_1$? We set the corresponding literal to true, i.e. $x_1 = F$.
Variable edges
For each variable $x_i$, add an edge between every vertex labeled $x_i$ and every vertex labeled $\bar{x}_i$, so the independent set can never contain both a copy of $x_i$ and a copy of $\bar{x}_i$.
This will ensure that our independent set corresponds to a valid assignment, and the clause edges will ensure that our independent set corresponds to an assignment which satisfies all the clauses.
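Putting the whole construction together as a sketch (clause representation as in the earlier snippets; vertices are (clause index, literal) pairs, and literals within a clause are assumed distinct):

```python
# Hedged sketch of the 3SAT -> Independent Set reduction described above.
from itertools import combinations

def three_sat_to_is(clauses):
    """Return (vertices, edges, g): the graph has an independent set of size
    >= g iff the 3SAT formula is satisfiable (g = number of clauses)."""
    vertices = [(j, lit) for j, C in enumerate(clauses) for lit in C]
    edges = set()
    # Clause edges: the vertices of each clause are pairwise connected.
    for j, C in enumerate(clauses):
        for a, b in combinations(C, 2):
            edges.add(frozenset({(j, a), (j, b)}))
    # Variable edges: connect every copy of x_i to every copy of ~x_i.
    for u, v in combinations(vertices, 2):
        if u[1] == -v[1]:
            edges.add(frozenset({u, v}))
    return vertices, edges, len(clauses)
```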
Example
The lecture's figure gives an example: for a formula with four clauses, the constructed graph has an independent set of size four, one vertex per clause; note that the chosen vertices never include both a literal and its negation.
Now let's prove in general that an independent set of size $g$ exists if and only if $f$ has a satisfying assignment.
Correctness 3SAT -> IS
Forward direction: Consider a satisfying assignment for $f$.
For each clause $C_i$, at least one literal is satisfied; add the vertex of one such satisfied literal to $S$, so $S$ contains exactly one vertex per clause.
- So $|S| = m = g$, and our goal is met.
Since $S$ contains one vertex per clause, no clause edge has both endpoints in $S$; and since the assignment never sets both $x_i$ and $\bar{x}_i$ to true, no variable edge does either.
Therefore, $S$ is an independent set of size equal to $g$.

Backward direction: Take an independent set $S$ of size at least $g = m$; it must contain exactly one vertex per clause.
- Now each such vertex corresponds to a literal in the original formula, so we set that corresponding literal to true (variables not forced this way can be set arbitrarily).
- Since every clause has a satisfied literal, we know every clause is satisfied and therefore the formula is satisfied. But does this choice of literals correspond to a valid assignment?
- Notice we have no contradicting literals in this set, since we added edges between every vertex for $x_i$ and every vertex for $\bar{x}_i$, so $S$ cannot contain both.
So we have taken an independent set of size at least $g$ and constructed a satisfying assignment for $f$.
This proves that the reduction from 3SAT to independent set is correct: it shows how to take an independent set and construct a satisfying assignment, and if there is no independent set of size at least $g$, then $f$ is not satisfiable.
This completes the proof that the independent set problem is NP-complete.
NP-HARD
The Max-Independent-Set problem is not known to lie in the class NP. Notice that it is quite straightforward to reduce the independent set problem to the max independent set problem: if I am looking for an independent set of size at least $g$, I can just find a maximum independent set and check whether its size is at least $g$.
So it is quite straightforward to reduce the search version to the optimization version. That means I have a reduction from the independent set problem to the maximum independent set problem, and in fact I have a reduction from every problem in NP to the max independent set problem, either going indirectly through the independent set problem or directly.
Theorem: The Max-Independent Set problem is NP-hard
NP-hard means it is at least as hard as everything in the class NP: there is a reduction from everything in the class NP to the max independent set problem. So if we can solve max independent set in polynomial time, then we can solve everything in NP in polynomial time. But since max independent set is not known to lie in NP, we only call it NP-hard, not NP-complete.
Clique
For an undirected graph $G = (V, E)$, a subset $S \subseteq V$ is a clique if:
- for all $x, y \in S$, $(x, y) \in E$
- i.e. all pairs of vertices in the subset $S$ are connected by an edge.

Want large cliques; small cliques are easy to find:
- For example, the empty set
- A singleton vertex
- Any edge (with its two endpoints)

Now, let's define the clique problem:
Input: graph $G = (V, E)$ and goal $g$
Output: clique $S \subseteq V$ with $|S| \ge g$ if one exists, NO otherwise
- (As with independent sets, we do not ask to find the largest clique possible; that optimization version is not known to be in NP.)

Theorem: The clique problem is NP-complete.
Clique in NP
- Given input $G$ and $g$ and a proposed solution $S$:
  - For all $x, y \in S$, check that $(x, y) \in E$
    - At most $O(n^2)$ pairs to check
    - At most $O(m)$ time to check whether a pair is connected by an edge
    - Total of $O(n^2 m)$ with a trivial algorithm
      - You could do it easily in $O(n^2)$ as well (e.g. with an adjacency matrix)
  - Check that $|S| \ge g$
    - $O(n)$ time
Now, we need to show that the clique problem is at least as hard as every other NP problem. We show a reduction from a known NP-complete problem to the clique problem.
Choices are SAT, 3SAT, IS; take the one most similar to clique: Independent Sets, which is also a graph problem.
IS -> Clique
The key idea: for a clique, all edges within $S$ must be present; for an independent set, none of them may be present.
For $G = (V, E)$, let $\bar{G} = (V, \bar{E})$ be the complement graph, where $\bar{E} = \{(x, y) : (x, y) \notin E\}$.
Observation: $S$ is a clique in $\bar{G}$ $\iff$ $S$ is an independent set in $G$.
Recall that for independent sets, no pair of vertices of $S$ is connected by an edge, which is exactly the opposite of a clique.

IS $\to$ Clique:
Given input $G$ and $g$ for the independent set problem, construct the input $\bar{G}$ and $g$ for the clique problem.
- If we get solution $S$ for clique, then return $S$ for the IS problem.
- If we get NO, then return NO.
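A one-function sketch of this reduction (edges represented as a set of frozensets, as in the snippet above):

```python
# Hedged sketch of IS -> Clique: hand the complement graph to a clique solver.
from itertools import combinations

def is_to_clique_instance(vertices, edges, g):
    complement = {frozenset({u, v})
                  for u, v in combinations(vertices, 2)
                  if frozenset({u, v}) not in edges}
    return vertices, complement, g     # S is a clique here iff S is an IS in G
```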
Vertex cover
For a graph $G = (V, E)$, a subset $S \subseteq V$ is a vertex cover if it covers every edge: for every $(x, y) \in E$, $x \in S$ or $y \in S$ (or both).
It is easy to find a large vertex cover, e.g. take all vertices, but in this case we are trying to find a minimum vertex cover.

Input: graph $G = (V, E)$ and budget $b$
Output: vertex cover $S$ with $|S| \le b$ if one exists, NO otherwise

Theorem: The vertex cover problem is NP-complete.

Vertex Cover in NP
- Given input $G$ and $b$ and a proposed solution $S$, can we verify it in polynomial time?
  - For every edge $(x, y) \in E$, check that $x$ or $y$ is in $S$
    - Can be done in $O(nm)$ time (or $O(m)$ if membership in $S$ is stored in a boolean array)
  - Check that $|S| \le b$
    - Can be done in $O(n)$ time
Now, we need to take the independent set problem and reduce it to the vertex cover problem.
The claim is: $S$ is a vertex cover of $G$ $\iff$ $\bar{S} = V \setminus S$ is an independent set of $G$.

Forward direction:
Take a vertex cover $S$ of $G$: for every edge $(x, y) \in E$, $x \in S$ or $y \in S$.
- Therefore no edge is contained in $\bar{S}$: since $S$ contains $x$ or $y$, $\bar{S}$ does not have both endpoints of that edge.
- Thus $\bar{S}$ is an independent set.

Reverse direction:
Take an independent set $\bar{S}$ of $G$.
- For every edge $(x, y) \in E$, at most one of $x$ or $y$ is in $\bar{S}$.
- This implies that at least one of $x$ or $y$ is in $S$, so $S$ covers every edge.
- Thus $S$ is a vertex cover.
IS -> Vertex cover
Claim: $G$ has an independent set of size $\ge g$ $\iff$ $G$ has a vertex cover of size $\le n - g$.
For input $G$ and $g$ to the independent set problem:
- Let $b = n - g$
- Run vertex cover on input $G$ and $b$

So now we have:
- Given solution $S$ for vertex cover, return $\bar{S} = V \setminus S$ as the solution to the independent set problem.
  - We know that if $S$ is a vertex cover of size at most $n - g$, then $\bar{S}$ is an independent set of size at least $g$.
- If NO solution for VC, return NO for the IS problem.
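And a sketch of the IS to VC direction, including mapping the answer back; `vertex_cover_solver` is a hypothetical black box, not a routine from the notes:

```python
# Hedged sketch of solving Independent Set via a Vertex Cover solver.
def independent_set_via_vc(vertices, edges, g, vertex_cover_solver):
    """vertex_cover_solver(vertices, edges, b) returns a cover of size <= b,
    or None if no such cover exists (hypothetical oracle)."""
    b = len(vertices) - g
    S = vertex_cover_solver(vertices, edges, b)
    if S is None:
        return None                                    # no VC of size <= n-g
    return [v for v in vertices if v not in set(S)]    # complement is an IS of size >= g
```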
Knapsack (NP4)
Subset-sum
Input: positive integers $a_1, \dots, a_n$ and a target $t$
Output: subset $S$ of the integers with $\sum_{i \in S} a_i = t$, NO if no such subset exists

Using dynamic programming, we can solve this in $O(nt)$ time.
This does not show the problem is in P because, as with knapsack, the running time would have to be polynomial in $n$ and $\log t$ (the number of bits used to represent $t$).
Subset-sum is NP-complete
First show that Subset-sum $\in$ NP:
Given inputs $a_1, \dots, a_n, t$ and a proposed solution $S$, check that $\sum_{i \in S} a_i = t$.
To compute this sum, there are at most $n$ numbers of at most $\log t$ bits each, so it takes $O(n \log t)$ time, which is polynomial in the input size.

Now we need to show the second part: Subset-sum is at least as hard as every problem in the class NP. We are going to use the 3SAT problem.
The reduction will really illustrate why the $O(nt)$ dynamic programming algorithm is not a polynomial-time algorithm.
3SAT -> Subset sum
Input to subset-sum: $2n + 2m + 1$ numbers.
- $v_i$ represents $x_i$ (setting $x_i = T$) and $v_i'$ represents $\bar{x}_i$ (setting $x_i = F$), for each variable $x_i$
- $s_j$ and $s_j'$ are buffer (slack) numbers, one pair for each clause $C_j$
- $t$ represents the desired sum

All these numbers are huge: each has $n + m$ digits (base 10), so they are as large as $10^{n+m}$.
- The digits are chosen so that if we add up any subset of the numbers, there will be no carries between the digits.
- This means each digit of the sum can be reasoned about independently.

Let's take a deeper look:
- Digit $i$ (for $1 \le i \le n$) corresponds to variable $x_i$: both $v_i$ and $v_i'$ have a 1 in digit $i$, and $t$ has a 1 in digit $i$.
- Digit $n + j$ (for $1 \le j \le m$) corresponds to clause $C_j$.

An assignment for the 3SAT formula either sets $x_i = T$ or $x_i = F$, but not both.
To achieve this, digit $i$ of $t$ is 1, and only $v_i$ and $v_i'$ have a 1 in digit $i$, so any subset summing to $t$ must include exactly one of $v_i, v_i'$.
This specification ensures that our solution to the subset-sum problem is going to correspond to an assignment. Whether it is satisfying or not is another issue.
Example Reduction:
From here, digit $n + j$ encodes clause $C_j$:
- if $x_i \in C_j$, put a 1 in digit $n + j$ of $v_i$.
- if $\bar{x}_i \in C_j$, put a 1 in digit $n + j$ of $v_i'$.
- Put a 3 in digit $n + j$ of $t$.
- Use $s_j$ and $s_j'$ to be buffers:
  - put a 1 in digit $n + j$ of $s_j$ and a 1 in digit $n + j$ of $s_j'$.
- Put a 0 in digit $n + j$ of all other numbers.

Take digit $n + j$ of the sum: the buffers $s_j$ and $s_j'$ together contribute at most 2.
- So to get 3, we need at least one literal of $C_j$ in the chosen subset to be satisfied.
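A digit-by-digit sketch of this construction in Python (base-10 digit strings; clause representation as before; the dictionary keys like "v1" and "s1'" are my own labels):

```python
# Hedged sketch of the 3SAT -> Subset-Sum construction described above.
def three_sat_to_subset_sum(clauses, n):
    m = len(clauses)
    width = n + m                                # n variable digits + m clause digits

    def make_number(digits):                     # digits: {position: value}
        return int("".join(str(digits.get(p, 0)) for p in range(width)))

    numbers = {}
    for i in range(1, n + 1):
        base = {i - 1: 1}                        # variable digit for x_i
        numbers[f"v{i}"] = make_number(
            {**base, **{n + j: 1 for j, C in enumerate(clauses) if i in C}})
        numbers[f"v{i}'"] = make_number(
            {**base, **{n + j: 1 for j, C in enumerate(clauses) if -i in C}})
    for j in range(m):                           # two buffer numbers per clause
        numbers[f"s{j+1}"] = make_number({n + j: 1})
        numbers[f"s{j+1}'"] = make_number({n + j: 1})
    # target t: 1 in every variable digit, 3 in every clause digit
    t = make_number({**{i: 1 for i in range(n)}, **{n + j: 3 for j in range(m)}})
    return numbers, t

# Example: f = (x1 or ~x2) with n = 2 gives t = 113, matched for x1 = x2 = T by
# picking v1 (digits 101), v2 (010), s1 (001), and s1' (001).
print(three_sat_to_subset_sum([[1, -2]], 2))
```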
Correctness Subset sum
Now we prove that the subset-sum instance has a solution if and only if $f$ is satisfiable.

Forward direction:
Take a solution $S$ to the subset-sum instance (a subset of the numbers summing to $t$).
- For digit $i$ where $1 \le i \le n$: the only numbers with a 1 in digit $i$ are $v_i$ and $v_i'$, so we may include $v_i$ or $v_i'$ but not both (or neither).
  - So the only way to get a sum of exactly 1 in digit $i$ is to include exactly one of $v_i, v_i'$.
  - If $v_i \in S$ then set $x_i = T$
  - If $v_i' \in S$ then set $x_i = F$
- Hence we get an assignment to $x_1, \dots, x_n$.

Now we need to show that this assignment is a satisfying assignment:
- For digit $n + j$ where $1 \le j \le m$:
  - Corresponds to clause $C_j$.
  - To get a sum of 3 in digit $n + j$:
    - The buffers $s_j, s_j'$ contribute at most 2, so we need at least one satisfied literal of $C_j$ and must include its number.
    - If we have two satisfied literals, we just need to use either $s_j$ or $s_j'$, and so on.
    - If we satisfy all three literals, then we don't need either buffer.
    - If none of the literals are satisfied, there is no way to achieve a sum of 3, so no solution would exist.
  - Therefore at least one literal of $C_j$ is satisfied.
Backward direction:
Take a satisfying assignment for $f$:
- If $x_i = T$, add $v_i$ to $S$.
- If $x_i = F$, add $v_i'$ to $S$.
- Then digit $i$ of the sum is 1 for every $1 \le i \le n$, so the first $n$ digits are correct.
- For clause $C_j$, at least 1 literal in $C_j$ is satisfied.
  - The numbers corresponding to these satisfied literals give us a sum of one, two, or three in digit $n + j$.
  - To get a sum of exactly 3 in digit $n + j$, additionally use the buffers $s_j$ and/or $s_j'$ to top it up to three.
  - This ensures that digit $n + j$ is correct; therefore the last $m$ digits are correct, and the first $n$ digits are correct.
- Therefore we have a solution to the subset-sum instance.
Knapsack is NP-complete
Prove that the knapsack problem is NP-complete.
- Use the correct version of the knapsack problem: the Knapsack-Search version, which lies in the class NP.
As before, there are two parts:
- Knapsack-Search $\in$ NP
- Known NP-complete problem $\to$ Knapsack-Search

We know that all problems in NP reduce to this known NP-complete problem; therefore, if we show a reduction from this known NP-complete problem to the knapsack problem, we know all NP problems reduce to the knapsack problem.
- Try to use subset-sum (just shown to be NP-complete) to show this.
Halting Problem (NP5)
Undecidability
NP-complete means it is computationally difficult: if $P \neq NP$, there is no polynomial-time algorithm for it.
Undecidable means it is computationally impossible: no algorithm solves the problem on every input, regardless of the running time (or space) of the algorithm.
In 1936, Alan Turing proved that the halting problem is undecidable!
Halting problem
Input: A program $P$ and an input $I$ for that program
Output: True if $P(I)$ terminates (halts), False if it never terminates
For example, consider a program P:
```python
while x % 2 == 1:
    x += 6
```
If x is initially set to 5, then this program never halts: an odd x stays odd after adding 6, so the loop condition is always true.
HP: undecidable
Theorem: The halting problem is undecidable.
We prove this by contradiction: suppose we had an algorithm that solves the halting problem on every input, and call this algorithm Terminator(P, I). We construct a new program Q and input J, and show that Terminator(Q, J) is incorrect.
Since Terminator is incorrect on this pair of inputs, therefore, Terminator does not solve the halting problem on every input which is a contradiction.
Notice that we are assuming the existence of this Terminator program! We can use it as a subroutine, a black box.
HP: paradox
Consider the following “evil” program:
```
Harmful(J):
(1) if Terminator(J, J):
        then GOTO (1)
    else:
        Return()
```
Dijkstra: "Go To Statement Considered Harmful" (feel free to google this)
The Harmful program is just an if-then-else statement: it takes an input J and runs Terminator(J, J). So J is a program, and J is also the input fed to the program J.
If Terminator returns True on this input pair, then our program goes back to line (1), so we get an infinite loop. If Terminator returns False, then we simply exit the procedure, and the program Harmful terminates.
HP: What’s harmful?
Terminator(J, J): decides whether the program J, run on input J, terminates:
- Returns True if J(J) terminates
- Returns False if J(J) never terminates
Therefore:
- If J(J) terminates then Harmful(J) never terminates
- If J(J) never terminates then Harmful(J) terminates
HP: paradox
To arrive at our paradox, we set input J = Harmful which we just defined.
The question is does Harmful(Harmful) terminate when it takes itself as an input?
So let's consider the two cases, plugging in the statements above with J = Harmful:
- If Harmful(Harmful) terminates, then Harmful(Harmful) never terminates
- This is a contradiction and this cannot be the case.
- If Harmful(Harmful) never terminates, then Harmful(Harmful) terminates
- But this is also a contradiction
Therefore, the assumption that Terminator() exists is impossible! Our initial assumption, that a program Terminator which solves the halting problem on every input exists, must be false.