we would like to find a way to encode computational problems such that our mathematical computation models can correctly take inputs and return correct outputs.
this process should be mathematically precise, yet simple and general enough to apply to any problem we can think of.
Decision problem
A problem where the only answers are either yes or no.
Decision problems are special because all decision problems can be encoded as a Language . We feed an input string , and decide whether is in . If , then the answer is yes. Otherwise it’s no.
The difficult part now is how to reword our questions so that they can be answered with a yes or no.
examples
sequence of characters
notice that any instance of a problem, whether it be a text document or a graph, can be encoded by a sequence of characters. for example, 0 and 1.
this leads us to the idea of alphabets. using an alphabet , we can describe any given problem as a Language based on .
character coverage in documents
given a document, check if each of the vowels (a, e, i, o, u) appears at least once in the document. what information should the algorithm track?
simplest way is to have a boolean for each vowel (aka a bit, 0 or 1) and flip it to true when vowel is discovered. in the end we check if all bits are 1.
finite number of states
each bit has two states, so the total number of possible states is . the amount of memory used is constant. it is independent of the input length.
then, one model of computation we can use are Finite-state automata.
language encoding
we can define a Language such that
which describes what will contain. we could easily have also said , and so on for all the vowels.
character count in documents
we want to check if the number of occurrences of characters “a” and “e” are the same. again, we need to keep track of… something
in this case, we keep a counter. we increment the counter if we see an “a”, and decrement if we see an “e.” at the end we check if the counter is 0. this avoids having to track two counters, which uses more memory.
infinite number of states
notice that the value of the counter is unbounded. the amount of information we must keep track of depends on the input size (in this case, the document length). there are infinite possible states.
in other words, if the computing device we use only has finitely many states, then it cannot solve this problem. this means the Language is not regular and no DFA exists.
language encoding
we define a language such that
which says that vowels “a” and “e” have to occur the same number of times in any given string . cannot be regular. we can prove this by showing that an infinite set of strings is pairwise distinguishable w.r.t. to .
syntactic errors for programs
given a program, check if its text adheres to all syntax rules of that language. the compiler is the algorithm we run and the code is the input. the output returns whether there are syntax errors or not.
this problem is (unsurprisingly) solvable, otherwise languages would suck to learn. the compiler simply checks a list of syntax rules and determines if all of them are followed.
semantic errors (bugs) in programs
given a program, verify that its execution cannot lead to a semantic error. for example, does Microsoft Windows version contain no bugs?
it turns out that the problem is provably unsolvable. there DNE a compiler than can certify this type of correctness in programs.
finding the most connected person
given a graph of connections, find the person who has the maximum number of friends.
we could simply iterate over each person and find their number of connections, and at the end return the one with the most connections.
if we have people with connections, then the runtime will roughly linearly scale with respect to and . however, it is still in the realm of solvability.
language encoding
Let’s first reword the question to be a decision question instead.
given a graph of connections among people (numbered through ) and person identifier (between and ), check if person has the highest number of connections.
we can then define an Alphabet where we use bits to represent numbers, and a special symbol as a “spacer” or separator.
we can then define a language
this counts as a very low-level encoding of such a problem and is very tedious to work with, but the point is that it can be done.
it also turns out that this problem uses a non-Regular language. this is because the number of possible states is unbounded (depends on ).
finding the largest mutually connected group
given a graph of friend connections, find the largest clique.
if we know the group of friends to check for, this is trivial. however, for any given clique of size , the number of possible groups to check is . this means the number of groups exponentially grows.
there is no efficient solution to check all the groups.