The CSS algorithm can be viewed as a set of interconnected devices - they literally are in a hardware implementation.
We have two LFSRs, which operate independently, and feed their top 8 bits into an adder. This evaluates the sum of these two
values with a carry, updates the carry and outputs the low 8 bits of the result (LFSRBYTE). This byte feeds into a reversible
transformation which transforms the ciphertext into plaintext (or vice-versa, in the reversed implementation).

If we reverse this machine, and pump plaintext and cipertext into the last device, we can obtain the LFSRBYTE.
Then, given the LFSRBYTE and knowing the output of LFSR1, we can obtain the output of LFSR2. Since this output is actually part
of LFSR2's state, and since LFSR2 is constructed so that the top 8 bits are the only bits which change through an iteration [the
other bits are merely shifted], we can obtain a correct value for LFSR2 after only 4 bytes of plaintext, and a pair of possible
values for LFSR2 after only 3 bytes of plaintext.

Once we know LFSR1 and LFSR2 we can decrypt the rest of the text, but I will show that we can also *backtrack* the LFSRs
to the start of the block, without reference to the plaintext or the ciphertext. Since the LFSR states at the start of a
block are simply the key bits, this allows us to extract the key, and completes the plaintext attack.

Suppose we have a plaintext sequence **P[]** and a ciphertext sequence **C[]**. The decryption proceeds as follows:

P[0] = TABLE[C[0]] ^ LFSRBYTE

Trivially, then, we have:
LFSRBYTE = TABLE[C[0]] ^ P[0]

so we can calculate the output sequence from the LFSR device.
Let's look at the LFSR device - this actually consists of two sub-devices. One is the pair of LFSRs, and the other is the
combiner which takes the top 8 bits, XORs them (depending on the mode) and adds them together. The procedure is:

sum = qponmlkj ^ H1 + YXWVUTSR ^ H2 + CC
CC' = sum[8]
LFSRBYTE = sum[7:0]

Now, assuming we know the value of CC and the value of qponmlkj (and H1, H2 and LFSRBYTE). Then we have:

CC':LFSRBYTE = qponmlkj ^ H1 + YXWVUTSR ^ H2 + CC
YXWVUTSR = (CC':LFSRBYTE - qponmlkj ^ H1 - CC) ^ H2

Examining bit 8 in detail, it must be zero on the left since this equation holds for all bits, not just bits 0 to 7. So if we
*assume* CC' is zero and calculate the left-hand side, if the carry is zero we were right (CC' = zero) and if the carry is one we
were wrong (CC' = 1). Therefore, we have:

CC':YXWVUTSR = (LFSRBYTE - qponmlkj ^ H1 - CC) ^ H2

with the equation valid in 9-bit arithmetic.
So by knowing the starting state of LFSR #1 and CC, and by knowing the plaintext, we have managed to calculate the top 8 bits of
LFSR #2 as they are after the first byte has been decrypted.

Looking at how LFSR #2 changes each step, we see that those 8 bits remain in LFSR #2, unchanged, after the next
two steps. So after only three steps, we know the correct value for all the bits of LFSR #2 except the bottom one. After one more
step, we have a 32-bit "history" of LFSR #2, and the most recent 25 bits are the *exact* value of LFSR #2 after the fourth byte.

With only 4 bytes of plaintext, and knowing the initial state of LFSR #1 and CC, we have obtained the correct state of LFSR #2, and
we can decode the rest of the data from this point.

This is, of course, assuming that all our values of **qponmlkj** were correct. We must verify that this is so. We can do this
most easily by continuing the CSS algorithm, and checking that our bytes all match. It turns out that matching a further 3 bytes
guarantees that your state was correct. It is clear that at least 3 bytes are required, since a search space of 18 bits requires
at least 18 bits of information to select a unique member.

So given 7 bytes of plaintext we can generate the correct CSS state and decrypt everything *after* the plaintext just by searching
the space of LFSR #1 and CC - which is a total of only 18 bits. If your plaintext is at the start of a decrypted section, CC is zero and
bit **i** of LFSR #1 is 1, so the search space is only 16 bits.

If you have even less plaintext, a more efficient alternative to this method is to only attack using 3 bytes of plaintext, and then to guess
the last bit of LFSR #2. This increases the search space to 19 bits, but it uses one less byte for the attack so more bytes will be available
for verification. 6 bytes of plaintext can now select a key exactly, while 5 bytes can give rise to up to 8 different keys, since we have
a search space of 19 bits but only 16 bits of information is available to validate keys.

This 5-byte plaintext attack is important because it can be used to extract a list of all player keys given one player key (since each player
key encodes the same title key, the title key becomes plaintext if you have just one player key). In this case, the plaintext is at the start
of a CSS block, and so the search space is 17 bits. Thus, one DVD can give 2 possibilities for each key. Another DVD gives another set of 2
possibilities for each key. Assuming the intersection of these sets narrows (which I do not prove, but seems reasonable) eventually we will
have either one or zero solutions for each plaintext (zero solutions may occur if a key has been revoked).

You are in the correct state after a 4-byte plaintext attack, but you need to backtrack the state to the start of the block - then you can
extract the original key. You don't, however, need to backtrack CC - since its initial value is known. So you can backtrack the CSS
algorithm to the start, without reference to either the plaintext or the ciphertext!

Backtracking the LFSRs is done as follows in hardware:

old LFSR #1 = p onmlkjih gfedcbaq
^ 0 00000000 0000000n
old LFSR #2 = X WVUTSRQP ONMLKJIH GFEDCBAY
^ 0 00000000 00000000 0000000C
^ 0 00000000 00000000 0000000D
^ 0 00000000 00000000 0000000L

which translates to this after 8 iterations:

old LFSR #1 = i hgfedcba qponmlkj
^ 0 00000000 nmlkjihg
old LFSR #2 = Q PONMLKJI HGFEDCBA YXWVUTSR
^ 0 00000000 00000000 CBAYXWVU
^ 0 00000000 00000000 DCBAYXWV
^ 0 00000000 00000000 LKJIHGFE
^ 0 00000000 00000000 000CBAYX
^ 0 00000000 00000000 000DCBAY
^ 0 00000000 00000000 000LKJIH
^ 0 00000000 00000000 0000CBAY
^ 0 00000000 00000000 0000DCBA
^ 0 00000000 00000000 0000LKJI
^ 0 00000000 00000000 000000CB
^ 0 00000000 00000000 000000DC
^ 0 00000000 00000000 000000LK

This can be expressed in C++ as follows:

// LFSR #1
// extract second term
unsigned long term2 = (LFSR1 >> 6) & 0xFF;
// rotate LFSR #1 left 8 bits
LFSR1 = (LFSR1 << 8) | (LFSR1 >> 9);
// XOR in term 2
LFSR1 ^= term2
// remove junk bits
LFSR1 &= 0x1FFFF;
// LFSR #2
// rotate LFSR #2 left 8 bits
LFSR2 = (LFSR2 << 8) | (LFSR2 >> 17);
// form product term (terms 2 to 4)
unsigned long product = ((LFSR2 >> 3) ^ (LFSR2 >> 4) ^ (LFSR2 >> 12)) & 0xFF;
// XOR in product term for the 4 blocks
LFSR2 ^= product ^ (product >> 3) ^ (product >> 4) ^ (product >> 6);
// remove junk bits
LFSR2 &= 0x1FFFFFF;

If the whole thing has worked, then bit **i** of LFSR #1 and bit **V** of LFSR #2 are both set. The key can now be extracted
from the LFSRs directly.