LCS LeetCode: Solve Longest Common Subsequence Problems
Hey guys! Ever stumbled upon a problem that just makes you scratch your head? The Longest Common Subsequence (LCS) problem is one of those classic computer science puzzles that appears frequently in coding interviews and algorithm challenges like those on LeetCode. In this article, we're going to break down what LCS is all about, how to approach it, and walk through some examples to make sure you've got a solid grasp on it.
What is the Longest Common Subsequence?
So, what exactly is the Longest Common Subsequence? Imagine you have two strings, let's call them text1 and text2. A subsequence is a sequence of characters that appear in the same order in both strings, but they don't necessarily have to be consecutive. The common subsequence is simply a subsequence that's present in both strings. Now, the longest common subsequence is, you guessed it, the longest possible subsequence that both strings share. For example, if text1 = "ABCDGH" and text2 = "AEDFHR", the LCS is "ADH". Notice that the characters 'A', 'D', and 'H' appear in both strings in that order, but not necessarily next to each other.
The LCS problem is a fundamental concept in computer science with applications in various fields like bioinformatics (comparing DNA sequences), text comparison (finding similarities between documents), and data compression. Understanding how to efficiently find the LCS of two strings is a valuable skill for any programmer.
How to Solve the LCS Problem
Alright, let's dive into how we can actually solve this problem. The most common and efficient way to find the Longest Common Subsequence is by using dynamic programming. Dynamic programming is a technique where you break down a larger problem into smaller overlapping subproblems, solve those subproblems, and store their results to avoid recomputing them later. This approach is perfect for LCS because the length of the LCS between prefixes of the two strings can be used to determine the length of the LCS for the entire strings.
Here's the general idea:
- Create a Table: We'll create a 2D table (or matrix) called
dpwheredp[i][j]will store the length of the LCS oftext1[0...i-1]andtext2[0...j-1]. The table will have dimensions(m+1) x (n+1), wheremis the length oftext1andnis the length oftext2. We add 1 to each dimension to accommodate the base case of empty prefixes. - Initialize the Table: The first row and first column of the
dptable are initialized to 0. This is because the LCS of any string with an empty string is always an empty string (length 0). - Fill the Table: Now, we iterate through the table, filling each cell based on the following rules:
- If
text1[i-1] == text2[j-1], it means the characters at the current positions in both strings match. In this case, we increment the length of the LCS found so far by 1. So,dp[i][j] = dp[i-1][j-1] + 1. - If
text1[i-1] != text2[j-1], it means the characters at the current positions don't match. In this case, we take the maximum length of the LCS found by either excluding the current character fromtext1or excluding the current character fromtext2. So,dp[i][j] = max(dp[i-1][j], dp[i][j-1]).
- If
- Result: After filling the entire table, the value in
dp[m][n]will represent the length of the LCS of the entire stringstext1andtext2.
Pseudocode
To make it even clearer, here's some pseudocode:
function longestCommonSubsequence(text1, text2):
m = length(text1)
n = length(text2)
# Create a table to store lengths of LCS for subproblems
dp = a 2D array of size (m+1) x (n+1)
# Initialize first row and column to 0
for i from 0 to m:
dp[i][0] = 0
for j from 0 to n:
dp[0][j] = 0
# Fill the table in bottom-up manner
for i from 1 to m:
for j from 1 to n:
if text1[i-1] == text2[j-1]:
dp[i][j] = dp[i-1][j-1] + 1
else:
dp[i][j] = max(dp[i-1][j], dp[i][j-1])
# Return the length of LCS
return dp[m][n]
Example Walkthrough
Let's walk through an example to solidify our understanding. Suppose text1 = "ABCBDAB" and text2 = "BDCABA". We'll create our dp table and fill it in step by step.
| B | D | C | A | B | A | ||
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
| A | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
| B | 0 | 1 | 1 | 1 | 1 | 2 | 2 |
| C | 0 | 1 | 1 | 2 | 2 | 2 | 2 |
| B | 0 | 1 | 1 | 2 | 2 | 3 | 3 |
| D | 0 | 1 | 2 | 2 | 2 | 3 | 3 |
| A | 0 | 1 | 2 | 2 | 3 | 3 | 4 |
| B | 0 | 1 | 2 | 2 | 3 | 4 | 4 |
Here's how we fill the table:
dp[0][0]todp[0][6]anddp[0][0]todp[7][0]are initialized to 0.- For
dp[1][4],text1[0] = 'A'andtext2[3] = 'A'are equal, sodp[1][4] = dp[0][3] + 1 = 0 + 1 = 1. - For
dp[2][1],text1[1] = 'B'andtext2[0] = 'B'are equal, sodp[2][1] = dp[1][0] + 1 = 0 + 1 = 1. - For
dp[2][5],text1[1] = 'B'andtext2[4] = 'B'are equal, sodp[2][5] = dp[1][4] + 1 = 1 + 1 = 2. - For
dp[3][3],text1[2] = 'C'andtext2[2] = 'C'are equal, sodp[3][3] = dp[2][2] + 1 = 1 + 1 = 2. - For
dp[4][5],text1[3] = 'B'andtext2[4] = 'B'are equal, sodp[4][5] = dp[3][4] + 1 = 2 + 1 = 3. - For
dp[5][2],text1[4] = 'D'andtext2[1] = 'D'are equal, sodp[5][2] = dp[4][1] + 1 = 1 + 1 = 2. - For
dp[6][4],text1[5] = 'A'andtext2[3] = 'A'are equal, sodp[6][4] = dp[5][3] + 1 = 2 + 1 = 3. - For
dp[7][5],text1[6] = 'B'andtext2[4] = 'B'are equal, sodp[7][5] = dp[6][4] + 1 = 3 + 1 = 4.
Finally, dp[7][6] = 4, which means the length of the Longest Common Subsequence of "ABCBDAB" and "BDCABA" is 4. One possible LCS is "BCAB" or "BDAB".
LeetCode Problem Example
Let's look at a typical LeetCode problem:
Problem: Given two strings text1 and text2, return the length of their longest common subsequence. If there is no common subsequence, return 0.
Example 1:
Input: text1 = "abcde", text2 = "ace"
Output: 3
Explanation: The longest common subsequence is "ace" and its length is 3.
Example 2:
Input: text1 = "abc", text2 = "abc"
Output: 3
Explanation: The longest common subsequence is "abc" and its length is 3.
Example 3:
Input: text1 = "abc", text2 = "def"
Output: 0
Explanation: There is no such common subsequence, so the result is 0.
Solution (Python)
Here's a Python solution to the LeetCode problem:
def longestCommonSubsequence(text1: str, text2: str) -> int:
m, n = len(text1), len(text2)
dp = [[0] * (n + 1) for _ in range(m + 1)]
for i in range(1, m + 1):
for j in range(1, n + 1):
if text1[i - 1] == text2[j - 1]:
dp[i][j] = dp[i - 1][j - 1] + 1
else:
dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
return dp[m][n]
This Python code implements the dynamic programming approach we discussed earlier. It creates the dp table, initializes it, fills it based on whether the characters match or not, and then returns the final result.
Optimizations and Considerations
While the dynamic programming approach is efficient, there are a few optimizations we can consider:
- Space Optimization: The
dptable usesO(m*n)space. However, we only need the previous row to calculate the current row. So, we can reduce the space complexity toO(min(m, n))by using only two rows (or even one row with careful updates). - Finding the Actual LCS: The code above only finds the length of the LCS. If you need to find the actual LCS string, you can backtrack through the
dptable fromdp[m][n]todp[0][0], reconstructing the LCS based on the decisions made during the table filling process.
Common Mistakes
- Incorrect Initialization: Forgetting to initialize the first row and column of the
dptable to 0 can lead to incorrect results. - Off-by-One Errors: Be careful with indexing when comparing characters in the strings. Remember that Python uses 0-based indexing, and the
dptable has an extra row and column. - Not Understanding the Recursive Relation: Make sure you fully understand the recursive relation
dp[i][j] = dp[i-1][j-1] + 1(if characters match) anddp[i][j] = max(dp[i-1][j], dp[i][j-1])(if characters don't match).
Conclusion
The Longest Common Subsequence (LCS) problem is a classic example of how dynamic programming can be used to solve complex problems efficiently. By understanding the underlying principles and practicing with examples, you'll be well-equipped to tackle LCS problems on LeetCode and in coding interviews. Keep practicing, and you'll become an LCS master in no time! Good luck, and happy coding!