Create 187. Repeated DNA Sequences.md

2026-07-07 10:55:39 +00:00 · 2019-09-01 15:29:38 +08:00 · 2019-09-01 15:29:38 +08:00 · 1fab8e0087
commit 1fab8e0087
parent 1710b1e2b9
1 changed files with 61 additions and 0 deletions
--- a/solutions/187.
+++ b/solutions/187.
@ -0,0 +1,61 @@
+# [187. Repeated DNA Sequences](https://leetcode.com/problems/repeated-dna-sequences/)
+
+# 思路
+## 思路一
+最直观的思路，从前往后遍历s，并用一个集合substr_set记录所有长度为10的子串集合，遇到重复出现的重复子串就将其加入结果集合output_set中，
+最后转换成数组返回。
+
+> **注意代码for循环中不能写成`for(int i = 0; i <= s.size() - 10; i++)`，
+原因见我之前写的博客[实际应用中关于size_t的一个容易忽略的坑](https://shusentang.github.io/2018/12/08/size-t/)。**
+
+## 思路二
+思路二是针对思路一的改进，因为思路一中的substr_set是字符串型的，所以在进行查找的时候是比较慢的（而且空间复杂度也较大），
+这个集合只是记录某个子串是否重复，所以我们有必要记录完整的子串吗？
+答案当然是不用，如果我们能将每个子串唯一地映射到某个（所占空间更小的）值，那我们只需记录后者就可以了。
+
+由于给定的s中字符只有四种情况，那我们用两个bit就可以把四种情况全部编码了，例如00表示A，01表示C，10表示G，11表示T。
+那么10个字符只需要20位，比一个int值（32bit）所占空间还小，所以我们就可以用一个int值代表某个长度为10的子串，大大减小空间和查找的时间。
+
+# C++
+## 思路一
+``` C++
+class Solution {
+public:
+    vector<string> findRepeatedDnaSequences(string s) {
+        unordered_set<string>substr_set;
+        unordered_set<string>output_set;
+        for(int i = 0; i <= (int)s.size() - 10; i++){ // 不能写成 i <= s.size() - 10
+            string sub_str = s.substr(i, 10);
+            if(substr_set.count(sub_str)) 
+                output_set.insert(sub_str);
+            else
+                substr_set.insert(sub_str);
+        }
+        return vector<string>(output_set.begin(), output_set.end());
+    }
+};
+```
+
+## 思路二
+``` C++
+class Solution {
+public:
+    vector<string> findRepeatedDnaSequences(string s) {
+        unordered_set<int>substr_set;
+        unordered_set<string>output_set;
+        unordered_map<int, int> mp{{'A', 0}, {'C', 1}, {'G', 2}, {'T', 3}}; // 注意学习这种初始化方法
+        
+        int cur = 0; // 当前子串对应的int值
+        for (int i = 0; i < 9; ++i) cur = (cur << 2) | mp[s[i]];
+        for (int i = 9; i < s.size(); i++) {
+            // 先只保留低18位再左移2位, 再更新低两位
+            cur = ((cur & 0x3ffff) << 2) | mp[s[i]];
+            if (substr_set.count(cur)) 
+                output_set.insert(s.substr(i - 9, 10));
+            else substr_set.insert(cur);
+        }
+        return vector<string>(output_set.begin(), output_set.end());
+    }
+};
+```
+