🔑 哈希表与布隆过滤器面试题

Question

1. 什么是哈希表？哈希冲突如何解决？简单

Answer 1

哈希表定义

哈希表是一种通过哈希函数将键映射到数组索引的数据结构，实现O(1)的查找、插入和删除。

哈希函数

将任意大小的数据映射到固定大小的值。好的哈希函数应该：

计算快速
分布均匀（减少冲突）
确定性（相同输入产生相同输出）

哈希冲突解决方法

方法	原理	优点	缺点
链地址法	冲突元素存入链表	简单，删除方便	链表过长性能下降
开放寻址法	寻找下一个空位	缓存友好	删除复杂，容易聚集
再哈希法	使用多个哈希函数	减少冲突	计算开销大

开放寻址法的探测方式

线性探测：h(k, i) = (h(k) + i) % m
二次探测：h(k, i) = (h(k) + i²) % m
双重哈希：h(k, i) = (h1(k) + i * h2(k)) % m

负载因子

负载因子 = 元素数量 / 桶数量。Java HashMap默认负载因子0.75，超过后扩容为2倍。

Answer 2

HashMap底层结构

JDK 1.8之前：数组 + 链表
JDK 1.8及以后：数组 + 链表 + 红黑树

核心参数

初始容量：16（必须是2的幂）
负载因子：0.75
树化阈值：8（链表长度超过8转为红黑树）
树退化阈值：6（红黑树节点少于6退化为链表）

put操作流程

public V put(K key, V value) {
    // 1. 计算hash值
    int hash = hash(key);
    
    // 2. 计算数组索引
    int index = (n - 1) & hash;
    
    // 3. 如果桶为空，直接插入
    if (table[index] == null) {
        table[index] = new Node(hash, key, value, null);
    } else {
        // 4. 桶不为空，处理冲突
        Node e = table[index];
        
        // 4.1 如果key相同，覆盖
        if (e.hash == hash && e.key.equals(key)) {
            e.value = value;
        }
        // 4.2 如果是红黑树，插入树节点
        else if (e instanceof TreeNode) {
            ((TreeNode)e).putTreeVal(this, table, hash, key, value);
        }
        // 4.3 链表插入
        else {
            for (int binCount = 0; ; ++binCount) {
                if (e.next == null) {
                    e.next = new Node(hash, key, value, null);
                    // 链表长度>=8，转为红黑树
                    if (binCount >= TREEIFY_THRESHOLD - 1) {
                        treeifyBin(table, hash);
                    }
                    break;
                }
                if (e.next.hash == hash && e.next.key.equals(key)) {
                    e.next.value = value;
                    break;
                }
                e = e.next;
            }
        }
    }
    
    // 5. 检查是否需要扩容
    if (++size > threshold) {
        resize();
    }
    
    return value;
}

为什么容量是2的幂？

使用位运算 (n-1) & hash 代替取模运算，效率更高。例如：n=16时，n-1=15=0b1111，&运算相当于取hash的低4位。

为什么链表长度8转红黑树？

根据泊松分布，负载因子0.75时，链表长度达到8的概率非常小（约0.00000006）。链表查找O(n)，红黑树O(log n)，8是平衡点。

Answer 3

HashMap vs ConcurrentHashMap

特性	HashMap	ConcurrentHashMap
线程安全	否	是
null键值	允许	不允许
性能	高	稍低
并发度	不支持	支持高并发

ConcurrentHashMap实现原理

JDK 1.7：分段锁（Segment），每个Segment是一个小HashMap，锁粒度为Segment。

JDK 1.8：CAS + synchronized，锁粒度为桶（Node），并发度更高。

JDK 1.8 put操作

public V put(K key, V value) {
    int hash = spread(key.hashCode());
    
    for (Node[] tab = table;;) {
        Node f; int n, i, fh;
        
        // 1. 如果表为空，初始化
        if (tab == null || (n = tab.length) == 0)
            tab = initTable();
        // 2. 如果桶为空，CAS插入
        else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
            if (casTabAt(tab, i, null, new Node(hash, key, value, null)))
                break;
        }
        // 3. 如果正在扩容，帮助扩容
        else if ((fh = f.hash) == MOVED)
            tab = helpTransfer(tab, f);
        // 4. 桶不为空，synchronized锁住桶
        else {
            synchronized (f) {
                // 链表或红黑树插入
                ...
            }
        }
    }
    
    addCount(1L, binCount);
    return null;
}

为什么不允许null？

并发环境下，无法区分"键不存在"和"键对应的值为null"，会产生二义性。

Answer 4

问题描述

给定数组和目标值，找到两个数的索引，使它们的和等于目标值。

解法：哈希表

public int[] twoSum(int[] nums, int target) {
    Map map = new HashMap<>();
    
    for (int i = 0; i < nums.length; i++) {
        int complement = target - nums[i];
        if (map.containsKey(complement)) {
            return new int[] {map.get(complement), i};
        }
        map.put(nums[i], i);
    }
    
    return new int[] {};
}

// 时间复杂度：O(n)
// 空间复杂度：O(n)

为什么用哈希表？

哈希表提供O(1)的查找时间，避免了暴力法的O(n²)时间复杂度。

Answer 5

解法1：哈希表

public int firstUniqChar(String s) {
    Map count = new HashMap<>();
    
    // 统计每个字符出现次数
    for (char c : s.toCharArray()) {
        count.put(c, count.getOrDefault(c, 0) + 1);
    }
    
    // 找到第一个出现一次的字符
    for (int i = 0; i < s.length(); i++) {
        if (count.get(s.charAt(i)) == 1) {
            return i;
        }
    }
    
    return -1;
}

// 时间复杂度：O(n)
// 空间复杂度：O(1) - 最多26个字母

解法2：数组（仅小写字母）

public int firstUniqChar(String s) {
    int[] count = new int[26];
    
    for (char c : s.toCharArray()) {
        count[c - 'a']++;
    }
    
    for (int i = 0; i < s.length(); i++) {
        if (count[s.charAt(i) - 'a'] == 1) {
            return i;
        }
    }
    
    return -1;
}

Answer 6

字母异位词定义

两个字符串包含相同的字符，但顺序不同。例如："anagram"和"nagaram"。

解法1：哈希表

public boolean isAnagram(String s, String t) {
    if (s.length() != t.length()) return false;
    
    Map count = new HashMap<>();
    
    for (char c : s.toCharArray()) {
        count.put(c, count.getOrDefault(c, 0) + 1);
    }
    
    for (char c : t.toCharArray()) {
        count.put(c, count.getOrDefault(c, 0) - 1);
        if (count.get(c) < 0) return false;
    }
    
    return true;
}

// 时间复杂度：O(n)
// 空间复杂度：O(1)

解法2：排序

public boolean isAnagram(String s, String t) {
    if (s.length() != t.length()) return false;
    
    char[] sArr = s.toCharArray();
    char[] tArr = t.toCharArray();
    Arrays.sort(sArr);
    Arrays.sort(tArr);
    
    return Arrays.equals(sArr, tArr);
}

// 时间复杂度：O(n log n)
// 空间复杂度：O(1)

Answer 7

问题描述

给定未排序数组，找出最长连续序列的长度。要求O(n)时间复杂度。

例如：[100, 4, 200, 1, 3, 2] → 最长连续序列[1,2,3,4]，长度4。

解法：哈希集合

public int longestConsecutive(int[] nums) {
    Set set = new HashSet<>();
    for (int num : nums) {
        set.add(num);
    }
    
    int maxLen = 0;
    
    for (int num : set) {
        // 只从序列起点开始计数
        if (!set.contains(num - 1)) {
            int currentNum = num;
            int currentLen = 1;
            
            while (set.contains(currentNum + 1)) {
                currentNum++;
                currentLen++;
            }
            
            maxLen = Math.max(maxLen, currentLen);
        }
    }
    
    return maxLen;
}

// 时间复杂度：O(n)
// 空间复杂度：O(n)

关键点

只从序列起点（num-1不存在）开始计数，避免重复计算，保证O(n)时间复杂度。

Answer 8

一致性哈希定义

一种特殊的哈希算法，在节点增减时，只需要重新映射少量数据，而不是全部数据。

传统哈希的问题

使用 hash(key) % N 分配节点，当N变化时，几乎所有数据都需要重新映射。

一致性哈希原理

将哈希空间组织成一个环（0 ~ 2³²-1）
将节点和数据都映射到环上
数据顺时针找到第一个节点
节点增减只影响相邻节点的数据

虚拟节点

为解决数据分布不均问题，每个物理节点对应多个虚拟节点。

Java实现

public class ConsistentHash {
    private TreeMap circle = new TreeMap<>();
    private int virtualNodes = 150;  // 每个节点的虚拟节点数
    
    public void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            String virtualNode = node + "#" + i;
            int hash = hash(virtualNode);
            circle.put(hash, node);
        }
    }
    
    public void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            String virtualNode = node + "#" + i;
            int hash = hash(virtualNode);
            circle.remove(hash);
        }
    }
    
    public String getNode(String key) {
        if (circle.isEmpty()) return null;
        
        int hash = hash(key);
        // 找到第一个大于等于hash的节点
        Map.Entry entry = circle.ceilingEntry(hash);
        
        // 如果没找到，返回第一个节点（环形）
        if (entry == null) {
            entry = circle.firstEntry();
        }
        
        return entry.getValue();
    }
    
    private int hash(String key) {
        return Math.abs(key.hashCode());
    }
}

应用场景

分布式缓存：Memcached、Redis Cluster
负载均衡：Nginx、Dubbo
分布式存储：Cassandra、DynamoDB

Answer 9

布隆过滤器定义

一种空间效率极高的概率型数据结构，用于判断元素是否在集合中。

特点

可能误判：说存在可能不存在（假阳性）
不会漏判：说不存在一定不存在
不支持删除：删除可能影响其他元素
空间小：不存储元素本身

工作原理

1. 使用k个哈希函数
2. 插入时，将k个位置置为1
3. 查询时，检查k个位置是否都为1

Java实现

public class BloomFilter {
    private BitSet bitSet;
    private int size;
    private int hashFunctions;
    
    public BloomFilter(int size, int hashFunctions) {
        this.size = size;
        this.hashFunctions = hashFunctions;
        this.bitSet = new BitSet(size);
    }
    
    public void add(String element) {
        for (int i = 0; i < hashFunctions; i++) {
            int hash = hash(element, i);
            bitSet.set(hash % size);
        }
    }
    
    public boolean mightContain(String element) {
        for (int i = 0; i < hashFunctions; i++) {
            int hash = hash(element, i);
            if (!bitSet.get(hash % size)) {
                return false;
            }
        }
        return true;
    }
    
    private int hash(String element, int seed) {
        int hash = 0;
        for (char c : element.toCharArray()) {
            hash = hash * seed + c;
        }
        return Math.abs(hash);
    }
}

参数选择

位数组大小 m = -n * ln(p) / (ln(2))²
哈希函数个数 k = (m / n) * ln(2)
其中n是元素数量，p是误判率。

应用场景

缓存穿透：判断key是否存在，避免查询不存在的数据
垃圾邮件过滤：判断邮件地址是否在黑名单
爬虫去重：判断URL是否已爬取
比特币：快速查询交易是否存在

Answer 10

RandomizedSet设计

使用哈希表 + 动态数组实现。

class RandomizedSet {
    private Map map;  // 值 -> 索引
    private List list;         // 存储值
    private Random random;
    
    public RandomizedSet() {
        map = new HashMap<>();
        list = new ArrayList<>();
        random = new Random();
    }
    
    public boolean insert(int val) {
        if (map.containsKey(val)) {
            return false;
        }
        map.put(val, list.size());
        list.add(val);
        return true;
    }
    
    public boolean remove(int val) {
        if (!map.containsKey(val)) {
            return false;
        }
        
        // 将要删除的元素与最后一个元素交换
        int index = map.get(val);
        int lastVal = list.get(list.size() - 1);
        
        list.set(index, lastVal);
        map.put(lastVal, index);
        
        // 删除最后一个元素
        list.remove(list.size() - 1);
        map.remove(val);
        
        return true;
    }
    
    public int getRandom() {
        return list.get(random.nextInt(list.size()));
    }
}

// 所有操作：O(1)

关键点

删除时，将要删除的元素与最后一个元素交换，然后删除最后一个元素，保证O(1)时间复杂度。

Answer 11

LFU（Least Frequently Used）

淘汰访问频率最低的元素。频率相同时，淘汰最久未使用的。

数据结构设计

keyToVal：key → value
keyToFreq：key → 频率
freqToKeys：频率 → 该频率的key列表（LinkedHashSet保持插入顺序）
minFreq：当前最小频率

class LFUCache {
    private Map keyToVal;
    private Map keyToFreq;
    private Map> freqToKeys;
    private int capacity;
    private int minFreq;
    
    public LFUCache(int capacity) {
        this.capacity = capacity;
        keyToVal = new HashMap<>();
        keyToFreq = new HashMap<>();
        freqToKeys = new HashMap<>();
        minFreq = 0;
    }
    
    public int get(int key) {
        if (!keyToVal.containsKey(key)) {
            return -1;
        }
        increaseFreq(key);
        return keyToVal.get(key);
    }
    
    public void put(int key, int value) {
        if (capacity <= 0) return;
        
        if (keyToVal.containsKey(key)) {
            keyToVal.put(key, value);
            increaseFreq(key);
            return;
        }
        
        if (keyToVal.size() >= capacity) {
            removeMinFreqKey();
        }
        
        keyToVal.put(key, value);
        keyToFreq.put(key, 1);
        freqToKeys.computeIfAbsent(1, k -> new LinkedHashSet<>()).add(key);
        minFreq = 1;
    }
    
    private void increaseFreq(int key) {
        int freq = keyToFreq.get(key);
        keyToFreq.put(key, freq + 1);
        
        freqToKeys.get(freq).remove(key);
        if (freqToKeys.get(freq).isEmpty()) {
            freqToKeys.remove(freq);
            if (minFreq == freq) {
                minFreq++;
            }
        }
        
        freqToKeys.computeIfAbsent(freq + 1, k -> new LinkedHashSet<>()).add(key);
    }
    
    private void removeMinFreqKey() {
        LinkedHashSet keys = freqToKeys.get(minFreq);
        int deletedKey = keys.iterator().next();
        keys.remove(deletedKey);
        
        if (keys.isEmpty()) {
            freqToKeys.remove(minFreq);
        }
        
        keyToVal.remove(deletedKey);
        keyToFreq.remove(deletedKey);
    }
}

// get和put：O(1)

Answer 12

滑动窗口 + 哈希表

public List findAnagrams(String s, String p) {
    List result = new ArrayList<>();
    if (s.length() < p.length()) return result;
    
    int[] pCount = new int[26];
    int[] sCount = new int[26];
    
    // 统计p的字符频率
    for (char c : p.toCharArray()) {
        pCount[c - 'a']++;
    }
    
    // 滑动窗口
    for (int i = 0; i < s.length(); i++) {
        sCount[s.charAt(i) - 'a']++;
        
        // 窗口大小超过p，移除左边字符
        if (i >= p.length()) {
            sCount[s.charAt(i - p.length()) - 'a']--;
        }
        
        // 比较两个数组
        if (Arrays.equals(sCount, pCount)) {
            result.add(i - p.length() + 1);
        }
    }
    
    return result;
}

// 时间复杂度：O(n)
// 空间复杂度：O(1)

Answer 13

跳表实现（已在链表章节详细讲解）

跳表是一种基于有序链表的数据结构，通过多级索引加速查找。

核心操作

查找：从最高层开始，逐层向下
插入：随机决定层数，插入各层
删除：从各层删除节点

时间复杂度

查找、插入、删除：O(log n)

应用

Redis有序集合（Sorted Set）
LevelDB/RocksDB的MemTable

Answer 14

功能要求

发推文
关注/取关用户
获取最新10条推文（自己和关注的人）

设计思路

用户 → 关注列表（Set）
用户 → 推文列表（List）
推文包含：id、userId、时间戳

class Twitter {
    private static int timestamp = 0;
    private Map> followMap;  // userId → 关注列表
    private Map> tweetMap;    // userId → 推文列表
    
    class Tweet {
        int id, time, userId;
        Tweet(int id, int userId) {
            this.id = id;
            this.userId = userId;
            this.time = timestamp++;
        }
    }
    
    public Twitter() {
        followMap = new HashMap<>();
        tweetMap = new HashMap<>();
    }
    
    public void postTweet(int userId, int tweetId) {
        tweetMap.computeIfAbsent(userId, k -> new ArrayList<>())
                .add(new Tweet(tweetId, userId));
    }
    
    public List getNewsFeed(int userId) {
        PriorityQueue pq = new PriorityQueue<>((a, b) -> b.time - a.time);
        
        // 添加自己的推文
        if (tweetMap.containsKey(userId)) {
            pq.addAll(tweetMap.get(userId));
        }
        
        // 添加关注用户的推文
        Set followees = followMap.getOrDefault(userId, new HashSet<>());
        for (int followee : followees) {
            if (tweetMap.containsKey(followee)) {
                pq.addAll(tweetMap.get(followee));
            }
        }
        
        // 取最新10条
        List result = new ArrayList<>();
        for (int i = 0; i < 10 && !pq.isEmpty(); i++) {
            result.add(pq.poll().id);
        }
        
        return result;
    }
    
    public void follow(int followerId, int followeeId) {
        if (followerId == followeeId) return;
        followMap.computeIfAbsent(followerId, k -> new HashSet<>()).add(followeeId);
    }
    
    public void unfollow(int followerId, int followeeId) {
        if (followMap.containsKey(followerId)) {
            followMap.get(followerId).remove(followeeId);
        }
    }
}

Answer 15

AllOne数据结构

支持O(1)时间的inc、dec、getMaxKey、getMinKey操作。

设计思路

双向链表：按计数排序的桶
每个桶存储相同计数的所有key
哈希表：key → 所在桶的指针

class AllOne {
    class Bucket {
        int count;
        Set keys;
        Bucket prev, next;
        
        Bucket(int count) {
            this.count = count;
            this.keys = new HashSet<>();
        }
    }
    
    private Map keyToBucket;
    private Bucket head, tail;
    
    public AllOne() {
        keyToBucket = new HashMap<>();
        head = new Bucket(0);
        tail = new Bucket(0);
        head.next = tail;
        tail.prev = head;
    }
    
    public void inc(String key) {
        if (keyToBucket.containsKey(key)) {
            Bucket bucket = keyToBucket.get(key);
            bucket.keys.remove(key);
            
            Bucket nextBucket;
            if (bucket.next.count == bucket.count + 1) {
                nextBucket = bucket.next;
            } else {
                nextBucket = new Bucket(bucket.count + 1);
                insertAfter(bucket, nextBucket);
            }
            
            nextBucket.keys.add(key);
            keyToBucket.put(key, nextBucket);
            
            if (bucket.keys.isEmpty()) {
                remove(bucket);
            }
        } else {
            Bucket firstBucket;
            if (head.next.count == 1) {
                firstBucket = head.next;
            } else {
                firstBucket = new Bucket(1);
                insertAfter(head, firstBucket);
            }
            
            firstBucket.keys.add(key);
            keyToBucket.put(key, firstBucket);
        }
    }
    
    public void dec(String key) {
        if (!keyToBucket.containsKey(key)) return;
        
        Bucket bucket = keyToBucket.get(key);
        bucket.keys.remove(key);
        
        if (bucket.count == 1) {
            keyToBucket.remove(key);
        } else {
            Bucket prevBucket;
            if (bucket.prev.count == bucket.count - 1) {
                prevBucket = bucket.prev;
            } else {
                prevBucket = new Bucket(bucket.count - 1);
                insertAfter(bucket.prev, prevBucket);
            }
            
            prevBucket.keys.add(key);
            keyToBucket.put(key, prevBucket);
        }
        
        if (bucket.keys.isEmpty()) {
            remove(bucket);
        }
    }
    
    public String getMaxKey() {
        return tail.prev == head ? "" : tail.prev.keys.iterator().next();
    }
    
    public String getMinKey() {
        return head.next == tail ? "" : head.next.keys.iterator().next();
    }
    
    private void insertAfter(Bucket bucket, Bucket newBucket) {
        newBucket.prev = bucket;
        newBucket.next = bucket.next;
        bucket.next.prev = newBucket;
        bucket.next = newBucket;
    }
    
    private void remove(Bucket bucket) {
        bucket.prev.next = bucket.next;
        bucket.next.prev = bucket.prev;
    }
}

// 所有操作：O(1)

🔑 哈希表与布隆过滤器面试题

📚 哈希表核心知识

哈希表定义

哈希函数

哈希冲突解决方法

开放寻址法的探测方式

负载因子

HashMap底层结构

核心参数

put操作流程

为什么容量是2的幂？

为什么链表长度8转红黑树？

HashMap vs ConcurrentHashMap

ConcurrentHashMap实现原理

JDK 1.8 put操作

为什么不允许null？

问题描述

解法：哈希表

为什么用哈希表？

解法1：哈希表

解法2：数组（仅小写字母）

字母异位词定义

解法1：哈希表

解法2：排序

问题描述

解法：哈希集合

关键点

一致性哈希定义

传统哈希的问题

一致性哈希原理

虚拟节点

Java实现

应用场景

布隆过滤器定义

特点

工作原理

Java实现

参数选择

应用场景

RandomizedSet设计

关键点

LFU（Least Frequently Used）

数据结构设计

滑动窗口 + 哈希表

跳表实现（已在链表章节详细讲解）

核心操作

时间复杂度

应用

功能要求

设计思路

AllOne数据结构

设计思路