Algorithms, Part I 8-ElementarySymbolTables

8-ElementarySymbolTables

符号表

以 Key-value pair 的形式存储数据

  • 插入 Key-value
  • 给定 Key 搜索对应的 value

API

  • get() 如果键不存在则返回 null
  • put() 方法会用新值覆盖旧值
  • delete() 只需要将值设为 null 就行

image-20220415164541242

Key 特性

  • Comparable,需要实现compareTo()
  • key 是泛型,实现 equals()来比较不同的 key 是否等价

equals()方法的实现

除了要考虑等价关系:自反、对称、传递以外还要求键和值是非空的

例如,实现日期Date类的equals()方法:

image-20220415164621753

通用的设计法则:

image-20220415164639055

符号表的基本实现

无序链表

  • 查找需要遍历链表
  • 插入时先遍历,如果没有键则在开头加上这个键
  • 毫无疑问这样的实现效率很低

有序数组

  • rank()通过二分查找找到给定 key 按顺序排第几(从0开始)
  • 插入,需要移动所有较大的键,效率较低
  • image-20220415164750277

有序符号表 API 扩展

image-20220415164857141

复杂度:

image-20220415164909957

  • 可以看到插入删除的复杂度仍然不够低

二叉搜索树 BST

  • 定义:一棵二叉查找树(BST)是一棵二叉树,其中每个结点都含有一个 Comparable 的键(以
    及相关联的值)且每个结点的键都大于其左子树中的任意结点的键而小于右子树的任意结点
    的键
  • 和 MaxPQ / MinPQ 的区别:最大最小堆只要求某个节点大于或小于它的两个子结点。

搜索

    public Value get(Key key) {
        return get(root, key);
    }

    private Value get(Node x, Key key) {
        if (key == null) throw new IllegalArgumentException("calls get() with a null key");
        if (x == null) return null;
        int cmp = key.compareTo(x.key);
        if      (cmp < 0) return get(x.left, key);
        else if (cmp > 0) return get(x.right, key);
        else              return x.val;
    }

插入

通过递归实现,注意根结点为空的特殊情况

    public void put(Key key, Value val) {
        root = put(root, key, val);
    }

    private Node put(Node x, Key key, Value val) {
        if (x == null) return new Node(key, val, 1);
        int cmp = key.compareTo(x.key);
        if      (cmp < 0) x.left  = put(x.left,  key, val);
        else if (cmp > 0) x.right = put(x.right, key, val);
        else              x.val   = val;
        x.size = 1 + size(x.left) + size(x.right);
        return x;
    }

BST 不平衡的问题

二叉搜索树不能保证树的平衡,某些极端情况下效率等同于链表

image-20220923145900696

复杂度

  • 命题 :在由 N 个随机键构造的二叉查找树中,查找命中平均所需的比较次数为∼ 2lnN2lnN(约
    1.39lgN1.39lgN),树的高度为 ~ 4.311lnN4.311lnN
  • 证明。由于二叉搜索树和快速排序中的分组有对应关系,所以复杂度也相同(指每次分组的复杂度)。PS. 树的高度最坏情况下为 NN

BST 实现细节

最小值和最大值

最左结点和最右结点

image.png

向上向下取整 Floor and ceiling

Floor

image.png

代码:

    /**
     * Returns the largest key in the symbol table less than or equal to {@code key}.
     *
     * @param  key the key
     * @return the largest key in the symbol table less than or equal to {@code key}
     * @throws NoSuchElementException if there is no such key
     * @throws IllegalArgumentException if {@code key} is {@code null}
     */
    public Key floor(Key key) {
        if (key == null) throw new IllegalArgumentException("argument to floor() is null");
        if (isEmpty()) throw new NoSuchElementException("calls floor() with empty symbol table");
        Node x = floor(root, key);
        if (x == null) throw new NoSuchElementException("argument to floor() is too small");
        else return x.key;
    } 

    private Node floor(Node x, Key key) {
        if (x == null) return null;
        int cmp = key.compareTo(x.key);
        if (cmp == 0) return x;
        if (cmp <  0) return floor(x.left, key);
        Node t = floor(x.right, key); 
        if (t != null) return t;
        else return x; 
    } 

Ceiling

floor()类似

    /**
     * Returns the smallest key in the symbol table greater than or equal to {@code key}.
     *
     * @param  key the key
     * @return the smallest key in the symbol table greater than or equal to {@code key}
     * @throws NoSuchElementException if there is no such key
     * @throws IllegalArgumentException if {@code key} is {@code null}
     */
    public Key ceiling(Key key) {
        if (key == null) throw new IllegalArgumentException("argument to ceiling() is null");
        if (isEmpty()) throw new NoSuchElementException("calls ceiling() with empty symbol table");
        Node x = ceiling(root, key);
        if (x == null) throw new NoSuchElementException("argument to floor() is too large");
        else return x.key;
    }

    private Node ceiling(Node x, Key key) {
        if (x == null) return null;
        int cmp = key.compareTo(x.key);
        if (cmp == 0) return x;
        if (cmp < 0) { 
            Node t = ceiling(x.left, key); 
            if (t != null) return t;
            else return x; 
        } 
        return ceiling(x.right, key); 
    } 

子树结点计数 Count

便于实现 rank ,增加一个变量用于记录结点及其子树的结点总个数

put()中进行修改即可

image.png

秩 Rank

删除最小值和最大值

删除最小值的步骤:

  • 找到最左的左结点为null的结点
  • 将这个结点替换为它的右子结点
  • 更新子树的count
    /**
     * Removes the smallest key and associated value from the symbol table.
     *
     * @throws NoSuchElementException if the symbol table is empty
     */
    public void deleteMin() {
        if (isEmpty()) throw new NoSuchElementException("Symbol table underflow");
        root = deleteMin(root);
        assert check();
    }

    private Node deleteMin(Node x) {
        if (x.left == null) return x.right;
        x.left = deleteMin(x.left);
        x.size = size(x.left) + size(x.right) + 1;
        return x;
    }

删除最大值的步骤同理。

删除某个特定的值 Hibberd Deletion

步骤:

  • 如果没有子结点,直接删除
  • 如果有一个子结点,这个结点删除,替换成它的唯一子树
  • 如果有两个子结点,删除左子树的最大值或者右子树的最小值
    /**
     * Removes the specified key and its associated value from this symbol table   
     * (if the key is in this symbol table).  
     *
     * @param  key the key
     * @throws IllegalArgumentException if {@code key} is {@code null}
     */
    public void delete(Key key) {
        if (key == null) throw new IllegalArgumentException("calls delete() with a null key");
        root = delete(root, key);
        assert check();
    }

    private Node delete(Node x, Key key) {
        if (x == null) return null;

        int cmp = key.compareTo(x.key);
        if      (cmp < 0) x.left  = delete(x.left,  key);
        else if (cmp > 0) x.right = delete(x.right, key);
        else { 
            if (x.right == null) return x.left;
            if (x.left  == null) return x.right;
            Node t = x;
            x = min(t.right);
            x.right = deleteMin(t.right);
            x.left = t.left;
        } 
        x.size = size(x.left) + size(x.right) + 1;
        return x;
    } 

问题

这样的操作最终会导致二叉树不对称,删除的复杂度将上升至 N\sqrt{N}

改进方案:下一节将要讲的平衡二叉树

复杂度总结

img