Database&ADS system Project,homework #238

Merged
merged 5 commits into from
Jun 21, 2022
Binary file added 数据库系统原理/quiz/chenganglaoshi.jpg
Binary file not shown.
20 changes: 20 additions & 0 deletions 数据库系统原理/作业/homework/homework1.md
@@ -0,0 +1,20 @@
1.7

Taobao, QQ, WeChat, 学在浙大 (Learning at ZJU)

1.8

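1. Both systems contain a collection of data and a set of programs that access that data. A database management system coordinates both physical and logical access to the data, whereas a file-processing system coordinates only physical access.
2. A DBMS makes it easy for multiple programs to share data, whereas in a file-processing system data written by one program may not be readable by another.
3. A DBMS is designed to allow flexible access to data (i.e., queries), whereas a file-processing system is designed to allow predetermined access (i.e., compiled programs).
4. A DBMS can use mechanisms such as transactions to coordinate multiple users accessing the same data at the same time. A file-processing system generally does not allow several programs to access one data file concurrently.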

1.15

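username, encrypted password

username, followed users

username, user information (gender, birthday)

Grader's note: the answer to 1.15 lists three attributes of a single table; it should list three tables.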
88 changes: 88 additions & 0 deletions 数据库系统原理/作业/homework/homework10 查询处理.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,88 @@
15.2

Consider the bank database of Figure 15.14, where the primary keys are underlined, and the following SQL query:

```sql
select T.branch_name
from branch T, branch S
where T.assets > S.assets and S.branch_city = 'Brooklyn'
```

Write an efficient relational-algebra expression that is equivalent to this query. Justify your choice.

Query:

![image-20220514172121480](/Users/juyilin/Library/Application Support/typora-user-images/image-20220514172121480.png)

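We want the theta join to run over as little data as possible: first select the Brooklyn branches, then join, projecting away the attributes that are not needed.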
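The pushed-down plan described above can be checked against the query as written with a minimal Python sketch. The sample rows and the function names `naive` and `pushed_down` are hypothetical and only illustrate the idea; this is not the expression from the missing image.

```python
# Hypothetical sample rows: (branch_name, branch_city, assets)
branch = [
    ("Downtown", "Brooklyn", 900),
    ("Brighton", "Brooklyn", 700),
    ("Perryridge", "Horseneck", 800),
    ("Redwood", "Palo Alto", 600),
]

def naive(rows):
    # Cross product first, then filter: mirrors the SQL query as written.
    return sorted({t[0] for t in rows for s in rows
                   if t[2] > s[2] and s[1] == "Brooklyn"})

def pushed_down(rows):
    # Select the Brooklyn branches and project both sides before the theta join.
    s_assets = [assets for (_, city, assets) in rows if city == "Brooklyn"]
    t_side = [(name, assets) for (name, _, assets) in rows]
    return sorted({name for (name, assets) in t_side
                   for sa in s_assets if assets > sa})
```

Both functions return the same branch names, but `pushed_down` compares each tuple only against the Brooklyn asset values instead of against every tuple of `branch`.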

15.3

Let relations r1(A, B, C) and r2(C, D, E) have the following properties: r1 has 20,000 tuples, r2 has 45,000 tuples, 25 tuples of r1 fit on one block, and 30 tuples of r2 fit on one block. Estimate the number of block transfers and seeks required using each of the following join strategies for r1 ⋈ r2:

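r1 occupies 20,000 / 25 = 800 blocks and r2 occupies 45,000 / 30 = 1500 blocks. Assume M pages of memory. If M > 800, the join can be computed with 1500 + 800 block accesses, even with a plain nested-loop join, so we only consider M ≤ 800 pages (the worst case).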

a. Nested-loop join.

With r1 as the outer relation, we need nr * bs + br = 20,000 * 1500 + 800 = 30,000,800 block transfers and nr + br = 20,800 seeks.

With r2 as the outer relation, we need nr * bs + br = 45,000 * 800 + 1500 = 36,001,500 block transfers and nr + br = 46,500 seeks.
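The arithmetic above can be reproduced with a short Python sketch. The function name `nested_loop_cost` is hypothetical; the formulas are the ones used in the answer (worst case, one buffer block per relation).

```python
def nested_loop_cost(n_outer, b_outer, b_inner):
    """Worst-case nested-loop join cost.

    n_outer: tuples in the outer relation
    b_outer: blocks in the outer relation
    b_inner: blocks in the inner relation
    """
    transfers = n_outer * b_inner + b_outer  # inner is rescanned once per outer tuple
    seeks = n_outer + b_outer                # one seek per inner scan, one per outer block
    return transfers, seeks
```

Calling `nested_loop_cost(20000, 800, 1500)` gives the figures for r1 as the outer relation, and `nested_loop_cost(45000, 1500, 800)` those for r2 as the outer relation.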

b. Block nested-loop join.

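Memory holds M blocks, and each pass reads M-1 blocks of the outer relation.

The block transfers are shown in the image below. In what follows, [ ] denotes the ceiling (rounding up).

With r1 as the outer relation, the number of seeks is 2 * [ 800/(M-1) ].

With r2 as the outer relation, the number of seeks is 2 * [ 1500/(M-1) ].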
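A minimal sketch of the block nested-loop estimate follows. The seek formula is the one stated above. The transfer formula, ceil(b_outer / (M-1)) * b_inner + b_outer, is an assumption that follows from reading M-1 outer blocks per pass, since the image with the author's figures is not available. The function name is hypothetical.

```python
import math

def block_nested_loop_cost(b_outer, b_inner, M):
    """Block nested-loop join cost with M memory blocks (M >= 2)."""
    chunks = math.ceil(b_outer / (M - 1))     # passes over the inner relation
    transfers = chunks * b_inner + b_outer    # assumed formula, see lead-in
    seeks = 2 * chunks                        # one seek for the outer chunk, one for the inner scan
    return transfers, seeks
```

For example, with a hypothetical M = 101, r1 as the outer relation needs 8 passes over r2 and r2 as the outer relation needs 15 passes over r1.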

c. Merge join.

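Once both relations are sorted, the join takes br + bs block transfers and [br/bb] + [bs/bb] seeks, where bb is the number of buffer blocks allocated to each relation.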

![image-20220514174011881](/Users/juyilin/Library/Application Support/typora-user-images/image-20220514174011881.png)



d. Hash join.

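Without recursive partitioning (nh is the number of partitions):

3(br + bs) + 4 * nh block transfers. The 4 * nh term is small compared with br + bs and can be ignored.

2([br/bb] + [bs/bb]) + 2 * nh seeks.

With recursive partitioning:

2(br + bs)[log_{M-1}(bs) - 1] + br + bs block transfers.

2([br/bb] + [bs/bb])[log_{M-1}(bs) - 1] seeks.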

![image-20220514175540055](/Users/juyilin/Library/Application Support/typora-user-images/image-20220514175540055.png)
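The merge-join and non-recursive hash-join estimates can be written as a small Python sketch. It uses the textbook seek estimate 2(⌈br/bb⌉ + ⌈bs/bb⌉) + 2·nh for the hash join. The function names and the sample values bb = 10 and nh = 20 are hypothetical.

```python
import math

def merge_join_cost(br, bs, bb):
    """Cost of the merge step, assuming both inputs are already sorted."""
    transfers = br + bs
    seeks = math.ceil(br / bb) + math.ceil(bs / bb)
    return transfers, seeks

def hash_join_cost(br, bs, bb, nh):
    """Hash-join cost without recursive partitioning."""
    transfers = 3 * (br + bs) + 4 * nh
    seeks = 2 * (math.ceil(br / bb) + math.ceil(bs / bb)) + 2 * nh
    return transfers, seeks
```

With br = 800 and bs = 1500, the 4·nh term adds only 80 transfers to 6900, which is why the answer treats it as negligible.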

15.6

Consider the bank database of Figure 15.14, where the primary keys are underlined. Suppose that a B+-tree index on branch_city is available on relation branch, and that no other index is available.

List different ways to handle the following selections that involve negation:

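a. σ¬(branch_city < “Brooklyn”)(branch)

Use the index to locate the first tuple with branch_city ≥ “Brooklyn”, then follow the leaf pointers forward to retrieve all remaining tuples.

b. σ¬(branch_city = “Brooklyn”)(branch)

The index is of no help here. Scan the file and return every tuple whose branch_city is not “Brooklyn”.

c. σ¬(branch_city < “Brooklyn” ∨ assets < 5000)(branch)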

![image-20220514180725722](/Users/juyilin/Library/Application Support/typora-user-images/image-20220514180725722.png)

15.20

Estimate the number of block transfers and seeks required by your solution to Exercise 15.19 for r1 ⋈ r2, where r1 and r2 are as defined in Exercise 15.3.

Exercise 15.19 reads: Design a variant of the hybrid merge-join algorithm for the case where both relations are not physically sorted, but both have a sorted secondary index on the join attributes.

@@ -0,0 +1,68 @@
15.6

Consider the bank database of Figure 15.14, where the primary keys are underlined. Suppose that a B+-tree index on branch_city is available on relation branch, and that no other index is available. List different ways to handle the following selections that involve negation:

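a. σ¬(branch_city < “Brooklyn”)(branch)

Use the index to locate the first tuple with branch_city ≥ “Brooklyn”, then follow the leaf pointers forward to retrieve all remaining tuples.

b. σ¬(branch_city = “Brooklyn”)(branch)

The index is of no help here. Scan the file and return every tuple whose branch_city is not “Brooklyn”.

c. σ¬(branch_city < “Brooklyn” ∨ assets < 5000)(branch)

By De Morgan's law, this query is equivalent to σ(branch_city ≥ “Brooklyn” ∧ assets ≥ 5000)(branch).

Use the index to locate the first tuple with branch_city ≥ “Brooklyn”, follow the pointer chain to retrieve all such tuples, and keep only those with assets ≥ 5000.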
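The plan for part (c) can be sketched in Python, using a list sorted on branch_city as a stand-in for the leaf level of the B+-tree. The sample rows and the function name `select_c` are hypothetical.

```python
from bisect import bisect_left

# Hypothetical rows (branch_name, branch_city, assets), sorted by branch_city
# to stand in for the leaf level of the B+-tree index.
branch = sorted([
    ("Brighton", "Brooklyn", 7000),
    ("Downtown", "Brooklyn", 4000),
    ("Mianus", "Albany", 9000),
    ("Perryridge", "Horseneck", 6000),
    ("Redwood", "Palo Alto", 2000),
], key=lambda row: row[1])
cities = [row[1] for row in branch]

def select_c(rows, keys):
    # Index lookup: first position with branch_city >= "Brooklyn".
    start = bisect_left(keys, "Brooklyn")
    # Walk forward along the leaf chain, testing assets on each tuple.
    return [row for row in rows[start:] if row[2] >= 5000]
```

The index handles only the branch_city condition; the assets condition is checked tuple by tuple because no index on assets exists.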

16.5

Consider the relations r1(A, B, C), r2(C, D, E), and r3(E, F), with primary keys A, C, and E, respectively. Assume that r1 has 1000 tuples, r2 has 1500 tuples, and r3 has 750 tuples. Estimate the size of r1 ⋈ r2 ⋈ r3, and give an efficient strategy for computing the join

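Because natural join is commutative and associative, the joins can be evaluated in any order.

For example, first compute r1 ⋈ r2. The result has at most 1000 tuples, because C is a key of r2, so each tuple of r1 matches at most one tuple of r2.

Then join the result with r3. Because the join attribute E is a key of r3, the final result also has at most 1000 tuples.

Efficient strategy:

Build an index on attribute C of r2 and on attribute E of r3.

Scan r1. For each tuple, use the index to find the matching C in r2, then use the E value of that r2 tuple to find the matching tuple in r3.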
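The strategy above can be sketched in Python, with dictionaries standing in for the indices on r2.C and r3.E. The function name and sample tuples are hypothetical.

```python
def three_way_join(r1, r2, r3):
    """Compute r1 ⋈ r2 ⋈ r3 by scanning r1 and probing key indices.

    r1: tuples (a, b, c); r2: tuples (c, d, e), C is a key; r3: tuples (e, f), E is a key.
    """
    r2_by_c = {c: (c, d, e) for (c, d, e) in r2}   # index on r2.C
    r3_by_e = {e: (e, f) for (e, f) in r3}         # index on r3.E
    result = []
    for (a, b, c) in r1:
        t2 = r2_by_c.get(c)
        if t2 is None:
            continue                               # no matching r2 tuple
        t3 = r3_by_e.get(t2[2])
        if t3 is None:
            continue                               # no matching r3 tuple
        result.append((a, b, c, t2[1], t2[2], t3[1]))
    return result
```

Each r1 tuple contributes at most one output tuple, so the result can never exceed the size of r1, which matches the 1000-tuple bound.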

16.16

Suppose that a B+-tree index on (dept_name, building) is available on relation department. What would be the best way to handle the following selection?

σ(building < “Watson”) ∧ (budget < 55000) ∧ (dept_name = “Music”)(department)

Use the index to locate the first entry with dept_name = “Music”, then follow the pointer chain and retrieve entries for as long as dept_name = “Music” and building < “Watson”. For each tuple retrieved, check whether budget < 55000.
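A minimal sketch of this range scan on a composite key follows, with a sorted Python list standing in for the B+-tree leaf level. The rows are hypothetical, and for illustration dept_name is not treated as a key, so several “Music” rows can exist.

```python
from bisect import bisect_left

# Hypothetical rows (dept_name, building, budget), sorted on the composite key.
department = sorted([
    ("Music", "Packard", 50000),
    ("Music", "Taylor", 80000),
    ("Music", "Watson", 40000),
    ("Physics", "Painter", 30000),
    ("History", "Painter", 50000),
], key=lambda row: (row[0], row[1]))
keys = [(row[0], row[1]) for row in department]

def select_music(rows, index_keys):
    lo = bisect_left(index_keys, ("Music", ""))        # first Music entry
    hi = bisect_left(index_keys, ("Music", "Watson"))  # first entry with building >= Watson
    # Only the budget condition has to be tested on the fetched tuples.
    return [row for row in rows[lo:hi] if row[2] < 55000]
```

The composite key lets the scan stop as soon as building reaches “Watson”, so tuples of other departments are never fetched.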

```
branch(branch_name, branch_city, assets)
customer(customer_name, customer_street, customer_city)
loan(loan_number, branch_name, amount)
borrower(customer_name, loan_number)
account(account_number, branch_name, balance)
depositor(customer_name, account_number)

Figure 16.9 Banking database.
```

16.20

Explain how to use a histogram to estimate the size of a selection of the form σA≤v(r).

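If only the minimum and maximum values are known, the estimate is nr * (v - min) / (max - min).

With a histogram, locate the bucket (range) that contains v and refine the estimate using the bucket frequencies:

nr * (sum of the frequencies of the buckets from the lowest one up to the one containing v) / (sum of the frequencies of all buckets)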
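The histogram estimate above can be sketched in Python. The function name, the bucket layout, and the sample numbers are hypothetical. Like the formula in the answer, the sketch counts the bucket containing v in full rather than interpolating inside it.

```python
def estimate_leq(bounds, freqs, v, n_r):
    """Estimate the size of sigma_{A <= v}(r) from a histogram.

    bounds: bucket boundaries, len(bounds) == len(freqs) + 1;
            bucket i covers [bounds[i], bounds[i + 1])
    freqs:  frequency of each bucket
    n_r:    number of tuples in r
    """
    covered = 0
    for i, freq in enumerate(freqs):
        if bounds[i] <= v:       # bucket lies below v or contains v
            covered += freq
    return n_r * covered / sum(freqs)
```

A finer estimate would take only the fraction of the last bucket that lies below v; that refinement is left out here to stay close to the formula given.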
35 changes: 35 additions & 0 deletions 数据库系统原理/作业/homework/homework12 事务管理.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
17.6

Consider the precedence graph of Figure 17.16. Is the corresponding schedule conflict serializable? Explain your answer.

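The schedule is conflict serializable, because the precedence graph has no cycle.

An equivalent serial schedule can be obtained by topological sort, for example T1, T2, T3, T4, T5.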
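The test can be sketched in Python as a topological sort (Kahn's algorithm) over the precedence graph. Figure 17.16 is not reproduced here, so the edge list used in the usage note is hypothetical, as is the function name.

```python
def serial_order(nodes, edges):
    """Return a serial order consistent with the precedence graph, or None if it has a cycle."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for u, v in edges:                    # edge u -> v: u must precede v
        successors[u].append(v)
        indegree[v] += 1
    ready = sorted(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        n = ready.pop(0)
        order.append(n)
        for m in successors[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
        ready.sort()                      # deterministic tie-breaking
    # If some node was never released, the graph contains a cycle.
    return order if len(order) == len(nodes) else None
```

For a hypothetical acyclic graph with edges T1→T2, T1→T3, T2→T4, T3→T4, T4→T5, the function returns a valid serial order; for a graph with T1→T2 and T2→T1 it returns `None`, meaning the schedule is not conflict serializable.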

17.7

What is a cascadeless schedule? Why is cascadelessness of schedules desirable? Are there any circumstances under which it would be desirable to allow noncascadeless schedules? Explain your answer.

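A cascadeless schedule does not allow cascading rollbacks: if transaction Tj reads a data item previously written by Ti, the read is delayed until Ti has committed.

Every cascadeless schedule is also recoverable.

The benefit is that aborting one transaction never forces other transactions to roll back.

The drawback is reduced concurrency.

If failures are rare, it can be worthwhile to allow noncascadeless schedules in exchange for the extra concurrency.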



17.12

List the ACID properties. Explain the usefulness of each.

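A: Atomicity. A transaction either completes entirely or, on failure, is rolled back entirely; no intermediate state is left behind.

C: Consistency. For example, when A transfers money to B, the total A + B is unchanged.

I: Isolation. The operations of each transaction should be invisible to other concurrently running transactions.

D: Durability. The effects of a committed transaction must be recoverable after a crash.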
101 changes: 101 additions & 0 deletions 数据库系统原理/作业/homework/homework13 并发控制.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,101 @@
18.1

Show that the two-phase locking protocol ensures conflict serializability and that transactions can be serialized according to their lock points.

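Suppose two-phase locking does not ensure serializability. Then there exists a set of transactions T0, T1, ..., Tn-1 that obey 2PL and produce a nonserializable schedule.

A nonserializable schedule implies a cycle in the precedence graph, and we will show that 2PL cannot produce such a cycle.

Without loss of generality, assume the precedence graph contains the cycle T0 → T1 → T2 → ... → Tn-1 → T0. Let ai be the time at which Ti obtains its last lock (i.e., the lock point of Ti). For each edge Ti → Tj, Ti must release a conflicting lock before Tj can acquire it. Under 2PL, Ti releases locks only after its lock point, and Tj's lock point is no earlier than that acquisition, so ai < aj. Along the cycle this gives

a0 < a1 < a2 < ... < an-1 < a0

So a0 < a0, a contradiction; hence no such cycle exists.

The same argument shows that whenever Ti → Tj in the precedence graph, ai < aj, so ordering the transactions by lock point is a topological order of the precedence graph. The transactions can therefore be serialized according to their lock points.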

18.2

Consider the following two transactions:

```
T34: read(A);
read(B);
if A = 0 then B := B + 1;
write(B).
T35: read(B);
read(A);
if B = 0 then A := A + 1;
write(A).
```

Add lock and unlock instructions to transactions T34 and T35 so that they observe the two-phase locking protocol. Can the execution of these transactions result in a deadlock?

```
T34:
    lock-S(A)
    read(A)
    lock-X(B)
    read(B)
    if A = 0
        then B := B + 1
    write(B)
    unlock(A)
    unlock(B)
T35:
    lock-S(B)
    read(B)
    lock-X(A)
    read(A)
    if B = 0
        then A := A + 1
    write(A)
    unlock(B)
    unlock(A)
```

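Yes, a deadlock is possible:

| T34       | T35       |
| --------- | --------- |
| lock-S(A) |           |
|           | lock-S(B) |
|           | read(B)   |
| read(A)   |           |
| lock-X(B) |           |
|           | lock-X(A) |

With the interleaving above, the transactions deadlock: T34 waits for T35 to release its shared lock on B, while T35 waits for T34 to release its shared lock on A.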
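The deadlock can be reproduced with a minimal Python sketch that replays the lock requests and looks for a cycle in the wait-for graph. The function name and the simplified lock table (no unlocks, no lock queues) are assumptions made for illustration.

```python
def find_deadlock(requests):
    """Replay (txn, mode, item) lock requests; return the set of deadlocked transactions."""
    holders = {}    # item -> {txn: mode}
    waits_for = {}  # blocked txn -> set of txns holding conflicting locks
    for txn, mode, item in requests:
        if txn in waits_for:
            continue                      # a blocked transaction issues nothing further
        held = holders.setdefault(item, {})
        # S is compatible only with S; anything involving X conflicts.
        conflicts = {t for t, m in held.items()
                     if t != txn and (mode == "X" or m == "X")}
        if conflicts:
            waits_for[txn] = conflicts
        else:
            held[txn] = mode

    def on_cycle(start):
        stack, seen = list(waits_for.get(start, ())), set()
        while stack:
            t = stack.pop()
            if t == start:
                return True
            if t not in seen:
                seen.add(t)
                stack.extend(waits_for.get(t, ()))
        return False

    return {t for t in waits_for if on_cycle(t)}
```

Replaying the four lock requests of the interleaving shown above reports both transactions as deadlocked, whereas a run in which T34 acquires both of its locks before T35 starts leaves T35 merely waiting.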

18.7

Consider a database system that includes an atomic increment operation, in addition to the read and write operations. Let V be the value of data item X. The operation increment(X) by C sets the value of X to V + C in an atomic step. The value of X is not available to the transaction unless the latter executes a read(X). Assume that increment operations lock the item in increment mode using the compatibility matrix in Figure 18.25.

a. Show that, if all transactions lock the data that they access in the corresponding mode, then two-phase locking ensures serializability.

b. Show that the inclusion of increment mode locks allows for increased concurrency

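a. Serializability can be shown by observing that if two transactions both hold an I-mode lock on the same item, their increment operations can be swapped, just like read operations. Any pair of conflicting operations, however, must be serialized in the order of the lock points of the corresponding transactions.

b. The increment lock mode is compatible with itself, so multiple transactions can increment the same item concurrently, which improves the concurrency of the protocol. Without this mode, a transaction that wants to increment a data item must lock it in exclusive mode. Exclusive locks are incompatible with every other lock, so they lengthen lock waits and hinder concurrency.

In general, adding true entries to the compatibility matrix increases concurrency and throughput.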
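The effect of the increment mode can be sketched in Python. The matrix below is an assumption about Figure 18.25, which is not reproduced here: S is compatible with S, I is compatible with I, and every other pair conflicts. The names `COMPATIBLE` and `can_grant` are hypothetical.

```python
# Assumed compatibility matrix: only S-S and I-I are compatible.
COMPATIBLE = {("S", "S"): True, ("I", "I"): True}

def compatible(requested, held):
    return COMPATIBLE.get((requested, held), False)

def can_grant(requested, held_modes):
    """A lock is granted only if it is compatible with every lock already held on the item."""
    return all(compatible(requested, held) for held in held_modes)
```

Here `can_grant("I", ["I", "I"])` is true, so a third incrementing transaction proceeds at once, while `can_grant("X", ["X"])` is false, which is the wait the increment mode avoids.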

18.18

Most implementations of database systems use strict two-phase locking. Suggest three reasons for the popularity of this protocol.

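Strict two-phase locking (strict 2PL): a transaction holds its exclusive locks until it commits or aborts (the rigorous variant holds all locks until then).

1. The protocol guarantees strictness, so aborting one transaction never triggers cascading aborts in others, and fewer rollbacks occur.
2. It is relatively easy to implement.
3. The degree of concurrency it allows is still reasonable.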




