sqlite3: counting how often each value repeats in a table with ten million rows

Cool_Breeze posted on 2021-4-23 19:22

Last edited by Cool_Breeze on 2021-4-23 20:01

Database table view (first ten rows):
number name local yuwen shuxue english
1    N4358    三年二班    61    2    65
2    D7005    三年一班    60    35    7
3    F9246    三年五班    36    26    6
4    N8857    三年三班    79    35    40
5    F4186    三年四班    50    95    71
6    J8401    三年二班    6    32    6
7    A7756    三年二班    6    57    6
8    D2809    三年二班    10    45    4
9    C0035    三年三班    25    37    55
10    I5499    三年三班    40    82    94

Query result:
name,count
A0000,43
A0001,39
A0002,41
A0003,35
A0004,28
A0005,31
A0006,34
A0007,36
A0008,28
A0009,46
The goal is to count how many times each name occurs.
Code:
begin = time.monotonic()
cur.execute("select name, count(name) from student group by name")
# with open("res.csv", "w", newline="") as f:
#     writer = csv.writer(f)
#     for n in cur.fetchall():
#         writer.writerow(n)
print(f"Elapsed: {time.monotonic() - begin} s")

Elapsed: 6.037000000011176 s
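
A caveat on this measurement: the timer stops right after cur.execute(), before any rows are fetched. As far as I can tell, Python's sqlite3 module only steps the statement far enough to produce the first result row inside execute(), so a fairer timing probably needs to include fetchall(). Below is a minimal sketch of that, assuming a small in-memory table that keeps only the `number` and `name` columns. The helper names `make_demo_db` and `timed_group_by` are hypothetical, and the default row count is far below the ten million rows in this thread.

```python
import random
import sqlite3
import string
import time


def make_demo_db(rows=100_000, seed=0):
    """Build an in-memory table shaped like `student` (number and name columns only)."""
    rng = random.Random(seed)
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table student(number integer primary key, name char(5) not null)"
    )
    # Assumed name format: one uppercase letter followed by four digits.
    names = (
        (rng.choice(string.ascii_uppercase) + f"{rng.randrange(10_000):04d}",)
        for _ in range(rows)
    )
    con.executemany("insert into student(name) values (?)", names)
    con.commit()
    return con


def timed_group_by(con):
    """Run the GROUP BY query and time it including fetchall()."""
    begin = time.monotonic()
    result = con.execute(
        "select name, count(name) from student group by name"
    ).fetchall()
    return result, time.monotonic() - begin


if __name__ == "__main__":
    con = make_demo_db()
    _, t_before = timed_group_by(con)
    con.execute("create index name_index on student (name)")
    after, t_after = timed_group_by(con)
    print(f"no index: {t_before:.3f} s, with index: {t_after:.3f} s, groups: {len(after)}")
```

Running the same function before and after creating the index keeps the two timings comparable, since both include producing every group.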

After creating an index the elapsed time is 0.0 s, but the database grew from 324 MB to 459 MB.
Index creation code: cur.execute("create index name_index on student (name)")
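
To check whether SQLite really uses the index for this query, EXPLAIN QUERY PLAN can be run on the same statement. The sketch below assumes a tiny in-memory copy of the table. The helper `query_plan` is hypothetical, and the exact wording of the plan text differs between SQLite versions, so no expected output is shown.

```python
import sqlite3

QUERY = "select name, count(name) from student group by name"


def query_plan(con, sql):
    """Return the detail column of EXPLAIN QUERY PLAN, one plan step per line."""
    rows = con.execute("explain query plan " + sql).fetchall()
    # The human-readable description is the last column of each row.
    return "\n".join(row[-1] for row in rows)


con = sqlite3.connect(":memory:")
con.execute("create table student(number integer primary key, name char(5) not null)")
con.executemany(
    "insert into student(name) values (?)",
    [("A0000",), ("A0001",), ("A0000",)],
)

plan_without_index = query_plan(con, QUERY)
con.execute("create index name_index on student (name)")
plan_with_index = query_plan(con, QUERY)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

If the second plan mentions name_index, the query can be answered by walking the index in name order instead of sorting the whole table first.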

I have only just started learning databases. Is this speed considered fast (Intel(R) Celeron(R) CPU G1840 @ 2.80GHz)? Or is there a faster way to do this count?

Test code:
import sqlite3
import datetime
import random
import csv
import time

con = sqlite3.connect("big_table.db")
cur = con.cursor()

# Create the table
# cur.execute("""create table student(
#     number integer primary key autoincrement,
#     name char(5) not null,
#     local char(5),
#     yuwen float,
#     shuxue float,
#     english float)""")

#324

x = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
x_n = "0123456789"
local = ["三年一班","三年二班","三年三班","三年四班","三年五班"]

def name():
    # One random uppercase letter followed by four random digits, e.g. "A0007"
    temp = random.choice(x)
    for i in range(4):
        temp += random.choice(x_n)
    return temp

# Insert the data
# for n in range(10000000):
#     cur.execute("insert into student(name, local, yuwen, shuxue, english) values(?,?,?,?,?)",
#                 (name(), random.choice(local), random.randint(0,100), random.randint(0,100), random.randint(0,100)))

# Create the index
# cur.execute("create index name_index on student (name)")

begin = time.monotonic()
cur.execute("select name, count(name) from student group by name")
# for n in cur.fetchall():
#     print(n)
# with open("res.csv", "w", newline="") as f:
#     writer = csv.writer(f)
#     for n in cur.fetchall():
#         writer.writerow(n)
print(f"Elapsed: {time.monotonic() - begin} s")
# Commit
con.commit()
con.close()
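
The commented-out insert loop in the test code calls cur.execute() once per row. A possible variation is to feed a generator to executemany() inside one transaction. The sketch below assumes names are one random letter followed by four random digits, as the sample output suggests. `random_name`, `random_rows` and `load_students` are hypothetical helper names, and the row count is reduced for the demo. Whether this is noticeably faster on ten million rows would need measuring.

```python
import random
import sqlite3

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
DIGITS = "0123456789"
CLASSES = ["三年一班", "三年二班", "三年三班", "三年四班", "三年五班"]


def random_name(rng):
    """One random uppercase letter followed by four random digits."""
    return rng.choice(LETTERS) + "".join(rng.choice(DIGITS) for _ in range(4))


def random_rows(count, rng):
    """Yield (name, local, yuwen, shuxue, english) tuples lazily."""
    for _ in range(count):
        yield (
            random_name(rng),
            rng.choice(CLASSES),
            rng.randint(0, 100),
            rng.randint(0, 100),
            rng.randint(0, 100),
        )


def load_students(con, count, seed=None):
    """Insert `count` random rows in a single transaction."""
    rng = random.Random(seed)
    with con:  # commits on success, rolls back if an exception is raised
        con.executemany(
            "insert into student(name, local, yuwen, shuxue, english) values(?,?,?,?,?)",
            random_rows(count, rng),
        )


con = sqlite3.connect(":memory:")
con.execute("""create table student(
    number integer primary key autoincrement,
    name char(5) not null,
    local char(5),
    yuwen float,
    shuxue float,
    english float)""")
load_students(con, 1000, seed=42)
total = con.execute("select count(*) from student").fetchone()[0]
print(f"rows inserted: {total}")
```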

Clarksh posted on 2021-4-23 19:38

That is already fast. You have an index, right?

hate posted on 2021-4-23 19:39

Did you create an index?

Cool_Breeze posted on 2021-4-23 19:54

hate posted on 2021-4-23 19:39
Did you create an index?

Wow, with the index the elapsed time is 0.0 s.
That is so fast!
But the database grew from 324 MB to 459 MB.
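
The growth is what one would expect, because the index stores its own copy of every `name` value together with the row's key. A small sketch that reproduces the effect on a throwaway database file follows. The function name `size_before_and_after_index`, the file name and the row count are all assumptions for illustration, and the absolute sizes will not match the figures in this thread.

```python
import os
import sqlite3
import tempfile


def size_before_and_after_index(rows=50_000):
    """Return (bytes_without_index, bytes_with_index) for a temporary database file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        con = sqlite3.connect(path)
        con.execute(
            "create table student(number integer primary key, name char(5) not null)"
        )
        con.executemany(
            "insert into student(name) values (?)",
            ((f"A{i % 10_000:04d}",) for i in range(rows)),
        )
        con.commit()
        without_index = os.path.getsize(path)

        con.execute("create index name_index on student (name)")
        con.commit()
        with_index = os.path.getsize(path)

        con.close()  # close before the temporary directory is removed
        return without_index, with_index


before, after = size_before_and_after_index()
print(f"without index: {before} bytes, with index: {after} bytes")
```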

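Cool_Breeze posted on 2021-4-23 19:55

Clarksh posted on 2021-4-23 19:38
That is already fast. You have an index, right?

No, I had not created an index. With the index it takes only 0.0 s, which is really fast. Awesome!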

RoyPenn posted on 2021-4-23 19:56

This speed is fine.

Cool_Breeze posted on 2021-4-23 20:03

RoyPenn posted on 2021-4-23 19:56
This speed is fine.

Is it about the same as other databases? I have not learned any other database!

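RoyPenn posted on 2021-4-23 20:04

Cool_Breeze posted on 2021-4-23 20:03
Is it about the same as other databases? I have not learned any other database!

It also depends on the data structure, so it is hard to compare. Relatively speaking, this speed is very fast.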

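Cool_Breeze posted on 2021-4-23 20:05

RoyPenn posted on 2021-4-23 20:04
It also depends on the data structure, so it is hard to compare. Relatively speaking, this speed is very fast.

OK. Thanks for the answer!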

richens posted on 2021-4-23 21:17

Learned something, thanks!


吾爱破解 - 52pojie.cn's Archiver
