# pandas date and time data handling

Last edited by 天空宫阙 on 2024-5-14 20:04
### Converting strings to datetime with `pd.to_datetime`
> The argument to this function can be a single string, a list, or a Series.
>
```python
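import pandas as pd

pd.to_datetime('2018-10-26 12:00 -0500')  # single string -> Timestamp
pd.to_datetime(['2018-10-26 12:00 -0500', '2018-10-26 13:00 -0500'])  # list -> DatetimeIndex
df = pd.DataFrame({'WorkingDate': ['2024-05-14', '2024-05-15']})  # sample data, assumed for illustration
df['WorkingDate'] = pd.to_datetime(df['WorkingDate'])  # Series -> datetime64 Series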
```
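When the strings do not follow a standard layout, `pd.to_datetime` also accepts the `format` and `errors` parameters. A minimal sketch, assuming made-up sample strings (the `raw` Series is illustrative and not from the original post):

```python
import pandas as pd

# Hypothetical sample data: day/month/year strings plus one invalid entry.
raw = pd.Series(['14/05/2024', '15/05/2024', 'not a date'])

# format= states the layout explicitly; errors='coerce' turns
# unparseable values into NaT instead of raising an exception.
parsed = pd.to_datetime(raw, format='%d/%m/%Y', errors='coerce')
print(parsed)
```

Checking `parsed.isna()` afterwards is one way to find the rows that failed to parse.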
### Generating a time series to a given specification
```python
date_series = pd.date_range(start='2024-5-14 8:20',end='2024-5-14 19:20',freq='10min')
```
> See the end of the post for the `freq` aliases.
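`pd.date_range` can also take `periods` in place of `end`. The sketch below, which reuses the start time and step from the example above, suggests that both forms produce the same index (the variable names are illustrative):

```python
import pandas as pd

# Same start and step as above, but the length is given by periods=
# instead of end= (67 points cover 08:20 through 19:20).
by_periods = pd.date_range(start='2024-05-14 08:20', periods=67, freq='10min')
by_end = pd.date_range(start='2024-05-14 08:20', end='2024-05-14 19:20', freq='10min')

print(len(by_end))                # number of generated timestamps
print(by_periods.equals(by_end))  # whether the two indexes match
```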
### Filtering by time
1. With a `DatetimeIndex`, you can slice directly with `loc`.
2. With a non-datetime index (not a `DatetimeIndex`), filter on the datetime column with `between`.
```
import pandas as pd
import numpy as np
date_series = pd.date_range(start='2024-5-14 8:20',end='2024-5-14 19:20',freq='10min')
df = pd.DataFrame(np.ones((67, 2)),
                  index=date_series, columns=['A', 'B'])
# Datetime index: slice directly with loc
df_result = df.loc['2024-5-14 8:20':'2024-5-14 9:20']
print(df_result)
df2 = df.reset_index()  # df2 has a non-datetime (integer) index
filter1 = df2['index'].between('2024-5-14 8:20', '2024-5-14 9:20')
filter1_df = df2.loc[filter1]
print(filter1_df)
```
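Both approaches above include the two endpoints. As a rough sketch of how the boundaries behave, the same selection can be written as an explicit boolean mask, and `between` accepts an `inclusive` argument when one end should be excluded (the names `start`, `end`, `by_mask` and `half_open` are illustrative):

```python
import numpy as np
import pandas as pd

date_series = pd.date_range(start='2024-05-14 08:20', end='2024-05-14 19:20', freq='10min')
df = pd.DataFrame(np.ones((len(date_series), 2)), index=date_series, columns=['A', 'B'])

start, end = pd.Timestamp('2024-05-14 08:20'), pd.Timestamp('2024-05-14 09:20')

# Label slicing on a DatetimeIndex includes both endpoints.
by_loc = df.loc[start:end]

# Equivalent boolean mask built from explicit comparisons on the index.
mask = (df.index >= start) & (df.index <= end)
by_mask = df[mask]

# On a plain column, inclusive='left' keeps the start but drops the end.
df2 = df.reset_index()
half_open = df2[df2['index'].between(start, end, inclusive='left')]

print(len(by_loc), len(by_mask), len(half_open))
```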
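### Downsampling by time with `resample`
> When the data is sampled too densely and the statistics need to be aggregated by hour, day, month, and so on, use `resample`.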
```python
import pandas as pd
import numpy as np
date_series = pd.date_range(start='2024-5-14 8:20',end='2024-5-14 19:20',freq='10min')
df = pd.DataFrame(np.ones((67, 2)),
                  index=date_series, columns=['A', 'B'])
# Downsample
result1 = df['A'].resample('h').sum()    # hourly sum ('H' is deprecated since pandas 2.2)
result2 = df['A'].resample('h').count()  # hourly count
print(result1)
print(result2)
```
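If several statistics are needed per bucket, `agg` can compute them in one pass. A minimal sketch using the same data as above (the name `hourly` is illustrative); it also shows that the first and last hourly buckets are only partly filled:

```python
import numpy as np
import pandas as pd

date_series = pd.date_range(start='2024-05-14 08:20', end='2024-05-14 19:20', freq='10min')
df = pd.DataFrame(np.ones((len(date_series), 2)), index=date_series, columns=['A', 'B'])

# One pass, two statistics per hourly bucket; 'h' is the hourly alias
# (older pandas releases spell it 'H').
hourly = df['A'].resample('h').agg(['sum', 'count'])
print(hourly)

# The first and last buckets are only partly filled because the data
# starts at 08:20 and ends at 19:20.
print(hourly['count'].iloc[0], hourly['count'].iloc[-1])
```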
### Converting datetimes to strings
```
df['time'].apply(lambda x:x.strftime('%Y-%m-%d'))
```
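For a datetime64 column, the `.dt` accessor offers a vectorized alternative to `apply` with a lambda. A small sketch, assuming a hypothetical frame with a `time` column (the sample values and the `day` and `hm` column names are not from the original post):

```python
import pandas as pd

# Hypothetical frame with a datetime64 column named 'time'.
df = pd.DataFrame({'time': pd.to_datetime(['2024-05-14 08:20', '2024-05-15 19:20'])})

# .dt.strftime formats the whole column at once, without a Python-level lambda.
df['day'] = df['time'].dt.strftime('%Y-%m-%d')
df['hm'] = df['time'].dt.strftime('%H:%M')
print(df)
```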
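Common `freq` aliases (this list predates pandas 2.2, which deprecates several of them, for example `H` in favor of `h` and `M` in favor of `ME`):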
https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
```
Alias      Description
B          business day frequency
C          custom business day frequency
D          calendar day frequency
W          weekly frequency
M          month end frequency
SM         semi-month end frequency (15th and end of month)
BM         business month end frequency
CBM        custom business month end frequency
MS         month start frequency
SMS        semi-month start frequency (1st and 15th)
BMS        business month start frequency
CBMS       custom business month start frequency
Q          quarter end frequency
BQ         business quarter end frequency
QS         quarter start frequency
BQS        business quarter start frequency
A, Y       year end frequency
BA, BY     business year end frequency
AS, YS     year start frequency
BAS, BYS   business year start frequency
BH         business hour frequency
H          hourly frequency
T, min     minutely frequency
S          secondly frequency
L, ms      milliseconds
U, us      microseconds
N          nanoseconds
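```

Stable date and time data handling.

That said, couldn't you just have GPT generate this directly?

> 玩机小白丶王 posted on 2024-5-14 23:13
> That said, couldn't you just have GPT generate this directly?

Sometimes you don't know that the relevant feature exists, so you can't write a precise prompt, and then it naturally can't generate a reliable data-processing method.

Looks good, bookmarking it.

Why can't pandas be used with Python 3.10?

My dataset has two fields, date and data, and both have already been converted to dtype: float64, yet the list format is int. So is there a way to convert the int data type? Without that I can't move on to the next step, time-series forecasting.

Looks good, bumping it.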