字节笔记本
May 30, 2026
Building a Stock Prediction Model in Eight Steps: The Point Is Not Prediction Accuracy
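Predicting stock prices with machine learning sounds sophisticated, but the core workflow is only eight steps. Most of the code is boilerplate, so you only need to swap in a different ticker symbol to get it running.
The data is pulled from Yahoo Finance with yfinance: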
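```python
import yfinance as yf
import pandas as pd


def load_data(ticker):
    """Download one year of daily price history for `ticker`."""
    stock = yf.Ticker(ticker)
    df = stock.history(period="1y")
    df['Date'] = df.index  # keep the dates as a regular column as well
    return df
```
In the feature engineering step, technical indicators are the key. Bollinger Bands and RSI are the two most commonly used: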
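```python
def bollinger_bands(close, window=20, num_std=2):
    """Return the upper and lower Bollinger Bands of a closing-price series."""
    rolling_mean = close.rolling(window=window).mean()
    rolling_std = close.rolling(window=window).std()
    upper_bb = rolling_mean + (rolling_std * num_std)
    lower_bb = rolling_mean - (rolling_std * num_std)
    return upper_bb, lower_bb


def compute_rsi(close, window=14):
    """Return the RSI (0 to 100), using simple rolling means of gains and losses."""
    delta = close.diff()
    # Average gain: keep positive moves, treat everything else as 0
    gain = (delta.where(delta > 0, 0)).rolling(window=window).mean()
    # Average loss: keep negative moves, flipped to positive values
    loss = (-delta.where(delta < 0, 0)).rolling(window=window).mean()
    rs = gain / loss
    return 100 - (100 / (1 + rs))
```
For model selection, a practical strategy is to avoid betting on a single model and instead combine several with a voting regressor: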
```python
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import VotingRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
import xgboost as xgb
import lightgbm as lgb

# Candidate models, keyed by display name
models = {
    'Random Forest': RandomForestRegressor(random_state=42),
    'Gradient Boosting': GradientBoostingRegressor(random_state=42),
    'XGBoost': xgb.XGBRegressor(random_state=42),
    'LightGBM': lgb.LGBMRegressor(random_state=42),
}


def train_and_evaluate(X, y, selected_models):
    # shuffle=False keeps chronological order, so the test set is the most
    # recent 20% and no future rows leak into training
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, shuffle=False)
    scaler = StandardScaler()
    # The voting regressor averages the predictions of the selected models
    voting = VotingRegressor(estimators=[(n, models[n]) for n in selected_models])
    model = Pipeline([('scaler', scaler), ('regressor', voting)])
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return mean_absolute_error(y_test, y_pred), ...
```
A combined model often performs better than any of its individual members, although this is not guaranteed. Once training is done, save the model with joblib so it can be loaded directly next time:
```python
import joblib

# Save the fitted pipeline to disk
joblib.dump(model, 'stock_model.pkl')
# Load it back later
model = joblib.load('stock_model.pkl')
```
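The snippets above stop short of showing how the indicator functions turn into the `X` and `y` passed to `train_and_evaluate`. As a minimal sketch, assuming a next-day closing price as the target (the article does not state the target) and a hypothetical helper named `build_features`, the pieces might be wired together as follows. Synthetic random-walk prices stand in for the yfinance download so the example runs offline.

```python
import numpy as np
import pandas as pd


def bollinger_bands(close, window=20, num_std=2):
    rolling_mean = close.rolling(window=window).mean()
    rolling_std = close.rolling(window=window).std()
    return rolling_mean + rolling_std * num_std, rolling_mean - rolling_std * num_std


def compute_rsi(close, window=14):
    delta = close.diff()
    gain = (delta.where(delta > 0, 0)).rolling(window=window).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(window=window).mean()
    return 100 - (100 / (1 + gain / loss))


def build_features(df, horizon=1):
    """Hypothetical helper: turn a price frame into (X, y) for regression."""
    feats = pd.DataFrame(index=df.index)
    feats["close"] = df["Close"]
    feats["upper_bb"], feats["lower_bb"] = bollinger_bands(df["Close"])
    feats["rsi"] = compute_rsi(df["Close"])
    # Assumed target: the closing price `horizon` rows ahead
    feats["target"] = df["Close"].shift(-horizon)
    # Drop indicator warm-up rows and the final rows that have no future price
    feats = feats.dropna()
    return feats.drop(columns="target"), feats["target"]


# Synthetic random-walk prices (an assumption, used only to run offline)
rng = np.random.default_rng(0)
prices = pd.DataFrame({"Close": 100 + rng.normal(0, 1, 120).cumsum()})
X, y = build_features(prices)
print(X.shape, y.shape)
```

With real data, `build_features(load_data(ticker))` would play the same role, and the most recent feature row could be passed to the loaded model's `predict` to get a forecast.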
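Of course, stock prediction is not a game where a correct forecast automatically makes money. Markets have too many uncontrollable factors. The real value of this project lies not in prediction accuracy but in having you build a complete machine learning pipeline by hand: data collection, feature engineering, model selection, ensemble training, evaluation and saving, prediction and deployment. These skills carry over to any other time-series forecasting problem.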
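To illustrate the reuse claim, here is a rough sketch that applies the same scaler-plus-voting-ensemble pipeline to a non-stock series. Everything specific in it is an assumption made for illustration: the synthetic weekly-cycle data, the hypothetical `make_lag_features` helper, the choice of two scikit-learn regressors, and the naive last-value baseline used for comparison.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import (GradientBoostingRegressor,
                              RandomForestRegressor, VotingRegressor)
from sklearn.metrics import mean_absolute_error
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def make_lag_features(series, n_lags=7):
    """Hypothetical helper: predict each value from the previous n_lags values."""
    frame = pd.DataFrame(
        {f"lag_{k}": series.shift(k) for k in range(1, n_lags + 1)})
    frame["target"] = series
    frame = frame.dropna()  # the first n_lags rows lack a full history
    return frame.drop(columns="target"), frame["target"]


# Synthetic daily series with a weekly cycle plus noise (stand-in for real data)
rng = np.random.default_rng(42)
t = np.arange(400)
series = pd.Series(10 + 3 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 0.5, t.size))

X, y = make_lag_features(series)
# Chronological split: train on the past, test on the most recent 20%
split = int(len(X) * 0.8)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

model = Pipeline([
    ("scaler", StandardScaler()),
    ("regressor", VotingRegressor(estimators=[
        ("rf", RandomForestRegressor(n_estimators=50, random_state=42)),
        ("gb", GradientBoostingRegressor(random_state=42)),
    ])),
])
model.fit(X_train, y_train)
mae = mean_absolute_error(y_test, model.predict(X_test))

# Naive baseline: assume the next value equals the previous one (lag_1)
naive_mae = mean_absolute_error(y_test, X_test["lag_1"])
print(f"ensemble MAE={mae:.3f}, naive MAE={naive_mae:.3f}")
```

Comparing against a naive baseline is one way to judge whether the pipeline adds anything on a given series, which matters more than the absolute error figure.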
Run it online in Google Colab: https://colab.research.google.com/drive/1-LnvPHVLzM-sIDSUvyTxQFGQe8cN_PRN