In today's software development world, the Rust language gets a lot of attention for its memory safety, high performance, and concurrency support. Meanwhile, the decision tree is one of the most basic and intuitive machine learning algorithms, which makes it ideal for beginners who want to understand how a model makes "decisions". In this article we'll implement a simple decision tree classifier in Rust from scratch; even if you're new to programming, you'll be able to follow along!
A decision tree is a tree-structured model. It splits the data with a series of yes/no questions until it reaches a leaf node, which yields the prediction. For example, whether a person will buy a certain product might be decided from features such as age, income, and marital status.
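For instance, a tiny tree for that purchase decision might look like this (the questions and thresholds here are made up purely for illustration):

age > 30?
  yes -> income > 5000?
           yes -> predict "buy"
           no  -> predict "don't buy"
  no  -> predict "don't buy"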

Rust is not only safe and fast; it also has a steadily growing ecosystem. Implementing a decision tree in Rust lets you understand the algorithm in depth while practicing core Rust concepts: basic syntax, structs, enums, and recursion. It's also a great project for getting started with the language!
First, make sure Rust is installed. Open a terminal and run:
$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Then create a new project:
$ cargo new rust_decision_tree
$ cd rust_decision_tree

We'll start by defining three key types:
- Sample: one training example, holding a feature vector and a label.
- Split: a single feature split (e.g. "age > 30").
- Node: a node of the decision tree, either an internal node or a leaf.

Add the following code to src/main.rs:
#[derive(Debug, Clone)]
pub struct Sample {
    pub features: Vec<f64>,
    pub label: String,
}

#[derive(Debug, Clone)]
pub struct Split {
    pub feature_index: usize,
    pub threshold: f64,
}

#[derive(Debug)]
pub enum Node {
    Leaf { prediction: String },
    Internal {
        split: Split,
        left: Box<Node>,
        right: Box<Node>,
    },
}

Decision trees typically use information entropy to measure how "mixed" a set of samples is; the lower the entropy, the purer the set. Let's write a function that computes the entropy of a set of samples.
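As a reminder, for a sample set S whose classes occur with proportions p_1, …, p_k, the entropy is

H(S) = -(p_1·log2(p_1) + … + p_k·log2(p_k))

A quick worked example: a set with two "buy" and two "don't buy" samples has p = 0.5 for each class, so H = 1 bit, while a set where all labels agree has H = 0. The function below implements exactly this formula.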
use std::collections::HashMap;

fn entropy(samples: &[Sample]) -> f64 {
    if samples.is_empty() {
        return 0.0;
    }
    let total = samples.len() as f64;
    let mut label_counts = HashMap::new();
    for sample in samples {
        *label_counts.entry(&sample.label).or_insert(0) += 1;
    }
    let mut entropy = 0.0;
    for &count in label_counts.values() {
        let p = count as f64 / total;
        entropy -= p * p.log2();
    }
    entropy
}

Next, we need to scan every feature and every plausible threshold to find the split that maximizes the information gain.
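Information gain is simply the entropy reduction a split achieves: if a split divides a set S into a left part L and a right part R, then

Gain(S, split) = H(S) - (|L|/|S|)·H(L) - (|R|/|S|)·H(R)

The size weighting keeps a tiny, accidentally pure subset from dominating the score. The search below tries, for every feature, every midpoint between adjacent distinct values; if the age feature takes the values 20, 25, 35, and 45, for instance, the candidate thresholds are 22.5, 30, and 40: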
fn find_best_split(samples: &[Sample]) -> Option<(Split, f64)> {
    if samples.is_empty() {
        return None;
    }
    let n_features = samples[0].features.len();
    let mut best_gain = 0.0;
    let mut best_split = None;
    for feature_index in 0..n_features {
        // Collect this feature's values, sorted and deduplicated
        let mut thresholds: Vec<f64> = samples.iter()
            .map(|s| s.features[feature_index])
            .collect();
        thresholds.sort_by(|a, b| a.partial_cmp(b).unwrap());
        thresholds.dedup();
        // Try each candidate threshold (midpoint of adjacent values)
        for i in 0..thresholds.len() - 1 {
            let threshold = (thresholds[i] + thresholds[i + 1]) / 2.0;
            let split = Split { feature_index, threshold };
            let (left, right) = split_samples(samples, &split);
            let gain = information_gain(samples, &left, &right);
            if gain > best_gain {
                best_gain = gain;
                best_split = Some((split, gain));
            }
        }
    }
    best_split
}

fn split_samples(samples: &[Sample], split: &Split) -> (Vec<Sample>, Vec<Sample>) {
    let mut left = Vec::new();
    let mut right = Vec::new();
    for sample in samples {
        if sample.features[split.feature_index] <= split.threshold {
            left.push(sample.clone());
        } else {
            right.push(sample.clone());
        }
    }
    (left, right)
}

fn information_gain(parent: &[Sample], left: &[Sample], right: &[Sample]) -> f64 {
    let p_entropy = entropy(parent);
    let left_weight = left.len() as f64 / parent.len() as f64;
    let right_weight = right.len() as f64 / parent.len() as f64;
    let weighted_entropy = left_weight * entropy(left) + right_weight * entropy(right);
    p_entropy - weighted_entropy
}

Now we can build the whole tree recursively:
fn build_tree(samples: &[Sample]) -> Node {
    // Stopping condition: if every sample has the same label, return a leaf
    let labels: std::collections::HashSet<_> = samples.iter().map(|s| &s.label).collect();
    if labels.len() == 1 {
        return Node::Leaf {
            prediction: samples[0].label.clone(),
        };
    }
    // Otherwise look for the best split; if none improves purity,
    // fall back to the majority class
    if let Some((best_split, _)) = find_best_split(samples) {
        let (left_samples, right_samples) = split_samples(samples, &best_split);
        let left_child = build_tree(&left_samples);
        let right_child = build_tree(&right_samples);
        Node::Internal {
            split: best_split,
            left: Box::new(left_child),
            right: Box::new(right_child),
        }
    } else {
        // Return the most frequent label
        let mut label_counts = HashMap::new();
        for sample in samples {
            *label_counts.entry(sample.label.clone()).or_insert(0) += 1;
        }
        let prediction = label_counts
            .into_iter()
            .max_by_key(|&(_, count)| count)
            .map(|(label, _)| label)
            .unwrap();
        Node::Leaf { prediction }
    }
}

Finally, we implement the prediction function:
fn predict(node: &Node, features: &[f64]) -> String {
    match node {
        Node::Leaf { prediction } => prediction.clone(),
        Node::Internal { split, left, right } => {
            if features[split.feature_index] <= split.threshold {
                predict(left, features)
            } else {
                predict(right, features)
            }
        }
    }
}

In the main function, we test the model on a small data set:
fn main() {
    // Example data: [age, income]; label: "buy" or "don't buy"
    let training_data = vec![
        Sample { features: vec![25.0, 3000.0], label: "don't buy".to_string() },
        Sample { features: vec![35.0, 8000.0], label: "buy".to_string() },
        Sample { features: vec![45.0, 7000.0], label: "buy".to_string() },
        Sample { features: vec![20.0, 2000.0], label: "don't buy".to_string() },
    ];

    let tree = build_tree(&training_data);
    println!("Decision tree: {:?}", tree);

    // Predict a new sample
    let new_sample = vec![40.0, 6000.0];
    let result = predict(&tree, &new_sample);
    println!("Prediction: {}", result); // should print "buy"
}

Congratulations! You have implemented a complete decision tree from scratch in Rust. This project not only walks you through the basics of machine learning, it also exercises your skills with Rust data structures, recursion, and pattern matching. The implementation is deliberately simplified (no handling of missing values, no mixing of continuous and categorical features, and so on), but it gives you a solid foundation.
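To try it out:

$ cargo run

The last line printed should be "Prediction: buy". If you want to trace why: on this toy data, both "age > 30" and "income > 5000" separate the two classes perfectly (a gain of 1 bit each), and because find_best_split only replaces its best candidate on a strictly larger gain, the age split it finds first becomes the root; the new sample's age of 40 then follows the right branch down to the "buy" leaf.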
If you'd like to go further with machine learning in Rust, explore more complex algorithms next, such as random forests or gradient-boosted trees. This also makes an excellent hands-on project for learning Rust!
I hope this tutorial was helpful. Go try it yourself; code is the best teacher!