
Pathwise Derivative Policy Gradient

After the actor $\pi$ takes an action, the critic does not only tell it whether that action was good; it also indicates which action would be better.
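The idea that the critic can point toward a better action can be illustrated with a toy sketch. It assumes a deterministic actor and a critic $Q(s, a)$ that is differentiable in the action, so the gradient $\partial Q / \partial a$ gives the direction in which the action should move. Every name here (`critic_q`, `critic_grad_action`, `actor`, `train_actor`) and the quadratic form of the critic are hypothetical choices made for illustration, not something taken from the lecture.

```python
# Toy sketch under stated assumptions: the critic is fixed and known,
# and the actor is a single weight. Real methods learn both with networks.

def critic_q(state, action):
    """Hypothetical critic: the value peaks when action == 2 * state."""
    return -(action - 2.0 * state) ** 2


def critic_grad_action(state, action):
    """dQ/da for the toy critic: the direction of a better action."""
    return -2.0 * (action - 2.0 * state)


def actor(weight, state):
    """Hypothetical deterministic actor: a = weight * state."""
    return weight * state


def train_actor(weight, states, lr=0.05, steps=200):
    """Gradient ascent on Q(s, actor(s)) with respect to the actor weight."""
    for _ in range(steps):
        grad = 0.0
        for s in states:
            a = actor(weight, s)
            # Chain rule: dQ/dw = dQ/da * da/dw, where da/dw = s.
            grad += critic_grad_action(s, a) * s
        weight += lr * grad / len(states)
    return weight


states = [0.5, 1.0, 1.5]
learned_weight = train_actor(0.0, states)
```

In this sketch the actor never receives a scalar "good or bad" score alone; it is updated through the critic's gradient with respect to the action, so its weight moves toward the value that the toy critic prefers.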

I did not follow the rest of the lecture.
