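This post updates the baseline solution that Polixir (南栖仙策) provides for the challenge. It describes the baseline in more detail and explains how to evaluate and improve the virtual environment and the policy model that the baseline trains. You are welcome to download and review it.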
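1. The baseline solution is built on the Polixir Revive SDK [download link].
2. Detailed documentation for the baseline solution is in the attachment.
3. The baseline code, the sample submission code, and a Jupyter notebook that walks participants through the complete workflow are in the starting_kit.zip provided with the competition.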
The notebook is organized as follows:
Starting Kit Guidance
    Requirements
    Data organization
    Step 1: Derive user states from offline data
    Step 2: Learn a virtual environment
        Evaluate the virtual environment
            (a) Automatic evaluation with specific metrics
            (b) Manual evaluation with histogram and rollout image
        Get the model parameters for virtual environment
            (a) Use the model with best metric
            (b) Use the model with specific trial id
    Step 3: Learn a fair promotion policy from virtual environment
        Environment and MDP Setup
            (a) Action space
            (b) Observation space
            (c) Reward
        Training of the policy
        Evaluation of the policy
    Step 4: Generate a submission bundle
        File structure
        PolicyValidation file
        Metadata
        Create a submission