author: Intelligent Internet
II-Thought
We introduce II-Thought-RL-v0, the first iteration of our effort to develop a large-scale, multi-domain Reinforcement Learning (RL) dataset. By providing a high-quality, large-scale dataset of RL question-answer pairs, we aim to advance reasoning research. This foundational step will pave the way for future iterations incorporating more complex reasoning traces. In recent months, several