Join us

Call for Participation

In recent years, machine learning has made great progress on several fundamental tasks such as image recognition and speech recognition, but many problems in video content understanding remain open.

Short video apps have won widespread acclaim from users around the world. At the same time, it has long been a goal of our company's artificial intelligence systems to better understand video content and recommend what you like.

Here, we invite researchers from academia and industry to participate in this competition.

Challenge Description

This challenge provides multi-modal video features, including visual, text, and audio features, as well as user interaction behavior data such as click, like, and follow. Each participant must model users' interests from one dataset of videos and user interaction behavior, and then predict users' click behavior on another video dataset.

Participants are ranked according to the models and predicted results they submit, based on the score specified in the evaluation criteria.

Dataset/APIs/Library URL

  • Train Dataset:
    • We collect an incremental dataset, named Byte-Recommend100M, consisting of tens of thousands of distinct users and 100 million distinct videos. It will be made accessible to participants soon.
    • Multi-modal features in Byte-Recommend100M include face features, video content features, title features, and BGM features, all provided as embedding vectors. Participants can combine them for better recommendation.

  • Test Dataset:
    • Same distribution as the Train Dataset.
    • Same users as the Train Dataset.
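Combining the per-modality embedding vectors can be as simple as concatenating them into one feature vector. A minimal sketch, assuming NumPy and purely illustrative embedding dimensions (the real dimensions depend on the released dataset):

```python
import numpy as np

def combine_modalities(face, content, title, bgm):
    """Concatenate per-modality embedding vectors into one feature vector.

    Each argument is a 1-D embedding; the dimensions used below are
    illustrative only.
    """
    return np.concatenate([face, content, title, bgm])

# Hypothetical dimensions for illustration.
face = np.zeros(128)
content = np.zeros(128)
title = np.zeros(64)
bgm = np.zeros(64)

fused = combine_modalities(face, content, title, bgm)
print(fused.shape)  # (384,)
```

More elaborate fusion schemes (weighted sums, attention over modalities) are of course possible; concatenation is simply the most common baseline.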

The dataset contains interaction data between users and videos; the meaning of each field is as follows:

Field name    | Field description                         | Data type | Remarks
uid           | User ID                                   | int       | Desensitized
user_city     | User's city                               | int       | Desensitized
item_id       | Video ID                                  | int       | Desensitized
author_id     | Author ID                                 | int       | Desensitized
item_city     | Video's city                              | int       | Desensitized
channel       | Channel of the video                      | int       | Desensitized
finish        | Whether the video was watched to the end  | bool      |
like          | Whether the video was liked               | bool      |
music_id      | Music ID                                  | int       | Desensitized
device        | Device ID                                 | int       | Desensitized
time          | Video release time                        | int       | Desensitized
duration_time | Video duration                            | int       | Unit: seconds
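A record with these fields can be loaded into a typed structure. A minimal sketch, assuming tab-separated lines in the field order of the table above (the actual file format of the released dataset may differ):

```python
from typing import NamedTuple

class Interaction(NamedTuple):
    uid: int
    user_city: int
    item_id: int
    author_id: int
    item_city: int
    channel: int
    finish: bool          # watched to the end
    like: bool            # liked the video
    music_id: int
    device: int
    time: int             # video release time (desensitized)
    duration_time: int    # video duration in seconds

def parse_line(line: str) -> Interaction:
    """Parse one tab-separated record in the field order of the table above."""
    f = line.rstrip("\n").split("\t")
    return Interaction(
        uid=int(f[0]), user_city=int(f[1]), item_id=int(f[2]),
        author_id=int(f[3]), item_city=int(f[4]), channel=int(f[5]),
        finish=bool(int(f[6])), like=bool(int(f[7])),
        music_id=int(f[8]), device=int(f[9]),
        time=int(f[10]), duration_time=int(f[11]),
    )

row = parse_line("1\t10\t200\t30\t11\t0\t1\t0\t5\t7\t123456\t15")
print(row.finish, row.duration_time)  # True 15
```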

Evaluation Criteria

Predict the click probability (finish + like) for each item in the test dataset. We use AUC (area under the ROC curve) as the challenge metric.

In our competition, participants must output a score giving the probability of a click (finish + like) for each item. The AUC is then calculated as follows:

$$\mathrm{AUC} = \int_{+\infty}^{-\infty} {TPR(T)\,FPR'(T)\,dT}$$

where T is a varying threshold used to compute the TPR and FPR, given by $TPR(T)=\frac{True\ Positive}{Condition\ Positive}$ and $FPR(T)=\frac{False\ Positive}{Condition\ Negative}$. The leaderboard is ranked by AUC: the higher the AUC, the higher the ranking.
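The integral above is equivalent to the Mann-Whitney statistic: AUC is the probability that a randomly chosen positive item is scored above a randomly chosen negative one, with ties counted as one half. A minimal pure-Python sketch (function name is illustrative):

```python
def auc(labels, scores):
    """AUC via the Mann-Whitney equivalence: fraction of (positive, negative)
    pairs where the positive is scored higher, ties counting 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Every positive outranks every negative, so AUC is 1.0.
print(auc([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.4]))  # 1.0
```

This O(P·N) double loop is fine for a sanity check; for the full test set one would use a sort-based implementation or a library routine such as scikit-learn's `roc_auc_score`.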

Deadline of Submission

Submission Guidelines

  • Participants should submit the click probability of each item in the test dataset as their result; each line of the result must follow the format user_id \t video_id \t probability.

  • Results can be submitted through this website (the leaderboard will open in Jan 2019) or via email to

  • Participants should submit the results obtained by their trained models on both the Validation set and the Test set.

  • Participants will also be asked to provide, via email, a brief write-up describing the algorithm used.
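Writing predictions in the required user_id \t video_id \t probability format can be sketched as follows (the output file name and the precision of the probability are assumptions, not requirements from the organizers):

```python
def write_submission(rows, path):
    """Write predictions as 'user_id<TAB>video_id<TAB>probability' lines.

    rows: iterable of (uid, item_id, probability) tuples.
    """
    with open(path, "w") as f:
        for uid, item_id, prob in rows:
            f.write(f"{uid}\t{item_id}\t{prob:.6f}\n")

# Hypothetical predictions for illustration.
write_submission([(1, 200, 0.731), (1, 201, 0.125)], "result.txt")
```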


If you have any questions or requests, or need further clarifications, please contact the organizers.



Contact info: