Quick Guide to NUS-BSN
Social media platforms, such as Twitter, Facebook, and Sina Weibo, have become essential real-time information resources with a wide range of users and applications. The rapidly increasing amount of live information on these platforms calls for effective techniques for data harvesting and media content analysis. Consumers often express positive or negative opinions when they post brand-related information on microblog platforms, and such comments may spread quickly and widely across the entire social network. The resulting insights have significant marketing value for enterprises, which need to know how much exposure their brands receive and how well users accept them. Individual users also benefit, since such insights help them make purchase decisions on brands and products.
Currently, there is no large-scale public dataset of brand-related social media data. To support this emerging research area, we plan to release the NUS microblog dataset on brand information (Brand-Social-Net dataset) with several well-defined tasks. The objective of this dataset is to provide a public research and evaluation platform for live social media analysis.
Yue Gao, Fanglin Wang, Huanbo Luan, and Tat-Seng Chua. "Brand Data Gathering From Live Social Media Streams", ACM International Conference on Multimedia Retrieval. UK. April 1-4, 2014.
Researchers interested in the dataset should download and fill in the Agreement and Disclaimer Form and send it back to us. We will then, at our discretion, email you the instructions for downloading the dataset.
Logo annotation. For each image, the logo region is enclosed by a bounding box to identify the accurate logo position.
Brand relevance annotation. For each microblog, the relevance of the text content and of the image content (if available) to a given brand is annotated separately as 1 or 0.
1. The text is annotated as 1 if the text content is relevant to the brand; otherwise 0.
2. The image is annotated as 1 if the image content is relevant to the brand; otherwise 0.
3. The microblog is annotated as 1 if either the text content or the image content is related to the brand; otherwise 0.
Product relevance annotation. For each microblog, the relevance of the text content and of the image content (if available) to a given product is annotated separately as 1 or 0.
1. The text is annotated as 1 if the text content is relevant to the product; otherwise 0.
2. The image is annotated as 1 if the image content is relevant to the product; otherwise 0.
3. The microblog is annotated as 1 if either the text content or the image content is related to the product; otherwise 0.
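The three labeling rules above, which are the same for brands and products, can be sketched in a few lines of Python. This is only an illustration of the rule logic: the `RelevanceAnnotation` class and its field names are hypothetical and do not reflect the actual file format of the dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RelevanceAnnotation:
    """Hypothetical record for one (microblog, brand-or-product) pair."""
    text_label: int                    # rule 1: 1 if the text is relevant, else 0
    image_label: Optional[int] = None  # rule 2: 1/0, or None if there is no image

    def microblog_label(self) -> int:
        """Rule 3: 1 if either the text or the image is relevant, else 0."""
        image = self.image_label or 0  # a missing image counts as not relevant
        return 1 if (self.text_label == 1 or image == 1) else 0


# Example: irrelevant text but a relevant image still makes the microblog relevant.
example = RelevanceAnnotation(text_label=0, image_label=1)
print(example.microblog_label())
```

Treating a missing image as "not relevant" is an assumption made here so that text-only microblogs can still receive a microblog-level label.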
Object annotation. If there are relevant objects for a given brand or product, the bounding boxes of these objects are labeled.
Figure 1 (a) and (b) show the number of microblogs and corresponding images over time and the distribution of posted microblogs across users, respectively.
Figure 1 The statistics of (a) microblogs and corresponding images over time on the whole dataset, (b) microblogs for users
The distributions of microblogs and corresponding logos are presented in Figures 2 and 3.
Figure 2 The number of relevant microblogs for each brand.
Figure 3 The number of logos for each brand.
There are 20 brand/product-related events in this dataset. These events happened in June and July of 2012, and they are listed in Table 1.
|The Apple Worldwide Developers Conference (WWDC)|
|Windows 8 Release|
|Microsoft Office 2013 Release|
|Nokia Lumia Release|
|Pepsi Limited Edition Michael Jackson Cans|
|Samsung Galaxy 3 I9300 Release|
|HTC ONE Release|
|Dior Addict Release|
|Hyundai MD Avante Release|
|Ferrari Berlinetta Release|
|Chrysler 300C Release|
|Honda CR-Z Release|
|Honda Elysion Release|
|Mazda CX-5 Release|
|Audi Q3 Release|
|Highlander 2012 Release|
|Shenzhen, Hong Kong, Macao Auto Expo|
|Chongqing Auto Expo|
|Changchun Auto Expo|
The challenging tasks that can be performed on this dataset include, but are not limited to, the following:
- Logo/Product/Brand detection and search task. This dataset includes 100 logos and 300 products, with ground truth on the positions of logos/products and relevant objects. This task can be approached using textual, visual, or social features, or a combination of them.
- Brand/Product data gathering task. One key challenge in social media platforms is how to gather a representative set of data related to a brand or product.
- Social event analysis task. Over 20 brand-related events are defined for event detection and tracking research.
- Social media related research. This dataset contains social information to support research on sentiment analysis, social network analysis, key-user identification, hot tweet/event analysis, and similar topics.
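For the detection task, predicted boxes are commonly compared against ground-truth bounding boxes using intersection over union (IoU). The sketch below is a generic illustration of that metric, not an evaluation protocol defined by this dataset; the `(x_min, y_min, x_max, y_max)` box layout and the function name `iou` are assumptions.

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Return the intersection-over-union of two axis-aligned boxes."""
    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Guard against degenerate boxes with zero area.
    return inter / union if union > 0 else 0.0


# Example: a predicted logo box partially overlapping an annotated one.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

A detection is often counted as correct when its IoU with an annotated box exceeds a chosen threshold; the threshold is a design choice left to the experimenter.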