Brand-Social-Net


A Brand-Social-Net Dataset from National University of Singapore

Quick Guide to NUS-BSN

Social media platforms such as Twitter, Facebook, and Sina Weibo have become essential real-time information resources with a wide range of users and applications. The rapidly growing volume of live information on these platforms calls for effective techniques for data harvesting and media content analysis. Consumers often post positive or negative comments when they share brand-related information on microblog platforms, and such comments may spread quickly and widely across the entire social network. These comments carry significant marketing value for enterprises that need to track brand exposure and user acceptance. They are also useful to individual users, helping them make purchase decisions on brands and products.

No large-scale public dataset of brand-related social media data is available. To support this emerging research area, we plan to release the NUS microblog dataset on brand information (the Brand-Social-Net dataset) with several well-defined tasks. The objective of this dataset is to provide a public research and evaluation platform for live social media analysis.

Brand-Social-Net Citation:

Yue Gao, Fanglin Wang, Huanbo Luan, and Tat-Seng Chua. "Brand Data Gathering From Live Social Media Streams," ACM International Conference on Multimedia Retrieval, UK, April 1-4, 2014.

Contact

GAO, Yue: kevin.gaoy@gmail.com

WANG, Fanglin: hardegg@gmail.com

Downloads

Researchers interested in the dataset should download and fill out the Agreement and Disclaimer Form and send it back to us. We will then, at our discretion, email instructions for downloading the dataset.

Groundtruth Annotation

Logo annotation. For each image, the logo region is enclosed by a bounding box to identify the accurate logo position.

Brand relevance annotation. For each microblog, the relevance of the text content and of the image content (if available) to a given brand is annotated separately as 1 or 0.

1. The text is annotated as 1 if the text content is relevant to the brand; otherwise 0.
2. The image is annotated as 1 if the image content is relevant to the brand; otherwise 0.
3. The microblog is annotated as 1 if either the text content or the image content is related to the brand; otherwise 0.
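
The three rules above amount to a logical OR over the per-modality labels. The following is a minimal sketch of that combination, not the dataset's own tooling: the function name `microblog_relevance` and the use of `None` for a microblog without an image are assumptions made for illustration.

```python
def microblog_relevance(text_label, image_label=None):
    """Combine text and image labels into a microblog-level label (rule 3).

    text_label:  1 if the text is relevant to the brand, else 0.
    image_label: 1 or 0 for the image, or None when the microblog has no
                 image (hypothetical encoding, not taken from the dataset).
    """
    if text_label not in (0, 1):
        raise ValueError(f"text_label must be 0 or 1, got {text_label!r}")
    if image_label not in (0, 1, None):
        raise ValueError(f"image_label must be 0, 1 or None, got {image_label!r}")
    # The microblog is relevant if either modality is relevant;
    # a missing image contributes nothing to the decision.
    return 1 if (text_label == 1 or image_label == 1) else 0


if __name__ == "__main__":
    # Relevant image, irrelevant text: the microblog is still relevant.
    print(microblog_relevance(0, 1))
    # Text-only microblog with irrelevant text.
    print(microblog_relevance(0))
```

The same combination applies unchanged to product relevance labels, since they follow identical rules.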

Product relevance annotation. For each microblog, the relevance of the text content and of the image content (if available) to a given product is annotated separately as 1 or 0.

1. The text is annotated as 1 if the text content is relevant to the product; otherwise 0.
2. The image is annotated as 1 if the image content is relevant to the product; otherwise 0.
3. The microblog is annotated as 1 if either the text content or the image content is related to the product; otherwise 0.

Object annotation. If there are relevant objects for a given brand or product, the bounding boxes of these objects are labeled.
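
Bounding-box groundtruth of this kind is commonly compared against detector output using intersection over union (IoU). The sketch below illustrates that general convention only. The dataset description does not state an evaluation protocol, and the `Box` class, its corner-coordinate layout, and the `iou` function are hypothetical names chosen for illustration rather than the dataset's actual file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates (hypothetical layout)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        # Degenerate or inverted boxes have zero area.
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes, in the range [0, 1]."""
    # The overlap region is itself a box; it is inverted (zero area)
    # when the two boxes do not intersect.
    overlap = Box(
        max(a.x_min, b.x_min),
        max(a.y_min, b.y_min),
        min(a.x_max, b.x_max),
        min(a.y_max, b.y_max),
    ).area()
    union = a.area() + b.area() - overlap
    return overlap / union if union > 0 else 0.0


if __name__ == "__main__":
    groundtruth = Box(0, 0, 10, 10)
    detection = Box(5, 0, 15, 10)
    print(iou(groundtruth, detection))
```

A detection is often counted as correct when its IoU with a groundtruth box exceeds a chosen threshold such as 0.5; that threshold is a convention from the object detection literature, not something specified for this dataset.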

Data Statistics

Figures 1(a) and 1(b) show the number of microblogs and corresponding images over time and the distribution of posted microblogs across users.


Figure 1 The statistics of (a) microblogs and corresponding images over time on the whole dataset, (b) microblogs for users

The distributions of microblogs and corresponding logos are presented in Figures 2 and 3.


Figure 2 The number of relevant microblogs for each brand.
Figure 3 The number of logos for each brand.

Event Data

There are 20 brand/product-related events in this dataset. These events happened in June and July of 2012, and they are listed in Table 1.

Table 1: Events in the Brand-Social-Net dataset
The Apple Worldwide Developers Conference (WWDC)
Windows 8 Release
Microsoft Office 2013 Release
Nokia Lumia Release
Pepsi Limited Edition Michael Jackson Cans
Samsung Galaxy 3 I9300 Release
HTC ONE Release
Dior Addict Release
Hyundai MD Avante Release
Ferrari Berlinetta Release
Chrysler 300C Release
Honda CR-Z Release
Honda Elysion Release
Mazda CX-5 Release
Audi Q3 Release
Highlander 2012 Release
Shenzhen, Hong Kong, Macao Auto Expo
Chongqing Auto Expo
Changchun Auto Expo

Challenging Tasks

The challenging tasks that can be performed on this dataset include, but are not limited to, the following:

  • Logo/Product/Brand detection and search task. This dataset includes 100 logos and 300 products, with groundtruth on the positions of logos/products and relevant objects. This task can be approached using text, visual, or social features, or a combination of them.
  • Brand/Product data gathering task. One key challenge on social media platforms is how to gather a representative set of data related to a brand or product.
  • Social event analysis task. Over 20 brand-related events are defined for event detection and tracking research.
  • Social media related research. This dataset contains social information to support research on sentiment analysis, social network analysis, key user analysis, and hot tweet/event analysis, among others.