Text fields

Bilby Quant Data provides each document's text in two versions: the original language (typically Chinese) and an English translation. You can work with either version, depending on your analytical needs.

Document titles

title

  • Type: String
  • Description: The title of the document in its original language.
  • Nullable: Yes

title_en

  • Type: String
  • Description: The title of the document translated into English.
  • Nullable: Yes

Subheadings

subhead

  • Type: String
  • Description: The subheading of the document in its original language.
  • Nullable: Yes
  • Note: Not all documents have subheadings.

subhead_en

  • Type: String
  • Description: The subheading of the document translated into English.
  • Nullable: Yes

Body text

body

  • Type: String
  • Description: The complete body text of the document in its original language.
  • Nullable: Yes
  • Note: This is the full text content of the document, which can be substantial for longer policy documents.

body_en

  • Type: String
  • Description: The complete body text of the document translated into English.
  • Nullable: Yes

Summaries

summary

  • Type: String
  • Description: A summary of the document in its original language.
  • Nullable: Yes

translated_summary

  • Type: String
  • Description: A summary of the document translated into English.
  • Nullable: Yes
  • Note: The English summary gives a concise overview of the document's content without requiring you to process the full body text. Unlike the other English fields, this field does not use the _en suffix.
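The eight text fields listed above can be sketched as a typed record. This is a minimal illustration, assuming each document arrives as a plain Python dict keyed by field name; the names TextFields, TEXT_FIELD_NAMES, and extract_text_fields are hypothetical helpers, not part of Bilby Quant Data.

```python
from typing import Dict, Mapping, Optional, TypedDict


class TextFields(TypedDict, total=False):
    """Hypothetical record type: every text field is a nullable string."""
    title: Optional[str]
    title_en: Optional[str]
    subhead: Optional[str]
    subhead_en: Optional[str]
    body: Optional[str]
    body_en: Optional[str]
    summary: Optional[str]
    translated_summary: Optional[str]


# Field names exactly as documented above.
TEXT_FIELD_NAMES = (
    "title", "title_en",
    "subhead", "subhead_en",
    "body", "body_en",
    "summary", "translated_summary",
)


def extract_text_fields(record: Mapping[str, object]) -> Dict[str, Optional[str]]:
    """Keep only the text fields; keys absent from the record become None."""
    return {name: record.get(name) for name in TEXT_FIELD_NAMES}
```

Because every field is nullable, downstream code should check for None before calling string methods on any of these values.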

Working with text fields

Language considerations

  • Original language fields contain text as published by the source, typically in Chinese.
  • English translation fields (*_en and translated_summary) contain machine-generated translations optimised for accuracy and readability.
  • For most analytical purposes, the English fields provide sufficient quality for text mining, NLP, and semantic analysis.
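Since the English summary field is named translated_summary rather than summary_en, appending _en to an original field name does not always work. One way to handle this is an explicit lookup table, sketched below. The names ENGLISH_COUNTERPART, english_field, and english_view are hypothetical, and the record is assumed to be a plain dict.

```python
from typing import Dict, Mapping, Optional

# Original-language field -> English field, as documented above.
# Note the irregular name for the summary.
ENGLISH_COUNTERPART = {
    "title": "title_en",
    "subhead": "subhead_en",
    "body": "body_en",
    "summary": "translated_summary",
}


def english_field(original_field: str) -> str:
    """Return the English field name for an original-language field."""
    return ENGLISH_COUNTERPART[original_field]


def english_view(record: Mapping[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Build a dict of English text keyed by the original field names."""
    return {
        original: record.get(english)
        for original, english in ENGLISH_COUNTERPART.items()
    }
```

For example, english_view(record)["summary"] reads from translated_summary, so analysis code can use a single set of keys regardless of language.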

Field availability

Not all documents will have content in every text field:

  • Some documents may lack subheadings.
  • Summaries may not be available for all documents.
  • In rare cases, body text or titles may be missing due to source formatting issues.
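Because any text field can be missing, it helps to read fields through a fallback chain rather than assuming one is populated. The sketch below assumes records are plain dicts and treats empty or whitespace-only strings as missing; that treatment is an assumption, not documented behaviour. The function names first_available and short_english_text are hypothetical.

```python
from typing import Iterable, Mapping, Optional


def first_available(
    record: Mapping[str, Optional[str]],
    fields: Iterable[str],
) -> Optional[str]:
    """Return the first field value that is a non-blank string, else None."""
    for name in fields:
        value = record.get(name)
        # Assumption: blank strings are treated the same as null.
        if isinstance(value, str) and value.strip():
            return value
    return None


def short_english_text(record: Mapping[str, Optional[str]]) -> Optional[str]:
    """Prefer the English summary, then the English body, then the English title."""
    return first_available(record, ("translated_summary", "body_en", "title_en"))
```

The fallback order here is only one reasonable choice. A pipeline that needs full text would put body_en first, and one that must never substitute a title for a summary would shorten the chain.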