Reference Strings Dataset
=========================
Quick Introduction
------------------
*Reference Strings Dataset* is a collection of synthetically generated bibliographies that comes with annotations on each token. For example, a citation rendered in *APA* style:
.. code-block::
:caption: Plain text render of a citation
:name: plain-text-render
Watson, J. D., & Crick, F. H. C. (1953). Molecular structure of nucleic acids: A structure for deoxyribose nucleic acid. Nature, 171(4356), 737–738. https://doi.org/10.1038/171737a0
The corresponding annotated form encloses contigious segment of tokens with :abbr:`XML (Extensible Markup Language)`-like tags:
.. code-block::
:caption: Plain text render of a citation with annotation
:name: plain-text-render-with-annotation
Watson, J. D., & Crick, F. H. C. (1953).
Molecular structure of nucleic acids: A structure for deoxyribose nucleic acid. Nature, 171 (4356), 737–738. https://doi.org/10.1038/171737a0
These XML-like tags are based on :abbr:`CSL (Citation Style Language)` `Variables `_.
For more information about the method which the dataset is synthesized and the data format, read :doc:`details/index`.
Use Cases
^^^^^^^^^
Sequence tagging/labeling
Assignment of categorical label to each member of the sequence.
How to obtain the dataset
-------------------------
Visit the :doc:`obtaining-data/downloads` page for instructions.
Citing
------
This dataset is part of a Master project in NUS.
If you are using the dataset for scientific work, please cite the following:
.. literalinclude:: ../citation.bib
:language: bibtex
Content
-------
.. toctree::
:maxdepth: 2
usage/requirements
details/index
obtaining-data/downloads
references/index
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`