Details
This page describes how the dataset was derived and what it contains.
Data Sources
The citations are obtained from the following sources:
CrossRef, via DOIs obtained from Open Academic Graph
JSTOR Sample Dataset (no longer accessible)
CSL Styles
In total, 17 styles have been employed. The table below summarises the number of reference strings available in the dataset for each style.
CSL Style | Number of Reference Strings |
---|---|
Annual Reviews | placeholder |
APA 6th edition | placeholder |
Cambridge University Press | placeholder |
Chicago | placeholder |
Current Opinion | placeholder |
Elsevier (Harvard) | placeholder |
Elsevier (Vancouver) | placeholder |
IEEE | placeholder |
MLA 7th edition | placeholder |
Nature | placeholder |
University of New South Wales (Oxford) | placeholder |
Springer Humanities | placeholder |
Springer MathPhys | placeholder |
Springer (Vancouver) | placeholder |
Taylor and Francis (Harvard) | placeholder |
Wiley-VCH Books | placeholder |
BibTeX Entry Types
The table below summarises the number of reference strings available for each BibTeX entry type.
Entry Type | Number of Reference Strings |
---|---|
article | placeholder |
book | placeholder |
inbook | placeholder |
incollection | placeholder |
inproceedings | placeholder |
misc | placeholder |
phdthesis | placeholder |
techreport | placeholder |
Data Format
Each file stores the data in JSON Lines format: every line is one JSON object representing a citation rendered in a specific CSL style, together with its annotated sequence.
{
"style": "apa",
"doc_type": "article",
"source": "crossref",
"data": "<author>Watson, J. D., & Crick, F. H. C.</author> <year>(1953).</year> <title>Molecular structure of nucleic acids: A structure for deoxyribose nucleic acid.</title> <container-title>Nature,</container-title> <volume>171</volume> <issue>(4356),</issue> <page>737\\u2013738.</page> <DOI>https://doi.org/10.1038/171737a0</DOI>"
}
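As a minimal sketch of how such a file might be consumed, the Python snippet below reads one JSON object per line and tallies records by a chosen field. The helper names `read_records` and `count_by` are hypothetical and not part of the dataset tooling. Only the field names `style` and `doc_type` are taken from the record shown above.

```python
import json
from collections import Counter
from pathlib import Path


def read_records(path):
    """Yield one dict per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines rather than failing on them
                yield json.loads(line)


def count_by(records, field):
    """Count records by the value of one field, e.g. "style" or "doc_type"."""
    return Counter(record[field] for record in records)
```

Assuming a local file with a hypothetical name such as `citations.jsonl`, `count_by(read_records("citations.jsonl"), "style")` would give the number of reference strings per CSL style, and passing `"doc_type"` instead would give the count per entry type.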
Important
Not all tokens are enclosed in tags. Tokens that fall outside every tag should be labelled O, in line with the tagging scheme.
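The labelling rule above can be sketched in Python as follows. This is an illustration under stated assumptions, not the dataset's own preprocessing: the function name `label_tokens` is hypothetical, tokens are split on whitespace only, tags are assumed not to nest, and the tag name is used directly as the label without any B-/I- prefixes.

```python
import re

# Matches either a tagged span such as <author>...</author>
# or a run of untagged text between tags.
SPAN = re.compile(
    r"<(?P<tag>[A-Za-z-]+)>(?P<inner>.*?)</(?P=tag)>|(?P<outer>[^<]+)",
    re.DOTALL,
)


def label_tokens(annotated):
    """Split an annotated reference string into (token, label) pairs.

    Tokens inside <tag>...</tag> receive the tag name as their label;
    tokens outside every tag receive "O".
    """
    pairs = []
    for match in SPAN.finditer(annotated):
        if match.group("tag"):
            label, text = match.group("tag"), match.group("inner")
        else:
            label, text = "O", match.group("outer")
        # Whitespace tokenisation is an assumption made for this sketch.
        pairs.extend((token, label) for token in text.split())
    return pairs
```

Applied to the `data` field of a record, this yields a flat sequence of labelled tokens. A real pipeline would likely use a more careful tokeniser and whichever prefix convention the tagging scheme requires.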