TY - JOUR
T1 - The student records of the University of Coimbra (1537-1919)
T2 - an Open Data Science approach
AU - de Carvalho, Joaquim Ramos
N1 - Publisher Copyright:
© 2022 Imprensa da Universidade de Coimbra. All rights reserved.
PY - 2022
Y1 - 2022
N2 - The University of Coimbra has preserved the academic records of its students from 1537 to the present day. In the years 1940-50 a card file of student information was created, known as the “Ficheiro de Alunos”. The catalogue covers records from 1537, when the university was relocated from Lisbon to Coimbra, up to 1908. The amount of information recorded varies over time and may include name, first and last date on record, place of origin, school, years of enrolment, degrees obtained, and results of exams or other proofs of proficiency. Many records also contain notes such as titles (for instance indicating nobility), religious order, and college of residence. In the years 2013-15, the contents of the card file were entered into an archival management system, giving the old records a new digital life. Currently, around 105,000 records are available online and reachable through search engines. This paper addresses two limitations of the current online catalogue: first, the academic information on the paper cards was transcribed as a single text field, which prevents structured queries and any non-trivial data analysis; second, the University Archive has no scalable collaborative model through which its users can help improve the catalogue. This article contributes solutions to both issues: we present algorithms that extract information from the records and produce representations in line with current data science paradigms, enabling a wide range of analyses of the data; we also demonstrate how tools and cooperation models developed in the open-source community can provide an environment for collaborative efforts, ranging from the notification of simple errors to the addition of semantic web representations for linked data, harnessing the knowledge dispersed among the many researchers working on this unique repository of data.
All source code and data analysis produced for this paper are available in a public repository at https://github.com/joaquimrcarvalho/fauc1537-1919.
KW - Data Science
KW - Database
KW - Open Science
KW - Students
KW - University of Coimbra
UR - http://www.scopus.com/inward/record.url?scp=85140290288&partnerID=8YFLogxK
U2 - 10.14195/2182-7974_35_1_1
DO - 10.14195/2182-7974_35_1_1
M3 - Article
AN - SCOPUS:85140290288
SN - 0872-5632
VL - 35
SP - 11
EP - 58
JO - Boletim do Arquivo da Universidade de Coimbra
JF - Boletim do Arquivo da Universidade de Coimbra
IS - 1
ER -