Ian Johnson
This presentation is about the development and trajectory of Heurist (HeuristNetwork.org), a shared, integrated, extensible data infrastructure (model, build, manage, analyse, visualise, share, publish via integrated CMS) for Humanities research, capable of handling the needs of many heterogeneous projects on a single standalone service, with optional integration across multiple servers via a coordinating index (itself based on Heurist). Humanities data are interesting (both technically and to the public): they are rich in text, images, objects, people and events; heterogeneous; eminently linkable; and sparse. Personal computers, the internet and other accessible technologies have spawned an exploding field (or fad?) known as Digital Humanities (DH), and opened exciting new horizons for research and public engagement. However, this technological turn has created many problems for a poorly funded research culture with 1-3 year grant funding cycles: choice of appropriate technology, finding and retaining technical staff, initial and ongoing costs, sustainability, and so on. The outcome is often least-effort and inadequate technology (e.g. spreadsheets) or ad hoc development, incomplete functionality, maintenance nightmares, data silos and rapid end-of-funding decay; only rich or statutory organisations can maintain a multi-component system for long. Heurist aims to overcome these problems through mutualised Open Source development, schemas stored as editable data rather than fixed structures, demand-driven priority development, and free centralised services and maintenance. In this presentation I will outline the evolution of our development process, from haphazard experimentation and many costly unused features (2005-2009) to a coherent, stable but evolving structure and Extreme Programming (aka living dangerously!), driven by immediate user requirements and incremental daily interface refinement. I will outline some of the fundamental principles we use to maintain backwards compatibility, stability, rapid development and low cost of maintenance for such a complex beast, across so many projects, on a self-funding staff of just 3 FTE. I also hope to attract some technical collaborators, as most of our users are (by design) non-technical. We maintain a central index and two free services (based in Australia and France), plus some institution-based servers, currently supporting a couple of hundred research projects in different fields, ranging from doctoral students to networks of researchers.
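To make the idea of "schemas stored as editable data rather than fixed structures" concrete, the following minimal Python sketch shows record types held as ordinary, editable data rows that can be changed at runtime. It is purely illustrative and assumes nothing about Heurist's actual (server-side) implementation; all names are hypothetical.

    # Conceptual sketch only -- not Heurist's actual implementation.
    # Record types and their fields are ordinary data rows, so users can
    # change the schema at runtime without altering fixed table structures.

    record_types = {
        "Person": [
            {"field": "name", "datatype": "text", "required": True},
            {"field": "birth_date", "datatype": "date", "required": False},
        ],
    }

    def validate(record_type: str, record: dict) -> list[str]:
        """Check a record against the schema stored as data."""
        errors = []
        for field in record_types[record_type]:
            if field["required"] and field["field"] not in record:
                errors.append(f"missing required field: {field['field']}")
        return errors

    # A project extends its schema simply by appending a row of data.
    record_types["Person"].append(
        {"field": "affiliation", "datatype": "text", "required": False}
    )
    print(validate("Person", {"birth_date": "1970-01-01"}))  # ['missing required field: name']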
Josquin Debaz, Waldir Lisboa Rocha
Developed from 1995 onward, Prospero is a framework for the longitudinal analysis of text corpora. Based on dictionaries and semi-automatic classification, it mainly allows its users to combine statistical computation, co-occurrence networks and searches for nested patterns. Inspired by pragmatic sociology, it focuses on the multiple forms of expression and argumentation used by actors, on language regimes, and on identifying the transformations occurring in the research case. Initially distributed commercially, then, from 2011, by the Doxa association as shareware under a non-profit and ethical charter, it is now hosted by the Corpora association and developed under an Affero-compliant CeCILL licence variant (AGPLv3). In this presentation, we will discuss more specifically the question of the permanence of a research-targeted software approach, through its evolution over almost 30 years. During this period, evolving expectations and technical developments led first to a client/server version (which remained at the prototype stage) and now to a transition to SaaS. Based on this experience, we will also discuss the conditions we consider relevant for the durability of the software in a new, interconnected phase. Broadening its audience of users and developers calls for ever greater interoperability at the technical level, but with an approach that reconciles non-profit and academic models (with limited resources) with business uses.

Josquin Debaz: With a PhD in the history of science, he worked for more than 10 years on contemporary controversies in health, environment and energy at the GSPR (Pragmatic and Reflexive Sociology Group, EHESS). He is now a developer at Finsit. With F. Chateauraynaud, he published Aux bords de l'irréversible. Sociologie pragmatique des transformations (Paris, Pétra, 2017).

Waldir Lisboa Rocha: With a degree in Environmental Engineering, he co-founded Luminae, an energy efficiency company, where he served as Chief Operating Officer between 2008 and 2012, before deciding to change career and dedicate himself to the Social Sciences. He holds a Master's degree in Sociology from the École des Hautes Études en Sciences Sociales (EHESS) and is currently working on his PhD at the same institution, focusing on the relations between media, inquiry and democracy. In parallel to his academic research, he has been dedicated to the conception and structuring of Prefigura, an experimental institution, and of Enumera, an operating ecosystem.
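As a rough illustration of the co-occurrence-network approach mentioned above, the following Python sketch counts term pairs that appear in the same document. It is a toy example under simplifying assumptions and does not reflect Prospero's own dictionaries or classification pipeline.

    # Illustrative sketch of the co-occurrence-network idea only.
    from collections import Counter
    from itertools import combinations

    corpus = [
        "nuclear risk debate irreversible transformation",
        "public debate energy transition risk",
        "energy policy transformation argument",
    ]

    cooccurrence = Counter()
    for document in corpus:
        terms = sorted(set(document.split()))
        # Count each unordered pair of terms appearing in the same document.
        for pair in combinations(terms, 2):
            cooccurrence[pair] += 1

    # Pairs with weight >= 2 form the backbone of a co-occurrence network.
    edges = [(a, b, w) for (a, b), w in cooccurrence.items() if w >= 2]
    print(edges)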
John Boy
I am a social scientist who mostly teaches and conducts qualitative research, but I am also a programmer. Over the years, I have contributed to a variety of free and open source software projects, and since 2019 I have developed and maintained textnets, a Python package for text analysis that represents collections of texts as networks of documents and words, providing novel possibilities for the visualization and analysis of texts. In my field, such software development efforts are not usually rewarded, but I have been very fortunate: my academic superiors have been supportive of my endeavors, and a publication in the Journal of Open Source Software also helped me get official recognition for this work in the standard currency of my field. While I developed textnets to scratch my own itch, I seek to make the package widely available by providing extensive documentation and making it easily installable across multiple platforms. This part of my software development work -- learning the intricacies of version control, package managers, continuous integration testing, and dependency management -- puts me in a position to learn not just about the technical side of coding, but about the social side of the choices developers make. At least in the Python world, the way you learn what dependencies to use, if any, and how many, is informed by norms more than by technical considerations, and the same is true for much else. By engaging in software development work, I engage in a version of the research method of participant observation -- learning by taking part -- that sociologists have called observant participation: becoming part of what you want to learn about. In my case, I want to learn not just about software development and its culture and norms, but about the wider world of free software, hacker culture, artistic practice based on FOSS tools, and more. In my talk, I reflect on my experiences engaging in observant participation, as well as some of the insights I have gained and still hope to gain.
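For readers unfamiliar with the underlying idea of document-word networks, the sketch below builds a small bipartite graph with networkx and projects it onto word nodes. It illustrates the concept only; it is not the textnets API, and the example documents and tokenization are invented for illustration -- see the package documentation for actual usage.

    # Conceptual document-word network, not textnets itself.
    import networkx as nx

    documents = {
        "doc1": "open source software communities",
        "doc2": "qualitative research on software culture",
        "doc3": "hacker culture and free software",
    }

    graph = nx.Graph()
    for doc_id, text in documents.items():
        graph.add_node(doc_id, kind="document")
        for word in set(text.split()):
            graph.add_node(word, kind="word")
            graph.add_edge(doc_id, word)

    # Projecting onto the word nodes links words that share a document.
    words = [n for n, d in graph.nodes(data=True) if d["kind"] == "word"]
    word_network = nx.bipartite.weighted_projected_graph(graph, words)
    print(word_network.edges(data=True))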
Robin De Mourat
The writing of web publications mixing data visualization and textual prose opens novel opportunities for connecting evidence, arguments and narrative in social science communities. Such a practice poses a variety of challenges in terms of website design and development; but also, and perhaps more importantly, it calls for experimenting with specific workflows for coordinating a variety of kinds of expertise, ranging from social science disciplines (history, sociology, etc.) to data science, information design and web-related skills. It also reconfigures, for the research processes themselves, the relationships between activities of (data-related) enquiry and (communication-oriented) writing, creating a renewed space for discovery, invention and verification for the data sustaining a given argument or narrative. Relying on recent experiments in making collective digital publications grounded in the sociology of technology (https://medialab.github.io/carnet-algopresse/#/publication/en) and economic history (https://medialab.github.io/portic-storymaps-2021/), this talk accounts for the diverse challenges arising from such activities of “data visualization-driven writing”, and some of the strategies we used to cope with them. It describes and compares the technical and methodological workflows we developed in order to simultaneously develop text, datasets and visualizations, taking into account a variety of aims, data materials, and distributions of skills. In doing so, it advocates for an extended understanding of the notion of “academic writing”, encompassing the practices of writing software, data and diagrams. Such an extended understanding, we argue, is necessary to design and develop writing workflows that foster a multimodal and scientifically productive dialogue between these heterogeneous practices, taking full advantage of the web publication format as a research situation.
Evgeny Karev
This talk will show a new Python tool called Livemark, which is designed for data journalism, software education, and documentation writing. Using Livemark, you can collect and present data with interactive tables, charts, and other elements without leaving a text editor. You can also write documentation with live script execution, similar to a lightweight version of a Jupyter Notebook. This talk will demo Livemark and will be well suited to both technical and non-technical audiences interested in learning about data storytelling. No prior knowledge is required, although a basic knowledge of Markdown and Python scripting will help in understanding the more in-depth sections.
... |
... |
Damien Goutte-Gattat
In biomedical sciences, ontologies are used to annotate and organize data stored in knowledge databases and facilitate their exploitation. Following the pioneering work of the Gene Ontology at the turn of the century, the Open Biomedical and Biological Ontologies (OBO) Foundry was created to coordinate the development of a family of interoperable ontologies sharing a core set of principles. The Foundry now includes more than 150 ontologies. The Ontology Development Kit (ODK) [1] was developed to facilitate the implementation of standardized ontology development practices across the Foundry. It takes the form of a Docker image that provides ontology editors with all the command-line tools they need to manage, edit, build, and test their ontologies, as well as standardized and carefully crafted Makefile rules to pilot all steps of the ontology life cycle. In recent years, many ontologies such as the Uberon multi-species anatomy ontology, the Cell Ontology (CL), or the Unified Phenotype Ontology (uPheno) have been converted to use the ODK. By moving most of the management, building, and testing logic from the individual ontologies to the ODK, the kit aims to make the life of ontology editors easier, by allowing them to focus solely on actual ontology editing, all the while contributing to the standardisation of the various ontologies. [1] Nicolas Matentzoglu, Chris Mungall, and Damien Goutte-Gattat (2021). Ontology Development Kit. doi:10.5281/zenodo.5762512
Lozana Rossenova, Dragan Espenschied
The Wikibase Stakeholder Group is a new initiative testing alternative approaches to governance, decision-making and community-building for open source digital knowledge management. It aims to facilitate collaboration across various institutional and individual partners in order to ensure the continued development and long-term sustainability of Wikibase, a suite of tools for data management within a linked open data environment. Wikibase is currently developed and maintained by Wikimedia Germany, a chapter of the non-profit Wikimedia Foundation. Wikibase is vital infrastructure for the public linked data project Wikidata, but since its open release in 2015 it has been increasingly taken up in research, cultural and institutional contexts due to its flexible, open and collaborative architecture. Rhizome has been piloting the use of Wikibase within GLAM contexts since its release, and has co-organized the first set of public meetups and events around the emerging Wikibase community and ecosystem of decentralized Wikibase instances. Following the success of these events in bringing the community together, Rhizome and a few early adopters started the Wikibase Stakeholder Group at the end of 2020. In this talk, we will present the activities of the Group to date and lessons learned from our experiences in collective decision-making, funding collaborative development efforts, and negotiating between individual project requirements towards a common roadmap in line with the ongoing efforts of the Wikimedia team. Expected prior knowledge / intended audience: No prior knowledge is required for this talk, except general familiarity with open source community contexts and open data management tools. The intended audience is other practitioners actively involved in open source communities, in the governance and organization of communities, and/or in the development of tools for linked open data management.
Patricia Herterich
Funders, publishers and scientific organizations have strongly endorsed the adoption of the FAIR principles (Findable, Accessible, Interoperable, and Reusable) to promote research data reusability and reproducibility. However, the FAIR principles are high-level guidelines without explicit requirements for their implementation. Practical solutions such as metrics and associated tools are required to support the assessment of the FAIR compliance of research artefacts such as services and datasets. This talk will introduce an open-source tool named F-UJI, which was developed mainly to support trustworthy data repositories committed to FAIR data provision in programmatically measuring datasets for their level of FAIRness over time. The talk will provide an overview of the development and application of F-UJI and the use cases it has supported so far.
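As an illustration of what programmatic FAIRness assessment can look like, here is a hypothetical Python client calling an F-UJI service. The endpoint path, port, credentials, identifier and payload field names shown are assumptions made for this sketch; consult the F-UJI documentation for the actual REST API.

    # Hypothetical client sketch -- endpoint, credentials and fields are assumptions.
    import requests

    FUJI_ENDPOINT = "http://localhost:1071/fuji/api/v1/evaluate"  # assumed local deployment

    payload = {
        "object_identifier": "https://doi.org/10.1234/example",  # placeholder dataset PID
        "use_datacite": True,
    }

    response = requests.post(
        FUJI_ENDPOINT,
        json=payload,
        auth=("username", "password"),  # placeholder credentials
        timeout=120,
    )
    report = response.json()
    # The returned report contains per-metric results that can be tracked over time.
    print(report)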
Yanina Bellini Saibene
Argentina's National Institute of Agricultural Technology (INTA) conducts research and development for the agricultural sector. Environmental conditions influence agricultural activity; in particular, climatic conditions have a favorable or detrimental effect on production. Thus, it is essential to monitor and analyze the different agro-meteorological variables to describe these conditions and their impact on agricultural and livestock production. To this end, INTA operates an extensive ground network of conventional and automatic weather stations. In addition, there is an information system (http://siga.inta.gob.ar) with predefined queries and visualizations of these data for internal and external use. All the information generated by the institution is openly shared under a CC-BY-NC license. INTA is a decentralized institution and generates research, analyses, and reports at different scales (national to local). The processes to perform these tasks use various software tools and different methodologies. Moreover, these processes live on individual researchers' computers and in their heads. Developing internal packages or libraries has great potential to promote reproducible analysis frameworks, improve an organization's code quality, enhance knowledge management (Riederer, 2021), standardize and make processes transparent, and open software and data to society. The {agromet} package includes a series of functions that can be used regularly for the calculation of agrometeorological indices and statistics. The input meteorological data follow the tidy data philosophy, so the package functions are generic: they can be applied to any tabular dataset regardless of its origin, order, or column names. However, in line with INTA's internal requirements, the package also incorporates tools to read data in INTA's format. The package implements functions for calculating indices and variables of agricultural interest, standardizing how these computations are made. It also incorporates mapping functions with scales and report templates. The {siga} package programmatically downloads and reads data from INTA's Agrometeorological Information and Management System. This talk will discuss the decision process behind generating a series of internal packages designed to be used by INTA users but with enough generality to be helpful to a broad community. Their development, current use, and the experience gained encouraged the creation of similar packages for soil data. Organization on GitHub: https://github.com/AgRoMeteorologiaINTA
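To illustrate the tidy-data design described above, here is a conceptual sketch in Python/pandas of an agrometeorological computation over a tidy station table. The actual {agromet} package is written in R and its function names and arguments differ; the index, station codes and column names below are illustrative assumptions only.

    # Conceptual tidy-data sketch; not the {agromet} R API.
    import pandas as pd

    # Tidy table: one row per station per day, regardless of column order or origin.
    weather = pd.DataFrame({
        "station": ["EEA1", "EEA1", "EEA2", "EEA2"],
        "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-01", "2021-01-02"]),
        "t_max": [32.1, 32.9, 32.8, 30.2],
        "t_min": [18.0, 17.5, 19.1, 16.4],
    })

    def growing_degree_days(df: pd.DataFrame, base: float = 10.0) -> pd.Series:
        """Accumulate degree days above a base temperature, per station."""
        daily = ((df["t_max"] + df["t_min"]) / 2 - base).clip(lower=0)
        return daily.groupby(df["station"]).sum()

    print(growing_degree_days(weather))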
Clemens Lange
The experiments at the Large Hadron Collider (LHC) at CERN have been running for more than a decade. The data recorded by detectors such as the CMS experiment are analysed by thousands of physicists all over the world. The CMS Collaboration has made more than 2.5 petabytes of data openly available on the CERN Open Data Portal, containing billions of recorded and simulated events. Open Data are, however, only useful when accompanied by realistic usage examples. The sheer amount of data, as well as the fact that the software used to analyse them is often more than ten years old, poses several challenges. In this presentation, Clemens will discuss how the CMS Data Preservation and Open Access group tries to overcome these challenges so that potentially everyone could use the data to unveil hidden physics.