This file was created by the TYPO3 extension bib
Timezone: CEST
Creation date: 2024-07-04
Creation time: 03-22-41
Number of references: 1

Type: article
Cite ID: 2023_hauser_technical-documentation
Title: Tool: Automatically Extracting Hardware Descriptions from PDF Technical Documentation
Journal: Journal of Systems Research
Year: 2023
Month: 10
Day: 31
Volume: 3
Number: 1

Abstract: The ever-increasing variety of microcontrollers aggravates the challenge of porting embedded software to new devices through much manual work, whereas code generators can be used only in special cases. Moreover, only little technical documentation for these devices is available in machine-readable formats that could facilitate automating porting efforts. Instead, the bulk of documentation comes as print-oriented PDFs. We hence identify a strong need for a processor to access the PDFs and extract their data with a high quality to improve the code generation for embedded software. In this paper, we design and implement a modular processor for extracting detailed datasets from PDF files containing technical documentation using deterministic table processing for thousands of microcontrollers. Namely, we systematically extract device identifiers, interrupt tables, package and pinouts, pin functions, and register maps. In our evaluation, we compare the documentation from STMicro against existing machine-readable sources. Our results show that our processor matches 96.5 % of almost 6 million reference data points, and we further discuss identified issues in both sources. Hence, our tool yields very accurate data with only limited manual effort and can enable and enhance a significant amount of existing and new code generation use cases in the embedded software domain that are currently limited by a lack of machine-readable data sources.

URL: https://www.comsys.rwth-aachen.de/fileadmin/papers/2023/2023-hauser-technical-documents.pdf
Publisher: eScholarship Publishing
ISSN: 2770-5501
DOI: 10.5070/SR33162446
Reviewed: 1
Authors: Niklas Hauser, Jan Pennekamp