A Comparative Study of NER Methods for Ownership Structure Extraction from M&A Due Diligence Documents

Hanfei Zhang

Abstract

Cross-border mergers and acquisitions require efficient extraction of ownership structures from due diligence documentation. This study compares named entity recognition (NER) methods for extracting equity structures from corporate governance documents. We construct an annotated dataset from authentic materials and evaluate six NER approaches spanning traditional sequence labeling (CRF, BiLSTM-CRF), general-purpose transformers (BERT, RoBERTa), and domain-adapted models (FinBERT-MRC, Legal-BERT). Legal-BERT achieves an overall F1 score of 87.3% but still struggles with multilingual entity names and nested ownership structures. Error analysis reveals three primary failure modes: cross-lingual recognition ambiguities, percentage-quantity confusion, and difficulty representing complex structures. These findings provide actionable guidance for implementing automated equity analysis systems in time-sensitive M&A transactions.

Article Details

Section

Articles

How to Cite

A Comparative Study of NER Methods for Ownership Structure Extraction from M&A Due Diligence Documents. (2026). Journal of Sustainability, Policy, and Practice, 2(1), 71-86. https://schoalrx.com/index.php/jspp/article/view/78

References

1. A. Shah, A. Gullapalli, R. Vithani, M. Galarnyk, and S. Chava, "FiNER-ORD: Financial named entity recognition open research dataset," arXiv preprint arXiv:2302.11157, 2023.

2. X. Zhang, X. Luo, and J. Wu, "A RoBERTa-GlobalPointer-based method for named entity recognition of legal documents," In 2023 International Joint Conference on Neural Networks (IJCNN), 2023, pp. 1-8. doi: 10.1109/ijcnn54540.2023.10191275

3. L. Hillebrand, T. Deußer, T. Dilmaghani, B. Kliem, R. Loitz, C. Bauckhage, and R. Sifa, "KPI-BERT: A joint named entity recognition and relation extraction model for financial reports," In 2022 26th International Conference on Pattern Recognition (ICPR), 2022, pp. 606-612.

4. B. Aejas, A. Belhi, H. Zhang, and A. Bouras, "Deep learning-based automatic analysis of legal contracts: A named entity recognition benchmark," Neural Computing and Applications, vol. 36, no. 23, pp. 14465-14481, 2024. doi: 10.1007/s00521-024-09869-7

5. I. Chalkidis, M. Fergadiotis, P. Malakasiotis, N. Aletras, and I. Androutsopoulos, "LEGAL-BERT: The muppets straight out of law school," arXiv preprint arXiv:2010.02559, 2020. doi: 10.18653/v1/2020.findings-emnlp.261

6. K. Guo, T. Jiang, and H. Zhang, "Knowledge graph enhanced event extraction in financial documents," In 2020 IEEE International Conference on Big Data (Big Data), 2020, pp. 1322-1329. doi: 10.1109/bigdata50022.2020.9378471

7. F. Ariai, J. Mackenzie, and G. Demartini, "Natural language processing for the legal domain: A survey of tasks, datasets, models, and challenges," ACM Computing Surveys, 2024. doi: 10.1145/3777009

8. D. Hendrycks, C. Burns, A. Chen, and S. Ball, "CUAD: An expert-annotated NLP dataset for legal contract review," arXiv preprint arXiv:2103.06268, 2021.

9. S. Skylaki, A. Oskooei, O. Bari, N. Herger, and Z. Kriegman, "Named entity recognition in the legal domain using a pointer generator network," arXiv preprint arXiv:2012.09936, 2020.

10. R. Lu and L. Li, "Named entity recognition method of Chinese legal documents based on parallel instance query network," International Journal of Digital Crime and Forensics (IJDCF), vol. 16, no. 1, pp. 1-19, 2024. doi: 10.4018/ijdcf.367470

11. H. Hamad, A. K. Thakur, N. Kolleri, S. Pulikodan, and K. Chugg, "FIRE: A dataset for financial relation extraction," In Findings of the Association for Computational Linguistics: NAACL 2024, 2024, pp. 3628-3642. doi: 10.18653/v1/2024.findings-naacl.230

12. E. Leitner, G. Rehm, and J. M. Schneider, "A dataset of German legal documents for named entity recognition," In Proceedings of the Twelfth Language Resources and Evaluation Conference, 2020, pp. 4478-4485.

13. Y. Chen, Y. Sun, Z. Yang, and H. Lin, "Joint entity and relation extraction for legal documents with legal feature enhancement," In Proceedings of the 28th International Conference on Computational Linguistics, 2020, pp. 1561-1571. doi: 10.18653/v1/2020.coling-main.137

14. Y. Zhang and H. Zhang, "FinBERT-MRC: Financial named entity recognition using BERT under the machine reading comprehension paradigm," Neural Processing Letters, vol. 55, no. 6, pp. 7393-7413, 2023. doi: 10.1007/s11063-023-11266-5

15. S. F. Mohsin, S. I. Jami, S. Wasi, and M. S. Siddiqui, "An automated information extraction system from the knowledge graph based annual financial reports," PeerJ Computer Science, vol. 10, p. e2004, 2024. doi: 10.7717/peerj-cs.2004