Analysing Unified Embedding with Morphological Insight for Multilingual Text Representation
DOI:
https://doi.org/10.5269/bspm.82246
Abstract
The growing complexity of multilingual text representation makes it difficult to design embedding techniques that capture linguistic nuances across languages. In this paper, we propose a unified embedding model that integrates morphological insights for multilingual text representation, focusing on a dataset spanning English, Marathi, and Konkani. We explore several embedding methods (Fused Embeddings, Character-Level Embeddings, Word-Level Embeddings, and Contextualized Multilingual Embeddings) and demonstrate how they can be combined with morphological information for enhanced language understanding. We illustrate the approach on a real-world dataset and evaluate its performance on downstream tasks such as text summarization, highlighting the gains from fused embeddings. A comparative analysis of the embedding methods, together with a detailed workflow and mathematical modeling, provides insight into the strengths and weaknesses of each approach.
License
© Boletim da Sociedade Paranaense de Matemática 2026

This work is licensed under a Creative Commons Attribution 4.0 International license.
When a manuscript is accepted for publication, the authors automatically agree to transfer the copyright to the Sociedade Paranaense de Matemática (SPM).
The journal uses the Creative Commons Attribution license (CC BY 4.0).



