Fine-tuning open source LLMs for low-resource languages: the case of Vietnamese
Thanh Nguyen Ngoc1, Arthur Tang2, Bao Nguyen3, Thanh Pham4, Thuy Nguyen5, Quang Nhat Tran6
Abstract: Fine-tuning of large language models (LLMs) for low-resource languages, particularly Vietnamese, is confronted with significant challenges. Existing research has identified the necessity of selecting appropriate hyperparameters, such as learning rates, batch sizes, and optimization algorithms, to achieve optimal results during the fine-tuning process. Moreover, addressing catastrophic forgetting, where fine-tuning on new tasks erodes pre-trained knowledge, has been explored through techniques like continual learning and knowledge distillation. This study aims to build fine-tuned LLMs specifically tailored for Vietnamese, as existing LLMs exhibit limited responsiveness to Vietnamese prompts. By addressing these challenges and advancing knowledge in fine-tuning models for low-resource languages, this research contributes to bridging the digital divide and promoting linguistic inclusivity. The proposed solutions pave the way for democratizing LLMs and enhancing their applicability in diverse linguistic contexts. The study's outcomes hold promise for facilitating equitable access to language technologies and enriching AI-driven applications across regions, bringing us a step closer to a more inclusive digital landscape.
Additional Keywords and Phrases: open-source LLMs, fine-tuned model, LLMs for Vietnamese language
1 School of Science, Engineering, and Technology, RMIT University Vietnam
e-mail: thanh.nguyenngoc@rmit.edu.vn
2 School of Science, Engineering, and Technology, RMIT University Vietnam
e-mail: arthur.tang@rmit.edu.vn
3 School of Science, Engineering, and Technology, RMIT University Vietnam
e-mail: bao.nguyenthien@rmit.edu.vn
4 School of Science, Engineering, and Technology, RMIT University Vietnam
e-mail: thanh.pham@rmit.edu.vn
5 School of Science, Engineering, and Technology, RMIT University Vietnam
e-mail: thuy.nguyen43@rmit.edu.vn
6 School of Science, Engineering, and Technology, RMIT University Vietnam
e-mail: quang.tran26@rmit.edu.vn
1 BACKGROUND
The advent of Large Language Models (LLMs) and Generative Artificial Intelligence (AI) is revolutionizing the field of Natural Language Processing (NLP), opening new avenues for human-machine interactions and creative applications. Advanced AI models, such as OpenAI's GPT-3.5 [1], represent a breakthrough in deep learning technologies, empowering machines to generate human-like text, answer questions, complete sentences, translate languages, and engage in dynamic conversations, among various other tasks [2].
LLMs are trained on a vast corpus of text data using deep neural networks, which enable the models to comprehend and generate natural language with impressive accuracy [3]. By leveraging billions of parameters, LLMs possess a unique ability to process and contextualize information, making them exceptionally adept at understanding complex linguistic structures and nuances [4]. Consequently, LLMs can produce coherent and contextually relevant responses, providing users with the illusion of conversing with a knowledgeable entity [3].
Despite the capabilities of commercial LLM-based AI chatbots such as OpenAI's ChatGPT [1], Microsoft Bing Chat [5], and Google Bard [6], their support for non-English languages varies. The three AI chatbots were able to answer common questions on various subjects in Vietnamese, but their performance lacked consistency [7]. This level of performance is not sufficient for general use or for domain-specific tasks in Vietnamese.
The need for LLMs optimized for low-resource languages (LRLs) is critical to promote linguistic diversity, cultural sensitivity, and equitable AI representation across the globe [8]. Low-resource languages are defined as under-resourced: they have little data, low density, and a low presence on the internet. However, LLMs often fall short of effectively understanding and generating content in languages beyond the major ones [8], which leads to a demand for fine-tuning such models. Numerous studies have investigated the benefits and challenges of fine-tuning pre-trained LLMs for various downstream tasks [9]. Studies have demonstrated that fine-tuning pre-trained LLMs significantly enhances their performance in applications such as sentiment analysis, question answering, text classification, and machine translation [10]. The ability to leverage the vast knowledge encoded in pre-trained models while refining them for specific tasks has led to substantial improvements in benchmark performance across diverse domains [11].
Fine-tuning LLMs is not without challenges [12]. Several studies have explored strategies to address catastrophic forgetting, where fine-tuning on new tasks leads to the erosion of knowledge acquired during pre-training [13]. Methods like continual learning and knowledge distillation have been proposed to mitigate this issue, ensuring the preservation of pre-trained knowledge while accommodating new task-specific information [14]. In terms of data, carefully curated task-specific datasets are essential to avoid overfitting and ensure generalization [15]. The scarcity of labelled data for low-resource languages and specific domains presents a significant challenge in fine-tuning for diverse applications. Researchers have explored techniques like data augmentation and semi-supervised learning to tackle the data scarcity problem effectively [16]. Data distillation can also be used to tackle some of these challenges [17].
Fine-tuning, however, is not supported by all commercial LLMs. GPT-3.5 Turbo [1] only began to allow fine-tuning of customized models in the week this paper was finalized, and only for paying customers. An alternative is the adaptation of open-source LLMs, which gives AI practitioners more flexibility in optimizing models for their own uses. Open-source LLMs have already made significant strides in democratizing AI technologies [18].
The Vietnamese language is identified as a low-resource language [19]. Prior NLP models for Vietnamese have been proposed over the years with varying success; for instance, BERT-based models such as PhoBERT and ViDeBERTa [20]. To the best of our knowledge, there have been no reports of decoder-only transformers for the Vietnamese language in the current literature.
This study establishes the crucial role of open-source LLMs tailored to a specific language, such as Vietnamese, in fostering linguistic inclusivity, preserving cultural heritage, and stimulating local innovation. The study aims to build fine-tuned LLMs that are workable for the Vietnamese language. Its outcome could advance our knowledge of fine-tuning and of LLMs for low-resource languages. This research advocates for continued investment in open-source LLM research, with a focus on responsible AI development, to pave the way for a linguistically diverse and inclusive digital landscape.
2 RESEARCH METHODOLOGY
The fine-tuning process in this work includes four steps: selection of a base model, building a fine-tuning dataset, fine-tuning, and evaluation of the derived model [13]. Details of each step are presented next.
2.1 Selection of a base model
Selecting a good base model among thousands of available open-source LLMs is challenging, as both the number of models released and the pace of release have skyrocketed. There are approximately 100,000 NLP models listed on Hugging Face, the well-known site for hosting AI models and datasets, at the time of this writing [21]. Fortunately, a few notable foundational model families have been adopted by many researchers, including LLaMA, developed and released by Meta [22], Alpaca from Stanford University, and Falcon from the Technology Innovation Institute (TII) [23]. Within each model family there are diverse models with different sizes and training datasets. For example, PaLM 2 by Google [24] has 593 models ranging from 128M to 60B parameters, while the Falcon LLM family comes with 7B and 40B models.
As Alpaca was developed from the LLaMA 7B model, in this work we started by considering two model families: Falcon (7B, 40B) and LLaMA (7B, 13B, 70B). Additionally, we considered open-access multilingual language models, as they include training data in non-English languages. One notable example is the BigScience Large Open-science Open-access Multilingual Language Model (BLOOM) [25]. The family comes in various sizes, ranging from 560M to 176B parameters.
Using h2oGPT by H2O.ai [26], an open-source code repository for running LLMs, we benchmarked the performance of multiple LLMs with the same prompt. Several prompts in Vietnamese were entered, and the generated responses from these models were recorded and compared. The GPT-3.5 Turbo model was also included for comparison with the open-source ones.
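This kind of side-by-side probing can also be reproduced outside h2oGPT. The following is a minimal sketch, using the Hugging Face transformers library, of sending the same Vietnamese prompt to several candidate checkpoints; the model list, prompt, and generation settings are illustrative assumptions rather than the exact harness used in this study.

```python
# Hypothetical side-by-side probe of candidate base models with one Vietnamese prompt.
from transformers import AutoModelForCausalLM, AutoTokenizer

candidates = ["tiiuae/falcon-7b", "bigscience/bloomz-7b1"]   # illustrative shortlist
prompt = "Loài vật nào chạy nhanh nhất?"                     # "What is the fastest animal?"

for name in candidates:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(
        name, device_map="auto", trust_remote_code=True      # some Falcon checkpoints need custom code
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
    print(f"--- {name} ---")
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```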
From our experiments, the output of the LLaMA model suggests that it is capable of answering in Vietnamese; however, its responses are lacklustre in creativity. This is no surprise, as Meta warned that "the model may not be suitable for use in other languages" in their release notes [22]. This suggests the need for a large Vietnamese dataset if LLaMA is chosen. On the other hand, Falcon 40B and GPT-3.5 Turbo performed relatively well with the tested prompts. However, both models require significant computing resources, which are not easily available to the wider public in developing countries like Vietnam. Specifically, Falcon 40B requires 80-100 GB of VRAM from Nvidia H100 Graphics Processing Units (GPUs).
Falcon 7B was also capable of generating responses, but it often repeated the same response multiple times within a single reply. This suggests the foundational model was trained with a sufficient amount of Vietnamese data, but that the model requires substantial fine-tuning. The 7B model is functional in a low-resource setting, depending on the size of the dataset, starting at 8 GB of VRAM [27]. Therefore, we picked Falcon 7B as a foundational model for this research. After a similar evaluation process, BLOOMZ 7B1 [28] was also chosen, as its initial performance is similar to that of Falcon 7B.
2.2 Building a fine-tuning dataset
There exist many common datasets used for pre-training base models, such as SAD, CommonCrawl, etc. [29]. Reusing these pre-training datasets for fine-tuning is not efficient, hence the need for different fine-tuning-specific datasets [30]. Acquiring new datasets can be done through various strategies, such as data collection from public sources, crowd-sourcing, and active learning. These approaches can help overcome the scarcity of domain-specific datasets, ensure dataset diversity, and improve the representativeness of the fine-tuned models [31]. This work attempted some of these strategies in an early phase before focusing on existing datasets.
There are some popular datasets used by many researchers for fine-tuning, such as Alpaca 52k [32] and Dolly 12k [33]. The former, with 52k instructions, is anticipated to provide superior fine-tuning capability compared to the latter. Nevertheless, these two datasets cannot be used in their original forms, as they exclusively support the English language.
There are a variety of solutions for converting high-resource data to lower-resource data; in our case, the task is to translate the Alpaca 52k dataset from English to Vietnamese. However, our experiments lacked human annotators, so we had to try other alternatives. A recent study of machine translation (MT) revealed that T5-architecture Transformers [34] produce better English-Vietnamese translations than existing solutions like Google Translate and Bing Translate. However, we then encountered other challenges: due to time constraints and limited computing resources, we decided to carry out the translation task using Microsoft's translation service in this work.
The original Alpaca 52k dataset [35], being 40 MB in size, posed a challenge for direct translation using existing solutions. For instance, Google Translate restricted file sizes to 10 KB at a time, and other services had similar limitations. To address this issue, we opted to use Microsoft Word, which allowed us to translate 100 KB files at a time. To facilitate the process, we developed a script to split the original file into 40 smaller files. These 40 files were translated from English to Vietnamese using Microsoft Word. Following this, another script was employed to merge all the translated files, rendering them ready for fine-tuning. However, during the translation process some information was lost, resulting in only 33k instructions being retained in the final dataset. The link to the dataset is found in the Appendices.
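The split-and-merge helpers described above can be sketched as follows. This is a simplified reconstruction under stated assumptions: the chunk count of 40 follows the text, while the file names and the use of plain JSON chunks (with the manual Microsoft Word translation happening in between) are hypothetical.

```python
# Hypothetical split/merge helpers around the manual Microsoft Word translation step.
import json

def split_dataset(path="alpaca_data.json", parts=40):
    """Split the Alpaca-style JSON list into `parts` smaller files for translation."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)                      # list of {"instruction", "input", "output"}
    chunk = -(-len(records) // parts)               # ceiling division
    for i in range(parts):
        with open(f"alpaca_part_{i:02d}.json", "w", encoding="utf-8") as out:
            json.dump(records[i * chunk:(i + 1) * chunk], out, ensure_ascii=False, indent=2)

def merge_dataset(parts=40, out_path="alpaca_vn.json"):
    """Merge the translated chunks back into one fine-tuning dataset."""
    merged = []
    for i in range(parts):
        with open(f"alpaca_part_{i:02d}_vi.json", encoding="utf-8") as f:
            merged.extend(json.load(f))             # some records may be lost in translation
    with open(out_path, "w", encoding="utf-8") as out:
        json.dump(merged, out, ensure_ascii=False, indent=2)
```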
2.3 Fine-tuning process
Fine-tuning in this work was first conducted on a local server equipped with an RTX 3090 GPU with 24 GB of VRAM. Despite the convenience and early success in running certain tests, this GPU prevented us from fine-tuning larger base models. Hence, the work was migrated to a cloud service to address this drawback. The RunPod cloud service [36] was chosen for its relatively easy GUI and competitive pricing; for instance, RunPod offered an RTX 3090 (24 GB) for as low as $0.2/hour in Spot instance mode. Other cloud providers, such as Amazon AWS, Microsoft Azure, Google Colaboratory, Linode, and Genesis, could also be considered for fine-tuning.
In terms of the fine-tuning approach, we initially worked with the Lit-Parrot training toolkit [26]. However, as it only provides training pipelines based on Low-Rank Adaptation (LoRA) [37], we experienced out-of-memory errors when running with a 32k dataset. Additionally, Lit-Parrot supports only a few models, such as Falcon, Vicuna, and Pythia, which limited our ability to examine different models. To address these challenges, a Quantized Low-Rank Adaptation (QLoRA) [38] fine-tuning approach was chosen. This newly proposed approach supports 4-bit quantization and various other techniques that significantly reduce the memory requirement.
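As a rough sketch of what this 4-bit loading looks like in practice, the transformers and bitsandbytes integration can quantize a 7B base model at load time; the NF4 and double-quantization settings below follow the QLoRA paper [38], and the exact configuration used in this work may differ.

```python
# Hedged sketch: loading a 7B base model in 4-bit (QLoRA-style) to reduce VRAM needs.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",               # NormalFloat4 quantization
    bnb_4bit_use_double_quant=True,          # quantize the quantization constants too
    bnb_4bit_compute_dtype=torch.bfloat16,   # compute in bf16 while weights stay 4-bit
)
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloomz-7b1",
    quantization_config=bnb_config,
    device_map="auto",
)
print(f"~{model.get_memory_footprint() / 1e9:.1f} GB loaded")  # rough 4-bit footprint
```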
This work utilizes Jupyter Notebook, an interactive computing platform, for fine-tuning and inference instead of a plain Python terminal. The cell-based execution of the Jupyter Notebook platform allows us to troubleshoot and re-run steps more conveniently during the fine-tuning process, which is known to be susceptible to errors and failures [9]. Also, by comparing traditional Python terminals with on-premises GPUs against notebook environments with cloud-based GPU acceleration, we observed the manifold benefits of adopting cloud GPU resources [39]: scalability, flexibility, and reduced maintenance overhead, leading to improved productivity and cost-effectiveness in fine-tuning experiments.
Another challenge is that the available QLoRA scripts focus only on the fine-tuning process itself. Drawing on insights from the QLoRA paper [38], we developed a customized Jupyter notebook for fine-tuning that addresses the earlier drawbacks and runs inference seamlessly. The GitHub link to the script is included in the Appendices. A maximum of 2,000 steps is set for the BLOOMZ 7B1 model, while 5,000 steps are chosen for Falcon 7B. The total running time for the former was 6 hours, while the latter took approximately 30 hours. This significant difference in runtime demonstrates the efficiency of fine-tuning BLOOMZ 7B1. The learning rate is set at 2e-4, and the 8-bit Adam optimizer is used.
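A minimal sketch of the resulting training configuration, continuing from the 4-bit model loaded in the previous snippet, is shown below using the peft library; the LoRA rank, alpha, dropout, and batch size are illustrative assumptions, while the learning rate, optimizer, and step counts follow the values stated above.

```python
# Hedged sketch of the LoRA + trainer configuration; `model` is the 4-bit BLOOMZ 7B1
# loaded in the previous snippet.
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import TrainingArguments

model = prepare_model_for_kbit_training(model)      # prepare the quantized model for training
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,         # assumed adapter hyperparameters
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)          # only the small adapter weights are trained

training_args = TrainingArguments(
    output_dir="vn-bloom-7b1",
    max_steps=2_000,                                # 5,000 steps were used for Falcon 7B
    learning_rate=2e-4,
    per_device_train_batch_size=4,                  # assumed; adjust to available VRAM
    optim="paged_adamw_8bit",                       # 8-bit Adam optimizer
    logging_steps=50,
)
# A Trainer (or TRL's SFTTrainer) would then be built with the translated Alpaca
# dataset and `training_args`, and training launched with trainer.train().
```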
2.4 Evaluation
To evaluate the two fine-tuned models, questions in Vietnamese covering a variety of tasks were used. They include:
Asking about a fact or an event.
Requesting to write an email for a particular task.
Writing a Python program for a particular requirement.
Composing an essay.
Asking for instructions on how to do something.
For these experiments, we qualitatively evaluated the fine-tuned models using human subjects, due to the absence of available Vietnamese LLM benchmarking frameworks. In particular, the answers from the fine-tuned models were recorded and graded by humans. Our benchmark primarily emphasizes the Vietnamese vocabulary diversity of the models and their capacity to generate coherent answers. We also examined and compared each fine-tuned model against its base model for performance, and then against the other fine-tuned model.
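The grading sheet itself can be produced with a simple generation loop; the sketch below pairs base-model and fine-tuned answers for the same prompts and writes them out for the human graders. The prompts are the samples shown in Tables 1 and 2, the checkpoint names are those listed in the Appendices, and the loop assumes the fine-tuned weights were pushed as full models (if they are stored as LoRA adapters, they would instead be loaded with peft's PeftModel).

```python
# Hypothetical evaluation loop: collect paired answers for human grading.
import json
from transformers import AutoModelForCausalLM, AutoTokenizer

PROMPTS = [
    "Loài vật nào chạy nhanh nhất?",
    "Viết một email chia buồn với gia đình người bạn thân",
    "Viết một đoạn code Python để in ra số từ 0 tới 100",
]

def generate_answers(model_name):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, device_map="auto", trust_remote_code=True
    )
    answers = []
    for prompt in PROMPTS:
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        output = model.generate(**inputs, max_new_tokens=256)
        answers.append(tokenizer.decode(output[0], skip_special_tokens=True))
    return answers

pairs = {                                            # base checkpoint -> fine-tuned checkpoint
    "bigscience/bloomz-7b1": "thanhnew2001/vn-bloom-7b1",
    "tiiuae/falcon-7b": "thanhnew2001/vn-falcon-7b",
}
sheet = []
for base, tuned in pairs.items():
    base_answers, tuned_answers = generate_answers(base), generate_answers(tuned)
    for p, b, t in zip(PROMPTS, base_answers, tuned_answers):
        sheet.append({"prompt": p, "base": b, "fine_tuned": t})

with open("human_eval_sheet.json", "w", encoding="utf-8") as f:
    json.dump(sheet, f, ensure_ascii=False, indent=2)   # handed to human graders
```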
3 FINDINGS AND RESULTS
This section presents a comparative analysis between each base model and its fine-tuned counterpart. Additionally, we compare the performance of the two fine-tuned models to determine which one delivers superior-quality answers. The fine-tuned models are identified as VN-Falcon 7B (vn-falcon-7b) and VN-BLOOM 7B1 (vn-bloom-7b1). The Hugging Face links to the two models are listed in the Appendices.
3.1 Falcon 7B vs. VN-Falcon 7B
Table 1 shows the responses generated by Falcon 7B and by the VN-Falcon 7B model developed in this work for some sample prompts. The original prompts in Vietnamese are provided together with their English translations for reference. The responses from Falcon 7B were recorded as they are; the responses from VN-Falcon 7B are much longer, and only a portion of each is shown in the table.
Table 1: Responses from Falcon 7B vs. the fine-tuned VN-Falcon 7B for various prompts.

#1
Prompt: "loài vật nào chạy nhanh nhất?"
English translation: What is the fastest animal?
Falcon 7B: ứng với nợ vừa
VN-Falcon 7B: Loài vật nào chạy nhanh nhất cụ thể là chèv trắng do đó là những vật chứa số lít của một khoảng cách từ 200-300kg và số lít tối thiết từ 60-80 km/h……

#2
Prompt: viết một email chia buồn với gia đình người bạn thên
English translation: Write a condolence email to your friend's family.
Falcon 7B: với gia đình người bạn,
VN-Falcon 7B: Chào em, Tôi muốn buồn với bạn vì tôi không biết rằng còn lại gió mà tôi đã sử dụng để chuyển đổi sự kiện của tôi đã được tiết lộ….

#3
Prompt: Viết một đoạn code Python để in ra số từ 0 tới 100
English translation: Write a Python code snippet to print out numbers from 0 to 100.
Falcon 7B: 'in' keyword is used to check if a variable is in a
VN-Falcon 7B: for I in range (0,100); print(i)
The responses from the Falcon 7B model, specifically for the instruction-based queries, exhibited very poor quality, often being nonsensical and erroneous. The answers are also not always in Vietnamese (e.g., the response to prompt 1). This could be because Falcon 7B was trained with English and French data [23] and not optimized for the Vietnamese language. Hence, the fine-tuned model outperformed the base model. The answers from VN-Falcon 7B may not be excellent, but they retain coherence and relevance to the given question. Thus, the fine-tuned model proved to be more reliable and contextually accurate, although much improvement in its performance is still required.
3.2 BLOOMZ 7B1 vs. VN-BLOOM 7B1
Similarly, the same set of prompts was used to evaluate the BLOOMZ 7B1 model and its fine-tuned counterpart. The responses are summarized in Table 2. A brief evaluation of the answers from the base BLOOMZ 7B1 model shows that it does not generate sensible outcomes. Even though it can generate Vietnamese text in its responses, they are often repetitions of the question (as seen in the answers to all questions).
On the other hand, VN-BLOOM 7B1 performed relatively well in answering the questions. The answers to factual questions (question 1) are relatively accurate and can be refined for better accuracy. The response to question 2, where the model was asked to write an email expressing condolences to a friend's family, was relatively good, indicating the model's capability to handle more complex language tasks. The Python code question (question 3) was answered correctly, demonstrating the model's basic programming abilities. Additionally, the response to question 4, which discussed how AI has impacted businesses, was highly impressive, showcasing the model's ability to provide insightful and relevant information. Overall, the VN-BLOOM 7B1 model demonstrated a high level of performance and understanding across various types of questions.
Table 2: Responses from BLOOMZ 7B1 vs. VN-BLOOM 7B1 for various prompts.

#1
Prompt: "loài vật nào chạy nhanh nhất?"
English translation: What is the fastest animal?
BLOOMZ 7B1: loài vật nào chạy nhanh nhất?
VN-BLOOM 7B1: Một con vật chạy nhanh nhất là một con báo đốm. Nó có thể đạt tốc độ tối đa lên tới 70 dặm một giờ.

#2
Prompt: viết một email chia buồn với gia đình người bạn thên
English translation: Write a condolence email to your friend's family.
BLOOMZ 7B1: Viết một email chia buồn với gia đình người bạn thân của mình. Bạn có thể viết một email chia buồn với gia đình người bạn thân của mình…………
VN-BLOOM 7B1: Xin chào John, tôi là một trợ lý AI và tôi rất tiếc khi nghe tin bạn qua đời. Tôi xin gửi lời chia buồn sau sắc đến gia đình bạn…………

#3
Prompt: Viết một đoạn code Python để in ra số từ 0 tới 100
English translation: Write a Python code snippet to print out numbers from 0 to 100.
BLOOMZ 7B1: Viết mộn đoạn code Python để in ra số tứ 0 tới 100. Sau đó, bạn có thể sử dụng lệnh print để in ra số đó………..
VN-BLOOM 7B1: for I in range(0,101); print(i)

#4
Prompt: AI thay đổi doanh nghiệp thế nào?
English translation: How does AI transform business?
BLOOMZ 7B1: AI thay đổi doanh nghiệp thế nào?
VN-BLOOM 7B1: Một trong những cách để thay đổi doanh nghiệp là bằng cách tạo ra một chiến lược mới hoặc cải thiện chiến lược hiện có……..
3.3 VN-Falcon 7B vs. VN-BLOOM 7B1
The work also compared the results from the two fine-tuned models. Despite having similar numbers of parameters (~7B) and being fine-tuned with the same dataset, the VN-BLOOM 7B1 model outperformed VN-Falcon 7B. The quality of answers provided by the former reached almost 80% of the performance of GPT-3.5, which has a significantly larger pre-trained model with 175B parameters [1]. This difference in performance highlights the impact of fine-tuning on model capabilities.
It is worth emphasizing that the fine-tuning process played a crucial role in achieving the superior performance of VN-BLOOM 7B1 compared to VN-Falcon 7B. Even with just 2,000 fine-tuning steps, the VN-BLOOM 7B1 model demonstrated a remarkable level of competence. Future studies could further explore the fine-tuning process by using more steps and incorporating a more diverse dataset. This might lead to even better performance and a deeper understanding of how fine-tuning can enhance the capabilities of the models in terms of correctness, toxicity, coherence, etc. The fine-tuned model with 2,000 steps can be found on Hugging Face (link in the Appendices).
4 DISCUSSION AND CONCLUSION
In this research paper, we presented the process of selecting and fine-tuning open-source LLMs optimized for the Vietnamese language. Specifically, two fine-tuned models based on Falcon 7B and BLOOMZ 7B1 were successfully produced and tested. We underscored the criticality of selecting foundational language models that offer support for multiple languages. The paper also presented several strategies that researchers and practitioners can use to meet the challenge of acquiring local-language datasets for fine-tuning, such as translation of popular datasets.
It can be seen from this work that fine-tuning is a powerful method for enhancing the quality of small language models optimized for another language. The fine-tuned models produced more coherent answers to questions from various contexts than the base ones. The findings presented here underscore the potential of fine-tuning in rendering small models viable and competitive solutions for a wide array of language-based applications. Additionally, the work encourages more research in this space to further the adaptation of existing LLMs to local languages.
Another aspect of this study focused on the choice of computational resources. By using notebook environments with cloud-based GPU acceleration, the fine-tuning process gains flexibility, improved productivity, and cost-effectiveness. This approach proves to be suitable in the face of limited resources and can be adopted by practitioners in developing countries such as Vietnam.
Future work includes further improvement with a more diverse dataset and an evaluation methodology to verify the performance of fine-tuned models for non-English languages. As the field of NLP continues to evolve, further exploration of fine-tuning's generalizability, its impact on low-resource languages, and its potential for transfer learning remains an exciting avenue for future research.
REFERENCES
[1] OpenAI. "ChatGPT Homepage." https://openai.com/chatgpt (accessed 25 August, 2023).
[2] H. Lee, "The rise of ChatGPT: Exploring its potential in medical education," Anatomical Sciences Education, 2023, doi: 10.1002/ase.2270.
[3] A. Lecler, L. Duron, and P. Soyer, "Revolutionizing radiology with GPT-based models: Current applications, future possibilities and limitations of ChatGPT," Diagnostic and Interventional Imaging, 2023, doi: 10.1016/j.diii.2023.02.003.
[4] J. Crawford, M. Cowling, and K. A. Allen, "Leadership is needed for ethical ChatGPT: Character, assessment, and learning using artificial intelligence (AI)," Journal of University Teaching and Learning Practice, vol. 20, no. 3, 2023, doi: 10.53761/1.20.3.02.
[5] Microsoft. "Bing Chat About." https://www.microsoft.com/en-us/edge/features/bing-chat?form=MT00D8 (accessed 25 August, 2023).
[6] Google. "Google Bard." https://bard.google.com/ (accessed 25 August, 2023).
[7] X.-Q. Dao, "Which Large Language Model should You Use in Vietnamese Education: ChatGPT, Bing Chat, or Bard?," 2023.
[8] X. Li, F. Tramer, P. Liang, and T. Hashimoto, "Large language models can be strong differentially private learners," arXiv preprint arXiv:2110.05679, 2021.
[9] T. J. Sejnowski, "Large Language Models and the Reverse Turing Test," Neural Computation, vol. 35, no. 3, pp. 309-342, 2023, doi: 10.1162/neco_a_01563.
[10] H. Kwon et al., "Modeling preconditions in text with a crowd-sourced dataset," arXiv preprint arXiv:2010.02429, 2020.
[11] W. Yuan, M. Dimkovski, and A. An, "Contrastive Fine-tuning on Few Shot Intent Detection with Topological Intent Tree," in ACM Web Conference 2023 - Companion of the World Wide Web Conference, WWW 2023, 2023, pp. 464-468, doi: 10.1145/3543873.3584648.
[12] C. Peris et al., "Privacy in the Time of Language Models," in WSDM 2023 - Proceedings of the 16th ACM International Conference on Web Search and Data Mining, 2023, pp. 1291-1292, doi: 10.1145/3539597.3575792.
[13] A. Nag, B. Samanta, A. Mukherjee, N. Ganguly, and S. Chakrabarti, "Transfer Learning for Low-Resource Multilingual Relation Classification," ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 22, no. 2, 2023, doi: 10.1145/3554734.
[14] X. Li, F. Tramèr, P. Liang, and T. Hashimoto, "Large language models can be strong differentially private learners," in ICLR 2022 - 10th International Conference on Learning Representations, 2022.
[15] E. Sanchez-Bayon and R. Agerri, "Leveraging a New Spanish Corpus for Multilingual and Crosslingual Metaphor Detection," in CoNLL 2022 - 26th Conference on Computational Natural Language Learning, Proceedings of the Conference, 2022, pp. 228-240.
[16] R. Xu et al., "Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning," in EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings, 2021, pp. 9514-9528.
[17] S. Gunasekar et al., "Textbooks Are All You Need," arXiv preprint arXiv:2306.11644, 2023.
[18] A. Hilmkil, S. Callh, M. Barbieri, L. R. Sütfeld, E. L. Zec, and O. Mogren, "Scaling federated learning for fine-tuning of large language models," in International Conference on Applications of Natural Language to Information Systems, 2021: Springer, pp. 15-23.
[19] S. Ranathunga, E.-S. A. Lee, M. Prifti Skenduli, R. Shekhar, M. Alam, and R. Kaur, "Neural machine translation for low-resource languages: A survey," ACM Computing Surveys, vol. 55, no. 11, pp. 1-37, 2023.
[20] C. D. Tran, N. H. Pham, A.-T. Nguyen, T. S. Hy, and T. Vu, "ViDeBERTa: A powerful pre-trained language model for Vietnamese," in Findings of the Association for Computational Linguistics: EACL 2023, 2023, pp. 1041-1048.
[21] Hugging Face. "Models - Hugging Face." https://huggingface.co/models (accessed 23 August, 2023).
[22] Meta. "Introducing Llama 2." https://ai.meta.com/llama/ (accessed 25 August, 2023).
[23] Technology Innovation Institute (TII). "UAE's Technology Innovation Institute Launches Open-Source 'Falcon 40B' Large Language Model for Research & Commercial Utilization." https://www.tii.ae/news/uaes-technology-innovation-institute-launches-open-source-falcon-40b-large-language-model (accessed 25 August, 2023).
[24] Z. Ghahramani. "Introducing PaLM 2." https://blog.google/technology/ai/google-palm-2-ai-large-language-model/ (accessed 25 August, 2023).
[25] BigScience. "BigScience Large Open-science Open-access Multilingual Language Model." https://huggingface.co/bigscience/tr11-176B-logs (accessed 25 August, 2023).
[26] H2O.ai. "h2oGPT LLM Leaderboard." https://gpt.h2o.ai/ (accessed 25 August, 2023).
[27] bilelm. "Minimum requirements for inference." https://huggingface.co/tiiuae/falcon-7b/discussions/2 (accessed 25 August, 2023).
[28] BigScience. "BLOOMZ 7B1 LM." https://huggingface.co/bigscience/bloomz-7b1 (accessed 25 August, 2023).
[29] C. K. D. Dataman. "Large Language Model Datasets." https://dataman-ai.medium.com/large-language-model-datasets-95df319a110
[30] V. Slovikovskaya, "Transfer learning from transformers to fake news challenge stance detection (FNC-1) task," arXiv preprint arXiv:1910.14353, 2019.
[31] A. Chan, V. Claveau, and E. Kijak, "PPL-MCTS: Constrained Textual Generation Through Discriminator-Guided Decoding," in CtrlGen 2021 - Workshop on Controllable Generative Modeling in Language and Vision at NeurIPS 2021, 2021, pp. 1-19.
[32] R. Taori. "Stanford Alpaca: An Instruction-following LLaMA Model." https://github.com/tatsu-lab/stanford_alpaca#data-release (accessed 25 August, 2023).
[33] M. Conover, M. Hayes, A. Mathur, J. Xie, J. Wan, S. Shah, A. Ghodsi, P. Wendell, M. Zaharia, and R. Xin. "Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM." https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm (accessed 25 August, 2023).
[34] C. Ngo et al., "MTet: Multi-domain translation for English and Vietnamese," arXiv preprint arXiv:2210.05610, 2022.
[35] H. Touvron et al., "Llama 2: Open foundation and fine-tuned chat models," arXiv preprint arXiv:2307.09288, 2023.
[36] RunPod. "GPU Cloud Service Runpod." https://www.runpod.io/ (accessed 24 August, 2023).
[37] E. J. Hu et al., "LoRA: Low-rank adaptation of large language models," arXiv preprint arXiv:2106.09685, 2021.
[38] T. Dettmers, A. Pagnoni, A. Holtzman, and L. Zettlemoyer, "QLoRA: Efficient finetuning of quantized LLMs," arXiv preprint arXiv:2305.14314, 2023.
[39] J. Kim, T. J. Jun, D. Kang, D. Kim, and D. Kim, "GPU enabled serverless computing framework," in 2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), 2018: IEEE, pp. 533-540.
A APPENDICES
GitHub repository
https://github.com/thanhnew2001/starcoder/blob/main/bloom7b1_alpaca_qlora_finetune_save_inference.ipynb
Dataset
https://huggingface.co/datasets/thanhnew2001/alpaca_vn
Fine-tuned Models
https://huggingface.co/thanhnew2001/vn-falcon-7b
https://huggingface.co/thanhnew2001/vn-bloom-7b1