import React from 'react';

const Tutorial = () => {
    return (
        <div className="bg-gray-100 py-12">
            <div className="max-w-4xl mx-auto px-6">
                <h1 className="text-4xl font-bold text-gray-800 mb-6">
                    Training Your Own Machine Translation Model with Gaia
                </h1>
                <div className="bg-white rounded-lg shadow-md p-8 text-gray-700">
                    <h2 className="text-2xl font-bold text-gray-800 mb-4">Introduction</h2>
                    <p className="mb-4">
                        This tutorial walks you through training your own machine translation model with Gaia, a no-code tool. 
                        By following these steps, you can create a custom model that translates between a source and target language of your choice. 
                        Gaia is designed to perform especially well on low-resource languages. 
                        It builds on the <a target='_blank' rel='noopener noreferrer' href='https://ai.meta.com/research/no-language-left-behind/' className='px-1 text-blue-500 hover:text-blue-600 cursor-pointer'>NLLB (No Language Left Behind)</a> models developed by Meta AI.
                    </p>

                    <h2 className="text-2xl font-bold text-gray-800 mb-4">Prerequisites</h2>
                    <ul className="list-disc pl-6 mb-6">
                        <li>A parallel corpus of sentences in the source and target languages. Each sentence must be on the same line number as its translation in the other file.</li>
                        <li>Prior knowledge of machine learning or neural machine translation is a big plus, though you can get started with basic skills.</li>
                    </ul>

                    <h2 className="text-2xl font-bold text-gray-800 mb-4">Step 1: Prepare Your Training Data</h2>
                    <ol className="list-decimal pl-6 mb-6">
                        <li>
                            Collect a parallel corpus of sentences in your desired source and target languages. Aim for at least a few thousand sentence pairs; the more data you have, the better your model will perform. If you want the model to handle a specific domain or scenario, gather data from that domain.
                        </li>
                        <li>Ensure the sentences are properly aligned and cleaned. Remove any noise, such as HTML tags or irrelevant characters.</li>
                    </ol>
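                    <p className="mb-4">
                        The alignment and cleaning checks above can be sketched in Python before you upload anything. This is a minimal illustration, not part of Gaia: the function names <code>clean_line</code> and <code>clean_pairs</code> are hypothetical, and the cleaning rules (Unicode NFKC normalization, a simple tag-stripping pattern, whitespace collapsing) are assumptions you may need to adapt to your data.
                    </p>
                    <pre className="bg-gray-100 rounded-lg p-4 mb-6 overflow-x-auto text-sm"><code>{`import re
import unicodedata

# Simple pattern for HTML-like tags; an assumption, not a full HTML parser.
TAG_RE = re.compile("<[^>]+>")

def clean_line(text):
    # Normalize Unicode, replace tags with spaces, collapse whitespace.
    text = unicodedata.normalize("NFKC", text)
    text = TAG_RE.sub(" ", text)
    return " ".join(text.split())

def clean_pairs(src_lines, tgt_lines):
    # Both files must hold one sentence per line, in the same order.
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target have different line counts")
    pairs = []
    for src, tgt in zip(src_lines, tgt_lines):
        src, tgt = clean_line(src), clean_line(tgt)
        if src and tgt:  # drop pairs where either side is empty
            pairs.append((src, tgt))
    return pairs`}</code></pre>
                    <p className="mb-4">
                        Dropping a pair only when either side is empty keeps the two files aligned: lines are always removed from both sides together.
                    </p>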

                    <h2 className="text-2xl font-bold text-gray-800 mb-4">Step 2: Log In or Register</h2>
                    <ol className="list-decimal pl-6 mb-6">
                        <li>Create an account, or log in if you already have one. Then go to the "Gaia" section.</li>
                    </ol>

                    <h2 className="text-2xl font-bold text-gray-800 mb-4">Step 3: Configure Your Model</h2>
                    <ol className="list-decimal pl-6 mb-6">
                        <li>Upload your source language dataset and your target language dataset.</li>
                        <li>Enter the names of the source and target languages in their respective input fields.</li>
                        <li>Set the number of training epochs. Start with a value between 300 and 500, and increase if needed.</li>
                        <li>Adjust the batch size based on your data and computational resources. A batch size of 16 or 32 is a good starting point.</li>
                        <li>Set the learning rate. A value of 0.0001 is recommended for most scenarios.</li>
                        <li>Specify the maximum sequence length for input and output. This will truncate longer sentences to the specified length.</li>
                        <li>Choose the validation split ratio. The default of 0.1 (10%) is usually adequate.</li>
                        <li>Select the output directory where your trained model will be saved.</li>
                        <li>Enter a name for your model.</li>
                        <li>Leave the GPU option checked if you can, since training is considerably faster on a GPU.</li>
                        <li>Optionally, set the random seed for reproducibility.</li>
                    </ol>
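                    <p className="mb-4">
                        To make the validation split and random seed settings above concrete, here is a minimal Python sketch of a seeded 90/10 split. How Gaia splits data internally is not documented here, so treat <code>split_pairs</code> and its default values as hypothetical.
                    </p>
                    <pre className="bg-gray-100 rounded-lg p-4 mb-6 overflow-x-auto text-sm"><code>{`import random

def split_pairs(pairs, val_ratio=0.1, seed=42):
    # Shuffle a copy with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    # Hold out val_ratio of the pairs (at least one) for validation.
    n_val = max(1, round(len(shuffled) * val_ratio))
    return shuffled[n_val:], shuffled[:n_val]

# Made-up sentence pairs, only to show the call.
pairs = [("source %d" % i, "target %d" % i) for i in range(1000)]
train, val = split_pairs(pairs)
print(len(train), len(val))`}</code></pre>
                    <p className="mb-4">
                        Running the split twice with the same seed returns the same training and validation pairs, which is what makes results comparable across runs.
                    </p>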

                    <h2 className="text-2xl font-bold text-gray-800 mb-4">Step 4: Train Your Model</h2>
                    <ol className="list-decimal pl-6 mb-6">
                        <li>Click on "Start Training" to begin the fine-tuning process.</li>
                        <li>Monitor the training progress, including the loss and validation metrics.</li>
                        <li>The tool will automatically save the best model checkpoint based on the validation performance.</li>
                        <li>Training may take several hours, depending on the size of your data and the number of epochs.</li>
                    </ol>

                    <h2 className="text-2xl font-bold text-gray-800 mb-4">Output</h2>
                    <ol className="list-decimal pl-6 mb-6">
                        <li>Once training is complete, Gaia generates translations for your test set and displays the BLEU and chrF++ scores.</li>
                        <li>Gaia also exports your neural machine translation model as a .py file, so you can use it elsewhere or upload it to Hugging Face.</li>
                    </ol>
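                    <p className="mb-4">
                        If you want to compute the same kinds of metrics outside Gaia, the sketch below scores a few translations with the sacreBLEU Python library, which is assumed to be installed separately (for example with pip install sacrebleu). The example sentences are made up, and Gaia's own tokenization and scoring settings are not documented here, so your numbers may not match what Gaia displays.
                    </p>
                    <pre className="bg-gray-100 rounded-lg p-4 mb-6 overflow-x-auto text-sm"><code>{`from sacrebleu.metrics import BLEU, CHRF

# Model outputs, one string per test sentence (made-up examples).
hypotheses = ["the cat sits on the mat", "hello world"]
# One reference stream: a list with the same length and order as hypotheses.
references = [["the cat sat on the mat", "hello world"]]

bleu = BLEU()
chrf_pp = CHRF(word_order=2)  # word_order=2 turns chrF into chrF++

bleu_score = bleu.corpus_score(hypotheses, references)
chrf_score = chrf_pp.corpus_score(hypotheses, references)

# Both scores are reported on a 0 to 100 scale; higher is better.
print(bleu_score)
print(chrf_score)`}</code></pre>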
                    <h2 className="text-2xl font-bold text-gray-800 mb-4">Conclusion</h2>
                    <p className="mb-4">
                        By following this tutorial, you have learned how to train your own machine translation model using Gaia, a no-code tool. 
                        This approach lets users without deep technical expertise create custom translation models. 
                        Remember that the quality of your model depends on the size and quality of your training data, so aim to collect as much parallel text as possible. Happy translating!
                    </p>

                    <h2 className="text-2xl font-bold text-gray-800 mb-4">Tips and Best Practices</h2>
                    <ul className="list-disc pl-6 mb-6">
                        <li>Preprocess your data to handle inconsistencies, such as replacing non-standard punctuation or normalizing characters.</li>
                        <li>Start with a smaller model and data subset to validate your approach before scaling up.</li>
                        <li>Experiment with different hyperparameters, such as the learning rate and batch size, to find the optimal configuration for your specific use case.</li>
                        <li>Regularly monitor the training progress and validation metrics to detect issues like overfitting or underfitting.</li>
                        <li>Fine-tune your model incrementally by training for a few epochs, evaluating, and then training further if needed.</li>
                        <li>Augment your training data with techniques like backtranslation to improve performance, especially for low-resource languages.</li>
                        <li>Consider using domain-specific data if your use case requires translations in a particular domain, such as legal or medical text.</li>
                        <li>Collaborate with the community and share your models and data to contribute to the development of translation resources for low-resource languages.</li>
                    </ul>

                    <p className="mb-4">
                        By leveraging the power of transfer learning and the NLLB models, this no-code tool empowers users to create their own custom machine translation models easily. As you iterate on your models and data, you can achieve significant improvements in translation quality, even for languages with limited existing resources. The field of machine translation is rapidly evolving, and tools like this make it accessible to a broader audience, fostering innovation and breaking down language barriers.
                    </p>
                </div>
            </div>
        </div>
    );
};

export default Tutorial;