Meta released a new open-source artificial intelligence (AI) tool on Sunday that takes on Google's NotebookLM. Dubbed NotebookLlama, the tool is an AI-powered podcast generator: users upload a PDF file and the tool turns it into an audio podcast with two AI characters. The tool uses three different Llama AI models to complete the entire process. Just like Google's tool, NotebookLlama's podcast follows a free-flowing, back-and-forth conversation between two AI hosts.
The Meta NotebookLlama AI tool uses three large language models to generate audio podcasts from blocks of text. Currently, the tool only accepts PDF files as input, so users will have to convert whatever text format they have into PDF.
NotebookLlama first uses the Llama 3.2 1B Instruct model to pre-process the PDF file and save its contents in a '.txt' file. The Llama 3.1 70B Instruct model then writes a podcast transcript from that source text. The transcript is then dramatised by a re-writer that uses the Llama 3.1 8B Instruct model. Finally, a custom tool passes the transcript to a text-to-speech workflow, for which Meta is using the Parler TTS tool. Interested individuals can access all the models required to generate podcasts through the project's GitHub listing.
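The three LLM steps described above can be read as a simple chain in which each model's output becomes the next model's input. The following is a minimal Python sketch of that ordering only, not Meta's actual code: the stage labels, the `call_model` signature, and the `echo_model` stand-in are hypothetical names introduced for illustration, and the text-to-speech step is left out.

```python
from dataclasses import dataclass
from typing import Callable, List

# A model call is abstracted as (model_name, input_text) -> output_text.
# This signature is an assumption made for illustration only.
ModelFn = Callable[[str, str], str]


@dataclass(frozen=True)
class Stage:
    name: str   # hypothetical label for the step
    model: str  # model the article names for this step


# Order and model names follow the article; labels are illustrative.
STAGES: List[Stage] = [
    Stage("preprocess_pdf_text", "Llama 3.2 1B Instruct"),
    Stage("write_transcript", "Llama 3.1 70B Instruct"),
    Stage("dramatise_transcript", "Llama 3.1 8B Instruct"),
]


def run_pipeline(pdf_text: str, call_model: ModelFn) -> str:
    """Pass the text through each LLM stage in order and return the script.

    The result would then go to a text-to-speech step (Parler TTS in the
    article), which is not modelled here.
    """
    text = pdf_text
    for stage in STAGES:
        text = call_model(stage.model, text)
    return text


def echo_model(model: str, prompt: str) -> str:
    """Stand-in for a real model call: tags the text with the model name."""
    return f"[{model}] {prompt}"


if __name__ == "__main__":
    print(run_pipeline("raw text extracted from a PDF", echo_model))
```

Swapping in smaller models, as the developers allow, would amount to changing the `model` field of a stage in this sketch.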
However, the AI models mentioned above are only recommendations from the developers. Users can opt for smaller models at any step, although the results may vary. Meta highlighted that running the AI system in the recommended setup requires GPU hardware with an aggregate memory of approximately 140GB.
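The article does not say how the 140GB figure is derived. One plausible reading, offered here as an assumption, is that it reflects the weights of the 70-billion-parameter model stored at 16-bit precision (2 bytes per parameter). The short Python sketch below works through that arithmetic; the function name is hypothetical, and the estimate ignores activations and other runtime overhead.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Estimate memory for model weights alone, in gigabytes (10^9 bytes).

    Assumes 16-bit weights by default; this is an illustrative assumption,
    not a figure stated in the article.
    """
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    # 70 billion parameters at 2 bytes each
    print(weight_memory_gb(70e9))  # 140.0
    # The smaller models named in the article, under the same assumption
    print(weight_memory_gb(8e9))   # 16.0
    print(weight_memory_gb(1e9))   # 2.0
```

Under the same assumption, the 8B and 1B models need far less memory, which is consistent with the developers' note that smaller models can be substituted.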
An X (formerly known as Twitter) user posted a sample of the generated podcast. Based on this sample, the audio quality does not appear to match Google's NotebookLM, and the voices sound shrill and robotic. Further, there are instances where parts of the audio are skipped and the AI hosts end up speaking over each other.
Meta acknowledges some of the issues and plans to address them in the next iteration of the AI product. The company highlighted, “The TTS model is the limitation of how natural this will sound. This probably [can] be improved with a better pipeline and with the help of someone more knowledgeable.”
The tech giant is also planning to use two different LLMs to write the script, with the models debating each other to make the podcast sound more conversational. This is part of the developers' future plans. The company is also testing the Llama 405B AI model for writing transcripts and working to support more input and output formats.