Pleias, a French startup that builds energy-efficient large language models (LLMs) for data-sensitive industries, has released a dataset called YouTube-Commons that contains more than two million copyright-free video transcripts. YouTube-Commons includes the full transcript of each video in the collection, which makes it one of the largest collections of conversational data, with nearly 30 billion words. The dataset gives LLM developers a large amount of freely available training data.
Get the data.
Image credit: Alexander Shatov