From Machine Learning

Nanochat vs Llama for training from scratch? [P]

Hey all, I'm working on a project that trains a model entirely on historical data, which I've posted about on this subreddit before. My last training run used nanochat. It worked very well for pretraining and SFT of the initial model, but I'm finding that nanochat, while great for getting a model up and running, is not so great for interoperability. There has been some work on making nanochat models compatible with Hugging Face transformers, but the latest version of nanochat (the one I trained with) doesn't produce a transformers-compatible model.
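For context on what a nanochat-to-HF exporter involves: a Hugging Face Llama checkpoint is a state dict with fixed parameter names (for example model.layers.0.self_attn.q_proj.weight), so at minimum an exporter has to rename keys. Below is a minimal, hypothetical sketch of that renaming step. The nanochat-side key names are assumptions for illustration, so check them against the keys in your own checkpoint. MLP keys are deliberately left unmapped: Llama's MLP uses three projections (gate_proj, up_proj, down_proj) for SwiGLU, so a model whose MLP or normalization layers differ from Llama's cannot be converted by renaming alone. The rotary-embedding layout is another thing to verify before trusting converted weights.

```python
import re

# HYPOTHETICAL mapping from nanochat-style parameter names (left) to
# Hugging Face Llama parameter names (right). The left-hand patterns are
# assumptions; verify them against your checkpoint's state_dict keys.
KEY_MAP = [
    (r"^transformer\.wte\.weight$", "model.embed_tokens.weight"),
    (r"^transformer\.h\.(\d+)\.attn\.c_q\.weight$", r"model.layers.\1.self_attn.q_proj.weight"),
    (r"^transformer\.h\.(\d+)\.attn\.c_k\.weight$", r"model.layers.\1.self_attn.k_proj.weight"),
    (r"^transformer\.h\.(\d+)\.attn\.c_v\.weight$", r"model.layers.\1.self_attn.v_proj.weight"),
    (r"^transformer\.h\.(\d+)\.attn\.c_proj\.weight$", r"model.layers.\1.self_attn.o_proj.weight"),
    (r"^lm_head\.weight$", "lm_head.weight"),
]


def rename_keys(state_dict):
    """Return a new dict with keys renamed via KEY_MAP; raise if any key is unmapped."""
    renamed = {}
    unmapped = []
    for key, value in state_dict.items():
        for pattern, replacement in KEY_MAP:
            new_key, n_subs = re.subn(pattern, replacement, key)
            if n_subs:
                renamed[new_key] = value
                break
        else:
            # No pattern matched: collect it so the failure is loud, not silent.
            unmapped.append(key)
    if unmapped:
        raise KeyError(f"no Llama mapping for: {unmapped}")
    return renamed
```

Raising on unmapped keys is a deliberate choice: a partial conversion would still load, with some layers left at random initialization, which is easy to miss in the logs.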

So I'm considering doing my next training run with the Llama architecture and the transformers 'Trainer' class. I've assembled a much larger dataset for pretraining, and I want this to be an open-source project that people can access through transformers. However, I know nanochat has its advantages (such as the auto-scaling --depth parameter). With all that said, is Llama the best architecture for this scenario? Is there a better option I could use? Or should I just go with nanochat again and hope I can build a nanochat-to-HF export script afterwards?
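One way to keep something like nanochat's single --depth knob while training a Llama model with transformers is to derive the config values from depth yourself. The function below is a hypothetical scaling rule, not nanochat's actual code: width grows linearly with depth (64 per layer is an assumption), heads are kept 128 wide, and the MLP width follows the Llama-style rule of 8/3 times the hidden size, rounded up to a multiple of 256. The helper count_params mirrors the parameter layout of Hugging Face's LlamaForCausalLM (no biases, untied embeddings by default), which helps when sizing a run against the dataset.

```python
def ceil_div(a, b):
    """Integer ceiling division (avoids float rounding issues)."""
    return -(-a // b)


def depth_to_llama_kwargs(depth, vocab_size, width_per_layer=64, head_dim=128,
                          max_position_embeddings=2048, tie_word_embeddings=False):
    """Derive LlamaConfig keyword arguments from a single depth knob.

    HYPOTHETICAL scaling rule, loosely inspired by nanochat's --depth flag;
    the constants are assumptions, not nanochat's verified defaults.
    """
    # Width grows linearly with depth, rounded up so it splits evenly into heads.
    hidden_size = head_dim * ceil_div(depth * width_per_layer, head_dim)
    num_heads = hidden_size // head_dim
    # Llama-style SwiGLU width: 8/3 * hidden, rounded up to a multiple of 256.
    intermediate_size = 256 * ceil_div(8 * hidden_size, 3 * 256)
    return {
        "vocab_size": vocab_size,
        "hidden_size": hidden_size,
        "intermediate_size": intermediate_size,
        "num_hidden_layers": depth,
        "num_attention_heads": num_heads,
        "num_key_value_heads": num_heads,  # plain multi-head attention, no GQA
        "max_position_embeddings": max_position_embeddings,
        "tie_word_embeddings": tie_word_embeddings,
    }


def count_params(kw):
    """Count parameters of a Llama decoder built from these config values."""
    h = kw["hidden_size"]
    n_layers = kw["num_hidden_layers"]
    head_dim = h // kw["num_attention_heads"]
    kv_dim = kw["num_key_value_heads"] * head_dim
    attn = 2 * h * h + 2 * h * kv_dim  # q and o (h x h), k and v (h x kv_dim), no biases
    mlp = 3 * h * kw["intermediate_size"]  # gate, up and down projections
    norms = 2 * h  # input and post-attention RMSNorm weights
    embed = kw["vocab_size"] * h * (1 if kw["tie_word_embeddings"] else 2)
    return embed + n_layers * (attn + mlp + norms) + h  # + final RMSNorm
```

With transformers installed, LlamaForCausalLM(LlamaConfig(**kw)) should build a randomly initialized model of this shape, and model.num_parameters() is a quick cross-check against count_params. As a sanity check on the MLP rounding, a hidden size of 4096 gives an intermediate size of 11008, which matches the original Llama 7B configuration.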

submitted by /u/centerstate
