The first GPT-4-class AI model anyone can download has arrived: Llama 405B (arstechnica.com)

submitted 3 months ago by [email protected] to c/[email protected]

[–] [email protected] 2 points 3 months ago (1 children)

I've heard that performance improves offline. Is it possible to set a model loose on a project and let it iteratively work, or is there a better approach?

[–] [email protected] 4 points 3 months ago* (last edited 3 months ago)

If you are interested in code completion, I recommend taking a look at https://refact.ai/. Hosting it (last time I tried) was almost painless: setting up Docker to work with your GPU takes some time, but it's reasonably well documented on NVIDIA's site, and after that you just run the container and it works.
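For the GPU setup step, a quick sanity check is to ask `nvidia-smi` which GPUs are visible, either on the host or from inside the container. The snippet below is only a minimal sketch: the helper names `parse_gpu_csv` and `list_gpus` are made up for illustration, and it assumes `nvidia-smi` is on the PATH wherever you run it.

```python
import shutil
import subprocess


def parse_gpu_csv(text):
    """Parse 'name, memory_mib' lines into (name, memory_mib) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        if "," not in line:
            continue
        # Split on the last comma so the GPU name is kept whole.
        name, mem = line.rsplit(",", 1)
        try:
            gpus.append((name.strip(), int(mem)))
        except ValueError:
            # Skip rows where the memory field is not a number.
            continue
    return gpus


def list_gpus():
    """Return the visible NVIDIA GPUs, or an empty list if none can be queried."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    gpus = list_gpus()
    if not gpus:
        print("No NVIDIA GPU visible from here")
    for name, mem_mib in gpus:
        print(f"{name}: {mem_mib} MiB")
```

If this prints your card on the host but nothing inside the container, the Docker GPU passthrough is probably the part that still needs work.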

It runs a server you can connect to, e.g. with a VSCode plugin, which provides code completion or a chatbot (depending on which model you run). It also has an option to let it loose on your project: you set the training hours, give it a git repo (or a zip file of the whole project), and it starts training, which should tailor it towards giving more relevant code completion in the context of the project. I'm not sure whether you can do that for the chatbot models, though.

However, I tried it on my spare gaming PC turned server, which has an unused NVIDIA GTX 1060, and while I could run some of the smaller models, I wasn't able to get the training working: 6 GB of VRAM simply isn't enough for that. I also tried running it on the PC I work on, but the container kept eating 20-30 GB of RAM, which made it hard to do anything else on that machine.
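To give a feel for why running a model can fit where training does not, here is a back-of-the-envelope sketch. It assumes the common rules of thumb of about 2 bytes per parameter for fp16 weights and about 16 bytes per parameter for full mixed-precision training with Adam (weights, gradients and optimizer state), and it ignores activations entirely. The 1.5B-parameter model size is a made-up example, and I don't know which training method the tool actually uses; parameter-efficient methods need less than full training, but still more than plain inference.

```python
GIB = 1024 ** 3


def weights_gib(n_params, bytes_per_param=2):
    """Memory for the model weights alone (default: fp16, 2 bytes each)."""
    return n_params * bytes_per_param / GIB


def full_training_gib(n_params, bytes_per_param=16):
    """Rough memory for full training with Adam in mixed precision.

    The 16 bytes/param figure is a rule of thumb and excludes activations.
    """
    return n_params * bytes_per_param / GIB


def fits(required_gib, vram_gib):
    """True if the estimate fits in the given amount of VRAM."""
    return required_gib <= vram_gib


if __name__ == "__main__":
    n_params = 1.5e9  # hypothetical model size, for illustration only
    vram = 6          # the 6 GB card from the comment above
    for label, need in [("inference", weights_gib(n_params)),
                        ("full training", full_training_gib(n_params))]:
        verdict = "fits" if fits(need, vram) else "does not fit"
        print(f"{label}: ~{need:.1f} GiB, {verdict} in {vram} GiB")
```

With these assumptions the weights of a small model fit comfortably on a 6 GB card, while the training estimate is several times larger than the available VRAM.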

That said, if you have a spare PC/server with a good GPU that can run it, I'd say it's one of the better ways to get personalized code completion that keeps your data local and secure.

As a side note, I think you can give it API keys and let it use online models, but that would kind of defeat the point.