technology

‘In awe’: scientists impressed by latest ChatGPT model o1 (www.nature.com)

submitted 1 month ago by [email protected] to c/[email protected]

I know people here are very skeptical of AI in general, and there is definitely a lot of hype, but I think the progress in the last decade has been incredible.

Here are some quotes:

“In my field of quantum physics, it gives significantly more detailed and coherent responses” than did the company’s last model, GPT-4o, says Mario Krenn, leader of the Artificial Scientist Lab at the Max Planck Institute for the Science of Light in Erlangen, Germany.

Strikingly, o1 has become the first large language model to beat PhD-level scholars on the hardest series of questions, the ‘diamond’ set, in a test called the Graduate-Level Google-Proof Q&A Benchmark (GPQA). OpenAI says that its scholars scored just under 70% on GPQA Diamond, and o1 scored 78% overall, with a particularly high score of 93% in physics.

OpenAI also tested o1 on a qualifying exam for the International Mathematics Olympiad. Its previous best model, GPT-4o, correctly solved only 13% of the problems, whereas o1 scored 83%.

Kyle Kabasares, a data scientist at the Bay Area Environmental Research Institute in Moffett Field, California, used o1 to replicate some coding from his PhD project that calculated the mass of black holes. “I was just in awe,” he says, noting that it took o1 about an hour to accomplish what took him many months.

Catherine Brownstein, a geneticist at Boston Children’s Hospital in Massachusetts, says the hospital is currently testing several AI systems, including o1-preview, for applications such as connecting the dots between patient characteristics and genes for rare diseases. She says o1 “is more accurate and gives options I didn’t think were possible from a chatbot”.

[–] [email protected] 7 points 1 month ago (2 children)

I feel like a broken record saying this, but AI frequently does solve coding problems for me that would've taken hours. It can't solve everything, and it can't handle large amounts at once, but it can be genuinely useful.

[–] [email protected] 8 points 1 month ago* (last edited 1 month ago)

Same, but it has to be presented well. If you want it to work for you like a Junior Coding Assistant, you need to talk to it as such: outline what you need, refine the prompt for caveats, and provide unique information for specialized use cases. I find it especially helpful for one-off programming in languages I'm not familiar with, or for getting me past the mental block of a blank page.

Also, there's a lot of stuff being thrown at LLMs that really shouldn't be. They're not the be-all and end-all of AI tech.

[–] [email protected] 3 points 1 month ago (1 children)

In my experience, the main risks in coding are poor communication about what the thing is supposed to do and why, and then translating this into a clear specification that everyone understands and can push forward on. Rarely is it about chugging away at a problem, which is mostly about typing speed and familiarity with dev tooling.

What kinds of things has it saved time on? It has only caused headaches for those around me. At best they get something that is 90% of what they asked for, but they then need to spend just as much time finding the missing 10%.

The most praise I've seen is for writing a bunch of tests, but to me this is actually the main way you defend a specification, that most important step I mentioned above. It's where you get to say, "this captures what this stupid thing is supposed to do and what the expected edge cases look like". That's where things should be most bespoke!
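To make the "tests defend a specification" point concrete, here is a minimal sketch in Python. The function `parse_port` and its rules are hypothetical, invented purely for illustration; what matters is that each test records one deliberate decision about expected behavior and edge cases.

```python
import unittest


def parse_port(text: str) -> int:
    """Hypothetical example: parse a TCP port number from user input."""
    value = int(text.strip())  # int() raises ValueError on non-numeric input
    if not 1 <= value <= 65535:
        raise ValueError(f"port out of range: {value}")
    return value


class ParsePortSpec(unittest.TestCase):
    """Each test states one decision about what the function should do."""

    def test_accepts_surrounding_whitespace(self):
        self.assertEqual(parse_port(" 8080\n"), 8080)

    def test_accepts_range_boundaries(self):
        self.assertEqual(parse_port("1"), 1)
        self.assertEqual(parse_port("65535"), 65535)

    def test_rejects_out_of_range_values(self):
        for bad in ("0", "65536", "-1"):
            with self.assertRaises(ValueError):
                parse_port(bad)

    def test_rejects_non_numeric_input(self):
        for bad in ("", "http", "80.0"):
            with self.assertRaises(ValueError):
                parse_port(bad)
```

Saved to a file, this can be run with `python -m unittest <file>`. Whether "0" or surrounding whitespace should be accepted is exactly the kind of decision a generated test suite would have to guess at.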

[–] [email protected] 2 points 1 month ago

Most of my use cases are diagnosing networking issues, writing short bash/python scripts for all sorts of purposes, gdb debugging, and finding and learning how to use appropriate libraries. It's not one-and-done either: I often have to ask it to explain, or fix a broken aspect, or Google the documentation and try again, etc.
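For a sense of scale, a one-off network diagnostic of the kind mentioned above might look like the sketch below. The function name `diagnose` and its three status strings are assumptions made for illustration, not something from any particular tool; it only uses Python's standard `socket` module.

```python
import socket


def diagnose(host: str, port: int, timeout: float = 3.0) -> str:
    """Return 'dns-failed', 'unreachable', or 'ok' for a TCP endpoint.

    Hypothetical helper: the status strings are illustrative only.
    """
    try:
        # Step 1: resolve the name, to separate DNS problems from the rest.
        address = socket.gethostbyname(host)
    except socket.gaierror:
        return "dns-failed"

    try:
        # Step 2: attempt a TCP connection to the resolved address.
        with socket.create_connection((address, port), timeout=timeout):
            return "ok"
    except OSError:
        # Covers refused connections, timeouts, and routing errors.
        return "unreachable"


# Example usage (needs network access):
#   print(diagnose("example.org", 443))
```

Splitting the name lookup from the connection attempt is the useful part: it tells you whether to look at DNS or at the firewall and the service itself.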