There is generally something about the Gemini models which feels a bit different than Claude, ChatGPT or Mistral.
I always have the feeling that I'm chatting with a model oriented towards engineering tasks. The seriousness, the lack of interest in being humorous or cool.
I don't know if this is because I interact with Gemini only through AI Studio, which may have different system instructions (apart from those one can add oneself, which I never do) than the one at gemini.google.com.
I never use gemini.google.com because it lacks a simple export feature. It's not even possible to save a chat to disk (well, neither do the others); I just wish it could.
AI Studio's saving to Google Drive is really useful. It lets you download the chat, strip it of verbose things like the thinking process, and reuse it in a new chat.
I wish gemini.google.com had a "Save as Markdown" option per answer and for the complete chat (with a toggle to include/exclude the thinking process). Then it would be a no-brainer for me.
It's as if Google Docs had no "Download..." menu entry and you could only "save" documents via Takeout.
> The seriousness, the lack of interest in being humorous or cool.
I love this. When ChatGPT compliments me on my great question or tries to banter it causes me great despair.
I've noticed 4o uses a lot of emojis, and, in general, is very enthusiastic. I find it funny. If I want a more formal bot, I switch to one of the o3 family.
I use a very simple custom system prompt (not on my work machine at the moment, but essentially something along the lines of "for technical questions, please be concise and to the point, and when asked for code, omit explanations and emit just the code itself unless I ask for explanations"), and it does wonders.
It’s interesting that my default prompt is exactly the opposite: “do not write the code unless I ask for it specifically”. I like to use LLMs as a discussion partner, but writing code is trivial after a good discussion and I can do that myself.
Every now and then 4o seems to get a bit drunk and use tonnes of emojis or start swearing when I haven’t sworn myself in the chat.
The other day I asked a fairly innocuous question and it LOLed and said it’d give me the ‘no Bullshit answer’
I've had 4o start off its response with a Smiling Face with Sunglasses emoji by the heading unprompted lol.
edit: does hacker news filter out emojis? TIL (there should be emojis after this colon: )
So do I. But it's not like ChatGPT isn't flexible, the code it generates for small tasks is really good, and the site is faster than AI Studio.
For example, if I want to quickly create a Python script to list all VMs via libvirt and output their attached drives and filesystems, that's a task for ChatGPT.
But for the things where I don't want an AI to "suck up" to me and instead "stay professional", that's Gemini.
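The libvirt task mentioned above could be sketched roughly like this (a hypothetical illustration, not the commenter's actual script; it assumes the libvirt Python bindings are installed and a local qemu:///system daemon is running — the function and variable names are mine):

```python
# Sketch of the described task: list libvirt VMs and their attached drives.
# The XML parsing is kept separate so it works without a live hypervisor.
import xml.etree.ElementTree as ET


def disks_from_xml(domain_xml: str) -> list:
    """Extract (target device, backing path) pairs from a domain's XML."""
    disks = []
    for disk in ET.fromstring(domain_xml).findall("./devices/disk"):
        target = disk.find("target")
        source = disk.find("source")
        dev = target.get("dev", "?") if target is not None else "?"
        # File-backed images use a 'file' attribute, raw block devices 'dev'.
        path = "?"
        if source is not None:
            path = source.get("file") or source.get("dev") or "?"
        disks.append((dev, path))
    return disks


def print_vm_disks(uri: str = "qemu:///system") -> None:
    """Connect via libvirt and print every VM with its attached drives."""
    import libvirt  # imported here so disks_from_xml stays usable without it

    conn = libvirt.open(uri)
    try:
        for dom in conn.listAllDomains():
            print(dom.name())
            for dev, path in disks_from_xml(dom.XMLDesc(0)):
                print(f"  {dev}: {path}")
    finally:
        conn.close()
```

Filesystem details would additionally need the guest agent (e.g. virsh domfsinfo), which this sketch leaves out.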
You put into words something I've been struggling to describe for a long time. Gemini gives short, succinct responses with whatever information you need and minimal anything else. ChatGPT and Claude both fill their text with mannerisms, formatting, etc.
I didn't realize just how big the difference was until I tested it.
"How do I clear a directory of all executable files on Debian?"
Gemini 2.0 Flash: (responses manually formatted)

    find /path/to/directory -type f -executable -delete

Replace /path/to/directory with the actual path.

ChatGPT: (full link [1])

To clear (delete) all executable files from a directory on Debian (or any Linux system), you can use the find command. Here's a safe and effective way to do it:

# [checkmark emoji] Command to delete all executable files in a directory (not recursively): [..]
# [magnifying glass emoji] Want to preview before deleting? [..]
# [caution sign emoji] Caution: [..]
[1] https://chatgpt.com/share/67f055c8-4cc0-8003-85a6-bc1c7eadcc...

> And it's not even possible to save one chat to disk (well, neither do the others), I just wish it did.
Ask Claude to generate a .md of the conversation; it will do so, with the option to download it or a PDF of it. A lovely but well-hidden feature!
Thanks for the tip. I tested it and this also works with Gemini and ChatGPT.
The only drawback I see is that it requires enough free space in the context window to duplicate the visible part of the chat.
I have been using the Obsidian web clipper to export chats from ChatGPT and Claude web versions to nicely-formatted md files. You can save md to Obsidian or download it as a standalone file. It doesn’t support Gemini yet though.
https://github.com/obsidianmd/obsidian-clipper
2.5 has been amazing for programming. I just send it the entire repo as context when I'm lazy and then ask it for entire modified files back with the (medium-sized) change. It almost always works! I want to start using Cursor or some VS Code extension to do this from the IDE itself.
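The "entire repo as context" step above could be approximated with a small helper like the following (a hypothetical sketch with names of my choosing; it assumes plain-text source files and skips anything under .git):

```python
# Hypothetical helper for the workflow described above: flatten a repo's
# text files into one blob that can be pasted into the model as context.
from pathlib import Path


def repo_as_context(root: str, suffixes: tuple = (".py", ".md", ".toml")) -> str:
    """Concatenate matching files under `root`, each with a path header."""
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or ".git" in path.parts:
            continue  # skip directories and version-control internals
        if path.suffix not in suffixes:
            continue  # skip binaries and other non-source files
        rel = path.relative_to(root)
        parts.append(f"===== {rel} =====\n{path.read_text(errors='replace')}")
    return "\n\n".join(parts)
```

Tools like Cursor do this selection automatically, which is presumably the appeal of moving the workflow into the IDE.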
It doesn't seem as popular, but I've found Grok to treat you the least like a child and provide good answers, especially with more complicated tasks.
I think Grok is the best for asking about current events but I kind of hate how it always tries to turn everything into a conversation. But that's just my opinion! What do you think is the most annoying feature about Grok?
^ like that.
how is that related to the post?
> Next, in response to a question about the vulnerabilities in the Salt Typhoon description, Sec-Gemini v1 outputs not only vulnerability details (thanks to its integration with OSV data, the open-source vulnerabilities database operated by Google), but also contextualizes the vulnerabilities with respect to threat actors (using Mandiant data).
I remain skeptical about LLMs in this space, although I might be proven wrong, as often happens. Nevertheless, OSV has already been a big advance, so it is great that it is getting further commitment.
Is this a "model" as in a set of transformer weights that inherently does security work, or is it a system that has data lookups and/or other tools along with an LLM to do the question interpretation, synthesis, and output presentation?
From the description of the data integrations, it sounds like the latter, unless the data mentioned is in fact used for training.
The distinction is important because a security-tuned model will have different limitations and uses than an actual pre-built security LLM app. Being an app also makes benchmarking against other "models" less straightforward.
It's interesting how we're seeing the emergence of specialized models, much like trained humans.
What's old is new again. Pretty much all ML and statistical models were specialized for a single task / domain.
Could be great for augmenting a cybersec professional's tasks; I'm certainly interested in trying it. However, I fear it will not be used as just one of the tools in the toolbox, but rather as something to defer to (and consequently shed liability onto).
Has anybody been able to shed liability to AI yet?
In the legal sense? I'm not sure.
In the corporate day-to-day? Absolutely.