Hacker News | jc4p's comments

Just saw your edit -- I'm afraid to open source the code before refactoring it but if you reach out at hi@kasra.codes I'll send you the full ZIP!

Sorry, I don't understand. Are you saying the direct providers aren't the canonical source you'd recommend?

If I were running these on my own machine or GPU, wouldn't the argument then be, "Well, you didn't use the real providers"?

For the record, I started taking this approach because the Kimi team released this, which was shocking to me: https://github.com/MoonshotAI/K2-Vendor-Verifier


Yeah, boutique providers are a dime a dozen.

They host the models on their own cloud machines, and you just compare tokens/sec and price per token.

You'll have to evaluate their APIs independently, but that doesn't tend to be the issue.
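A minimal sketch of the two numbers mentioned above (throughput and price), assuming you already have some streaming client that yields text chunks; `measure_stream` and `request_cost_usd` are hypothetical helpers, and treating one chunk as one token is a rough assumption, not how every provider streams.

```python
import time
from typing import Iterable, Tuple


def measure_stream(chunks: Iterable[str]) -> Tuple[int, float]:
    """Consume a streamed response and return (chunk_count, chunks_per_second).

    Assumption: one chunk == one token, which is only approximately true.
    """
    start = time.perf_counter()
    count = 0
    for _ in chunks:
        count += 1
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, rate


def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    usd_per_m_input: float,
    usd_per_m_output: float,
) -> float:
    """Cost of one request given per-million-token input/output prices."""
    return (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000
```

Running the same prompt through each provider and comparing both numbers side by side is the whole comparison; API quirks still need to be checked separately.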


I was using the same harness for each run; the difference comes from when I was running the harness locally on my machine before I pushed up the full runs.

Great point!

When I found the original exploit in an app I researched it took me around 15 minutes and some assistance from Claude.

For this project I gave myself the weekend plus part of Monday, so around 20 hours of dev time; at my standard rate that's roughly $5,000.


I agree fully and hope someone else is able to do this test! For me, cost and quotas are what stopped me from switching to a new account.

Also, just to mention:

Claude guardrails -> that session is terminated.

GPT guardrails -> your whole account is slowed down.


Thank you for your note! As I mention in the post, this is not scientific at all.

I'm very curious: how would you do multiple runs of multiple models in a "work alongside the model" manner?


Discovering vulnerabilities is a highly creative task; it's when you explore unusual paths that you discover attack angles. Some bugs are simple; others are a complex orchestration of many factors.

By "working with the model" I essentially mean reading the output of prompts and pointing it in a direction to decide the next steps. You could try increasing the prompt limit and creating an agent that explores multiple directions in a DFS manner.

The issue with vulnerabilities is that the agent doesn't know when to stop, because it's hard to validate whether you've reached the final result. I get amazing results when I code with AI, but letting the AI go wild is just a waste of time and tokens.
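A minimal sketch of the DFS-style agent described above, with the stopping problem made explicit as hard limits. `expand` (ask a model for next directions) and `is_solved` (an external validator) are hypothetical callables; in practice `is_solved` is the hard part.

```python
from typing import Callable, List, Optional


def dfs_explore(
    root: str,
    expand: Callable[[str], List[str]],
    is_solved: Callable[[str], bool],
    max_depth: int = 4,
    max_calls: int = 50,
) -> Optional[List[str]]:
    """Depth-first search over candidate directions with hard stop limits.

    Returns the path of states from root to a solved state, or None if the
    depth or call budget runs out first.
    """
    calls = 0  # number of expand() calls made (a stand-in for a token budget)

    def visit(node: str, path: List[str], depth: int) -> Optional[List[str]]:
        nonlocal calls
        if is_solved(node):
            return path
        if depth >= max_depth or calls >= max_calls:
            return None  # give up on this branch instead of wandering forever
        calls += 1
        for child in expand(node):
            found = visit(child, path + [child], depth + 1)
            if found is not None:
                return found
        return None

    return visit(root, [root], 0)
```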

I recommend reading the write-up on the crackme (https://crackmes.one/crackme/698f40f1e2ba6023bfacaa82). I think most experienced developers would need at least 2 months of learning reverse engineering techniques to have a chance at cracking this one. GLM 5.1 managed to solve it, and it didn't "copy-paste" any answer from its training data. It did binary analysis, anti-debug patching, binary patching, runtime memory debugging, etc. It only took about 20 minutes.

After seeing what GLM did, I do believe Anthropic's concerns about Mythos are real. Cracking software just became a lot easier, too easy for my taste. Video game cheats will be the norm, as will cracked desktop apps without licenses and infected with malware. It's not a new thing, but it just became too easy.


Thank you so much for this detailed answer!! Excited to dig into this world more :)

Maybe have a second model that is configured to nudge the first model in the direction of exploration, and have the two of them work in tandem?
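A minimal sketch of that tandem setup, assuming both models are wrapped as plain callables; `worker`, `navigator`, and `is_done` are hypothetical stand-ins for whatever API calls and validator you would actually use.

```python
from typing import Callable, List


def tandem_explore(
    worker: Callable[[str], str],
    navigator: Callable[[List[str]], str],
    task: str,
    is_done: Callable[[str], bool],
    max_rounds: int = 5,
) -> List[str]:
    """Alternate between a worker model and a navigator model.

    worker: takes an instruction and returns its attempt/output.
    navigator: reads the transcript so far and returns the next direction.
    is_done: external success check, so the loop knows when to stop.
    """
    transcript: List[str] = []
    instruction = task
    for _ in range(max_rounds):
        output = worker(instruction)
        transcript.append(output)
        if is_done(output):
            break
        instruction = navigator(transcript)  # nudge toward a new direction
    return transcript
```

The `max_rounds` cap matters for the same reason as above: without a reliable success check, the pair can keep exploring indefinitely.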

I do a lot of AI work, and right now the story for running LLMs on iOS is very painful (though running Whisper etc. is pretty nice), so this is exciting. The API looks Swift-native and great; I can't wait to use it!

Question/feature request: is it possible to bring my own CoreML models over and use them? Right now I honestly end up bundling llama.cpp and using GGUF because I can't figure out the setup for using CoreML models. I would love for all of that to be abstracted away for me :)


That’s a good suggestion, and it indeed sounds like something we’d want to support. Could you help us better understand your use case? For example, where do you usually get the models (e.g., Hugging Face)? Do you fine-tune them? Do you mostly care about LLMs (since you only mentioned llama.cpp)?


Thank you! I’ve been fine-tuning tiny Llama and Gemma models using transformers, then exporting from the safetensors it spits out. My main use case is LLMs, but I’ve also tried getting a fine-tuned YOLO and other PyTorch models running and ran into similar problems; it just seemed very confusing to figure out how to properly use the phone for this.
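For anyone following along, a minimal sketch of what is inside the safetensors files mentioned above, based on the published format (an 8-byte little-endian header length, a UTF-8 JSON header, then raw tensor bytes). Listing the tensor names, dtypes, and shapes is a useful first step before any conversion; the helper names are hypothetical.

```python
import json
import struct
from typing import Any, Dict


def read_safetensors_header(path: str) -> Dict[str, Any]:
    """Return the JSON header of a .safetensors file without loading tensor data.

    The header maps tensor names to {"dtype", "shape", "data_offsets"}; an
    optional "__metadata__" key holds free-form string metadata.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64, little-endian
        return json.loads(f.read(header_len).decode("utf-8"))


def list_tensors(path: str) -> Dict[str, Any]:
    """Map tensor name -> (dtype, shape), skipping the metadata entry."""
    header = read_safetensors_header(path)
    return {
        name: (info["dtype"], tuple(info["shape"]))
        for name, info in header.items()
        if name != "__metadata__"
    }
```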


Thanks for sharing the details, that makes a lot of sense. Fine-tuning and exporting models for on-device use can be tedious nowadays. We’re planning to look into supporting popular on-device LLMs more directly, so deployment feels much easier. We’ll let you know here or reach out to you once we have something.


Hi all, I'm the security researcher mentioned in the article. Just to be clear:

1. The leak on Friday was from Firebase's file storage service.

2. This one is about their Firebase database service also being open (up until Saturday morning).

The tl;dr is:

1. App signed up using Firebase Auth

2. App traded Firebase Auth token to API for API token

3. API talked to Firebase DB

The issue is that you could just take the Firebase Auth key and talk to Firebase directly, and because they had read/write/update/delete permissions open to all users, it opened up an IDOR exploit.
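The real fix lives in Firebase Security Rules, which use their own rules language. As a hedged illustration only, this Python sketch models the decision each kind of rule makes, assuming the misconfigured rules only checked that a user was signed in; `Request`, `permissive_rule`, and `owner_only_rule` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    auth_uid: Optional[str]  # uid from a verified Firebase Auth token; None if signed out


def permissive_rule(req: Request, owner_uid: str) -> bool:
    # "Any signed-in user can read/write/delete": every account can touch
    # every record, which is exactly the IDOR described above.
    return req.auth_uid is not None


def owner_only_rule(req: Request, owner_uid: str) -> bool:
    # Allow access only when the caller's uid matches the record's owner.
    return req.auth_uid is not None and req.auth_uid == owner_uid
```

Under the permissive rule, a throwaway account created with a dummy email passes the check for someone else's record; under the owner-only rule it does not.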

I pulled the data Friday night to have evidence proving the information wasn't old like the previous leak, and immediately reached out to 404media.

Here is a gist of Gemini 2.5 Pro summarizing 10k random posts: https://gist.github.com/jc4p/7c8ce9a7392f2cbc227f9c6a4096111...

And to be 100% clear, the data in this second "leak" is a 300MB JSON file that (hopefully) only exists on my computer, but I did see evidence that other people were communicating with the Firebase database directly.

If anyone is interested in the how: I signed up against Firebase Auth using a dummy email and password, retrieved an idToken, sent it into the script generated by this Claude convo: https://claude.ai/share/2c53838d-4d11-466b-8617-eae1a1e84f56

And here's the output of that script (any db that has <100 rows is something another "hacker" wrote to and deleted from): https://gist.github.com/jc4p/bc35138a120715b92a1925f54a9d8bb...


Doesn't that Gemini summary gist tie usernames to pretty specific highly personal non-public stories? That seems like a significant violation of ethical hacking principles.


They're anonymous usernames the app had them create, and users were told not to use anything they use elsewhere. I googled them, and none of them identify a unique person.

They seem generic enough that I think it's okay, but you're right that there was no need to include them, and I should've caught that in the AI output. Thank you!!


I think including specific stories is already an ethical hacking violation.

Including the pseudonyms associated with those stories creates unnecessary risk of, and arguably incentive for, identifying those individuals.

I also just don't get the mindset of dumping something like this into an AI tool for a summary. You say "a 300MB JSON file that (hopefully) only exists on my computer," but then you exposed part of that data to generate an AI summary.

Having the file on your computer is questionable enough but not treating it as something private to be professionally protected is IMHO another ethical violation.


I don't see the need for the AI output to begin with. Normally pen-testers just demonstrate breaches, this is more like exposing what users do on the app.


Are you concerned about potential CFAA issues?


Yes! Haha! But hopefully I have a good enough support group and connections that I'll be OK if that happens. I just really wanted to prove that they were not being honest when they said it was data from before 2024.


Computer Fraud and Abuse Act - "CFAA"


I've been trying to keep up with this field (image generation), so here are some quick notes I took:

Claude's Summary: "Normalizing flows aren't dead, they just needed modern techniques"

My Summary: "Transformers aren't just for text"

1. SOTA model for likelihood on ImageNet 64×64: the first ever under 3.2 bits per dimension (BPD); the previous was 2.99, by a hybrid diffusion model

2. Autoregressive (transformer) approach; right now diffusion is the most popular in this space (it's much faster, but a different approach)

tl;dr of autoregressive vs diffusion (there are also other approaches):
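Since the headline number is in bits per dimension, here is a minimal sketch of the standard conversion from a per-image negative log-likelihood in nats, assuming the NLL is already computed over the (dequantized) image; `bits_per_dim` is a hypothetical helper name.

```python
import math


def bits_per_dim(total_nll_nats: float, num_dims: int) -> float:
    """Convert a per-image negative log-likelihood in nats to bits per dimension.

    Divide by ln(2) to go from nats to bits, then by the number of dimensions.
    """
    return total_nll_nats / (num_dims * math.log(2))


D = 64 * 64 * 3  # ImageNet 64x64 RGB: 12,288 dimensions per image
```

Lower is better: a model at 3.0 BPD spends, on average, 3 bits to encode each of the 12,288 subpixel values.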

Autoregression: step based, generate a little then more then more

Diffusion: generate a lot of noise then try to clean it up

The diffusion approach that is the baseline for SOTA is Flow Matching from Meta: https://arxiv.org/abs/2210.02747 -- lots of fun reading material if you throw both of these into an LLM and ask it to summarize the approaches!


You have a few minor errors and I hope I can help out.

  > Diffusion: generate a lot of noise then try to clean it up
You could say this about Flows too. Their history is shared with diffusion and goes back to the Whitening Transform. Flows work through a coordinate transform, so we have an isomorphism, whereas diffusion works through (for easier understanding) a hierarchical mixture of Gaussians, which is a lossy process (more confusing when we get into latent diffusion models, which are the primary type used). The goal of a Normalizing Flow is to turn your sampling distribution, which you don't have an explicit representation of, into a probability distribution you can evaluate (typically Normal noise/Gaussian). So in effect, there are a lot of similarities here. I'd highly suggest learning about Flows if you want to better understand Diffusion Models.
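A minimal sketch of the coordinate-transform idea in one dimension, assuming the simplest possible flow: an affine map (1D whitening) that sends data to a standard Normal. The change-of-variables formula log p_x(x) = log p_z(f(x)) + log|df/dx| gives an exact density, which is the "isomorphism" point above; function names are hypothetical.

```python
import math


def std_normal_logpdf(z: float) -> float:
    """Log density of N(0, 1)."""
    return -0.5 * (z * z + math.log(2 * math.pi))


def affine_flow_forward(x: float, mu: float, sigma: float) -> float:
    """Data -> noise: the whitening map z = (x - mu) / sigma."""
    return (x - mu) / sigma


def affine_flow_inverse(z: float, mu: float, sigma: float) -> float:
    """Noise -> data: sampling is just running the map backwards."""
    return mu + sigma * z


def affine_flow_logpdf(x: float, mu: float, sigma: float) -> float:
    """Exact log-likelihood via change of variables; here df/dx = 1 / sigma."""
    z = affine_flow_forward(x, mu, sigma)
    return std_normal_logpdf(z) - math.log(sigma)
```

Real flows stack many learned invertible layers instead of one affine map, but the likelihood is computed the same way, layer by layer.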

  > The diffusion approach that is the baseline for sota is Flow Matching from Meta
To be clear, Flow Matching is a Normalizing Flow. Specifically, it is a Continuous and Conditional Normalizing Flow. If you want to get into the nitty-gritty, Ricky has a really good tutorial on the stuff[0].

[0] https://arxiv.org/abs/2412.06264
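For reference, a compact statement of the conditional Flow Matching objective in its simplest case, the straight-line (optimal-transport) path with σ_min = 0, using the Flow Matching paper's convention that t = 0 is noise and t = 1 is data. Treat it as a summary sketch; the paper and the tutorial above give the general form.

```latex
x_t = (1 - t)\,x_0 + t\,x_1, \qquad x_0 \sim \mathcal{N}(0, I),\quad x_1 \sim q_{\text{data}},\quad t \sim \mathcal{U}[0, 1]

\mathcal{L}_{\text{CFM}}(\theta) = \mathbb{E}_{t,\,x_0,\,x_1}\,\big\| v_\theta(x_t, t) - (x_1 - x_0) \big\|^2

\frac{\mathrm{d}x}{\mathrm{d}t} = v_\theta(x, t), \qquad x(0) \sim \mathcal{N}(0, I)
```

Sampling integrates the learned ODE from t = 0 to t = 1, which is why Flow Matching trains a continuous normalizing flow.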


Thank you so much!!! I should’ve put that final sentence in my post!


Happy to help! If you have any questions, just ask; this is my jam.


Hi! I have a WIP of this over at https://talktrainer.app/ -- I just added Dutch to it.

It uses OpenAI's Realtime API to simulate either a tutoring session (the speaker will revert to English to help you) or a first date or business meeting (the speaker will always speak the target language).

You can see the AI's transcriptions but not your own; that's a limitation of the current OpenAI API, but definitely something I can fix.

The prompts are like this: https://gist.github.com/jc4p/d8b9d121425ec191d62602d8720eeed... and the rest of it is a Next.js app wrapped around the WebRTC connection.

I'm not fully in love with the app, so I'd love any feedback, or to hear whether it works well for you. It doesn't have a lot of features yet (including saving context), and if you bump into the time limit, just open it in incognito to keep going.


This is great! Maybe some more tourist-related scenarios, like "ordering at restaurant", "resolving dispute about rental car crash" etc? :-)

The "next level" feature would be to get it to speak even more simply, with some hints about how to reply, for beginners. I don't know how that would ideally look, but maybe a button to pop up some "key words" or phrases that one could use? (Even so, I found myself using the little I know, so it's obviously somehow working even though my knowledge is extremely basic.)

This is one of the places where I feel LLMs can do something good for the world: giving a safe playground for getting experience with speaking new languages without the anxiety of performing badly in front of other people, and hopefully making it easier to connect with real people in that language later.


This is really impressive! Great job.

One small piece of feedback: there were a couple of times where I asked to learn something, and it asked me to repeat a phrase back, which was great. But when I repeated it back, I know I didn’t quite nail it (e.g., I might have said “un” instead of “una”), and rather than correcting me, it told me I did it perfectly. Maybe some prompt tuning would help turn down the model’s natural sycophancy and make it a little more strict.

Keep up the great work!


One modification I would suggest is adding a bit more to the initial prompt, something like:

"write as if you are a person from {{REGION}}. Modify your language to proficiency level {{PROFICIENCY_LEVEL}}"

That way I could, for example, have it speak like someone using Mexican Spanish vs. Madrid Spanish vs. Chilean Spanish, etc.

Second, you could include a transcription of the user's speech in the conversation window.
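A minimal sketch of filling that template from drop-down choices, assuming the `{{NAME}}` placeholder syntax shown above; `render_prompt` is a hypothetical helper, and values like "A2" are illustrative only. Failing loudly on a missing value avoids silently sending a literal `{{REGION}}` to the model.

```python
import re
from typing import Dict

PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")

TEMPLATE = (
    "write as if you are a person from {{REGION}}. "
    "Modify your language to proficiency level {{PROFICIENCY_LEVEL}}"
)


def render_prompt(template: str, values: Dict[str, str]) -> str:
    """Fill {{NAME}} placeholders; raise if any placeholder has no value."""
    def substitute(match: "re.Match") -> str:
        key = match.group(1)
        if key not in values:
            raise ValueError(f"missing value for placeholder {key}")
        return values[key]

    return PLACEHOLDER.sub(substitute, template)
```

Usage: `render_prompt(TEMPLATE, {"REGION": "Mexico City", "PROFICIENCY_LEVEL": "A2"})`, with both values coming from the optional drop-downs.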


Amazing idea! Do you think this should be a free-form text field where users can add their own prompts, or a checkbox/select on the homepage so they can pick from a limited set?


I think a drop-down when you first choose the language, and it could be optional. You could test it with a few languages at first to see how it goes.


Bit of feedback:

I learned Japanese a while back but haven't practised in a long time.

1. It would be awesome if this could transcribe what I just said in Japanese, so I can be sure it understood me.

2. I don't know kanji that well, so reading is hard; a button to have the AI repeat the sentence would be quite useful.

Other than that, I could definitely use something like that for practice


Did you just add Dutch as per the submitter’s request or was it part of your plan prior?

Curious because I’m trying to learn Romanian, and since it’s a less common language, there are fewer resources available. So I wasn’t sure if you added Dutch with a minimal amount of effort following the poster’s request.

That said, I gave your app a try with Spanish and it looks pretty good! But I didn’t see a Help page to clarify how I’m “supposed” to interact. E.g., I tried saying “I don’t understand” in English (even though I know how to say that in Spanish), and it responded in Spanish, which may be hard for absolute beginners. Although full immersion is a much better way to learn.

I can try playing around more with it to give you some feedback.


> Eg I tried saying in English “I don’t understand” (even though I know how to say that in Spanish) and it responded in Spanish which may be hard for absolute beginners.

I tried to use ChatGPT as a "live" translator with my in-laws, and I noticed it is extremely bad at language "consistency", or at understanding your intent when multiple languages are involved.

It will sometimes respond in English when you talk to it in the foreign language, it will sometimes assume that a clear instruction like "repeat the last sentence" needs to be translated, etc.

I don't know how the person above is approaching the problem but your experience is consistent with mine and I don't think GenAI models (at least OpenAI ones) are suitable for the task.


I just added Romanian for you -- here's the entire diff for adding a new language (as long as it's in OpenAI's training data) -- https://images.kasra.codes/romanian_diff.png

Please let me know if it works, and I'll definitely work on adding in instructions for the expected interactivity, thank you!


I'm a native Dutch speaker and tried this out for a bit. It works impressively well, although it might be challenging for complete beginners. Maybe you could add an option for the trainer to use simpler language for beginners?

I tried practicing some verb conjugations. The trainer displayed some fill-in-the-blank sentences like "she ... home after class", asking me to conjugate "to walk" in that sentence. However, the audio actually pronounced the full sentence "she walks home after class", giving away the answer.
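One way to avoid the spoken audio giving away the answer, sketched under the assumption that the app generates exercises as structured text before voicing them (a speech-to-speech model that generates its own audio would need a prompt-level fix instead). `ClozeExercise` and `BLANK` are hypothetical names.

```python
from dataclasses import dataclass

BLANK = "___"  # marker for the missing word in the sentence template


@dataclass
class ClozeExercise:
    template: str  # sentence containing BLANK where the answer goes
    answer: str

    def display_text(self) -> str:
        """What the learner reads on screen."""
        return self.template.replace(BLANK, "...")

    def spoken_text(self) -> str:
        """What gets sent to speech: say "blank" instead of the answer."""
        return self.template.replace(BLANK, "blank")

    def full_sentence(self) -> str:
        """Only revealed (and spoken) after the learner answers."""
        return self.template.replace(BLANK, self.answer)
```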


This is great! Well done.

I've used the Realtime API for something similar (also related to practicing speaking, though not for foreign languages). I just wanted to comment that the Realtime API will definitely give you the user's transcriptions: they come back as a `server.conversation.item.input_audio_transcription.completed` event. I use it in my app for exactly that purpose.


Thank you so much!! While the transcription is technically in the API, it's not a native part of the model; it runs through Whisper separately. In my testing I often end up with a transcription in a different language from the one the user is speaking, and the current API has no way to force a language on the internal Whisper call.

Even when the language is correct, a lot of the time the text isn't 100% accurate; when it is accurate, it arrives slower than the audio output rather than in real time. All in all, it's not what I'd consider ready to release as a feature in my app.

What I've been thinking about is switching to a full audio in --> transcribe --> send to LLM --> TTS pipeline, in which case I would be able to show the exact input to the model, but that's way more work than just one single OpenAI API call.
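A minimal sketch of that audio in -> transcribe -> LLM -> TTS pipeline, with each stage as a plain callable so any STT, LLM, or TTS provider can be plugged in; `VoicePipeline` and its fields are hypothetical. The point of the design is that the exact text the LLM sees is also what the app can show the user.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class VoicePipeline:
    transcribe: Callable[[bytes], str]   # speech-to-text stage
    respond: Callable[[List[str]], str]  # LLM stage: sees the full text history
    synthesize: Callable[[str], bytes]   # text-to-speech stage
    history: List[str] = field(default_factory=list, init=False)

    def turn(self, audio_in: bytes) -> Tuple[str, str, bytes]:
        """Run one conversational turn; returns (user_text, reply_text, reply_audio)."""
        user_text = self.transcribe(audio_in)  # exactly what the LLM will see
        self.history.append(user_text)
        reply_text = self.respond(self.history)
        self.history.append(reply_text)
        return user_text, reply_text, self.synthesize(reply_text)
```

The trade-off is latency: three sequential calls per turn instead of one streaming connection.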


Heyo, I work on the Realtime API. This is a very cool app!

With transcription I would recommend trying out "gpt-4o-transcribe" or "gpt-4o-mini-transcribe" models, which will be more accurate than "whisper-1". On any model you can set the language parameter, see docs here: https://platform.openai.com/docs/api-reference/realtime-clie.... This doesn't guarantee ordering relative to the rest of the response, but the idea is to optimize for conversational-feeling latency. Hope this is helpful.
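A minimal sketch of both suggestions from this thread, based on my reading of the beta Realtime API event shapes at the time (field names may have changed since, so check the current reference): a `session.update` event that pins the transcription model and language, and a handler that picks the user's transcript out of the completion event. As far as I know the event's `type` string itself has no `server.` prefix; the helper names are hypothetical.

```python
import json
from typing import Any, Dict, Optional


def transcription_session_update(language: str, model: str = "gpt-4o-transcribe") -> str:
    """Build a session.update event pinning the input-transcription language (ISO-639-1, e.g. "nl")."""
    event = {
        "type": "session.update",
        "session": {
            "input_audio_transcription": {"model": model, "language": language},
        },
    }
    return json.dumps(event)


def user_transcript(event: Dict[str, Any]) -> Optional[str]:
    """Return the user's transcript from a completed-transcription event, else None."""
    if event.get("type") == "conversation.item.input_audio_transcription.completed":
        return event.get("transcript")
    return None
```

Because transcription is not ordered relative to the response, the UI should insert the user's line when it arrives rather than assume it comes before the model's reply.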


Ah yes, I've seen that occasionally too, but it hasn't been a big enough issue for me to block adoption in a non-productized tool.

I actually implemented the STT -> LLM -> TTS pipeline, too, and I allow users to switch between them. It's far less interactive, but it also gives much higher quality responses.

Best of luck!


Just tried this for Spanish and it works incredibly well. I have been hacking on something similar for translation (it's really quite easy too, just a few prompts), but I was using Google Translate's interface for vocalizing! This is seriously good stuff, really nice work putting it together.

I will probably use something like this for language practice.


I just tried it and it works perfectly. The color scheme and font size could be touched up to look better. Just out of curiosity, is $10/month enough to cover the (unlimited) API cost? Have you estimated what percentage of your users will use more than $10 in API fees each month?


Thanks so much for trying it out! The Realtime API is actually very cheap, especially for short connections: a user who uses it 30 minutes a day, every day for a month, costs me ~$5, and I assume the average user is going to use it way less than that (although I have 0 users right now, haha).
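The arithmetic behind that answer, as a small sketch using only the figures from this thread (~$5 for 30 minutes a day over a 30-day month, and the $10/month price). It assumes cost scales linearly with minutes, which the "especially for short connections" remark suggests is only approximately true.

```python
def cost_per_minute(monthly_cost_usd: float, minutes_per_day: float, days: int = 30) -> float:
    """Average API cost per minute of conversation."""
    return monthly_cost_usd / (minutes_per_day * days)


def break_even_minutes_per_day(price_usd: float, usd_per_minute: float, days: int = 30) -> float:
    """Daily usage at which API cost equals the subscription price."""
    return price_usd / (usd_per_minute * days)


rate = cost_per_minute(5.0, 30)                     # $5 over 900 minutes, about half a cent per minute
breakeven = break_even_minutes_per_day(10.0, rate)  # about an hour a day on a $10/month plan
```

Under those assumptions, a subscriber would have to talk for roughly an hour every day before the API bill exceeds the subscription.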


Please add Mandarin Chinese! :) would love to try this


What about Polish..? :-)

