Show HN: Japanese City Name Generator – Using a Simple 3-Layer MLP

citygen.freemanjiang.com

26 points by freemanjiang 4 months ago

I trained and deployed my first model: a Japanese city name generator using just a 3-layer MLP under the hood. It runs fully locally in the browser on ONNX Runtime.

It was trained on fewer than 2,000 real Japanese city names. What's interesting is that on this simple task, the simple MLP performed better than more complex models, which tended to overfit and reproduce existing names.
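
One way to put a number on the "generates existing names" failure mode is a novelty rate: the share of sampled names that are absent from the training list. The sketch below is only an illustration; the function name and the sample lists are hypothetical placeholders, not the author's code or data.

```python
def novelty_rate(generated, training_names):
    """Return the fraction of generated names not found in the training set."""
    # Normalize case and whitespace so "Osaka" and "osaka " count as the same name.
    known = {name.strip().lower() for name in training_names}
    samples = [name.strip().lower() for name in generated]
    if not samples:
        return 0.0
    novel = [name for name in samples if name not in known]
    return len(novel) / len(samples)


# Hypothetical placeholder data, for illustration only.
training = ["Osaka", "Kyoto", "Nagoya", "Sapporo"]
generated = ["osaka", "kiyama", "nagoya", "takahara"]
rate = novelty_rate(generated, training)  # two of the four samples are unseen
```

A model that has memorized its training data would score close to 0 here, while one that only produces unseen names would score 1.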

ranger_danger 4 months ago

I'm not sure why ML is even necessary? Practically every combination of characters (kana characters, where there's always a vowel at the end of each mora unless it's an "n") is already valid and doesn't even sound weird.

Can someone explain why a random() function given a list of kana characters couldn't produce equally good names?

freemanjiang 4 months ago

Hmm, I'm not convinced that uniform sampling from all possible kana characters necessarily leads to Japanese-sounding city names. I think the actual distribution does have a pattern (e.g. yama appearing more frequently).

Here are 50 that I got Claude to generate from the uniform distribution: ['wamorumura', 'sohikotake', 'hiteitewau', 'romekarumu', 'nehami', 'miruyake', 'shiyuhaki', 'ahiyo', 'homaso', 'chionohoratsu', 'akusoyo', 'kiuhi', 'karoso', 'suhoheso', 'muchichi', 'mahakekanuto', 'usatsuwotoro', 'namusu', 'sokomeni', 'hakureromake', 'tosukonuka', 'haokehaso', 'nsesutemei', 'womiku', 'noereyasou', 'suyakenosu', 'ritasaifuka', 'ruremoteshi', 'yuhowotsuhie', 'torarenumeho', 'rutsueto', 'hamiakaki', 'sutsuyosano', 'yasotawaku', 'kihaso', 'koairieke', 'hosuriihiwa', 'horotowanno', 'wokiu', 'tanasochiriwo', 'otosetanu', 'rakamotorure', 'hawaniu', 'emoshiratsuhe', 'naroman', 'mohaesa', 'soniruta', 'nofuni', 'kayatakera', 'natayamume']

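
For reference, a minimal sketch of the uniform baseline under discussion, assuming the 46 basic gojuon kana in Hepburn-style romanization and a name length of 2 to 6 morae. Both the inventory and the length range are assumptions; the thread does not say how the samples above were produced.

```python
import random

# Assumed inventory: the 46 basic kana, romanized. Voiced and contracted
# sounds (ga, kya, ...) are deliberately left out of this sketch.
KANA = (
    "a i u e o ka ki ku ke ko sa shi su se so ta chi tsu te to "
    "na ni nu ne no ha hi fu he ho ma mi mu me mo ya yu yo "
    "ra ri ru re ro wa wo n"
).split()


def uniform_name(rng, min_morae=2, max_morae=6):
    """Draw each mora independently and uniformly, then join them."""
    length = rng.randint(min_morae, max_morae)  # inclusive on both ends
    return "".join(rng.choice(KANA) for _ in range(length))


rng = random.Random(0)  # fixed seed so a run is repeatable
samples = [uniform_name(rng) for _ in range(50)]
```

Because every mora is equally likely at every position, common components such as yama show up no more often than any other pair of morae.
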
asukachikaru 4 months ago

Because Japanese words aren't simply strings of random characters, just as a string of eight random English letters doesn't automatically make a meaningful city name such as Reading or Brighton.

stuartcw 4 months ago

If you used the kanji names of the cities and towns it would be a lot more realistic.

I’ve lived in Japan since 1988 and this just seems like a list of gibberish to me. Japanese city names are, like English city names, made up of meaningful components, e.g. Newbridge, 新橋, しんばし, Shinbashi. Here there is nothing to get a hook on. It’s just syllables.

Try it with 2000 English city names and you will get the same quality of output.

freemanjiang 4 months ago

One thing is that this is trained on an English (romanized), character-level representation of kana characters, so it's possible it generates names that are not legal in the Japanese syllabary.
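
A rough way to check that concern is to test whether a romanized name can be split entirely into kana morae. The sketch below assumes a Hepburn-style inventory (basic, voiced, and contracted sounds) and treats a doubled consonant as a small tsu; the inventory and the function name are illustrative assumptions, not the author's vocabulary, and macrons or other romanization variants are not handled.

```python
VOWELS = "aiueo"
BASIC = (
    "a i u e o ka ki ku ke ko sa shi su se so ta chi tsu te to "
    "na ni nu ne no ha hi fu he ho ma mi mu me mo ya yu yo "
    "ra ri ru re ro wa wo n"
).split()
VOICED = (
    "ga gi gu ge go za ji zu ze zo da de do "
    "ba bi bu be bo pa pi pu pe po"
).split()
YOON = (
    "kya kyu kyo sha shu sho cha chu cho nya nyu nyo hya hyu hyo "
    "mya myu myo rya ryu ryo gya gyu gyo ja ju jo bya byu byo pya pyu pyo"
).split()
MORAE = set(BASIC + VOICED + YOON)
MAX_LEN = max(len(m) for m in MORAE)


def is_kana_spellable(name):
    """Return True if the whole string can be segmented into known morae."""
    s = name.strip().lower()
    if not s:
        return False
    # reachable[i] is True when the first i characters segment cleanly.
    reachable = [False] * (len(s) + 1)
    reachable[0] = True
    for i in range(len(s)):
        if not reachable[i]:
            continue
        for size in range(1, MAX_LEN + 1):
            if i + size <= len(s) and s[i:i + size] in MORAE:
                reachable[i + size] = True
        # Doubled consonant (e.g. the "pp" in Sapporo): consume the first
        # letter as a small tsu; a full mora must still follow it.
        if (i + 1 < len(s) and s[i] == s[i + 1]
                and s[i].isalpha() and s[i] not in VOWELS and s[i] != "n"):
            reachable[i + 1] = True
    return reachable[len(s)]
```

Under these assumptions, Shinbashi and Sapporo pass while Brighton does not. Passing only means the string can be written in kana, not that it sounds like a plausible place name.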

RestartKernel 4 months ago

Have you tried approaching this with the kanji instead? That seems like free tokenisation.

cedws 4 months ago

I got Kanegawa, which is a real place, so I'd say it's pretty accurate!

fph 4 months ago

Maybe that name was already in the training set, though?

ghfhghg 4 months ago

Can you spell it?

kazinator 4 months ago

@freemanjiang, you might enjoy jp-hash: https://www.kylheku.com/cgit/jp-hash/about/

neuraldenis 4 months ago

Can you please share a training tutorial? Thank you!
