Show HN: Japanese City Name Generator – Using a Simple 3-Layer MLP

citygen.freemanjiang.com

17 points by freemanjiang 19 hours ago

I trained and deployed my first model: a Japanese city name generator that uses just a 3-layer MLP under the hood. It runs fully locally in the browser on ONNX Runtime.

It was trained on fewer than 2,000 real Japanese city names. Interestingly, on this simple task the simple MLP performed better than more complex models, which tended to overfit and generate existing names.
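
The post doesn't include model code. As a rough illustration only, a character-level MLP of this kind maps a fixed window of previous characters to a probability distribution over the next character, and names are generated by sampling one character at a time. The sketch below is pure Python with random, untrained weights; the vocabulary, the `.` start/end token, the context length of 3, and the layer sizes are all assumptions, and "3 layers" is read here as embedding, tanh hidden layer, and softmax output, which may not match the author's setup.

```python
import math
import random

# Hypothetical vocabulary: '.' marks the start/end of a name; the rest are romaji letters.
VOCAB = ["."] + list("abcdefghijklmnopqrstuvwxyz")
STOI = {ch: i for i, ch in enumerate(VOCAB)}
CONTEXT = 3   # assumed number of previous characters the model sees
EMB_DIM = 8   # assumed embedding width
HIDDEN = 32   # assumed hidden-layer width

rng = random.Random(0)

def rand_matrix(rows, cols, scale):
    """Gaussian-initialized weight matrix stored as a list of lists."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

# Untrained placeholder weights; training (cross-entropy on next-character
# prediction) is not shown here.
EMB = rand_matrix(len(VOCAB), EMB_DIM, 1.0)       # layer 1: character embeddings
W1 = rand_matrix(CONTEXT * EMB_DIM, HIDDEN, 0.2)  # layer 2: tanh hidden layer
B1 = [0.0] * HIDDEN
W2 = rand_matrix(HIDDEN, len(VOCAB), 0.2)         # layer 3: output logits
B2 = [0.0] * len(VOCAB)

def next_char_probs(context_ids):
    """Forward pass: concatenate embeddings, apply the tanh layer, then softmax."""
    x = [v for i in context_ids for v in EMB[i]]
    h = [math.tanh(B1[j] + sum(x[k] * W1[k][j] for k in range(len(x))))
         for j in range(HIDDEN)]
    logits = [B2[o] + sum(h[j] * W2[j][o] for j in range(HIDDEN))
              for o in range(len(VOCAB))]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_name(max_len=12):
    """Sample characters until '.' is drawn or max_len is reached."""
    context = [STOI["."]] * CONTEXT
    out = []
    while len(out) < max_len:
        probs = next_char_probs(context)
        idx = rng.choices(range(len(VOCAB)), weights=probs)[0]
        if VOCAB[idx] == ".":
            break
        out.append(VOCAB[idx])
        context = context[1:] + [idx]  # slide the window forward by one character
    return "".join(out)
```

Calling `sample_name()` a few times produces gibberish until the weights are trained. The fixed context window is what keeps a model like this small, which may be part of why it overfits less than larger models on a dataset this size.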

stuartcw 9 hours ago

If you used the kanji names of the cities and towns, it would be a lot more realistic.

I’ve lived in Japan since 1988 and this just seems like a list of gibberish to me. Japanese city names, like English city names, are made up of meaningful components, e.g. Newbridge, 新橋, しんばし, Shinbashi. So there is nothing to get a hook on. It’s just syllables.

Try it with 2000 English city names and you will get the same quality of output.

freemanjiang 11 hours ago

One thing is that the model is trained on a romanized (English-letter), character-level representation of kana, so it may generate names that are not legal in the Japanese syllabary.
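
Because the model emits letters rather than kana, one way to catch illegal outputs is to check whether each name can be split entirely into valid romanized syllables. A minimal sketch, assuming modified Hepburn without macrons and a simplified syllable table; `is_valid_romaji` is a hypothetical helper, and doubled consonants are treated as a small tsu (sokuon), ignoring spellings like "tch".

```python
from functools import lru_cache

# Assumed syllable inventory: modified Hepburn without macrons, covering the
# basic gojūon, voiced/semi-voiced rows, and yōon combinations (kya, sho, ...).
_TABLE = """
a i u e o ka ki ku ke ko sa shi su se so ta chi tsu te to
na ni nu ne no ha hi fu he ho ma mi mu me mo ya yu yo
ra ri ru re ro wa wo n ga gi gu ge go za ji zu ze zo
da de do ba bi bu be bo pa pi pu pe po
kya kyu kyo sha shu sho cha chu cho nya nyu nyo hya hyu hyo
mya myu myo rya ryu ryo gya gyu gyo ja ju jo bya byu byo pya pyu pyo
"""
SYLLABLES = frozenset(_TABLE.split())
MAX_LEN = max(len(s) for s in SYLLABLES)
VOWELS = set("aeiou")

def is_valid_romaji(name: str) -> bool:
    """True if `name` can be segmented entirely into the assumed syllables."""
    s = name.lower()

    @lru_cache(maxsize=None)
    def ok(i: int) -> bool:
        if i == len(s):
            return True
        # Sokuon: a non-'n' consonant doubled, as in the 'pp' of 'sapporo'.
        if (i + 1 < len(s) and s[i] == s[i + 1]
                and s[i] not in VOWELS and s[i] != "n" and ok(i + 1)):
            return True
        # Otherwise try every syllable that could start at position i.
        return any(s[i:i + k] in SYLLABLES and ok(i + k)
                   for k in range(1, min(MAX_LEN, len(s) - i) + 1))

    return ok(0)
```

The memoized search tries every split, so ambiguous strings such as "kanai" (ka-na-i or ka-n-a-i) are accepted as long as at least one segmentation works. Generated names that fail the check could simply be resampled.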

  • RestartKernel 10 hours ago

    Have you tried approaching this with the kanji instead? That seems like free tokenisation.
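
With character-level tokenization on kanji, each token is already a meaningful unit, so a shared component such as 山 (yama, "mountain") shows up as the same token across names. A minimal sketch with a few real place names chosen for illustration (not the author's dataset); `char_tokens` is a hypothetical helper.

```python
from collections import Counter

# A few real place names in kanji with romanized readings, chosen for illustration.
NAMES = {"新橋": "shinbashi", "横浜": "yokohama", "山口": "yamaguchi",
         "富山": "toyama", "岡山": "okayama"}

def char_tokens(name):
    """Character-level tokenization: every kanji (or letter) becomes one token."""
    return list(name)

# How often each kanji token appears across the names.
kanji_counts = Counter(tok for kanji in NAMES for tok in char_tokens(kanji))

# Sequence length of each name when tokenized letter by letter in romaji instead.
romaji_lengths = {k: len(char_tokens(v)) for k, v in NAMES.items()}
```

Here 山 is counted three times and 新橋 is 2 tokens instead of 9 letters. The trade-off is a far larger vocabulary than 26 letters, and with under 2,000 names many kanji would likely appear only once. A kanji can also have several readings, so producing a romanized name would need an extra reading step.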

cedws 15 hours ago

I got Kanegawa, which is a real place, so I'd say it's pretty accurate!

  • fph 14 hours ago

    Maybe that name was already in the training set, though?
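
Checking this is cheap: compare each sample against the training list before showing it. A minimal sketch, assuming the training names are available as plain strings; `filter_novel` and `copy_rate` are hypothetical helpers. The copy rate is also one way to measure the overfitting described in the original post, where larger models tended to reproduce existing names.

```python
def filter_novel(generated, training_names):
    """Split generated names into novel ones and ones copied from the training set."""
    seen = {n.strip().lower() for n in training_names}
    novel, copied = [], []
    for name in generated:
        (copied if name.strip().lower() in seen else novel).append(name)
    return novel, copied

def copy_rate(generated, training_names):
    """Fraction of samples that reproduce a training name verbatim."""
    if not generated:
        return 0.0
    _, copied = filter_novel(generated, training_names)
    return len(copied) / len(generated)
```

Comparing `copy_rate` across model sizes on the same number of samples would turn "the bigger models overfit" into a number.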

ranger_danger 13 hours ago

I'm not sure why ML is even necessary. Practically every combination of kana characters (where each mora ends in a vowel unless it's an "n") is already valid and doesn't even sound weird.

Can someone explain why a random() function given a list of kana characters couldn't produce names that are just as good?
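
The proposed baseline fits in a few lines: pick a random number of morae and draw each one uniformly from a kana table. A minimal sketch; the mora list (the 46 basic gojūon in Hepburn, without voiced or yōon forms) and the 2 to 5 mora length range are assumptions for illustration, and `uniform_name` is a hypothetical helper.

```python
import random

# Assumed mora inventory: the 46 basic gojūon in Hepburn, including syllabic 'n'.
MORAE = ("a i u e o ka ki ku ke ko sa shi su se so ta chi tsu te to "
         "na ni nu ne no ha hi fu he ho ma mi mu me mo ya yu yo "
         "ra ri ru re ro wa wo n").split()

def uniform_name(rng, min_morae=2, max_morae=5):
    """Baseline: draw every mora independently and uniformly (nothing learned)."""
    k = rng.randint(min_morae, max_morae)
    return "".join(rng.choice(MORAE) for _ in range(k))

rng = random.Random(42)
samples = [uniform_name(rng) for _ in range(10)]
```

Because every draw is independent, this sampler cannot prefer common pieces like "yama", and nothing stops a name from starting with syllabic "n", which Japanese words essentially never do.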

  • freemanjiang 11 hours ago

    Hmm, I'm not convinced that uniform sampling from all possible kana characters necessarily leads to Japanese-sounding city names. I think the actual distribution does have structure (e.g. "yama" appearing more frequently).

    Here are 50 names I had Claude generate from the uniform distribution: ['wamorumura', 'sohikotake', 'hiteitewau', 'romekarumu', 'nehami', 'miruyake', 'shiyuhaki', 'ahiyo', 'homaso', 'chionohoratsu', 'akusoyo', 'kiuhi', 'karoso', 'suhoheso', 'muchichi', 'mahakekanuto', 'usatsuwotoro', 'namusu', 'sokomeni', 'hakureromake', 'tosukonuka', 'haokehaso', 'nsesutemei', 'womiku', 'noereyasou', 'suyakenosu', 'ritasaifuka', 'ruremoteshi', 'yuhowotsuhie', 'torarenumeho', 'rutsueto', 'hamiakaki', 'sutsuyosano', 'yasotawaku', 'kihaso', 'koairieke', 'hosuriihiwa', 'horotowanno', 'wokiu', 'tanasochiriwo', 'otosetanu', 'rakamotorure', 'hawaniu', 'emoshiratsuhe', 'naroman', 'mohaesa', 'soniruta', 'nofuni', 'kayatakera', 'natayamume']
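
Whether the real distribution is skewed can be checked directly by splitting the training names into morae and counting them. A minimal sketch using a greedy longest-match split over an assumed Hepburn mora table (no yōon or sokuon, so names containing those are skipped); `split_morae` and `mora_counts` are hypothetical helpers, and the sample list in the usage note is a handful of real city names deliberately picked to contain "yama", not the author's dataset.

```python
from collections import Counter

# Assumed mora table: basic gojūon plus voiced rows, Hepburn, no macrons.
MORA_SET = set("""
a i u e o ka ki ku ke ko sa shi su se so ta chi tsu te to
na ni nu ne no ha hi fu he ho ma mi mu me mo ya yu yo
ra ri ru re ro wa wo n ga gi gu ge go za ji zu ze zo
da de do ba bi bu be bo pa pi pu pe po
""".split())

def split_morae(name):
    """Greedy longest-match split; returns None if some part cannot be matched."""
    s, i, out = name.lower(), 0, []
    while i < len(s):
        for k in (3, 2, 1):
            piece = s[i:i + k]
            if len(piece) == k and piece in MORA_SET:
                out.append(piece)
                i += k
                break
        else:
            return None  # no mora matches at position i
    return out

def mora_counts(names):
    """Count morae across all names that could be split."""
    counts = Counter()
    for name in names:
        morae = split_morae(name)
        if morae is not None:
            counts.update(morae)
    return counts
```

For example, `mora_counts(["yokohama", "okayama", "toyama", "wakayama", "yamagata", "yamaguchi"])` counts "ya" 5 times and "ma" 6 times. Run on the full training list, a strongly uneven `Counter` would be the kind of structure the uniform sampler ignores and a learned model can pick up.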

  • asukachikaru 10 hours ago

    Because Japanese words aren't simply strings of random characters, just as a random string of eight English letters doesn't become a meaningful city name like Reading or Brighton.