If you used the kanji names of the cities and towns it would be a lot more realistic.
I’ve lived in Japan since 1988, and this just looks like a list of gibberish to me. Japanese city names, like English city names, are made up of meaningful components, e.g. Newbridge, 新橋, しんばし, Shinbashi. When the model only sees the syllables, there is nothing for it to get a hook on. It’s just syllables.
Try it with 2000 English city names and you will get the same quality of output.
One thing to note is that this is trained on a character-level representation of romanized (English-alphabet) kana, so it's possible it generates names that are not legal in the Japanese syllabary.
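One way to catch such outputs is to check whether a generated romanized string can be split into valid kana morae at all. Below is a minimal sketch, assuming a simplified Hepburn table; the `MORAE` set and the `segment`/`looks_legal` helpers are hypothetical illustrations, and the table ignores small-tsu gemination ("kk"), yōon ("kya") and long vowels.

```python
# Simplified Hepburn morae (an assumption: not exhaustive; no "kk"-style
# gemination, no yoon such as "kya", no long-vowel marks).
MORAE = {
    "a", "i", "u", "e", "o",
    "ka", "ki", "ku", "ke", "ko", "ga", "gi", "gu", "ge", "go",
    "sa", "shi", "su", "se", "so", "za", "ji", "zu", "ze", "zo",
    "ta", "chi", "tsu", "te", "to", "da", "de", "do",
    "na", "ni", "nu", "ne", "no",
    "ha", "hi", "fu", "he", "ho", "ba", "bi", "bu", "be", "bo",
    "pa", "pi", "pu", "pe", "po",
    "ma", "mi", "mu", "me", "mo",
    "ya", "yu", "yo",
    "ra", "ri", "ru", "re", "ro",
    "wa", "wo", "n",
}
MAX_LEN = max(len(m) for m in MORAE)


def segment(romaji):
    """Return one split of `romaji` into known morae, or None if impossible."""
    # best[i] holds a segmentation of romaji[:i], or None if none was found
    best = [None] * (len(romaji) + 1)
    best[0] = []
    for i in range(len(romaji)):
        if best[i] is None:
            continue
        for length in range(1, MAX_LEN + 1):
            end = i + length
            if end > len(romaji):
                break
            piece = romaji[i:end]
            if piece in MORAE and best[end] is None:
                best[end] = best[i] + [piece]
    return best[len(romaji)]


def looks_legal(romaji):
    """Segmentable, and not starting with the moraic nasal "n" (ん)."""
    parts = segment(romaji)
    # Standard Japanese words essentially never begin with ん
    return bool(parts) and parts[0] != "n"


print(segment("shinbashi"))
print(looks_legal("nsesutemei"))
```

Note that "nsesutemei" does segment cleanly, but only by starting with ん, which is why a pure segmentation check is not enough on its own.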
I'm not sure why ML is even necessary. Practically every combination of kana (where each mora ends in a vowel unless it's an "n") is already valid and doesn't even sound weird.
Can someone explain how a random() function given a list of kana characters could not produce equally good names?
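For concreteness, here is a minimal sketch of the random() approach the question describes, assuming a table of the 46 basic morae in romaji (no voiced kana or yōon) and a name length of 2 to 5 morae; both choices, and the `uniform_name` helper, are illustrative assumptions.

```python
import random

# The 46 basic gojuon morae in romaji (an assumed, simplified table).
KANA = [
    "a", "i", "u", "e", "o",
    "ka", "ki", "ku", "ke", "ko",
    "sa", "shi", "su", "se", "so",
    "ta", "chi", "tsu", "te", "to",
    "na", "ni", "nu", "ne", "no",
    "ha", "hi", "fu", "he", "ho",
    "ma", "mi", "mu", "me", "mo",
    "ya", "yu", "yo",
    "ra", "ri", "ru", "re", "ro",
    "wa", "wo", "n",
]


def uniform_name(rng, min_morae=2, max_morae=5):
    """Draw each mora independently and uniformly from KANA."""
    count = rng.randint(min_morae, max_morae)  # inclusive on both ends
    return "".join(rng.choice(KANA) for _ in range(count))


rng = random.Random(0)  # fixed seed so runs are reproducible
names = [uniform_name(rng) for _ in range(5)]
print(names)
```

Every mora is equally likely in every position, so the sampler has no notion that, say, "yama" is common in real place names or that names don't start with "n".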
Hmm, I'm not convinced that uniform sampling from all possible kana characters necessarily leads to Japanese-sounding city names. I think the actual distribution does have a pattern (e.g. "yama" appearing more frequently).
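A simple way to capture that kind of pattern without a neural net is a bigram (first-order Markov) model over morae, which is not necessarily what the original project used. This is a minimal sketch; `SAMPLE` is a tiny hand-picked set of real city names pre-split into morae, purely for illustration, and a usable model would need the full list of names.

```python
import random
from collections import Counter, defaultdict

# Tiny illustrative sample of real city names, pre-split into morae.
SAMPLE = [
    "yo ko ha ma", "ya ma ga ta", "o ka ya ma", "to ya ma",
    "ya ma gu chi", "fu ku o ka", "ka na za wa", "na ga no",
    "na ga sa ki", "ka wa sa ki", "ta ka ya ma", "ma tsu ya ma",
]
START, END = "<s>", "</s>"


def train_bigrams(names):
    """Count how often each mora follows each other mora."""
    counts = defaultdict(Counter)
    for name in names:
        morae = [START] + name.split() + [END]
        for prev, nxt in zip(morae, morae[1:]):
            counts[prev][nxt] += 1
    return counts


def sample_name(counts, rng, max_morae=8):
    """Walk the chain, choosing each next mora in proportion to its count."""
    out, prev = [], START
    while len(out) < max_morae:
        options = counts[prev]
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


counts = train_bigrams(SAMPLE)
print(counts["ya"].most_common(1))
rng = random.Random(0)
names = [sample_name(counts, rng) for _ in range(5)]
print(names)
```

In this sample every "ya" is followed by "ma", so the model reproduces the "yama" pattern that uniform sampling ignores; the trade-off is that a bigram model only sees one mora of context.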
Here are 50 I got Claude to generate from the uniform distribution: ['wamorumura', 'sohikotake', 'hiteitewau', 'romekarumu', 'nehami', 'miruyake', 'shiyuhaki', 'ahiyo', 'homaso', 'chionohoratsu', 'akusoyo', 'kiuhi', 'karoso', 'suhoheso', 'muchichi', 'mahakekanuto', 'usatsuwotoro', 'namusu', 'sokomeni', 'hakureromake', 'tosukonuka', 'haokehaso', 'nsesutemei', 'womiku', 'noereyasou', 'suyakenosu', 'ritasaifuka', 'ruremoteshi', 'yuhowotsuhie', 'torarenumeho', 'rutsueto', 'hamiakaki', 'sutsuyosano', 'yasotawaku', 'kihaso', 'koairieke', 'hosuriihiwa', 'horotowanno', 'wokiu', 'tanasochiriwo', 'otosetanu', 'rakamotorure', 'hawaniu', 'emoshiratsuhe', 'naroman', 'mohaesa', 'soniruta', 'nofuni', 'kayatakera', 'natayamume']
Because Japanese words aren't simply strings of random characters, just as a string of eight letters of the English alphabet doesn't suddenly become a meaningful city name like Reading or Brighton.