
Yes, exactly. You want to randomize the parts that are irrelevant. For example, if you're classifying news articles, you may want to shorten them anyway. A human would be able to tell what category an article belongs to without reading the whole thing - so you might use a combination of URL, headline, beginning, middle, and/or end. And if you do that, it's easy to turn one training example into 10 or more. You just vary the length of the individual parts.
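Here's a rough sketch of what I mean, in Python. The function name, the word-count ranges, and the newline-joined output format are all made up for illustration; tune them to your own data and tokenizer.

```python
import random


def make_variants(url, headline, body, n=10, seed=0):
    """Turn one article into n shortened training examples.

    Hypothetical helper: each variant keeps the URL and headline and
    samples a different number of words from the beginning, middle,
    and end of the body.
    """
    rng = random.Random(seed)  # seeded so the variants are reproducible
    words = body.split()
    variants = []
    for _ in range(n):
        # Assumed ranges: the beginning is always present, the middle
        # and end may be dropped entirely (length 0).
        k_begin = rng.randint(5, 30)
        k_mid = rng.randint(0, 30)
        k_end = rng.randint(0, 30)

        begin = words[:k_begin]
        mid_start = max(0, len(words) // 2 - k_mid // 2)
        middle = words[mid_start:mid_start + k_mid]
        end = words[-k_end:] if k_end else []

        parts = [url, headline, " ".join(begin), " ".join(middle), " ".join(end)]
        # Skip empty parts so a dropped middle/end leaves no blank line.
        variants.append("\n".join(p for p in parts if p))
    return variants
```

Every variant keeps the label of the original article. For short articles the pieces can overlap, which is probably fine for this purpose.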

