frank devilbiss

Modeling Enthusiast & Data Scientist

  • Interval Workout Audio Tracks Automated by pydub and gTTS

    It is one month into 2020 and I am still holding onto my New Year’s resolution with a white-knuckled grasp. This resolution is simply to work out for around 10 minutes every day. Part of the reason I have yet to fall off the horse is that it’s a relatively easy resolution. I know that I’m not going to become extremely fit with such a light routine, but I am confident that it is much better to do a little of something every day than to do nothing at all.

    Read more…
  • A Command Line Optical Character Recognition Tool

    Recently, I needed to translate a set of image files into one long text file. Optical character recognition (OCR) is an old technology that converts images into text, and there are a large number of GUI tools that will extract text from images. That being said, I had difficulty finding software that would combine multiple images into one text file. While looking for this capability, a thought struck me.

    Read more…
  • Visualizing Relationships in Harry Potter Using Language Processing

    One of the capabilities that makes Natural Language Processing (NLP) thrilling to me is the potential to automatically summarize a corpus of text. When a human summarizes a document, the cognitive process that differentiates important information from the noise seems rather complex. Establishing the importance of a character or an event can feel subjective. Was the fact that Ronald Weasley had a pet rat that important in either of the first two books in the Harry Potter series?

    Read more…
  • Faster Flu Updates Using CDC Reporting and Regression Modeling

    Thanks to having a newborn with a developing immune system at home, I have been stricken by a mild case of hypochondria. This hypochondria is exacerbated by the fact that the current flu season is especially ominous and has been particularly harsh on young children. To keep track of where we are in the flu season, the CDC publishes a weekly report every Friday. Thanks to my current state of mind, I am checking this weekly but am frustrated by how infrequently the reports are updated.

    Read more…
  • Feature Selection - Feature Filtering

    Too many features? Features are the building blocks of models. Too many building blocks can be a bad thing, however. This problem is quite apparent when developing natural language processing (NLP) models. NLP features, in many cases, are composed of either word counts or normalized frequencies of words that occur in a corpus of text. If the modeled corpus contains 10,000 unique words, that can translate to 10,000 possible features.

    Read more…
  • Brewer's Dictionary of Phrase and Fable

    On Monday night, we were driving to the library and a 30-year-old recording of Casey Kasem’s American Top 40 was playing on the 80s station. To preface the next song, Stevie Wonder’s Skeletons, Casey read an excerpt from Brewer’s Dictionary about skeletons: “The family skeleton, or the skeleton in the cupboard. Some domestic secret that the whole family conspires to keep to itself; every family is said to have at least one.”

    Read more…