Compact Anagram Code

The anagram question is one of the most basic and popular interview coding questions for entry-level CS/data science positions. Below is a quick two-line implementation (after the input list is defined) in both Python and R.

R
input_list = c('bat', 'tabs', 'tab', 'mouse', 'cat', 'act', 'tac')
sorted_list = sapply(input_list, function(word) paste(sort(unlist(strsplit(word,split = ""))),collapse=""))
input_list[sapply(sorted_list, function(word) sum(sorted_list == word) > 1)]

Python
input_list = ['bat', 'tabs', 'tab', 'mouse', 'cat', 'act', 'tac']
sorted_list = [sorted(word) for word in input_list]
[input_list[i] for i, word in enumerate(sorted_list) if sorted_list.count(word) > 1]

Won’t win any speed contests but suffices for compactness and readability.
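If speed does matter, one common alternative is to group words by their sorted letters in a dictionary, so each word is handled once instead of being counted against the whole list. The sketch below is not part of the two-liner above; the function name `find_anagrams` is just an illustrative choice.

```python
from collections import defaultdict

def find_anagrams(words):
    """Return the words that share their letters with at least one other word."""
    groups = defaultdict(list)
    for word in words:
        # Words that are anagrams of each other produce the same sorted key.
        groups[''.join(sorted(word))].append(word)
    # Walk the input again so the result keeps the original order.
    return [w for w in words if len(groups[''.join(sorted(w))]) > 1]

input_list = ['bat', 'tabs', 'tab', 'mouse', 'cat', 'act', 'tac']
print(find_anagrams(input_list))
```

This trades a little compactness for a single pass over the list plus dictionary lookups.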

Installing Red Hat Linux on VirtualBox on a Windows host

I couldn’t find a definitive guide that listed all the steps for installing Red Hat on Oracle’s VirtualBox, so below is a summary of the steps I took to get things up and running. I also installed Ubuntu on VirtualBox, but everything about that installation was easier, from finding the single iso file on their website to simply using the default virtual configurations.

  1. Download and install VirtualBox – the default settings should work.
  2. Download the iso images for your flavor of Red Hat.
    1. I don’t know much about the different flavors but I chose Red Hat Enterprise Linux AS (v. 3 for x86)
    2. There are 4 binary disks that each need to be downloaded. Some flavors have fewer disks but make sure you download all of the disks.
  3. Open up VirtualBox
  4. Click New
  5. Name your machine and choose Red Hat as the type
  6. Allocate memory size (might be better to give it more than the 512MB default)
  7. Create a virtual hard drive (it will be replaced later on). Keep the default type, VDI
  8. I chose dynamically allocated because the fixed-size disk was taking a long time to create, but fixed size will yield faster performance during actual use
  9. I gave myself more disk space: 16GB
    1. I launched my machine at this point, which prompted for a start-up disk. I gave the location of the iso file and the machine booted.
    2. But I ran into the problem of the machine not finding a hard drive. If this is ignored during setup, a further error, “No Devices Found”, appears and shuts down the machine.
  10. To solve this error, go back to the VirtualBox manager and click on Settings -> Storage. Notice the .vdi hard disk under “Controller: SATA” in the Storage Tree panel
  11. Delete the .vdi file
  12. Move up to the “Controller: IDE” entry and click on the add hard disk icon.
  13. Create a new VDI disk
  14. I don’t think this matters, but you can also add one of the start-up disks (and only one) by clicking the add disk icon while still here in the storage section of settings.
  15. Add disk 1 from the 4 iso images
  16. From here I just followed the default settings until I was prompted for disk 2
  17. What’s important to know is that once you click inside the VirtualBox window you cannot move your mouse outside of it. The bottom right-hand corner of the window shows the key to press to regain control of the mouse for your host computer. I believe it defaults to the right control key.
  18. Press the right control key and go to the Devices menu -> CD/DVD Devices -> and choose the next iso disk that is needed.
  19. Repeat this for all the other disks
  20. You will have to skip over the registration parts and then it should be ready to use.
  21. For a more customized approach during the actual setup, watch this video
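For anyone who prefers the command line, the GUI steps above can probably be approximated with VirtualBox’s `VBoxManage` tool. The sketch below only builds and prints the commands rather than running them. It is not what I actually did: the VM name `rhel3`, the iso file name `disk1.iso`, and the 1024MB memory figure are hypothetical placeholders, and the subcommands and flags should be checked against `VBoxManage --help` for your version (older releases use `createhd` instead of `createmedium`).

```python
import shlex

def vboxmanage_commands(vm_name, iso_path, memory_mb=1024, disk_mb=16384):
    """Build VBoxManage commands that mirror the GUI steps (assumed flags, nothing is executed)."""
    disk_file = vm_name + ".vdi"
    return [
        # Steps 4-5: create and register a VM of type Red Hat.
        ["VBoxManage", "createvm", "--name", vm_name, "--ostype", "RedHat", "--register"],
        # Step 6: give it more memory than the default.
        ["VBoxManage", "modifyvm", vm_name, "--memory", str(memory_mb)],
        # Steps 7-9: a 16GB VDI disk (dynamically allocated unless a variant is given).
        ["VBoxManage", "createmedium", "disk", "--filename", disk_file,
         "--size", str(disk_mb), "--format", "VDI"],
        # Steps 10-13: put the disk on an IDE controller instead of SATA.
        ["VBoxManage", "storagectl", vm_name, "--name", "IDE", "--add", "ide"],
        ["VBoxManage", "storageattach", vm_name, "--storagectl", "IDE",
         "--port", "0", "--device", "0", "--type", "hdd", "--medium", disk_file],
        # Steps 14-15: attach the first install iso as a DVD.
        ["VBoxManage", "storageattach", vm_name, "--storagectl", "IDE",
         "--port", "1", "--device", "0", "--type", "dvddrive", "--medium", iso_path],
    ]

commands = vboxmanage_commands("rhel3", "disk1.iso")
for cmd in commands:
    print(" ".join(shlex.quote(part) for part in cmd))
```

Swapping to the next install disk would be another `storageattach` call on the same port with a different `--medium`.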

API for Triplet Extraction from any Sentence

While attempting to discover latent topics for an assignment at work, I ran into the field of information extraction. A simple data model for information extraction is RDF (Resource Description Framework). RDF relates entities in a subject-predicate-object format, where the subject and object are related to one another by the predicate. This triple is a minimal representation of a piece of information.

Here are some examples of some simple relations in subject-predicate-object format:

  • Houston – is located in – Texas
  • Ted – is the son of – Steve
  • Elvis – is buried in – Graceland
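As a rough illustration of how little structure a triple needs, the examples above can be held in a plain Python named tuple and queried by subject and predicate. The names `Triple` and `objects_of` are made up for this sketch and are not part of any RDF library.

```python
from collections import namedtuple

# A triple is just three fields: subject, predicate, object.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

triples = [
    Triple("Houston", "is located in", "Texas"),
    Triple("Ted", "is the son of", "Steve"),
    Triple("Elvis", "is buried in", "Graceland"),
]

def objects_of(triples, subject, predicate):
    """Return every object linked to the subject by the given predicate."""
    return [t.object for t in triples
            if t.subject == subject and t.predicate == predicate]

print(objects_of(triples, "Elvis", "is buried in"))
```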

This triple format can be used to pull information from any sentence. To aid with this extraction I found a paper that explained the algorithm for extracting the triplet in great detail. To begin the process of triplet extraction it was necessary to download the Stanford Parser and then use Python’s great NLTK package to parse the sentence into an NLTK-readable format. Once the sentence was parsed, the algorithms from the paper were implemented.

The outcome is a subject, predicate, and object, as well as attributes for each item in the triplet. I formatted the results into a JSON object and now have them readily available for anyone to use at the following URL – my very first API.

To access the triple, use this url and enter in your sentence: http://www.newventify.com/rdf?sentence=”your sentence here”

Here’s an example:

http://www.newventify.com/rdf?sentence=%22The%20man%20stood%20next%20to%20the%20refrigerator%22
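The `%22` and `%20` in that URL are just the percent-encoded double quote and space. A small sketch of building the same URL with Python’s standard library follows; the helper name `build_rdf_url` is my own for this example, and it only constructs the string without calling the API.

```python
from urllib.parse import quote

BASE_URL = "http://www.newventify.com/rdf"

def build_rdf_url(sentence):
    """Wrap the sentence in double quotes and percent-encode it for the query string."""
    quoted_sentence = '"' + sentence + '"'
    # safe="" makes sure every reserved character is encoded, not just spaces and quotes.
    return BASE_URL + "?sentence=" + quote(quoted_sentence, safe="")

url = build_rdf_url("The man stood next to the refrigerator")
print(url)
```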

The code for this miniproject can be found here on Github

D3 in WordPress