Wednesday, June 6, 2007

National Internet Safety Month: what does your IP address reveal about your location?

June is National Internet Safety Month. To show support for this cause, I will post a series of blog entries addressing issues of Internet safety. Some posts will promote our new safe search engine for kids, Piffany, but many will be much more general. My first post concerns the cardinal rule of Internet safety:

Never give out personal information that might give away where you live

This is obviously a very good rule to remember, but how safe are you really? Did you know that your approximate geographical location can be determined from your IP address? Scary, huh? It's true. An IP address is a unique identifying number used by computers to route content on the Internet from computer to computer. Do not confuse IP addresses with URLs: an IP address looks like 24.91.135.203, whereas a URL looks like www.piffany.com. An IP address is like your computer's phone number or street address; some cell phones have them too. Every computer that accesses the Internet is assigned an IP address.
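If you want to see the difference for yourself, here is a minimal sketch in Python that tells the two apart. It uses the standard `ipaddress` module; the helper name `is_ip_address` is my own invention for illustration.

```python
import ipaddress


def is_ip_address(text: str) -> bool:
    """Return True if text parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        # Anything that is not a valid address (e.g. a host name) lands here.
        return False


for candidate in ["24.91.135.203", "www.piffany.com"]:
    kind = "an IP address" if is_ip_address(candidate) else "a name, not an IP address"
    print(f"{candidate} is {kind}")
```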

One free service I found returned the image seen below when I visited it. Yup, that's me (red ball slightly above and to the right of Cambridge). It also listed detailed information about my city and nearby cities and towns, including latitude and longitude. If I were, say, a high school student, how hard would it be to track down my high school? Not hard. If you knew my last name and my geographical location, an Internet phone directory would give you my phone number, from which my address easily follows using a reverse lookup service. The reverse lookup I tried even offered a service for unlisted numbers. My number is unlisted, and it returned my old address in nearby Cambridge. I guess I'm safe for now.

Will these sites show me the location for an IP address other than my own? Yup, I tried a friend's IP address and confirmed that the location was correct. By the way, I am intentionally leaving out information about the sites I used for this blog, but they are easy to find on the Internet.

How hard is it for someone to get my IP address?

Fortunately, not just anyone can get your IP address. If you host a website on your own server, then you can easily obtain the IP addresses of people who visit your site. There are also many CGI scripts available that can be installed on hosted sites to do the same thing. A potential predator could lure you to their server with the malicious intention of retrieving your IP address. This requires some sophistication, and judging by the characters caught in the act by Chris Hansen on Dateline's To Catch a Predator, most Internet predators are not savvy enough to pull something like this off, or at least, let's hope not. Still, getting someone's IP address is the most difficult step in locating them on the Internet, and even that is not too difficult. After that, relatively little work is required before a predator is knocking on your door.
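To show how little a server owner has to do, here is a rough sketch using Python's built-in `http.server` module. This is not any particular script I found; the names `LoggingHandler`, `seen_addresses`, and `make_server` are made up for this example, and a real site would more likely just read its web server's access log.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory log of visitor addresses (illustration only).
seen_addresses = []


class LoggingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # client_address holds the address the request came from.
        ip = self.client_address[0]
        seen_addresses.append(ip)
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request console output.
        pass


def make_server(port: int = 0) -> HTTPServer:
    """Create (but do not start) a server on the local machine only."""
    return HTTPServer(("127.0.0.1", port), LoggingHandler)
```

Calling `make_server(8000).serve_forever()` would start it. Every page request then adds the visitor's address to `seen_addresses`, with no cooperation needed from the visitor.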

What can be done to safeguard against this?

You can't usefully spoof (fake) your IP address when you browse the web. You can spoof just about anything else, but a reply sent to a fake address would never reach you, because the IP address is the identifier by which computers on the Internet ensure content is delivered to the right computer. One possibility for hiding your IP address is to use a proxy server, which relays your requests so that websites see the proxy's address instead of yours, but I am not aware of any services that offer proxy servers for this purpose. It would be expensive for Piffany to offer such a service, though we have discussed it in the past. In general, before committing to a chat room or any website that is social, make sure it doesn't display the IP addresses of its users. You have probably seen entries like 'David is logged in from 24.91.135.203'.
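For site owners, hiding addresses in public messages takes very little code. The sketch below is only an illustration under my own assumptions: the function name `redact_ips` and the replacement text are invented, and the pattern only covers dotted IPv4 addresses like the one above.

```python
import re

# Matches dotted-quad IPv4 addresses such as 24.91.135.203.
# It does not check that each number is 0-255, and it ignores IPv6.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact_ips(message: str) -> str:
    """Replace anything that looks like an IPv4 address before display."""
    return IPV4_PATTERN.sub("[address hidden]", message)


print(redact_ips("David is logged in from 24.91.135.203"))
# prints: David is logged in from [address hidden]
```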

What does Piffany plan to do about this problem?

For starters, we won't display a user's IP address to the public. Piffany's CeSAR algorithm also makes it unlikely that a website set up with malicious intent will appear in our search results, because like PageRank, HITS, and other authority-based algorithms, CeSAR uses link structure to determine a site's rank, and friendly communities of sites that you trust won't link to those malicious sites. CeSAR ranks web pages by their proximity to clusters of sites on a particular topic, of interest to a particular group, or that are frequented by a particular age group. Hence, if a website is not acknowledged by an established cluster frequented by 8-10 year old kids, for example, then it is unlikely to be listed highly in a search performed by 8-10 year old kids. To make this trait more effective, we only return the top 100 search results. Also, we are experimenting with systems and methods to verify that users who register as kids are actually kids; any suggestions on how to do this are welcome. Initially, we were thinking that we might ask the potential registrant a question about their school. With a lot of users, statistics can help identify legitimate answers, but at first we will just have to verify them ourselves. The best hope we have is that responsible users will report to us when they find a suspicious user or site.
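To give a feel for why link structure keeps unacknowledged sites out of the top results, here is a toy version of PageRank in Python. To be clear, this is not CeSAR, whose details I am not describing here; it is just the general link-based idea, and the site names and the `pagerank` function are made up for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank: links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share, plus shares from its in-links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical cluster of kid-friendly sites plus one site nobody links to.
links = {
    "kids-science.example": ["kids-math.example", "kids-reading.example"],
    "kids-math.example": ["kids-science.example", "kids-reading.example"],
    "kids-reading.example": ["kids-science.example"],
    "lure.example": ["kids-science.example"],  # links out, but gets no links back
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{score:.3f}  {page}")
```

The site that no trusted page links to ends up with only the baseline share, so it sorts to the bottom no matter how many links it points at the cluster.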

That's it for now. I hope to see some stimulating comments, so feel free to respond through the link below, and remember,
beware of web sites that publicly display your IP address.

Stay tuned to this blog for our next topic: Internet content rating systems.
Suggest a topic to me at david(at)piffany[dot]com


About Piffany
Why should a 9-year-old and a 29-year-old get exactly the same search results? Piffany is a search engine that strives to safely bring the full potential of the Internet to kids by allowing them to adjust the difficulty level of their search results. To ensure the safety of our search results, we are researching a new Internet content and safety rating system that is similar to ratings given for TV, movies, and video games. Visit us at Piffany.com.

Tuesday, June 5, 2007

18 Bugs and Climbing

As we prepare to invite beta testers to our site, the list of bugs found by the Piffany team and some of our friends (thanks to those of you who have been helping) is at 18 and climbing at last count. We have fixed six of these, and are working our way down the list. This is really a lot of work!

Most annoying bug: Browser compatibility, hands down. Since this is obviously going to be a problem, it would be great if we could see what our users are seeing. I found a tool online that looks promising. TapeFailure is a nice Web 2.0 analytics application that records screencasts (videos) of users interacting with your web page. I don't know if it will work for us, since we do not serve only static pages, but I think we will try it out. If anyone knows of a free or cheaper alternative, let me know.

Most bizarre bug: I implemented the advanced logic modules over the weekend and they worked fine through my administrative portal (a separate server available only from my computer that allows me to explore the databases behind Piffany in more detail), but some of the features didn't work at all through the public front-end even though the snippets of code were the same. I fixed the problem by deleting the old back-end search server and copying the administrative server in its place. Advanced logic allows you to input logical operators like AND, OR, and NOT (plus quotation marks for exact phrase matching) in order to refine search results. Users invoke them by typing uppercase operators alongside search tokens, e.g., Harry NOT Potter, but we are seeking ways to apply them automatically, or at least in an easier fashion, so kids can use them.
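For the curious, here is a stripped-down sketch of how uppercase operators can filter results. It is not Piffany's actual code; the `matches` function, the left-to-right evaluation, and the sample titles are all simplifying assumptions for illustration.

```python
import shlex


def matches(query: str, text: str) -> bool:
    """Evaluate a query left to right; adjacent terms are joined with AND."""
    haystack = text.lower()
    result = None      # running result; None until the first term is seen
    operator = "AND"   # operator to apply to the next term
    negate = False     # set when NOT precedes the next term
    # shlex.split keeps a "quoted phrase" together as one token.
    # (An unbalanced quote or apostrophe in the query raises ValueError.)
    for token in shlex.split(query):
        if token in ("AND", "OR"):
            operator = token
            continue
        if token == "NOT":
            negate = not negate
            continue
        hit = token.lower() in haystack  # simple case-insensitive substring match
        if negate:
            hit = not hit
        if result is None:
            result = hit
        elif operator == "OR":
            result = result or hit
        else:
            result = result and hit
        operator, negate = "AND", False
    return bool(result)


titles = [
    "Harry Potter and the Goblet of Fire",
    "Prince Harry visits school",
    "Beatrix Potter stories",
]
print([t for t in titles if matches("Harry NOT Potter", t)])
# prints: ['Prince Harry visits school']
```

A real implementation would need operator precedence and parentheses, but even this version shows why lowercase "not" is treated as an ordinary search word while uppercase NOT excludes results.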


Well, there are probably 19 bugs by now, so I had better get back to work.