
View Full Version : Attention Serious Programmers!



I.am
08-12-2003, 03:25 AM
I want to write my first web-bot. Could anyone help me out here by pointing me to the right website or a tutorial, or just give me some tips on how to create a bot?
Which language is best for writing one?

I am going to write a simple bot that will not do much: it will only search the web for a "screen name" and send me back the websites where the screen name is found. This is just for fun. Later I want to make a search engine bot...

Help me out here!
Thanks
I.am

Xilo
08-12-2003, 03:34 AM
Here is a really good article for VB.Net: http://www.msdn.microsoft.com/msdnmag/issues/02/10/SpiderinNET/default.aspx

chalkmongoose
08-12-2003, 06:13 AM
Look up information on the following using Google:
Content indexing, with meta-tags for optimal results
XMLHTTP parsing structure, and DOM structure
XMLHTTP component stability, WinHTTP component stability
Differences between XMLHTTP and WinHTTP parsing structure
Multi-domain channeling
Cross-referencing using stored indexes
Regular Expressions, with cursory Linux background

These are basically what you'd need to do this. First of all, I'd decide whether to use XMLHTTP or WinHTTP, as both have advantages.
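
To make a couple of the topics above concrete (DOM-style parsing and regular expressions), here is a minimal Python sketch, not a definitive implementation. It assumes the pages have already been downloaded into a dictionary mapping URL to HTML text; the names LinkExtractor and find_screen_name and the example.com pages are made up for illustration, and the actual fetching (XMLHTTP, WinHTTP or anything else) is left out.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def find_screen_name(pages, screen_name):
    """Return the URLs whose HTML contains screen_name as a whole token.

    The search runs over the raw HTML, so text inside tags also counts.
    """
    pattern = re.compile(
        r"(?<!\w)" + re.escape(screen_name) + r"(?!\w)", re.IGNORECASE
    )
    return [url for url, html in pages.items() if pattern.search(html)]


# Made-up pages standing in for downloaded content.
PAGES = {
    "http://example.com/a": '<body>Posted by I.am <a href="/b">next</a></body>',
    "http://example.com/b": '<body>I am here. <a href="http://example.org/">out</a></body>',
}

hits = find_screen_name(PAGES, "I.am")

extractor = LinkExtractor("http://example.com/a")
extractor.feed(PAGES["http://example.com/a"])
links_to_visit = extractor.links
```

A real bot would loop: fetch a page, search it, then queue the extracted links for the next round, keeping a set of URLs already visited so it does not go in circles.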

I.am
08-12-2003, 06:22 AM
Thanks to both of you! If you have used either of them to create a bot, then you could probably recommend which is the better of the two.
Any recommendations???

I.am
08-12-2003, 06:23 AM
What would be the easiest web-bot to create? Something that is not too complex, so that I learn something while I make it?

Eradication
08-12-2003, 07:53 AM
What exactly is a web bot?

I'm creating a Diablo II bot for Pindleskin.

monica_green_22
08-12-2003, 08:21 PM
I highly recommend using C++ as the language of implementation for your bot.

This is currently your best choice, as it gives you the flexibility to design the actual program any way you want, and it can interface with your OS's API natively (without using some form of high-level abstraction). Not only will you be able to write better code, but it will run faster, be easier to debug and, if you write it well, be very easy to port to other computers.

As for the design itself, please clarify what exactly you want your bot to do.
From what I can understand, you want your bot to be able to crawl the internet or an intranet and gather screen-names. If this is the case, you will have a hard time programming algorithmic heuristics to identify whether a given text contains a screen-name, especially given the nature of screen-names of most people.

Therefore, I suggest you implement a simple neural network which you can train to detect and discern the nature of screen-names. To this end, I suggest using the Hopfield neural network topology, with feed-forward links, and backpropagation for learning. You can then create a set of pages with known results, and train it to detect the screen-names. Obviously you will need to train it with a lot of pages (about 100) to get good accuracy, and the more pages you train it with, the more accurately it will detect screen-names.
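
To illustrate only the "train it on examples with known results" idea, here is a minimal Python sketch. It is deliberately not the network described above: it swaps in a single-layer perceptron, which is far simpler, and it works on individual tokens rather than whole pages. The feature choices and the labelled samples are assumptions made up for the example.

```python
def features(token):
    """Map a token to a small, hand-picked feature vector (an assumption)."""
    return [
        1.0,                                                   # bias term
        1.0 if any(c.isdigit() for c in token) else 0.0,       # contains a digit
        1.0 if any(c in "_." for c in token) else 0.0,         # contains '_' or '.'
        1.0 if any(c.isupper() for c in token[1:]) else 0.0,   # capital after 1st char
    ]


def predict(weights, token):
    """Return True if the perceptron scores the token as a screen-name."""
    score = sum(w * x for w, x in zip(weights, features(token)))
    return score > 0


def train(samples, epochs=20):
    """Classic perceptron rule: adjust the weights only on misclassified samples."""
    weights = [0.0] * 4
    for _ in range(epochs):
        mistakes = 0
        for token, label in samples:
            error = label - int(predict(weights, token))
            if error:
                mistakes += 1
                weights = [w + error * x for w, x in zip(weights, features(token))]
        if mistakes == 0:  # every training sample is classified correctly
            break
    return weights


# Made-up training set: 1 = screen-name, 0 = ordinary word.
SAMPLES = [
    ("monica_green_22", 1), ("I.am", 1), ("xX_sniper_Xx", 1), ("user1234", 1),
    ("the", 0), ("website", 0), ("Programmers", 0), ("tutorial", 0),
]

weights = train(SAMPLES)
```

With these features a plain-word name such as "chalkmongoose" looks exactly like an ordinary word and is missed, which is the kind of difficulty mentioned above.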

If you want, I can provide you with more information, and sample implementations on this.

For complete accuracy (as good as an intelligent human) you will have to implement fuzzy logic, and create fuzzy sets which the neural network can use to train itself, thereby automating the entire process. Obviously, you would have to implement some form of cooperation and competition, with a genetic algorithm.

Hope that helps,

Monica

I.am
08-12-2003, 08:31 PM
Thanks Monica, that was exactly what I was trying to find out about.


If you want, I can provide you with more information, and sample implementations on this.

Yes, it would be really nice if you could do that.

As for the bot, I understand what you are saying. PHP forums usually have a page called something like members.php. What if I train the bot to search only within PHP pages, and more specifically only within mem???.php pages?
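
A minimal Python sketch of that filtering idea, under stated assumptions: "mem???.php" is read loosely as "any filename that starts with mem and ends in .php" and written as the glob mem*.php, because a strict three-character wildcard would not match members.php. The function name is_member_page and the example URLs are made up.

```python
import posixpath
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def is_member_page(url, pattern="mem*.php"):
    """Return True if the last path segment of url matches the glob pattern."""
    path = urlparse(url).path            # drops the query string and fragment
    filename = posixpath.basename(path)  # e.g. "members.php"
    return fnmatchcase(filename.lower(), pattern)


# Made-up URLs a crawler might have queued.
CANDIDATES = [
    "http://example.com/forum/members.php?id=3",
    "http://example.com/forum/memberlist.php",
    "http://example.com/forum/index.php",
    "http://example.com/forum/members.html",
]

to_search = [url for url in CANDIDATES if is_member_page(url)]
```

The bot would then only download and search the URLs left in to_search.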

Right now, I have no real skills when it comes to creating one. All I have is logic, and I believe that if you have the right logic you can program in anything.
I would really appreciate it if you could help me out with this.
Thanks,
I.am

Xilo
08-12-2003, 08:33 PM
Ya, any of the languages in Visual Studio .NET would be good since there are a ton of added functions for the web and that...

monica_green_22
08-12-2003, 08:53 PM
Firstly, contrary to the advice given by many of the other replies to this post, I would recommend that you avoid using any .NET-based language, or even Visual Studio at all for that matter. For those about to flame me, this is not born out of a personal dislike for the MS environment, but rather out of experience. While MS's libraries are great for little utility programs, their limitations soon become apparent when you try to handle large or non-trivial data. Additionally, it is a very bad idea to come to depend on those libraries, as it will limit the scope of your program and consequently it will limit you too. It's a far better idea, and much more beneficial to yourself (as a learning experience), to write your own implementation, relying on nothing but the API provided by your OS (which can be abstracted from your code easily, ensuring maximum portability).

Having said that, I would like to state that using the MS libraries is a good idea if you are on a deadline and the limitations are not as big a concern as your release date.

As to the code itself, I highly recommend the following books:
Neural Networks: A Comprehensive Foundation (2nd Edition) - by Simon S. Haykin
Practical Neural Network Recipes in C++ - by Timothy Masters
Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks - by Russell D. Reed, Robert J. Marks II
Introduction to Fuzzy Sets, Fuzzy Logic, and Fuzzy Control Systems - by Trung Tat Pham, Guanrong Chen
Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence - by Jyh-Shing Roger Jang, Chuen-Tsai Sun, Eiji Mizutani
Genetic Algorithms and Fuzzy Multiobjective Optimization - by Masatoshi Sakawa

And for sample implementations, I recommend the following (from simplest to most complex):
JNeural - http://www.voltar-confed.org/jneural/
SCN Artificial Neural Network Library - http://www.sentinelchicken.org/projects/scnANNlib/
Open Desire - http://members.aol.com/_ht_a/gatmkorn/opendesire.htm

Feel free to contact me for any more information.

I cannot be more specific to your problem without knowing more about the objective of the bot and the scale of the implementation.

Monica

I.am
08-12-2003, 09:12 PM
Thanks Xilo! I will give .NET a try as well, but I might go with C++ because I don't want it to be dependent.

@Monica
Thanks for the links. I will go through them and will definitely contact you with tons of questions.
Right now I have some urgent work in hand, so I have to make this one wait. There is no deadline for this; it's for my own curiosity. So I guess I will really dig in and learn about it.

Thanks for all help! :)
I.am

monica_green_22
08-12-2003, 09:18 PM
I would suggest you develop it without using .NET since it can become much harder and more confusing going from .NET to standard C++.

To use a grossly oversimplified analogy, it's like learning to drive a manual transmission car after learning to drive an automatic transmission car. In short, it's an avoidable headache. :)

Monica