Jump to content

Best Database For 70 Million Records


whiterussian

Recommended Posts

Hello, I use large datasets, and have managed to use a combination of mysql and access to achieve what i want from the raw data.

However my latest project has 70 million records, and although MYSQL can handle it, Access is bombing... I would much rather use MS Access 2007 to organise tables etc, because I am familiar with the GUI etc. Access seems to be able to handle about 40 million large records.

Should I try SQL server, or another DB?

Anyone have any ideas?

thanks

Edited by whiterussian
Link to comment
Share on other sites

You can upscale an Access database to SQL Server backend via a wizard, of course this implies you have the data in Access already. You can connective natively from Access to SQL but requires more work. Ive worked in many a small business that have Access and upscale to SQL Server and think its the end of their performance problems, the problem is Access way of working doesnt map natively to T-SQL, and thus you'll often see locking issues at db level and the local file level, transactional issues, and with some of the filtering and predicate logic, access will request the whole 70million records back and do the filtering locally on the client. I hate to say but Access just isnt suited to databases that size (despire some people making it work by a thread).

What work do you have to do the data? Your best importing to SQL Server or mySQL, and using an analytical package to analyze the data. stackoverflow dot com will get more dev responses.

Edited by mattcodes
Link to comment
Share on other sites

The right hardware can easily handle 70 mill records on MySQL. There are many ways to optimize such a configuration, but most of the time slow response is a result of a badly designed database or 'clumsy' written queries. It also depends on the kind of data you store in the database.

Link to comment
Share on other sites

I guess OP is storing the data in mySQL and uses Access as an application frontend. Meaning Access queries the data from the mySQL database via ODBC.

I further assume that in this scenario Access should theoretically be able to handle the data since Access is only used to build the SQL queries, query the data from mySQL and then display it. Of course, depending on how exactly this is done Access might still be the bottleneck. If too much data handling logic resides in the 'client' (here: Access) the fastest database server will not help. If so, code and queries have to be optimized to move processing to the db server.

Example: If working with large datasets you usually limit the displayed data per page - this can either be done on the client - meaning fetch a large number of records from the db and then just display parts of it (bad!) or with support from the db, fetching only chunks of data. Often the db driver has to support this, too. I also don't know if maybe Access works better with MSSqlServer drivers when it comes to that.

welo

Link to comment
Share on other sites

Thanks guys!

I import the data into MYSQL - because Access just dies at about 50M records. (14kb a record)

the plan was then to export to access - to see if i could get round the program dying when i attempted a direct import into Access.

I would prefer to use access purely because the GUI is so much easier than the only one i am aware of for MYSQL (navicat)

Are there any other analytical packages better than navicat for mysql?

I havent optimized anything yet, because i cant get the data into a single access DB!

I suppose i could use navicat to query mysql, maybe if i convert the database to innodb i will get a relationship control panel?...

thanks chaps

bloody access!

Edited by whiterussian
Link to comment
Share on other sites

Thanks guys!

I import the data into MYSQL - because Access just dies at about 50M records. (14kb a record)

the plan was then to export to access - to see if i could get round the program dying when i attempted a direct import into Access.

I would prefer to use access purely because the GUI is so much easier than the only one i am aware of for MYSQL (navicat)

Are there any other analytical packages better than navicat for mysql?

I havent optimized anything yet, because i cant get the data into a single access DB!

I suppose i could use navicat to query mysql, maybe if i convert the database to innodb i will get a relationship control panel?...

thanks chaps

bloody access!

I would suggest using SQL Server 2005. On top of this, I recommend you build a small client app which can help you query your data. I dont know per se, what you 70mil records are about, but using a client could give you proper data retrieval, archival and query capabilities. If the client app is built using .net, you have the additional advantage of .net using "tabular data stream" to access sql server data, which reduces data access overheads. Access data could be easily migrated using SQL Server native DTC packages / tools.

If you need any more help on this, or help on the client app, PM me.

Link to comment
Share on other sites

Seems my assumptions were wrong and you are still stuck with storing the data.

Maybe you should consider the scenario that I described above: Keep the data in a proper DB such as MySQL or SQLServer (isn't there a limited free edition?) and use MS Access as frontend only. Coding your own little app is a valid option, too, but it might be more flexible and efficient (considering work time) to stick with MS Access since I understand that you want to build your own queries and have some knowledge of this product but no programming skills.

http://articles.techrepublic.com.com/5100-...11-5065669.html

http://www.aspfree.com/c/a/Microsoft-Acces...t-End-to-MySQL/

MS Access actually has its strength as RAD (rapid application development) tool, but its db engine is limited and not fit for complex stuff. Of course you might end up with performance issues again if your queries or code is inefficient, but this is not necessarily related to Access but might happen with other programming languages, too.

Of course, if gtm2k offers his help as a skilled .NET developer you might end up with a nifty little program that does exactly what you need and provide good solution. Don't know if there are even reporting tools that are easy to handle like Access with similar or even better features.

welo

Link to comment
Share on other sites

You might want to have a look at Cassandra (http://incubator.apache.org/cassandra/). It uses the BigTable data model. Developed by Facebook, used by large sites such as Facebook, Twitter and Digg.

Jambla

Had a quick look at this interesting project.

Not sure the software has a use-case scenario in mind that fits the OP's project. Cassandra was developed for multi-user applications with 'active' data and heavy load. I might be wrong but the OP's descriptions seems more like a data mining project to me where only one user is working on the data and no extensive write operations take place. Of course I might be wrong (again) :)

welo

Link to comment
Share on other sites

Spot on chaps!

Yes: data mining project indeed.... need the data in one program to sift and sort.. then eventually export after normalisation to mysql/xml after reducing those 70mill into maybe 20mill records.

Going to try the Access to Mysql connector whatsit... then if that fails, will try sql server, and also have a look at cassandra.

Thanks so much!

Of course i could write a script to query the tables etc, but its so much easier doing it by eye in access!

Link to comment
Share on other sites

:)

The ODBC connector looks great - thanks! for smaller DB's..

But still just too big a dataset for access, hanging the PC.

So now, before i try SQL server, i will give MYSQL another go with innodb, failing that i think i will just have to script 70mill queries and parse the data into smaller rows. Im sure SQL server can handle the data, but im wondering about the learning curve for it... arggh

Thanks for your help

Link to comment
Share on other sites

I'm late to the party here, but I will throw in my 2c anyway.

Do not even think about going to MSSQL! MySQL handles large datasets much, much better than MSSQL.

I would say your best options are either MySQL or PostgreSQL, both can handle a great deal of data and large query loads. Which one you want to use depends on the queries that will be run against the data, and the exact nature of the data itself. If you are doing uber-complex queries, with lots of sub-selects and such, PostgreSQL would be the better choice, as it has a better query planner (IMHO), and a more mature and robust stored-procedure system. MySQL on the other hand has a long track record of stability and easy replication.

Using a "real" database engine to store your data (such as MySQL or PostgreSQL), and accessing through ODBC or JDBC is really good advice.

Link to comment
Share on other sites

Any mainstream relational database server on modest hardware can handle 70 million records easily, that's small fish, thus your problem is not mySQL vs SQL Server or [insert-choice-of-rdbms]. The suggestion of moving to a document db or key/value store is crazy, the guy has choose Access for simplicity, lets not drink the kool-aid of the day... 70million is not twitter or google style, 70billion maybe.

If you can verify your queries (presuming they are not greedy) fail and its MS-SQL or mySQL fault then I'll be utterly shocked, I reguarly work with projects using all flavours of rdbms and have no issue with billions of records, correct queries and setup of indexes etc.. is something you need to consider across the board. Granted that mySQL is often faster for read operation due its weaker transactional support, by faster I mean micro-optimizations - your is bombing and I do not believe the issue lies with your RDBMS choice.

I further assume that in this scenario Access should theoretically be able to handle the data since Access is only used to build the SQL queries, query the data from mySQL and then display it.

We can assume the guy is using Access to simplify things, unfortunately Access enjoys doing silly stuff "SELECT * FROM Table" or sometimes only part of predicate and doing the filtering on the client, 70 million records goodbye network bandwidth and local memory, you'll also witness aggressive locking if using form e.g. SELECT FOR UPDATE etc... At least this is what I've seen in applications upscale by numpty sysadmins with Access 2003 as the client. To get an idea of whats causing the problem you need to profile the SQL queries generated by Access. If you are handrolling your own SQL and paging logic etc.. then you're probably find it easier to do some drag and drop asp.net webforms development.

Edited by mattcodes
Link to comment
Share on other sites

Blimey, thanks again, all this specialised DB lingo is amazing, especially "handrolling your own SQL"

LOL, hahaha!

The reason i would like to you access is simply because the query designer, and relationship tables are so easy to use. I then export the data to mysql once the tables are normalized, optimized, and generally nifty and correct.

At this stage of just getting the data into the tables, im just frustrated that access bombs, it means i will have to handroll my own queries, which means more work, which is the enemy of any decent programmer! :)

Sod it, i will try and improve my handrolling skills in mysql and see if i can get that to export the queried data via php to smaller files/tables ready for analytics/graphs etc... argghh.. bloody microsoft, bet they threw in a limit to force us to upgrade!

thanks again!

Link to comment
Share on other sites

Before using a web programming language (PHP) to do data mining I recommend you check out other options:

There are several Mysql Desktop Clients available that you can use to view the database structure, tables, etc and run SQL queries. Mysql used to offer MysqlAdministrator and MySQL Query Browser. The tools were not awesome but did the job. I think the still might ship with the Windows Mysql installation package, not sure. Of course there should be quite a view 3rd party tools, too.

You can also go for any ODBC or JDBC (java) based program, for instance this or this.

The market is huge and it might take some time to find a tool that fits your needs. And of course only some are freeware.

welo

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
  • Recently Browsing   0 members

    • No registered users viewing this page.



×
×
  • Create New...